People 
Courses 
Seminar 
Dec 06, 2017 12:00PM 
Prashanth Vasudevan, MIT 108 WVG 
Nov 29, 2017 12:00PM 
Ran Cohen, NEU 366 WVH Abstract In the setting of secure multiparty computation (MPC), a set of mutually distrusting parties wish to jointly compute a function, while guaranteeing the privacy of their inputs and the correctness of the output. An MPC protocol is called fully secure if no adversary can prevent the honest parties from obtaining their outputs. A weaker definition is fairness ensuring that an adversary can prematurely abort the computation only before learning any new information. We present highly efficient transformations from fair computations to fully secure computations, assuming the fraction of honest parties is constant (e.g., 1% of the parties are honest). One application of these transformations is a new coinflipping protocol, whose round complexity has a superlogarithmic dependency on the security parameter, improving over the linear dependency on the number of parties in Beimel, Omri, and Orlov (Crypto 2010). A second application is a new fully secure protocol for computing the Boolean OR function, with a superconstant round complexity, improving over the protocol of Gordon and Katz (TCC 2009) whose round complexity is linear in the number of corrupted parties. Finally, we prove that for some functions, a superconstant number of invocations of the fair computation is necessary for computing the function in a fully secure manner. This is a joint work with Eran Omri, Iftach Haitner, and Lior Rotem. 
Nov 15, 2017 12:00PM 
Wolfgang Gatterbauer, NEU 366 WVH Abstract We develop upper and lower bounds for the probability of monotone Boolean functions by treating multiple occurrences of variables as independent and assigning them new individual probabilities. We call this approach "dissociation" and give an exact characterization of optimal oblivious bounds, i.e. when the new probabilities are chosen independent of the probabilities of all other variables. Our motivation comes from the weighted model counting problem (or, equivalently, the problem of computing the probability of a Boolean function), which is #Phard in general. By performing several dissociations, one can transform a Boolean formula whose probability is difficult to compute, into one whose probability is easy to compute, and which is guaranteed to provide an upper or lower bound on the probability of the original formula by choosing appropriate probabilities for the dissociated variables. Our bounds shed light on the connection between previous relaxationbased and modelbased approximations and unify them as concrete choices in a larger design space. We also show how our theory allows a standard relational database management system to both upper and lower bound hard probabilistic queries in guaranteed polynomial time. Based on joint work with Dan Suciu from TODS 2014. 
Nov 08, 2017 12:00PM 
Siyao Guo, NEU 366 WVH Abstract We initiate a quantitative study of the power of negations, asking how many negations are required to compute cryptographic primitives. We provide several lower bounds, some of them tight, for various cryptographic primitives and building blocks including oneway permutations, pseudorandom functions, smallbias generators, hardcore predicates, errorcorrecting codes, and randomness extractors. Among our results, we highlight the following.
In addition, we will mention some recent results in understanding the power of negations from multiple angles including Boolean formulas and property testing. Based on join works with Clément L. Canonne, Elena Grigorescu, Ilan Komargodski, Akash Kumar, Tal Malkin, Igor C. Oliveria, Alon Rosen, and Karl Wimmer. 
Nov 01, 2017 12:00PM 
Theory Lunch  no talk 142 ISEC 
Oct 25, 2017 12:00PM 
Mika Göös, Harvard 366 WVH Abstract For any unsatisfiable CNF formula F that is hard to refute in the Resolution proof system, we show that a "lifted" twoparty version of F is hard to refute in any proof system whose lines are computed by efficient communication protocolsor, equivalently, that a monotone function associated with F has large monotone circuit complexity. Our result extends to real monotone circuits, which yields new lower bounds for the Cutting Planes proof system. Joint work with Ankit Garg, Pritish Kamath, and Dmitry Sokolov. 
Oct 18, 2017 12:00PM 
Ali Vakilian, MIT 366 WVH Abstract We study the Fractional Set Cover problem in the streaming model. That is, we consider the relaxation of the set cover problem over a universe of $n$ elements and a collection of $m$ sets, where each set can be picked fractionally, with a value in $[0, 1]$. We present a randomized $(1+ \epsilon)$approximation algorithm that makes $p$ passes over the data, and uses $O(\textit{polylog} (m, n, 1/\epsilon)(mn^{(O(1/(p \epsilon))})+ n))$ memory space. The algorithm works in both the set arrival and the edge arrival models. To the best of our knowledge, this is the first streaming result for the fractional set cover problem. We obtain our results by employing the multiplicative weights update framework in the streaming settings. 
Oct 11, 2017 12:00PM 
Matt Dippel + Tanay Mehta 366 WVH 
Oct 04, 2017 12:00PM 
Yashvanth Kondi 366 WVH Abstract Garbled circuits are of central importance in cryptography, finding widespread application in secure computation, zeroknowledge (ZK) protocols, and verifiable outsourcing of computation to name a few. We are interested in a particular kind of garbling scheme, termed privacyfree in the literature. We show that Boolean formulas can be garbled informationtheoretically in the privacyfree setting, producing no ciphertexts at all. Existing garbling schemes either rely on cryptographic assumptions (and thus require cryptographic operations to construct and evaluate garbled circuits), produce garbled circuits of nonzero size, or are restricted to low depth formulaic circuits. Our result has both theoretical and practical implications for garbled circuits as a primitive. On the theory front, our result breaks the known theoretical lower bound of one ciphertext for garbling an AND gate in this setting. As an interesting implication of producing size zero garbled circuits, our scheme scores adaptive security for free. On the practical side, our garbling scheme involves only cheap XOR operations and produces size zero garbled circuits. As a side result, we propose several interesting extensions of our scheme. Namely, we show how to garble threshold and high fanin gates. An aspect of our garbling scheme that we believe is of theoretical interest is that it does not maintain the invariant that the garbled circuit evaluator must not at any point be in possession of both keys of any wire in the garbled circuit. 
Sep 27, 2017 12:00PM 
Ariel Hamlin 366 WVH Abstract Protected database search systems cryptographically isolate the roles of reading from, writing to, and administering the database. This separation limits unnecessary administrator access and protects data in the case of system breaches. Since protected search was introduced in 2000, the area has grown rapidly; systems are offered by academia, startups, and established companies. However, there is no best protected search system or set of techniques. Design of such systems is a balancing act between security, functionality, performance, and usability. This challenge is made more difficult by ongoing database specialization, as some users will want the functionality of SQL, NoSQL, or NewSQL databases. This database evolution will continue, and the protected search community should be able to quickly provide functionality consistent with newly invented databases. At the same time, the community must accurately and clearly characterize the tradeoffs between different approaches. To address these challenges, we provide the following contributions: 1) An identification of the important primitive operations across database paradigms. We find there are a small number of base operations that can be used and combined to support a large number of database paradigms. 2) An evaluation of the current state of protected search systems in implementing these base operations. This evaluation describes the main approaches and tradeoffs for each base operation. Furthermore, it puts protected search in the context of unprotected search, identifying key gaps in functionality. 3) An analysis of attacks against protected search for different base queries. 4) A roadmap and tools for transforming a protected search system into a protected database, including an opensource performance evaluation platform and initial user opinions of protected search. 
Apr 27, 2017 3:30PM 
Omri Weinstein, Columbia 366 WVH Abstract We prove the first superlogarithmic lower bounds on the cell probe complexity of dynamic *boolean* (a.k.a. decision) data structure problems, a longstanding milestone in data structure lower bounds. We introduce a new technique and use it to prove a ~ $\log^{1.5}(n)$ lower bound on the operational time of a wide range of boolean data structure problems, most notably, on the query time of dynamic range counting *over $GF_2$* ([Pat07]). Proving an $\omega(\log n)$ lower bound for this problem was explicitly posed as one of five important open problems in the late Mihai Patrascu's obituary [Tho13]. This result also implies the first $\omega(\log n)$ lower bound for the classical 2D range counting problem, one of the most fundamental data structure problems in computational geometry and spatial databases. We derive similar lower bounds for boolean versions of dynamic polynomial evaluation and 2D "rectangle stabbing", and for the (nonboolean) problems of range selection and range median. Our technical centerpiece is a new way of "weakly" simulating dynamic data structures using efficient *oneway* communication protocols with small advantage over random guessing. This simulation involves a surprising excursion to lowdegree (Chebychev) polynomials which may be of independent interest, and offers an entirely new algorithmic angle on the "cell sampling" method of Panigrahy et al. [PTW10]. Joint work with Kasper GreenLarsen and Huacheng Yu. 
Apr 20, 2017 3:30PM 
Ankur Moitra, MIT RY 126 Abstract Starting from the seminal works of Tukey (1960) and Huber (1964), the field of robust statistics asks: Are there estimators that provably work in the presence of noise? The trouble is that all known provably robust estimators are also hard to compute in highdimensions. Here, we study a basic problem in robust statistics, posed in various forms in the above works. Given corrupted samples from a highdimensional Gaussian, are there efficient algorithms to accurately estimate its parameters? We give the first algorithms that are able to tolerate a constant fraction of corruptions that is independent of the dimension. Additionally, we give several more applications of our techniques to product distributions and various mixture models. This is based on joint work with Ilias Diakonikolas, Jerry Li, Gautam Kamath, Daniel Kane and Alistair Stewart. 
Apr 13, 2017 3:30PM 
No seminar 
Apr 06, 2017 3:30PM 
No seminar 
Mar 30, 2017 3:30PM 
Ariel Hamlin and Lucianna Kiffer 366 WVH 
Mar 23, 2017 3:30PM 
Mehraneh Liaee, NEU 366 WVH Abstract We study the problem of gossip in dynamic networks controlled by an adversary that can modify the network arbitrarily from one round to another, provided that the network is always connected. In the gossip problem, $n$ tokens are arbitrarily distributed among the $n$ network nodes, and the goal is to disseminate all the n tokens to every node. Our focus is on tokenforwarding algorithms, which do not manipulate tokens in any way other than storing, copying, and forwarding them. Gossip can be completed in linear time in any static network, but a basic open question for dynamic networks is the existence of a distributed protocol that can do significantly better than an easily achievable bound of $O(n^2)$ rounds. In previous work, it has been shown that under adaptive adversaries, every token forwarding algorithm requires $\Omega(n^2/\log n)$ rounds. In this paper, we study oblivious adversaries, which differ from adaptive adversaries in one crucial aspect they are oblivious to random choices made by the protocol. We present an $\tilde{\Omega}(n^{3/2})$ lower bound under an oblivious adversary for RANDDIFF, a natural algorithm in which neighbors exchange a token chosen uniformly at random from the difference of their token sets. We also present an $\tilde{\Omega}(n^{4/3})$ bound under a stronger notion of oblivious adversary for symmetric knowledgebased algorithms. On the positive side, we present a centralized algorithm that completes gossip in $\tilde{\Omega}(n^{3/2})$ rounds. We also show an $\tilde{O}(n^{5/3})$ upper bound for RANDDIFF in a restricted class of oblivious adversaries, which we call pathsrespecting. 
Mar 16, 2017 3:30PM 
Yael Kalai, MIT/MSR 366 WVH Abstract We construct an interactive coding scheme, a notion introduced by Schulman (FOCS 1992, STOC 1993). Loosely speaking, we show how to convert any twoparty interactive protocol into one that is resilient to constantfraction of *insertion* and *deletion* errors, while preserving computational efficiency, and blowing up the communication complexity and the *round* complexity by a constant factor that approaches 0 as the errorrate approaches 0. Previous works were not concerned with the round complexity, and typically assumed that one bit is sent per round. Moreover, most previous works (with the exception of the recent work of Braverman et. al.) considered only substitution errors or erasures errors. We consider the model where in each round each party may send a message of *arbitrary*, where the length of the messages and the length of the protocol may be adaptive, and may depend on the private inputs of the parties and on previous communication. This model is known as the (synchronous) message passing model, and is commonly used in distributed computing, and is the most common model used in cryptography. This is joint work with Klim Efremenko and Elad Haramaty. 
Mar 9, 2017 3:30PM 
No seminar  Spring Break 
Mar 2, 2017 3:30PM 
Gautam Kamath, MIT 366 WVH Abstract Given samples from an unknown distribution p, does it have some property, or is it far from every distribution possessing this property? This question has received enormous attention in statistics, information theory, and theoretical computer science. Some properties of interest include:
In this talk, I will discuss some recent results and directions on the frontiers of distribution testing. The main focus will be on testing whether a distribution belongs to a particular class, but I'll also talk about some newer directions, including testing on highdimensional domains under structural assumptions (i.e., where the underlying distribution is a graphical model) and testing while maintaining differential privacy. Based on joint works with Jayadev Acharya, Bryan Cai, Constantinos Daskalakis, and Nishanth Dikkala. 
Feb 21, 2017 3:30PM 
Jalaj Upadhyay, Penn State 366 WVH Abstract The problem of lowrank factorization of an m x n matrix $A$ requires outputting a singular value decomposition: an m x k matrix $U$, an n x k matrix $V$, and a k x k diagonal matrix $D$ such that $U D V^T$ approximates the matrix $A$ in the Frobenius norm. In this talk, we study releasing differentiallyprivate lowrank factorization of a matrix in the general turnstile update model. We give a differentiallyprivate algorithm instantiated with respect to a privacy level stronger than privacy levels for this and related problems studied in previous works, namely that of Blocki et al.(FOCS 2012), Dwork et al. (STOC 2014), Hardt and Roth (STOC 2012, STOC 2013), and Hardt and Price (NIPS 2014). We consider two matrices $A$ and $A'$ as neighboring if $A  A'$ can be represented as an outer product of two unit vectors. Our private algorithm with respect to this privacy level incurs optimal additive error. We also prove a lower bound that shows that the space required by this algorithm is optimal up to a logarithmic factor. Based on the paper arXiv:1604.01429 
Feb 16, 2017 3:30PM 
Adam Sealfon, MIT 366 WVH Abstract Motivated by the goal of improving the concrete efficiency of secure multiparty computation (MPC), we study the possibility of implementing an infrastructure for MPC. We propose an infrastructure based on oblivious transfer (OT), which would consist of OT channels between some pairs of parties in the network. We devise informationtheoretically secure protocols that allow additional pairs of parties to establish secure OT correlations using the help of other parties in the network in the presence of a dishonest majority. Our main technical contribution is an upper bound that matches a lower bound of Harnik, Ishai, and Kushilevitz (Crypto 2007), who studied the number of OT channels necessary and sufficient for MPC. In particular, we characterize which nparty OT graphs G allow tsecure computation of OT correlations between all pairs of parties, showing that this is possible if and only if the complement of G does not contain as a subgraph the complete bipartite graph $K_{nt,nt}$. Joint work with Ranjit Kumaresan and Srinivasan Raghuraman. 
Feb 9, 2017 3:30PM 
Talk cancelled due to weather 
Feb 2, 2017 3:30PM 
Constantinos Daskalakis, MIT 108 WVH Abstract We provide global convergence guarantees for the expectationmaximization (EM) algorithm applied to mixtures of two Gaussians with known covariance matrices. We show that EM converges geometrically to the correct mean vectors, and provide simple, closedform expressions for the convergence rate. As a simple illustration, we show that in one dimension ten steps of the EM algorithm initialized at infinity result in less than 1% error estimation of the means. Joint work with Christos Tzamos and Manolis Zampetakis. 
Jan 26, 2017 3:30PM 
No talk 
Jan 19, 2017 3:30PM 
Stratis Ioannidis, Northeastern 366 WVH Abstract Secure Function Evaluation (SFE) allows an interested party to evaluate a function over private data without learning anything about the inputs other than the outcome of this computation. This offers a strong privacy guarantee: SFE enables, e.g., a medical researcher, a statistician, or a data analyst, to conduct a study over private, sensitive data, without jeopardizing the privacy of the study's participants (patients, online users, etc.). Nevertheless, applying SFE to "big data" poses several challenges. First, beyond any computational overheads due to encryptions and decryptions, executing an algorithm securely may lead to a polynomial blowup in the total work compared to execution in the clear. For large datasets, even going from linear to quadratic complexity is prohibitive. Second, secure evaluations of algorithms should also maintain parallelizability: an algorithm that is easy to parallelize in the clear should also maintain this property in its SFE version, if its execution is to scale. This poses a challenge, as communication patterns between processors may reveal a lot about the private inputs. In this talk, we show that several machine learning and data mining algorithms can be executed securely while leading to only (a) a logarithmic blowup in total work and (b) a logarithmic increase in the execution time when executed in parallel. Our techniques generalize to any algorithm that is graphparallel, i.e., can be expressed through a sequence of scatter, gather, and apply operations. This includes several algorithms of practical interest, including page rank, matrix factorization, and training neural networks. 
Jan 12, 2017 3:30PM 
Jonathan Ullman, Northeastern 366 WVH Abstract Adaptivity is an important feature of data analysis  the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) and Hardt and Ullman (FOCS, 2014) initiated a general formal study of this problem, and gave the first upper and lower bounds on the achievable generalization error for adaptive data analysis. Specifically, suppose there is an unknown distribution $P$ and a set of $n$ independent samples $x$ is drawn from $P$. We seek an algorithm that, given $x$ as input, accurately answers a sequence of adaptively chosen "queries" about the unknown distribution $P$. How many samples $n$ must we draw from the distribution, as a function of the type of queries, the number of queries, and the desired level of accuracy? We give new upper bounds on the number of samples n that are needed to answer statistical queries. The bounds improve and simplify the work of Dwork et al. (STOC, 2015), and have been applied in subsequent work by those authors (Science, 2015; NIPS, 2015). As in Dwork et al., our algorithms are based on a connection with algorithmic stability in the form of differential privacy. We extend their work by giving a quantitatively optimal, more general, and simpler proof of their main theorem that the stability notion guaranteed by differential privacy implies low generalization error. Joint work with Raef Bassily, Kobbi Nissim, Adam Smith, Thomas Steinke, and Uri Stemmer. 
Jan 03, 2017 2:00PM 
LiYang Tan, TTIChicago 166 WVH Abstract
We consider the fundamental derandomization problem of deterministically finding a satisfying assignment to a CNF formula that has many satisfying assignments. We give a deterministic algorithm which, given an $n$variable $poly(n)$clause CNF formula $F$ that has $F1(1) /eq \epsilon 2n$, runs in time $nO^~(loglogn)2$ for $\epsilon \geq 1/\polylog(n)$ and outputs a satisfying assignment of $F$. Prior to our work the fastest known algorithm for this problem was simply to enumerate over all seeds of a pseudorandom generator for CNFs; using the best known PRGs for CNFs [DETT10], this takes time $n\Omega^~(logn)$ even for constant $\epsilon$. Joint work with Rocco Servedio.

Sep 21, 2016 12:30PM 
Daniel Wichs and Jack Doerner 366 WVH 
Sep 28, 2016 12:30PM 
Yaron Singer, Harvard 366 WVH Abstract
In this talk we will consider the problem of maximizing a monotone submodular function under noise. There has been a great deal of work on optimization of submodular functions under various constraints, with many algorithms that provide desirable approximation guarantees. However, in many applications we do not have access to the function we aim to optimize, but rather to some erroneous or noisy version of it. This raises the question of whether provable guarantees are obtainable in presence of error and noise. We provide initial answers, by focusing on the question of maximizing a monotone submodular function under a cardinality constraint when given access to a noisy oracle of the function.

Oct 05, 2016 12:15PM 
Huy Nguyen, Northeastern 366 WVH Abstract In the turnstile L_p heavy hitters problem with parameter eps, one must maintain a highdimensional vector x in R^n subject to updates of the form update(i, Delta), which increases x_i by Delta, where i is in [n], and Delta is an integer. Upon receiving a query, the goal is to report every ``heavy hitter'' i in[n] with x_i >= eps*x_p as part of a list L, which is a subset of [n] of size O(1/\eps^p), i.e. proportional to the maximum possible number of heavy hitters. In fact we solve the stronger tail version, in which L should include every i such that x_i >= \eps x_{proj{1/eps^p}}_p and x_i > 0, where x_{proj{k}} denotes the vector obtained by zeroing out the largest k entries of x in magnitude. We provide a new algorithm, which in the most general turnstile model achieves optimal O(\eps^{p}\log n) space, O(\log n) update time, and fast O(\eps^{p}\poly(\log n)) query time, providing correctness whp. Our main innovation is an efficient reduction from the heavy hitters to a clustering problem in which each heavy hitter is encoded as some form of noisy spectral cluster in a much bigger graph, and the goal is to identify every cluster. Since every heavy hitter must be found, correctness requires that every cluster be found. We thus need a ``clusterpreserving clustering'' algorithm, that partitions the graph into clusters with the promise of not destroying any original cluster. To do this we first apply standard spectral graph partitioning, and then we use some novel combinatorial techniques to modify the cuts obtained so as to make sure that the original clusters are sufficiently preserved. Our clusterpreserving clustering may be of broader interest much beyond heavy hitters. Joint work with Kasper Green Larsen, Jelani Nelson, and Mikkel Thorup. 
Oct 12, 2016 12:30PM 
Mohsen Ghaffari, MIT/ETH Zurich 366 WVH (Abstract) How can the computers in a network interact and communicate to solve the network's graph problems efficiently? This is the subject of the area of (local) distributed graph algorithms, which dates back to the 1980s. The recent years have witnessed significant developments in this classical area. In this talk, we survey some of these developments. Noting the gap between randomized and deterministic distributed graph algorithms, the talk has three (closelyrelated) parts: 1) On the randomized side, we present the first nearlyoptimal randomized distributed algorithm for maximal independent set. Using wellknown reductions, this provides a unified efficient algorithm for all the classical problems of the area. 2) On the deterministic side, we present an improved deterministic distributed algorithm for edge coloring, which almost settles a quartercentury old open problem. 3) We conclude with mentioning a surprising result that pinpoints the gap between randomized and deterministic distributed algorithms: we present a rudimentarylooking rounding problem, which admits a trivial 0round randomized algorithm, and prove that if one can solve it deterministically in polylog(n) rounds, we would get polylog(n)round deterministic distributed algorithms for all the local problems. The latter remains the most wellknown open question of the area. 
Oct 19, 2016 12:30PM 
Mor Weiss, Northeastern 366 WVH (Abstract) In the past few decades, probabilistic checking techniques were shown to yield dramatic efficiency improvements in verifying proofs and approximating distance from errorcorrecting codes. We show connections between cryptography and "zeroknowledge" variants of these techniques, such as Probabilistically Checkable Proofs (PCPs) and PCPs of Proximity (PCPPs), and use them to improve the complexity of cryptographic protocols. In this talk we will discuss two constructions: 1. PCPs for NP with zeroknowledge guarantees, that can be verified nonadaptively, namely by making a single round of queries to the proof. Our construction is based on a novel connection to leakageresilient circuits. 2. Zeroknowledge PCPPs for NP, a generalization of zeroknowledge PCPs, which we construct by combining standard PCPPs with secure multiparty computation protocols. Based on joint works with Yuval Ishai, Amit Sahai, Michael Viderman and Guang Yang. 
Oct 26, 2016 12:30PM 
Matt Dippel and Chin Ho Lee 366 WVH 
November 1, 2016 12:30PM 
Lorenzo Orecchia, Boston University 366 WVH (Abstract) Fast iterative methods from Convex Optimization play a crucial role in a number of recent breakthroughs in the design of nearlylineartime algorithms for fundamental combinatorial problems, such as maximum flow and graph partitioning. Multiplicative Weight Updates, Gradient Descent, and Nesterov's Method have become a mainstay in the construction and analysis of fast algorithms. The power of these approaches raises a number of questions: What is the exact relation between Multiplicative Weight Updates and Gradient Descent? Why do Multiplicative Weight Updates show up in so many settings? What is the intuition behind Nesterov's iterative algorithm that achieves the asymptotically optimal iteration count for smooth convex functions? In this survey talk, I will provide answers by presenting a unified framework that reveals the power and limitations of each method and provides much needed geometric intuition. Among other insights, I will explain how Multiplicative Weight Updates are a particular instance of a dual iterative method, known as Mirror Descent, and how Nesterov's algorithm can be naturally derived as a combination of Mirror and Gradient Descent. As an example of the application of these insights, I will also discuss their crucial role in recent progress in the nearlylineartime solution of packing linear programs, both in parallel and sequentially. 
Nov 9, 2016 12:30PM 
Mitali Bafna and Hamid Jahanjou 366 WVH 
Nov 16, 2016 12:30PM 
Harry Lang, Johns Hopkins 366 WVH (Abstract) In the streaming model, data points arrive sequentially and algorithms attempt to use a sublinear amount of memory. For a class of clustering functions (including kmeans, kmedian, and L_p norms), we introduce a new technique for converting a coreset construction in the RAM model into a coreset construction in the streaming model. On a stream of n points, our method yields streaming coreset algorithms requiring the storage of O(S + k log n) points, where S is the size of the coreset. The previous stateoftheart required O(S log^3 n) points. Note that S = \Omega(k), so this improves all parameters. In the RAM setting, we improve the size of coresets for a larger class of problems. This includes coresets for kmeans with nonnegative weights that depend on k as k \log k, and a Euclidean construction where the coreset size is independent of the dimension. 
Nov 23, 2016 
Thanksgiving  No Seminar 
Nov 30, 2016 12:30PM 
Ankit Garg, MSR 411 Ell Hall (Abstract) The celebrated BrascampLieb (BL) inequalities (and their extensions) are
an important mathematical tool, unifying and generalizing numerous inequalities in analysis,
convex geometry and information theory. While their structural theory is very well understood,
far less is known about computing their main parameters. We give polynomial time algorithms to compute: The algorithms are obtained by a simple efficient reduction of a given BLdatum to an instance of the Operator Scaling problem defined by Gurvits, for which the present authors provided a polynomial time algorithm recently. This will be based on joint work with Leonid Gurvits, Rafael Oliveira and Avi Wigderson. 
Dec 07, 2016 12:30PM 
Vitaly Shmatikov, Cornell Tech 411 Ell Hall (Abstract) Mylar is a framework by Popa et al. that uses multikey searchable encryption (MKSE) to build Web applications on top of encrypted data. We demonstrate that (1) the PopaZeldovich model for MKSE does not imply security against either passive or active attacks; (2) Mylarbased Web applications reveal usersâ€™ data and queries to passive and active adversarial servers; and (3) Mylar is generically insecure against active attacks due to system design flaws. Our results show that the problem of securing clientserver applications against actively malicious servers is challenging and still unsolved. 
Jun 16, 2016 3:30PM 
Ryan Rogers, University of Pennsylania 366 WVH (Abstract)
In this work, we initiate a principled study of how the generalization properties of approximate differential privacy can be used to perform adaptive hypothesis testing, while giving statistically valid pvalue corrections. We do this by observing that the guarantees of algorithms with bounded approximate maxinformation are sufficient to correct the pvalues of adaptively chosen hypotheses, and then by proving that algorithms that satisfy (approximate) (epsilon,delta)differential privacy have bounded approximate max information when their inputs are drawn from a product distribution. This substantially extends the known connection between differential privacy and maxinformation, which previously was only known to hold for (pure) (epsilon,0)differential privacy. It also extends our understanding of maxinformation as a partially unifying measure controlling the generalization properties of adaptive data analyses. We also show a lower bound, proving that (despite the strong composition properties of maxinformation), when data is drawn from a product distribution, approximate differentially private algorithms can come first in a composition with other algorithms satisfying maxinformation bounds, but not necessarily second if the composition is required to itself satisfy a nontrivial maxinformation bound. This, in particular, implies that the connection between approximate differential privacy and maxinformation holds only for inputs drawn from product distributions, unlike the connection between pure differential privacy and maxinformation.

Apr 14, 2016 2:50PM 
Gabor Lippner, Northeastern University Dept. of Mathematics 366 WVH (Abstract)
A local algorithm aims to compute some global structure on a graph using only simultaneous, local, decisions at the nodes. I will review recent developments on the possibilities and the limitations of such algorithms, and explain their connection to the theory of graph limits.

Apr 7, 2016 1:30PM 
Huy Nguyen, TTI Chicago 366 WVH (Abstract)
Challenges abound in modern large scale data analysis, ranging from the sheer volume of the data to its complex structure and the need for timely responses. A promising common approach centers on capturing key information from the data using concise representations that can be constructed in a distributed manner or from an evolving stream of data. In this talk, we will illustrate both the power and limitations of this approach using three research vignettes. In the first example, we describe an algorithmic framework for fundamental linear algebra problems based on short summaries. In the second example, we design distributed algorithms for a family of nonconvex problems arising in learning applications. In the third example, we show that even basic statistical estimation tasks require large summaries.

Apr 4, 2016 10:30AM 
Matt Weinberg, Princeton 366 WVH (Abstract) When real people interact with algorithms (e.g. in auctions, crowdsourcing, Bitcoin, etc.), they impose additional desiderata beyond simply that the algorithm is correct, fast, and uses little storage. People strategize during these interactions, so algorithms deployed in these settings must be robust against strategic manipulation. Additionally, people prefer transparent interactions, so these algorithms must also be as simple as possible. My research addresses these, and other novel challenges that arise when algorithms interact with strategic agents. In this talk, I will focus on robustness against strategic manipulation, and present a new algorithmic framework for these settings. Specifically, I will present a blackbox reduction from solving any optimization problem in strategic settings, where the input is held by selfish agents with interests of their own, to solving a perturbed version of that same optimization problem in traditional settings, where the input is directly given. I will further apply this framework to resolve two longstanding open problems: extending Myerson's celebrated characterization of optimal singleitem auctions to multiple items (Myerson 1981), and designing truthful mechanisms for job scheduling on unrelated machines (Nisan and Ronen 1999). Finally, I will briefly show how strategic considerations motivate nice questions in "traditional" areas of algorithm design as well, and present some of my work in convex optimization, parallel algorithms, and prophet inequalities. 
Mar 23, 2016 10:30AM 
Sofya Raskhodnikova, Penn State 366 WVH (Abstract) Massive datasets are becoming increasingly common. What useful computations can be performed on a dataset when reading all of it is prohibitively expensive? This question, fundamental to several fields, is at the heart of the research area, called sublineartime algorithms, that has provided important insights into fast approximate computation. In this talk, we will consider types of computational tasks central to sublineartime algorithms: approximation, testing, and learning. We will see examples of sublineartime algorithms in several domains. The algorithms themselves are typically simple and efficient, but their analysis requires insights into basic combinatorial, algebraic, and geometric questions. We will also discuss new directions in sublineartime algorithms, including new computational tasks, new measures for accuracy guarantees, and new models for data access. These directions enable applications of sublineartime algorithms in privacy, analysis of realvalued data, and situations where the data is noisy or incomplete. 
Mar 16, 2016 10:30AM 
Mohit Singh, Microsoft Research Redmond 366 WVH (Abstract) In this talk I will discuss the problem of computing maximum entropy distributions over a discrete set of objects subject to observed marginals. While there has been a tremendous amount of interest in such distributions due to their applicability in areas such as statistical physics, economics, information theory, machine learning, combinatorics and algorithms, a rigorous and systematic study of how to compute such distributions had been lacking. In this talk, I will show that the problem of computing maximum entropy distributions, in a fairly general setting, is equivalent to a counting problem on the underlying set. One of the consequences of this equivalence is that, for several combinatorial settings, there is an efficient algorithm to compute maxentropy distributions. I will also talk about applications of maximum entropy distributions to design approximation algorithms for discrete optimization problems including the Traveling Salesman Problem (TSP). These results include the first improvements over the classical Christofides' algorithm from 1970's for TSP on graph metrics. 
Mar 15, 2016 4:00PM 
Nils Olberg, Aachen 366 WVH (Abstract) In the concurrent multicast problem, we are given a graph representing a communication network together with a set of senderreceiver pairs. The goal is to deliver the message of each sender to the corresponding receiver. The concurrent multicast problem and its variants are motivated by information dissemination problems arising in diverse network scenarios. In this talk, we consider two of the most prominent communication models, namely the telephone and the radio model. For each of these models, our aim to develop algorithms that give good approximation algorithms, since the underlying problems are known to be NPhard. Using sparse partition covers we provide a first nontrivial bound for radio concurrent multicast. We further demonstrate the connection of telephone concurrent multicast to other network design problems, which motivates the development of approximation algorithms for the latter. Finally, we show that our methods can be extended to yield new results for minimum degree graph spanners. 
Mar 14, 2016 10:30AM 
Vipul Goyal, Microsoft Research India 366 WVH (Abstract) A central challenge in the design of secure systems is to defend against maninthemiddle attacks, where an adversary can arbitrarily tamper with the messages exchanged by two parties over a communication channel. Starting with the early nineties, an important research goal in cryptography has been to build ``nonmalleable'' cryptographic protocols that are resilient to such attacks. In this talk, I will describe my work that culminates this twodecade long research quest by constructing roundoptimal nonmalleable protocols based on almost minimal cryptographic assumptions. I will also discuss how the techniques developed in these works have transcended cryptography and found applications in randomness extraction, coding theory, and complexity theory. I will also briefly talk about my work on applied cryptography and its impact. 
Feb 29, 2016 10:30AM 
Timothy Chan, University of Waterloo 366 WVH (Abstract) Although data structures in computational geometry have been studied since the 1970s, recent years have seen exciting new developments that deepen our theoretical understanding of some of the most fundamental problems in the area. In this talk, I will survey work on a number of these core problems, including (static and dynamic) 2D point location, 2D nearest neighbor searching, and orthogonal and nonorthogonal range searching. If time permits, I will also mention some unexpected applications of geometric data structures, to text indexing and allpairs shortest paths. 
Feb 9, 2016 10:30AM 
Mary Wootters, Carnegie Mellon University 366 WVH (Abstract) Error correcting codes are an important tool in communication, and the family of ReedSolomon (RS) codes is an especially important example. I'll introduce error correcting and ReedSolomon codes, and I'll talk about three very different ways that RS codes have come up in my research. These three ways (none of which immediately has to do with communication) run the gamut from theory to practice. The first way, on the very theoretical end, is a combinatorial problem called list decoding, which has motivations in complexity theory. The second way, a bit more downtoearth, has to do with designing schemes for distributed storage: how do we store a file on several servers robustly? The third way, on the practical side of things, is combinatorial group testing: in our work we collaborated with computational biologists to design better algorithms for highthroughput genetic screening and de novo gene sequencing. 
Jan 21, 2016 3:30PM 
Zhiwei Steven Wu, University of Pennsylvania 366 WVH (Abstract) We initiate the study of a quantity that we call coordination complexity. In a distributed optimization problem, the information defining a problem instance is distributed among n parties, who need to each choose an action, which jointly will form a solution to the optimization problem. The coordination complexity represents the minimal amount of information that a centralized coordinator, who has full knowledge of the problem instance, needs to broadcast in order to coordinate the n parties to play a nearly optimal solution. We show that upper bounds on the coordination complexity of a problem imply the existence of good jointly differentially private algorithms for solving that problem, which in turn are known to upper bound the price of anarchy in certain games with dynamically changing populations. We show several results. We fully characterize the coordination complexity for the problem of computing a manytoone matching in a bipartite graph by giving almost matching lower and upper bounds.Our upper bound in fact extends much more generally, to the problem of solving a linearly separable convex program. We also give a different upper bound technique, which we use to bound the coordination complexity of coordinating a Nash equilibrium in a routing game, and of computing a stable matching. 
Jan 6, 2016 12:00PM 
ChienChung Huang, Chalmers University of Technology 366 WVH (Abstract) We propose new exact and approximation algorithms for the weighted matroid intersection problem. Our exact algorithm is faster than previous algorithms when the largest weight is relatively small. Our approximation algorithm delivers a $(1\epsilon)$approximate solution with a running time significantly faster than most known exact algorithms. The core of our algorithms is a decomposition technique: we decompose an instance of the weighted matroid intersection problem into a set of instances of the unweighted matroid intersection problem. The computational advantage of this approach is that we can make use of fast unweighted matroid intersection algorithms as a black box for designing algorithms. Precisely speaking, we prove that we can solve the weighted matroid intersection problem via solving $W$ instances of the unweighted matroid intersection problem, where $W$ is the largest given weight. Furthermore, we can find a $(1\epsilon)$approximate solution via solving $O(\epsilon^{1} \log r)$ instances of the unweighted matroid intersection problem, where $r$ is the smallest rank of the given two matroids. 
Dec 8, 2015 3:30PM 
Elliot Anshelevich, RPI 366 WVH (Abstract)
We examine the quality of social choice mechanisms using a utilitarian view, in which all of the agents have costs for each of the possible alternatives. While these underlying costs determine what the optimal alternative is, they may be unknown to the social choice mechanism; instead the mechanism must decide on a good alternative based only on the ordinal preferences of the agents which are induced by the underlying costs. Due to its limited information, such a social choice mechanism cannot simply select the alternative that minimizes the total social cost (or minimizes some other objective function). Thus, we seek to bound the distortion: the worstcase ratio between the social cost of the alternative selected and the optimal alternative. Distortion measures how good a mechanism is at approximating the alternative with minimum social cost, while using only ordinal preference information. The underlying costs can be arbitrary, implicit, and unknown; our only assumption is that the agent costs form a metric space, which is a natural assumption in many settings. We quantify the distortion of many wellknown social choice mechanisms. We show that for both total social cost and median agent cost, many positional scoring rules have large distortion, while on the other hand Copeland and similar mechanisms perform optimally or nearoptimally, always obtaining a distortion of at most 5. We also give lower bounds on the distortion that could be obtained by *any* deterministic social choice mechanism, and extend our results on median agent cost to more general objective functions.

Nov 12, 2015 3:30PM 
Cris Moore, Santa Fe Institute 177 Huntington Ave, 11th Floor (Abstract) October 22, 3:305:00 PM Lecture I: Computational complexity and landscapes o Two puzzles: Euler vs. Hamilton o P vs. NP and NPcompleteness: building computers o When greed works: Minimum Spanning Tree and Max Flow o Bumpy energy landscapes and local optima October 29, 3:305:00 PM Lecture II: Phase transitions in random graphs o The emergence and size of the giant component o Branching processes and differential equations o Power laws at criticality o The kcore and discontinuous transitions November 5, 3:305:00 PM Lecture III: Phase transitions in random kSAT o Early history and phenomenology o First and second moment bounds o Algorithms, clustering, and frozen variables o Why we believe in a regime where solutions exist but are hard to find November 12, 3:305:00 PM Lecture IV: Community detection in networks o The stochastic block model o The analogy between statistical physics and statistical inference o Belief propagation and variational methods o The detectability transition o (If there's time) Spectral clustering 
Nov 10, 2015 12:00PM 
Bobby Kleinberg 366 WVH (Abstract) How much information can be learnt about an individual by observing the outcome of a computation? If the computation is differentially private, the answer is: "not much more than if the individual's data had been excluded from the input." A stronger notion of privacy, originally propounded by Dalenius in the 1970's, requires instead that it should not be possible to learn much more about an individual than if the outcome of the computation had never been revealed. Simple examples, as well as a general impossibility theorem of Dwork and Naor, preclude the possibility of supplying this stronger "inferential" guarantee against attackers with arbitrary auxiliary information, assuming the computation is at least minimally useful. In this talk we revisit the notion of inferential privacy and ask: under what limitations on the adversary's side information can we deduce an inferential guarantee from a differential one? We model the adversary's side information as a prior distribution over datasets (or, more generally, a set of possible priors) and prove two main results. The first result pertains to distributions that satisfy a natural positiveaffiliation condition, and gives an upper bound on the inferential privacy guarantee for any differentially private mechanism. This upper bound is matched by a simple mechanism that adds Laplace noise to the sum of the data. The second result pertains to distributions that have weak correlations, defined in terms of a suitable "influence matrix". The result provides an upper bound for inferential privacy in terms of the differential privacy parameter and the spectral norm of this matrix. 
Nov 5, 2015 3:30PM 
Cris Moore, Santa Fe Institute 177 Huntington Ave, 11th Floor (Abstract) October 22, 3:305:00 PM Lecture I: Computational complexity and landscapes o Two puzzles: Euler vs. Hamilton o P vs. NP and NPcompleteness: building computers o When greed works: Minimum Spanning Tree and Max Flow o Bumpy energy landscapes and local optima October 29, 3:305:00 PM Lecture II: Phase transitions in random graphs o The emergence and size of the giant component o Branching processes and differential equations o Power laws at criticality o The kcore and discontinuous transitions November 5, 3:305:00 PM Lecture III: Phase transitions in random kSAT o Early history and phenomenology o First and second moment bounds o Algorithms, clustering, and frozen variables o Why we believe in a regime where solutions exist but are hard to find November 12, 3:305:00 PM Lecture IV: Community detection in networks o The stochastic block model o The analogy between statistical physics and statistical inference o Belief propagation and variational methods o The detectability transition o (If there's time) Spectral clustering 
Oct 29, 2015 3:30PM 
Cris Moore, Santa Fe Institute 177 Huntington Ave, 11th Floor (Abstract) October 22, 3:305:00 PM Lecture I: Computational complexity and landscapes o Two puzzles: Euler vs. Hamilton o P vs. NP and NPcompleteness: building computers o When greed works: Minimum Spanning Tree and Max Flow o Bumpy energy landscapes and local optima October 29, 3:305:00 PM Lecture II: Phase transitions in random graphs o The emergence and size of the giant component o Branching processes and differential equations o Power laws at criticality o The kcore and discontinuous transitions November 5, 3:305:00 PM Lecture III: Phase transitions in random kSAT o Early history and phenomenology o First and second moment bounds o Algorithms, clustering, and frozen variables o Why we believe in a regime where solutions exist but are hard to find November 12, 3:305:00 PM Lecture IV: Community detection in networks o The stochastic block model o The analogy between statistical physics and statistical inference o Belief propagation and variational methods o The detectability transition o (If there's time) Spectral clustering 
Oct 22, 2015 3:30PM 
Cris Moore, Santa Fe Institute 177 Huntington Ave, 11th Floor (Abstract) October 22, 3:305:00 PM Lecture I: Computational complexity and landscapes o Two puzzles: Euler vs. Hamilton o P vs. NP and NPcompleteness: building computers o When greed works: Minimum Spanning Tree and Max Flow o Bumpy energy landscapes and local optima October 29, 3:305:00 PM Lecture II: Phase transitions in random graphs o The emergence and size of the giant component o Branching processes and differential equations o Power laws at criticality o The kcore and discontinuous transitions November 5, 3:305:00 PM Lecture III: Phase transitions in random kSAT o Early history and phenomenology o First and second moment bounds o Algorithms, clustering, and frozen variables o Why we believe in a regime where solutions exist but are hard to find November 12, 3:305:00 PM Lecture IV: Community detection in networks o The stochastic block model o The analogy between statistical physics and statistical inference o Belief propagation and variational methods o The detectability transition o (If there's time) Spectral clustering 