# Computer Science Theory at Northeastern

## Seminar

Click here to subscribe to our mailing list.
Click here to subscribe to our Google Calendar.

## Theory Seminar

This semester the theory seminar meets every Thursday from 12:00–1:30pm. We have reserved Forsyth 150, though Room 655 in ISEC will sometimes be used instead. The seminar runs every week: if we do not have an invited speaker, we will have short presentations from students or faculty. Write to Jonathan Ullman with suggestions for speakers!

## Fall Semester, 2018

### September 13, 12:00PM, 150 Forsyth

**Eylon Yogev, Weizmann Institute of Science**
*Search Problems: A Cryptographic Perspective*

**Abstract:** The class TFNP is the search analog of NP with the additional guarantee that any instance has a solution. TFNP has attracted extensive attention due to its natural syntactic subclasses that capture the computational complexity of important search problems from algorithmic game theory, combinatorial optimization, and computational topology. Thus, one of the main research objectives in the context of TFNP is to search for efficient algorithms for its subclasses, while at the same time proving hardness results where efficient algorithms cannot exist. Currently, no problem in TFNP is known to be hard under assumptions such as NP hardness, the existence of one-way functions, or even public-key cryptography. The only known hardness results are based on less general assumptions, such as the existence of collision-resistant hash functions, one-way permutations, or less established cryptographic primitives (e.g., program obfuscation or functional encryption). Several works explained this status by showing various barriers to proving hardness of TFNP. In particular, it has been shown that TFNP hardness cannot be based on worst-case NP hardness, unless NP = coNP. Therefore, we ask the following question: what is the weakest assumption sufficient for showing hardness in TFNP? In this talk, I will answer this question and show that hard-on-average TFNP problems can be based on the weak assumption that there exists a hard-on-average language in NP. In particular, this assumption is implied by the existence of one-way functions. In terms of techniques, there is an interesting interplay between problems in TFNP and derandomization techniques.

Based on joint works with Pavel Hubáček and Moni Naor.

### September 20, 12:00PM, 150 Forsyth

**Alina Ene, Boston University**
*Submodular Maximization with Nearly-optimal Approximation and Adaptivity*

**Abstract:** In this talk, we present recent progress on understanding the tradeoff between the approximation guarantee and adaptivity for submodular maximization. The adaptivity of an algorithm is the number of sequential rounds of queries it makes to the evaluation oracle of the function, where in every round the algorithm is allowed to make polynomially many parallel queries. Adaptivity is an important consideration in settings where the objective function is estimated using samples and in applications where adaptivity is the main running time bottleneck. We present nearly-optimal algorithms for submodular maximization subject to a cardinality constraint and, more generally, subject to multiple packing constraints.

This talk is based on joint work with Huy Nguyen (Northeastern) and Adrian Vladu (Boston University).

### September 27, 12:00PM, 655 ISEC

**Lucianna Kiffer (internal)**
*A Better Method to Analyze Blockchain Consistency*

**Abstract:** The celebrated Nakamoto consensus protocol ushered in several new consensus applications, including cryptocurrencies. A few recent works have analyzed important properties of blockchains, including most significantly consistency, which is a guarantee that all honest parties output the same sequence of blocks throughout the execution of the protocol. To establish consistency, the prior analysis of Pass, Seeman, and shelat required a careful counting of certain combinatorial events that was difficult to apply to variations of Nakamoto. Work by Garay, Kiayias, and Leonardos provides another method of analyzing the blockchain under the simplifying assumption that the network is synchronous. The contribution of this paper is the development of a simple Markov-chain based method for analyzing consistency properties of blockchain protocols.

The method includes a formal way of stating strong concentration bounds as well as easy ways to concretely compute the bounds. We use our new method to answer a number of basic questions about consistency of blockchains:

- Our new analysis provides a tighter guarantee on the consistency property of Nakamoto's protocol, including for parameter regimes which previous analyses could not consider;
- We analyze a family of delaying attacks and extend them to other protocols;
- We analyze how long a participant should wait before considering a high-value transaction "confirmed";
- We analyze the consistency of CliqueChain, a variation of the Chainweb system;
- We provide the first rigorous consistency analysis of GHOST and also analyze a folklore "balancing" attack.

In each case, we use our framework to experimentally analyze the consensus bounds for various network delay parameters and adversarial computing percentages. We hope our techniques enable authors of future blockchain proposals to provide a more rigorous analysis of their schemes.

### October 4, 12:00PM, 150 Forsyth

**Maryam Aliakbarpour, MIT**
*New directions in distribution testing*

**Abstract:** In this talk, we will discuss new directions in testing properties of distributions. Distributions are at the core of data-related scientific endeavors and, due to their importance, have been studied for decades in statistics and computer science. The goal is to find an algorithm to test the properties of distributions with provable performance guarantees, using as few samples as possible. We discuss the fundamental problems of identity (goodness of fit) and closeness (equivalence) testing in light of two new challenges:

- When the samples are sensitive data of participants, how can we make sure the output of our test preserves the privacy of the participants? We propose algorithms to test uniformity, identity, and closeness of distributions that achieve privacy at almost no cost, and we demonstrate their performance in an experimental evaluation.
- How can we test distributions in the presence of noise, and in which noise models can we perform better than almost-linear sample complexity? We consider the model in which the underlying distribution is a mixture of the original distribution and the noise distribution. We prove that if the noise distribution is known, or if we can sample from the noise distribution, we can test identity and closeness with almost no extra cost.

### October 11, 12:00PM, 150 Forsyth

**Tanay Mehta (internal)**
*Resilience and the Coevolution of Interdependent Multiplex Networks*

**Abstract:** We propose a new model for the study of resilience of coevolving multiplex scale-free networks. Our network model, called preferential interdependent networks, is a novel continuum over scale-free networks parameterized by their correlation ρ, 0 ≤ ρ ≤ 1. Our failure and recovery model ties the propensity of a node, both to fail and to assist in recovery, to its importance. We show, analytically, that our network model can achieve any γ, 2 ≤ γ ≤ 3, for the exponent of the power law of the degree distribution; this is superior to existing multiplex models and allows us better fidelity in representing real-world networks. Our failure and recovery model is also a departure from the much-studied cascading error model based on the giant component; it allows surviving important nodes to send assistance to the damaged nodes to enable their recovery. This better reflects the reality of recovery in man-made networks such as social networks and infrastructure networks. Our main finding, based on simulations, is that resilient preferential interdependent networks are those in which the layers are neither completely correlated (ρ = 1) nor completely uncorrelated (ρ = 0) but instead semi-correlated (ρ ≈ 0.1 to 0.3). This finding is consistent with the real-world experience that complex man-made networks typically bounce back quickly from stress.

In an attempt to explain our intriguing empirical discovery, we present an argument for why semi-correlated multiplex networks can be the most resilient. Our argument can be seen as an explanation of plausibility, or as an incomplete mathematical proof subject to certain technical conjectures that we make explicit.

Joint work with Auroop Ganguly, Ravi Sundaram, and Devesh Tiwari.

### October 18, 12:00PM, 655 ISEC

**Manolis Zampetakis, MIT**
*Iterative Methods for Efficient Statistics, in High Dimensions*

**Abstract:** We consider the classical statistical problem of estimating, to arbitrary accuracy, the parameters of a multidimensional Gaussian distribution under the following real-world obstacles: (1) truncated samples, and (2) an inhomogeneous population. Statistical estimation from truncated samples is a classical problem, going back to Galton, Pearson, and Fisher. Truncated samples are samples that are only revealed if they fall in some subset S of the multi-dimensional Euclidean space. We provide and analyze an iterative algorithm with almost optimal sample and time complexity that recovers the parameters of a Gaussian distribution with arbitrary accuracy from truncated samples. We also prove fast convergence of the celebrated iterative algorithm EM for efficiently estimating the means of Gaussian distributions in the inhomogeneous case of a mixture of two Gaussian distributions. Abstracting the use of iterative algorithms in these important statistical problems, we consider the general question of analyzing the convergence of iterative algorithms, and we prove that contraction maps provide a universal analysis tool for proving global convergence of iterative maps.

Based on joint works with Constantinos Daskalakis, Themis Gouleakis, and Christos Tzamos.

### November 1, 12:00PM, 655 ISEC

**Ran Cohen (internal)**
*Must the Communication Graph of MPC Protocols be an Expander?*

**Abstract:** Secure multiparty computation (MPC) on incomplete communication networks has been studied within two primary models: (1) where a partial network is fixed a priori, and thus corruptions can occur dependent on its structure, and (2) where edges in the communication graph are determined dynamically as part of the protocol. Whereas a rich literature has succeeded in mapping out the feasibility and limitations of graph structures supporting secure computation in the fixed-graph model (including strong classical lower bounds), these bounds do not apply in the latter dynamic-graph setting, which has recently seen exciting new results but remains relatively unexplored.

In this work, we initiate a similar foundational study of MPC within the dynamic-graph model. As a first step, we investigate the property of graph expansion. All existing protocols (implicitly or explicitly) yield communication graphs which are expanders, but it is not clear whether this is inherent. Our results consist of two types (for a constant fraction of corruptions):

- Upper bounds: we demonstrate secure protocols whose induced communication graphs are not expander graphs, within a wide range of settings (computational, information-theoretic, with low locality, even with low locality and adaptive security), each assuming some form of input-independent setup.
- Lower bounds: in the setting without setup and with adaptive corruptions, we demonstrate that for certain functionalities, no protocol can maintain a non-expanding communication graph against all adversarial strategies. Our lower bound relies only on protocol correctness (not privacy) and requires a surprisingly delicate argument.

More generally, we provide a formal framework for analyzing the evolving communication graph of MPC protocols, giving a starting point for studying the relation between secure computation and further, more general graph properties.
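As a rough illustration of the graph property at stake (background only, not code from the talk), the sketch below brute-forces the edge expansion h(G) of a small graph: the minimum, over vertex sets S with |S| ≤ n/2, of the number of edges leaving S divided by |S|. A family of communication graphs "is an expander" when this quantity stays bounded away from zero as the network grows. The function names are hypothetical.

```python
from itertools import combinations

def edge_expansion(n, edges):
    """Brute-force edge expansion of an undirected graph on vertices 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    best = float("inf")
    # Check every cut (S, V \ S) with |S| <= n/2; exponential, so toy sizes only.
    for k in range(1, n // 2 + 1):
        for subset in combinations(range(n), k):
            S = set(subset)
            boundary = sum(1 for u in S for v in adj[u] if v not in S)
            best = min(best, boundary / len(S))
    return best

# A cycle is a poor expander: cutting it in half crosses only 2 edges.
cycle = [(i, (i + 1) % 8) for i in range(8)]
# The complete graph is an excellent expander: every cut crosses many edges.
clique = list(combinations(range(8), 2))
print(edge_expansion(8, cycle))   # 0.5 (2 boundary edges / 4 vertices)
print(edge_expansion(8, clique))  # 4.0 (each of 4 cut vertices has 4 outside edges)
```

On larger instances one would bound expansion spectrally rather than by enumeration; the brute force here is only to make the definition concrete.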
### November 8, 12:00PM, 655 ISEC

**Seth Neel, University of Pennsylvania**
*Heuristics for Differential Privacy*

**Abstract:** We develop theory for using heuristics to solve computationally hard problems in differential privacy. Heuristic approaches have enjoyed tremendous success in machine learning, for which performance can be empirically evaluated. However, privacy guarantees cannot be evaluated empirically; they must be proven, without making heuristic assumptions. We show that learning problems over broad classes of functions, namely those that have polynomially sized universal identification sets, can be solved privately and efficiently, assuming the existence of a non-private oracle for solving the same problem. Our first algorithm yields a privacy guarantee that is contingent on the correctness of the oracle. We then give a reduction which applies to a class of heuristics we call certifiable, which allows us to convert oracle-dependent privacy guarantees into worst-case privacy guarantees that hold even when the heuristic standing in for the oracle might fail in adversarial ways. Finally, we consider classes of functions for which both they and their dual classes have small universal identification sets. This includes most classes of simple Boolean functions studied in the PAC learning literature, including conjunctions, disjunctions, parities, and discrete halfspaces. We show that there is an efficient algorithm for privately constructing synthetic data for any such class, given a non-private learning oracle. In particular, this gives the first oracle-efficient algorithm for privately generating synthetic data for contingency tables. The most intriguing question left open by our work is whether every problem that can be solved differentially privately can be privately solved with an oracle-efficient algorithm. While we do not resolve this, we give a barrier result that suggests that any generic oracle-efficient reduction must fall outside of a natural class of algorithms (which includes the algorithms given in this paper).

### November 15, 12:00PM, 655 ISEC

**Albert Cheu (internal)**
*Distributed Differential Privacy via Shuffling*

**Abstract:** We consider the problem of designing scalable, robust protocols for computing statistics about sensitive data. Specifically, we look at how best to design differentially private protocols in a distributed setting, where each user holds a private datum. The literature has mostly considered two models: the "central" model, in which a trusted server collects users' data in the clear, which allows greater accuracy; and the "local" model, in which users individually randomize their data and need not trust the server, but accuracy is limited. Attempts to achieve the accuracy of the central model without a trusted server have so far focused on variants of cryptographic MPC, which limits scalability.

In this paper, we initiate the analytic study of a shuffled model for distributed differentially private algorithms, which lies between the local and central models. This simple-to-implement model, a special case of the ESA framework of [Bittau et al., '17], augments the local model with an anonymous channel that randomly permutes a set of user-supplied messages. For sum queries, we show that this model provides the power of the central model while avoiding the need to trust a central server and the complexity of cryptographic secure function evaluation. More generally, we give evidence that the power of the shuffled model lies strictly between those of the central and local models: for a natural restriction of the model, we show that shuffled protocols for a widely studied selection problem require exponentially higher sample complexity than do central-model protocols.
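To make the three-party structure of the shuffled model concrete, here is a minimal toy sketch (not the paper's protocol; the flip probability and helper names are arbitrary assumptions): each user applies binary randomized response locally, the anonymous channel's only job is to permute the reports, and the analyzer debiases the shuffled sum.

```python
import random

def local_randomizer(bit, p=0.25):
    """Binary randomized response: flip the user's bit with probability p."""
    return bit ^ 1 if random.random() < p else bit

def shuffle(messages):
    """Anonymous channel: outputs a uniformly random permutation of the messages."""
    out = list(messages)
    random.shuffle(out)
    return out

def analyzer(messages, p=0.25):
    """Unbiased estimate of the true sum from randomized-response reports."""
    n = len(messages)
    # E[observed sum] = (1 - p) * true_sum + p * (n - true_sum)
    #                 = (1 - 2p) * true_sum + p * n, so invert that map.
    return (sum(messages) - p * n) / (1 - 2 * p)

random.seed(0)
data = [1] * 600 + [0] * 400  # 1000 users, true sum = 600
reports = shuffle(local_randomizer(b) for b in data)
print(round(analyzer(reports)))  # an estimate near the true sum of 600
```

The point of the model is that the server only ever sees the shuffled multiset, so no single report can be tied to a user; the paper's contribution is showing how much accuracy this anonymity alone buys over the plain local model.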
### November 29, 12:00PM, 655 ISEC

**Ilias Zadik, MIT**
*Revealing Network Structure, Confidentially*

**Abstract:** Motivated by growing concerns over ensuring privacy on social networks, we develop new algorithms and impossibility results for fitting complex statistical models to network data subject to rigorous privacy guarantees. We consider so-called node-differentially private algorithms, which compute information about a graph or network while provably revealing almost no information about the presence or absence of a particular node in the graph. We provide new algorithms for node-differentially private estimation for a popular and expressive family of network models: stochastic block models and their generalization, graphons. Our algorithms improve on prior work, reducing the error quadratically and matching, in many regimes, the optimal non-private algorithm. We also show that for the simplest random graph models (G(n, p) and G(n, m)), node-private algorithms can be qualitatively more accurate than for more complex models, converging at a rate of $1/(\epsilon^2 n^3)$ instead of $1/(\epsilon^2 n^2)$. This result uses a new extension lemma for differentially private algorithms that we hope will be broadly useful.
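For intuition about why node privacy is the demanding notion here (a background sketch, not the algorithms from the talk; the helper names are hypothetical): adding or removing one node can change up to n-1 edges, so a naive Laplace-mechanism release of even a simple statistic like the edge count must use noise scaled to sensitivity n-1 under node-level privacy, versus sensitivity 1 under the weaker edge-level notion.

```python
import random

def laplace(scale):
    """Laplace(0, scale) noise, sampled as a difference of two exponentials."""
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def private_edge_count(n, m, epsilon, node_level=True):
    """Release the edge count m of an n-node graph with epsilon-DP.

    Node-level: one node can add or remove up to n-1 edges -> sensitivity n-1.
    Edge-level: one edge changes the count by 1 -> sensitivity 1.
    """
    sensitivity = (n - 1) if node_level else 1
    return m + laplace(sensitivity / epsilon)
```

The expected absolute error of a Laplace release equals its noise scale, so the naive node-level release is roughly a factor of n noisier; the talk's results concern more careful algorithms that beat this naive tradeoff for specific model families.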