Mariana Raykova
I work in the areas of cryptography and security. I am interested both in theoretical work that develops new cryptographic tools and in applied cryptography projects that use and implement cryptographic protocols in systems to enhance their security properties. My research includes work on secure computation, oblivious data structures, zero knowledge and verifiable computation, and obfuscation.
I received my PhD from the Computer Science Department of Columbia University and I was co-advised by Tal Malkin and Steve Bellovin. After my PhD I spent a year as a postdoc at the Cryptography Group at IBM Research Watson. I was a Research Scientist at the Computer Science Laboratory at SRI International between 2013 and 2015. Following that I was an Assistant Professor at the Department of Computer Science at Yale University between 2016 and 2018. I joined Google as a Research Scientist in 2019.
Authored Publications
Distributed, Private, Sparse Histograms in the Two-Server Model
Adrià Gascón
James Bell
Phillipp Schoppmann
CCS 2022
We consider the computation of sparse, (ε, ϑ)-differentially private (DP) histograms in the two-server model of secure multi-party computation (MPC), which has recently gained traction in the context of privacy-preserving measurements of aggregate user data. We introduce protocols that enable two semi-honest non-colluding servers to compute histograms over the data held by multiple users, while only learning a private view of the data. Our solution achieves the same asymptotic ℓ∞-error of O(log(1/ϑ)/ε) as in the central model of DP, but without relying on a trusted curator. The server communication and computation costs of our protocol are independent of the number of histogram buckets, and are linear in the number of users, while the client cost is independent of the number of users, ε, and ϑ. Its linear dependence on the number of users lets our protocol scale well, which we confirm using microbenchmarks: for a billion users, ε = 0.5, and ϑ = 10^-11, the per-user cost of our protocol is only 1.08 ms of server computation and 339 bytes of communication. In contrast, a baseline protocol using garbled circuits only allows up to 10^6 users, where it requires 600 KB communication per user.
View details
We introduce the first construction for secure two-party computation of Poisson regression, which enables two parties who hold shares of the input samples to learn only the resulting Poisson model while protecting the privacy of the inputs.
Our construction relies on new protocols for secure fixed-point exponentiation and correlated matrix multiplications. Our secure exponentiation construction avoids expensive bit decomposition and achieves orders of magnitude improvement in both online and offline costs over state-of-the-art works. As a result, the dominant cost of our secure Poisson regression is the matrix multiplications with one fixed matrix. We introduce a new technique, called correlated Beaver triples, which enables many such multiplications at the cost of roughly one matrix multiplication. This further brings down the cost of secure Poisson regression.
We implement our constructions and show their extreme efficiency. In a LAN setting, our secure exponentiation for 20-bit fractional precision takes less than 0.07 ms with a batch size of 100,000. One iteration of secure Poisson regression on a dataset with 10,000 samples with 1,000 binary features needs about 65.82 s in the offline phase, 55.14 s in the online phase and 17 MB total communication. For several real datasets this translates into training that takes seconds and only a couple of MB of communication.
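The abstract above does not spell out how correlated Beaver triples work, so the following is only a minimal sketch of the standard Beaver-triple multiplication on additive shares, reduced to scalars. It assumes a trusted dealer, and it assumes that the saving for a fixed operand comes from reusing a single mask `a` across all triples, so that the masked fixed value is opened only once. The modulus and the names `share`, `deal_correlated_triples`, and `multiply_fixed` are hypothetical and chosen for illustration; this is not the paper's construction.

```python
import secrets

P = 2**61 - 1  # modulus for additive secret sharing (illustrative choice)

def share(x):
    """Split x into two additive shares modulo P."""
    r = secrets.randbelow(P)
    return r, (x - r) % P

def reconstruct(s0, s1):
    """Recombine two additive shares."""
    return (s0 + s1) % P

def deal_correlated_triples(n):
    """Trusted-dealer stand-in (assumption): one mask a, plus n pairs
    (b_i, c_i = a * b_i), all handed out as additive shares."""
    a = secrets.randbelow(P)
    a_sh = share(a)
    rest = []
    for _ in range(n):
        b = secrets.randbelow(P)
        rest.append((share(b), share(a * b % P)))
    return a_sh, rest

def multiply_fixed(x_sh, ys_sh):
    """Return shares of x * y_i for one fixed shared x and many shared y_i."""
    a_sh, rest = deal_correlated_triples(len(ys_sh))
    # d = x - a is opened once and reused for every product.
    d = reconstruct((x_sh[0] - a_sh[0]) % P, (x_sh[1] - a_sh[1]) % P)
    out = []
    for y_sh, (b_sh, c_sh) in zip(ys_sh, rest):
        # e = y_i - b_i is opened per product.
        e = reconstruct((y_sh[0] - b_sh[0]) % P, (y_sh[1] - b_sh[1]) % P)
        # x*y = (d + a)(e + b) = d*e + d*b + e*a + a*b; only party 0 adds d*e.
        z0 = (d * e + d * b_sh[0] + e * a_sh[0] + c_sh[0]) % P
        z1 = (d * b_sh[1] + e * a_sh[1] + c_sh[1]) % P
        out.append((z0, z1))
    return out
```

In this sketch both parties' steps run in one process; in a real two-party execution each party would hold only its own share and the opened values `d` and `e`.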
View details
Communication–Computation Trade-offs in PIR
Asra Ali
Tancrède Lepoint
Sarvar Patel
Phillipp Schoppmann
Kevin Yeo
30th USENIX Security Symposium (2021)
We study the computation and communication costs and their possible trade-offs in various constructions for private information retrieval (PIR), including schemes based on homomorphic encryption and the Gentry–Ramzan PIR (ICALP'05).
We improve over the construction of SealPIR (S&P'18) using compression techniques and a new oblivious expansion, which reduce the communication bandwidth by 80% while preserving essentially the same computation cost. We then present MulPIR, a PIR protocol additionally leveraging multiplicative homomorphism to implement the recursion steps in PIR. While using the multiplicative homomorphism has been considered in prior work, we observe that in combination with our other techniques, it introduces a meaningful tradeoff by significantly reducing communication, at the cost of an increased computational cost for the server, when the databases have large entries. For some applications, we show that this could reduce the total monetary server cost by up to 35%.
On the other end of the communication–computation spectrum, we take a closer look at Gentry–Ramzan PIR, a scheme with asymptotically optimal communication rate. Here, the bottleneck is the server's computation, which we manage to reduce significantly. Our optimizations enable a tunable tradeoff between communication and computation, which allows us to reduce server computation by as much as 85%, at the cost of an increased query size.
Finally, we introduce new ways to handle PIR over sparse databases (keyword PIR), based on different hashing techniques. We implement all of our constructions, and compare their communication and computation overheads with respect to each other for several application scenarios.
View details
The private join and compute (PJC) functionality enables secure computation over data distributed across different databases, and is applicable to a wide range of applications, many of which address settings where the input databases are of significantly different sizes.
We introduce the notion of private information retrieval (PIR) with default, which enables two-party PJC functionalities in a way that hides the size of the intersection of the two databases and incurs sublinear communication cost in the size of the bigger database. We provide two constructions for this functionality, one of which requires offline linear communication, which can be amortized across queries, and one that provides sublinear cost for each query but relies on more computationally expensive tools. We construct inner-product PJC, which has applications to ads conversion measurement and contact tracing, relying on an extension of PIR with default. We evaluate the efficiency of our constructions, which can enable 28 PIR with default lookups on a database of size 2^25 (or inner-product PJC on databases with such sizes) with the communication of 44 MB, which costs less than 0.17 c. for the client and 26.48 c. for the server.
View details
Two-Sided Malicious Security for Private Intersection-Sum with Cardinality
Peihan Miao
Sarvar Patel
Advances in Cryptology – CRYPTO 2020 (2020), pp. 3-33
Private intersection-sum with cardinality allows two parties, where each party holds a private set and one of the parties additionally holds a private integer value associated with each element in her set, to jointly compute the cardinality of the intersection of the two sets as well as the sum of the associated integer values for all the elements in the intersection, and nothing beyond that.
We present a new construction for private intersection sum with cardinality that provides malicious security with abort and guarantees that both parties receive the output upon successful completion of the protocol. A central building block for our constructions is a primitive called shuffled distributed oblivious PRF (DOPRF), which is a PRF that offers oblivious evaluation using a secret key shared between two parties, and in addition to this allows obliviously permuting the PRF outputs of several parallel oblivious evaluations. We present the first construction for shuffled DOPRF with malicious security. We further present several new sigma proof protocols for relations across Pedersen commitments, ElGamal encryptions, and Camenisch-Shoup encryptions that we use in our main construction, for which we develop new batching techniques to reduce communication.
We implement and evaluate the efficiency of our protocol and show that we can achieve communication cost that is only 4-5 times greater than the most efficient semi-honest protocol. When measuring monetary cost of executing the protocol in the cloud, our protocol is 25 times more expensive than the semi-honest protocol. Our construction also allows for different parameter regimes that enable trade-offs between communication and computation.
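To make the target functionality described in the abstract above concrete, here is a plain, non-private reference of what the two parties are supposed to learn. It is only a specification of the outputs, assuming party A holds a set of identifiers and party B holds identifiers with associated integers; the function name is hypothetical. A secure protocol has to produce these two numbers without revealing anything else.

```python
def intersection_sum_with_cardinality(ids_a, values_b):
    """Ideal (non-private) functionality.

    ids_a:    iterable of party A's identifiers
    values_b: dict mapping party B's identifiers to integer values
    Returns (size of the intersection, sum of B's values over the intersection).
    """
    common = set(ids_a) & set(values_b)
    return len(common), sum(values_b[i] for i in common)
```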
View details
Secure aggregation is a cryptographic primitive that enables a server to learn the sum of the vector inputs of many clients. Bonawitz et al. (CCS 2017) presented a construction that incurs computation and communication for each client linear in the number of parties. While this functionality enables a broad range of privacy preserving computational tasks, scaling concerns limit its scope of use.
We present the first constructions for secure aggregation that achieve polylogarithmic communication and computation per client. Our constructions provide security in the semi-honest and the semi-malicious setting where the adversary controls the server and a γ-fraction of the clients, and correctness with up to δ-fraction dropouts among the clients. Our constructions show how to replace the complete communication graph of Bonawitz et al., which entails the linear overheads, with a k-regular graph of logarithmic degree while maintaining the security guarantees.
Beyond improving the known asymptotics for secure aggregation, our constructions also achieve very efficient concrete parameters. The semi-honest secure aggregation can handle a billion clients at the per client cost of the protocol of Bonawitz et al. for a thousand clients. In the semi-malicious setting with 10^4 clients, each client needs to communicate only with 3% of the clients to have a guarantee that its input has been added together with the inputs of at least 5000 other clients, while withstanding up to 5% corrupt clients and 5% dropouts. We also show an application of secure aggregation to the task of secure shuffling which enables the first cryptographically secure instantiation of the shuffle model of differential privacy.
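The idea of replacing the complete communication graph with a sparse regular graph can be illustrated with pairwise masks that cancel in the sum. The following is a minimal sketch under strong simplifying assumptions: a fixed circulant k-regular graph stands in for the graph used in the paper, there are no dropouts and no corrupt clients, and Python's non-cryptographic `random.Random` stands in for a PRG seeded by a key agreed between two neighbors. All names and parameters are hypothetical.

```python
import random

MOD = 2**32  # inputs and masks live in Z_{2^32} (illustrative choice)

def neighbors(i, n, k):
    """Circulant k-regular graph: client i is linked to the k/2 clients on
    each side of it. Assumes k is even and k < n."""
    half = k // 2
    return ([(i + d) % n for d in range(1, half + 1)]
            + [(i - d) % n for d in range(1, half + 1)])

def pair_mask(i, j, length, seed):
    """Mask shared by clients i and j. NOT cryptographically secure; a real
    protocol would expand a key agreed between i and j with a secure PRG."""
    rng = random.Random(f"{seed}:{min(i, j)}:{max(i, j)}")
    return [rng.randrange(MOD) for _ in range(length)]

def masked_input(i, x, n, k, seed):
    """Client i adds the mask for neighbors with a larger index and
    subtracts it for neighbors with a smaller index."""
    y = list(x)
    for j in neighbors(i, n, k):
        m = pair_mask(i, j, len(x), seed)
        sign = 1 if i < j else -1
        y = [(a + sign * b) % MOD for a, b in zip(y, m)]
    return y

def aggregate(masked):
    """Server sums the masked vectors; every pairwise mask cancels."""
    return [sum(col) % MOD for col in zip(*masked)]
```

Each client touches only k masks instead of n - 1, which is the source of the per-client savings; handling dropouts and choosing the graph so that security is preserved are the parts this sketch leaves out.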
View details
Private Intersection-Sum Protocols with Applications to Attributing Aggregate Ad Conversions
Mihaela Ion
Benjamin Kreuter
Erhan Nergiz
Sarvar Patel
Shobhit Saxena
David Shanahan
2020 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 370-389
In this work, we discuss our successful efforts for industry deployment of a cryptographic secure computation protocol. The problem we consider is privately computing the aggregate conversion rate of advertising campaigns. This underlying functionality can be abstracted as Private Intersection-Sum (PI-Sum) with Cardinality. In this setting two parties hold datasets containing user identifiers, and one of the parties additionally has an integer value associated with each of its user identifiers. The parties want to learn the number of identifiers they have in common and the sum of the integer values associated with these users without revealing any more information about their private inputs.
We identify the major properties and enabling factors which make the deployment of a cryptographic protocol possible, practical, and uniquely positioned as a solution for the task at hand. We describe our deployment setting and the most relevant efficiency measure, which in our setting is communication overhead rather than computation. We also present a monetary cost model that can be used as a unifying cost measure and the computation model which reflects our use case: a low-priority batch computing.
We present three PI-Sum with cardinality protocols: our currently deployed protocol, which relies on a Diffie-Hellman style double masking, and two new protocols which leverage more recent techniques for private set intersection (PSI) that use Random Oblivious Transfer and encrypted Bloom filters. We compare the latter two protocols with our original solution when instantiated with different additively homomorphic encryption schemes. We implement our constructions and compare their costs. We also compare with recent generic approaches for computing on the intersection of two datasets and show that our best protocol has monetary cost that is 20× less than the best known generic approach.
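The Diffie-Hellman style double masking mentioned above rests on the fact that exponentiation commutes: an identifier hashed into a group and raised to both parties' secret exponents yields the same value whichever party goes first. The following is a minimal sketch of that idea for the cardinality only, with both parties simulated in one process. The toy group Z_P^* with P = 2^127 - 1, the hash-to-group mapping, and all function names are assumptions made for illustration and are not secure parameter choices; the sum over associated values is omitted.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # Mersenne prime; toy group Z_P^*, NOT a secure choice

def hash_to_group(identifier):
    """Map an identifier string to an element of [1, P-1] (illustrative)."""
    h = int.from_bytes(hashlib.sha256(identifier.encode()).digest(), "big")
    return h % (P - 1) + 1

def keygen():
    """Pick an exponent coprime to P-1, so x -> x^k is a bijection on Z_P^*."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def mask(elements, k):
    """Raise every group element to the secret exponent k."""
    return [pow(e, k, P) for e in elements]

def intersection_cardinality(ids_a, ids_b):
    k_a, k_b = keygen(), keygen()
    # Round 1: each party masks its own hashed identifiers and sends them.
    a_once = mask([hash_to_group(i) for i in ids_a], k_a)
    b_once = mask([hash_to_group(i) for i in ids_b], k_b)
    # Round 2: B re-masks A's values and shuffles them before returning them,
    # so positions cannot be linked; A re-masks B's values.
    a_twice = mask(a_once, k_b)
    secrets.SystemRandom().shuffle(a_twice)
    b_twice = mask(b_once, k_a)
    # Equal double-masked values correspond to common identifiers.
    return len(set(a_twice) & set(b_twice))
```

For example, `intersection_cardinality(["a", "b", "c", "d"], ["c", "d", "e"])` counts the two common identifiers without either side ever seeing the other's identifiers in the clear.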
View details
Advances and Open Problems in Federated Learning
Brendan Avent
Aurélien Bellet
Mehdi Bennis
Arjun Nitin Bhagoji
Graham Cormode
Rachel Cummings
Rafael G.L. D'Oliveira
Salim El Rouayheb
David Evans
Josh Gardner
Adrià Gascón
Phillip B. Gibbons
Marco Gruteser
Zaid Harchaoui
Chaoyang He
Lie He
Zhouyuan Huo
Justin Hsu
Martin Jaggi
Tara Javidi
Gauri Joshi
Mikhail Khodak
Jakub Konečný
Aleksandra Korolova
Farinaz Koushanfar
Sanmi Koyejo
Tancrède Lepoint
Yang Liu
Prateek Mittal
Richard Nock
Ayfer Özgür
Rasmus Pagh
Ramesh Raskar
Dawn Song
Weikang Song
Sebastian U. Stich
Ziteng Sun
Florian Tramèr
Praneeth Vepakomma
Jianyu Wang
Li Xiong
Qiang Yang
Felix X. Yu
Han Yu
arXiv (2019)
Federated learning (FL) is a machine learning setting where many clients (e.g., mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g., service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and mitigates many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents a comprehensive list of open problems and challenges.
View details