Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative, but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost comparable to that of a single execution [1-3]. We combine existing mass-production results with modern approaches for loading classical data using "quantum read-only memory." We show that quantum mass production techniques offer no benefit under a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a cost reduction of an order of magnitude or more for a variety of reasonably sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data-loading step.
Not Like Us, Hunty: Measuring Perceptions and Behavioral Effects of Minoritized Anthropomorphic Cues in LLMs
Jeffrey Basoah
Daniel Chechelnitsky
Tao Long
Katharina Reinecke
Chrysoula Zerva
Kaitlyn Zhou
Maarten Sap
Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency, ACM (2025), pp. 710-745
As large language models (LLMs) increasingly adapt and personalize to diverse sets of users, there is an increased risk of systems appropriating sociolects, i.e., language styles or dialects associated with specific minoritized lived experiences (e.g., African American English, Queer slang). In this work, we examine whether sociolect usage by an LLM agent affects user reliance on its outputs and user perception (satisfaction, frustration, trust, and social presence). We designed and conducted user studies in which 498 African American English (AAE) speakers and 487 Queer slang speakers performed a set of question-answering tasks with LLM-based suggestions in either standard American English (SAE) or their self-identified sociolect.
Our findings showed that sociolect usage by LLMs influenced both reliance and perceptions, though in some surprising ways. Results suggest that both AAE and Queer slang speakers relied more on the SAE LM and had more positive perceptions of it. Yet only Queer slang speakers felt more social presence from the QS LM than from the SAE one, whereas only AAE speakers preferred and trusted the SAE LM over the AAE one. These findings emphasize the need to test for behavioral outcomes rather than simply assume that personalization leads to better and safer reliance outcomes. They also highlight the nuanced dynamics of minoritized language in machine interactions, underscoring the need for LLMs to be carefully designed to respect cultural and linguistic boundaries while fostering genuine user engagement and trust.
DORA 2025 State of AI-assisted Software Development Report
Derek DeBellis
Matt Beane
Edward Fraser
Ben Good
Eirini Kalliamvakou
Gene Kim
Daniella Villalba
DORA, Google (2025)
In 2025, the central question for technology leaders is no longer if they should adopt AI, but how to realize its value. DORA’s research includes more than 100 hours of qualitative data and survey responses from nearly 5,000 technology professionals from around the world. The research reveals a critical truth: AI’s primary role in software development is that of an amplifier. It magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones.
We study a variant of the job-shop scheduling problem, where unit-sized jobs must be assigned to a single machine, but the capacity of the machine changes over time.
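The abstract states only the problem; as a concrete illustration (not the paper's algorithm), here is the standard earliest-deadline-first greedy for a feasibility version of this setting, with hypothetical per-job release times and deadlines added:

```python
from heapq import heappush, heappop

def edf_schedule(jobs, capacity):
    """Greedy earliest-deadline-first scheduling of unit-sized jobs.

    jobs: list of (release, deadline) pairs; each job needs one unit of work.
    capacity: capacity[t] = number of unit jobs the machine can run at step t,
              modeling a machine whose capacity changes over time.
    Returns {t: [job indices]} or None if some job cannot meet its deadline
    within the horizon len(capacity).
    """
    by_release = sorted(range(len(jobs)), key=lambda j: jobs[j][0])
    schedule = {}
    ready = []  # min-heap of (deadline, job index) among released jobs
    i = 0
    for t in range(len(capacity)):
        while i < len(by_release) and jobs[by_release[i]][0] <= t:
            j = by_release[i]
            heappush(ready, (jobs[j][1], j))
            i += 1
        slots = capacity[t]
        while slots > 0 and ready:
            deadline, j = heappop(ready)
            if deadline < t:
                return None  # earliest-deadline job already missed its deadline
            schedule.setdefault(t, []).append(j)
            slots -= 1
    # Anything still unscheduled did not fit within the horizon.
    if ready or i < len(by_release):
        return None
    return schedule
```

For unit-sized jobs, running the released job with the earliest deadline first is the natural exchange-argument heuristic; the paper's variant may differ in objective and constraints.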
The quest to identify quantum advantages, where quantum physics truly outperforms classical physics, lies at the heart of quantum technology. While quantum devices promise extraordinary capabilities, from exponential computational speedups to unprecedented measurement precision, distinguishing genuine advantages from mere illusions remains a formidable challenge. In this endeavor, quantum theorists are like prophets trying to foretell a future where quantum technologies reign supreme. Yet, the boundary between visionary insight and unfounded fantasy is perilously thin. In this perspective, we explore the properties defining an ideal quantum advantage and examine our mathematical tools for navigating the vast world of quantum advantages across computation, learning, sensing, communication, and beyond. We show that some quantum advantages are inherently unpredictable using classical resources alone, suggesting a landscape far richer than what we can currently foresee. While mathematical rigor remains our indispensable guide in this exploration, the ultimate power of quantum technologies may emerge from the quantum advantages we cannot yet conceive.
Algebraic Geometry Codes and Decoded Quantum Interferometry
Stephen Jordan
Andi Gu
(2025)
Decoded Quantum Interferometry (DQI) defines a duality that pairs decoding problems with optimization problems. The original work on DQI considered Reed-Solomon decoding, whose dual optimization problem, called Optimal Polynomial Intersection (OPI), is a polynomial regression problem over a finite field. Here, we consider a class of algebraic geometry codes called Hermitian codes. We show that the dual optimization problem, which we call Hermitian Optimal Polynomial Intersection (HOPI), is a polynomial regression problem over a Hermitian curve. Because the dual to a Hermitian code is another Hermitian code, the HOPI problem can also be viewed as approximate list recovery for Hermitian codes. By comparing to Prange's algorithm, simulated annealing, and algebraic list recovery algorithms, we find a large parameter regime in which DQI efficiently achieves a better approximation than these classical algorithms. Our findings suggest that the apparent quantum speedup offered by DQI on the OPI problem may be a special case of a broader quantum speedup for a more general class of problems regarding polynomial regression on algebraic varieties.
Deep residual architectures, such as ResNet and the Transformer, have enabled models of unprecedented depth, yet a formal understanding of why depth is so effective remains an open question. A popular intuition, following Veit et al. (2016), is that these residual networks behave like ensembles of many shallower models. Our key finding is an explicit analytical formula that verifies this ensemble perspective, proving that increasing network depth is mathematically equivalent to expanding the size of this implicit ensemble. Furthermore, our expansion reveals a hierarchical ensemble structure in which the combinatorial growth of computation paths leads to an explosion in the output signal. This insight offers a first-principles explanation for the historical dependence on normalization layers in training deep models, and sheds new light on a family of successful normalization-free techniques such as SkipInit and Fixup. However, while these previous approaches infer scaling factors through optimizer analysis or a heuristic analogy to Batch Normalization, our work offers the first explanation derived directly from the network's inherent functional structure. Specifically, our Residual Expansion Theorem reveals that scaling each residual module provides a principled solution to taming the combinatorial explosion inherent to these architectures. We further show that this scaling acts as a capacity control that also implicitly regularizes the model's complexity.
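The explosion-and-scaling claim is easy to see numerically. The following is a toy illustration of the phenomenon, not the paper's Residual Expansion Theorem: with depth-many residual blocks of the form x + scale * W x, the unscaled output norm grows exponentially in depth, while a 1/depth scale keeps it bounded.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, dim = 32, 64

def forward(x, blocks, scale):
    # Each block computes x <- x + scale * (W @ x); multiplying out
    # prod_l (I + scale * W_l) gives 2**depth computation paths, one per
    # subset of blocks -- the "implicit ensemble" of shallower models.
    for W in blocks:
        x = x + scale * (W @ x)
    return x

blocks = [rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(depth)]
x = rng.normal(size=dim)

unscaled = np.linalg.norm(forward(x, blocks, scale=1.0))
scaled = np.linalg.norm(forward(x, blocks, scale=1.0 / depth))
# With scale=1.0 the norm grows roughly like sqrt(2)**depth; with
# scale=1/depth it stays within a constant factor of the input norm.
```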
Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
Vaishnavh Nagarajan
Chen Wu
Charles Ding
Aditi Raghunathan
2025
We design a suite of minimal algorithmic tasks that are a loose abstraction of open-ended real-world tasks. This allows us to cleanly and controllably quantify the creative limits of present-day language models. Much like real-world tasks that require a creative, far-sighted leap of thought, our tasks require an implicit, open-ended stochastic planning step that either (a) discovers new connections in an abstract knowledge graph (as in wordplay, drawing analogies, or research) or (b) constructs new patterns (as in designing math problems or new proteins). In these tasks, we empirically and conceptually argue that next-token learning is myopic, and that multi-token approaches, namely teacherless training and diffusion models, comparatively excel at producing diverse and original output. Secondly, to elicit randomness without hurting coherence, we find that injecting noise at the input layer (dubbed seed-conditioning) works surprisingly well, performing as well as (and in some conditions, better than) temperature sampling from the output layer. Thus, our work offers a principled, minimal test-bed for analyzing open-ended creative skills, and offers new arguments for going beyond next-token learning and temperature sampling.
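Seed-conditioning can be sketched with a toy model (everything below is a hypothetical stand-in for a real language model): randomness enters as noise added to the input representation, while decoding stays deterministic.

```python
import numpy as np

VOCAB, DIM = 16, 8
rng = np.random.default_rng(123)
W = rng.normal(size=(VOCAB, DIM))  # frozen toy "LM head": logits = W @ h

def generate(seed, noise_scale=1.0):
    """Seed-conditioning sketch: inject noise at the input layer, decode greedily."""
    noise = np.random.default_rng(seed).normal(size=DIM) * noise_scale
    h = np.ones(DIM) + noise        # hypothetical input embedding plus seed noise
    return int(np.argmax(W @ h))    # greedy decoding: no output-layer sampling

# Diversity comes from varying the seed; coherence from deterministic decoding.
samples = {generate(s) for s in range(100)}
```

Temperature sampling would instead leave the input fixed and sample from softmax(logits / T) at the output layer.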
Improved Lower Bound for Differentially Private Facility Location
Information Processing Letters, 187 (2025)
We consider the differentially private (DP) facility location problem in the so-called super-set output setting proposed by Gupta et al. [GLM+10]. The current best known expected approximation ratio for an ε-DP algorithm is O(log n / √ε), due to Cohen-Addad et al. [CEF+22], where n denotes the size of the metric space, while the best known lower bound is Ω(1/√ε) [EGLW19].
In this short note, we give a lower bound of Ω(min{log n, √(log n/ε)}) on the expected approximation ratio of any ε-DP algorithm, which is the first evidence that the approximation ratio has to grow with the size of the metric space.
Machine learning the effects of many quantum measurements
Wanda Hou
Samuel Garratt
Norhan Eassa
Yi-Zhuang You
Ehud Altman
arXiv (2025)
Measurements are essential for the processing and protection of information in quantum computers. They can also induce long-range entanglement between unmeasured qubits. However, when post-measurement states depend on many non-deterministic measurement outcomes, there is a barrier to observing and using the entanglement induced by prior measurements. Here we demonstrate a new approach for detecting such measurement-induced entanglement. We create short-range entangled states of one- and two-dimensional arrays of qubits in a superconducting quantum processor, and aim to characterize the long-range entanglement induced between distant pairs of qubits when we measure all of the others. To do this we use unsupervised training of neural networks on observations to create computational models for post-measurement states and, by correlating these models with experimental data, we reveal measurement-induced entanglement. Our results additionally demonstrate a transition in the ability of a classical agent to accurately model the experimental data; this is closely related to a measurement-induced phase transition. We anticipate that our work can act as a basis for future experiments on quantum error correction and more general problems in quantum control.
Quasiparticle-induced decoherence of a driven superconducting qubit
Mykola Kishmar
Pavel Kurilovich
Vlad Kurilovich
Thomas Connolly
Andrey Klots
Igor Aleiner
arXiv (2025)
We develop a theory for two quasiparticle-induced decoherence mechanisms of a driven superconducting qubit. In the first mechanism, an existing quasiparticle (QP) tunnels across the qubit’s Josephson junction while simultaneously absorbing a qubit excitation and one (or several) photons from the drive. In the second mechanism, a qubit transition occurs during the non-linear absorption process that converts multiple drive quanta into a pair of new QPs. Both mechanisms can remain significant in gap-engineered qubits whose coherence is insensitive to QPs in the absence of the drive. Our theory establishes a fundamental limitation, stemming from QPs, on the fidelity of microwave qubit operations such as readout and gates.
Several resource allocation settings involve agents with unequal entitlements represented by weights. We analyze weighted fair division from an asymptotic perspective: if m items are divided among n agents whose utilities are independently sampled from a probability distribution, when is it likely that a fair allocation exists? We show that if the ratio between the weights is bounded, a weighted envy-free allocation exists with high probability provided that m = Ω(n log n/ log log n), generalizing a prior unweighted result. For weighted proportionality, we establish a sharp threshold of m = n/(1 − μ) for the transition from non-existence to existence, where μ ∈ (0, 1) denotes the mean of the distribution. In addition, we prove that for two agents, a weighted envy-free (and weighted proportional) allocation is likely to exist if m = ω(√r), where r denotes the ratio between the two weights.
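The two fairness notions have standard formal statements, which a short checker makes concrete. This is a sketch assuming additive utilities and one common definition of weighted envy-freeness, u_i(A_i)/w_i ≥ u_i(A_j)/w_j:

```python
def weighted_envy_free(utils, weights, alloc):
    """One standard weighted envy-freeness check (cross-multiplied to avoid division).

    utils[i][g] is agent i's utility for item g, weights[i] is i's entitlement,
    and alloc[i] is the list of items assigned to agent i.
    """
    n = len(weights)
    val = lambda i, bundle: sum(utils[i][g] for g in bundle)
    return all(
        val(i, alloc[i]) * weights[j] >= val(i, alloc[j]) * weights[i]
        for i in range(n) for j in range(n)
    )

def weighted_proportional(utils, weights, alloc):
    """Weighted proportionality: u_i(A_i) >= (w_i / sum(w)) * u_i(all items)."""
    total_w = sum(weights)
    items = [g for bundle in alloc for g in bundle]
    val = lambda i, bundle: sum(utils[i][g] for g in bundle)
    return all(
        val(i, alloc[i]) * total_w >= weights[i] * val(i, items)
        for i in range(len(weights))
    )
```

Under this definition, an agent with twice the weight is entitled to a bundle that looks twice as valuable before anyone counts as envious.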
Sufficient Context: A New Lens on Retrieval Augmented Generation Systems
Hailey Joren
Jianyi Zhang
Chun-Sung Ferng
Ankur Taly
International Conference on Learning Representations (ICLR) (2025)
Augmenting LLMs with context leads to improved performance across many applications. Despite much research on Retrieval Augmented Generation (RAG) systems, an open question is whether errors arise because LLMs fail to utilize the context from retrieval or because the context itself is insufficient to answer the query. To shed light on this, we develop a new notion of sufficient context, along with a method to classify instances that have enough information to answer the query. We then use sufficient context to analyze several models and datasets. By stratifying errors based on context sufficiency, we find that larger models with higher baseline performance (Gemini 1.5 Pro, GPT 4o, Claude 3.5) excel at answering queries when the context is sufficient, but often output incorrect answers instead of abstaining when it is not. On the other hand, smaller models with lower baseline performance (Llama 3.1, Mistral 3, Gemma 2) hallucinate or abstain often, even with sufficient context. We further categorize cases where the context is useful and improves accuracy even though it does not fully answer the query and the model errs without it. Building on our findings, we explore ways to reduce hallucinations in RAG systems, including a new selective generation method that leverages sufficient context information for guided abstention. Our method improves the fraction of correct answers, among instances where the model responds, by 2--10% for Gemini, GPT, and Gemma.
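The guided-abstention idea can be sketched as a wrapper around a model call. All function names below are placeholders, not the paper's API:

```python
def selective_answer(query, context, answer_fn, sufficient_fn, confidence_fn,
                     threshold=0.5):
    """Answer only when the retrieved context looks sufficient or the model
    is confident; otherwise abstain.

    answer_fn, sufficient_fn, and confidence_fn stand in for a model call,
    a sufficient-context classifier, and a self-confidence score.
    """
    if sufficient_fn(query, context) or confidence_fn(query, context) >= threshold:
        return answer_fn(query, context)
    return None  # abstain rather than risk a hallucinated answer
```

The paper's method combines signals like these to decide when to respond; the threshold and the exact combination rule here are assumptions.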
Steering Self-Evaluation: Interpreting LLM’s Reasoning Across Domains and Languages
Praveen Hegde
2025
Understanding and controlling the reasoning processes of large language models (LLMs) is crucial for their reliable deployment. In this work, we investigate the latent representation of self-evaluation behavior - the ability of a model to assess its own reasoning steps - a vital behavior for robust reasoning. Through targeted steering vector computation, we identify a direction within LLM activations that represents this self-evaluation behavior. Crucially, we demonstrate that this steering vector for self-evaluation exhibits remarkable cross-contextual efficacy, working well across different domains (e.g., math and medicine) and languages (e.g., English and Spanish). This suggests that the identified latent direction captures a fundamental, abstract representation of self-evaluation within the LLM's internal state, offering a promising avenue for interpretable and controllable reasoning across diverse applications.
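Steering vectors are commonly computed as a difference of mean activations between contrastive prompt sets; this sketch assumes that approach (the paper's exact computation may differ):

```python
import numpy as np

def steering_vector(pos_acts, neg_acts):
    """Difference-of-means steering direction from hidden activations.

    pos_acts: activations on prompts exhibiting self-evaluation,
    neg_acts: activations on matched prompts without it; each shape (n, d).
    """
    v = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return v / np.linalg.norm(v)

def steer(hidden, v, alpha=4.0):
    """Add the steering direction to a hidden state during the forward pass."""
    return hidden + alpha * v
```

At inference time, `steer` would be applied to the residual stream at a chosen layer; the layer index and scale `alpha` are hyperparameters assumed here.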