Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

1 - 15 of 10435 publications
Anchored diffusion for video face reenactment
Idan Kligvasser
Regev Cohen
Ehud Rivlin
Michael Elad
2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2025), pp. 4087-4097
Preview abstract
Video generation has drawn significant interest recently, pushing the development of large-scale models capable of producing realistic videos with coherent motion. Due to memory constraints, these models typically generate short video segments that are then combined into long videos. The merging process poses a significant challenge, as it requires ensuring smooth transitions and overall consistency. In this paper, we introduce Anchored Diffusion, a novel method for synthesizing relatively long and seamless videos. We extend Diffusion Transformers (DiTs) to incorporate temporal information, creating our sequence-DiT (sDiT) model for generating short video segments. Unlike previous works, we train our model on video sequences with random non-uniform temporal spacing and incorporate temporal information via external guidance, increasing flexibility and allowing it to capture both short- and long-term relationships. Furthermore, during inference, we leverage the transformer architecture to modify the diffusion process, generating a batch of non-uniform sequences anchored to a common frame, ensuring consistency regardless of temporal distance. To demonstrate our method, we focus on face reenactment, the task of transferring the motion of a driving video to a source face. Through comprehensive experiments, we show our approach outperforms current techniques in producing longer, consistent, high-quality videos while offering editing capabilities.
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis
Enric Corona
Andrei Zanfir
Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) (2025)
Preview abstract
We propose VLOGGER, a method for audio-driven human video generation from a single input image of a person, which builds on the success of recent generative diffusion models. Our method consists of 1) a stochastic human-to-3D-motion diffusion model, and 2) a novel diffusion-based architecture that augments text-to-image models with both spatial and temporal controls. This supports the generation of high-quality video of variable length, easily controllable through text or speech via high-level representations of human faces and bodies. In contrast to previous work, our method does not require training for each person, does not rely on face detection and cropping, generates the complete image (not just the face or the lips), and considers a broad spectrum of scenarios (e.g., visible torso or diverse subject identities) that are critical to correctly synthesize humans who communicate. We also curate MENTOR, a new and diverse dataset with 3D pose and expression annotations, one order of magnitude larger than previous ones (800,000 identities) and with dynamic gestures, on which we train and ablate our main technical contributions. VLOGGER outperforms state-of-the-art methods on three public benchmarks, considering image quality, identity preservation and temporal consistency while also generating upper-body gestures. We analyze the performance of VLOGGER with respect to multiple diversity metrics, showing that our architectural choices and the use of MENTOR benefit training a fair and unbiased model at scale. Finally, we show applications in video editing and personalization.
Preview abstract
We describe an efficient quantum algorithm for solving the linear matrix equation AX+XB=C, where A, B and C are given complex matrices and X is unknown. This is known as the Sylvester equation, a fundamental equation with applications in control theory and physics. Rather than encoding the solution in a quantum state in a fashion analogous to prior quantum linear algebra solvers, our approach constructs the solution matrix X in a block-encoding, rescaled by some factor. This allows us to obtain certain properties of the entries of X exponentially faster than would be possible from preparing X as a quantum state. The query and gate complexities of the quantum circuit that implements this block-encoding are almost linear in a condition number that depends on A and B, and depend logarithmically on the dimension and inverse error. We show how our quantum circuits can solve BQP-complete problems efficiently, discuss potential applications and extensions of our approach and its connection to the Riccati equation, and comment on open problems.
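For orientation, the Sylvester equation can be rewritten as an ordinary linear system. The block below shows the standard classical vectorization identity, not the quantum construction from the abstract. It assumes A is n×n, B is m×m, and X and C are n×m; the symbols vec, ⊗ and I are the usual column-stacking operator, Kronecker product and identity matrix, none of which appear in the abstract itself.

```latex
% Classical reformulation of AX + XB = C (assumed shapes: A is n x n, B is m x m, X and C are n x m)
\[
  \bigl( I_m \otimes A \;+\; B^{\mathsf{T}} \otimes I_n \bigr)\,\operatorname{vec}(X)
  \;=\; \operatorname{vec}(C)
\]
% A unique solution X exists exactly when A and -B have no eigenvalue in common.
```

The system has nm unknowns, which is why the dependence on dimension matters for any solver.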
"It is important to consult" a linguist: Verb-Argument Constructions in ChatGPT and human experts' medical and financial advice
Chris Stewart
Alistair Windsor
J. Elliott Casal
PLOS One (2025)
Preview abstract
This paper adopts a Usage-Based Construction Grammar perspective to compare human- and AI-generated language, focusing on Verb-Argument Constructions (VACs) as a lens for analysis. Specifically, we examine solicited advice texts in two domains (Finance and Medicine) produced by humans and ChatGPT across different GPT models (3.5, 4, and 4o) and interfaces (3.5 Web vs. 3.5 API). Our findings reveal broad consistency in the frequency and distribution of the most common VACs across human- and AI-generated texts, though ChatGPT exhibits a slightly higher reliance on the most frequent constructions. A closer examination of the verbs occupying these constructions uncovers significant differences in the meanings conveyed, with a notable growth away from human-like language production in macro-level perspectives (e.g., length) and towards human-like verb-VAC patterns with newer models. These results underscore the potential of VACs as a powerful tool for analyzing AI-generated language and tracking its evolution over time.
Preview abstract
This invited OFC 2025 tutorial will review recent progress and scaling limitations of IM-DD-based low-cost optical interconnects. It will examine how datacenter-reach optimized coherent optics can address these challenges.
Mind the GAP: Geometry Aware Passthrough Mitigates Cybersickness
Trishia Chemaly
Mohit Goyal
Sakar Khattar
Bjorn Vlaskamp
Aveek Purohit
Konstantine Tsotsos
2025
Preview abstract
Virtual Reality headsets isolate users from the real world by restricting their perception to the virtual world. Video See-Through (VST) headsets address this by utilizing world-facing cameras to create Augmented Reality experiences. However, directly displaying camera feeds can cause visual discomfort and cybersickness due to the inaccurate perception of scale and exaggerated motion parallax. This paper presents initial findings on the potential of geometry-aware passthrough systems to mitigate cybersickness through enhanced depth perception. We introduce a promising protocol for quantitatively measuring cybersickness experienced by users in VST headsets. Using this protocol, we conduct a user study to compare direct passthrough and geometry-aware passthrough systems. To the best of our knowledge, our study is the first to reveal reduced nausea, disorientation, and total scores of cybersickness with geometry-aware passthrough. It also uncovers several potential avenues to further mitigate visually induced discomfort.
Preview abstract
We study the existence of almost fair and near-optimal solutions to a routing problem as defined in the seminal work of Rosenthal. We focus on the setting where multiple alternative routes are available for each potential request (which corresponds to a potential user of the network). This model captures a collection of diverse applications such as packet routing in communication networks, routing in road networks with multiple alternative routes, and the economics of transportation of goods.
Our recommended routes have provable guarantees in terms of both the total cost and fairness concepts such as approximate envy-freeness. We employ and appropriately combine tools from algorithmic game theory and fair division. Our results apply to two distinct models: the splittable case, where the request is split among the selected paths (e.g., routing a fleet of trucks), and the unsplittable case, where the request is assigned to one of its designated paths (e.g., a single user request). Finally, we conduct an empirical analysis to test the performance of our approach against simpler baselines using the real-world road network of New York City.
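As background on the routing model the abstract refers to, Rosenthal's congestion games are usually stated with the cost and potential functions below. This is the textbook formulation for the unsplittable case, given here as a sketch; the notation (E for the edge set, s for a route profile, n_e(s) for the number of requests routed over edge e, c_e(k) for the per-user cost of e under load k) is assumed for illustration and is not taken from the abstract.

```latex
% Total cost of a route profile s (assumed notation, unsplittable case)
\[
  \mathrm{Cost}(s) \;=\; \sum_{e \in E} n_e(s)\, c_e\bigl(n_e(s)\bigr)
\]
% Rosenthal's potential: a unilateral route change by one user alters \Phi
% by exactly the change in that user's own cost.
\[
  \Phi(s) \;=\; \sum_{e \in E} \sum_{k=1}^{n_e(s)} c_e(k)
\]
```

Because every improving move by a single user lowers the potential, any profile that minimizes it is a pure equilibrium.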
Avoid global outages by partitioning cloud applications to reduce blast radius
Karan Anand
https://cloud.google.com/ (2025)
Preview abstract
Cloud application development faces the inherent challenge of balancing rapid innovation with high availability. This blog post details how Google Workspace's Site Reliability Engineering team addresses this conflict by implementing vertical partitioning of serving stacks. By isolating application servers and storage into distinct partitions, the "blast radius" of code changes and updates is significantly reduced, minimizing the risk of global outages. This approach, which complements canary deployments, enhances service availability, provides flexibility for experimentation, and facilitates data localization. While challenges such as data model complexities and inter-service partition misalignment exist, the benefits of improved reliability and controlled deployments make partitioning a crucial strategy for maintaining robust cloud applications.
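The effect of partitioning on blast radius can be illustrated with a small sketch. The abstract does not say how users are assigned to partitions, so the stable hash used here is an assumption, and `NUM_PARTITIONS`, `partition_for` and `blast_radius` are hypothetical names introduced only for this example.

```python
import hashlib
from typing import Iterable

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a user to a partition with a stable hash (an assumed scheme)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def blast_radius(
    user_ids: Iterable[str],
    bad_partition: int,
    num_partitions: int = NUM_PARTITIONS,
) -> float:
    """Fraction of users affected when a bad change reaches one partition."""
    ids = list(user_ids)
    if not ids:
        return 0.0
    hit = sum(1 for u in ids if partition_for(u, num_partitions) == bad_partition)
    return hit / len(ids)
```

With a single partition, `blast_radius` is 1.0 for any faulty rollout; with several partitions, a change rolled out to one partition first reaches only the users mapped to it.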
I know what I don't know: improving model cascades through confidence tuning
Stephan Rabanser
Nathalie Rauschmayr
Petra Poklukar
Congchao Wang
2025
Preview abstract
Large-scale machine learning models deliver strong performance across a wide range of tasks but come with significant computational and resource constraints. To mitigate these challenges, local smaller models are often deployed alongside larger models, relying on routing and deferral mechanisms to offload complex tasks. However, existing approaches inadequately balance the capabilities of these models, often resulting in unnecessary deferrals or sub-optimal resource usage. In this work we introduce a novel loss function called Gatekeeper for calibrating smaller models in cascade setups. Our approach fine-tunes the smaller model to confidently handle tasks it can perform correctly while deferring complex tasks to the larger model. Moreover, it incorporates a mechanism for managing the trade-off between model performance and deferral accuracy, and is broadly applicable across various tasks and domains without any architectural changes. We evaluated our method on encoder-only, decoder-only, and encoder-decoder architectures. Experiments across image classification, language modeling, and vision-language tasks show that our approach substantially improves deferral performance.
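The deferral mechanism that such cascades rely on can be sketched as a confidence threshold on the small model's output. This is a generic max-softmax rule, not the Gatekeeper loss, which the abstract does not specify. The names `cascade_predict`, `small_model`, `large_model` and the threshold of 0.8 are assumptions for illustration.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cascade_predict(
    x: Any,
    small_model: Callable[[Any], Sequence[float]],  # hypothetical: returns logits
    large_model: Callable[[Any], int],              # hypothetical: returns a label
    threshold: float = 0.8,                         # assumed confidence cutoff
) -> Tuple[int, str]:
    """Answer with the small model when it is confident, otherwise defer."""
    probs = softmax(small_model(x))
    confidence = max(probs)
    if confidence >= threshold:
        return probs.index(confidence), "small"
    return large_model(x), "large"
```

Under this rule, a fine-tuning objective that sharpens the small model's confidence on inputs it gets right, and flattens it elsewhere, directly changes which inputs are deferred.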
Matryoshka Model Learning for Improved Elastic Student Models
Chetan Verma
Aditya Srinivas Timmaraju
Cho-Jui Hsieh
Ngot Bui
Yang Zhang
Wen Chen
Xin Liu
Inderjit Dhillon
2025
Preview abstract
Industry-grade ML models are carefully designed to meet rapidly evolving serving constraints, which requires significant resources for model development. In this paper, we propose MatTA, a framework for training multiple accurate Student models using a novel Teacher-TA-Student recipe. TA models are larger versions of the Student models with higher capacity, and thus allow Student models to better relate to the Teacher model and also bring in more domain-specific expertise. Furthermore, multiple accurate Student models can be extracted from the TA model. Therefore, despite only one training run, our methodology provides multiple servable options to trade off accuracy for lower serving cost. We demonstrate the proposed method, MatTA, on proprietary datasets and models. Its practical efficacy is underscored by live A/B tests within a production ML system, demonstrating a 20% improvement on a key metric. We also demonstrate our method on GPT-2 Medium, a public model, and achieve relative improvements of over 24% on SAT Math and over 10% on the LAMBADA benchmark.
Mitigating Clinician Information Overload: Generative AI for Integrated EHR and RPM Data Analysis
Shashank Kapoor
Aman Raj
2025
Preview abstract
Generative AI (GenAI), particularly Large Language Models (LLMs), offers powerful capabilities for interpreting the complex data landscape in healthcare. In this paper, we present a comprehensive overview of the capabilities, requirements and applications of GenAI for deriving clinical insights and improving clinical efficiency. We first provide some background on the forms and sources of patient data, namely real-time Remote Patient Monitoring (RPM) streams and traditional Electronic Health Records (EHR). The sheer volume and heterogeneity of this combined data present significant challenges to clinicians and contribute to information overload.
In addition, we explore the potential of LLM-powered applications for improving clinical efficiency. These applications can enhance navigation of longitudinal patient data and provide actionable clinical decision support through natural language dialogue. We discuss the opportunities this presents for streamlining clinician workflows and personalizing care, alongside critical challenges such as data integration complexity, ensuring data quality and RPM data reliability, maintaining patient privacy, validating AI outputs for clinical safety, mitigating bias, and ensuring clinical acceptance. We believe this work represents the first summarization of GenAI techniques for managing clinician data overload due to combined RPM / EHR data complexities.
Preview abstract
Julia is a popular choice across scientific fields, mostly due to its focus on mathematics in a broad sense and its execution performance. It is a language of choice to implement new numerical algorithms, but it really shines in modelling for optimisation thanks to JuMP.jl and MathOptInterface.jl.
These libraries are, first and foremost, made for mathematical optimisation (linear, mixed-integer, conic, etc.), yet they are now generic enough to support more paradigms, such as constraint programming. This talk will introduce the basic principles behind the current implementation of JuMP.jl and explain why and how they are very good matches for modelling using constraint programming… and solving using any kind of mixed-integer-programming solver.
Constraint-programming solvers can also be implemented using linear programming, in a great collaboration between discrete and continuous optimisation. This talk will briefly explain the connection and its implementation in Google’s CP-SAT, a leading, award-winning constraint solver that uses linear programs in its solving process — a solver that will soon be available in Julia too.
SSDTrain: Faster Large Language Model Training Using SSD-Based Activation Offloading
Kun Wu
Jeongmin Brian Park
Mert Hidayetoğlu
Vikram Sharma Mailthody
Sitao Huang
Steven Lumetta
Wen-mei Hwu
Design Automation Conference (DAC) (2025)
Preview abstract
The scaling up of Large Language Models (LLMs) demands more memory than current GPUs can provide, hindering the training process. To address this challenge, we propose SSDTrain to efficiently offload activations, the intermediate tensors produced during LLM training, to SSDs. This approach reduces GPU memory usage without impacting performance by adaptively overlapping data transfers with computation. SSDTrain is compatible with popular deep learning frameworks like PyTorch, Megatron, and DeepSpeed, and it employs techniques such as tensor deduplication, forwarding, and adaptive offloading to further enhance efficiency. We conduct extensive experiments on Llama, BERT, and T5. Results demonstrate that SSDTrain reduces peak activation memory usage by 45%. It can perfectly overlap the I/O with the computation without introducing a performance penalty. SSDTrain can achieve a performance boost of up to 31% compared to the conventional training strategy using the same GPU systems.
ZAPBench: A Benchmark for Whole-Brain Activity Prediction in Zebrafish
Alexander Immer
Alex Bo-Yuan Chen
Mariela D. Petkova
Nirmala A. Iyer
Luuk Willem Hesselink
Aparna Dev
Gudrun Ihrke
Woohyun Park
Alyson Petruncio
Aubrey Weigel
Wyatt Korff
Florian Engert
Jeff W. Lichtman
Misha B. Ahrens
International Conference on Learning Representations (ICLR) (2025)
Preview abstract
Data-driven benchmarks have led to significant progress in key scientific modeling domains including weather and structural biology. Here, we present the Zebrafish Activity Prediction Benchmark (ZAPBench), which quantitatively measures progress on the problem of predicting cellular-resolution neural activity throughout an entire vertebrate brain. The benchmark is based on a novel dataset containing 4D light-sheet microscopy recordings of more than 70,000 neurons in a larval zebrafish brain, along with motion-stabilized and voxel-level cell segmentations of these data that facilitate development of a variety of forecasting methods. Initial results from a selection of time series and volumetric video modeling approaches achieve better performance than naive baseline methods, but also show room for further improvement. The specific brain used in the activity recording is also undergoing synaptic-level anatomical mapping, which will enable future integration of detailed structural information into ZAP forecasting methods.
Preview abstract
As part of Google's ongoing efforts to define best practices for secure AI systems, we’re sharing our aspirational framework for secure AI agents. We advocate for a hybrid, defense-in-depth strategy that combines the strengths of traditional, deterministic security controls with dynamic, reasoning-based defenses. This approach is grounded in three core principles: agents must have well-defined human controllers, their powers must be carefully limited, and their actions and planning must be observable. This paper reflects our current thinking and the direction of our efforts as we work towards ensuring that AI agents can be powerful, useful, and secure by default.