Software Engineering

At Google, we pride ourselves on our ability to develop and launch new products and features at a very fast pace. This is made possible in part by our world-class engineers, but our approach to software development enables us to balance speed and quality, and is integral to our success. Our obsession for speed and scale is evident in our developer infrastructure and tools. Developers across the world continually write, build, test and release code in multiple programming languages like C++, Java, Python, Javascript and others, and the Engineering Tools team, for example, is challenged to keep this development ecosystem running smoothly. Our engineers leverage these tools and infrastructure to produce clean code and keep software development running at an ever-increasing scale. In our publications, we share associated technical challenges and lessons learned along the way.

Recent Publications

Towards AI as a Collaborative Partner: A Taxonomy of AI Agent Behavior in Software Engineering
Sherry Y. Shi
Proceedings of the 3rd ACM International Conference on AI-Powered Software (AIware '26), ACM, Montreal, QC, Canada (2026) (to appear)
Preview abstract The ongoing transition of Large Language Models (LLMs) in software engineering from one-shot code generators into agentic partners requires a shift in how we define and measure success. While models are becoming more capable, the industry lacks a clear understanding of the behavioral norms that make an interactive software engineering (SWE) agent effective in collaborative software development in the enterprise. This work addresses this gap by presenting a taxonomy of desirable SWE agent behaviors, synthesized from 91 sets of developer-defined rules for SWE agents and validated through interviewing 15 experienced professional developers. In this taxonomy, we identify four core expectations: Adhere to Standards and Processes, Ensure Code Quality and Reliability, Solve Problems Effectively, and Collaborate with the Developer. These findings offer a concrete vocabulary for aligning SWE agent behavior with developer preferences, enabling researchers and practitioners to move beyond correctness-only benchmarks and start designing evaluations that reflect the socio-technical nature of professional software development in enterprises. View details
Taming the Variants Multi-Architecture Continuous Testing at Google
Sushmita Azad
Chandrakanth Chittappa
Ali Esmaeeli
Laura Macaddino
Sam Manfreda
David Margolin
Dharma Naidu
Sabuj Pattanayek
Sachin Sable
Ruslan Sakevych
Dushyant Acharya
Adrian Berding
Kevin Crossan
Wolff Dobson
Abhay Singh
19th IEEE International Conference on Software Testing, Verification and Validation (ICST) 2026, Daejeon, Republic of Korea, IEEE
Preview abstract Enterprises are increasingly adopting multiple general-purpose computer architectures in the data center. This leads to new testing challenges as it creates demand to qualify the software for the additional architectures. Naively double-testing all software for both architectures is costly and unnecessary. Further, reconfiguring CI/CD to take advantage of the new architecture can be non-trivial at scale. This paper introduces CI/CD variants and an optimized testing cycle to solve these twin challenges. We empirically evaluate our solution's impact on human and machine expenses using 44k projects at Google on real production data. First, we estimate saving ~25% of machine expenses at the negligible cost of a few delayed breakage detections per day. Second, we estimate a 90+% reduction in human cost for migrating the configuration. All features described in this paper are now Generally Available at Google and we report this as an empirical case study in scaling CI/CD to new architectures. View details
From Correctness to Collaboration: A Human-Centered Taxonomy of AI Agent Behavior in Software Engineering
Sherry Y. Shi
Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA ’26), ACM, New York, NY, USA (2026)
Preview abstract The ongoing transition of Large Language Models in software engineering from code generators into autonomous agents requires a shift in how we define and measure success. While models are becoming more capable, the industry lacks a clear understanding of the behavioral norms that make an agent effective in collaborative software development in the enterprise. This work addresses this gap by presenting a taxonomy of desirable agent behaviors, synthesized from 91 sets of user-defined rules for coding agents. We identify four core expectations: Adhere to Standards and Processes, Ensure Code Quality and Reliability, Solve Problems Effectively, and Collaborate with the User. These findings offer a concrete vocabulary for agent behavior, enabling researchers to move beyond correctness-only benchmarks and design evaluations that reflect the realities of professional software development in large enterprises. View details
Preview abstract Measuring software development can help drive impactful change. However, it’s a complex task, and getting started can be daunting as it involves understanding what you should measure, and determining what you can measure. This article provides a guide to selecting a framework that aligns with organizational measurement strategy. View details
Preview abstract Generative AI is transforming how software is built, offering unprecedented opportunities and raising new challenges. Based on extensive research and developer interviews, this DORA report provides a nuanced understanding of AI's impact on individuals, teams, and organizations. View details
Preview abstract Artificial Intelligence (AI) is rapidly expanding and integrating more into daily life to automate tasks, guide decision-making and enhance efficiency. However, complex AI models, which make decisions without providing clear explanations (known as the "black-box problem"), currently restrict trust and widespread adoption of AI. Explainable Artificial intelligence (XAI) has emerged to address the black-box problem of making AI systems more interpretable and transparent so stakeholders can trust, verify, and act upon AI-based outcomes. Researcher have come up with various techniques to foster XAI in Software Development Lifecycle. However, there are gaps in the application of XAI in Software Engineering phases. Literature shows that 68% of XAI in Software Engineering research focused on maintenance as opposed to 8% on software management and requirements [7]. In this paper we present a comprehensive survey of the applications of XAI methods (e.g., concept-based explanations, LIME/SHAP, rule extraction, attention mechanisms, counterfactual explanations, example-based explanations) to the different phases of Software Development Lifecycles (SDLC) mainly requirements elicitation, design and development, testing and deployment, and evolution. To the best of our knowledge, this paper presents the first comprehensive survey of XAI techniques for every phase of the Software Development Life Cycle (SDLC). In doing so, we aim to promote explainable AI in Software Engineering and facilitate the use of complex AI models in AI-driven software development. View details
×