Data Management

Google is engaged in Data Management research across a variety of topics with deep connections to Google products. We are building intelligent systems to discover, annotate, and explore structured data from the Web, and to surface it creatively through Google products such as Search (e.g., structured snippets), Docs, and many others. The overarching goal is to create a plethora of structured data on the Web that maximally helps Google users consume, interact with, and explore information. Through these projects, we study a range of cutting-edge data management research problems, including information extraction and integration, large-scale data analysis, and effective data exploration, using techniques such as information retrieval, data mining, and machine learning.

A major research effort involves the management of structured data within the enterprise. The goal is to discover, index, monitor, and organize this type of data to make high-quality datasets easier to access. Enterprise data carries different, and often richer, semantics than structured data on the Web, which in turn raises new opportunities and technical challenges for its management.

Data Management research across Google also allows us to build the technologies that power Google's largest businesses: scalable, reliable, fast, and general-purpose infrastructure for large-scale data processing as a service. Examples include F1, the database serving our ads infrastructure; Mesa, a petabyte-scale analytic data warehousing system; and Dremel, for petabyte-scale data processing with interactive response times. Dremel is available to external customers as part of Google Cloud’s BigQuery.

Recent Publications

Vortex: A Stream-oriented Storage Engine For Big Data Analytics

Pavan Edara, Jonathan Forbes, Bigang Li

SIGMOD (2024)

BigLake: BigQuery’s Evolution toward a Multi-Cloud Lakehouse

Justin Levandoski, Garrett Casto, Mingge Deng, Rushabh Desai, Pavan Edara, Thibaud Hottelier, Amir Hormati, Anoop Johnson, Jeff Johnson, Dawid Kurzyniec, Sam McVeety, Prem Ramanathan, Gaurav Saxena, Vidya Shanmugam, Yuri Volobuev

SIGMOD (2024)

Chain-of-Table: Evolves Tables in the LLM Reasoning Chain for Table Understanding

Zilong Wang, Hao Zhang, Chun-Liang Li, Julian Eisenschlos, Vincent Perot, Zifeng Wang, Lesly Miculicich, Yasuhisa Fujii, Jingbo Shang, Chen-Yu Lee, Tomas Pfister

ICLR (2024)

Automatic Histograms: Leveraging Language Models for Text Dataset Exploration

Emily Reif, Crystal Qian, James Wexler, Minsuk Kahng

Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '24), ACM, Honolulu, HI, USA (2024), pp. 9

In-path Oracles for Road Networks

Debajyoti Ghosh, Jagan Sankaranarayanan, Kiran Khatter, Hanan Samet

International Journal of Geo-Information, 12(7) (2023), pp. 277

Firestore: The NoSQL Serverless Database for the Application Developer

Ram Kesavan, David Gay, Daniel Thevessen, Jimit Shah, C. Mohan

2023 IEEE 39th International Conference on Data Engineering (ICDE), pp. 3367-3379