Data Commons

R.V. Guha
Prashanth Radhakrishnan
Bo Xu
Carolyn Au
Wei Sun
Jehangir Amjad
Ajai Tirumali
Jennifer Chen
Julia Wu
Natalie Diaz
Samantha Piekos
Prem Ramaswami
James Manyika
(2023)

Abstract

Publicly available data from open sources (e.g., Census [1], BLS [2], WHO [3],
IPCC [4]) are vital resources for policymakers, students, and researchers across different
disciplines. Combining data from different sources requires the user to reconcile the
differences in schemas, formats, assumptions, and more. This data wrangling is
time-consuming and tedious, and it must be repeated by every user of the data. Our goal
with Data Commons is to address this problem by doing the wrangling once and making the
processed data widely available via standard schemas and Cloud APIs. Data Commons is a
distributed network of sites that publish data in a common schema and interoperate
using the Data Commons APIs. Data from different Data Commons can be ‘joined’
easily. The aggregate of these Data Commons can be viewed as a single Knowledge
Graph. This paper describes the architecture of Data Commons and some of its major
deployments, and highlights directions for future work.

Research Areas