SpreadsheetCoder: Formula Prediction from Semi-structured Context

Xinyun Chen

Petros Maniatis

Rishabh Singh

Charles Sutton

Hanjun Dai

Max Lin

Denny Zhou

Proceedings of the 38th International Conference on Machine Learning (ICML) (2021)


Abstract

Spreadsheet formula prediction has been an important program synthesis problem with many real-world applications. Previous works typically use input-output examples as the specification for spreadsheet formula synthesis, where each input-output pair simulates a separate row in the spreadsheet. However, this formulation does not fully capture the rich context in real-world spreadsheets. First, spreadsheet data entries are organized as tables, so rows and columns are not necessarily independent of each other. In addition, many spreadsheet tables include headers, which provide high-level descriptions of the cell data, yet previous synthesis approaches do not consider headers as part of the specification. In this work, we present the first approach for synthesizing spreadsheet formulas from tabular context, which includes both headers and semi-structured tabular data. In particular, we propose SpreadsheetCoder, a BERT-based model architecture that represents the tabular context in both row-based and column-based formats. We train our model on a large dataset of spreadsheets and demonstrate that SpreadsheetCoder achieves a top-1 prediction accuracy of 42.51%, a considerable improvement over baselines that do not employ rich tabular context. Compared to a rule-based system, SpreadsheetCoder assists 82% more users in composing formulas on Google Sheets.
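The abstract states that SpreadsheetCoder represents tabular context in both row-based and column-based formats. As a rough illustration only, the sketch below shows one way the cells around a target cell could be linearized both ways into BERT-style input strings. The `[SEP]` and `[EMPTY]` tokens, the `|` delimiter, the `window` parameter, and the function names `row_based` and `column_based` are hypothetical choices made for this example; they are not the paper's actual encoding.

```python
from typing import List, Optional

# Hypothetical table layout: row 0 holds the column headers, and None
# marks an empty cell (for example, the cell awaiting a formula).
Table = List[List[Optional[str]]]

SEP = "[SEP]"  # hypothetical BERT-style separator between serialized segments


def _cell(value: Optional[str]) -> str:
    # Render empty cells with an explicit (hypothetical) placeholder token.
    return value if value is not None else "[EMPTY]"


def row_based(table: Table, target_row: int, window: int = 1) -> str:
    """Headers first, then each data row within `window` of the target row."""
    headers = " | ".join(_cell(h) for h in table[0])
    lo = max(1, target_row - window)               # never re-read the header row
    hi = min(len(table) - 1, target_row + window)  # stay inside the table
    rows = [" | ".join(_cell(v) for v in table[r]) for r in range(lo, hi + 1)]
    return f" {SEP} ".join([headers] + rows)


def column_based(table: Table, target_col: int, window: int = 1) -> str:
    """Each column within `window` of the target column, header cell first."""
    n_cols = len(table[0])
    lo = max(0, target_col - window)
    hi = min(n_cols - 1, target_col + window)
    cols = [" | ".join(_cell(row[c]) for row in table) for c in range(lo, hi + 1)]
    return f" {SEP} ".join(cols)


table = [
    ["Item", "Price", "Qty", "Total"],
    ["Pen", "2", "3", None],
    ["Book", "10", "1", None],
]
print(row_based(table, target_row=1))
print(column_based(table, target_col=3))
```

Running both functions on the same table yields two views of one context: the row view keeps each record together with the headers, while the column view keeps each header next to the values beneath it, which is the kind of dependency between rows and columns that per-row input-output examples leave out.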
