David Bieber
Research Areas
Authored Publications
Sort By
Static Prediction of Runtime Errors by Learning to Execute Programs with External Resource Descriptions
Rishab Goel
International Conference on Learning Representations (ICLR) (2023)
Preview abstract
The execution behavior of a program often depends on external resources, such as program inputs or file contents, and so cannot be run in isolation. Nevertheless, software developers benefit from fast iteration loops where automated tools identify errors as early as possible, even before programs can be compiled and run. This presents an interesting machine learning challenge: can we predict runtime errors in a ``static'' setting, where program execution is not possible? Here, we introduce a real-world dataset and task for predicting runtime errors, which we show is difficult for generic models like Transformers. As an alternative, we develop an interpreter-inspired architecture with an inductive bias towards mimicking program executions, which models exception handling and ``learns to execute'' descriptions of the contents of external resources. Surprisingly, we show that the model can also predict the location of the error, despite being trained only on labels indicating the presence/absence and kind of error. In total, we present a practical and difficult-yet-approachable challenge problem related to learning program execution and we demonstrate promising new capabilities of interpreter-inspired machine learning models for code.
View details
Preview abstract
Identifying invariants in programs is an important program analysis task with applications towards program understanding, vulnerability analysis, and formal verification. Existing tools for identifying invariants rely on dynamic analysis, requiring traces collected from multiple executions in order to produce reliable invariants. We study the application of large language models to invariant prediction, finding that models training on source code and fine-tuned to invariant prediction can perform invariant prediction as static rather than dynamic analysis. Using a scratchpad approach gives the best performance, finding invariants statically of quality comparable to those obtained by a dynamic analysis tool with access to five program traces.
View details
Graph Representations of Python Programs via Source-level Static Analysis
Daniel Johnson
Vincent Josua Hellendoorn
Arxiv (2022)
Preview abstract
Graph representations of programs are commonly a central element of machine learning for code research. We introduce an open source Python library python_graphs that applies static analysis to construct graph representations of Python programs suitable for training machine learning models. Our library admits the construction of control-flow graphs, data-flow graphs, and composite "program graphs" that combine control-flow, data-flow, syntactic, and lexical information about a program. We present the capabilities and limitations of the library, perform a case-study applying the library to millions of competitive programming submissions, and showcase the library's utility for machine learning research.
View details
Preview abstract
The success and popularity of deep learning is on the rise, partially due to powerful deep learning frameworks such as TensorFlow and PyTorch that make it easier to develop deep learning models. However, these libraries also come with steep learning curves, since programming in these frameworks is quite different from traditional imperative programming with explicit loops and conditionals. In this work, we present a tool called TF-Coder for programming by example in TensorFlow. TF-Coder uses a bottom-up weighted enumerative search, with value-based pruning of equivalent expressions and flexible type- and value-based filtering to ensure that expressions adhere to various requirements imposed by the TensorFlow library. We train models to predict TensorFlow operations from features of the input and output tensors and natural language descriptions of tasks, to prioritize relevant operations during search. TF-Coder solves 63 of 70 real-world tasks within 5 minutes, sometimes finding simpler solutions in less time compared to experienced human programmers.
View details
BUSTLE: Bottom-Up Program Synthesis Through Learning-Guided Exploration
Augustus Odena
Rishabh Singh
Hanjun Dai
International Conference on Learning Representations (ICLR) (2021)
Preview abstract
Program synthesis is challenging largely because of the difficulty of search in a large space of programs. Human programmers routinely tackle the task of writing complex programs by writing sub-programs and then analyzing their intermediate results to compose them in appropriate ways. Motivated by this intuition, we present a new synthesis approach that leverages learning to guide a bottom-up search over programs. In particular, we train a model to prioritize compositions of intermediate values during search conditioned on a given set of input-output examples. This is a powerful combination because of several emergent properties. First, in bottom-up search, intermediate programs can be executed, providing semantic information to the neural network. Second, given the concrete values from those executions, we can exploit rich features based on recent work on property signatures. Finally, bottom-up search allows the system substantial flexibility in what order to generate the solution, allowing the synthesizer to build up a program from multiple smaller sub-programs. Overall, our empirical evaluation finds that the combination of learning and bottom-up search is remarkably effective, even with simple supervised learning approaches. We demonstrate the effectiveness of our technique on two datasets, one from the SyGuS competition and one of our own creation.
View details
Learning Semantic Representations to Verify Hardware Designs
Shobha Vasudevan
Rishabh Singh
Hamid Shojaei
Richard Ho
Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS) (2021)
Preview abstract
We introduce Design2Vec, a representation learning approach to learn semantic abstractions of hardware designs at the Register Transfer Level (RTL). The key idea of our approach is to design a graph convolution based neural architecture that embeds RTL syntax and semantics. We train the architecture on the task of predicting coverage in the design, given some input test stimulus. We then present an approach to use the learnt RTL representation to automatically generate new tests for unseen coverage locations in the design. Our experimental results demonstrate that Design2Vec outperforms several baseline approaches that do not incorporate the RTL semantics and it can be used to generate instantaneous coverage predictions compared to nightly simulation times. Moreover, the tests generated using Design2Vec result in coverage of design points that are difficult to cover for design verification experts using the current manual approaches for test generation.
View details
Show Your Work: Scratchpads for Intermediate Computation with Language Models
Maxwell Nye
Guy Gur-Ari
Henryk Witold Michalewski
David Martin Dohan
Aitor Lewkowycz
Maarten Paul Bosma
David Luan
Augustus Odena
(2021)
Preview abstract
Large pre-trained language models perform remarkably well on tasks that can be done “in one pass”, such as generating realistic text (Brown et al., 2020) or synthesizing computer programs (Chen et al., 2021; Austin et al., 2021). However, they struggle with tasks that require unbounded multi-step computation, such as adding integers (Brown et al., 2020) or executing programs (Austin et al., 2021). Surprisingly, we find that these same models are able to perform complex multistep computations—even in the few-shot regime—when asked to perform the operation “step by step”, showing the results of intermediate computations. In particular, we train Transformers to perform multi-step computations by asking them to emit intermediate computation steps into a “scratchpad”. On a series of increasingly complex tasks ranging from long addition to the execution of arbitrary programs, we show that scratchpads dramatically improve the ability of language models to perform multi-step computations.
View details
Global Relational Models of Source Code
Vincent Josua Hellendoorn
Rishabh Singh
International Conference on Learning Representations (ICLR) (2020)
Preview abstract
Models of code can learn distributed representations of a program’s syntax and
semantics to predict many non-trivial properties of a program. Recent state-of-the-art models leverage highly structured representations of programs, such as
trees, graphs and paths therein (e.g., data-flow relations), which are precise and
abundantly available for code. This provides a strong inductive bias towards semantically
meaningful relations, yielding more generalizable representations than
classical sequence-based models. Unfortunately, these models primarily rely on
graph-based message passing to represent relations in code, which makes them
de facto local due to the high cost of message-passing steps, quite in contrast to
modern, global sequence-based models, such as the Transformer. In this work,
we bridge this divide between global and structured models by introducing two
new hybrid model families that are both global and incorporate structural bias:
Graph Sandwiches, which wrap traditional (gated) graph message-passing layers
in sequential message-passing layers; and Graph Relational Embedding Attention
Transformers (GREAT for short), which bias traditional Transformers with
relational information from graph edge types. By studying a popular, non-trivial
program repair task, variable-misuse identification, we explore the relative merits
of traditional and hybrid model families for code representation. Starting with a
graph-based model that already improves upon the prior state-of-the-art for this
task by 20%, we show that our proposed hybrid models improve an additional
10–15%, while training both faster and using fewer parameters.
View details
Incremental Sampling Without Replacement for Sequence Models
Proceedings of the 37th International Conference on Machine Learning (2020)
Preview abstract
Sampling is a fundamental technique, and sampling without replacement is often desirable when duplicate samples are not beneficial. Within machine learning, sampling is useful for generating diverse outputs from a trained model. We present an elegant procedure for sampling without replacement from a broad class of randomized programs, including generative neural models that construct outputs sequentially. Our procedure is efficient even for exponentially-large output spaces. Unlike prior work, our approach is incremental, i.e., samples can be drawn one at a time, allowing for increased flexibility. We also present a new estimator for computing expectations from samples drawn without replacement. We show that incremental sampling without replacement is applicable to many domains, e.g., program synthesis and combinatorial optimization.
View details
Learning to Execute Programs with Instruction Pointer Attention Graph Neural Networks
Thirty-fourth Conference on Neural Information Processing Systems (2020)
Preview abstract
Graph neural networks (GNNs) have emerged as a powerful tool for learning software engineering tasks including code completion, bug finding, and program repair. They benefit from leveraging program structure like control flow graphs, but they are not well-suited to tasks like program execution that require far more sequential reasoning steps than number of GNN propagation steps. Recurrent neural networks (RNNs), on the other hand, are well-suited to long sequential chains of reasoning, but they do not naturally incorporate program structure and generally perform worse on the above tasks. Our aim is to achieve the best of both worlds, and we do so by introducing a novel GNN architecture, the Instruction Pointer Attention Graph Neural Network (IPA-GNN), which achieves systematic generalization on the task of learning to execute programs using control flow graphs. The model arises by developing a spectrum of models between RNNs operating on program traces with branch decisions as latent variables and GNNs. The IPA-GNN can be seen either as a continuous relaxation of the RNN model or as a GNN variant more tailored to execution. To test the models, we propose evaluating systematic generalization on learning to execute using control flow graphs, which tests sequential reasoning and use of program structure. More practically, we evaluate these models on the task of learning to execute partial programs, as might arise if using the model as a value function in program synthesis. Results show that the IPA-GNN outperforms a variety of RNN and GNN baselines on both tasks.
View details