Azade Nova
I am a Research Scientist at Google Brain. Before joining Google, I was a postdoctoral researcher at Microsoft Research in the Data Management, Exploration and Mining (DMX) group. My research interests are in the broad areas of social network analysis, graph mining, machine learning, data mining, and databases. I completed my PhD in the Department of Computer Science and Engineering at the University of Texas at Arlington under the supervision of Dr. Gautam Das in the Database Exploration Lab (DBXLAB). My PhD research focused on data exploration and analysis over online community networks such as GooglePlus, Twitter, and Amazon. I worked on novel problems with practical impact, whose solutions often involved designing new techniques or adapting techniques from fields such as graph theory, algorithms, and statistics. The Google AI Residency has given me the opportunity to collaborate with brilliant researchers on challenging machine learning problems. Many important real-world datasets take the form of graphs or networks: social networks, knowledge graphs, protein-interaction networks, and the World Wide Web, to name a few. Most of my current research is devoted to generalizing neural network models to such datasets, where the goal is to exploit their graph structure during training.
Authored Publications
UQE: A Query Engine for Unstructured Databases
Hanjun Dai
Bethany Wang
Sherry Yang
Phitchaya Mangpo Phothilimthana
Advances in Neural Information Processing Systems (NeurIPS) (2024)
Analytics on structured data is a mature field with many successful methods. However, most real world data exists in unstructured form, such as images and conversations. We investigate the potential of Large Language Models (LLMs) to enable unstructured data analytics. In particular, we propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections. This engine accepts queries in a Universal Query Language (UQL), a dialect of SQL that provides full natural language flexibility in specifying conditions and operators. The new engine leverages the ability of LLMs to conduct analysis of unstructured data, while also allowing us to exploit advances in sampling and optimization techniques to achieve efficient and accurate query execution. In addition, we borrow techniques from classical compiler theory to better orchestrate the workflow between sampling methods and foundation model calls. We demonstrate the efficiency of UQE on data analytics across different modalities, including images, dialogs and reviews, across a range of useful query types, including conditional aggregation, semantic retrieval and abstraction aggregation.
Scalable Deep Generative Modeling for Sparse Graphs
Hanjun Dai
Yujia Li
International Conference on Machine Learning (2020)
Learning graph generative models is a challenging task for deep learning and has wide applicability to a range of domains like chemistry, biology and social science. However current deep neural methods suffer from limited scalability: for a graph with n nodes and m edges, existing deep neural methods require Ω(n²) complexity by building up the adjacency matrix. On the other hand, many real world graphs are actually sparse in the sense that m ≪ n². Based on this, we develop a novel autoregressive model, named BiGG, that utilizes this sparsity to avoid generating the full adjacency matrix, and importantly reduces the graph generation time complexity to O((n + m) log n). Furthermore, during training this autoregressive model can be parallelized with O(log n) synchronization stages, which makes it much more efficient than other autoregressive models that require Ω(n). Experiments on several benchmarks show that the proposed approach not only scales to orders of magnitude larger graphs than previously possible with deep autoregressive graph generative models, but also yields better graph generation quality.
Preview abstract
Graph partitioning is the problem of dividing the nodes of a graph into balanced partitions while minimizing the edge cut across the partitions. Due to its combinatorial nature, many approximate solutions have been developed. We propose GAP, a Generalizable Approximate Partitioning framework that takes a deep learning approach to graph partitioning. We define a differentiable loss function that represents the partitioning objective. Unlike baselines that redo the optimization per graph, GAP is capable of generalization, allowing us to train models that produce performant partitions at inference time, even on unseen graphs. Furthermore, because we learn the representation of the graph while jointly optimizing for the partitioning loss function, GAP can be easily tuned for a variety of graph structures. We evaluate the performance of GAP on graphs of varying sizes and structures, including graphs of widely used machine learning models (e.g., ResNet, VGG, and Inception-V3), scale-free graphs, and random graphs.