Luheng He

Luheng is a research scientist at Google AI Language. She completed her Ph.D. at the University of Washington, advised by Luke Zettlemoyer. Her research focuses on semantic role labeling (SRL) and other structured prediction problems in NLP. She built DeepSRL, a BiLSTM-based neural SRL model for PropBank. She also introduced QA-SRL, a question-answer-based SRL annotation scheme that makes it possible to gather SRL data from annotators without linguistic training.
Authored Publications
    Understanding tables is an important aspect of natural language understanding. Existing models for table understanding require linearization of table contents in certain levels, where row or column orders are encoded as unwanted biases. Such spurious biases make the model vulnerable to row and column order perturbations. Also, prior work did not explicitly and thoroughly model structural biases, hindering the table-text modeling ability. In this work, we propose a robust table-text encoding architecture TableFormer, where tabular structural biases are incorporated completely through learnable attention biases. TableFormer is invariant to row and column orders, and could understand tables better due to its tabular inductive biases. Experiments showed that TableFormer outperforms strong baselines in all settings on SQA, WTQ and TabFact table reasoning datasets, and achieves state-of-the-art performance on SQA, especially when facing answer-invariant row and column perturbations (6% improvement over the best baseline), because previous SOTA models' performance drops by 4% - 6% when facing such perturbations while TableFormer is not affected.
    Slot-filling is an essential component for building task-oriented dialog systems. In this work, we focus on the zero-shot slot-filling (ZSSF) problem, where the model needs to predict slots and their values given utterances from new domains with zero training data. Prior methods for ZSSF directly learn representations for slot descriptions and utterances for extracting slot fillers. However, there is ambiguity and loss of information in encoding the raw slot description, which can hurt the models' zero-shot capacity. To address this problem, we introduce QA-driven slot filling (QASF), which extracts slot-filler spans from utterances with a span-based QA model. We use a linguistically motivated questioning strategy for turning the descriptions into questions, allowing the model to generalize to unseen slot types. Furthermore, our QASF model better utilizes weak supervision signals from QA pairs synthetically generated from conversations.
    TimeDial: Temporal Commonsense Reasoning in Dialog
    Lianhui Qin
    Aditya Gupta
    Yejin Choi
    Manaal Faruqui
    Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics (2021)
    Everyday conversations require understanding everyday events, which in turn, requires understanding temporal commonsense concepts interwoven with those events. Despite recent progress with massive pre-trained language models (LMs) such as T5 and GPT-3, their capability of temporal reasoning in dialogs remains largely under-explored. In this paper, we present the first study to investigate pre-trained LMs for their temporal reasoning capabilities in dialogs by introducing a new task and a crowd-sourced English challenge set, TimeDial. We formulate TimeDial as a multiple choice cloze task with over 1.1K carefully curated dialogs. Empirical results demonstrate that even the best performing models struggle on this task compared to humans, with 23 absolute points of gap in accuracy. Furthermore, our analysis reveals that the models fail to reason about dialog context correctly; instead, they rely on shallow cues based on existing temporal patterns in context, motivating future research for modeling temporal concepts in text and robust contextual reasoning about them. The dataset is publicly available at https://github.com/google-research-datasets/timedial.
    Natural language descriptions of user interface (UI) elements such as alternative text are crucial for accessibility and language-based interaction in general. Yet, these descriptions are constantly missing in mobile UIs. We propose widget captioning, a novel task for automatically generating language descriptions for UI elements from multimodal input including both the image and the structural representations of user interfaces. We collected a large-scale dataset for widget captioning with crowdsourcing. Our dataset contains 162,859 language phrases created by human workers for annotating 61,285 UI elements across 21,750 unique UI screens. We thoroughly analyze the dataset, and train and evaluate a set of deep model configurations to investigate how each feature modality as well as the choice of learning strategies impact the quality of predicted captions. The task formulation and the dataset as well as our benchmark models contribute a solid basis for this novel multimodal captioning task that connects language and user interfaces.
    Few-shot Slot Filling and Intent Classification with Retrieved Examples
    Dian Yu
    Ice Pasupat
    Qi Li
    Xinya Du
    2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (2020)
    Few-shot learning is an important problem in natural language understanding tasks due to scenarios such as inclusion of new domains and labels. In this paper, we explore retrieval-based methods for tackling the few-shot intent classification and slot filling tasks due to their advantage of 1) better adaptation to new domains; and 2) not requiring model retraining with new labels. However, structured prediction beyond intent classification is challenging for retrieval-based methods. In this work, we propose a span-level retrieval method by learning similar contextualized representations for spans with the same label. At inference time, we use the labels of the retrieved spans to construct the final structure. We show that our method outperforms previous systems in the few-shot setting on the CLINC and SNIPS benchmarks.
    Existing paraphrase identification datasets lack sentence pairs that have high lexical overlap without being paraphrases. Models trained on such data fail to distinguish pairs like "flights from New York to Florida" and "flights from Florida to New York". This paper introduces PAWS (Paraphrase Adversaries from Word Scrambling), a new dataset with 108,463 well-formed paraphrase and non-paraphrase pairs with high lexical overlap. Challenging pairs are generated by controlled word swapping and back translation, followed by fluency and paraphrase judgments by human raters. State-of-the-art models trained on existing datasets have dismal performance on PAWS (<40% accuracy); however, including PAWS training data for these models improves their accuracy to 85% while maintaining performance on existing tasks. In contrast, models that do not capture non-local contextual information fail even with PAWS training examples. As such, PAWS provides an effective instrument for driving further progress on models that better exploit structure, context, and pairwise comparisons.
    Reading comprehension models have been successfully applied to extractive text answers, but it is unclear how best to generalize these models to abstractive numerical answers. We enable a BERT-based reading comprehension model to perform lightweight numerical reasoning. We augment the model with a predefined set of executable 'programs' which encompass simple arithmetic as well as extraction. Rather than having to learn to manipulate numbers directly, the model can pick a program and execute it. On the recent Discrete Reasoning Over Passages (DROP) dataset, designed to challenge reading comprehension models, we show a 33% absolute improvement by adding shallow programs. The model can learn to predict new operations when appropriate in a math word problem setting (Roy and Roth, 2015) with very few training examples.