Lucas Dixon

Lucas Dixon

Lucas is a principal research scientist in Google DeepMind and co-lead of PAIR (People and AI Research). He works on visualisation, interpretability and control of machine learning systems, and specifically language models. His work explores how people can productively and fairly benefit from machine learning systems.

Previously, he was Chief Scientist at Jigsaw where he founded engineering and research. He has contributed scientific advances and systems in multiple disciplines including digital security, formal logic, machine learning, and data visualization. For example he co-founded uProxy & Outline, Project Shield, DigitalAttackMap; Syria Defection Tracker, unfiltered.news, Conversation AI and Perspective API.

Before Google, Lucas completed his PhD and worked at the University of Edinburgh on the automation of mathematical reasoning and graphical languages applied to quantum information. He also helped run a non-profit working towards more rational and informed discussion and decision making, and was a co-founder of TheoryMine - a playful take on automating mathematical discovery. Outside of scientific advances, Lucas is also a martial arts instructor in Paris.
Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Parameter Efficient Reinforcement Learning from Human Feedback
    Hakim Sidahmed
    Alex Hutcheson
    Zhuonan Lin
    Zhang Chen
    Zac Yu
    Jarvis Jin
    Simral Chaudhary
    Roman Komarytsia
    Christiane Ahlheim
    Yonghao Zhu
    Bowen Li
    Jessica Hoffmann
    Hassan Mansoor
    Wei Li
    Abhinav Rastogi
    2024
    Preview abstract While Reinforcement Learning from Human Feedback (RLHF) effectively aligns pretrained Large Language Models (LLMs) with human preferences, its computational cost and complexity hinder wider adoption. This work introduces Parameter-Efficient Reinforcement Learning (PERL): by leveraging Low-Rank Adaptation (LoRA) \citep{hu2021lora} for reward model training and reinforcement learning, we are able to perform RL loops while updating only a fraction of the parameters required by traditional RLHF. We demonstrate that the effectiveness of this method is not confined to a specific task. We compare PERL to conventional fine-tuning (full-tuning) across X highly diverse tasks, spanning from summarization to X and X, for a total of X different benchmarks - including two novel preference datasets released with this paper. Our findings show that PERL achieves comparable performance to RLHF while significantly reducing training time (up to 2x faster for reward models and 15\% faster for RL loops), and memory footprint (up to 50\% reduction for reward models and 25\% for RL loops). Finally, we provide a single set of parameters that achieves results on par with RLHF on every task, which shows the accessibility of the method. By mitigating the computational cost and the burden of hyperparameter search, PERL facilitates broader adoption of RLHF as an LLM alignment technique. View details
    "We Need Structured Output": Towards User-centered Constraints on Large Language Model Output
    Michael Xieyang Liu
    Frederick Liu
    Alex Fiannaca
    Terry Koo
    In Extended Abstract in ACM CHI Conference on Human Factors in Computing Systems (CHI EA '24), ACM (2024), pp. 9 (to appear)
    Preview abstract Large language models can produce creative and diverse responses. However, to integrate them into current developer workflows, it is essential to constrain their outputs to follow specific formats or standards. In this work, we surveyed 51 experienced industry professionals to understand the range of scenarios and motivations driving the need for output constraints from a user-centered perspective. We identified 134 concrete use cases for constraints at two levels: low-level, which ensures the output adhere to a structured format and an appropriate length, and high-level, which requires the output to follow semantic and stylistic guidelines without hallucination. Critically, applying output constraints could not only streamline the currently repetitive process of developing, testing, and integrating LLM prompts for developers, but also enhance the user experience of LLM-powered features and applications. We conclude with a discussion on user preferences and needs towards articulating intended constraints for LLMs, alongside an initial design for a constraint prototyping tool. View details
    LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
    Minsuk Kahng
    Michael Xieyang Liu
    Krystal Kallarackal
    Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '24), ACM (2024)
    Preview abstract Automatic side-by-side evaluation has emerged as a promising approach to evaluating the quality of responses from large language models (LLMs). However, analyzing the results from this evaluation approach raises scalability and interpretability challenges. In this paper, we present LLM Comparator, a novel visual analytics tool for interactively analyzing results from automatic side-by-side evaluation. The tool supports interactive workflows for users to understand when and why a model performs better or worse than a baseline model, and how the responses from two models are qualitatively different. We iteratively designed and developed the tool by closely working with researchers and engineers at Google. This paper details the user challenges we identified, the design and development of the tool, and an observational study with participants who regularly evaluate their models. View details
    Preview abstract Traditional recommender systems leverage users' item preference history to recommend novel content that users may like. However, dialog interfaces that allow users to express language-based preferences offer a fundamentally different modality for preference input. Inspired by recent successes of prompting paradigms for large language models (LLMs), we study their use for making recommendations from both item-based and language-based preferences in comparison to state-of-the-art item-based collaborative filtering (CF) methods. To support this investigation, we collect a new dataset consisting of both item-based and language-based preferences elicited from users along with their ratings on a variety of (biased) recommended items and (unbiased) random items. Among numerous experimental results, we find that LLMs provide competitive recommendation performance for pure language-based preferences (no item preferences) in the near cold-start case in comparison to item-based CF methods, despite having no supervised training for this specific task (zero-shot) or only a few labels (few-shot). This is particularly promising as language-based preference representations are more explainable and scrutable than item-based or vector-based representations. View details
    On Natural Language User Profiles for Transparent and Scrutable Recommendation
    Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22) (2022)
    Preview abstract Natural interaction with recommendation and personalized search systems has received tremendous attention in recent years. We focus on the challenge of supporting people's understanding and control of these systems and explore a fundamentally new way of thinking about representation of knowledge in recommendation and personalization systems. Specifically, we argue that it may be both desirable and possible for algorithms that use natural language representations of users' preferences to be developed. We make the case that this could provide significantly greater transparency, as well as affordances for practical actionable interrogation of, and control over, recommendations. Moreover, we argue that such an approach, if successfully applied, may enable a major step towards systems that rely less on noisy implicit observations while increasing portability of knowledge of one's interests. View details
    Context Sensitivity Estimation in Toxicity Detection
    Alexandros Xenos
    Ioannis Pavlopoulos
    Ion Androutsopoulos
    First Monday (2022)
    Preview abstract Context-sensitive posts are rare in toxicity de-tection datasets. This fact leads to modelsthat disregard even the conversational context(e.g., the parent post) when they predict toxic-ity. This work introduces the task of context-sensitivity estimation in toxicity detection andpresents. We present and publicly release thefirst dataset that can be used to build context-sensitivity estimation systems.We furthershow that systems trained on our dataset canbe effectively used to detect posts that dependto the parent post, regarding toxicity detection. View details
    Sparsely Activated Language Models are Efficient In-Context Learners
    Andrew Dai
    Barret Richard Zoph
    Dmitry (Dima) Lepikhin
    Emma Wang
    Kathy Meier-Hellstern
    Kun Zhang
    Liam B. Fedus
    Maarten Paul Bosma
    Marie Pellat
    Maxim Krikun
    Nan Du
    Simon Tong
    Tao Wang
    Toju Duke
    Yanping Huang
    Yonghui Wu
    Yuanzhong Xu
    Zhifeng Chen
    Zongwei Zhou
    (2022)
    Preview abstract Scaling language models with more data, compute and parameters has driven significant progress in natural language processing. For example, thanks to scaling, GPT-3 was able to achieve strong performance on few-shot learning. However, training these large dense models require significant amounts of computing resources. In this paper, we develop a family of sparsely activated mixture-of-expert language models named \glam (\textbf{G}eneralist \textbf{La}nguage \textbf{M}odel), which can have many more parameters but require significant less training cost than dense models. The largest \glam has 1.2 trillion parameters, which is approximately 7x larger than GPT-3 but can be trained more efficiently. With only 1/3 of energy consumption to train GPT-3, \glam achieves better overall performance on 29 zero-shot and one-shot NLP tasks. For example, \glam gets 75.0\% one-shot exact match accuracy on the TriviaQA test server, a significant improvement over 68.0\% obtained by GPT-3. View details
    Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis
    Shayegan Omidshafiei
    Yannick Assogba
    Advances in Neural Information Processing Systems (NeurIPS) (2022) (to appear)
    Preview abstract Each year, expert-level performance is attained in increasingly-complex multiagent domains, notable examples including Go, Poker, and StarCraft II. This rapid progression is accompanied by a commensurate need to better understand how such agents attain this performance, to enable their safe deployment, identify limitations, and reveal potential means of improving them. In this paper we take a step back from performance-focused multiagent learning, and instead turn our attention towards agent behavior analysis. We introduce a model-agnostic method for discovery of behavior clusters in multiagent domains, using variational inference to learn a hierarchy of behaviors at the joint and local agent levels. Our framework makes no assumption about agents' underlying learning algorithms, does not require access to their latent states or policies, and is trained using only offline observational data. We illustrate the effectiveness of our method for enabling the coupled understanding of behaviors at the joint and local agent level, detection of behavior changepoints throughout training, discovery of core behavioral concepts, demonstrate the approach's scalability to a high-dimensional multiagent MuJoCo control domain, and also illustrate that the approach can disentangle previously-trained policies in OpenAI's hide-and-seek domain. View details
    Preview abstract Toxicity detection is of growing importance in social and other media to allow healthy discussions. Most previous work ignores the context of user posts, which can mislead systems and moderators to incorrectly classify toxic posts as non-toxic, or vice versa. Recent work concluded that datasets containing many more context-aware posts are needed to correctly train and evaluate context-aware toxicity classifiers. We re-annotated an existing toxicity dataset, adding context-aware ground truth to the existing context-unaware ground truth. Exploiting both types of ground truth, context aware and unaware, we develop and evaluate a classifier that can determine if a post is context-sensitive or not. The classifier can be used to collect more context-sensitive posts. It can also be used to determine when a moderator needs to consider the parent post (to decrease the moderation cost) or when a context-aware toxicity detection system has to be evoked, as opposed to using a simpler context-unaware system. We also discuss how the context-sensitivity classifier can help avoid a possibly malicious exploitation of the context-unawareness of current toxicity detectors. Datasets and code of models addressing this novel task will become publicly available. View details
    Preview abstract Unintended bias in Machine Learning can manifest as systemic differences in performance for different demographic groups, potentially compounding existing challenges to fairness in society at large. In this paper, we introduce a suite of threshold-agnostic metrics that provide a nuanced view of this unintended bias, by considering the various ways that a classifier's score distribution can vary across designated groups. We also introduce a large new test set of online comments with crowd-sourced annotations for identity references. We use this to show how our metrics can be used to find new and potentially subtle unintended bias in existing public models. View details
    ×