Craig Boutilier

Craig Boutilier is Principal Scientist at Google. He works on various aspects of decision making under uncertainty, with a current focus on sequential decision models: reinforcement learning, Markov decision processes, temporal models, etc.

Positions and Appointments:
He was a Professor in the Department of Computer Science at the University of Toronto (on leave) and Canada Research Chair in Adaptive Decision Making for Intelligent Systems. He received his Ph.D. in Computer Science from the University of Toronto in 1992, and worked as an Assistant and Associate Professor at the University of British Columbia from 1991 until his return to Toronto in 1999. He served as Chair of the Department of Computer Science at Toronto from 2004 to 2010. He was co-founder (with Tyler Lu) of Granata Decision Systems from 2012 until his move to Google in 2015.

Boutilier was a consulting professor at Stanford University from 1998 to 2000, an adjunct professor at the University of British Columbia from 1999 to 2010, and a visiting professor at Brown University in 1998, at the University of Toronto in 1997-98, at Carnegie Mellon University in 2008-09, and at Université Paris-Dauphine (Paris IX) in the spring of 2011. He served on the Technical Advisory Board of CombineNet, Inc. from 2001 to 2010.

Research:
Boutilier's current research efforts focus on various aspects of decision making under uncertainty, including the use of generative models and LLMs, in areas such as: recommender systems, preference modeling and elicitation, mechanism design, game theory and multiagent decision processes, economic models, social choice, computational advertising, Markov decision processes, reinforcement learning and probabilistic inference. His research interests have spanned a wide range of topics, from knowledge representation, belief revision, default reasoning, and philosophical logic, to probabilistic reasoning, decision making under uncertainty, multiagent systems, and machine learning.

Research & Academic Service:
Boutilier is a past Editor-in-Chief of the Journal of Artificial Intelligence Research (JAIR). He is a past Associate Editor of the ACM Transactions on Economics and Computation (TEAC), the Journal of Artificial Intelligence Research (JAIR), the Journal of Machine Learning Research (JMLR), and Autonomous Agents and Multiagent Systems (AAMAS), and he has sat on the editorial/advisory boards of several other journals. Boutilier has organized several international conferences and workshops, including serving as Program Chair of the Twenty-first International Joint Conference on Artificial Intelligence (IJCAI-09) and Program Chair of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI-2000). He has also served on the program committees of roughly 75 leading international conferences.

He will serve as Conference Chair of the Thirty-seventh International Joint Conference on Artificial Intelligence (IJCAI-28).

Awards and Honors:
Boutilier is a Fellow of the Royal Society of Canada (RSC), the Association for Computing Machinery (ACM) and the Association for the Advancement of Artificial Intelligence (AAAI). He was the recipient of the 2018 ACM/SIGAI Autonomous Agents Research Award. He was awarded a Tier I Canada Research Chair, an Izaak Walton Killam Research Fellowship, and an IBM Faculty Award. He received the Killam Teaching Award from the University of British Columbia in 1997. He has also received a number of Best Paper awards.

Authored Publications
    LLM-based user simulators are a scalable solution for improving conversational AI, but a critical realism gap undermines their effectiveness. To close this gap, we introduce a framework for building and validating high-fidelity simulators. We present a novel dataset of human-AI shopping conversations designed to capture a wide spectrum of user experiences. To measure fidelity, we propose a hybrid evaluation protocol that combines statistical alignment with a learned, discriminator-based Human-Likeness Score. Our most sophisticated simulator, trained via reinforcement learning with iterative critique, achieves a significant leap in realism. Critically, we demonstrate through counterfactual validation that our simulator, trained exclusively on optimal interactions, realistically adapts its behavior to suboptimal system responses, mirroring real user reactions and marking a key advance in creating reliable simulators for robust AI development.
    ConvApparel: A Benchmark Dataset and Validation Framework for User Simulators in Conversational Recommenders
    Jihwan Jeong
    Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL-26), Rabat, Morocco (2026), pp. 5270-5304
    Synthetic Dialogue Generation for Interactive Conversational Elicitation & Recommendation (ICER)
    Moonkyung Ryu
    Mohammad Ghavamzadeh
    GENNEXT@SIGIR’25: The 1st Workshop on Next Generation of IR and Recommender Systems with Language Agents, Generative Models, and Conversational AI (2025)
    Large language models (LLMs), despite their success in conducting natural conversations and solving various challenging NLP tasks, may not aim to understand and solicit a user’s preferences and suggest recommendations using the learned user preferences through the interactions. One primary challenge in pursuing this task is the lack of rich conversational recommendation datasets (of user/agent dialog conversations) that contain diverse preference elicitation scenarios on top of item recommendations. While several standard conversational recommender datasets do contain dialogs in which the agents query users for preferences, they are often restricted to the use of particular key-phrases that limit users’ preference expressions w.r.t. particular item features (e.g., location, price, rating, etc.). On the other hand, hiring human raters to create a personalized conversational recommender dataset that consists of rich preference queries (with the use of various abstract preference-describing attributes) and recommendations is a very expensive, error-prone, and time-consuming process because it requires data collection on numerous personalized recommendation scenarios, each of which may involve exponentially many possible preference elicitation interactions (with many different queries). We propose a synthetic data generation methodology that utilizes a synthetic recommendation dialog simulator that generates templatized dialogs and uses the Gemini Ultra LLM to in-paint the templatized dialogs so that the dialogs become linguistically more natural. The simulator generates recommendation dialogs that align with the user preferences, which are represented by embeddings.
    Large Language Models (LLMs) have made it possible for recommendation systems to interact with users in open-ended conversational interfaces. In order to personalize LLM responses, it is crucial to elicit user preferences, especially when there is limited user history. One way to get more information is to present clarifying questions to the user. However, generating effective sequential clarifying questions across various domains remains a challenge, as even advanced LLMs still struggle with this task. To address this, we introduce a novel approach for training LLMs to ask sequential questions that reveal user preferences. Our method follows a two-stage process inspired by diffusion models: starting from a user profile, in a forward process we generate clarifying questions, obtain answers, and then remove the corresponding information from the user profile, which is analogous to adding noise to the user profile. In the reverse process, our model learns to “denoise” the user profile by learning to ask effective clarifying questions. Our results show that our method significantly boosts the LLM’s proficiency in asking funnel questions and eliciting user preferences effectively.
    Asking Clarifying Questions for Preference Elicitation with Large Language Models
    Ali Montazer
    1st Workshop on Next Generation of IR and Recommender Systems with Language Agents, Generative Models, and Conversational AI (GENNEXT@SIGIR'25), Padua, IT (2025)
    Preference Adaptive and Sequential Text-to-Image Generation
    Ofir Nabati
    Moonkyung Ryu
    Sean Li
    42nd International Conference on Machine Learning (ICML-25), Vancouver (2025), pp. 45362-45394
    We consider the problem of sequential text-to-image generation. Specifically, we formulate a personalized interactive framework, where an agent iteratively improves a user's prompt through a series of sequential prompt expansions. We formulate the problem as a sequential decision-making task. Using human raters, we create a dataset of sequential preferences for this problem. We then leverage our sequential data, together with large-scale open-source non-sequential datasets to construct user-preference and user-choice models. Particularly, we employ an EM strategy to develop a personalized sequential user model. We then leverage a multi-modal large language model (MM-LLM) and a value-based reinforcement learning (RL) agent to suggest a personalized and diverse slate of prompt expansions to the user. Our Personalized And Sequential Text-to-image Agent (PASTA) empowers diffusion models with personalized multi-turn capabilities, fostering collaborative co-creation, and addressing uncertainties or under-specifications in user intent. We evaluate our agent using human raters, showing significant improvement compared to baseline methods. We also release our sequential rater dataset and additional simulated data of user-agent interactions to advance future research in personalized multi-turn text-to-image generation.
    We address the problem of interactive text-to-image (T2I) generation, designing a reinforcement learning (RL) agent which iteratively improves a set of generated images for a user through a sequence of prompt expansions. Using human raters, we create a novel dataset of sequential preferences, which we leverage, together with large-scale open-source (non-sequential) datasets. We construct user-preference and user-choice models using an EM strategy and identify varying user preference types. We then leverage a large multimodal language model (LMM) and a value-based RL approach to suggest an adaptive and diverse slate of prompt expansions to the user. Our Preference Adaptive and Sequential Text-to-image Agent (PASTA) extends T2I models with adaptive multi-turn capabilities, fostering collaborative co-creation and addressing uncertainty or underspecification in a user's intent. We evaluate PASTA using human raters, showing significant improvement compared to baseline methods. We also open-source our sequential rater dataset and simulated user-rater interactions to support future research in user-centric multi-turn T2I systems.
    Minimizing Live Experiments in Recommender Systems: User Simulation to Evaluate Preference Elicitation Policies
    Martin Mladenov
    James Pine
    Hubert Pham
    Shane Li
    Xujian Liang
    Anton Polishko
    Li Yang
    Ben Scheetz
    Proceedings of the 47th International ACM/SIGIR Conference on Research and Development in Information Retrieval (SIGIR-24), Washington, DC (2024), pp. 2925-2929
    Evaluation of policies in recommender systems (RSs) typically involves A/B testing using live experiments on real users to assess a new policy's impact on relevant metrics. This “gold standard” comes at a high cost, however, in terms of cycle time, user cost, and potential user retention. In developing policies for onboarding new users, these costs can be especially problematic, since onboarding occurs only once. In this work, we describe a simulation methodology used to augment (and reduce) the use of live experiments. We illustrate its deployment for the evaluation of preference elicitation algorithms used to onboard new users of the YouTube Music platform. By developing counterfactually robust user behavior models, and a simulation service that couples such models with production infrastructure, we are able to test new algorithms in a way that reliably predicts their performance on key metrics when deployed live, sometimes more reliably than live experiments due to the scale at which simulation can be realized. We describe our domain, our simulation models and platform, results of experiments and deployment, and suggest future steps needed to further realistic simulation as a powerful complement to live experiments.
    Personalized recommendation systems are increasingly essential in our information-rich society, aiding users in navigating the expansive online realm. However, accurately modeling the diverse and dynamic interests of the users remains a formidable challenge. Existing user modeling methods, like Single-point User Representation (SUR) and Multi-point User Representation (MUR), have their limitations in terms of accuracy, diversity, computation cost, and adaptability. To overcome these challenges, we introduce a novel model, the Density-based User Representation (DUR), leveraging Gaussian Process Regression (GPR), which has not been extensively explored in multi-interest recommendation and retrieval. Our approach inherently captures user interest dynamics without manual tuning, provides uncertainty-awareness, and is more efficient than point-based representation methods. This paper outlines the development and implementation of GPR4DUR, details its evaluation protocols, and presents extensive analysis demonstrating its effectiveness and efficiency. Experiments on real-world offline datasets confirm our method’s adaptability and efficiency. Further online experiments simulating user behavior illuminate the benefits of our method in the exploration-exploitation trade-off by effectively utilizing model uncertainty.