Carrie Jun Cai

Carrie Jun Cai

My research aims to make artificial intelligence systems usable to human beings, so that human-AI interactions are more productive, enjoyable, and fair. I believe AI systems should be designed to augment human agency, and thus approach this process by considering the capabilities and limits of human intelligence. Before joining Google, I did my PhD research in the User Interface Design group at MIT, where I built "wait-learning" tools to help people practice desired skills in short chunks while waiting, thereby making use of fleeting moments in the day.
Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Preview abstract Multimodal large language models (MLLMs), with their expansive world knowledge and reasoning capabilities, present a unique opportunity for end-users to create personalized AI sensors capable of reasoning about complex situations. A user could describe a desired sensing task in natural language (e.g., “let me know if my toddler is getting into mischief in the living room”), with the MLLM analyzing the camera feed and responding within just seconds. In a formative study, we found that users saw substantial value in defining their own sensors, yet struggled to articulate their unique personal requirements to the model and debug the sensors through prompting alone. To address these challenges, we developed Gensors, a system that empowers users to define customized sensors supported by the reasoning capabilities of MLLMs. Gensors 1) assists users in eliciting requirements through both automatically-generated and manually created sensor criteria, 2) facilitates debugging by allowing users to isolate and test individual criteria in parallel, 3) suggests additional criteria based on user-provided images, and 4) proposes test cases to help users “stress test” sensors on potentially unforeseen scenarios. In a 12-participant user study, users reported significantly greater sense of control, understanding, and ease of communication when defining sensors using Gensors. Beyond addressing model limitations, Gensors supported users in debugging, eliciting requirements, and expressing unique personal requirements to the sensor through criteria-based reasoning; it also helped uncover users’ own “blind spots” by exposing overlooked criteria and revealing unanticipated failure modes. Finally, we describe insights into how unique characteristics of MLLMs–such as hallucinations and inconsistent responses–can impact the sensor-creation process. Together, these findings contribute to the design of future MLLM-powered sensing systems that are intuitive and customizable by everyday users. View details
    Preview abstract AI tools are ubiquitous and designers are increasingly mocking up AI-based applications. While large language model (LLM) prompting has democratized AI prototyping, designers are still prototyping AI functionality separately from their UI design. In this paper, we investigate how tightly coupling both prompt and UI design affects designers' workflows and thought processes. To ground our research, we developed PromptInfuser, a Figma widget that enables users to create semi-functional mockups, by connecting UI elements to the inputs and outputs of LLM prompts. In a user study with 14 professional designers we compare PromptInfuser to their current disconnected workflow. Participants felt that PromptInfuser enabled prototypes that better communicated a product idea, hued more closely to the envisioned artifact, and better anticipated UI issues and potential technical constraints. PromptInfuser also transformed designers' workflow into a back and forth, tandem improvement of both prompt and UI, which in turn, helped them to (1) identify layout and prompt incompatibilities and (2) reflect upon and improve their total solution. Taken together, these findings inform future systems for prototyping ML-based applications. View details
    Preview abstract Large language model (LLM) prompting is a promising new approach for users to create and customize their own chatbots. However, current methods for steering a chatbot's outputs, such as prompt engineering and fine-tuning, do not support users in converting their natural feedback on the model's outputs to changes in the prompt or model. In this work, we explore how to enable users to interactively refine model outputs through their feedback, by helping them convert their feedback into a set of principles (i.e. a constitution) that dictate the model's behavior. From a formative study, we (1) found that users needed support converting their feedback into principles for the agent and (2) classified the different principle types desired by users. Inspired by these findings, we developed ConstitutionMaker, an interactive tool for converting user feedback into principles, to steer LLM-based chatbots. With ConstitutionMaker, users can provide either positive or negative feedback in natural language, select auto-generated feedback, or rewrite the agent’s response; each mode of feedback automatically generates a principle that is inserted into the chatbot’s prompt. In a user study with 14 participants, we compare ConstitutionMaker to an ablated version, where users write their own principles. With ConstitutionMaker, participants felt that their principles could better guide the agent, that they could more easily convert their feedback into principles, and that they could write principles more efficiently, with less mental demand. ConstitutionMaker helped users identify ways to improve the chatbot, formulate their intuitive responses to the model into feedback, and convert this feedback into specific and clear principles. Together, these findings inform future tools that support the interactive critiquing of LLM outputs. View details
    "We Need Structured Output": Towards User-centered Constraints on Large Language Model Output
    Michael Xieyang Liu
    Frederick Liu
    Alex Fiannaca
    Terry Koo
    In Extended Abstract in ACM CHI Conference on Human Factors in Computing Systems (CHI EA '24), ACM (2024), pp. 9 (to appear)
    Preview abstract Large language models can produce creative and diverse responses. However, to integrate them into current developer workflows, it is essential to constrain their outputs to follow specific formats or standards. In this work, we surveyed 51 experienced industry professionals to understand the range of scenarios and motivations driving the need for output constraints from a user-centered perspective. We identified 134 concrete use cases for constraints at two levels: low-level, which ensures the output adhere to a structured format and an appropriate length, and high-level, which requires the output to follow semantic and stylistic guidelines without hallucination. Critically, applying output constraints could not only streamline the currently repetitive process of developing, testing, and integrating LLM prompts for developers, but also enhance the user experience of LLM-powered features and applications. We conclude with a discussion on user preferences and needs towards articulating intended constraints for LLMs, alongside an initial design for a constraint prototyping tool. View details
    In situ AI prototyping: Infusing multimodal prompts into mobile settings with MobileMaker
    Michael Xieyang Liu
    Alex Fiannaca
    Vivian Tsai
    2024 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) (2024)
    Preview abstract Recent advances in multimodal large language models (LLMs) have made it easier to rapidly prototype AI-powered features, especially for mobile use cases. However, gathering early, mobile-situated user feedback on these AI prototypes remains challenging. The broad scope and flexibility of LLMs means that, for a given use-case-specific prototype, there is a crucial need to understand the wide range of in-the-wild input users are likely to provide and their in-context expectations for the AI’s behavior. To explore the concept of in situ AI prototyping and testing, we created MobileMaker: a platform that enables designers to rapidly create and test mobile AI prototypes directly on devices. This tool also enables testers to make on-device, in-the-field revisions of prototypes using natural language. In an exploratory study with 16 participants, we explored how user feedback on prototypes created with MobileMaker compares to that of existing prototyping tools (e.g., Figma, prompt editors). Our findings suggest that MobileMaker prototypes enabled more serendipitous discovery of: model input edge cases, discrepancies between AI’s and user’s in-context interpretation of the task, and contextual signals missed by the AI. Furthermore, we learned that while the ability to make in-the-wild revisions led users to feel more fulfilled as active participants in the design process, it might also constrain their feedback to the subset of changes perceived as more actionable or implementable by the prototyping tool. View details
    The Prompt Artists
    Stefania Druga
    Alex Fiannaca
    Pedro Vergani
    Chinmay Kulkarni
    Creativity and Cognition 2023 (2023)
    Preview abstract In this paper, we present the results of a study examining the art practices, artwork, and motivations of prolific users of the latest generation of text-to-image models. Through interviews, observations, and a survey, we present a sampling of the artistic styles, and describe the developed community of practice. We find that: 1) the text prompt and resulting image collectively can be considered the art piece (prompts as art), and 2) prompt templates (prompts with “slots” for others to fill in with their own words) are developed to create generative art pieces. We also find that this community’s premium on unique outputs leads to artists seeking specialized vocabulary to produce distinctive art pieces (e.g., by going to architectural blogs), while others look for “glitches” in the model that can turn into artistic styles in their own right. From these findings, we outline specific implications for design. View details
    Generative Agents: Interactive Simulacra of Human Behavior
    Joon Sung Park
    Joseph C. O'Brien
    Percy Liang
    Michael Bernstein
    Proceedings of UIST 2023, ACM (2023)
    Preview abstract Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior. View details
    PromptInfuser: Bringing User Interface Mock-ups to Life with Large Language Model Prompts
    Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery (to appear)
    Preview abstract Large Language Models have enabled novices without machine learning (ML) experience to quickly prototype ML functionalities with prompt programming. This paper investigates incorporating prompt-based prototyping into designing functional user interface (UI) mock-ups. To understand how infusing LLM prompts into UI mock-ups might affect the prototyping process, we conduct a exploratory study with five designers, and find that this capability might significantly speed up creating functional prototypes, inform designers earlier on how their designs will integrate ML, and enable user studies with functional prototypes earlier. From these findings, we built PromptInfuser, a Figma plugin for authoring LLM-infused mock-ups. PromptInfuser introduces two novel LLM-interactions: input-output, which makes content interactive and dynamic, and frame-change, which directs users to different frames depending on their natural language input. From initial observations, we find that PromptInfuser has the potential to transform the design process by tightly integrating UI and AI prototyping in a single interface. View details
    Programming with a Programming Language: Challenges and Opportunities for Designing Developer Tools for Prompt Programming
    Alex Fiannaca
    Chinmay Kulkarni
    Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems (CHI EA ’23), ACM, Hamburg, Germany (2023) (to appear)
    Preview abstract Existing tools for prompt programming provide little support to prompt programmers. Consequently, as prompts become more complex, they can be hard to read, understand, and edit. In this work, we draw on modern integrated development environments for traditional programming to improve the editor experience of prompt programming. We describe methods for understanding the semantically meaningful structure of natural language prompts in the absence of a rigid formal grammars, and demonstrate a range of editor features that can leverage this information to assist prompt programmers. Finally, we relate initial feedback from design probe explorations with a set of domain experts and provide insights to help guide the development of future prompt editors. View details
    Preview abstract In this paper, we present a natural language code synthesis tool, GenLine, backed by a large generative language model and a set of task-specific prompts. To understand the user experience of natural language code synthesis with these types of models, we conducted a user study in which participants applied GenLine to two programming tasks. Our results indicate that while natural language code synthesis can sometimes provide a magical experience, participants still faced challenges. In particular, participants felt that they needed to learn the model’s "syntax,'' despite their input being natural language. Participants also faced challenges in debugging model input, and demonstrated a wide range of variability in the scope and specificity of their requests. From these findings, we discuss design implications for future natural language code synthesis tools built using generating language models. View details
    ×