Savvas Petridis
Hi, I'm a research scientist in PAIR (People + AI Research), where I study how we can best make use of large generative models.
Please check out my personal website: https://savvaspetridis.github.io/
Authored Publications
Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning
Michael Xieyang Liu
Vivian Tsai
Alex Fiannaca
Alex Olwal
ACM IUI 2025 (2025)
Multimodal large language models (MLLMs), with their expansive world knowledge and reasoning capabilities, present a unique opportunity for end-users to create personalized AI sensors capable of reasoning about complex situations. A user could describe a desired sensing task in natural language (e.g., “let me know if my toddler is getting into mischief in the living room”), with the MLLM analyzing the camera feed and responding within just seconds. In a formative study, we found that users saw substantial value in defining their own sensors, yet struggled to articulate their unique personal requirements to the model and to debug the sensors through prompting alone. To address these challenges, we developed Gensors, a system that empowers users to define customized sensors supported by the reasoning capabilities of MLLMs. Gensors 1) assists users in eliciting requirements through both automatically generated and manually created sensor criteria, 2) facilitates debugging by allowing users to isolate and test individual criteria in parallel, 3) suggests additional criteria based on user-provided images, and 4) proposes test cases to help users “stress test” sensors on potentially unforeseen scenarios. In a 12-participant user study, users reported a significantly greater sense of control, understanding, and ease of communication when defining sensors using Gensors. Beyond addressing model limitations, Gensors supported users in debugging, eliciting requirements, and expressing unique personal requirements to the sensor through criteria-based reasoning; it also helped uncover users’ own “blind spots” by exposing overlooked criteria and revealing unanticipated failure modes. Finally, we describe insights into how unique characteristics of MLLMs, such as hallucinations and inconsistent responses, can impact the sensor-creation process. Together, these findings contribute to the design of future MLLM-powered sensing systems that are intuitive and customizable by everyday users.
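As a rough illustration of the criteria-based sensing described above, here is a minimal Python sketch, not the authors' implementation: each user-defined criterion is checked against a camera frame independently, which is what makes it possible to isolate and test criteria in parallel. The `call_multimodal_model` function is a hypothetical stand-in for whatever MLLM API is available.

```python
from dataclasses import dataclass

def call_multimodal_model(prompt: str, image_bytes: bytes) -> str:
    """Hypothetical stand-in for an MLLM API that accepts text plus an image."""
    raise NotImplementedError("Wire this to your multimodal model of choice.")

@dataclass
class Criterion:
    description: str  # e.g., "A toddler is touching the stove knobs."

def evaluate_criterion(criterion: Criterion, frame: bytes) -> bool:
    """Test a single criterion in isolation against one camera frame."""
    prompt = (
        "Look at the image and answer strictly YES or NO.\n"
        f"Criterion: {criterion.description}"
    )
    answer = call_multimodal_model(prompt, frame)
    return answer.strip().upper().startswith("YES")

def sensor_fires(criteria: list[Criterion], frame: bytes) -> bool:
    """The sensor triggers only if every user-defined criterion holds for the frame."""
    return all(evaluate_criterion(c, frame) for c in criteria)
```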
AI tools are ubiquitous and designers are increasingly mocking up AI-based applications. While large language model (LLM) prompting has democratized AI prototyping, designers are still prototyping AI functionality separately from their UI design. In this paper, we investigate how tightly coupling both prompt and UI design affects designers' workflows and thought processes. To ground our research, we developed PromptInfuser, a Figma widget that enables users to create semi-functional mockups by connecting UI elements to the inputs and outputs of LLM prompts. In a user study with 14 professional designers, we compare PromptInfuser to their current disconnected workflow. Participants felt that PromptInfuser enabled prototypes that better communicated a product idea, hewed more closely to the envisioned artifact, and better anticipated UI issues and potential technical constraints. PromptInfuser also transformed designers' workflow into a back-and-forth, tandem improvement of both prompt and UI, which, in turn, helped them to (1) identify layout and prompt incompatibilities and (2) reflect upon and improve their total solution. Taken together, these findings inform future systems for prototyping ML-based applications.
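To make the idea of connecting UI elements to the inputs and outputs of LLM prompts concrete, the sketch below shows one possible binding structure. It is an illustrative assumption, not PromptInfuser's actual data model or API; the element IDs, template syntax, and `run_llm` helper are invented for the example.

```python
def run_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with any text-generation API."""
    raise NotImplementedError

# A semi-functional mockup: UI input elements fill prompt slots,
# and the prompt's output is written back into a UI output element.
binding = {
    "prompt_template": "Summarize the customer note below in one sentence:\n{note_field}",
    "inputs": {"note_field": "ui_text_input_1"},  # template slot -> UI element id
    "output_element": "ui_label_2",
}

def run_mockup(binding: dict, ui_state: dict) -> dict:
    """Fill the prompt from current UI values and write the output back to the UI."""
    filled = binding["prompt_template"].format(
        **{slot: ui_state[element_id] for slot, element_id in binding["inputs"].items()}
    )
    ui_state[binding["output_element"]] = run_llm(filled)
    return ui_state
```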
ConstitutionMaker: Interactively Critiquing Large Language Models by Converting Feedback into Principles
Aaron Donsbach
ACM IUI 2024 (2024)
Large language model (LLM) prompting is a promising new approach for users to create and customize their own chatbots. However, current methods for steering a chatbot's outputs, such as prompt engineering and fine-tuning, do not support users in converting their natural feedback on the model's outputs to changes in the prompt or model. In this work, we explore how to enable users to interactively refine model outputs through their feedback, by helping them convert their feedback into a set of principles (i.e. a constitution) that dictate the model's behavior. From a formative study, we (1) found that users needed support converting their feedback into principles for the agent and (2) classified the different principle types desired by users. Inspired by these findings, we developed ConstitutionMaker, an interactive tool for converting user feedback into principles, to steer LLM-based chatbots. With ConstitutionMaker, users can provide either positive or negative feedback in natural language, select auto-generated feedback, or rewrite the agent’s response; each mode of feedback automatically generates a principle that is inserted into the chatbot’s prompt. In a user study with 14 participants, we compare ConstitutionMaker to an ablated version, where users write their own principles. With ConstitutionMaker, participants felt that their principles could better guide the agent, that they could more easily convert their feedback into principles, and that they could write principles more efficiently, with less mental demand. ConstitutionMaker helped users identify ways to improve the chatbot, formulate their intuitive responses to the model into feedback, and convert this feedback into specific and clear principles. Together, these findings inform future tools that support the interactive critiquing of LLM outputs.
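A minimal sketch of the feedback-to-principle loop described above might look like the following. The rewriting prompt and the `run_llm` helper are illustrative assumptions rather than ConstitutionMaker's actual prompts: free-form feedback on one response is generalized into a principle, which is then inserted into the chatbot's prompt.

```python
def run_llm(prompt: str) -> str:
    """Hypothetical LLM call."""
    raise NotImplementedError

principles: list[str] = []  # the chatbot's evolving "constitution"

def feedback_to_principle(user_feedback: str, last_bot_response: str) -> str:
    """Rewrite feedback on one response into a general, reusable rule."""
    prompt = (
        "A user gave feedback on a chatbot response. Rewrite the feedback as a single, "
        "general principle the chatbot should always follow.\n"
        f"Response: {last_bot_response}\nFeedback: {user_feedback}\nPrinciple:"
    )
    return run_llm(prompt).strip()

def chatbot_prompt(user_message: str) -> str:
    """Insert the accumulated principles into the chatbot's prompt."""
    rules = "\n".join(f"- {p}" for p in principles)
    return f"Follow these principles:\n{rules}\n\nUser: {user_message}\nAssistant:"
```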
Large language models (LLMs) are highly capable at a variety of tasks given the right prompt, but writing one is still a difficult and tedious process. In this work, we introduce ConstitutionalExperts, a method for learning a prompt consisting of constitutional principles (i.e. rules), given a training dataset. Unlike prior methods that optimize the prompt as a single entity, our method incrementally improves the prompt by surgically editing individual principles. We also show that we can improve overall performance by learning unique prompts for different semantic regions of the training data and using a mixture-of-experts (MoE) architecture to route inputs at inference time. We compare ConstitutionalExperts to other state-of-the-art prompt-optimization techniques across six benchmark datasets. We also investigate whether MoE improves these other techniques. Our results suggest that ConstitutionalExperts outperforms other prompt-optimization techniques and that mixture-of-experts improves all techniques on average by 4.7%, suggesting broader applicability.
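The mixture-of-experts routing can be sketched as follows; this is only an illustration under assumptions (a hypothetical `embed` function and pre-computed cluster centroids), and the paper's actual optimization of individual principles is not shown. Each semantic region of the training data gets its own principle-based prompt, and an input is routed to the nearest region's expert at inference time.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical sentence-embedding call."""
    raise NotImplementedError

# One expert (a prompt made of learned principles) per semantic cluster of training data.
experts = [
    {"centroid": np.zeros(768), "principles": ["Rule learned for cluster 0 ..."]},
    {"centroid": np.ones(768),  "principles": ["Rule learned for cluster 1 ..."]},
]

def route(text: str) -> dict:
    """Pick the expert whose training-data centroid is closest to the input."""
    v = embed(text)
    distances = [np.linalg.norm(v - e["centroid"]) for e in experts]
    return experts[int(np.argmin(distances))]

def build_prompt(text: str) -> str:
    """Assemble the routed expert's principles into a classification prompt."""
    expert = route(text)
    rules = "\n".join(f"- {p}" for p in expert["principles"])
    return f"Classify the input according to these rules:\n{rules}\nInput: {text}\nLabel:"
```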
In situ AI prototyping: Infusing multimodal prompts into mobile settings with MobileMaker
Michael Xieyang Liu
Alex Fiannaca
Vivian Tsai
2024 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) (2024)
Recent advances in multimodal large language models (LLMs) have made it easier to rapidly prototype AI-powered features, especially for mobile use cases. However, gathering early, mobile-situated user feedback on these AI prototypes remains challenging. The broad scope and flexibility of LLMs mean that, for a given use-case-specific prototype, there is a crucial need to understand the wide range of in-the-wild input users are likely to provide and their in-context expectations for the AI’s behavior. To explore the concept of in situ AI prototyping and testing, we created MobileMaker: a platform that enables designers to rapidly create and test mobile AI prototypes directly on devices. This tool also enables testers to make on-device, in-the-field revisions of prototypes using natural language. In an exploratory study with 16 participants, we explored how user feedback on prototypes created with MobileMaker compares to that of existing prototyping tools (e.g., Figma, prompt editors). Our findings suggest that MobileMaker prototypes enabled more serendipitous discovery of model input edge cases, discrepancies between the AI’s and the user’s in-context interpretation of the task, and contextual signals missed by the AI. Furthermore, we learned that while the ability to make in-the-wild revisions led users to feel more fulfilled as active participants in the design process, it might also constrain their feedback to the subset of changes perceived as more actionable or implementable by the prototyping tool.
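One way to picture the in-the-field revision capability described above is the minimal sketch below, which is not MobileMaker's implementation: a tester's natural-language request is applied by asking an LLM to rewrite the prompt behind the prototype, directly on the device. The `run_llm` helper and meta-prompt wording are assumptions.

```python
def run_llm(prompt: str) -> str:
    """Hypothetical LLM call."""
    raise NotImplementedError

def revise_prototype(current_prompt: str, tester_request: str) -> str:
    """Apply a tester's natural-language revision to the prototype's prompt, on device."""
    meta_prompt = (
        "You maintain the prompt behind a mobile AI prototype. Revise it to satisfy the "
        "tester's request, changing as little as possible.\n"
        f"Current prompt:\n{current_prompt}\n\n"
        f"Tester request: {tester_request}\n\n"
        "Revised prompt:"
    )
    return run_llm(meta_prompt).strip()
```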
Visualizing Linguistic Diversity of Text Datasets Synthesized by Large Language Models
Minsuk Kahng
IEEE Conference on Visualization and Visual Analytics (VIS), IEEE (2023)
Large language models (LLMs) can be used to generate smaller, more refined datasets via few-shot prompting for benchmarking, fine-tuning or other use cases. However, understanding and evaluating these datasets is difficult, and the failure modes of LLM-generated data are still not well understood. Specifically, the data can be repetitive in surprising ways, not only semantically but also syntactically and lexically. We present LinguisticLens, a novel interactive visualization tool for making sense of and analyzing syntactic diversity of LLM-generated datasets. LinguisticLens clusters text along syntactic, lexical, and semantic axes. It supports hierarchical visualization of a text dataset, allowing users to quickly scan for an overview and inspect individual examples. The live demo is available at https://shorturl.at/zHOUV.
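As a rough illustration of the non-semantic repetition the abstract mentions, the sketch below compares two generated examples lexically (token overlap) and with a very coarse structural "shape". It is an illustrative approximation, not LinguisticLens itself; a real analysis would compare part-of-speech or dependency-tree sequences and embeddings.

```python
def tokens(text: str) -> list[str]:
    return text.lower().split()

def lexical_overlap(a: str, b: str) -> float:
    """Jaccard overlap of surface tokens: high values flag near-duplicate wording."""
    ta, tb = set(tokens(a)), set(tokens(b))
    return len(ta & tb) / max(len(ta | tb), 1)

def crude_shape(text: str) -> tuple[str, ...]:
    """Very coarse stand-in for syntax: bucket words as short (S) or long (L).
    A real tool would compare POS-tag or parse-tree sequences instead."""
    return tuple("S" if len(t) <= 3 else "L" for t in tokens(text))

def syntactic_match(a: str, b: str) -> bool:
    """Identical shapes suggest templated, repetitive generations even when words differ."""
    return crude_shape(a) == crude_shape(b)

examples = [
    "The cat sat on the mat.",
    "The dog lay on the rug.",
]
print(lexical_overlap(examples[0], examples[1]))  # modest token overlap (0.25)
print(syntactic_match(examples[0], examples[1]))  # True: same structural template
```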
PromptInfuser: Bringing User Interface Mock-ups to Life with Large Language Model Prompts
Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery (to appear)
Large language models (LLMs) have enabled novices without machine learning (ML) experience to quickly prototype ML functionalities with prompt programming. This paper investigates incorporating prompt-based prototyping into designing functional user interface (UI) mock-ups. To understand how infusing LLM prompts into UI mock-ups might affect the prototyping process, we conduct an exploratory study with five designers, and find that this capability might significantly speed up creating functional prototypes, inform designers earlier on how their designs will integrate ML, and enable user studies with functional prototypes earlier. From these findings, we built PromptInfuser, a Figma plugin for authoring LLM-infused mock-ups. PromptInfuser introduces two novel LLM interactions: input-output, which makes content interactive and dynamic, and frame-change, which directs users to different frames depending on their natural language input. From initial observations, we find that PromptInfuser has the potential to transform the design process by tightly integrating UI and AI prototyping in a single interface.
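To illustrate the frame-change interaction described above, here is a minimal, hypothetical sketch: an LLM prompt classifies the user's free-form input into one of several mock-up frames, and the mock-up navigates to that frame. The frame names and `run_llm` helper are assumptions, not the plugin's actual API.

```python
def run_llm(prompt: str) -> str:
    """Hypothetical LLM call."""
    raise NotImplementedError

FRAMES = ["order_status", "returns", "talk_to_agent"]  # illustrative frame names

def frame_change(user_input: str) -> str:
    """Route the user's natural-language input to the frame that should be shown next."""
    prompt = (
        "Pick exactly one destination from this list for the user's request: "
        + ", ".join(FRAMES)
        + f"\nUser request: {user_input}\nDestination:"
    )
    choice = run_llm(prompt).strip()
    return choice if choice in FRAMES else FRAMES[0]  # fall back to a default frame
```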