Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

1 - 15 of 848 publications
InstructPipe: Generating Visual Blocks Pipelines with Human Instructions and LLMs
Zhongyi Zhou
Jing Jin
Xiuxiu Yuan
Jun Jiang
Jingtao Zhou
Yiyi Huang
Kristen Wright
Jason Mayes
Mark Sherwood
Alex Olwal
Ram Iyengar
Na Li
Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI), ACM, pp. 23
Visual programming has the potential to provide novice programmers with a low-code experience to build customized processing pipelines. Existing systems typically require users to build pipelines from scratch, implying that novice users are expected to set up and link appropriate nodes from a blank workspace. In this paper, we introduce InstructPipe, an AI assistant for prototyping machine learning (ML) pipelines with text instructions. We contribute two large language model (LLM) modules and a code interpreter as part of our framework. The LLM modules generate pseudocode for a target pipeline, and the interpreter renders the pipeline in the node-graph editor for further human-AI collaboration. Both technical and user evaluations (N=16) show that InstructPipe empowers users to streamline their ML pipeline workflow, reduce their learning curve, and leverage open-ended commands to spark innovative ideas.
Beyond Touchscreens: Dynamic and Multimodal Interaction Needs
Melissa Barnhart Wantland
Mai Kobori
Universal Access in Human-Computer Interaction, Springer-Verlag (2025) (to appear)
Today’s smartphone interactions are typically designed with one primary preset, accompanied by customization settings that can be manually adjusted. To promote the creation of contextually aware experiences, researchers have highlighted the factors that influence mobile device usage in the ability-based design framework. This paper expands upon existing frameworks and contributes to an empirical understanding of smartphone accessibility. Through a 10-day longitudinal diary study and video interviews with 24 individuals who do and do not identify as having a disability, the research also illustrates the reactions of reattempt, adaptation, and avoidance, which were used in response to a lack of smartphone accessibility. Despite experiencing scenarios where accessibility settings could be leveraged, 20 out of 24 participants did not use accessibility settings on their smartphone. A total of 12 out of 24 participants tried accessibility settings on their smartphones but concluded that accessibility was not for them. This work highlights the need to shift current design practices to better serve the accessibility community.
Online-EYE: Multimodal Implicit Eye Tracking Calibration for XR
Baosheng James Hou
Lucy Abramyan
Prasanthi Gurumurthy
Khushman Patel
Haley Adams
Andrea Colaco
Ken Pfeuffer
Hans Gellersen
Karan Ahuja
2025
Unlike other inputs for VR that work out of the box, eye tracking typically requires custom calibration per user or session. We present a multimodal input approach for implicit calibration of eye trackers in VR, leveraging UI interaction for continuous, background calibration. Our method analyzes gaze data alongside controller interaction with UI elements and, employing ML techniques, continuously refines the calibration matrix without interrupting users from their current tasks, potentially eliminating the need for explicit calibration. We demonstrate the accuracy and effectiveness of this implicit approach across various tasks and real-time applications, achieving comparable eye tracking accuracy to native, explicit calibration.
Participatory AI Considerations for Advancing Racial Health Equity
Andrea G. Parker
Jatin Alla
Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI) (2025) (to appear)
PAIGE: Examining Student Learning Outcomes and Experiences with Personalized AI-Generated Podcasts
Tiffany Do
Usama Bin Shafqat
Elsie Ling
Nikhil Sarda
2025
Generative AI is revolutionizing content creation and holds promise for real-time, personalized educational experiences. We investigated the effectiveness of converting textbook chapters into AI-generated podcasts and explored the impact of personalizing these podcasts for individual learner profiles. We conducted a 3x3 user study with 180 college students in the United States, comparing traditional textbook reading with both generalized and personalized AI-generated podcasts across three textbook subjects. The personalized podcasts were tailored to students’ majors, interests, and learning styles. Our findings show that students found the AI-generated podcast format to be more enjoyable than textbooks and that personalized podcasts led to significantly improved learning outcomes, although this was subject-specific. These results highlight that AI-generated podcasts can offer an engaging and effective modality transformation of textbook material, with personalization enhancing content relevance. We conclude with design recommendations for leveraging AI in education, informed by student feedback.
Zoom in, Zoom out, Reframe: Domain Experts’ Strategies for Addressing Non-Experts’ Complex Questions
Roma Ruparel
Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI) (2025) (to appear)
Consumers rely on the Internet for expert information in domains such as healthcare and law. Large Language Models (LLMs) have the potential to increase access to expert knowledge. However, past research has not addressed how to handle certain aspects of complex questions that commonly occur in expert-layperson interactions. We conducted in-depth interviews with 26 experts across multiple domains to understand how they experience and respond to challenges associated with non-experts’ questions. Results from a thematic analysis reveal three recurring strategies that experts across domains employ when fielding complex questions. Experts zoom in to clarify details of a broad information request, zoom out to address overly narrow questions or assumptions, and reframe when the underlying need is unstated or poorly represented. We discuss implications for the design of LLM-based experiences that facilitate access to expert information.
Generative AI in Creative Practice: ML-Artist Folk Theories of T2I Use, Harm, and Harm-Reduction
Shalaleh Rismani
Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), Association for Computing Machinery (2024), pp. 1-17 (to appear)
Understanding how communities experience algorithms is necessary to mitigate potential harmful impacts. This paper presents folk theories of text-to-image (T2I) models to enrich understanding of how artist communities experience creative machine learning (ML) systems. This research draws on data collected from a workshop with 15 artists from 10 countries who incorporate T2I models in their creative practice. Through reflexive thematic analysis of workshop data, we highlight theorization of T2I use, harm, and harm-reduction. Folk theories of use envision T2I models as an artistic medium, a mundane tool, and locate true creativity as rising above model affordances. Theories of harm articulate T2I models as harmed by engineering efforts to eliminate glitches and product policy efforts to limit functionality. Theories of harm-reduction orient towards protecting T2I models for creative practice through transparency and distributed governance. We examine how these theories relate, and conclude by discussing how folk theorization informs responsible AI efforts.
Solidarity not Charity! Empowering Local Communities for Disaster Relief during COVID-19 through Grassroots Support
Tiffany Knearem
Jeongwon Jo
Oluwafunke Alliyu
John M. Carroll
Computer Supported Cooperative Work (2024)
The COVID-19 pandemic brought wide-ranging, unanticipated societal changes as communities rushed to slow the spread of the novel coronavirus. In response, mutual aid groups bloomed online across the United States to fill in the gaps in social services and help local communities cope with infrastructural breakdowns. Unlike many previous disasters, the long-haul nature of COVID-19 necessitates sustained disaster relief efforts. In this paper, we conducted an interview study with online mutual aid group administrators to understand how groups facilitated disaster relief, and how disaster relief initiatives developed and maintained over the course of the first year of COVID-19. Our findings suggest that the groups were crucial sources of community-based support for immediate needs, innovated long-term solutions for chronic community issues and grew into a vehicle for justice-centered work. Our insights shed light on the strength of mutual aid as a community capacity that can support communities to collectively be more prepared for future long-haul disasters than they were with COVID-19.
Situationally Induced Impairments and Disabilities (SIIDs) can significantly hinder user experience in everyday activities. Despite their prevalence, existing adaptive systems predominantly cater to specific tasks or environments and fail to accommodate the diverse and dynamic nature of SIIDs. We introduce Human I/O, a real-time system that detects SIIDs by gauging the availability of human input/output channels. Leveraging egocentric vision, multimodal sensing and reasoning with large language models, Human I/O achieves good performance in availability prediction across 60 in-the-wild egocentric videos in 32 different scenarios. Further, while the core focus of our work is on the detection of SIIDs rather than the creation of adaptive user interfaces, we showcase the utility of our prototype via a user study with 10 participants. Findings suggest that Human I/O significantly reduces effort and improves user experience in the presence of SIIDs, paving the way for more adaptive and accessible interactive systems in the future.
As AI systems quickly improve in both breadth and depth of performance, they lend themselves to creating increasingly powerful and realistic agents, including the possibility of agents modeled on specific people. We anticipate that within our lifetimes it may become common practice for people to create a custom AI agent to interact with loved ones and/or the broader world after death. We call these generative ghosts, since such agents will be capable of generating novel content rather than merely parroting content produced by their creator while living. In this paper, we first discuss the design space of potential implementations of generative ghosts. We then discuss the practical and ethical implications of generative ghosts, including potential positive and negative impacts on individuals and society. Based on these considerations, we lay out a research agenda for the AI and HCI research communities to empower people to create and interact with AI afterlives in a safe and beneficial manner.
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Gilles Baechler
Srinivas Sunkara
Maria Wang
Hassan Mansoor
Vincent Etter
Jason Lin
(2024)
Screen user interfaces (UIs) and infographics, sharing similar visual language and design principles, play important roles in human communication and human-machine interaction. We introduce ScreenAI, a vision-language model that specializes in UI and infographics understanding. Our model improves upon the PaLI architecture with the flexible patching strategy of pix2struct and is trained on a unique mixture of datasets. At the heart of this mixture is a novel screen annotation task in which the model has to identify the type and location of UI elements. We use these text annotations to describe screens to Large Language Models and automatically generate question-answering (QA), UI navigation, and summarization training datasets at scale. We run ablation studies to demonstrate the impact of these design choices. At only 5B parameters, ScreenAI achieves new state-of-the-art results on UI- and infographics-based tasks (Multi-page DocVQA, WebSRC, MoTIF and Widget Captioning), and new best-in-class performance on others (Chart QA, DocVQA, and InfographicVQA) compared to models of similar size. Finally, we release three new datasets: one focused on the screen annotation task and two others focused on question answering.
Creative ML Assemblages: The Interactive Politics of People, Processes, and Products
Ramya Malur Srinivasan
Katharina Burgdorf
Jennifer Lena
ACM Conference on Computer Supported Cooperative Work and Social Computing (2024) (to appear)
Creative ML tools are collaborative systems that afford artistic creativity through their myriad interactive relationships. We propose using "assemblage thinking" to support analyses of creative ML by approaching it as a system in which the elements of people, organizations, culture, practices, and technology constantly influence each other. We model these interactions as "coordinating elements" that give rise to the social and political characteristics of a particular creative ML context, and call attention to three dynamic elements of creative ML whose interactions provide unique context for the social impact of a particular system: people, creative processes, and products. As creative assemblages are highly contextual, we present these as analytical concepts that computing researchers can adapt to better understand the functioning of a particular system or phenomenon and identify intervention points to foster desired change. This paper contributes to theorizing interactions with AI in the context of art, and how these interactions shape the production of algorithmic art.
From Provenance to Aberrations: Image Creator and Screen Reader User Perspectives on Alt Text for AI-Generated Images
Maitraye Das
Alexander J. Fiannaca
CHI Conference on Human Factors in Computing Systems (2024)
AI-generated images are proliferating as a new visual medium. However, state-of-the-art image generation models do not output alternative (alt) text with their images, rendering them largely inaccessible to screen reader users (SRUs). Moreover, less is known about what information would be most desirable to SRUs in this new medium. To address this, we invited AI image creators and SRUs to evaluate alt text prepared from various sources and write their own alt text for AI images. Our mixed-methods analysis makes three contributions. First, we highlight creators’ perspectives on alt text, as creators are well-positioned to write descriptions of their images. Second, we illustrate SRUs’ alt text needs particular to the emerging medium of AI images. Finally, we discuss the promises and pitfalls of utilizing text prompts written as input for AI models in alt text generation, and areas where broader digital accessibility guidelines could expand to account for AI images.
Extended reality (XR) has the potential for seamless user interface (UI) transitions across people, objects, and environments. However, the design space, applications, and common practices of 3D UI transitions remain underexplored. To address this gap, we conducted a need-finding study with 11 participants, identifying and distilling a taxonomy based on three types of UI placements: affixed to static, dynamic, or self entities. We further surveyed 113 commercial applications to understand the common practices of 3D UI mobility control, where only 6.2% of these applications allowed users to transition UI between entities. In response, we built interaction prototypes to facilitate UI transitions between entities. We report on results from a qualitative user study (N=14) on 3D UI mobility control using our FingerSwitches technique, which suggests that perceived usefulness is affected by types of entities and environments. We aspire to tackle a vital need in UI mobility within XR.
Interruptions in digital services are a common occurrence for users. These disruptions, however, exact a cost in terms of attention, task completion rate, and, most importantly, emotional state. While several methods currently employed by service providers attempt to address this, the paper will argue that browser games or similar interactive interfaces should become a standard mechanism to ease the aforementioned effects.