Authored Publications
    Natural language descriptions of user interface (UI) elements, such as alternative text, are crucial for accessibility and for language-based interaction in general. Yet these descriptions are frequently missing in mobile UIs. We propose widget captioning, a novel task for automatically generating language descriptions for UI elements from multimodal input, including both the image and the structural representations of user interfaces. We collected a large-scale dataset for widget captioning with crowdsourcing. Our dataset contains 162,859 language phrases created by human workers for annotating 61,285 UI elements across 21,750 unique UI screens. We thoroughly analyze the dataset, and train and evaluate a set of deep model configurations to investigate how each feature modality, as well as the choice of learning strategies, impacts the quality of predicted captions. The task formulation and the dataset, as well as our benchmark models, contribute a solid basis for this novel multimodal captioning task that connects language and user interfaces.
    Using Bayes' Theorem for Command Input: Principle, Models, and Applications
    Suwen Zhu
    Yoonsang Kim
    Jennifer Yi Luo
    Ryan Qin
    Liuping Wang
    Xiangmin Fan
    Feng Tian
    Xiaojun Bi
    Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, ACM, New York, NY, 642:1 - 642:15 (to appear)
    Entering commands on touchscreens can be noisy, but existing interfaces commonly adopt deterministic principles for deciding targets, which often results in errors. Building on prior research on using Bayes’ theorem to handle uncertainty in input, this paper formalized Bayes’ theorem as a generic guiding principle for deciding targets in command input (referred to as "BayesianCommand"), developed three models for estimating prior and likelihood probabilities, and carried out experiments to demonstrate the effectiveness of this formalization. More specifically, we applied BayesianCommand to improve the input accuracy of (1) point-and-click and (2) word-gesture command input. Our evaluation showed that applying BayesianCommand reduced errors compared to using deterministic principles (by over 26.9% for point-and-click and by 39.9% for word-gesture command input) or applying the principle partially (by over 28.0% and 24.5%).
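The principle named in this abstract, choosing the target with the highest posterior probability rather than the one geometrically nearest to the touch, can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the isotropic Gaussian likelihood, the `sigma` value, and the function names `gaussian_likelihood`, `posterior_over_targets`, and `decide_target` are hypothetical and do not represent the paper's three models.

```python
import math


def gaussian_likelihood(touch, center, sigma):
    """Density of an isotropic 2D Gaussian centered on a target (assumed model)."""
    dx = touch[0] - center[0]
    dy = touch[1] - center[1]
    norm = 2.0 * math.pi * sigma * sigma
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma)) / norm


def posterior_over_targets(touch, targets, sigma=20.0):
    """Apply Bayes' theorem: P(target | touch) is proportional to
    P(touch | target) * P(target).

    targets maps a command name to (center, prior), with center as (x, y).
    """
    scores = {
        name: prior * gaussian_likelihood(touch, center, sigma)
        for name, (center, prior) in targets.items()
    }
    total = sum(scores.values())
    if total == 0.0:
        # Touch is too far from every target for the likelihood to register;
        # fall back to the normalized priors.
        prior_total = sum(prior for _, prior in targets.values())
        return {name: prior / prior_total for name, (_, prior) in targets.items()}
    return {name: score / total for name, score in scores.items()}


def decide_target(touch, targets, sigma=20.0):
    """Return the command with the highest posterior probability."""
    posterior = posterior_over_targets(touch, targets, sigma)
    return max(posterior, key=posterior.get)


if __name__ == "__main__":
    # Hypothetical layout: the touch lands slightly closer to "paste",
    # but "copy" has a much higher prior.
    targets = {"copy": ((0.0, 0.0), 0.9), "paste": ((40.0, 0.0), 0.1)}
    print(decide_target((22.0, 0.0), targets))
```

In this toy layout a nearest-target rule would pick "paste", whereas the posterior favors "copy" because its prior outweighs the small difference in distance. With equal priors the two rules agree.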
    Modeling and Reducing Spatial Jitter caused by Asynchronous Input and Output Rates
    Axel Antoine
    Mathieu Nancel
    Ella Ge
    Navid Zolghadr
    Géry Casiez
    The 33rd Annual ACM Symposium on User Interface Software and Technology, ACM, New York, NY (2020), 13p (to appear)
    Jitter in interactive systems occurs when visual feedback is perceived as unstable or trembling even though the input signal is smooth or stationary. It can have multiple causes, such as sensing noise, or feedback calculations introducing or exacerbating sensing imprecisions. Jitter can, however, occur even when each individual component of the pipeline works perfectly, as a result of the differences between the input frequency and the display refresh rate. This asynchronicity can introduce rapidly shifting latencies between rendered feedback and its display on screen, which can result in trembling cursors or viewports. This paper contributes a better understanding of this particular type of jitter. We first detail the problem from a mathematical standpoint, from which we develop a predictive model of jitter amplitude as a function of input and output frequencies, and a new metric to measure this spatial jitter. Using touch input data gathered in a study, we developed a simulator to validate this model and to assess the effects of different techniques and settings with any output frequency. The most promising approach, when the time of the next display refresh is known, is to estimate (interpolate or extrapolate) the user’s position at a fixed time interval before that refresh. When input events occur at 125 Hz, as is common in touch screens, we show that an interval of 4 to 6 ms works well for a wide range of display refresh rates. This method effectively cancels most of the jitter introduced by input/output asynchronicity, while introducing minimal imprecision or latency.
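The approach described at the end of this abstract, estimating the user's position at a fixed interval before the next display refresh, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it uses plain linear interpolation and extrapolation on one coordinate, the names `estimate_position` and `position_for_refresh` are hypothetical, and the default `lead_ms=5.0` is simply a value chosen inside the 4 to 6 ms range the abstract reports.

```python
def estimate_position(samples, t_target):
    """Linearly interpolate or extrapolate a 1D position at time t_target.

    samples is a time-sorted list of (time_ms, position) input events and
    must contain at least two entries.
    """
    if len(samples) < 2:
        raise ValueError("need at least two input samples")

    # Look for the pair of consecutive samples that brackets the target time.
    for (t0, x0), (t1, x1) in zip(samples, samples[1:]):
        if t0 <= t_target <= t1:
            break
    else:
        # Target lies outside the sampled range: extrapolate from the two
        # samples nearest to it (the first two or the last two).
        if t_target < samples[0][0]:
            (t0, x0), (t1, x1) = samples[:2]
        else:
            (t0, x0), (t1, x1) = samples[-2:]

    if t1 == t0:
        return x1  # duplicate timestamps: no slope to use
    return x0 + (x1 - x0) * (t_target - t0) / (t1 - t0)


def position_for_refresh(samples, next_refresh_ms, lead_ms=5.0):
    """Estimate the position at a fixed interval before the next refresh.

    lead_ms is an assumed value within the 4 to 6 ms range from the abstract.
    """
    return estimate_position(samples, next_refresh_ms - lead_ms)


if __name__ == "__main__":
    # Hypothetical 125 Hz input (one event every 8 ms) moving at 2 units/ms.
    samples = [(0.0, 0.0), (8.0, 16.0), (16.0, 32.0)]
    print(position_for_refresh(samples, next_refresh_ms=17.0))  # interpolated
    print(position_for_refresh(samples, next_refresh_ms=25.0))  # extrapolated
```

Sampling at a constant offset from each refresh, instead of using whichever input event arrived last, keeps the age of the displayed position the same from frame to frame, which is the property the abstract credits with cancelling most of the asynchronicity-induced jitter.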
    i’sFree: Eyes-Free Gesture Typing via a Touch-Enabled Remote Control
    Suwen Zhu
    Xiaojun Bi
    Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, ACM, New York, NY, USA, 448:1-448:12 (to appear)
    Entering text without having to pay attention to the keyboard is compelling but challenging due to the lack of visual guidance. We propose i’sFree to enable eyes-free gesture typing on a distant display from a touch-enabled remote control. i’sFree does not display the keyboard or gesture trace but decodes gestures drawn on the remote control into text according to an invisible and shifting Qwerty layout. i’sFree decodes gestures much as a general gesture typing decoder does, but learns from the instantaneous and historical input gestures to dynamically adjust the keyboard location. We designed it based on an understanding of how users perform eyes-free gesture typing. Our evaluation shows eyes-free gesture typing is feasible: reducing visual guidance on the distant display hardly affects the typing speed. Results also show that the i’sFree gesture decoding algorithm is effective, enabling an input speed of 23 WPM, 46% faster than the baseline eyes-free condition built on a general gesture decoder. Finally, i’sFree is easy to learn: participants reached 22 WPM in the first ten minutes, even though 40% of them were first-time gesture typing users.
    HotStrokes: Word-Gesture Shortcuts on a Trackpad
    Wenzhe Cui
    Blaine Lewis
    Daniel Vogel
    Xiaojun Bi
    Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, ACM, New York, USA, 165:1-165:13 (to appear)
    Expert interaction techniques like hotkeys are efficient, but poorly adopted because they are hard to learn. HotStrokes removes the need for learning arbitrary mappings of commands to hotkeys. A user enters a HotStroke by holding a modifier key, then gesture typing a command name on a laptop trackpad as if on an imaginary virtual keyboard. The gestures are recognized using an adaptation of the SHARK2 algorithm with a new spatial model and a refined method for dynamic suggestions. A controlled experiment shows HotStrokes effectively augments the existing "menu and hotkey" command activation paradigm. Results show the method is efficient, reducing command activation time by 43% compared to linear menus. The method is also easy to learn with a high adoption rate, replacing 91% of linear menu usage. Finally, combining linear menus, hotkeys, and HotStrokes leads to 24% faster command activation overall.
    M3 Gesture Menu: Design and Experimental Analyses of Marking Menus for Touchscreen Mobile Interaction
    Kun Li
    Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, ACM, New York, NY, USA, 249:1-249:14
    Despite their learning advantages in theory, marking menus have faced adoption challenges in practice, even on today's touchscreen-based mobile devices. We address these challenges by designing, implementing, and evaluating multiple versions of M3 Gesture Menu (M3), a reimagination of marking menus targeted at mobile interfaces. M3 is defined on a grid rather than in a radial space, relies on gestural shapes rather than directional marks, and has constant and stationary space use. Our first controlled experiment on expert performance showed M3 was faster and less error-prone by a factor of two than traditional marking menus. A second experiment on learning demonstrated for the first time that users could successfully transition to recall-based execution of a dozen commands after three ten-minute practice sessions with both M3 and Multi-Stroke Marking Menu. Together, M3, with its demonstrated resolution, learning, and space use benefits, contributes to the design and understanding of menu selection in the mobile-first era of end-user computing.
    FingerArc and FingerChord: Supporting Novice to Expert Transitions with Guided Finger-Aware Shortcuts
    Blaine Lewis
    Jeff Avery
    Daniel Vogel
    The 31st Annual ACM Symposium on User Interface Software and Technology, ACM, New York, NY (2018), pp. 347-363
    Keyboard shortcuts can be more efficient than graphical input, but they are underused by most users. To alleviate this, we present "Guided Finger-Aware Shortcuts" to reduce the gulf between graphical input and shortcut activation. The interaction technique works by recognising when a special hand posture is used to press a key, then allowing secondary finger movements to select among related shortcuts if desired. Novice users can learn the mappings through dynamic visual guidance revealed by holding a key down, but experts can trigger shortcuts directly without pausing. Two variations are described: FingerArc uses the angle of the thumb, and FingerChord uses a second key press. The techniques are motivated by an interview study identifying factors hindering the learning, use, and exploration of keyboard shortcuts. A controlled comparison with conventional keyboard shortcuts shows the techniques encourage overall shortcut usage, make interaction faster and less error-prone, and provide advantages over simply adding visual guidance to standard shortcuts.