
David Li
Interested in computer graphics, computer vision, and AR/VR applications.
Authored Publications
XR Blocks: Accelerating Human-Centered AI + XR Innovation
Nels Numan
Evgenii Alekseev
Alex Cooper
Min Xia
Scott Chung
Jeremy Nelson
Xiuxiu Yuan
Jolica Dias
Tim Bettridge
Benjamin Hersh
Michelle Huynh
Konrad Piascik
Ricardo Cabello
Google, XR, XR Labs (2025)
We are on the cusp where Artificial Intelligence (AI) and Extended Reality (XR) are converging to unlock new paradigms of interactive computing. However, a significant gap exists between the ecosystems of these two fields: while AI research and development is accelerated by mature frameworks like PyTorch and benchmarks like LMArena, prototyping novel AI-driven XR interactions remains a high-friction process, often requiring practitioners to manually integrate disparate, low-level systems for perception, rendering, and interaction. To bridge this gap, we present XR Blocks, a cross-platform framework designed to accelerate human-centered AI + XR innovation. XR Blocks provides a modular architecture with plug-and-play components for the core abstractions in AI + XR: user, world, and peers; interface, context, and agents. Crucially, it is designed with the mission of "minimum code from idea to reality", enabling rapid prototyping of complex AI + XR apps. Built upon accessible technologies (WebXR, three.js, TensorFlow, Gemini), our toolkit lowers the barrier to entry for XR creators. We demonstrate its utility through a set of open-source templates, samples, and advanced demos, empowering the community to quickly move from concept to interactive prototype.
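As a rough illustration of the low-level plumbing such a framework aims to abstract away, the sketch below sets up a bare WebXR scene with three.js alone: scene graph, render loop, and session request are all boilerplate a prototyper would otherwise write by hand before any AI-driven interaction logic. This is a hedged sketch using standard three.js and WebXR calls, not XR Blocks code.

```ts
import * as THREE from 'three';

// Minimal WebXR + three.js setup: the kind of low-level boilerplate that sits
// between an idea and an interactive prototype when no framework is used.
const renderer = new THREE.WebGLRenderer({ antialias: true, alpha: true });
renderer.xr.enabled = true;
document.body.appendChild(renderer.domElement);

const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(70, window.innerWidth / window.innerHeight, 0.01, 20);
scene.add(new THREE.HemisphereLight(0xffffff, 0x444444, 1));

// A placeholder object standing in for app content (e.g., an agent's avatar).
const cube = new THREE.Mesh(
  new THREE.BoxGeometry(0.2, 0.2, 0.2),
  new THREE.MeshStandardMaterial({ color: 0x44aa88 }),
);
cube.position.set(0, 1.5, -0.5);
scene.add(cube);

// Render loop driven by the XR display's refresh rate.
renderer.setAnimationLoop(() => renderer.render(scene, camera));

// Request an immersive AR session (requires a WebXR-capable browser, device, and type definitions).
navigator.xr?.requestSession('immersive-ar').then((session) => renderer.xr.setSession(session));
```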
Sensible Agent: A Framework for Unobtrusive Interaction with Proactive AR Agents
Min Xia
Nels Numan
Dinesh Manocha
Proceedings of the 39th Annual ACM Symposium on User Interface Software and Technology (UIST), ACM (2025), pp. 22
Proactive AR agents promise context-aware assistance, but their interactions often rely on explicit voice prompts or responses, which can be disruptive or socially awkward. We introduce Sensible Agent, a framework designed for unobtrusive interaction with these proactive agents. Sensible Agent dynamically adapts both “what” assistance to offer and, crucially, “how” to deliver it, based on real-time multimodal context sensing. Informed by an expert workshop (n=12) and a data annotation study (n=40), the framework leverages egocentric cameras, multimodal sensing, and Large Multimodal Models (LMMs) to infer context and suggest appropriate actions delivered via minimally intrusive interaction modes. We demonstrate our prototype on an XR headset through a user study (n=10) in both AR and VR scenarios. Results indicate that Sensible Agent significantly reduces perceived intrusiveness and interaction effort compared to a voice-prompted baseline, while maintaining high utility.
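The decision the framework makes can be pictured as a mapping from sensed context to a paired choice of assistance and presentation. The sketch below is purely conceptual, with assumed context fields and mode names rather than the paper's actual API or policy; it only illustrates how "what" and "how" can be chosen jointly so that a less intrusive channel wins when the user is socially or physically occupied.

```ts
// Conceptual sketch (not the paper's API): jointly choosing the assistance ("what")
// and the delivery mode ("how") from sensed context. All field and mode names are assumptions.
type Context = {
  activity: 'conversation' | 'walking' | 'idle';
  handsBusy: boolean;
  noisy: boolean;
};

type Suggestion = {
  action: string;                                 // what assistance to offer
  query: 'icon' | 'voice' | 'text';               // how to present the offer
  confirm: 'head_nod' | 'gaze_dwell' | 'speech';  // how the user accepts it
};

function suggest(ctx: Context): Suggestion {
  // In a social or noisy setting, avoid audio and keep the prompt glanceable.
  const query = ctx.activity === 'conversation' || ctx.noisy ? 'icon' : 'voice';
  // If the hands are busy, fall back to head or gaze confirmation.
  const confirm = ctx.handsBusy ? 'head_nod' : ctx.noisy ? 'gaze_dwell' : 'speech';
  // Pick the assistance from the inferred activity (placeholder logic).
  const action = ctx.activity === 'walking' ? 'offer turn-by-turn directions' : 'offer a reminder summary';
  return { action, query, confirm };
}
```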
XR Blocks: Accelerating Human-Centered AI+XR Innovation
Nels Numan
Evgenii Alekseev
Alex Cooper
Min Xia
Scott Chung
Jeremy Nelson
Xiuxiu Yuan
Jolica Dias
Tim Bettridge
Benjamin Hersh
Michelle Huynh
Ricardo Cabello
arXiv, Google XR (2025)
We are on the cusp where Artificial Intelligence (AI) and Extended Reality (XR) are converging to unlock new paradigms of interactive computing. However, a significant gap exists between the ecosystems of these two fields: while AI research and development is accelerated by mature frameworks like PyTorch and benchmarks like LMArena, prototyping novel AI-driven XR interactions remains a high-friction process, often requiring practitioners to manually integrate disparate, low-level systems for perception, rendering, and interaction. To bridge this gap, we present XR Blocks, a cross-platform framework designed to accelerate human-centered AI+XR innovation. XR Blocks provides a modular architecture with plug-and-play components for the core abstractions in AI+XR: user, world, and peers; interface, context, and agents. Crucially, it is designed with the mission of "minimum code from idea to reality", enabling rapid prototyping of complex AI+XR apps. Built upon accessible technologies (WebXR, three.js, TensorFlow, Gemini), our toolkit lowers the barrier to entry for XR creators. We demonstrate its utility through a set of open-source templates, samples, and advanced demos, empowering the community to quickly move from concept to interactive prototype. Site: https://xrblocks.github.io
RetroSphere: Self-Contained Passive 6DoF Controller Tracking
Advanced AR/VR headsets often have a dedicated depth sensor or multiple cameras, high processing power, and a high-capacity battery to track hands or controllers. However, these approaches are not compatible with the small form factor and limited thermal capacity of lightweight AR devices. In this paper, we present RetroSphere, a self-contained 6 degree of freedom (6DoF) controller tracker that can be integrated with almost any device. RetroSphere tracks a passive controller with just 3 retroreflective spheres using a stereo pair of mass-produced infrared blob trackers, each with its own infrared LED emitters. As the spheres are completely passive, no electronics or recharging is required. Each object tracking camera provides a tiny Arduino-compatible ESP32 microcontroller with the 2D positions of the spheres. A lightweight stereo depth estimation algorithm that runs on the ESP32 performs 6DoF tracking of the passive controller. RetroSphere also provides an auto-calibration procedure to calibrate the stereo IR tracker setup. Our work builds upon Johnny Lee’s Wii remote hacks and aims to enable a community of researchers, designers, and makers to use 3D input in their projects with affordable off-the-shelf components. RetroSphere achieves a tracking accuracy of about 96.5%, with errors as low as ∼3.5 cm over a 100 cm tracking range, validated against ground truth 3D data obtained using a LiDAR camera, while consuming around 400 mW. We provide implementation details, evaluate the accuracy of our system, and demonstrate example applications, such as mobile AR drawing and 3D measurement, with our RetroSphere-enabled AR glasses prototype.
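The core geometric step, recovering a 3D sphere position from the two trackers' 2D blob reports, is standard stereo triangulation: depth follows from disparity by similar triangles, and repeating it for all three spheres yields the points from which the 6DoF pose is derived. The sketch below is illustrative only; the focal length and baseline constants are assumptions, and the real system runs calibrated firmware on the ESP32.

```ts
// Illustrative sketch (not the RetroSphere firmware): triangulating one retroreflective
// blob from a rectified stereo pair of IR blob trackers. Constants are assumed values.
const FOCAL_PX = 420;    // assumed focal length of each blob tracker, in pixels
const BASELINE_CM = 10;  // assumed distance between the two trackers, in cm

type Blob2D = { x: number; y: number };  // pixel coordinates reported by a tracker

// Triangulate a 3D point (in cm, left-camera frame) from matched left/right blobs.
// (cx, cy) is the assumed principal point of the left tracker.
function triangulate(left: Blob2D, right: Blob2D, cx: number, cy: number) {
  const disparity = left.x - right.x;             // horizontal shift between the two views
  if (disparity <= 0) return null;                // blob not resolvable at this range
  const z = (FOCAL_PX * BASELINE_CM) / disparity; // depth from similar triangles
  const x = ((left.x - cx) * z) / FOCAL_PX;       // back-project to 3D
  const y = ((left.y - cy) * z) / FOCAL_PX;
  return { x, y, z };
}
```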
OmniSyn: Synthesizing 360 Videos with Wide-baseline Panoramas
Christian Haene
Danhang "Danny" Tang
Amitabh Varshney
2022 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), IEEE
Immersive maps such as Google Street View and Bing Streetside provide true-to-life views with a massive collection of panoramas. However, these panoramas are only available at sparse intervals along the path they are taken, resulting in visual discontinuities during navigation. Prior art in view synthesis is usually built upon a set of perspective images, a pair of stereoscopic images, or a monocular image, but barely examines wide-baseline panoramas, which are widely adopted in commercial platforms to optimize bandwidth and storage usage. In this paper, we leverage the unique characteristics of wide-baseline panoramas and present OmniSyn, a novel pipeline for 360° view synthesis between wide-baseline panoramas. OmniSyn predicts omnidirectional depth maps using a spherical cost volume and a monocular skip connection, renders meshes in 360° images, and synthesizes intermediate views with a fusion network. We demonstrate the effectiveness of OmniSyn via comprehensive experimental results including comparison with the state-of-the-art methods on CARLA and Matterport datasets, ablation studies, and generalization studies on street views. We envision our work may inspire future research for this unheeded real-world task and eventually produce a smoother experience for navigating immersive maps.
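View synthesis between panoramas rests on lifting equirectangular pixels to 3D with predicted depth and reprojecting them into the target viewpoint. The sketch below shows only that shared geometric step under assumed conventions (equirectangular layout, metric depth); it is not OmniSyn's depth network, mesh renderer, or fusion code.

```ts
// Illustrative sketch (not the OmniSyn implementation): equirectangular unprojection
// and reprojection, the geometry underlying view synthesis between panoramas.
type Vec3 = { x: number; y: number; z: number };

// Lift an equirectangular pixel (u, v) in a W x H panorama with depth d (meters)
// to a 3D point in the panorama's camera frame.
function unproject(u: number, v: number, d: number, W: number, H: number): Vec3 {
  const lon = ((u + 0.5) / W) * 2 * Math.PI - Math.PI;  // longitude in [-pi, pi]
  const lat = Math.PI / 2 - ((v + 0.5) / H) * Math.PI;  // latitude in [-pi/2, pi/2]
  return {
    x: d * Math.cos(lat) * Math.sin(lon),
    y: d * Math.sin(lat),
    z: d * Math.cos(lat) * Math.cos(lon),
  };
}

// Reproject a 3D point (already translated into the target panorama's frame)
// back to equirectangular pixel coordinates.
function project(p: Vec3, W: number, H: number): { u: number; v: number } {
  const d = Math.hypot(p.x, p.y, p.z);
  const lon = Math.atan2(p.x, p.z);
  const lat = Math.asin(p.y / d);
  return {
    u: ((lon + Math.PI) / (2 * Math.PI)) * W - 0.5,
    v: ((Math.PI / 2 - lat) / Math.PI) * H - 0.5,
  };
}
```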
A Log-Rectilinear Transformation for Foveated 360° Video Streaming
With the rapidly increasing resolutions of 360° cameras, head-mounted displays, and live-streaming services, streaming high-resolution panoramic videos over limited-bandwidth networks is becoming a critical challenge. Foveated video streaming can address this rising challenge in the context of eye-tracking-equipped virtual reality head-mounted displays. However, conventional log-polar foveated rendering suffers from a number of visual artifacts such as aliasing and flickering. In this paper, we introduce a new log-rectilinear transformation that incorporates summed-area table filtering and off-the-shelf video codecs to enable foveated streaming of 360° videos suitable for VR headsets with built-in eye-tracking. To validate our approach, we build a client-server system prototype for streaming 360° videos which leverages parallel algorithms over real-time video transcoding. We conduct quantitative experiments on an existing 360° video dataset and observe that the log-rectilinear transformation paired with summed-area table filtering heavily reduces flickering compared to log-polar subsampling while also yielding an additional 11% reduction in bandwidth usage.
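The flicker reduction comes from averaging each peripheral output pixel over its full source footprint rather than point-sampling it, and a summed-area table makes that average O(1) per pixel. The sketch below illustrates that filtering step only, for a single channel with an assumed row-major layout; it is not the paper's GPU implementation.

```ts
// Illustrative sketch (not the paper's code): summed-area table filtering, which lets
// peripheral downsampling average a whole source rectangle instead of point-sampling it.
function buildSAT(img: Float32Array, W: number, H: number): Float64Array {
  const sat = new Float64Array(W * H);
  for (let y = 0; y < H; y++) {
    let rowSum = 0;
    for (let x = 0; x < W; x++) {
      rowSum += img[y * W + x];
      // Inclusive prefix sum over the rectangle from (0, 0) to (x, y).
      sat[y * W + x] = rowSum + (y > 0 ? sat[(y - 1) * W + x] : 0);
    }
  }
  return sat;
}

// Mean of the image over the inclusive rectangle [x0, x1] x [y0, y1], in O(1).
function boxMean(sat: Float64Array, W: number, x0: number, y0: number, x1: number, y1: number): number {
  const at = (x: number, y: number) => (x < 0 || y < 0 ? 0 : sat[y * W + x]);
  const sum = at(x1, y1) - at(x0 - 1, y1) - at(x1, y0 - 1) + at(x0 - 1, y0 - 1);
  return sum / ((x1 - x0 + 1) * (y1 - y0 + 1));
}
```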