
Mar Gonzalez-Franco
Mar Gonzalez-Franco, PhD, is a computer scientist and neuroscientist at Google working on a new generation of immersive technologies. Drawing on a background in real-time systems, her research builds better interactions for immersive technologies across several disciplines: virtual reality, augmented reality, AI, computer graphics, computer vision, avatars, and haptics, all while studying human behavior, perception, and neuroscience.
She was awarded the 2022 IEEE VGTC VR New Researcher Award and was recognized by the NAE Frontiers of Engineering.
She leads the BIRD lab, working on Blended Interactions Research and Devices.
Authored Publications
Online-EYE: Multimodal Implicit Eye Tracking Calibration for XR
Baosheng James Hou
Lucy Abramyan
Prasanthi Gurumurthy
Khushman Patel
Haley Adams
Andrea Colaco
Ken Pfeuffer
Hans Gellersen
Karan Ahuja
2025
Preview abstract
Unlike other VR inputs that work out of the box, eye tracking typically requires custom calibration per user or session. We present a multimodal-input approach for implicit calibration of eye trackers in VR, leveraging UI interaction for continuous, background calibration. Our method analyzes gaze data alongside controller interactions with UI elements and, using ML techniques, continuously refines the calibration matrix without interrupting users' current tasks, potentially eliminating the need for explicit calibration. We demonstrate the accuracy and effectiveness of this implicit approach across various tasks and real-time applications, achieving eye tracking accuracy comparable to native, explicit calibration.
View details
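A minimal sketch of the implicit-calibration idea described in the Online-EYE abstract above, assuming raw gaze samples have already been paired with the screen-space centers of UI elements the user confirmed with the controller. The paper's actual ML pipeline is more involved; the function names, data, and affine model here are illustrative assumptions, not the published method.

```python
import numpy as np

def fit_affine_correction(raw_gaze, ui_targets):
    """Fit a 2D affine map from raw gaze points to confirmed UI target centers.

    raw_gaze, ui_targets: (N, 2) arrays of normalized screen coordinates,
    paired at the moment the user selected a UI element with the controller.
    """
    n = raw_gaze.shape[0]
    # Homogeneous design matrix [x, y, 1] so the fit includes a translation term.
    A = np.hstack([raw_gaze, np.ones((n, 1))])
    # Solve A @ M ~= ui_targets in the least-squares sense; M is 3x2.
    M, *_ = np.linalg.lstsq(A, ui_targets, rcond=None)
    return M

def apply_correction(M, gaze_point):
    """Correct a single raw gaze sample with the fitted matrix."""
    x, y = gaze_point
    return np.array([x, y, 1.0]) @ M

# Example: pairs collected silently in the background during normal UI use.
raw = np.array([[0.10, 0.20], [0.52, 0.48], [0.81, 0.75], [0.33, 0.90]])
targets = np.array([[0.12, 0.22], [0.55, 0.50], [0.84, 0.78], [0.35, 0.93]])
M = fit_affine_correction(raw, targets)
print(apply_correction(M, (0.5, 0.5)))
```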
Beyond the Phone: Exploring Context-aware Interaction Between Mobile and Mixed Reality Devices
Fengyuan Zhu
Daniel Kalmar
Mahdi Tayarani
2025
Preview abstract
Despite the surge in popularity of virtual reality (VR), mobile phones remain the primary medium for accessing digital content, offering both privacy and portability. This short paper presents Beyond the Phone, a novel framework that enhances mobile phones in VR with context-aware controls and spatial augmentation. We first establish a comprehensive design space through brainstorming and iterative discussions with VR experts. We then develop a proof-of-concept system that analyzes UI layouts to offer context-aware controls and spatial augmentation, targeting six key application areas within our design space. Finally, we demonstrate that our system can effectively adapt to a broad spectrum of applications at runtime, and discuss future directions based on reviews with seven experts.
View details
H2E: Hand, Head, Eye: A Multimodal Cascade of Natural Inputs
Khushman Patel
Ken Pfeuffer
Hans Gellersen
IEEE VR (2025)
Preview abstract
Eye-based interaction techniques for extended reality, such as gaze and pinch, are simple to use but suffer from input precision issues. We present H2E, a fine- and coarse-grained pointing technique that cascades Hand, Head, and Eye inputs. As users initiate a pinch gesture, a cursor appears at the gaze point that can be dragged by head pointing before pinch confirmation. This has the potential advantage of adding a precision component without changing the semantics of the technique. In this paper, we describe the design and implementation of the technique. Furthermore, we present an evaluation of our method in a Fitts-based user study, exploring the speed-accuracy trade-offs against a gaze-and-pinch interaction baseline.
View details
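A rough sketch of the Hand, Head, Eye cascade described in the H2E abstract above, assuming the headset runtime provides per-frame gaze, head-rotation deltas, and pinch state. The class, field names, and gain value are hypothetical; the paper's implementation may differ.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    gaze_point: tuple    # (x, y) gaze intersection on the UI plane
    head_delta: tuple    # (dx, dy) head rotation since last frame, mapped to the plane
    pinch_down: bool     # pinch gesture started this frame
    pinch_up: bool       # pinch gesture released this frame

class H2ECascade:
    """Coarse gaze placement refined by head motion, confirmed by pinch release."""
    HEAD_GAIN = 0.3      # small gain so head motion gives fine control

    def __init__(self):
        self.cursor = None

    def update(self, f: Frame):
        if f.pinch_down:
            # Pinch onset: drop the cursor at the current gaze point (coarse stage).
            self.cursor = list(f.gaze_point)
        elif self.cursor is not None and not f.pinch_up:
            # While the pinch is held: nudge the cursor with head pointing (fine stage).
            self.cursor[0] += self.HEAD_GAIN * f.head_delta[0]
            self.cursor[1] += self.HEAD_GAIN * f.head_delta[1]
        elif f.pinch_up and self.cursor is not None:
            target, self.cursor = tuple(self.cursor), None
            return target  # pinch release confirms the selection
        return None
```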
Geometry Fidelity for Spherical Images
Anders Christensen
Nooshin Mojab
Khushman Patel
Karan Ahuja
Zeynep Akata
Ole Winther
Andrea Colaco
ECCV (2024)
Preview abstract
Spherical, or omni-directional, images offer an immersive format appealing to a wide range of computer vision applications. However, the geometric properties of spherical images pose a major challenge for existing models and metrics designed for 2D images. Concretely, we demonstrate that the established generative evaluation metric FID fails to quantify shortcomings in these properties. To address this, we introduce two quantitative evaluation metrics accounting for geometric constraints of spherical images, namely Omnidirectional FID (OmniFID) and Discontinuity Score (DS). OmniFID is an extension of FID, tailored to additionally capture field-of-view requirements of the spherical format by leveraging cubemap projections. DS is a kernel-based seam alignment score of continuity across borders of 2D representations of spherical images. In experiments, OmniFID and DS detect issues with spherical structure better than previously utilized metrics.
View details
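The Discontinuity Score in the abstract above measures continuity across the borders of 2D representations of spherical images. The sketch below, assuming equirectangular inputs, only illustrates the underlying idea of comparing wrap-around differences at the seam against differences inside the image; it is not the paper's kernel-based definition.

```python
import numpy as np

def seam_discontinuity(equirect: np.ndarray) -> float:
    """Illustrative seam score for an equirectangular image of shape (H, W, C).

    Compares pixel differences across the wrap-around seam (last column vs.
    first column) against the average horizontal difference inside the image.
    Values near 1 suggest the seam looks like any other neighboring column
    pair; larger values suggest a visible discontinuity.
    """
    img = equirect.astype(np.float64)
    seam_diff = np.abs(img[:, 0] - img[:, -1]).mean()
    interior_diff = np.abs(np.diff(img, axis=1)).mean()
    return seam_diff / (interior_diff + 1e-8)

# Example: random noise has no structure, so the seam is unremarkable.
noise = np.random.rand(64, 128, 3)
print(seam_discontinuity(noise))
```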
Hovering Over the Key to Text Input in XR
Diar Abdlkarim
Arpit Bhatia
Stuart Macgregor
Jason Fotso-Puepi
Hasti Seifi
Massimiliano Di Luca
Karan Ahuja
2024
Preview abstract
Virtual, Mixed, and Augmented Reality (XR) technologies hold immense potential for transforming productivity beyond the PC, creating a critical need for improved text input solutions in XR. However, achieving efficient text input in these environments remains a significant challenge. This paper examines the current landscape of XR text input techniques, focusing on the importance of keyboards (both physical and virtual) as essential tools. We discuss the unique challenges and opportunities presented by XR, synthesizing key trends from existing solutions.
View details
Preview abstract
We present XDTK, an open-source Unity/Android toolkit for prototyping multi-device interactions in extended reality (XR). With the Unity package and Android app provided in XDTK, data from any number of devices (phones, tablets, or wearables) can be streamed to and surfaced within a Unity-based XR application. ARCore-supported devices also provide self-tracked pose data. Devices on the same local network are automatically discovered by the Unity server, and their inputs are routed using a custom event framework. We designed XDTK to be modular and easily extendable, enabling fast, simple, and effective prototyping of multi-device experiences by both researchers and developers.
View details
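XDTK itself is a Unity/Android toolkit; the Python sketch below only illustrates the general pattern its abstract describes, where devices on the local network announce themselves and a server registers them and routes their events. The port, wire format, and function names are hypothetical, not XDTK's actual protocol.

```python
import json
import socket

DISCOVERY_PORT = 47474  # hypothetical port for device announcements

def announce(device_name: str, device_type: str):
    """Run on a device: broadcast a small JSON announcement on the local network."""
    msg = json.dumps({"name": device_name, "type": device_type}).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(msg, ("255.255.255.255", DISCOVERY_PORT))

def listen_for_devices(handle_event):
    """Run on the server: register devices as they announce themselves."""
    devices = {}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", DISCOVERY_PORT))
        while True:
            data, addr = sock.recvfrom(1024)
            info = json.loads(data.decode())
            if addr not in devices:
                devices[addr] = info
                handle_event("device_joined", info)
```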
Preview abstract
Interactions with Extended Reality Head Mounted Devices (XR HMDs) applications require precise, intuitive and efficient input methods. Current approaches either rely on power-intensive sensors, such as cameras for hand-tracking, or specialized hardware in the form of handheld controllers. As an alternative, past works have explored the use of devices already present with the user, in the form of smartphones and smartwatches, as practical input solutions. However, this approach risks interaction overload: how can one determine whether the user's interaction gestures on the watch face or phone screen are directed toward control of the mobile device itself or the XR device? To this end, we propose a novel framework for cross-device input routing and device arbitration by employing Inertial Measurement Units (IMUs) within these devices. We validate our approach in a user study with six participants. By making use of the relative orientation between the headset and the target input device, we can estimate the intended device of interaction with 93.7% accuracy. Our method offers a seamless, energy-efficient alternative for input management in XR, enhancing user experience through natural and ergonomic interactions.
View details
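A minimal sketch of the orientation-based arbitration described in the abstract above, assuming the headset and the candidate input device both report world-space orientation quaternions from their IMUs. The angular threshold, routing rule, and helper names are illustrative assumptions; the study's actual model may differ.

```python
import numpy as np

def quat_to_forward(q):
    """Rotate the +Z (forward) axis by a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    # Third column of the rotation matrix corresponding to q.
    return np.array([
        2 * (x * z + w * y),
        2 * (y * z - w * x),
        1 - 2 * (x * x + y * y),
    ])

def intended_for_headset(headset_q, device_q, max_angle_deg=35.0):
    """Route input to the XR headset if the device roughly faces the same way
    as the user's head; otherwise treat it as interaction with the device itself."""
    cos_angle = np.clip(
        np.dot(quat_to_forward(headset_q), quat_to_forward(device_q)), -1.0, 1.0)
    angle = np.degrees(np.arccos(cos_angle))
    return angle <= max_angle_deg

# Example: identical orientations -> input routed to the headset.
identity = (1.0, 0.0, 0.0, 0.0)
print(intended_for_headset(identity, identity))
```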
Preview abstract
WindowMirror is a framework for using XR headsets in productivity scenarios. The toolkit provides users with simulated, extended screen real estate and allows them to interact with multiple desktop applications in real time within an XR environment. Our architecture has two main modules, a Unity package and a Python backend, which make the system easy to use and extend. WindowMirror supports traditional desktop interaction methods such as mouse, keyboard, and hand tracking. Furthermore, it features a Cylindrical Window Layout, an emerging design pattern that is particularly effective for single-user, egocentric perspectives. WindowMirror aims to set a foundation for future research in XR screen-focused productivity scenarios.
View details
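A small sketch of the Cylindrical Window Layout idea mentioned in the WindowMirror abstract above: windows are placed at equal angular steps on a cylinder centered on the user so that each panel faces the viewer. The radius, angular spacing, and return format are hypothetical choices for illustration.

```python
import math

def cylindrical_layout(num_windows, radius=1.5, angular_step_deg=30.0, height=0.0):
    """Place windows on a cylinder around the user (at the origin), fanned
    symmetrically around the forward (-Z) direction.

    Returns a list of (position, yaw_deg) pairs, where position is (x, y, z)
    and yaw is the rotation needed for the window to face the user.
    """
    placements = []
    start = -angular_step_deg * (num_windows - 1) / 2.0
    for i in range(num_windows):
        yaw = start + i * angular_step_deg
        theta = math.radians(yaw)
        x = radius * math.sin(theta)
        z = -radius * math.cos(theta)
        placements.append(((x, height, z), yaw))
    return placements

# Example: three desktop windows arranged in front of the user.
for pos, yaw in cylindrical_layout(3):
    print(pos, yaw)
```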
Preview abstract
For Extended Reality (XR) headsets, a key aim is the natural interaction in 3D space beyond what traditional methods of keyboard, mouse, and touchscreen can offer. With the release of the Apple Vision Pro, a novel interaction paradigm is now widely available where users seamlessly navigate content through the combined use of their eyes and hands. However, blending these modalities poses unique design challenges due to their dynamic nature and the absence of established principles and standards.
In this article, we present five design principles and issues for the Gaze + Pinch interaction technique, informed by eye-hand research in the human-computer interaction field. The design principles encompass mechanisms like division of labor and minimalistic timing, which are crucial for usability, alongside enhancements for the manipulation of objects, indirect interactions, and drag & drop. Whether in design, technology, or research domains, this exploration offers valuable perspectives for navigating the evolving landscape of 3D interaction.
View details
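A compact sketch of the "division of labor" principle from the Gaze + Pinch article above: the eyes only select the target at pinch onset, and the hand alone drives the subsequent manipulation. The scene and target interfaces (hit_test, translate) are hypothetical placeholders, not an existing API.

```python
class GazePinchInteractor:
    """Gaze chooses the target at pinch onset; hand motion then manipulates it."""

    def __init__(self, scene):
        self.scene = scene      # hypothetical object exposing hit_test(gaze_ray) -> target or None
        self.target = None
        self.last_hand = None

    def on_pinch_start(self, gaze_ray, hand_pos):
        # Division of labor: the eyes select, but only at the moment of pinch onset.
        self.target = self.scene.hit_test(gaze_ray)
        self.last_hand = hand_pos

    def on_hand_move(self, hand_pos):
        # After selection, gaze is free to wander; the hand alone moves the object.
        if self.target is not None:
            delta = tuple(h - l for h, l in zip(hand_pos, self.last_hand))
            self.target.translate(delta)   # hypothetical manipulation call
            self.last_hand = hand_pos

    def on_pinch_end(self):
        self.target = None
```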
Augmented Object Intelligence with XR-Objects
Mustafa Doga Dogan
Karan Ahuja
Andrea Colaco
Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (UIST), ACM (2024), pp. 1-15
Preview abstract
Seamless integration of physical objects as interactive digital entities remains a challenge for spatial computing. This paper explores Augmented Object Intelligence (AOI) in the context of XR, an interaction paradigm that aims to blur the lines between digital and physical by equipping real-world objects with the ability to interact as if they were digital, where every object has the potential to serve as a portal to digital functionalities. Our approach utilizes real-time object segmentation and classification, combined with the power of Multimodal Large Language Models (MLLMs), to facilitate these interactions without the need for object pre-registration. We implement the AOI concept in the form of XR-Objects, an open-source prototype system that provides a platform for users to engage with their physical environment in contextually relevant ways using object-based context menus. This system enables analog objects to not only convey information but also to initiate digital actions, such as querying for details or executing tasks. Our contributions are threefold: (1) we define the AOI concept and detail its advantages over traditional AI assistants, (2) we detail the XR-Objects system's open-source design and implementation, and (3) we show its versatility through various use cases and a user study.
View details
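A heavily simplified sketch of the XR-Objects pipeline described above: segment and classify objects in the camera frame, anchor an object-based context menu to each detection, and forward the chosen action plus object label to a multimodal LLM. The detector and MLLM calls are placeholders rather than real APIs, and the menu actions are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str     # e.g. "coffee bag"
    bbox: tuple    # (x, y, w, h) in camera-frame pixels

def detect_objects(frame):
    """Placeholder for a real-time segmentation/classification model."""
    raise NotImplementedError

def query_mllm(prompt, image_crop):
    """Placeholder for a multimodal LLM call about the detected object."""
    raise NotImplementedError

def build_context_menu(detection: Detection):
    """Each physical object becomes a portal to digital actions, no pre-registration needed."""
    return {
        "Ask about this": lambda crop: query_mllm(
            f"What can you tell me about this {detection.label}?", crop),
        "Set a reminder": lambda crop: f"Reminder created for {detection.label}",
    }

def on_camera_frame(frame):
    # Anchor a context menu at each detected object's location.
    return [(det.bbox, build_context_menu(det)) for det in detect_objects(frame)]
```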