Adarsh Kowdle

I am a Senior Staff Software Engineer and R&D Group Manager on Google's Augmented Reality team, where I lead the efforts around geometric and human perception and build end-to-end solutions from research to product at the intersection of real-time computer vision, geometric/human sensing, and applied machine learning, including the ARCore Depth API and The Relightables. Previously at Google, I was the hardware/systems lead for uDepth, the real-time active depth sensor on Pixel 4 that powers Face Unlock and computational photography use cases such as bokeh. My areas of interest are computer vision and machine learning with a focus on real-time applications.

Previously, I was a Senior Scientist and part of the founding team at perceptiveIO, where I developed computer vision and machine learning algorithms for 3D sensing, visual recognition, and human-computer interaction. Before that, I spent three years as a Senior SDE/Researcher in the Applied Vision and Imaging team at Microsoft, where I worked on Surface Hub among other projects, and spent six months with the Interactive 3D Technologies group at Microsoft Research in Redmond on projects such as Holoportation.

I graduated with a PhD in Electrical and Computer Engineering from Cornell University in July 2013, advised by Prof. Tsuhan Chen. My thesis focused on interactive computer vision algorithms and image-based modeling: putting the user in the loop intelligently while leveraging the power of automatic algorithms.

Google Scholar Page
Authored Publications
    Experiencing Rapid Prototyping of Machine Learning Based Multimedia Applications in Rapsai
    Na Li
    Jing Jin
    Michelle Carney
    Xiuxiu Yuan
    Ping Yu
    Ram Iyengar
    CHI EA '23: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, ACM, 448:1-4
    Abstract: We demonstrate Rapsai, a visual programming platform that aims to streamline the rapid and iterative development of end-to-end machine learning (ML)-based multimedia applications. Rapsai features a node-graph editor that enables interactive characterization and visualization of ML model performance, which facilitates the understanding of how the model behaves in different scenarios. Moreover, the platform streamlines end-to-end prototyping by providing interactive data augmentation and model comparison capabilities within a no-coding environment. Our demonstration showcases the versatility of Rapsai through several use cases, including virtual background, visual effects with depth estimation, and audio denoising. The implementation of Rapsai is intended to support ML practitioners in streamlining their workflow, making data-driven decisions, and comprehensively evaluating model behavior with real-world input.
    Rapsai: Accelerating Machine Learning Prototyping of Multimedia Applications through Visual Programming
    Na Li
    Jing Jin
    Michelle Carney
    Scott Joseph Miles
    Maria Kleiner
    Xiuxiu Yuan
    Anuva Kulkarni
    Xingyu “Bruce” Liu
    Ahmed K Sabie
    Abhishek Kar
    Ping Yu
    Ram Iyengar
    Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI), ACM
    Abstract: In recent years, there has been a proliferation of multimedia applications that leverage machine learning (ML) for interactive experiences. Prototyping ML-based applications is, however, still challenging, given complex workflows that are not ideal for design and experimentation. To better understand these challenges, we conducted a formative study with seven ML practitioners to gather insights about common ML evaluation workflows. This study helped us derive six design goals, which informed Rapsai, a visual programming platform for rapid and iterative development of end-to-end ML-based multimedia applications. Rapsai is based on a node-graph editor to facilitate interactive characterization and visualization of ML model performance. Rapsai streamlines end-to-end prototyping with interactive data augmentation and model comparison capabilities in its no-coding environment. Our evaluation of Rapsai in four real-world case studies (N=15) suggests that practitioners can accelerate their workflow, make more informed decisions, analyze strengths and weaknesses, and holistically evaluate model behavior with real-world input.
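    As a rough illustration of the node-graph idea behind Rapsai, a pipeline can be modeled as a small directed acyclic graph of processing nodes that are evaluated on demand. The sketch below is hypothetical (the class and node names are not Rapsai's actual API) and only demonstrates the general pattern.

```python
# Hypothetical sketch of a node-graph pipeline in the spirit of Rapsai;
# names and classes are illustrative, not Rapsai's API.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Node:
    name: str
    fn: Callable[..., Any]                               # operation this node performs
    inputs: List[str] = field(default_factory=list)      # upstream node names


class Pipeline:
    """Evaluates a small directed acyclic graph of processing nodes."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}

    def add(self, node: Node) -> None:
        self.nodes[node.name] = node

    def run(self, name: str, cache: Dict[str, Any] = None) -> Any:
        cache = {} if cache is None else cache
        if name in cache:
            return cache[name]
        node = self.nodes[name]
        args = [self.run(dep, cache) for dep in node.inputs]
        cache[name] = node.fn(*args)
        return cache[name]


# Example wiring: camera frame -> (placeholder) depth model -> background blur.
pipeline = Pipeline()
pipeline.add(Node("frame", lambda: "raw RGB frame"))
pipeline.add(Node("depth", lambda f: f"depth map of ({f})", inputs=["frame"]))
pipeline.add(Node("blur", lambda f, d: f"blurred {f} using {d}",
                  inputs=["frame", "depth"]))
print(pipeline.run("blur"))
```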
    Experiencing Visual Blocks for ML: Visual Prototyping of AI Pipelines
    Na Li
    Jing Jin
    Michelle Carney
    Jun Jiang
    Xiuxiu Yuan
    Kristen Wright
    Mark Sherwood
    Jason Mayes
    Lin Chen
    Jingtao Zhou
    Zhongyi Zhou
    Ping Yu
    Ram Iyengar
    ACM (2023) (to appear)
    Abstract: We demonstrate Visual Blocks for ML, a visual programming platform that facilitates rapid prototyping of ML-based multimedia applications. As the public version of Rapsai, we further integrated large language models and custom APIs into the platform. In this demonstration, we will showcase how to build interactive AI pipelines in a few drag-and-drops, how to perform interactive data augmentation, and how to integrate pipelines into Colabs. In addition, we demonstrate a wide range of community-contributed pipelines in Visual Blocks for ML, covering various aspects including interactive graphics, chains of large language models, computer vision, and multi-modal applications. Finally, we encourage students, designers, and ML practitioners to contribute ML pipelines through https://github.com/google/visualblocks/tree/main/pipelines to inspire creative use cases. Visual Blocks for ML is available at http://visualblocks.withgoogle.com.
    DepthLab: Real-time 3D Interaction with Depth Maps for Mobile Augmented Reality
    Maksym Dzitsiuk
    Luca Prasso
    Ivo Duarte
    Jason Dourgarian
    Joao Afonso
    Jose Pascoal
    Josh Gladstone
    Nuno Moura e Silva Cruces
    Shahram Izadi
    Konstantine Nicholas John Tsotsos
    Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology, ACM (2020), pp. 829-843
    Abstract: Mobile devices with passive depth sensing capabilities are ubiquitous, and recently active depth sensors have become available on some tablets and VR/AR devices. Although real-time depth data is accessible, its rich value to mainstream AR applications has been sorely under-explored. Adoption of depth-based UX has been impeded by the complexity of performing even simple operations with raw depth data, such as detecting intersections or constructing meshes. In this paper, we introduce DepthLab, a software library that encapsulates a variety of depth-based UI/UX paradigms, including geometry-aware rendering (occlusion, shadows), surface interaction behaviors (physics-based collisions, avatar path planning), and visual effects (relighting, depth-of-field effects). We break down depth usage into localized depth, surface depth, and dense depth, and describe our real-time algorithms for interaction and rendering tasks. We present the design process, system, and components of DepthLab to streamline and centralize the development of interactive depth features. We have open-sourced our software to external developers, conducted performance evaluation, and discussed how DepthLab can accelerate the workflow of mobile AR designers and developers. We envision that DepthLab may help mobile AR developers amplify their prototyping efforts, empowering them to unleash their creativity and effortlessly integrate depth into mobile AR experiences.
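    To make two of the depth-usage categories above concrete, the hedged Python sketch below shows a localized depth lookup (a robust single-point query, e.g. for placing virtual content) and a dense per-pixel occlusion test. It is illustrative only and is not DepthLab's actual API.

```python
# Illustrative sketch of localized depth and dense-depth occlusion;
# not DepthLab's actual API.
import numpy as np


def localized_depth(depth_map: np.ndarray, u: int, v: int, k: int = 2) -> float:
    """Median depth in a small window around a screen point (robust to noise)."""
    h, w = depth_map.shape
    patch = depth_map[max(0, v - k):min(h, v + k + 1),
                      max(0, u - k):min(w, u + k + 1)]
    return float(np.median(patch))


def occlusion_mask(depth_map: np.ndarray, virtual_depth: np.ndarray) -> np.ndarray:
    """True where real geometry is closer than the virtual object,
    i.e. where the virtual pixel should be hidden behind the real world."""
    return depth_map < virtual_depth


# Toy example: a 4x4 depth map (meters) and a virtual object rendered at 1.5 m.
depth = np.array([[1.0, 1.0, 2.0, 2.0],
                  [1.0, 1.2, 2.0, 2.0],
                  [1.1, 1.2, 2.1, 2.2],
                  [1.1, 1.3, 2.1, 2.2]])
print(localized_depth(depth, u=1, v=1))
print(occlusion_mask(depth, np.full_like(depth, 1.5)))
```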
    Experiencing Real-time 3D Interaction with Depth Maps for Mobile Augmented Reality in DepthLab
    Maksym Dzitsiuk
    Luca Prasso
    Ivo Duarte
    Jason Dourgarian
    Joao Afonso
    Jose Pascoal
    Josh Gladstone
    Nuno Moura e Silva Cruces
    Shahram Izadi
    Konstantine Nicholas John Tsotsos
    Adjunct Publication of the 33rd Annual ACM Symposium on User Interface Software and Technology, ACM (2020), pp. 108-110
    Abstract: We demonstrate DepthLab, a wide range of experiences using the ARCore Depth API that allows users to detect the shape and depth in the physical environment with a mobile phone. DepthLab encapsulates a variety of depth-based UI/UX paradigms, including geometry-aware rendering (occlusion, shadows, texture decals), surface interaction behaviors (physics, collision detection, avatar path planning), and visual effects (relighting, 3D-anchored focus and aperture effects, 3D photos). We have open-sourced our software at https://github.com/googlesamples/arcore-depth-lab to facilitate future research and development in depth-aware mobile AR experiences. With DepthLab, we aim to help mobile developers to effortlessly integrate depth into their AR experiences and amplify the expression of their creative vision.
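    At a very high level, the 3D-anchored focus and aperture effects mentioned above can be approximated by blending a sharp and a blurred copy of the image according to each pixel's distance from a chosen focal plane. The sketch below is an assumption-laden simplification for illustration, not DepthLab's implementation.

```python
# Hedged sketch of a depth-guided focus/aperture effect; a simple per-pixel
# blend, not the effect DepthLab ships.
import numpy as np
from scipy.ndimage import gaussian_filter


def anchored_focus(rgb: np.ndarray, depth: np.ndarray,
                   focus_depth: float, aperture: float = 2.0) -> np.ndarray:
    """Keep pixels near focus_depth sharp and defocus pixels far from it."""
    blurred = np.stack([gaussian_filter(rgb[..., c], sigma=3.0) for c in range(3)], -1)
    # Defocus weight in [0, 1], growing with distance from the focal plane.
    coc = np.clip(aperture * np.abs(depth - focus_depth), 0.0, 1.0)[..., None]
    return (1.0 - coc) * rgb + coc * blurred


# Usage: focus on geometry around 1.2 m in a toy image.
rgb = np.random.rand(64, 64, 3)
depth = np.random.uniform(0.5, 3.0, size=(64, 64))
print(anchored_focus(rgb, depth, focus_depth=1.2).shape)
```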
    Deep Reflectance Fields - High-Quality Facial Reflectance Field Inference from Color Gradient Illumination
    Abhi Meka
    Christian Haene
    Michael Zollhöfer
    Graham Fyffe
    Xueming Yu
    Jason Dourgarian
    Peter Denny
    Sofien Bouaziz
    Peter Lincoln
    Matt Whalen
    Geoff Harvey
    Jonathan Taylor
    Shahram Izadi
    Paul Debevec
    Christian Theobalt
    Julien Valentin
    Christoph Rhemann
    SIGGRAPH (2019)
    Abstract: Photo-realistic relighting of human faces is a highly sought after feature with many applications ranging from visual effects to truly immersive virtual experiences. Despite tremendous technological advances in the field, humans are often capable of distinguishing real faces from synthetic renders. Photo-realistically relighting any human face is indeed a challenge, with difficulties ranging from modelling sub-surface scattering and blood flow to estimating the interaction between light and individual strands of hair. We introduce the first system that combines the ability to deal with dynamic performances with the realism of 4D reflectance fields, enabling photo-realistic relighting of non-static faces. The core of our method consists of a deep neural network that is able to predict full 4D reflectance fields from two images captured under spherical gradient illumination. Extensive experiments not only show that two images under spherical gradient illumination can be easily captured in real time, but also that these particular images contain all the information needed to estimate the full reflectance field, including specularities and high frequency details. Finally, side-by-side comparisons demonstrate that the proposed system outperforms the current state-of-the-art in terms of realism and speed.
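    For context, once a reflectance field is available, relighting is a weighted sum of the one-light-at-a-time (OLAT) basis images with the target environment's per-light intensities. The network described in the paper predicts that basis from two gradient-illumination images; the sketch below only shows the standard recombination step on toy data.

```python
# Standard image-based relighting recombination, shown with toy data; the
# paper's contribution is predicting the OLAT basis, not this step.
import numpy as np


def relight(olat_basis: np.ndarray, light_weights: np.ndarray) -> np.ndarray:
    """olat_basis: (num_lights, H, W, 3) reflectance field.
    light_weights: (num_lights, 3) RGB intensity of each light in the target
    environment. Returns the relit (H, W, 3) image."""
    return np.einsum('lhwc,lc->hwc', olat_basis, light_weights)


# Toy example: 331 lights (as in the Relightables rig), tiny 8x8 images.
num_lights = 331
olat = np.random.rand(num_lights, 8, 8, 3).astype(np.float32)
env = np.random.rand(num_lights, 3).astype(np.float32) / num_lights
print(relight(olat, env).shape)  # (8, 8, 3)
```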
    The Relightables: Volumetric Performance Capture of Humans with Realistic Relighting
    Kaiwen Guo
    Peter Lincoln
    Philip Davidson
    Xueming Yu
    Matt Whalen
    Geoff Harvey
    Jason Dourgarian
    Danhang Tang
    Anastasia Tkach
    Emily Cooper
    Mingsong Dou
    Graham Fyffe
    Christoph Rhemann
    Jonathan Taylor
    Paul Debevec
    Shahram Izadi
    SIGGRAPH Asia (2019) (to appear)
    Abstract: We present "The Relightables", a volumetric capture system for photorealistic and high quality relightable full-body performance capture. While significant progress has been made on volumetric capture systems, focusing on 3D geometric reconstruction with high resolution textures, much less work has been done to recover photometric properties needed for relighting. Results from such systems lack high-frequency details and the subject's shading is prebaked into the texture. In contrast, a large body of work has addressed relightable acquisition for image-based approaches, which photograph the subject under a set of basis lighting conditions and recombine the images to show the subject as they would appear in a target lighting environment. However, to date, these approaches have not been adapted for use in the context of a high-resolution volumetric capture system. Our method combines this ability to realistically relight humans for arbitrary environments, with the benefits of free-viewpoint volumetric capture and new levels of geometric accuracy for dynamic performances. Our subjects are recorded inside a custom geodesic sphere outfitted with 331 custom color LED lights, an array of high-resolution cameras, and a set of custom high-resolution depth sensors. Our system innovates in multiple areas: First, we designed a novel active depth sensor to capture 12.4MP depth maps, which we describe in detail. Second, we show how to design a hybrid geometric and machine learning reconstruction pipeline to process the high resolution input and output a volumetric video. Third, we generate temporally consistent reflectance maps for dynamic performers by leveraging the information contained in two alternating color gradient illumination images acquired at 60Hz. Multiple experiments, comparisons, and applications show that The Relightables significantly improves upon the level of realism in placing volumetrically captured human performances into arbitrary CG scenes.
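    A classic building block behind such gradient-illumination rigs is estimating photometric normals from a pair of complementary gradient images. The sketch below illustrates that ratio-of-gradients idea on toy data, with RGB channels standing in for the x/y/z gradient directions; it is a simplification for intuition, not the Relightables pipeline itself.

```python
# Simplified ratio-of-gradients normal estimate from a complementary pair of
# spherical gradient illumination images; illustrative only.
import numpy as np


def gradient_normals(grad: np.ndarray, inv_grad: np.ndarray,
                     eps: float = 1e-6) -> np.ndarray:
    """grad, inv_grad: (H, W, 3) images lit by a gradient pattern and its
    complement. Returns unit surface normals of shape (H, W, 3)."""
    n = (grad - inv_grad) / (grad + inv_grad + eps)   # per-channel ratio in [-1, 1]
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.maximum(norm, eps)


# Toy usage with random radiance values.
g = np.random.rand(16, 16, 3)
g_bar = np.random.rand(16, 16, 3)
normals = gradient_normals(g, g_bar)
print(np.allclose(np.linalg.norm(normals, axis=-1), 1.0))
```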
    ActiveStereoNet: Unsupervised End-to-End Learning for Active Stereo Systems
    Yinda Zhang
    Sameh Khamis
    Christoph Rhemann
    Julien Valentin
    Vladimir Tankovich
    Michael Schoenberg
    Shahram Izadi
    European Conference on Computer Vision (2018)
    Abstract: In this paper we present ActiveStereoNet, the first deep learning solution for active stereo systems. Due to the lack of ground truth, our method is fully self-supervised, yet it produces precise depth with a subpixel precision of 1/30th of a pixel; it does not suffer from the common over-smoothing issues; it preserves the edges; and it explicitly handles occlusions. We introduce a novel reconstruction loss that is more robust to noise and texture-less patches, and is invariant to illumination changes. The proposed loss is optimized using a window-based cost aggregation with an adaptive support weight scheme. This cost aggregation is edge-preserving and smooths the loss function, which is key to allow the network to reach compelling results. Finally we show how the task of predicting invalid regions, such as occlusions, can be trained end-to-end without ground-truth. This component is crucial to reduce blur and particularly improves predictions along depth discontinuities. Extensive quantitative and qualitative evaluations on real and synthetic data demonstrate state of the art results in many challenging scenes.
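    The flavor of this self-supervised objective can be sketched as follows: warp the right image into the left view with the predicted disparity, then aggregate the photometric residual over a window using adaptive support weights so edges are preserved. The numpy code below is an illustrative simplification with made-up parameters, not the paper's exact loss or training setup.

```python
# Illustrative self-supervised photometric loss with adaptive support weight
# aggregation; a simplification of the idea, not ActiveStereoNet's exact loss.
import numpy as np


def warp_right_to_left(right: np.ndarray, disparity: np.ndarray) -> np.ndarray:
    """Sample the right image at x - d for every left pixel (nearest neighbor)."""
    h, w = right.shape
    xs = np.clip(np.arange(w)[None, :] - np.round(disparity).astype(int), 0, w - 1)
    return right[np.arange(h)[:, None], xs]


def asw_reconstruction_loss(left, right, disparity, radius=2, sigma=0.1):
    """Window-aggregated photometric residual with adaptive support weights."""
    residual = np.abs(left - warp_right_to_left(right, disparity))
    loss = np.zeros_like(left)
    wsum = np.zeros_like(left)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted_res = np.roll(residual, (dy, dx), axis=(0, 1))
            shifted_img = np.roll(left, (dy, dx), axis=(0, 1))
            weight = np.exp(-np.abs(left - shifted_img) / sigma)  # edge-aware weight
            loss += weight * shifted_res
            wsum += weight
    return float((loss / wsum).mean())


# Toy usage: a random left/right pair with a constant 3-pixel disparity.
left = np.random.rand(32, 32)
right = np.roll(left, -3, axis=1)
print(asw_reconstruction_loss(left, right, np.full((32, 32), 3.0)))
```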
    LookinGood: Enhancing Performance Capture with Real-Time Neural Re-Rendering
    Ricardo Martin Brualla
    Shuoran Yang
    Pavel Pidlypenskyi
    Jonathan Taylor
    Julien Valentin
    Sameh Khamis
    Philip Davidson
    Anastasia Tkach
    Peter Lincoln
    Christoph Rhemann
    Dan Goldman
    Cem Keskin
    Steve Seitz
    Shahram Izadi
    SIGGRAPH Asia (2018)
    Abstract: Motivated by augmented and virtual reality applications such as telepresence, there has been a recent focus in real-time performance capture of humans under motion. However, given the real-time constraint, these systems often suffer from artifacts in geometry and texture such as holes and noise in the final rendering, poor lighting, and low-resolution textures. We take the novel approach of augmenting such real-time performance capture systems with a deep architecture that takes a rendering from an arbitrary viewpoint, and jointly performs completion, super resolution, and denoising of the imagery in real-time. We call this approach neural (re-)rendering, and our live system "LookinGood". Our deep architecture is trained to produce high resolution and high quality images from a coarse rendering in real-time. First, we propose a self-supervised training method that does not require manual ground-truth annotation. We contribute a specialized reconstruction error that uses semantic information to focus on relevant parts of the subject, e.g. the face. We also introduce a salient reweighing scheme of the loss function that is able to discard outliers. We specifically design the system for virtual and augmented reality headsets where the consistency between the left and right eye plays a crucial role in the final user experience. Finally, we generate temporally stable results by explicitly minimizing the difference between two consecutive frames. We tested the proposed system in two different scenarios: one involving a single RGB-D sensor and upper-body reconstruction of an actor, and a second consisting of full-body 360-degree capture. Through extensive experimentation, we demonstrate how our system generalizes across unseen sequences and subjects.
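    The overall shape of the training objective described above, a semantically reweighted reconstruction term plus a temporal-stability term, can be sketched as follows. The weights and saliency mask are placeholders, not the paper's exact formulation.

```python
# Hedged sketch of a saliency-reweighted reconstruction loss with a temporal
# stability term; placeholder values, not LookinGood's exact objective.
import numpy as np


def lookingood_style_loss(pred, target, saliency, prev_pred, temporal_weight=0.1):
    """pred, target, prev_pred: (H, W, 3) images; saliency: (H, W) weights,
    larger on semantically important regions such as the face."""
    recon = np.mean(saliency[..., None] * np.abs(pred - target))
    temporal = np.mean(np.abs(pred - prev_pred))  # penalize frame-to-frame change
    return recon + temporal_weight * temporal


# Toy usage with a saliency map that upweights a central "face" region.
pred = np.random.rand(64, 64, 3)
target = np.random.rand(64, 64, 3)
prev_pred = np.random.rand(64, 64, 3)
saliency = np.ones((64, 64))
saliency[16:48, 16:48] = 4.0
print(lookingood_style_loss(pred, target, saliency, prev_pred))
```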
    Depth from motion for smartphone AR
    Julien Valentin
    Neal Wadhwa
    Max Dzitsiuk
    Michael John Schoenberg
    Vivek Verma
    Ambrus Csaszar
    Ivan Dryanovski
    Joao Afonso
    Jose Pascoal
    Konstantine Nicholas John Tsotsos
    Mira Angela Leung
    Mirko Schmidt
    Sameh Khamis
    Vladimir Tankovich
    Shahram Izadi
    Christoph Rhemann
    ACM Transactions on Graphics (2018)
    Abstract: Augmented reality (AR) for smartphones has matured from a technology for early adopters, available only on select high-end phones, to one that is truly available to the general public. One of the key breakthroughs has been in low-compute methods for six degree of freedom (6DoF) tracking on phones using only the existing hardware (camera and inertial sensors). 6DoF tracking is the cornerstone of smartphone AR, allowing virtual content to be precisely locked on top of the real world. However, to really give users the impression of believable AR, one requires mobile depth. Without depth, even simple effects such as a virtual object being correctly occluded by the real world are impossible. However, requiring a mobile depth sensor would severely restrict the access to such features. In this article, we provide a novel pipeline for mobile depth that supports a wide array of mobile phones, and uses only the existing monocular color sensor. Through several technical contributions, we provide the ability to compute low-latency dense depth maps using only a single CPU core of a wide range of (medium-high) mobile phones. We demonstrate the capabilities of our approach on high-level AR applications including real-time navigation and shopping.
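    At its geometric core, depth from motion reduces to two-view triangulation given the 6DoF poses from the phone's tracker and a pixel correspondence between two frames. The sketch below shows that underlying principle with a toy camera setup; the actual pipeline adds dense matching, filtering, and heavy optimization to reach low latency on a single CPU core.

```python
# Linear (DLT) two-view triangulation: the geometric principle underlying
# depth from motion, shown with a toy camera setup.
import numpy as np


def triangulate(P0: np.ndarray, P1: np.ndarray,
                x0: np.ndarray, x1: np.ndarray) -> np.ndarray:
    """P0, P1: 3x4 projection matrices; x0, x1: pixel coordinates (u, v)
    of the same point in each view. Returns the triangulated 3D point."""
    A = np.stack([
        x0[0] * P0[2] - P0[0],
        x0[1] * P0[2] - P0[1],
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


# Toy usage: identity camera and a camera translated 10 cm to the right,
# both with focal length 500 px and principal point (320, 240).
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P1 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])
X_true = np.array([0.2, 0.1, 2.0])
x0 = P0 @ np.append(X_true, 1); x0 = x0[:2] / x0[2]
x1 = P1 @ np.append(X_true, 1); x1 = x1[:2] / x1[2]
print(triangulate(P0, P1, x0, x1))  # ~[0.2, 0.1, 2.0]; depth Z ~ 2.0 m
```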