Mark J. Matthews

Mark Matthews is a senior software engineer in Google Research. His research focuses on neural radiance fields and synthetic data generation. Prior to joining Google in 2017, he worked in research and development at DreamWorks Animation, specializing in volumetric rendering and in hair and fluid simulation. He has 22 feature film credits, including "How to Train Your Dragon", "Kung Fu Panda", and "Shrek Forever After". He is also the inventor on a hair data compression patent and the recipient of two DreamWorks Technical Achievement Awards.
Authored Publications
    Marginalized Bundle Adjustment: Multi-View Camera Pose from Monocular Depth Estimates
    Shengjie Zhu
    Xiaoming Liu
    Vincent Chu
    International Conference on 3D Vision (2026)
    Structure-from-Motion (SfM) is a classical 3D vision task for recovering camera parameters and scene geometry from multi-view images. Recent advances in deep learning enable accurate monocular depth estimation (MDE) that infers structure from a single image without depending on camera motion. But integrating MDE into SfM remains challenging. Unlike classical triangulated sparse pointclouds, MDE produces dense depthmaps with significantly higher error variance. Inspired by modern RANSAC estimators, we propose a Marginalized Bundle Adjustment (MBA) to accommodate MDE error variance with its density. With MBA, we show that MDE depthmaps are sufficiently accurate to support SoTA or competitive results in Structure-from-Motion and camera relocalization. Our benchmark demonstrates consistent remarkable results from two-view, few-frames small multiview, to thousands-frames large multiview system. Our method highlights the significant potential of MDE on multi-view 3D vision tasks.
    LOLNeRF: Learn from One Look
    Daniel Rebain
    Kwang Yi
    Dmitry Lagun
    Andrea Tagliasacchi
    Computer Vision and Pattern Recognition (CVPR) (2022)
    We present a method for learning a generative 3D model based on neural radiance fields, trained solely from single-views of objects. While generating realistic images is no longer a difficult task, producing the corresponding 3D structure such that they can be rendered from different views is non-trivial. Here, we show that, unlike existing methods, one does not need any multi-view data to achieve this goal. Specifically, we show that by learning to reconstruct many images aligned to an approximate canonical pose, with a single network conditioned on a shared latent space, you can learn a space of radiance fields that models the shape and appearance of a class of objects. We demonstrate this by training models to reconstruct a number of object categories including humans, cats, and cars, all using datasets that contain only single views of each subject and no depth or geometry information. Our experiments show that this method achieves state-of-the-art results in novel view synthesis and monocular depth prediction.