Associating Objects and their Effects in Unconstrained Monocular Video

Erika Lu
Zhengqi Li
Leonid Sigal
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2023
Google Scholar

Abstract

We propose a method to decompose a video into a back-
ground and a set of foreground layers, where the back-
ground captures stationary elements while the foreground
layers capture moving objects along with their associated
effects (e.g. shadows and reflections). Our approach is de-
signed for unconstrained monocular videos, with arbitrary
camera and object motion. Prior work that tackles this
problem assumes that the video can be mapped onto a fixed
2D canvas, severely limiting the possible space of camera
motion. Instead, our method applies recent progress in
monocular camera pose and depth estimation to create a
full, RGBD video layer for the background, along with a
video layer for each foreground object. To solve the under-
constrained decomposition problem, we propose a new loss
formulation based on multi-view consistency. We test our
method on challenging videos with complex camera motion
and show significant qualitative improvement over current
methods.

Research Areas