FILM: Frame Interpolation for Large Motion

Fitsum Reda
Eric Tabellion
Proceedings of the European conference on computer vision (ECCV) (2022)

We present a frame interpolation algorithm that synthesizes
an engaging slow-motion video from near-duplicate photos which often
exhibit large scene motion. Near-duplicate interpolation is an interesting
new application, but large motion poses challenges to existing methods.
To address this issue, we adapt a feature extractor that shares weights
across the scales, and present a “scale-agnostic” motion estimator. It
relies on the intuition that large motion at finer scales should be similar
to small motion at coarser scales, which boosts the number of available
pixels for large motion supervision. To inpaint wide disocclusions caused
by large motion and synthesize crisp frames, we propose to optimize
our network with the Gram matrix loss that measures the correlation
difference between features. To simplify the training process, we further
propose a unified single-network approach that removes the reliance on
additional optical-flow or depth networks and is trainable from frame
triplets alone. Our approach outperforms state-of-the-art methods on
the Xiph large-motion benchmark while performing favorably on Vimeo90K,
Middlebury and UCF101. Source code and pre-trained models are
available at
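
The Gram matrix loss named in the abstract can be illustrated with a minimal NumPy sketch. It assumes feature maps of shape (H, W, C) and compares their channel correlation matrices with a mean squared difference. The function names `gram_matrix` and `gram_loss`, the normalization by the number of spatial positions, and the use of plain arrays in place of features from a pretrained network are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def gram_matrix(features):
    """Channel-by-channel correlation (Gram) matrix of an (H, W, C) feature map."""
    h, w, c = features.shape
    flat = features.reshape(h * w, c)
    # Normalizing by the number of spatial positions is an assumption of this sketch.
    return flat.T @ flat / (h * w)


def gram_loss(pred_features, target_features):
    """Mean squared difference between the Gram matrices of two feature maps."""
    diff = gram_matrix(pred_features) - gram_matrix(target_features)
    return float(np.mean(diff ** 2))


# Hypothetical stand-ins for features of a predicted and a ground-truth frame.
rng = np.random.default_rng(0)
pred = rng.standard_normal((8, 8, 4))
target = rng.standard_normal((8, 8, 4))
loss = gram_loss(pred, target)
```

Because the Gram matrix sums over spatial positions, the loss compares feature correlations rather than pixel-aligned values: shuffling the spatial positions of a feature map leaves its Gram matrix unchanged.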
