MaskSketch: Unpaired Structure-guided Masked Image Generation

Dina Bashkirova
Kate Saenko
Kihyuk Sohn
CVPR 2023

Abstract

Recent conditional image generation methods produce images of
remarkable diversity, fidelity and realism. However, the majority of
these methods allow conditioning only on labels or text prompts, which
limits their level of control over the generation result. In this
paper, we introduce MaskSketch, a masked image generation method that
allows spatial conditioning of the generation result, using a guiding
sketch as an extra conditioning signal during sampling. MaskSketch
utilizes a pre-trained masked image generator, requires no model
training or paired supervision, and works with input sketches of
different levels of abstraction. We propose a novel parallel sampling
scheme that leverages the structural information encoded in the
intermediate self-attention maps of a masked generative transformer,
such as scene layout and object shape. Our results show that
MaskSketch achieves high image realism and fidelity to the guiding
structure. Evaluated on standard benchmark datasets, MaskSketch
outperforms state-of-the-art methods for sketch-to-image translation,
as well as generic image-to-image translation approaches.
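
The idea of using self-attention maps as a structure signal during parallel sampling can be illustrated with a toy sketch. This is not the authors' algorithm: the function names, the L2 distance, the `weight` parameter, and the scoring rule (token confidence minus weighted attention distance) are all assumptions made for illustration. It only shows how one step of a masked parallel decoder might decide which sampled tokens to keep when their attention rows are compared against those of a guiding sketch.

```python
import numpy as np

def attention_distance(attn_sample, attn_sketch):
    """Per-token L2 distance between rows of two (N x N) attention maps.

    Hypothetical structure measure: a small distance means the token attends
    to the same positions in the sample as in the guiding sketch.
    """
    return np.linalg.norm(attn_sample - attn_sketch, axis=-1)

def select_tokens_to_keep(confidence, attn_sample, attn_sketch, num_keep, weight=1.0):
    """Return sorted indices of the num_keep tokens with the best combined score.

    Assumed score: confidence - weight * attention distance. The remaining
    tokens would be re-masked and resampled in the next decoding iteration.
    """
    dist = attention_distance(attn_sample, attn_sketch)
    score = confidence - weight * dist
    order = np.argsort(-score, kind="stable")  # highest score first
    return np.sort(order[:num_keep])

# Toy example with 4 tokens: token 0 is the most confident, but its attention
# row disagrees with the sketch, so the structure term pushes it out.
confidence = np.array([0.9, 0.8, 0.7, 0.6])
attn_sketch = np.eye(4)
attn_sample = np.eye(4)
attn_sample[0] = [0.0, 1.0, 0.0, 0.0]

kept_guided = select_tokens_to_keep(confidence, attn_sample, attn_sketch, num_keep=2)
kept_unguided = select_tokens_to_keep(confidence, attn_sample, attn_sketch, num_keep=2, weight=0.0)
print(kept_guided, kept_unguided)
```

Setting `weight=0.0` reduces the rule to plain confidence-based selection, which makes the effect of the structure term easy to compare in isolation.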
