Daily Reading 20200522

Shapes and Context: In-the-Wild Image Synthesis & Manipulation

published at: CVPR 2019

Recent work in image synthesis and image manipulation is dominated by learning-based parametric methods. This paper instead proposes a data-driven, non-parametric model that requires no learning and interactively synthesizes in-the-wild images from input semantic label masks. The model is controllable and interpretable, proceeding in four stages: (1) global scene context: filter the list of training examples by their labels and the pixel overlap of those labels; (2) instance shape consistency: search for boundaries and extract shapes with similar context; (3) local part consistency: a finer-grained constraint applied when the global shape cannot be matched; (4) pixel-level consistency: similar to part consistency, fill the remaining holes left after stages (2) and (3).
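
To make stage (1) concrete, here is a minimal sketch of the global-context filtering as I read it: rank training examples by shared label set plus pixel-level label overlap, and keep the top candidates for the later shape-matching stages. The scoring function, the equal weighting, and the helper names are my own assumptions, not the authors' implementation.

```python
# Minimal sketch of stage (1), global scene context filtering.
# Semantic masks are assumed to be 2-D integer arrays of class IDs.
import numpy as np

def context_score(query_mask: np.ndarray, train_mask: np.ndarray) -> float:
    """Score a training example by how well its labels and layout match the query."""
    q_labels, t_labels = set(np.unique(query_mask)), set(np.unique(train_mask))
    # Label-set similarity: fraction of query classes present in the training image.
    label_sim = len(q_labels & t_labels) / max(len(q_labels), 1)
    # Pixel-level overlap: fraction of pixels whose class ID agrees.
    pixel_sim = float(np.mean(query_mask == train_mask))
    return 0.5 * label_sim + 0.5 * pixel_sim  # equal weighting is an assumption

def filter_candidates(query_mask, train_masks, k=20):
    """Keep the top-k training masks as the candidate pool for later stages."""
    scores = [context_score(query_mask, m) for m in train_masks]
    order = np.argsort(scores)[::-1]
    return [int(i) for i in order[:k]]

# Toy usage: 4x4 masks with classes {0: sky, 1: road}.
query = np.array([[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]])
bank = [query.copy(), np.zeros((4, 4), dtype=int)]
print(filter_candidates(query, bank, k=1))  # -> [0], the identical layout wins
```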

In their quantitative comparison, they measure image realism with FID scores and measure image quality by comparing the segmentation output of the synthesized image against that of the original. Compared with pix2pix and pix2pix-HD, their method generates images that are both high quality and realistic. In their qualitative comparison, a user study indicates that their results are preferred over those of pix2pix, and the method can generate diverse outputs without additional effort.
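
Below is a rough sketch of how these two metrics could be computed, assuming torchmetrics (with its torch-fidelity backend) is available for FID and that a pretrained segmentation network produces the predicted mask; the random tensors are placeholders and the mIoU helper is a generic implementation, not the paper's exact protocol.

```python
# Sketch of the two quantitative checks: FID for realism, segmentation
# consistency (mIoU against the input label mask) for quality.
import numpy as np
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

def mean_iou(pred_mask: np.ndarray, gt_mask: np.ndarray, num_classes: int) -> float:
    """Mean IoU between a predicted segmentation and the input label mask."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred_mask == c, gt_mask == c).sum()
        union = np.logical_or(pred_mask == c, gt_mask == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0

# FID: feed real and synthesized images as uint8 tensors of shape (N, 3, H, W).
fid = FrechetInceptionDistance(feature=2048)
real = torch.randint(0, 256, (8, 3, 299, 299), dtype=torch.uint8)  # placeholder batch
fake = torch.randint(0, 256, (8, 3, 299, 299), dtype=torch.uint8)  # placeholder batch
fid.update(real, real=True)
fid.update(fake, real=False)
print("FID:", fid.compute().item())

# Segmentation consistency: segment the synthesized image (pretrained network
# omitted here) and compare against the input semantic label mask.
pred = np.random.randint(0, 3, (64, 64))  # stand-in for the segmenter's output
gt = np.random.randint(0, 3, (64, 64))    # the input semantic mask
print("mIoU:", mean_iou(pred, gt, num_classes=3))
```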

Pros:

  1. Compared to parametric methods, their work has notable advantages: 1) it is not limited to a specific training dataset or distribution; 2) it performs better as more data is given, whereas parametric methods tend to perform worse; 3) it can generate arbitrarily high-resolution images; 4) it can generate an exponentially large set of viable synthesized images; 5) it is highly controllable and interpretable.

Cons:

  1. The synthesized images have good structural and semantic consistency, but the appearance of different instances is inconsistent, which makes the results visually unpleasant.