Daily Reading 20200510

Toward Multimodal Image-to-Image Translation

posted on: NIPS 2017

In this paper, they point out that in image-to-image translation a single input may correspond to many possible outputs, making the task inherently multimodal. They therefore propose to learn the distribution of possible outputs, introducing a hybrid model, BicycleGAN, that combines cVAE-GAN and cLR-GAN. cVAE-GAN learns the latent distribution of output images through a VAE and thus models the multi-style output distribution: it starts from the ground-truth target image B, encodes it into the latent space, and the generator then attempts to map the input image A, together with the sampled code z, back to the original image B. cLR-GAN works in the opposite direction: it randomly samples a latent code from a known distribution, uses this code together with A to generate an output, and then attempts to reconstruct the latent code from that output; feeding the generated image back into the encoder should return the same latent code, which enforces self-consistency.
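Below is a minimal PyTorch sketch of the two cycles. The Encoder and Generator here are toy stand-ins (the paper uses a ResNet-style encoder and a U-Net generator), the adversarial terms are omitted, and all shapes and hyperparameters are illustrative assumptions, not the paper's settings:

```python
import torch
import torch.nn as nn

Z_DIM = 8  # latent code size; the paper uses a low-dimensional code

class Encoder(nn.Module):
    """Maps an image B to the parameters (mu, logvar) of q(z|B)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fc_mu = nn.Linear(128, Z_DIM)
        self.fc_logvar = nn.Linear(128, Z_DIM)

    def forward(self, b):
        h = self.features(b)
        return self.fc_mu(h), self.fc_logvar(h)

class Generator(nn.Module):
    """Maps (A, z) to an output image; z is tiled and concatenated with A."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + Z_DIM, 64, 3, 1, 1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, 1, 1), nn.Tanh())

    def forward(self, a, z):
        z_map = z.view(z.size(0), Z_DIM, 1, 1).expand(-1, -1, a.size(2), a.size(3))
        return self.net(torch.cat([a, z_map], dim=1))

E, G = Encoder(), Generator()
A = torch.randn(4, 3, 64, 64)  # input image batch (toy data)
B = torch.randn(4, 3, 64, 64)  # ground-truth target batch (toy data)

# cVAE-GAN cycle: B -> E(B) -> z, then G(A, z) should reconstruct B.
mu, logvar = E(B)
z_enc = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
B_hat = G(A, z_enc)
recon_loss = (B_hat - B).abs().mean()                     # L1 image reconstruction
kl_loss = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()

# cLR-GAN cycle: z ~ N(0, I) -> G(A, z), then E should recover z.
z = torch.randn(4, Z_DIM)
B_gen = G(A, z)
mu_hat, _ = E(B_gen)
latent_loss = (mu_hat - z).abs().mean()                   # L1 latent recovery
```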

They perform quantitative and qualitative comparisons. For the quantitative comparison, they measure diversity using the average LPIPS distance between sampled outputs, and realism using a real-vs.-fake perceptual study on Amazon Mechanical Turk (AMT).
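The diversity score can be reproduced along these lines. This is a hedged sketch that assumes the `lpips` PyPI package (the reference implementation of the LPIPS metric), not the paper's exact evaluation code; `average_lpips` is a hypothetical helper:

```python
import itertools
import torch
import lpips  # pip install lpips; perceptual metric of Zhang et al.

metric = lpips.LPIPS(net='alex')  # AlexNet-based variant

def average_lpips(samples):
    """Mean pairwise LPIPS distance over outputs sampled for one input.

    `samples` is a tensor of shape (N, 3, H, W) scaled to [-1, 1];
    higher values indicate more diverse outputs.
    """
    dists = [metric(samples[i:i + 1], samples[j:j + 1]).item()
             for i, j in itertools.combinations(range(samples.size(0)), 2)]
    return sum(dists) / len(dists)

# e.g. generate several outputs for one input A with different codes z
# (see the sketch above), stack them into `samples`, then:
# print(average_lpips(samples))
```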

Pros:

  1. Their method reframes image-to-image translation as a multimodal problem, learning a distribution over possible outputs instead of a single deterministic mapping.

  2. Combining multiple objectives to encourage a bijective mapping between the latent and output spaces helps address mode collapse in image generation; the combined objective is sketched below.
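For reference, the hybrid objective combines the adversarial, image-reconstruction, latent-recovery, and KL terms from both cycles; in the paper's notation it reads roughly:

$$
G^{*}, E^{*} = \arg\min_{G,E}\max_{D}\;
\mathcal{L}_{\mathrm{GAN}}^{\mathrm{VAE}}
+ \lambda\,\mathcal{L}_{1}^{\mathrm{VAE}}
+ \mathcal{L}_{\mathrm{GAN}}
+ \lambda_{\mathrm{latent}}\,\mathcal{L}_{1}^{\mathrm{latent}}
+ \lambda_{\mathrm{KL}}\,\mathcal{L}_{\mathrm{KL}}
$$

Here the first two terms come from the cVAE-GAN cycle, the next two from the cLR-GAN cycle, and the KL term regularizes the encoder's latent distribution toward the sampling prior.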