Daily Reading 20200514

Multimodal Unsupervised Image-to-Image Translation

published at: ECCV 2018

Image-to-image translation is often simplified to a deterministic one-to-one mapping, which makes it difficult to generate diverse outputs from a given source-domain image. To address this problem, the authors extend their previous work UNIT (a one-to-one mapping) to the multimodal setting by combining it with ideas from BicycleGAN. The image representation is decomposed into a content code that is domain-invariant and a style code that captures domain-specific properties. To translate an image, they recombine its content code with a random style code sampled from the style space of the target domain. On top of the two codes, they propose a bidirectional reconstruction loss with two parts: image reconstruction (an image encoded into its content and style codes should decode back to the original image) and latent reconstruction (re-encoding a translated image should recover the content code of the input and the style code that was sampled for it). For evaluation, besides a user study, they use the metrics LPIPS and CIS (a modified version of IS).
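The two reconstruction terms are easier to follow with a small example. The following is only a minimal sketch under loud assumptions: the class `ToyDomain`, the function `bidirectional_reconstruction_losses`, the linear random-weight "networks", and all dimensions are hypothetical stand-ins chosen for illustration, not the architecture or code from the paper. It shows one translation direction (domain A to domain B) and uses an L1 distance for every term.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes; real models work on image tensors.
IMG_DIM, CONTENT_DIM, STYLE_DIM = 16, 6, 2


class ToyDomain:
    """Hypothetical stand-in for one domain's content encoder, style encoder and decoder."""

    def __init__(self, rng):
        # Random linear maps instead of trained networks (assumption for illustration).
        self.W_content = rng.normal(size=(CONTENT_DIM, IMG_DIM)) * 0.1
        self.W_style = rng.normal(size=(STYLE_DIM, IMG_DIM)) * 0.1
        self.W_decode = rng.normal(size=(IMG_DIM, CONTENT_DIM + STYLE_DIM)) * 0.1

    def encode(self, x):
        # Split an image into a content code and a style code.
        return self.W_content @ x, self.W_style @ x

    def decode(self, content, style):
        # Recombine a content code and a style code into an image.
        return self.W_decode @ np.concatenate([content, style])


def l1(a, b):
    return float(np.mean(np.abs(a - b)))


def bidirectional_reconstruction_losses(x_src, src, tgt, rng):
    """Compute the three reconstruction terms for one direction (src -> tgt)."""
    content, style = src.encode(x_src)

    # Image reconstruction: encode, then decode within the same domain.
    loss_image = l1(src.decode(content, style), x_src)

    # Translation: keep the content code, sample a style code from a unit Gaussian prior.
    sampled_style = rng.normal(size=STYLE_DIM)
    x_translated = tgt.decode(content, sampled_style)

    # Latent reconstruction: re-encoding the translated image should
    # recover the input's content code and the sampled style code.
    content_rec, style_rec = tgt.encode(x_translated)
    losses = {
        "image": loss_image,
        "content": l1(content_rec, content),
        "style": l1(style_rec, sampled_style),
    }
    return losses, x_translated


domain_a, domain_b = ToyDomain(rng), ToyDomain(rng)
x_a = rng.normal(size=IMG_DIM)

losses, x_ab = bidirectional_reconstruction_losses(x_a, domain_a, domain_b, rng)
# A second call samples a different style code, so the translation differs.
_, x_ab_other = bidirectional_reconstruction_losses(x_a, domain_a, domain_b, rng)
print(losses)
```

Because the weights are random, the loss values themselves are meaningless here; the point is which quantities are compared. Calling the function twice on the same input gives two different translations, which is the 1-to-many behaviour the summary describes. A full objective would also include the reverse direction and adversarial terms, which this sketch leaves out.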

Pros:

  1. Moving from 1-to-1 to 1-to-many makes the formulation of the image-to-image translation task clearer and more reasonable.

  2. Their assumption of a style code and a content code is a suitable abstraction of image-to-image translation, and they use this assumption to solve a more challenging problem.

Cons:

  1. In another paper, it is mentioned that the style code lacks many details and is not that beneficial to image-to-image translation?