Deep Exemplar-based Video Colorization
posted on: CVPR 2019
This work presents the first exemplar-based video colorization algorithm. It follows ‘Deep Exemplar-based Colorization’ [1] but extends the idea to video: the network is recurrent and again contains two major sub-nets, a correspondence subnet and a colorization subnet. Compared with image colorization, the method must enforce temporal consistency in addition to color and semantic correspondence with the reference, which is why it adopts a recurrent structure and feeds the colorized result of the previous frame back as input.

The loss function is also similar. Besides the perceptual loss and smoothness loss used in [1], the authors introduce a contextual loss and add an adversarial loss and a temporal consistency loss. The contextual loss measures local feature similarity between the output frame and the reference through forward matching. In addition, to cover the common case where the reference is itself one of the video frames (so ground-truth colors are available), they add an L1 loss that pulls the output toward the ground truth.

For quantitative evaluation they compare against image colorization, automatic video colorization, and color propagation methods; for qualitative evaluation they conduct a user study.
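To make the recurrent structure concrete, below is a minimal PyTorch-style sketch of the inference loop, assuming Lab color space (the network predicts the ab channels given the L channel of each frame). The module names CorrespondenceNet and ColorizationNet, their toy layers, and the colorize_video helper are illustrative assumptions, not the authors' implementation; in the actual paper the correspondence subnet aligns deep features between the grayscale frame and the reference rather than using the simple convolutions shown here.

```python
import torch
import torch.nn as nn


class CorrespondenceNet(nn.Module):
    """Illustrative stand-in for the correspondence subnet: produces reference
    colors warped to the current frame plus a confidence map."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.encode_gray = nn.Conv2d(1, feat_dim, 3, padding=1)
        self.encode_ref = nn.Conv2d(3, feat_dim, 3, padding=1)
        self.to_ab = nn.Conv2d(2 * feat_dim, 2, 3, padding=1)
        self.to_conf = nn.Conv2d(2 * feat_dim, 1, 3, padding=1)

    def forward(self, gray, ref_lab):
        feats = torch.cat([self.encode_gray(gray), self.encode_ref(ref_lab)], dim=1)
        return self.to_ab(feats), torch.sigmoid(self.to_conf(feats))


class ColorizationNet(nn.Module):
    """Illustrative stand-in for the colorization subnet: predicts ab channels
    from the current luminance, the warped reference colors, and the previous
    frame's prediction (the recurrent input)."""
    def __init__(self, hidden=64):
        super().__init__()
        # input channels: L (1) + warped ab (2) + confidence (1) + previous ab (2)
        self.net = nn.Sequential(
            nn.Conv2d(6, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 2, 3, padding=1),
        )

    def forward(self, gray, warped_ab, conf, prev_ab):
        return self.net(torch.cat([gray, warped_ab, conf, prev_ab], dim=1))


def colorize_video(frames_l, ref_lab, corr_net, color_net):
    """Recurrent inference: each frame is conditioned on the reference and on
    the previous frame's output, which encourages temporal consistency."""
    prev_ab = torch.zeros_like(frames_l[0]).repeat(1, 2, 1, 1)  # start from neutral colors
    outputs = []
    for gray in frames_l:  # frames_l: list of (B, 1, H, W) L channels
        warped_ab, conf = corr_net(gray, ref_lab)
        prev_ab = color_net(gray, warped_ab, conf, prev_ab)
        outputs.append(torch.cat([gray, prev_ab], dim=1))  # full Lab frame
    return outputs


if __name__ == "__main__":
    # Toy usage with random tensors: one reference frame, four grayscale frames.
    frames = [torch.rand(1, 1, 64, 64) for _ in range(4)]
    ref = torch.rand(1, 3, 64, 64)
    lab_frames = colorize_video(frames, ref, CorrespondenceNet(), ColorizationNet())
    print(len(lab_frames), lab_frames[0].shape)  # 4 frames, each (1, 3, 64, 64)
```

During training, the losses described above (perceptual, smoothness, contextual, adversarial, temporal consistency, and the L1 term when the reference comes from the same video) would be computed on these per-frame outputs and summed with their respective weights.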