Daily Reading 20200509

Deep Exemplar-based Colorization

posted on: TOG2018

In this paper, the authors propose the first deep learning approach for exemplar-based local colorization, which directly selects, propagates, and predicts colors from an aligned reference for a gray-scale image.

Their network contains two sub-nets. 1) Similarity sub-net, a preprocessing step that measures the semantic similarity between the reference and the target. Feeding the luminance channels of the target and the reference to a gray VGG-19, they compute the cosine distance between the feature maps and output a similarity map. 2) Colorization sub-net, which handles colorization for both similar and dissimilar patch/pixel pairs. Taking the gray target, the aligned reference with its chrominance channels, and the similarity map as input, it predicts the ab channels of the target image. It contains two branches with two different loss functions, so that it predicts plausible colors both with and without a reliable reference. The chrominance loss in the chrominance branch computes the smooth L1 distance at each pixel to selectively propagate the correct reference colors. The perceptual loss in the perceptual branch minimizes the semantic difference between the predicted image and the target image when the reference is not reliable.
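To make the two per-pixel quantities above concrete, here is a minimal NumPy sketch. It is an illustration, not the paper's code: the function names, the `(C, H, W)` feature layout, and the `beta` threshold are assumptions, and random arrays stand in for real VGG-19 features. The first function returns cosine similarity (higher means closer); a cosine distance would be one minus that value.

```python
import numpy as np

def cosine_similarity_map(feat_t, feat_r, eps=1e-8):
    """Per-pixel cosine similarity between two (C, H, W) feature maps.

    Assumes the reference features are already aligned to the target,
    so position (y, x) in both maps refers to corresponding content.
    """
    dot = np.sum(feat_t * feat_r, axis=0)
    norm = np.linalg.norm(feat_t, axis=0) * np.linalg.norm(feat_r, axis=0)
    return dot / (norm + eps)  # shape (H, W), values roughly in [-1, 1]

def smooth_l1(pred_ab, ref_ab, beta=1.0):
    """Mean smooth L1 distance between predicted and reference ab channels.

    Quadratic for small differences, linear for large ones; `beta` is the
    switch point (a hypothetical default, not taken from the paper).
    """
    diff = np.abs(pred_ab - ref_ab)
    loss = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return loss.mean()

# Usage with stand-in data (random features instead of real VGG-19 outputs).
rng = np.random.default_rng(0)
feat_target = rng.normal(size=(64, 8, 8))
feat_reference = rng.normal(size=(64, 8, 8))
sim_map = cosine_similarity_map(feat_target, feat_reference)
print(sim_map.shape)
```

In this sketch the similarity map is just another input for the colorization network, while the smooth L1 term would only be used as a training loss on the chrominance branch.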

To recommend good references to the user, they also propose an image retrieval algorithm that finds a proper reference for a given target. They apply both global ranking (cosine distance between feature maps from the first fully connected layer) and local ranking (cosine distance between feature maps from relu5_2, plus the correlation coefficient between the luminance histograms of two local windows) to select proper candidates.
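The two ranking signals described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' retrieval pipeline: the function names, the 32-bin histogram, the `[0, 1]` luminance range, and `top_k` are all hypothetical, and the feature vectors would in practice come from a pretrained VGG-19. The code ranks by cosine similarity, which orders candidates the same way as ascending cosine distance.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two feature arrays, flattened to vectors."""
    a, b = np.ravel(a), np.ravel(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def histogram_correlation(lum_a, lum_b, bins=32):
    """Pearson correlation between the luminance histograms of two windows.

    Assumes luminance values are scaled to [0, 1]. The result is NaN if
    either histogram is constant across all bins.
    """
    h_a, _ = np.histogram(lum_a, bins=bins, range=(0.0, 1.0))
    h_b, _ = np.histogram(lum_b, bins=bins, range=(0.0, 1.0))
    return float(np.corrcoef(h_a, h_b)[0, 1])

def rank_candidates(target_feat, candidate_feats, top_k=3):
    """Return indices of the top_k candidates most similar to the target."""
    scores = [cosine_similarity(target_feat, f) for f in candidate_feats]
    order = np.argsort(scores)[::-1][:top_k]
    return [int(i) for i in order]

# Usage with toy feature vectors in place of fully connected layer outputs.
target = np.array([1.0, 0.0, 0.0])
candidates = [np.array([0.0, 1.0, 0.0]),
              np.array([1.0, 0.1, 0.0]),
              np.array([1.0, 0.0, 0.0])]
print(rank_candidates(target, candidates, top_k=2))
```

A global pass like `rank_candidates` would narrow the pool cheaply, and the local signals (relu5_2 features and `histogram_correlation` on local windows) would then re-rank the survivors.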

Pros:

  1. Instead of coloring the image from user strokes alone or purely by learning from large-scale data, the proposed method strikes a good balance between the controllability of interaction and the robustness of learning.

  2. For references with proper semantic correspondence, it propagates the correct colors to the target. For improper references, it can still generate a plausible result by predicting the dominant colors. This loosens the constraints on the choice of reference.