Cross-domain Correspondence Learning for Exemplar-based Image Translation
Published at: CVPR 2020
In this paper, the authors propose CoCosNet, an exemplar-based image translation network that learns dense cross-domain correspondence and outputs images resembling the fine structures of the exemplar at the instance level. The cross-domain correspondence and the image translation are learnt jointly with weak supervision, since the two tasks facilitate each other. Given an exemplar image, they focus on converting a semantic segmentation mask, an edge map, or pose keypoints into a photo-realistic image.
CoCosNet has two main sub-networks: 1) a cross-domain correspondence network, which transforms the inputs from different domains into an intermediate shared domain; 2) a translation network, which progressively synthesizes the output based on the warped exemplar. Take mask-to-image synthesis as an example. They first align the input semantic map and the reference style image (exemplar) through the encoder, and use the resulting features to compute the similarity between every pair of pixels in the two inputs. After obtaining the warped exemplar according to this similarity, the translation network uses positional normalization and spatially-variant denormalization (similar to AdaIN) to inject the style while generating the final image from a fixed noise z.
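To make the correspondence step concrete, below is a minimal PyTorch-style sketch (not the authors' code) of how a warped exemplar can be obtained: the shared-domain features of the semantic input and the exemplar are channel-normalized, a pixel-wise correlation matrix is computed, and the exemplar is softly warped toward the input layout via an attention-weighted sum. The function names and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def warp_exemplar(feat_input, feat_exemplar, exemplar_rgb, tau=0.01):
    """feat_input, feat_exemplar: (B, C, H, W) features in the shared domain.
    exemplar_rgb: (B, 3, H, W) downsampled exemplar image to be warped.
    tau: softmax temperature controlling the sharpness of the correspondence."""
    B, C, H, W = feat_input.shape
    # Flatten spatial dims and channel-normalize so the dot product is a cosine similarity.
    f_in = F.normalize(feat_input.view(B, C, H * W), dim=1)        # (B, C, HW)
    f_ex = F.normalize(feat_exemplar.view(B, C, H * W), dim=1)     # (B, C, HW)
    corr = torch.bmm(f_in.transpose(1, 2), f_ex)                   # (B, HW, HW) pixel-wise similarity
    attn = F.softmax(corr / tau, dim=-1)                           # weights over exemplar pixels
    ex = exemplar_rgb.view(B, 3, H * W)                            # (B, 3, HW)
    warped = torch.bmm(ex, attn.transpose(1, 2)).view(B, 3, H, W)  # attention-weighted warp
    return warped
```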
They apply a domain alignment loss and a correspondence regularization to guarantee that the inputs are aligned to the same domain and that the network learns a meaningful correspondence. They also use 1) a perceptual loss to minimize the semantic discrepancy, 2) a contextual loss to preserve the style information (color or texture) of the exemplar, 3) a feature matching loss to penalize the difference between the translation output and the ground truth for pseudo exemplar pairs, and 4) an adversarial loss to discriminate the translation output from real samples of the exemplar domain.
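As an illustration of one of these terms, here is a self-contained sketch of a VGG-based perceptual loss in the spirit of the paper; the layer choice (relu4_2) and the L1 distance are assumptions for illustration rather than the paper's exact configuration.

```python
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class PerceptualLoss(nn.Module):
    """L1 distance between deep VGG19 features of the output and the ground truth."""
    def __init__(self):
        super().__init__()
        # features[:23] ends at relu4_2 of VGG19 (index 22); frozen, eval mode.
        vgg = models.vgg19(weights="IMAGENET1K_V1").features[:23].eval()
        for p in vgg.parameters():
            p.requires_grad = False
        self.vgg = vgg

    def forward(self, output, target):
        # Deep features emphasize semantics rather than exact pixel values.
        return F.l1_loss(self.vgg(output), self.vgg(target))
```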
In their experiments, they select three datasets: ADE20K for the mask-to-image subtask, CelebA-HQ for the edge-to-image subtask, and DeepFashion for the keypoints-to-image subtask. They conduct quantitative and qualitative comparisons. The quantitative comparison covers three aspects: 1) image quality, measured with two metrics, FID for the distance between feature distributions and SWD for the statistical distance of low-level patch distributions; 2) semantic consistency, measured with the relu3_2, relu4_2 and relu5_2 features of VGG19; 3) color and texture distance, measured with the relu1_2 and relu2_2 features between semantically corresponding patches.
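Below is a rough sketch of how such a semantic-consistency score could be computed, assuming it is the average cosine similarity between high-level VGG19 features of the output and the ground truth; the layer index mapping, pre-processing and aggregation are assumptions and may differ from the paper's exact protocol.

```python
import torch
import torch.nn.functional as F
from torchvision import models
from torchvision.models.feature_extraction import create_feature_extractor

# Assumed indices in vgg19.features: 13 ~ relu3_2, 22 ~ relu4_2, 31 ~ relu5_2.
_nodes = {"features.13": "relu3_2", "features.22": "relu4_2", "features.31": "relu5_2"}
_vgg = create_feature_extractor(models.vgg19(weights="IMAGENET1K_V1").eval(),
                                return_nodes=_nodes)

@torch.no_grad()
def semantic_consistency(output, ground_truth):
    """output, ground_truth: (B, 3, H, W) images already normalized with ImageNet statistics.
    Returns the mean cosine similarity of high-level VGG19 features."""
    f_out, f_gt = _vgg(output), _vgg(ground_truth)
    scores = [
        F.cosine_similarity(f_out[k].flatten(1), f_gt[k].flatten(1), dim=1).mean()
        for k in _nodes.values()
    ]
    return torch.stack(scores).mean()
```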
They also present two interesting applications of their work. One is image editing: by manipulating the segmentation layout, it is feasible to obtain a translation that reflects the same content manipulation. The other is makeup transfer.
Pros:
- It is a general framework for exemplar-based image translation. The development of image translation follows a clear trajectory, from paired supervision to unsupervised translation, and then to multi-modal translation. Since then, image translation has been further expanded toward higher resolution, higher quality, video, few-shot adaptation, etc. But two main problems remain: 1) the style of the generated image is unpredictable, and the user cannot specify the style of a specific instance; 2) the outputs of existing methods often contain obvious artifacts. Their method effectively addresses both problems.
- They model the input image and the exemplar image as belonging to two distinct domains. Translating images between distinct domains is a general formulation that can generalize to many different kinds of inputs.
- Their quantitative experiments are comprehensive, and the three aspects they consider are highly relevant to their task.
Cons:
- As the translation is based on the warped exemplar, it is essential that the exemplar contains the same semantic labels as the mask in the mask-to-image task, which limits the applications.
- In their ablation study, the feature matching loss brings little improvement, and it seems that L_feat is only applied to pseudo exemplar pairs, so I wonder whether it is necessary to include this loss.