Design with Elene - First thoughts about DCT-Net

Domain-Calibrated Translation for portrait stylization, or DCT-Net for short, is a novel image translation architecture for few-shot portrait stylization. Its uniqueness lies in its ability to handle complex scenes: not only portraits but full-body shots as well. The new architecture can produce high-quality style transfer results, with an advanced ability to synthesize high-fidelity content and strong generality in handling complicated scenes. In the best case, it shouldn’t simplify occlusions, accessories, etc. As the researchers point out,

The main goal was to handle the challenge of few-shot learning based style transfer by adopting the key idea of “calibration first, translation later.” This strategy makes it easier to learn stable cross-domain translation and generate high-fidelity results.

The proposed DCT-Net consists of 3 modules: a content adapter borrowing the powerful prior from source photos to calibrate the content distribution of target samples; a geometry expansion module using affine transformations to release spatially semantic constraints; and a texture translation module leveraging samples produced by the calibrated distribution to learn a fine-grained conversion. (1)
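To make the geometry expansion idea a bit more concrete, here is a minimal sketch of what a random affine transformation looks like in code. It is only an illustration, not the authors' implementation: the function names, the choice of rotation plus uniform scaling, and the angle and scale ranges are all my own assumptions. The quoted description only says that affine transformations are used.

```python
import math
import random


def random_affine(center, max_angle_deg=30.0, scale_range=(0.8, 1.2), rng=random):
    """Build a 2x3 affine matrix that rotates and uniformly scales about `center`.

    The angle and scale ranges are illustrative assumptions, not values
    taken from the DCT-Net paper.
    """
    angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    scale = rng.uniform(*scale_range)
    a = scale * math.cos(angle)
    b = scale * math.sin(angle)
    cx, cy = center
    # x' = a*(x - cx) - b*(y - cy) + cx
    # y' = b*(x - cx) + a*(y - cy) + cy
    return [
        [a, -b, cx - a * cx + b * cy],
        [b, a, cy - b * cx - a * cy],
    ]


def apply_affine(matrix, point):
    """Map a single (x, y) point through a 2x3 affine matrix."""
    x, y = point
    (a, b, tx), (c, d, ty) = matrix
    return (a * x + b * y + tx, c * x + d * y + ty)


if __name__ == "__main__":
    # Transform the four corners of a hypothetical 128x128 face crop.
    matrix = random_affine(center=(64.0, 64.0), rng=random.Random(0))
    for corner in [(0, 0), (128, 0), (128, 128), (0, 128)]:
        print(corner, "->", apply_affine(matrix, corner))
```

Applying the same kind of random matrix to training images means that faces appear at varied sizes and tilts, which is one plausible reading of how "releasing spatially semantic constraints" helps the model cope with full-body shots later on.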

Framework

We’ve seen a couple of tools running on models like StyleGAN and BigGAN, for example Artbreeder, which has gained a lot of popularity lately. The main achievement of DCT-Net is that it not only translates arbitrary real faces into artistic portraits in the corresponding styles (e.g., 3D-cartoon, anime, and hand-drawn), but can also properly process full-body images with adaptive deformations (e.g., exaggerated facial features and faithful body textures).

Currently, the Gradio demo generates only in the anime style. I tried head-only, mid-body, full-body, and side-perspective shots, as well as different facial expressions. Here are the generations.

1. Half body. Strangely, it colorized the head.

model: Domink Sadoch

2. Head only. Didn’t simplify the environment; exaggerated the facial expression.

3. Side perspective. Loss of facial perception, not convincing enough.

4. Facial expressions. Loss of facial perception, bias.

Photo by Joseph Gonzalez on Unsplash

5. Full-body shot. Didn’t simplify the environment. Good execution on an arbitrary image, with complementary tones and adaptive deformations.

Photo by Dom Hill on Unsplash

...

I’d love to try different styles as well whenever they’re available. As for the idea itself, calibrating the biased target domain first and learning a fine-grained translation after that is, no doubt, innovative and interesting. Besides this, what excited me was the ability to generate full-body shots, which is quite rare. As someone who spends a lot of time gathering references from real-life people and the environment in general, I think this tool can be a time saver and will come in handy.