
[2017 ICLR] A Learned Representation For Artistic Style #198

Jasonlee1995 opened this issue May 9, 2024 · 0 comments
Labels: Generative (Generative Modeling), Vision (Related with Computer Vision tasks)


Style transfer task : render an image in the style of another one
(preserve the content of the source image while applying a new style)

Previous works had to train one style transfer network per single style

This paper shows that a single style transfer network can perform style transfer for multiple styles

The paper's three main contributions are as follows

  1. Proposes conditional instance normalization
    conditional instance normalization constitutes a simple, efficient and scalable modification of style transfer networks that allows them to model multiple styles at the same time
  2. Conditional instance normalization is simple yet flexible
    despite its simplicity, the method is flexible enough to capture very different styles while having very little impact on training time and final performance of the trained network
  3. Learned style representations can be combined to generate arbitrary artistic styles
    learned representation of style is useful in arbitrarily combining artistic styles

Below is a brief summary of the parts I consider important

1. Style transfer with deep networks

Previous works

Style transfer : find a pastiche image p whose content is similar to that of a content image c but whose style is similar to that of a style image s

This leaves the question of how to measure whether the content is similar and whether the style is similar

Empirical observation 1 : high-level features in classifiers correspond to higher levels of abstractions
Empirical observation 2 : artistic style of a painting may be interpreted as a visual texture

Using these observations, a neural algorithm for the artistic style process can be set up as follows

  1. two images are similar in content if their high-level features are close in Euclidean distance
  2. two images are similar in style if their low-level features share the same statistics
    (difference between the features' Gram matrices has a small Frobenius norm)
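The Gram-matrix criterion for style similarity can be sketched in a few lines of numpy; `gram_matrix` and `style_distance` are illustrative names, and the feature maps would come from a pre-trained classifier in the actual method:

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature map: channel-wise feature correlations."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)

def style_distance(feat_a, feat_b):
    """Frobenius norm of the Gram-matrix difference (the style-loss criterion)."""
    return np.linalg.norm(gram_matrix(feat_a) - gram_matrix(feat_b))
```

Because the Gram matrix discards spatial positions, it captures which features co-occur (texture statistics) rather than where they occur.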

Previous works follow this formulation and fall into two main branches → direct image optimization, style transfer network

1. Direct image optimization

source image와 content loss + style image와 style loss가 minimize하도록 initialized image를 update
단점 : 생성 속도가 느림

2. Style transfer network

When the style transfer network receives a content image, it generates a pastiche image in a single forward pass
(input : content image, output : pastiche image)
Downside : the style transfer network is tied to one specific painting style
(producing N styles requires N style transfer networks)

Instance normalization

instance normalization = contrast normalization
Given an (N, C, H, W) tensor, normalize over the spatial axes H, W
i.e. normalize the (H, W) slice of each channel of each image
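A minimal numpy sketch of instance normalization as described above (each image, each channel, normalized over its own spatial statistics); `eps` is the usual small constant for numerical stability:

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each (H, W) slice of an (N, C, H, W) tensor independently."""
    mean = x.mean(axis=(2, 3), keepdims=True)  # per-image, per-channel mean
    var = x.var(axis=(2, 3), keepdims=True)    # per-image, per-channel variance
    return (x - mean) / np.sqrt(var + eps)
```

Unlike batch normalization, the statistics here do not mix information across images in the batch, which is why it acts as a per-image contrast normalization.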

1.1. N-styles feedforward style transfer networks


The simplest way to overcome the limitation of previous style transfer networks is to train a single conditional style transfer network T(c, s) that also takes the style as input

The training procedure is therefore

  1. Given a content image c and a style image s, generate a pastiche image p via the style transfer network T(c, s)
  2. Compute the content loss between the pastiche image p and the content image c (pre-trained VGG-16)
  3. Compute the style loss between the pastiche image p and the style image s (pre-trained VGG-16)
  4. Update T(c, s) to minimize content loss + style loss
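The objective in steps 2–4 can be sketched as below. This is a simplified stand-in: in the paper the features come from multiple layers of a pre-trained VGG-16, whereas here `feat_p`, `feat_c`, `feat_s` are single (C, H, W) feature maps, and `style_weight` is an assumed balancing hyperparameter:

```python
import numpy as np

def gram(f):
    """Gram matrix of a (C, H, W) feature map."""
    c, h, w = f.shape
    m = f.reshape(c, h * w)
    return m @ m.T / (h * w)

def total_loss(feat_p, feat_c, feat_s, style_weight=1.0):
    """Content loss: squared distance between pastiche and content features.
    Style loss: squared Frobenius norm between pastiche and style Gram matrices."""
    content_loss = np.mean((feat_p - feat_c) ** 2)
    style_loss = np.sum((gram(feat_p) - gram(feat_s)) ** 2)
    return content_loss + style_weight * style_loss
```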

The natural question here is how to do the conditioning

After experimenting with various options, the authors found that it suffices to specialize only the scaling and shifting parameters applied after normalization
(all CNN weight parameters are shared; only the normalization parameters differ per style)

The authors call this approach conditional instance normalization

Conditional instance normalization details

Think of it as instance normalization followed by a per-style scaling + shifting
(the scaling and shifting parameters are vectors of dimension num_channels)
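A minimal sketch of conditional instance normalization: `gamma` and `beta` are banks of per-style, per-channel parameters, and `style_idx` selects which style's row to apply (names are illustrative, not from the paper's code):

```python
import numpy as np

def conditional_instance_norm(x, gamma, beta, style_idx, eps=1e-5):
    """x: (N, C, H, W); gamma, beta: (num_styles, C) parameter banks.
    Shared normalization, style-specific affine transform."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    z = (x - mean) / np.sqrt(var + eps)
    g = gamma[style_idx].reshape(1, -1, 1, 1)  # per-channel scale for this style
    b = beta[style_idx].reshape(1, -1, 1, 1)   # per-channel shift for this style
    return g * z + b
```

Only the (num_styles, C) tables grow when styles are added; every convolutional weight stays shared, which is what makes the approach scalable.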

2. Experimental results

2.1. Methodology

Two key details

  1. zero-padding is replaced with mirror-padding
    to avoid border patterns caused by zero-padding
  2. transposed convolutions are replaced with nearest-neighbor upsampling + convolution
    to avoid checkerboard patterning caused by transposed convolutions
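Both details above can be sketched with numpy; `mirror_pad` and `nn_upsample` are illustrative names (in practice these would be layers of the transfer network, with a convolution following the upsampling):

```python
import numpy as np

def mirror_pad(x, pad):
    """Reflect-pad the spatial axes of an (N, C, H, W) tensor,
    avoiding the border artifacts that zero-padding introduces."""
    return np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode="reflect")

def nn_upsample(x, factor=2):
    """Nearest-neighbor upsampling: repeat each pixel factor times along H and W.
    Followed by a plain convolution, this avoids the checkerboard pattern
    produced by transposed convolutions."""
    return x.repeat(factor, axis=2).repeat(factor, axis=3)
```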
2.2. Training a single network on N styles produces stylizations comparable to independently-trained models

Figure 4
the model captures different color palettes and textures


Figure 5 - left, center column
10-styles network converges as quickly as the single-style networks in terms of style loss, but lags slightly behind in terms of content loss

Figure 5 - right column
results produced by 10-styles network and single-style network
both results are qualitatively similar

2.3. The N-styles model is flexible enough to capture very different styles

Figure 1 (a)
the model appears capable of modeling all 32 styles despite the tremendous variation in color palette and spatial scale across the painting styles

2.4. The trained network generalizes across painting styles

Conclusion : to add a new style, freeze the existing network and train only the style parameters

Figure 6 - left
fine-tuning is much faster than training a new network from scratch

Figure 6 - right
even after fewer parameter updates, the fine-tuned model produces comparable results

2.5. The trained network can arbitrarily combine painting styles

Generate by linearly interpolating the style parameters of two different styles A and B
Conclusion : learned style representations can be combined to generate arbitrary artistic styles
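Since each style is fully described by its scaling and shifting vectors, blending styles reduces to interpolating those vectors; a minimal sketch (illustrative names, `alpha` assumed in [0, 1]):

```python
import numpy as np

def interpolate_style(gamma_a, beta_a, gamma_b, beta_b, alpha):
    """Blend the normalization parameters of two styles.
    alpha = 0 reproduces style A, alpha = 1 reproduces style B."""
    gamma = (1 - alpha) * gamma_a + alpha * gamma_b
    beta = (1 - alpha) * beta_a + alpha * beta_b
    return gamma, beta
```

Feeding the blended (gamma, beta) into the conditional instance normalization layers yields a pastiche in the intermediate style, with no retraining.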

Figure 7 - left column
a smooth transition occurs from style A to style B

Figure 7 - right column
measuring the style loss on the generated images shows a smooth fading out
