[2019 NeurIPS Oral] Generative Modeling by Estimating Gradients of the Data Distribution #126

Jasonlee1995 · 2023-04-22T13:35:35Z

The first paper to propose score-based generative models

The naive approach is to train a score matching network and generate samples with Langevin dynamics

The naive approach runs into problems, however, so the authors use Noise Conditional Score Networks (NCSN) together with sampling via annealed Langevin dynamics

Achieves sota on the CIFAR-10 dataset in terms of Inception score, and the trained score network can also be used for image inpainting

Only the parts I consider important are summarized briefly

1. Introduction

background

Among generative models, likelihood-based models and GANs have achieved notable results, but they have the following intrinsic limitations

Likelihood-based models
specialized architecture (ex. autoregressive model, flow model)
surrogate loss (ex. VAE, energy-based model)
GAN
unstable training due to adversarial training procedure
GAN objective is not suitable for evaluating or comparing different GAN models

Other generative models exist as well, but they work well only on low-dimensional data

score-based generative model + 2 main challenges

The naive approach to the score-based generative model proposed by the authors is as follows

Naive approach : score matching + sample generation using Langevin dynamics
score matching : train a neural net to estimate the score, i.e. the gradient of the log-density at an input data point
sample generation using Langevin dynamics : use the estimated score to move a random initial sample toward a high density region

However, applying the naive approach runs into 2 main challenges

When the data distribution lies on a low dimensional manifold
the score is undefined in the ambient space, so score matching does not give a consistent score estimator
(for reference, saying the score is undefined in the ambient space means: the data lives on a low dimensional manifold but is represented in a high dimensional space, so the density has no support off the manifold and the gradient of the log-density cannot be taken in the ambient directions, which breaks score estimation)
When training data is scarce in low density regions
score estimation is inaccurate in low density regions, which slows down Langevin dynamics sampling
(if generation is initialized in a low-density region of the data distribution, the inaccurate score means many more steps are needed)

What the 2 main challenges have in common is that the score cannot be approximated well because the data is high dimensional

proposed method - NCSN

The authors solve these problems by perturbing the data with random Gaussian noise of various magnitudes

Once perturbed with random noise, the distribution is no longer confined to a low dimensional manifold

In addition, perturbing with a large noise level improves the accuracy of the score in the low density regions of the original unperturbed data distribution

The authors train a single neural network conditioned on the noise level, and call it Noise Conditional Score Networks (NCSN)

With the score now estimated accurately, what remains is sample generation

Start from a random initial sample, move it using the score of the highest noise level, then the score of the second highest noise level, ..., and finally the score of a noise level small enough that the perturbed distribution is indistinguishable from the original one

In other words, the sample is moved using scores whose noise level is gradually annealed down, and the authors call this sampling with annealed Langevin dynamics

The advantages of this method are as follows

Flexible model architectures can be used
No MCMC sampling during training (MCMC sampling is used only at sample generation time)
No adversarial training during training
No approximation during training (ex. the ELBO in VAE)
Models can be compared using the objective
Sample quality is comparable to existing likelihood-based models and GANs
Image inpainting can be performed without any additional training

2. Score-based generative modeling

train a score network → generate samples using Langevin dynamics + the estimated score

preliminary

score matching

Score matching is a pre-existing concept, originally used to train non-normalized statistical models

The authors repurpose score matching for score estimation: instead of estimating $p(\mathbf{x})$, a score network $s_{\theta}(\mathbf{x})$ is trained to estimate $\nabla_{\mathbf{x}} \log p(\mathbf{x})$

Training minimizes the difference between the score predicted by the score network and the score of the data distribution

However, the score of the data distribution is not available, so with some assumptions and a derivation the objective function can be obtained as follows
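For reference, a sketch of the objective in the notation used above. The left side is the explicit score matching loss, which needs the unknown data score; the right side is the equivalent form (up to a constant that does not depend on $\theta$) that only needs the score network and its Jacobian.

$$
\frac{1}{2}\,\mathbb{E}_{p(\mathbf{x})}\left[\left\| s_{\theta}(\mathbf{x}) - \nabla_{\mathbf{x}} \log p(\mathbf{x}) \right\|_2^2\right]
= \mathbb{E}_{p(\mathbf{x})}\left[\mathrm{tr}\left(\nabla_{\mathbf{x}} s_{\theta}(\mathbf{x})\right) + \frac{1}{2}\left\| s_{\theta}(\mathbf{x}) \right\|_2^2\right] + \mathrm{const}
$$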

objective function proof

denoising score matching

For high dimensional data, computing $\mathrm{tr}(\nabla_{\mathbf{x}} s_{\theta}(\mathbf{x}))$ is computationally expensive

The paper introduces 2 popular large scale score matching methods that avoid computing $\mathrm{tr}(\nabla_{\mathbf{x}} s_{\theta}(\mathbf{x}))$

→ denoising score matching, sliced score matching

The authors say they used denoising score matching because sliced score matching requires more computation
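A minimal sketch of denoising score matching on toy 1-D data, assuming a Gaussian perturbation kernel. With $\tilde{x} = x + \sigma z$, the regression target is the score of the kernel, $-(\tilde{x} - x)/\sigma^2$, so no trace term is needed. The linear model, the function name `dsm_linear_fit`, and the toy data are assumptions made for illustration; the paper trains a neural network instead.

```python
import numpy as np

def dsm_linear_fit(x, sigma, rng):
    """Fit s(x_tilde) = a * x_tilde + b by minimizing the denoising score matching loss.

    For the Gaussian perturbation x_tilde = x + sigma * z, the regression target
    is the score of the perturbation kernel: -(x_tilde - x) / sigma**2.
    """
    z = rng.standard_normal(x.shape)
    x_tilde = x + sigma * z
    target = -(x_tilde - x) / sigma**2
    # For a linear model the DSM loss 0.5 * E[(a * x_tilde + b - target)^2]
    # is an ordinary least-squares problem, so it is solved in closed form here.
    design = np.stack([x_tilde, np.ones_like(x_tilde)], axis=1)
    (a, b), *_ = np.linalg.lstsq(design, target, rcond=None)
    return a, b

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)       # toy data: p(x) = N(0, 1)
sigma = 0.5
a, b = dsm_linear_fit(x, sigma, rng)

# The perturbed distribution is N(0, 1 + sigma^2), whose score is -x / (1 + sigma^2).
true_slope = -1.0 / (1.0 + sigma**2)
print(a, b, true_slope)
```

The fitted slope should land close to `true_slope`, which illustrates that the minimizer of the denoising objective is the score of the perturbed distribution rather than of the clean data.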

sampling with Langevin dynamics
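Langevin dynamics only needs the score: starting from $\tilde{\mathbf{x}}_0$, it iterates $\tilde{\mathbf{x}}_t = \tilde{\mathbf{x}}_{t-1} + \frac{\epsilon}{2} \nabla_{\mathbf{x}} \log p(\tilde{\mathbf{x}}_{t-1}) + \sqrt{\epsilon}\, \mathbf{z}_t$ with $\mathbf{z}_t \sim \mathcal{N}(0, I)$. Below is a minimal sketch that uses the exact score of a 1-D Gaussian in place of a trained score network; `langevin_sample`, the step size, and the number of steps are illustrative assumptions.

```python
import numpy as np

def langevin_sample(score_fn, x0, step_size, n_steps, rng):
    """Iterate x <- x + (step_size / 2) * score(x) + sqrt(step_size) * z."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        z = rng.standard_normal(x.shape)
        x = x + 0.5 * step_size * score_fn(x) + np.sqrt(step_size) * z
    return x

rng = np.random.default_rng(0)
mu = 3.0

def gaussian_score(x):
    # Exact score of N(mu, 1); a trained s_theta(x) would be used in practice.
    return -(x - mu)

x0 = rng.uniform(-10.0, 10.0, size=5000)   # 5000 independent chains
samples = langevin_sample(gaussian_score, x0, step_size=0.01, n_steps=2000, rng=rng)
print(samples.mean(), samples.std())
```

With a small step size and enough steps, the sample mean and standard deviation should approach those of the target Gaussian.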

3. Challenges of score-based generative modeling

3.1. The manifold hypothesis

Manifold hypothesis : data in the real world tend to concentrate on low dimensional manifolds embedded in a high dimensional space (a.k.a. the ambient space)

For the objective function above, the score estimator is consistent only under the condition that the support of the data distribution is the whole space

Under the manifold hypothesis the data distribution is not supported on the whole space, so the estimator cannot be consistent

Figure 1 shows that without perturbation the loss fluctuates

3.2. Low data density regions

Data scarcity in low density regions makes both score estimation with score matching and MCMC sampling with Langevin dynamics difficult

3.2.1. Inaccurate score estimation with score matching

In low density regions there are too few data samples, so there is not enough evidence to estimate the score function accurately

The figure above shows that score estimation is reliable only near the modes, where the data density is high

3.2.2. Slow mixing of Langevin dynamics

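When modes of the data distribution are separated by low density regions, Langevin dynamics may not converge to the true distribution within a reasonable time: the score in those regions carries little information about the relative weights of the modes, and when estimated it is also inaccurate there

Figure 3 shows that samples obtained with Langevin dynamics have incorrect relative density between the modes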
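The effect can be reproduced on a toy problem. The sketch below runs plain Langevin dynamics with the exact score of a 1-D mixture $\frac{1}{5}\mathcal{N}(-5, 1) + \frac{4}{5}\mathcal{N}(5, 1)$; the 1-D setting, the uniform initialization range, and the step settings are assumptions chosen for illustration, not the paper's configuration.

```python
import numpy as np

WEIGHTS = np.array([0.2, 0.8])
MEANS = np.array([-5.0, 5.0])

def mixture_score(x):
    """Exact score of sum_k w_k N(mean_k, 1), evaluated at each entry of a 1-D array."""
    diff = x[:, None] - MEANS[None, :]
    # Responsibilities of each component, computed stably in log space.
    log_w = np.log(WEIGHTS)[None, :] - 0.5 * diff**2
    log_w -= log_w.max(axis=1, keepdims=True)
    resp = np.exp(log_w)
    resp /= resp.sum(axis=1, keepdims=True)
    # The mixture score is the responsibility-weighted sum of component scores.
    return (resp * (-diff)).sum(axis=1)

rng = np.random.default_rng(0)
x = rng.uniform(-8.0, 8.0, size=5000)   # chains initialized uniformly
step = 0.01
for _ in range(1000):
    z = rng.standard_normal(x.shape)
    x = x + 0.5 * step * mixture_score(x) + np.sqrt(step) * z

frac_left = np.mean(x < 0.0)            # true mass of the left mode is 0.2
print(frac_left)
```

In this setup the fraction of samples ending near the left mode tends to stay well above 0.2, because each chain settles in the mode nearest to its starting point and rarely crosses the low density gap.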

4. Noise Conditional Score Networks: learning and inference

overall concept

In the end, both problems above can be solved by perturbing the data with random Gaussian noise

The support of a Gaussian noise distribution is the whole space, so the problem raised by the manifold hypothesis is resolved

Perturbing with large Gaussian noise fills in the low density regions where data is scarce, which makes it possible to estimate the score of the perturbed data distribution accurately

However, perturbing with large Gaussian noise also increases the gap between the perturbed data distribution and the original unperturbed data distribution

The method proposed by the authors is as follows

Train to estimate the scores of noise-perturbed distributions at multiple noise levels (with a single conditional score network)
When generating a sample, start with the large-noise score and gradually switch to low-noise scores, i.e. anneal down the noise level, to generate data that is hard to distinguish from the original

In short, train to estimate the scores of distributions perturbed at various noise levels, then use those scores for generation

The implementation uses 10 noise levels and updates the image with 100 Langevin dynamics steps per noise level
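A small sketch of such a schedule. The 10 levels and 100 steps per level come from the note above; the geometric spacing, the endpoints $\sigma_1 = 1$ and $\sigma_{10} = 0.01$, and $\epsilon = 2 \times 10^{-5}$ are the values I recall from the paper's experimental setup and should be treated as assumptions here, as should the variable names.

```python
import numpy as np

num_levels = 10          # number of noise levels
steps_per_level = 100    # Langevin updates at each noise level

# Geometric sequence sigma_1 > sigma_2 > ... > sigma_L (endpoints are assumptions).
sigmas = np.geomspace(1.0, 0.01, num=num_levels)
ratios = sigmas[1:] / sigmas[:-1]

# Step size per level, scaled with the noise level: alpha_i = eps * sigma_i^2 / sigma_L^2.
eps = 2e-5
step_sizes = eps * sigmas**2 / sigmas[-1]**2

total_updates = num_levels * steps_per_level
print(sigmas, step_sizes, total_updates)
```

Scaling the step size with $\sigma_i^2$ keeps the relative size of the drift and noise terms roughly comparable across levels, so large steps are taken at high noise and small steps at low noise.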

4.1. Noise Conditional Score Networks

Writing the perturbed data distribution as an integral may be hard to follow at first; thinking of kernel density estimation helps
(the figure on the kernel density estimation wiki page makes this easy to see)
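For reference, a sketch of the integral in the notation of these notes, where $\mathbf{t}$ is a dummy integration variable:

$$
q_{\sigma}(\tilde{\mathbf{x}}) = \int p(\mathbf{t})\, \mathcal{N}(\tilde{\mathbf{x}} \mid \mathbf{t}, \sigma^2 I)\, d\mathbf{t}
$$

Replacing $p$ with the empirical distribution of $N$ training points $\mathbf{x}_1, \dots, \mathbf{x}_N$ turns the integral into a sum, which is a Gaussian kernel density estimate with bandwidth $\sigma$:

$$
q_{\sigma}(\tilde{\mathbf{x}}) \approx \frac{1}{N} \sum_{i=1}^{N} \mathcal{N}(\tilde{\mathbf{x}} \mid \mathbf{x}_i, \sigma^2 I)
$$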

The architecture is a slightly modified RefineNet, a model originally proposed for segmentation
(for example, batch normalization is replaced by CondInstanceNorm++, a slightly modified conditional instance normalization; see Appendix A of the paper)

4.2. Learning NCSNs via score matching
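For reference, a sketch of the training objective as I understand it from the paper, written with the noise-conditional score network $s_{\theta}(\tilde{\mathbf{x}}, \sigma)$. For a single noise level, denoising score matching with a Gaussian kernel gives

$$
\ell(\theta; \sigma) = \frac{1}{2}\, \mathbb{E}_{p(\mathbf{x})}\, \mathbb{E}_{\tilde{\mathbf{x}} \sim \mathcal{N}(\mathbf{x}, \sigma^2 I)} \left[ \left\| s_{\theta}(\tilde{\mathbf{x}}, \sigma) + \frac{\tilde{\mathbf{x}} - \mathbf{x}}{\sigma^2} \right\|_2^2 \right]
$$

and the losses of all $L$ noise levels are combined with a weight $\lambda(\sigma_i)$, for which the paper's choice is $\lambda(\sigma) = \sigma^2$:

$$
\mathcal{L}(\theta; \{\sigma_i\}_{i=1}^{L}) = \frac{1}{L} \sum_{i=1}^{L} \lambda(\sigma_i)\, \ell(\theta; \sigma_i)
$$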

4.3. NCSN inference via annealed Langevin dynamics

We start annealed Langevin dynamics by initializing the samples from some fixed prior distribution
(ex. uniform noise)
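A minimal sketch of annealed Langevin dynamics on a toy 1-D mixture $\frac{1}{5}\mathcal{N}(-5, 1) + \frac{4}{5}\mathcal{N}(5, 1)$. The exact score of the noise-perturbed mixture stands in for a trained NCSN, and the noise range, `eps`, and function names are assumptions picked for this toy problem rather than the paper's settings.

```python
import numpy as np

WEIGHTS = np.array([0.2, 0.8])
MEANS = np.array([-5.0, 5.0])

def perturbed_score(x, sigma):
    """Exact score of the mixture after adding N(0, sigma^2) noise.

    Each unit-variance component becomes N(mean_k, 1 + sigma^2).
    """
    var = 1.0 + sigma**2
    diff = x[:, None] - MEANS[None, :]
    log_w = np.log(WEIGHTS)[None, :] - 0.5 * diff**2 / var
    log_w -= log_w.max(axis=1, keepdims=True)
    resp = np.exp(log_w)
    resp /= resp.sum(axis=1, keepdims=True)
    return (resp * (-diff / var)).sum(axis=1)

def annealed_langevin(score_fn, x0, sigmas, eps, steps_per_level, rng):
    """Run Langevin dynamics at each noise level, from the largest sigma to the smallest."""
    x = np.array(x0, dtype=float)
    for sigma in sigmas:
        alpha = eps * sigma**2 / sigmas[-1]**2   # step size for this level
        for _ in range(steps_per_level):
            z = rng.standard_normal(x.shape)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
    return x

rng = np.random.default_rng(0)
sigmas = np.geomspace(20.0, 0.1, num=10)         # decreasing noise levels
x0 = rng.uniform(-8.0, 8.0, size=5000)           # fixed prior: uniform noise
samples = annealed_langevin(perturbed_score, x0, sigmas,
                            eps=0.0025, steps_per_level=100, rng=rng)
frac_left = np.mean(samples < 0.0)               # true mass of the left mode is 0.2
print(frac_left)
```

At large noise levels the two modes overlap, so chains can move between them freely; as the noise is annealed down the chains settle into the modes in roughly the right proportion.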

4.4. Image inpainting

Inpainting is possible with the trained model, without any additional training
The key is to leave the unmasked (known) region untouched and update only the masked region
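A toy sketch of this idea, assuming a 2-pixel "image" drawn from a correlated Gaussian so that the exact perturbed score can stand in for a trained network. The mask convention (`mask == 1` marks known pixels), the reset of known pixels to a noisy copy of the given image at each noise level (my reading of the paper's inpainting algorithm), and all names and hyperparameters are assumptions for illustration.

```python
import numpy as np

RHO = 0.8
COV = np.array([[1.0, RHO], [RHO, 1.0]])   # toy data distribution: N(0, COV)

def perturbed_score(x, sigma):
    """Exact score of N(0, COV + sigma^2 I), standing in for s_theta(x, sigma)."""
    precision = np.linalg.inv(COV + sigma**2 * np.eye(2))
    return -x @ precision

def inpaint(score_fn, known, mask, sigmas, eps, steps_per_level, n_chains, rng):
    """Annealed Langevin dynamics where known pixels (mask == 1) are never free to move."""
    x = rng.uniform(-1.0, 1.0, size=(n_chains, known.size))
    for sigma in sigmas:
        alpha = eps * sigma**2 / sigmas[-1]**2
        # Noisy copy of the given image, matching the current noise level.
        y = known + sigma * rng.standard_normal(x.shape)
        for _ in range(steps_per_level):
            z = rng.standard_normal(x.shape)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
            # Only masked-out pixels keep the update; known pixels are reset.
            x = x * (1.0 - mask) + y * mask
    return x

rng = np.random.default_rng(0)
known = np.array([1.0, 0.0])    # pixel 0 is observed, pixel 1 is missing
mask = np.array([1.0, 0.0])
sigmas = np.geomspace(3.0, 0.1, num=10)
result = inpaint(perturbed_score, known, mask, sigmas,
                 eps=0.002, steps_per_level=100, n_chains=4000, rng=rng)
print(result[:, 0].mean(), result[:, 1].mean())
```

For this Gaussian the conditional mean of the missing pixel given the observed one is `RHO * known[0]`, so the average of the inpainted pixel across chains should land near 0.8, while the known pixel stays at its given value up to the final small noise.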

5. Experiments

generation - Table 1, Figure 4, Figure 5

inpainting - Figure 6

check memorization - Figure 9, 10

Visualize the nearest neighbors of generated images in the training dataset → the model has not simply memorized the training data
(nearest neighbors measured by L2 distance in pixel space and by L2 distance in the feature space of an ImageNet pre-trained Inception V3)
