Commit 6a514b6

Update README.md
1 parent ab334f7 commit 6a514b6

File tree

1 file changed (+14, -3 lines)

unit2/README.md

@@ -24,19 +24,30 @@ Fine-tuning typically works best if the new data somewhat resembles the base mod

## Guidance

![guidance example image](guidance_eg.png)

Unconditional models don't give much control over what is generated. We can train a conditional model (more on that in the next section) that takes additional inputs to help steer the generation process, but what if we already have a trained unconditional model we'd like to use? Enter guidance, a process by which the model's predictions at each step of the generation process are evaluated against some guidance function and modified so that the final generated image is more to our liking.

This guidance function can be almost anything, making this a powerful technique! In the notebook, we build up from a simple example (controlling the color, as illustrated in the example output above) to one that uses a powerful pre-trained model called CLIP, which lets us guide generation based on a text description.
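
To make this concrete, here is a minimal sketch of what a guided sampling loop can look like, assuming a `diffusers`-style unconditional model and scheduler. The checkpoint, the color-matching loss and the `guidance_scale` value below are illustrative stand-ins rather than the notebook's exact code: at each step we estimate the final image from the current noisy sample, score it with the guidance function, and nudge the sample down the gradient of that score before taking the usual scheduler step.

```python
import torch
from diffusers import DDIMScheduler, DDPMPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# Any trained unconditional diffusion model will do; this checkpoint is just a stand-in.
pipe = DDPMPipeline.from_pretrained("google/ddpm-celebahq-256").to(device)
scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
scheduler.set_timesteps(40)


def color_loss(images, target_color=(0.1, 0.5, 0.9)):
    """Guidance function: how far is the image from a target color (lower is better)?"""
    images = (images + 1) / 2  # map from [-1, 1] to [0, 1] before comparing with an RGB target
    target = torch.tensor(target_color, device=images.device).view(1, 3, 1, 1)
    return torch.abs(images - target).mean()


guidance_scale = 40  # illustrative value; higher = stronger steering (and more artifacts)

x = torch.randn(4, 3, 256, 256).to(device)  # start from pure noise
for t in scheduler.timesteps:
    with torch.no_grad():
        noise_pred = pipe.unet(x, t).sample  # the model's usual (unconditional) prediction

    # Estimate the final image, score it, and nudge x down the gradient of that score
    x = x.detach().requires_grad_()
    x0_pred = scheduler.step(noise_pred, t, x).pred_original_sample
    loss = color_loss(x0_pred) * guidance_scale
    grad = torch.autograd.grad(loss, x)[0]
    x = x.detach() - grad

    x = scheduler.step(noise_pred, t, x).prev_sample  # then take the normal update step
```

Swapping in a different guidance function (including one based on CLIP similarity to a text prompt) only requires changing `color_loss`; the rest of the loop stays the same.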

## Conditioning

![conditioning example](conditional_digit_generation.png)

Guidance is a great way to get some additional mileage from an unconditional diffusion model, but if we have additional information (such as a class label or an image caption) available during training, then we can also feed this to the model for it to use as it makes its predictions. In doing so, we create a **conditional** model, which we can control at inference time by choosing what is fed in as conditioning. The notebook shows an example of a class-conditioned model which learns to generate images according to a class label.

There are a number of ways to pass in this conditioning information, such as:

- Feeding it in as additional channels in the input to the UNet. This is often used when the conditioning information has the same shape as the image, such as a segmentation mask, a depth map or a blurry version of the image (in the case of a restoration/super-resolution model), but it works for other types of conditioning too. For example, in the notebook the class label is mapped to an embedding and then expanded to the same width and height as the input image so that it can be fed in as additional channels (see the sketch after this list).
- Creating an embedding and then projecting it down to a size that matches the number of channels at the output of one or more internal layers of the UNet, and then adding it to those outputs. This is how timestep conditioning is handled, for example: the output of each ResNet block has a projected timestep embedding added to it. This is useful when you have a vector, such as a CLIP image embedding, as your conditioning information. Another notable example is the 'Image Variations' version of Stable Diffusion [TODO link], which uses this same trick.
- Adding cross-attention layers that can 'attend' to a sequence passed in as conditioning. This is most useful when the conditioning is in the form of text: the text is mapped to a sequence of embeddings using a transformer model, and then cross-attention layers in the UNet are used to incorporate this information into the denoising path. We'll see this in action in Unit 3 as we examine how Stable Diffusion handles text conditioning.
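
As a rough illustration of the first option, here is a minimal sketch of class-conditioning via extra input channels, in the spirit of the class-conditioned example described above. The class count, embedding size and UNet configuration below are illustrative assumptions (small grayscale digit-sized images are used for concreteness), not a prescribed architecture.

```python
import torch
from torch import nn
from diffusers import UNet2DModel


class ClassConditionedUNet(nn.Module):
    def __init__(self, num_classes=10, class_emb_size=4):
        super().__init__()
        # Each class label is mapped to a small learned embedding
        self.class_emb = nn.Embedding(num_classes, class_emb_size)
        # The UNet sees the image channels plus the class-embedding channels
        self.model = UNet2DModel(
            sample_size=28,
            in_channels=1 + class_emb_size,  # 1 grayscale channel + conditioning channels
            out_channels=1,
            layers_per_block=2,
            block_out_channels=(32, 64, 64),
            down_block_types=("DownBlock2D", "AttnDownBlock2D", "AttnDownBlock2D"),
            up_block_types=("AttnUpBlock2D", "AttnUpBlock2D", "UpBlock2D"),
        )

    def forward(self, x, t, class_labels):
        bs, ch, h, w = x.shape
        # Expand each label's embedding to the image's spatial size so it can be
        # concatenated with the noisy input along the channel dimension
        class_cond = self.class_emb(class_labels)  # (bs, class_emb_size)
        class_cond = class_cond.view(bs, -1, 1, 1).expand(bs, -1, h, w)
        return self.model(torch.cat([x, class_cond], dim=1), t).sample


# Usage: predict the noise for a batch of noisy 28x28 images given their class labels
net = ClassConditionedUNet()
noisy_x = torch.randn(8, 1, 28, 28)
labels = torch.randint(0, 10, (8,))
noise_pred = net(noisy_x, 500, labels)
```

During training the true class label for each image is passed in alongside the noisy image, and at inference time you simply choose which labels to feed in to control what gets generated.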

## Hands-On Notebook

At this point, you know enough to get started with the accompanying notebooks!

Notebook 1

## Project Time
