CVPR 2025
This repository is the official implementation of the paper "Diffusion Self-Distillation for Zero-Shot Customized Image Generation".
The repository is still under construction; many updates will follow in the near future.
Shengqu Cai, Eric Ryan Chan, Yunzhi Zhang, Leonidas Guibas, Jiajun Wu, Gordon Wetzstein
Text-to-image diffusion models produce impressive results but are frustrating tools for artists who desire fine-grained control. For example, a common use case is to create images of a specific instance in novel contexts, i.e., "identity-preserving generation". This setting, along with many other tasks (e.g., relighting), is a natural fit for image+text-conditional generative models. However, there is insufficient high-quality paired data to train such a model directly. We propose Diffusion Self-Distillation, a method for using a pre-trained text-to-image model to generate its own dataset for text-conditioned image-to-image tasks. We first leverage a text-to-image diffusion model's in-context generation ability to create grids of images and curate a large paired dataset with the help of a Vision-Language Model. We then fine-tune the text-to-image model into a text+image-to-image model using the curated paired dataset. We demonstrate that Diffusion Self-Distillation outperforms existing zero-shot methods and is competitive with per-instance tuning techniques on a wide range of identity-preserving generation tasks, without requiring test-time optimization.
Clone the repository (requires git):
git clone https://github.com/primecai/diffusion-self-distillation.git
cd diffusion-self-distillation
To set up the environment, run:
pip install -r requirements.txt
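Optionally, isolate the dependencies in a virtual environment first. This is a standard venv workflow, not something the repository requires:

python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
pip install -r requirements.txt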
You may need to set up a Google Gemini API key to use the prompt-enhancement feature, which is optional but highly recommended.
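How the key is picked up is an assumption here; check generate.py for the exact environment variable it reads. A typical setup looks like:

export GEMINI_API_KEY="your-api-key"  # assumed variable name; match whatever generate.py expects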
Download our pretrained models from Hugging Face or Google Drive and unzip them. You should end up with the following files:

transformer/
    config.json
    diffusion_pytorch_model.safetensors
pytorch_lora_weights.safetensors
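If you prefer fetching the checkpoint from the command line, the Hugging Face CLI works as sketched below; the repository id is a placeholder, so substitute the one from the Hugging Face link above:

huggingface-cli download <HF_REPO_ID> --local-dir checkpoints  # <HF_REPO_ID> is a placeholder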
To generate subject-preserving images, simply run:

CUDA_VISIBLE_DEVICES=0 python generate.py \
    --model_path /PATH/TO/transformer \
    --lora_path /PATH/TO/pytorch_lora_weights.safetensors \
    --image_path /PATH/TO/conditioning_image.png \
    --text "this character sitting on a chair" \
    --output_path output.png \
    --guidance 3.5 \
    --i_guidance 1.0 \
    --t_guidance 1.0

The flags are as follows:

--model_path: path to the 'transformer' folder.
--lora_path: path to the 'pytorch_lora_weights.safetensors' file.
--image_path: path to the conditioning image.
--text: the text prompt.
--output_path: path to save the output image.
--guidance: guidance scale.
--i_guidance: true image guidance scale; set it above 1.0 to strengthen the image conditioning.
--t_guidance: true text guidance scale; set it above 1.0 to strengthen the text conditioning.
--disable_gemini_prompt: optional flag that disables Gemini prompt enhancement; not recommended unless you have a very detailed prompt.
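For example, to push results closer to the reference subject, raise the image guidance and, if your prompt is already detailed, skip Gemini enhancement. The values below are illustrative starting points, not tuned recommendations:

CUDA_VISIBLE_DEVICES=0 python generate.py \
    --model_path /PATH/TO/transformer \
    --lora_path /PATH/TO/pytorch_lora_weights.safetensors \
    --image_path /PATH/TO/conditioning_image.png \
    --text "this character reading a book in a cozy library" \
    --output_path output.png \
    --guidance 3.5 \
    --i_guidance 1.5 \
    --t_guidance 1.0 \
    --disable_gemini_prompt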
TBD
Please cite our paper:
@inproceedings{cai2024dsd,
author={Cai, Shengqu and Chan, Eric Ryan and Zhang, Yunzhi and Guibas, Leonidas and Wu, Jiajun and Wetzstein, Gordon},
title={Diffusion Self-Distillation for Zero-Shot Customized Image Generation},
booktitle={CVPR},
year={2025}
}