Diffusion Self-Distillation for Zero-Shot Customized Image Generation

CVPR 2025

This repository is the official implementation of the paper "Diffusion Self-Distillation for Zero-Shot Customized Image Generation".

This repository is still under construction; many updates will be applied in the near future.

Website | Paper | HuggingFace Demo | HuggingFace Space | HuggingFace Model | Data

Shengqu Cai, Eric Ryan Chan, Yunzhi Zhang, Leonidas Guibas, Jiajun Wu, Gordon Wetzstein

Text-to-image diffusion models produce impressive results but are frustrating tools for artists who desire fine-grained control. For example, a common use case is to create images of a specific instance in novel contexts, i.e., "identity-preserving generation". This setting, along with many other tasks (e.g., relighting), is a natural fit for image+text-conditional generative models. However, there is insufficient high-quality paired data to train such a model directly. We propose Diffusion Self-Distillation, a method for using a pre-trained text-to-image model to generate its own dataset for text-conditioned image-to-image tasks. We first leverage a text-to-image diffusion model's in-context generation ability to create grids of images and curate a large paired dataset with the help of a Visual-Language Model. We then fine-tune the text-to-image model into a text+image-to-image model using the curated paired dataset. We demonstrate that Diffusion Self-Distillation outperforms existing zero-shot methods and is competitive with per-instance tuning techniques on a wide range of identity-preservation generation tasks, without requiring test-time optimization.

[Teaser figure]

🛠️ Setup

📦 Repository

Clone the repository (requires git):

git clone https://github.com/primecai/diffusion-self-distillation.git
cd diffusion-self-distillation

💻 Dependencies

To set up the environment, run:

pip install -r requirements.txt

You may need to set up a Google Gemini API key to use the prompt enhancement feature, which is optional but highly recommended.
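How generate.py picks up the key depends on the script itself; a common pattern is to pass it through an environment variable before launching inference. The snippet below is a minimal sketch that assumes a GEMINI_API_KEY environment variable; that variable name is an assumption, so check generate.py for the variable or flag it actually reads.

# Minimal sketch: expose a Gemini API key to generate.py via the environment.
# ASSUMPTION: the name GEMINI_API_KEY is illustrative; confirm the variable
# (or command-line flag) that generate.py actually reads.
import os
import subprocess

os.environ["GEMINI_API_KEY"] = "YOUR_API_KEY_HERE"  # placeholder; never commit real keys

# Child processes inherit the environment, so the key is visible to generate.py.
subprocess.run(["python", "generate.py", "--help"], check=True)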

📦 Pretrained Models

Download our pretrained models from Hugging Face or Google Drive and unzip them; a scripted download sketch is shown after the file list below. You should have the following files:

  • transformer
    • config.json
    • diffusion_pytorch_model.safetensors
  • pytorch_lora_weights.safetensors
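
If you prefer to script the Hugging Face download, the huggingface_hub client can fetch the whole checkpoint. The repository ID below is a placeholder (substitute the model linked above under "HuggingFace Model"); the Google Drive route is not covered here.

# Sketch: fetch the pretrained weights with huggingface_hub.
# ASSUMPTION: "primecai/dsd_model" is a placeholder repo ID; replace it with
# the model ID linked above under "HuggingFace Model".
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="primecai/dsd_model",   # placeholder; substitute the real model ID
    local_dir="checkpoints/dsd",    # where the transformer folder and LoRA weights land
)
print(f"Weights downloaded to: {local_dir}")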

🏃 Inference

To generate subject-preserving images, simply run:

CUDA_VISIBLE_DEVICES=0 python generate.py \
    --model_path /PATH/TO/transformer \
    --lora_path /PATH/TO/pytorch_lora_weights.safetensors \
    --image_path /PATH/TO/conditioning_image.png \
    --text "this character sitting on a chair" \
    --output_path output.png \
    --guidance 3.5 \
    --i_guidance 1.0 \
    --t_guidance 1.0
    # --disable_gemini_prompt

Arguments:

  • --model_path: path to the 'transformer' folder
  • --lora_path: path to the 'pytorch_lora_weights.safetensors' file
  • --image_path: path to the conditioning image
  • --text: text prompt
  • --output_path: path to save the output image
  • --guidance: guidance scale
  • --i_guidance: true image guidance scale; set to >1.0 to enhance the image conditioning
  • --t_guidance: true text guidance scale; set to >1.0 to enhance the text conditioning
  • --disable_gemini_prompt: disable Gemini prompt enhancement; not recommended unless you have a very detailed prompt
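
To run the same command over several conditioning images, a small Python wrapper around generate.py can help. The sketch below uses only the flags documented above; the example images, prompts, and paths are placeholders.

# Sketch: run generate.py over several conditioning images with the flags
# documented above. Paths and prompts are placeholders.
import subprocess
from pathlib import Path

MODEL_PATH = "/PATH/TO/transformer"
LORA_PATH = "/PATH/TO/pytorch_lora_weights.safetensors"

jobs = [
    ("inputs/character.png", "this character sitting on a chair"),
    ("inputs/mug.png", "this mug on a sunny beach"),
]

for image_path, prompt in jobs:
    output_path = Path("outputs") / (Path(image_path).stem + ".png")
    output_path.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        [
            "python", "generate.py",
            "--model_path", MODEL_PATH,
            "--lora_path", LORA_PATH,
            "--image_path", image_path,
            "--text", prompt,
            "--output_path", str(output_path),
            "--guidance", "3.5",
            "--i_guidance", "1.0",
            "--t_guidance", "1.0",
        ],
        check=True,
    )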

Training

TBD

🎓 Citation

Please cite our paper:

@inproceedings{cai2024dsd,
    author={Cai, Shengqu and Chan, Eric Ryan and Zhang, Yunzhi and Guibas, Leonidas and Wu, Jiajun and Wetzstein, Gordon},
    title={Diffusion Self-Distillation for Zero-Shot Customized Image Generation},
    booktitle={CVPR},
    year={2025}
}
