Stars
[CVPR 2025] VideoWorld is a simple generative model that learns purely from unlabeled videos, much like how babies learn by observing their environment.
Implementation of all RL algorithms in a simpler way
A Step-by-Step Implementation of Google Veo 3 Architecture from Scratch
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
A unified inference and post-training framework for accelerated video generation.
📖 This is a repository for organizing papers, codes and other resources related to Visual Reinforcement Learning.
Tongyi Deep Research, the Leading Open-source Deep Research Agent
End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
Enhancing Reward Models for High-quality Image Generation: Beyond Text-Image Alignment [ICCV 2025] - Official implementation
Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think
The collection of awesome papers on alignment of diffusion models.
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)
DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support
An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The …
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
Ring attention implementation with flash attention
This repo powers my blog experiment where ChatGPT manages a real-money micro-cap stock portfolio.