CUHK MMLab, Hong Kong
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
[NeurIPS 2025] Official implementation of HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance
[ICCV 2025] MM-IFEngine: Towards Multimodal Instruction Following
[ICML 2025 Oral] An official implementation of VideoRoPE & VideoRoPE++
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
PlayStation 4 emulator for Windows, Linux and macOS written in C++
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
A data annotation toolbox that supports image, audio, and video data.
The Open-Source Data Annotation Platform
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
V3Det / mmdetection-V3Det
Forked from open-mmlab/mmdetection. OpenMMLab Detection Toolbox and Benchmark for V3Det.
OpenMMLab FewShot Learning Toolbox and Benchmark
OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), Video Instance Segmentation (VIS) with a unified framework.
OpenMMLab Detection Toolbox and Benchmark