  • CUHK, MMLab
  • Hong Kong


Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'

Jupyter Notebook 2,206 95 Updated Jul 22, 2025

[NeurIPS 2025] Official implementation of HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance

Python 77 1 Updated Sep 18, 2025

[ICCV 2025] MM-IFEngine: Towards Multimodal Instruction Following

Python 104 Updated Sep 16, 2025

[ICML 2025 Oral] An official implementation of VideoRoPE & VideoRoPE++

Python 196 4 Updated Jul 28, 2025

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Python 12,803 1,332 Updated Sep 26, 2025

PlayStation 4 emulator for Windows, Linux and macOS written in C++

C++ 26,343 1,745 Updated Sep 29, 2025

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 8,954 796 Updated Sep 25, 2025

Data annotation toolbox that supports image, audio, and video data.

Python 1,373 149 Updated Aug 11, 2025

The Open-Source Data Annotation Platform

TypeScript 927 102 Updated Feb 19, 2025

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

Python 44,993 3,727 Updated Sep 29, 2025

Detectron2 Toolbox and Benchmark for V3Det

Python 18 2 Updated Jun 2, 2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Python 2,896 177 Updated May 26, 2025

OpenMMLab Detection Toolbox and Benchmark for V3Det

Python 15 2 Updated Apr 3, 2024
Python 58 4 Updated Jan 12, 2022

OpenMMLab FewShot Learning Toolbox and Benchmark

Python 734 123 Updated Sep 5, 2023

OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), Video Instance Segmentation (VIS) with a unified framework.

Python 3,796 610 Updated Sep 19, 2023

Exercise for cpp

C++ 1 Updated Jul 14, 2020

OpenMMLab Detection Toolbox and Benchmark

Python 31,739 9,748 Updated Aug 21, 2024