Stars
Fast, Flexible and Portable Structured Generation
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
Foundational Model for Speech Recognition Tasks
A Python library for self-supervised learning on images.
All-in-one training for vision models (YOLO, ViTs, RT-DETR, DINOv3): pretraining, fine-tuning, distillation.
Reference PyTorch implementation and models for DINOv3
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference.
Multilingual Document Layout Parsing in a Single Vision-Language Model
D-FINE: SoTA object detection model with a from-scratch pipeline for custom training, exporting, and inference
Supercharge Your LLM with the Fastest KV Cache Layer
Tevatron - Unified Document Retrieval Toolkit across Scale, Language, and Modality. Demo in SIGIR 2023, SIGIR 2025.
GenAI Processors is a lightweight Python library that enables efficient, parallel content processing.
PyTorch code and models for VJEPA2 self-supervised learning from video.
[CVPR2025] KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation
Official repository for the paper "CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models"
The simplest, fastest repository for training/finetuning small-sized VLMs.
A unified library for object tracking featuring clean room re-implementations of leading multi-object tracking algorithms
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
[CVPR 2025 Highlight] Official code and models for Encoder-only Mask Transformer (EoMT).
Falcon: A Remote Sensing Vision-Language Foundation Model