Fast, Flexible and Portable Structured Generation

C++ 1,282 87 Updated Sep 30, 2025

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python 1,128 78 Updated Sep 22, 2025

Foundational Model for Speech Recognition Tasks

Python 296 40 Updated Jul 30, 2025

A Python library for self-supervised learning on images.

Python 3,551 311 Updated Sep 25, 2025

All-in-one training for vision models (YOLO, ViTs, RT-DETR, DINOv3): pretraining, fine-tuning, distillation.

Python 901 33 Updated Sep 30, 2025

Reference PyTorch implementation and models for DINOv3

Jupyter Notebook 7,419 451 Updated Sep 24, 2025

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Python 683 37 Updated Nov 19, 2024

FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference.

Python 284 15 Updated Aug 7, 2025

Multilingual Document Layout Parsing in a Single Vision-Language Model

Python 4,795 473 Updated Sep 8, 2025

Sampling profiler for Python programs

Rust 14,360 480 Updated Sep 8, 2025

D-FINE: SoTA Object Detection model custom training/exporting/inferencing pipeline from scratch

Python 60 8 Updated Sep 24, 2025

Supercharge Your LLM with the Fastest KV Cache Layer

Python 5,442 615 Updated Sep 30, 2025

Inworld TTS

Python 499 41 Updated Sep 19, 2025

Tevatron - Unified Document Retrieval Toolkit across Scale, Language, and Modality. Demo in SIGIR 2023, SIGIR 2025.

Python 695 116 Updated Aug 10, 2025
Python 5,974 473 Updated Aug 29, 2025

A high-performance inference engine for AI models

Rust 1,324 33 Updated Sep 29, 2025

GenAI Processors is a lightweight Python library that enables efficient, parallel content processing.

Python 1,974 182 Updated Aug 31, 2025

PyTorch code and models for VJEPA2 self-supervised learning from video.

Python 2,243 209 Updated Aug 28, 2025

[CVPR2025] KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation

Python 62 8 Updated Apr 8, 2025

Official repository for the paper "CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models"

Python 275 22 Updated Sep 28, 2025

The simplest, fastest repository for training/finetuning small-sized VLMs.

Python 4,080 389 Updated Sep 10, 2025

A unified library for object tracking featuring clean room re-implementations of leading multi-object tracking algorithms

Python 2,140 208 Updated Sep 29, 2025
Python 32 Updated Jul 25, 2025

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

Python 502 43 Updated May 19, 2025

depyf is a tool to help you understand and adapt to PyTorch compiler torch.compile.

Python 736 27 Updated Apr 20, 2025

[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer

Python 11,102 1,115 Updated Sep 1, 2025

OpenAI Frontier Evals

Python 890 103 Updated Sep 24, 2025

[CVPR 2025] Prompt Depth Anything

Python 923 57 Updated Sep 2, 2025

[CVPR 2025 Highlight] Official code and models for Encoder-only Mask Transformer (EoMT).

Jupyter Notebook 406 34 Updated Sep 25, 2025

Falcon: A Remote Sensing Vision-Language Foundation Model

Python 328 28 Updated Apr 10, 2025