Stars
Qwen3 is the large language model series developed by the Qwen team, Alibaba Cloud.
The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Train transformer language models with reinforcement learning.
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
[NeurIPS2024] Cross-video Identity Correlating for Person Re-identification Pre-training
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
FaceChain is a deep-learning toolchain for generating your Digital-Twin.
This repository contains the official implementations of the research papers "MobileCLIP" (CVPR 2024) and "MobileCLIP2" (TMLR, August 2025).
Official inference framework for 1-bit LLMs
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025
MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.
A generative speech model for daily dialogue.
A llama3 implementation, one matrix multiplication at a time.
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Opinionated RAG for integrating GenAI in your apps 🧠 Focus on your product rather than the RAG. Easy integration in existing products with customisation! Any LLM: GPT4, Groq, Llama. Any Vectorstore: …
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
The simplest, fastest repository for training/finetuning medium-sized GPTs.