Stars
Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).
Documentation for Google's Gen AI site - including the Gemini API and Gemma
The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on Linear Attention
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
Code for the ACL 2017 paper "An unsupervised neural attention model for aspect extraction"
Unsupervised aspect extraction for a restaurant review dataset using the Mistral-7B model.
Lumina-T2X is a unified framework for Text to Any Modality Generation
The ultimate training toolkit for finetuning diffusion models
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…
Official inference repo for FLUX.1 models
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Official implementation of the paper "AnyText: Multilingual Visual Text Generation And Editing"
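AIGC-interview/CV-interview/LLMs-interview: a repository collecting interview questions and answers, which also includes new ideas, problems, resources, and projects from work and research
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance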
Modifying LAVIS' BLIP2 Q-former with models pretrained on Japanese datasets.
Experiments with LAVIS library to perform image2text and text2image retrieval with BLIP and BLIP2 models
A quietly collected compilation of 10,000+ Telegram groups (latest for 2025), plus the most interesting and useful bots on the web 🤖 [dianbaodaohang.com]
LAVIS - A One-stop Library for Language-Vision Intelligence
Official implementation of "Video Swin Transformers".
Video Swin Transformer - PyTorch