This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025

Python 6,684 449 Updated May 5, 2025

Minimal reproduction of DeepSeek R1-Zero

Python 12,223 1,504 Updated Apr 24, 2025

s1: Simple test-time scaling

Python 6,561 764 Updated Jun 25, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…

Python 10,149 891 Updated Sep 29, 2025

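AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks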

Jupyter Notebook 15,216 2,190 Updated Sep 3, 2025

Learning paths and knowledge summaries for machine learning and deep learning

Jupyter Notebook 2,137 360 Updated Jan 26, 2025
Python 4,270 405 Updated Sep 14, 2025

An Extensible Deep Learning Library

Python 2,259 377 Updated Sep 16, 2025

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Python 1,225 79 Updated Jan 23, 2025

Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)

Python 1,923 135 Updated Jul 17, 2025

HPT - Open Multimodal LLMs from HyperGAI

Python 315 22 Updated Jun 6, 2024

Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding

Python 629 69 Updated Dec 10, 2024

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Python 3,570 298 Updated Aug 6, 2025

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 23,649 2,634 Updated Aug 12, 2024

An Autonomous LLM Agent for Complex Task Solving

Python 8,450 895 Updated Aug 12, 2024

Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)

Python 27,471 3,457 Updated Sep 23, 2025

🚀 Power Your World with AI - Explore, Extend, Empower.

JavaScript 7,988 616 Updated Sep 15, 2025

PyTorch code and models for V-JEPA self-supervised learning from video.

Python 3,212 319 Updated Feb 27, 2025

A family of lightweight multimodal models.

Python 1,044 77 Updated Nov 18, 2024

MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving 3+ generation speedup on reasoning tasks

Jupyter Notebook 8,374 521 Updated Sep 11, 2025

Official Code for DragGAN (SIGGRAPH 2023)

Python 35,994 3,445 Updated May 18, 2024

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation

Python 7,756 591 Updated Jul 17, 2024

🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.

2,781 123 Updated Aug 28, 2025

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python 3,364 239 Updated Dec 3, 2024
Python 3,891 254 Updated Mar 15, 2024

Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)

Python 624 44 Updated Dec 30, 2024

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 6,269 465 Updated Aug 7, 2024

A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"

1,337 58 Updated Mar 14, 2024

Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)

Python 3,965 330 Updated Jun 12, 2024

Data and code for NeurIPS 2022 Paper "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering".

Python 691 65 Updated Sep 19, 2024