The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 17,092 2,102 Updated Dec 25, 2024

facebookresearch / sapiens

High-resolution models for human tasks.

Python 5,161 299 Updated Nov 18, 2024

okaris / simple-lama

Simple LaMa Inpainting: An easy-to-use implementation of the LaMa (Large Mask) inpainting model. Remove unwanted objects or fill in missing areas in images with just a few lines of code. Supports b…

Python 22 1 Updated Nov 5, 2024

SanshruthR / CCTV_YOLO

Fast Real-time Object Detection with High-Res Output https://x.com/_akhaliq/status/1840213012818329826 https://x.com/githubprojects/status/1891370506537910724 https://www.threads.net/@githubproject…

Python 575 58 Updated Mar 20, 2025

USTC3DV / PortraitGen-code

Python 188 8 Updated Dec 23, 2024

ali-vilab / AnyDoor

Official implementation of the paper "AnyDoor: Zero-shot Object-level Image Customization".

Python 4,188 372 Updated Apr 8, 2024

stephansturges / WALDO

Whereabouts Ascertainment for Low-lying Detectable Objects. The SOTA in FOSS AI for drones!

Python 1,651 196 Updated Jan 7, 2025

illuin-tech / colpali

The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.

Python 2,231 199 Updated Sep 29, 2025

QiuYannnn / Local-File-Organizer

An AI-powered file management tool that ensures privacy by organizing local texts and images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organiz…

Python 2,569 233 Updated Oct 21, 2024

NexaAI / nexa-sdk

Run the latest LLMs and VLMs across GPU, NPU, and CPU with bindings for Python, Android Java, and iOS Swift, getting up and running quickly with OpenAI gpt-oss, Gemma 3, Qwen3, and more.

Go 5,057 680 Updated Sep 30, 2025

anysphere / priompt

Prompt design using JSX.

TypeScript 2,700 153 Updated Sep 25, 2025

mufeedvh / code2prompt

A CLI tool to convert your codebase into a single LLM prompt with source tree, prompt templating, and token counting.

Rust 6,569 371 Updated Sep 30, 2025

kyutai-labs / moshi

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 8,954 796 Updated Sep 25, 2025

pinokiofactory / cogstudio

Python 375 32 Updated Mar 16, 2025

zai-org / CogVideo

Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 11,967 1,189 Updated Sep 7, 2025

NUS-HPC-AI-Lab / VideoSys

VideoSys: An easy and efficient system for video generation

Python 2,003 133 Updated Aug 27, 2025

VisualComputingInstitute / diffusion-e2e-ft

[WACV'25 Oral] Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Python 472 21 Updated Dec 16, 2024

replicate / flux-fine-tuner

Cog wrapper for ostris/ai-toolkit + post-finetuning cog inference for flux models

Python 405 71 Updated Jun 4, 2025

google-research / tuning_playbook

A playbook for systematically maximizing the performance of deep learning models.

29,208 2,388 Updated Jun 18, 2024

voideditor / void

TypeScript 27,135 2,043 Updated Aug 7, 2025

IAHispano / Applio

A simple, high-quality voice conversion tool focused on ease of use and performance.

Python 2,626 435 Updated Sep 29, 2025

sentient-engineering / sentient

The framework/SDK that lets you build browser-controlling agents in 3 lines of code. Join the chat at https://discord.gg/umgnyQU2K8

Python 561 71 Updated Oct 10, 2024

AnswerDotAI / rerankers

A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.

Python 1,543 95 Updated May 28, 2025

ToTheBeginning / PuLID

[NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment

Python 3,480 258 Updated Jul 31, 2025

fishaudio / fish-speech

SOTA Open Source TTS

Python 23,040 1,901 Updated Sep 23, 2025

huggingface / segment-anything-2

Forked from facebookresearch/sam2

Jupyter Notebook 84 10 Updated Oct 10, 2024

ictnlp / LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 3,072 216 Updated May 19, 2025