Starred repositories
Select a portrait, click to move the head around (please use your own space / GPU!)
Open Source AI Platform - AI Chat with advanced features that works with every LLM
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
High-resolution models for human tasks.
Simple LaMa Inpainting: An easy-to-use implementation of the LaMa (Large Mask) inpainting model. Remove unwanted objects or fill in missing areas in images with just a few lines of code. Supports b…
Fast Real-time Object Detection with High-Res Output https://x.com/_akhaliq/status/1840213012818329826 https://x.com/githubprojects/status/1891370506537910724 https://www.threads.net/@githubproject…
Official implementation for the paper "AnyDoor: Zero-shot Object-level Image Customization"
Whereabouts Ascertainment for Low-lying Detectable Objects. The SOTA in FOSS AI for drones!
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
An AI-powered file management tool that ensures privacy by organizing local texts and images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organiz…
Run the latest LLMs and VLMs across GPU, NPU, and CPU with bindings for Python, Android Java, and iOS Swift, getting up and running quickly with OpenAI gpt-oss, Gemma 3, Qwen3, and more.
A CLI tool to convert your codebase into a single LLM prompt with source tree, prompt templating, and token counting.
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
VideoSys: An easy and efficient system for video generation
[WACV'25 Oral] Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Cog wrapper for ostris/ai-toolkit + post-finetuning cog inference for flux models
A playbook for systematically maximizing the performance of deep learning models.
A simple, high-quality voice conversion tool focused on ease of use and performance.
The framework/SDK that lets you build browser-controlling agents in 3 lines of code. Join the chat at https://discord.gg/umgnyQU2K8
A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.
[NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.