Starred repositories
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
We write your reusable computer vision tools. 💜
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
Open Source AI Platform - AI Chat with advanced features that works with every LLM
Automate browser-based workflows with LLMs and Computer Vision
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Retrieval Augmented Generation (RAG) chatbot powered by Weaviate
Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).
High-resolution models for human tasks.
[ECCV2024] IDM-VTON : Improving Diffusion Models for Authentic Virtual Try-on in the Wild
Create 🔥 videos with Stable Diffusion by exploring the latent space and morphing between text prompts
Official implementation of the paper "AnyDoor: Zero-shot Object-level Image Customization"
[NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
RF-DETR is a real-time object detection model architecture developed by Roboflow, SOTA on COCO and designed for fine-tuning.
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
A simple, high-quality voice conversion tool focused on ease of use and performance.
An AI-powered file management tool that ensures privacy by organizing local texts, images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organiz…
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.