Lists (22)
AntiSpoofing
BackboneModel
Crawl Data
DataEngineer
Denoiser
Dev
Feature
G2P
Interview
LLM
NTU
React
Singing Voice
Sound and Signal Processing
Speaker Diarization
SpeakerVerification
Speech Synthesis
Collection of Text-to-speech Repo
SpeechEmotionRecognition
SpeechSOTA
TTS
VectorDatabase
VoiceCloning
Stars
A lightweight, powerful framework for multi-agent workflows
A neural full-band audio codec for general audio sampled at 48 kHz with 7.5 kbps or 4.5 kbps.
Source & evaluation code for ICAMCS 2024 paper "Emotional Vietnamese Speech-Based Depression Diagnosis Using Dynamic Attention Mechanism"
🐫 CAMEL: Finding the Scaling Law of Agents. The first and the best multi-agent framework. https://www.camel-ai.org
AudioBench: A Universal Benchmark for Audio Large Language Models
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, and other large language models.
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
A high-throughput and memory-efficient inference and serving engine for LLMs
End-to-end stack for WebRTC. SFU media server and SDKs.
Examples and guides for using the OpenAI API
A repo containing download guidance and corresponding scripts of the VoxBlink dataset.
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Build datasets using natural language
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
KUIELAB-MDX-Net took 2nd place on Leaderboard A and 3rd place on Leaderboard B in the MDX-Challenge ISMIR 2021
Zero-shot voice conversion & singing voice conversion, with real-time support
SSL Layerwise analysis for speech deepfake detection
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
The official Pytorch implementation of "Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors". [ICASSP 2024] and "LS-EEND: long-form streaming…
Unified automatic quality assessment for speech, music, and sound.
Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
Collection of papers on state-space models
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2E, F5-TTS, CosyVoice), with Whisper audio processing, RVC voice changer, YouTube downlo…