LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,854 196 Updated Nov 14, 2024

huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Python 17,765 1,779 Updated Mar 14, 2025

fixie-ai / ultravox

A fast multimodal LLM for real-time voice

Python 3,721 273 Updated Feb 14, 2025

argilla-io / synthetic-data-generator

Build datasets using natural language

Python 428 52 Updated Mar 5, 2025

splinter21 / f5-tts-trtllm

Python 1 6 Updated Dec 19, 2024

rany2 / edge-tts

Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key

Python 7,660 729 Updated Feb 27, 2025

kuielab / mdx-net

KUIELAB-MDX-Net took 2nd place on Leaderboard A and 3rd place on Leaderboard B in the MDX-Challenge ISMIR 2021

Python 199 23 Updated Feb 27, 2023

Plachtaa / seed-vc

Zero-shot voice conversion and singing voice conversion, with real-time support

Python 1,692 188 Updated Mar 11, 2025

Yaselley / SSL_Layerwise_Deepfake

SSL Layerwise analysis for speech deepfake detection

Python 20 Updated Feb 17, 2025

hexgrad / kokoro

https://hf.co/hexgrad/Kokoro-82M

JavaScript 1,657 163 Updated Mar 1, 2025

zhenye234 / LLaSA_training

LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 460 34 Updated Mar 12, 2025

Audio-WestlakeU / FS-EEND

The official PyTorch implementation of "Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors" [ICASSP 2024] and "LS-EEND: long-form streaming…

Python 120 5 Updated Mar 4, 2025

facebookresearch / audiobox-aesthetics

Unified automatic quality assessment for speech, music, and sound.

Python 416 26 Updated Mar 7, 2025

sp-uhh / sgmse

Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation

Python 582 79 Updated Feb 21, 2025

deepseek-ai / DeepSeek-V3

Python 92,206 14,958 Updated Feb 24, 2025

radarFudan / Awesome-state-space-models

Collection of papers on state-space models

584 20 Updated Mar 2, 2025

abus-aikorea / voice-pro

Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2E, F5-TTS, CosyVoice), with Whisper audio processing, RVC voice changer, YouTube downlo…

Python 3,467 262 Updated Mar 14, 2025

Đỗ Trí Nhân v-nhandt21

Stars