Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

Jupyter Notebook 22,507 2,450 Updated Mar 13, 2025

FireRedTeam / FireRedTTS2

Long-form streaming TTS system for multi-speaker dialogue generation

Python 724 89 Updated Sep 17, 2025

hyama5 / vae_align

Alignment examples for Interspeech 2024

27 Updated Jul 5, 2024

EZ-VC / EZ-VC

Official code for EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion [EMNLP 2025 Findings]

Python 24 6 Updated Sep 9, 2025

rishikksh20 / ministral-playground-pytorch

Readable Implementation of Ministral 8B from Mistral

Python 3 Updated Sep 12, 2025

rishikksh20 / soundstorm-vc

Customizes SoundStorm for the voice conversion use case

Python 2 Updated Jan 5, 2024

rishikksh20 / MiniMax-TTS-pytorch

An attempt to replicate the MiniMaxTTS architecture described in its technical report

Python 48 3 Updated Sep 2, 2025

Tobertz-max / DiFlow-TTS

DiFlow-TTS delivers low-latency zero-shot TTS via discrete flow matching and factorized speech tokens. A compact, open framework for fast voice synthesis.🐙

Python 43 4 Updated Sep 30, 2025

stepfun-ai / Step-Audio2

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python 1,126 78 Updated Sep 22, 2025

rishikksh20 / qwen3-playground

Readable implementation of Qwen3 0.6B model

Python 4 1 Updated Sep 4, 2025

TencentARC / AudioStory

AudioStory: Generating Long-Form Narrative Audio with Large Language Models

Jupyter Notebook 278 17 Updated Sep 21, 2025

HeCheng0625 / Diffusion-Speech-Tokenizer

This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Lan…

Python 179 12 Updated Sep 21, 2025

microsoft / VibeVoice

Frontier Open-Source Text-to-Speech

9,376 1,135 Updated Sep 5, 2025

AIDC-AI / Marco-Voice

A Unified Framework for Expressive Speech Synthesis with Voice Cloning

Python 376 31 Updated Aug 18, 2025

naver-ai / RapFlow-TTS

Python 44 7 Updated Jul 16, 2025

hayeong0 / DDDM-VC

Official Pytorch Implementation for "DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion" (AAAI 2024)

Python 236 24 Updated Jul 31, 2024

boson-ai / higgs-audio

Text-audio foundation model from Boson AI

Python 7,382 532 Updated Sep 15, 2025

s3prl / s3prl

Self-Supervised Speech Pre-training and Representation Learning Toolkit

Python 2,459 512 Updated Jun 13, 2025

AaronZ345 / TCSinger2

PyTorch Implementation of TCSinger 2 (ACL 2025): Customizable Multilingual Zero-shot Singing Voice Synthesis

Python 148 27 Updated Sep 4, 2025

fundwotsai2001 / Text-to-music-dataset-preparation

A repo that builds text-to-music datasets from scratch, used in MuseControlLite [ICML 2025]

Python 25 Updated May 20, 2025

zruiii / QwenAudioSFT

The reproduction code for Qwen-Audio fine-tuning

Python 49 4 Updated Aug 15, 2024

JEdward7777 / StableTTS_Editing

Extends the StableTTS project to update existing recordings when their content has changed.

Python 1 1 Updated Mar 6, 2025

KdaiP / StableTTS

Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3

Python 427 45 Updated Sep 13, 2024

zhiyongchenGREAT / stabletts_ccfm

Repo for StableTTS consistency flow matching

Python 6 1 Updated Feb 18, 2025
