Stars
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
Unsupervised domain adaptation for conversational speech enhancement using RemixIT
Speech, Language, Audio, Music Processing with Large Language Model
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
vits2 backbone with multilingual-bert
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
An unofficial PyTorch implementation of the audio LM VALL-E
A repository of implementations of neural speech editing algorithms.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Revisiting Denoising Diffusion Probabilistic Models for Speech Enhancement: Condition Collapse, Efficiency and Refinement, Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI), 2023.
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
Official repository of a new SOTA diffusion-model-based method for speech enhancement
A collection of resources and papers on Diffusion Models
Unofficial PyTorch implementation of the papers 'Categorical Reparameterization with Gumbel-Softmax' and 'The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables'
Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
Transformer-based neural network for time-domain speech enhancement
[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)
Official repo for "A Modulation-Domain Loss for Neural-Network-Based Real-Time Speech Enhancement" to appear in ICASSP 2021
Official repo for the STRFNet system presented at INTERSPEECH 2020
Composite measure of speech quality from the book by Philipos Loizou, "Speech Enhancement - Theory and Practice"
Unofficial implementation of PercepNet: A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech
Python implementation of performance metrics in Loizou's Speech Enhancement book
Perceptual Metrics of Audio - perceptually relevant loss function. DPAM and CDPAM