A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 15,780 3,115 Updated Sep 30, 2025

wkentaro / labelme

Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).

Python 15,077 3,582 Updated Sep 23, 2025

davidsandberg / facenet

Face recognition using Tensorflow

Python 14,195 4,811 Updated Jul 24, 2023

index-tts / index-tts

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Python 12,428 1,318 Updated Sep 29, 2025

hoya012 / deep_learning_object_detection

A paper list of object detection using deep learning.

Python 11,416 2,774 Updated Feb 12, 2024

SparkAudio / Spark-TTS

Spark-TTS Inference Code

Python 10,544 1,120 Updated Apr 9, 2025

speechbrain / speechbrain

A PyTorch-based Speech Toolkit

Python 10,504 1,562 Updated Sep 25, 2025

facebookresearch / demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Python 9,319 1,294 Updated Apr 24, 2024

librosa / librosa

Python library for audio and music analysis

Python 7,905 1,008 Updated Sep 16, 2025

boson-ai / higgs-audio

Text-audio foundation model from Boson AI

Python 7,383 532 Updated Sep 15, 2025

canopyai / Orpheus-TTS

Towards Human-Sounding Speech

Python 5,600 468 Updated May 6, 2025

wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

Python 4,817 1,156 Updated Sep 25, 2025

hzy46 / Deep-Learning-21-Examples

Companion code for the book "21 Projects for Mastering Deep Learning: A Detailed Practical Guide Based on TensorFlow" (《21个项目玩转深度学习》)

Python 4,624 1,764 Updated Mar 18, 2019

Kyubyong / transformer

A TensorFlow Implementation of the Transformer: Attention Is All You Need

Python 4,415 1,308 Updated May 21, 2023

buriburisuri / speech-to-text-wavenet

Speech-to-Text-WaveNet: End-to-end sentence-level English speech recognition based on DeepMind's WaveNet and TensorFlow

Python 3,991 793 Updated Oct 8, 2021

TensorSpeech / TensorFlowTTS

😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)

Python 3,982 811 Updated Jul 5, 2024

andabi / deep-voice-conversion

Deep neural networks for voice conversion (voice style transfer) in Tensorflow

Python 3,934 842 Updated Sep 30, 2022

hiyouga / EasyR1

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 3,683 277 Updated Sep 26, 2025

CharlesShang / FastMaskRCNN

Mask RCNN in TensorFlow

Python 3,097 1,092 Updated Jan 5, 2021

ace-step / ACE-Step

ACE-Step: A Step Towards Music Generation Foundation Model

Python 3,056 337 Updated Jun 27, 2025

openvpi / DiffSinger

Forked from MoonInTheRiver/DiffSinger

An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism

Python 2,984 313 Updated Sep 13, 2025

Robin.Zhang Robinatp

Stars

Stability-AI / stablediffusion

facebookresearch / Detectron

matterport / Mask_RCNN

QwenLM / Qwen3

NVIDIA-NeMo / NeMo