Stars
High-Resolution Image Synthesis with Latent Diffusion Models
FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
Qwen3 is the large language model series developed by the Qwen team, Alibaba Cloud.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
A paper list of object detection using deep learning.
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Python library for audio and music analysis
Text-audio foundation model from Boson AI
Production First and Production Ready End-to-End Speech Recognition Toolkit
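Companion code for the book 《21个项目玩转深度学习———基于TensorFlow的实践详解》 ("Mastering Deep Learning with 21 Projects: A Practical Guide Based on TensorFlow")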
A TensorFlow Implementation of the Transformer: Attention Is All You Need
Speech-to-Text-WaveNet: End-to-end sentence-level English speech recognition based on DeepMind's WaveNet and TensorFlow
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
Deep neural networks for voice conversion (voice style transfer) in TensorFlow
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
ACE-Step: A Step Towards Music Generation Foundation Model
openvpi / DiffSinger
Forked from MoonInTheRiver/DiffSinger
An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
AudioLDM: Generate speech, sound effects, music and beyond, with text.
Semantic Segmentation Suite in TensorFlow. Implement, train, and test new Semantic Segmentation models easily!
Unofficial implementation of the LaneNet model for real-time lane detection
Self-Supervised Speech Pre-training and Representation Learning Toolkit
Bag of Tricks and A Strong Baseline for Deep Person Re-identification