Auto Video Translator 🎥🔊

Change a video's spoken language using AI, as YouTube does, either completely locally or with the OpenAI APIs

This tool translates videos by:

Converting speech to text
Translating the text
Generating natural-sounding translated speech
Merging the new audio into a single audio file whose timing matches the original audio

Supports both video files and pre-generated JSON transcripts.
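
The last step above, placing each translated clip at the time of the original speech, can be sketched as follows. This is a minimal illustration and not the tool's actual code: the segment shape (`start` in seconds plus `samples`), the function name `mixOntoTimeline`, and the default 24 kHz sample rate are all assumptions.

```javascript
// Hypothetical sketch: place synthesized clips on a silent timeline so each
// clip starts at the timestamp of the original speech segment.
// Assumed segment shape: { start: seconds, samples: Float32Array }.
function mixOntoTimeline(segments, totalSeconds, sampleRate = 24000) {
  const timeline = new Float32Array(Math.ceil(totalSeconds * sampleRate));
  for (const { start, samples } of segments) {
    const offset = Math.round(start * sampleRate);
    // Samples that would run past the end of the timeline are dropped.
    for (let i = 0; i < samples.length && offset + i < timeline.length; i++) {
      // Sum overlapping clips and clamp to the valid [-1, 1] range.
      const mixed = timeline[offset + i] + samples[i];
      timeline[offset + i] = Math.max(-1, Math.min(1, mixed));
    }
  }
  return timeline;
}
```

Gaps between segments stay silent, so the result has the same duration as the original audio.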

Prerequisites

Node.js (v18+)
npm (included with Node.js)
Docker (for local processing options)

Quick Start

npm install
node . <inputFile> <inputLang> <outputLang>

Example: node . my_video.wav en es converts the audio from English to Spanish

Language Support

This list may vary depending on the model used.

Input Languages (Speech-to-Text)	Output Languages (Text-to-Speech)
Full list of Whisper v3 languages	OpenAudio S1 Mini languages
- English (en)	- English (en)
- Chinese (zh)	- Chinese (zh)
- German (de)	- Japanese (ja)
- Spanish (es)	- German (de)
- Russian (ru)	- French (fr)
- Korean (ko)	- Spanish (es)
- French (fr)	- Korean (ko)
- Japanese (ja)	- Arabic (ar)
- Portuguese (pt)	- Russian (ru)
- Turkish (tr)	- Dutch (nl)
- Polish (pl)	- Italian (it)
- Catalan (ca)	- Polish (pl)
- Dutch (nl)	- Portuguese (pt)
- Arabic (ar)
- Swedish (sv)
- Italian (it)
- Indonesian (id)
- Hindi (hi)
- Finnish (fi)
- Vietnamese (vi)
- Hebrew (he)
- Ukrainian (uk)
- Greek (el)
- Malay (ms)
- Czech (cs)
- Romanian (ro)
- Danish (da)
- Hungarian (hu)
- Tamil (ta)
- Norwegian (no)
- Thai (th)
- Urdu (ur)
- Croatian (hr)
- Bulgarian (bg)
- Lithuanian (lt)
- Latin (la)
- Maori (mi)
- Malayalam (ml)
- Welsh (cy)
- Slovak (sk)
- Telugu (te)
- Persian (fa)
- Latvian (lv)
- Bengali (bn)
- Serbian (sr)
- Azerbaijani (az)
- Slovenian (sl)
- Kannada (kn)
- Estonian (et)
- Macedonian (mk)
- Breton (br)
- Basque (eu)
- Icelandic (is)
- Armenian (hy)
- Nepali (ne)
- Mongolian (mn)
- Bosnian (bs)
- Kazakh (kk)
- Albanian (sq)
- Swahili (sw)
- Galician (gl)
- Marathi (mr)
- Punjabi (pa)
- Sinhala (si)
- Khmer (km)
- Shona (sn)
- Yoruba (yo)
- Somali (so)
- Afrikaans (af)
- Occitan (oc)
- Georgian (ka)
- Belarusian (be)
- Tajik (tg)
- Sindhi (sd)
- Gujarati (gu)
- Amharic (am)
- Yiddish (yi)
- Lao (lo)
- Uzbek (uz)
- Faroese (fo)
- Haitian Creole (ht)
- Pashto (ps)
- Turkmen (tk)
- Nynorsk (nn)
- Maltese (mt)
- Sanskrit (sa)
- Luxembourgish (lb)
- Burmese (my)
- Tibetan (bo)
- Tagalog (tl)
- Malagasy (mg)
- Assamese (as)
- Tatar (tt)
- Hawaiian (haw)
- Lingala (ln)
- Hausa (ha)
- Bashkir (ba)
- Javanese (jw)
- Sundanese (su)

Configuration Options

🌐 Option 1: Local Processing (Recommended)

Uses local Docker containers for private, offline processing

Start specific services:

# Start only the services you need
docker compose up -d whisper-stt libretranslate kokoro-tts

Set environment variables:

Windows (CMD; in PowerShell, use $env:NAME = "value" instead of set):

:: Use your machine's local IP (not localhost) for Docker on Windows
set STT_OPENAI_KEY=-
set STT_OPENAI_HOST=http://192.168.1.100:8881/v1
set TTS_OPENAI_KEY=-
set TTS_OPENAI_HOST=http://192.168.1.100:8882/v1
set TTS_OPENAI_VOICE=af_bella
set RETRANSLATE_HOST=http://192.168.1.100:8883

Linux/macOS:

export STT_OPENAI_KEY=-
export STT_OPENAI_HOST=http://localhost:8881/v1
export TTS_OPENAI_KEY=-
export TTS_OPENAI_HOST=http://localhost:8882/v1
export TTS_OPENAI_VOICE=af_bella
export RETRANSLATE_HOST=http://localhost:8883

Important: Even in local mode, you must set dummy values for both STT_OPENAI_KEY and TTS_OPENAI_KEY:

- STT_OPENAI_KEY=- (dummy value)
- TTS_OPENAI_KEY=- (dummy value)

These are required for the tool to work properly in local mode.
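
Because a missing variable is easy to overlook, a small pre-flight check can help before starting a run. The sketch below is not part of the tool: `LOCAL_VARS` and `missingLocalVars` are hypothetical names, and the list simply mirrors the Option 1 variables shown above.

```javascript
// Hypothetical check: report which Option 1 variables are missing or empty.
// A dummy value such as "-" counts as set.
const LOCAL_VARS = [
  'STT_OPENAI_KEY',
  'STT_OPENAI_HOST',
  'TTS_OPENAI_KEY',
  'TTS_OPENAI_HOST',
  'TTS_OPENAI_VOICE',
  'RETRANSLATE_HOST',
];

function missingLocalVars(env = process.env) {
  return LOCAL_VARS.filter((name) => !env[name] || env[name].trim() === '');
}

const missing = missingLocalVars();
if (missing.length > 0) {
  console.warn(`Not set: ${missing.join(', ')}`);
}
```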

Windows Docker Note: Replace 192.168.1.100 with your actual local IP address. Find it with ipconfig (look for IPv4 Address). Localhost may not work with Docker on Windows.
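
As an alternative to reading the ipconfig output, Node.js itself can list the machine's IPv4 addresses. This is a small sketch using the built-in `os.networkInterfaces()`; the helper name `localIPv4Addresses` is made up for illustration, and which of the listed addresses Docker can reach depends on your network setup.

```javascript
const os = require('node:os');

// List the non-loopback IPv4 addresses of this machine, similar to reading
// "IPv4 Address" from ipconfig.
function localIPv4Addresses() {
  const found = [];
  for (const [name, addrs] of Object.entries(os.networkInterfaces())) {
    for (const addr of addrs ?? []) {
      // Some Node.js versions report the family as the number 4.
      const isV4 = addr.family === 'IPv4' || addr.family === 4;
      if (isV4 && !addr.internal) found.push({ name, address: addr.address });
    }
  }
  return found;
}

console.log(localIPv4Addresses());
```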

🎤 Option 2: Custom TTS (Voice Cloning)

Use your own voice for translations with one of these models:

IndexTTS

Requires: Audio sample only
Port: 8882

Start services:

docker compose up -d whisper-stt libretranslate indextts

Set environment variables:

:: Windows
set CUSTOM_TTS=http://192.168.1.100:8882
set CUSTOM_TTS_MODEL=indextts
set CUSTOM_TTS_SAMPLE=C:\path\to\your\voice_sample.wav

# Linux/macOS
export CUSTOM_TTS=http://localhost:8882
export CUSTOM_TTS_MODEL=indextts
export CUSTOM_TTS_SAMPLE=/path/to/your/voice_sample.wav

Fish-Speech (OpenAudio-S1-Mini)

Requires: Audio sample + transcription file
Port: 8882

Start services:

docker compose up -d whisper-stt libretranslate openaudio-s1-mini

Set environment variables:
```
:: Windows
set CUSTOM_TTS=http://192.168.1.100:8882
set CUSTOM_TTS_MODEL=fishspeech
set CUSTOM_TTS_SAMPLE=C:\path\to\your\voice_sample.wav
```
```
# Linux/macOS
export CUSTOM_TTS=http://localhost:8882
export CUSTOM_TTS_MODEL=fishspeech
export CUSTOM_TTS_SAMPLE=/path/to/your/voice_sample.wav
```

Voice Sample Requirements:
- 8-20 seconds duration
- Clean, clear, noise-free audio
- WAV format
- Should be in the target output language (recommended)

Fish-Speech Additional Requirement:
- Create a transcription file named after the audio sample, with .txt appended to the full file name
- Example: If your sample is voice_sample.wav, create voice_sample.wav.txt containing the transcription
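
The naming rule is easy to get wrong (the .wav part stays in the name), so a quick check before starting the containers may save a failed run. This is a sketch only; `transcriptPathFor` and `checkFishSpeechSample` are hypothetical helpers, not functions provided by the tool.

```javascript
const fs = require('node:fs');

// Fish-Speech expects "<sample>.txt" next to the sample, for example
// voice_sample.wav -> voice_sample.wav.txt (the .wav part is kept).
function transcriptPathFor(samplePath) {
  return `${samplePath}.txt`;
}

// Hypothetical pre-flight check: returns a list of problems (empty if ok).
function checkFishSpeechSample(samplePath) {
  const problems = [];
  if (!fs.existsSync(samplePath)) {
    problems.push(`missing sample: ${samplePath}`);
  }
  const transcript = transcriptPathFor(samplePath);
  if (!fs.existsSync(transcript)) {
    problems.push(`missing transcription: ${transcript}`);
  }
  return problems;
}
```

For example, `checkFishSpeechSample(process.env.CUSTOM_TTS_SAMPLE)` could be called before running the translator.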

☁️ Option 3: OpenAI API Processing

Uses OpenAI's cloud services (requires API keys)

Set these environment variables:

Windows:

set STT_OPENAI_KEY=sk-xxxxxxxx
set STT_OPENAI_HOST=https://api.openai.com/v1

set TTS_OPENAI_KEY=sk-xxxxxxxx
set TTS_OPENAI_HOST=https://api.openai.com/v1
:: Other voices: nova, shimmer, echo
set TTS_OPENAI_VOICE=alloy

set TRANSLATE_OPENAI_KEY=sk-xxxxxxxx
set TRANSLATE_OPENAI_HOST=https://api.openai.com/v1
set TRANSLATE_OPENAI_MODEL=gpt-4-turbo

Linux/macOS:

export STT_OPENAI_KEY=sk-xxxxxxxx
export STT_OPENAI_HOST=https://api.openai.com/v1

export TTS_OPENAI_KEY=sk-xxxxxxxx
export TTS_OPENAI_HOST=https://api.openai.com/v1
export TTS_OPENAI_VOICE=alloy

export TRANSLATE_OPENAI_KEY=sk-xxxxxxxx
export TRANSLATE_OPENAI_HOST=https://api.openai.com/v1
export TRANSLATE_OPENAI_MODEL=gpt-4-turbo
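
Each `*_OPENAI_HOST` value is a base URL ending in /v1, which suggests the tool talks to OpenAI-compatible endpoints in both the local and the cloud setup. The sketch below shows how such base URLs map to the standard OpenAI REST paths. That the tool uses exactly these paths is an assumption (the README only documents the base URLs), and `openAiEndpoints` is a hypothetical helper.

```javascript
// Assumption: the standard OpenAI REST paths are used for each stage.
function openAiEndpoints(env = process.env) {
  // Strip trailing slashes so "…/v1/" and "…/v1" behave the same;
  // an unset host yields null.
  const join = (base, path) =>
    base ? `${base.replace(/\/+$/, '')}${path}` : null;
  return {
    transcription: join(env.STT_OPENAI_HOST, '/audio/transcriptions'),
    speech: join(env.TTS_OPENAI_HOST, '/audio/speech'),
    translation: join(env.TRANSLATE_OPENAI_HOST, '/chat/completions'),
  };
}
```

With the Option 1 variables, `translation` is null because local translation is configured through RETRANSLATE_HOST instead.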

Advanced Input Options

Using Pre-generated Transcripts

Convert SRT to JSON:

node convert_srt_to_json.js <input.srt> <output.json>

Use JSON as input to skip transcription:

node . transcript.json <inputLang> <outputLang>
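
To give an idea of what such a conversion involves, here is a minimal SRT parser. It is not the code in convert_srt_to_json.js, and the output shape (`start` and `end` in seconds plus `text`) is an assumption; check a JSON file produced by the tool for the real schema. The sketch also assumes well-formed cues with a number line before each timing line.

```javascript
// Convert an SRT timestamp "HH:MM:SS,mmm" to seconds.
function srtTimeToSeconds(stamp) {
  const [h, m, rest] = stamp.trim().split(':');
  const [s, ms] = rest.split(',');
  return Number(h) * 3600 + Number(m) * 60 + Number(s) + Number(ms) / 1000;
}

// Parse SRT text into an array of { start, end, text } cues.
function parseSrt(srt) {
  return srt
    .replace(/\r/g, '')
    .trim()
    .split(/\n{2,}/)
    .filter((block) => block.includes('-->'))
    .map((block) => {
      const lines = block.split('\n');
      // lines[0] is the cue number, lines[1] is "start --> end".
      const [start, end] = lines[1].split('-->').map(srtTimeToSeconds);
      return { start, end, text: lines.slice(2).join(' ').trim() };
    });
}
```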

Skipping Processing Stages

Skip transcription: Provide a JSON transcript instead of a video

Skip translation: Use skip as the input language

node . input.wav skip es  # Skips the translation stage; es is still given as the output language
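
The two rules above amount to a small decision based on the command-line arguments. The following sketch illustrates that logic under the assumption that a .json extension identifies a transcript; `planStages` is a hypothetical function, not the tool's implementation.

```javascript
const path = require('node:path');

// Hypothetical planner for `node . <inputFile> <inputLang> <outputLang>`.
function planStages(inputFile, inputLang, outputLang) {
  return {
    // A JSON transcript already contains the text, so no speech-to-text.
    transcribe: path.extname(inputFile).toLowerCase() !== '.json',
    // "skip" as the input language disables the translation stage.
    translate: inputLang !== 'skip',
    outputLang,
  };
}

console.log(planStages('transcript.json', 'en', 'fr'));
console.log(planStages('input.wav', 'skip', 'es'));
```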

Voice Selection Guide

Local processing: Browse available voices
- English: af_bella
- Spanish: ef_dora
- Japanese: gf_kokoro
OpenAI: alloy, echo, fable, nova, onyx, shimmer
Custom TTS: Use your own voice sample

Usage Examples

# Basic video translation (English to Spanish)
node . presentation.wav en es

# Use pre-generated transcript (skip speech-to-text)
node . transcript.json en fr

# Skip translation (skip as input language, de as output language)
node . interview.wav skip de

# Local processing with Windows Docker
node . demo.wav en ja

# Use custom voice cloning (after Docker setup)
node . vlog.wav en fr  # Uses your voice_sample.wav for French

Important Notes

For local processing:
- Keep Docker running while processing audio
- The first run downloads large models (5-10GB)
- Requires powerful hardware (16GB+ RAM and 8GB+ VRAM recommended)

IP addresses:
- Windows Docker: Use the machine's local IP (not localhost)
- Find the Windows IP: Run ipconfig → "IPv4 Address"
- Linux/macOS can use localhost

Output files:
- Audio: <original>_<outputLang>.wav
- Transcripts: <original>.srt and <original>.json

Custom TTS requirements:
- Voice samples must be high-quality recordings
- Samples should match the target language
- The first run takes longer to train the voice model
- Additional GPU resources are needed for best results
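
The output file naming described above can be expressed as a short helper, which is handy when scripting around the tool. This is a sketch: it assumes that <original> means the input path without its extension, and `outputPaths` is a hypothetical name.

```javascript
const path = require('node:path');

// Assumption: <original> is the input path without its extension.
function outputPaths(inputFile, outputLang) {
  const { dir, name } = path.parse(inputFile);
  const base = path.join(dir, name);
  return {
    audio: `${base}_${outputLang}.wav`,
    srt: `${base}.srt`,
    json: `${base}.json`,
  };
}

console.log(outputPaths('presentation.wav', 'es'));
```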

Other interesting models that could be added:

Zero-Shot:

https://github.com/k2-fsa/ZipVoice - Good results (English, Chinese) - Apache-2.0 :D
https://github.com/bytedance/MegaTTS3 - Good results (English, Chinese) - Apache-2.0 :D
https://github.com/boson-ai/higgs-audio - Very good results (English, Chinese Only?) (Warning: Very big) - Apache-2.0 :D
https://github.com/fishaudio/fish-speech - Very good results (Chinese, English, German, Japanese, French, Spanish, Korean, Arabic, Dutch, Russian, Italian, Polish, Portuguese) - cc-by-nc-sa-4.0 :c

Warning! The audio files in this repository may be used only for personal, private testing of the code. Any other use, such as generating audio that is uploaded to the internet, shared, or sent to someone, is illegal, whether or not it is intended to cause harm and even if it is just for fun. You cannot use another person's voice without their consent (in this case, my voice). Misusing someone's voice can constitute identity theft and can lead to serious legal problems, including prison sentences of several months or years. This message is only to warn about the use of my voice in this repository and in my YouTube videos, and it also applies to previous commits.

