Change video languages using AI like YouTube - completely local or with OpenAI APIs
This tool translates videos by:
- Converting speech to text
- Translating the text
- Generating natural-sounding translated speech
- Merging the new audio into a single audio file that matches the timing of the original audio
Supports both video files and pre-generated JSON transcripts.
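The last step above, fitting translated speech into the original timing, can be sketched as follows. This is only an illustration and not the tool's actual implementation: the function name `planTimeline` and the segment fields `start`, `end`, and `ttsDuration` (all in seconds) are assumptions made for the example. Each translated clip is placed at the start time of the original segment, and a speed-up factor is computed when the clip is longer than its slot.

```javascript
// Hypothetical sketch: place each translated clip at the start time of the
// original segment and compute how much it must be sped up to fit its slot.
// Field names (start, end, ttsDuration) are assumptions, all in seconds.
function planTimeline(segments) {
  return segments.map((seg) => {
    const slot = seg.end - seg.start;
    // A factor above 1 means the clip is longer than its slot and would need
    // to be played faster; clips that already fit keep their natural speed.
    const speed = slot > 0 && seg.ttsDuration > slot ? seg.ttsDuration / slot : 1;
    return {
      offset: seg.start,
      speed,
      playedDuration: seg.ttsDuration / speed,
    };
  });
}

const plan = planTimeline([
  { start: 0, end: 2, ttsDuration: 1.5 }, // fits: no speed change
  { start: 2, end: 4, ttsDuration: 3 },   // too long for a 2 s slot
]);
console.log(plan);
```

Speeding up is only one possible strategy; trimming silence or letting a clip overrun into the next gap would be alternatives.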
- Node.js (v18+)
- npm (included with Node.js)
- Docker (for local processing options)
npm install
node . <inputFile> <inputLang> <outputLang>
Example: node . my_video.wav en es to convert from English to Spanish
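As a rough illustration of the command shape above, the three positional arguments could be read like this. The function name `parseArgs` and the returned field names are hypothetical, and the tool's real argument handling may differ.

```javascript
// Hypothetical sketch of how the three positional arguments could be read.
// The function name and the returned field names are assumptions.
function parseArgs(argv) {
  const [inputFile, inputLang, outputLang] = argv;
  if (!inputFile || !inputLang || !outputLang) {
    throw new Error("Usage: node . <inputFile> <inputLang> <outputLang>");
  }
  return {
    inputFile,
    inputLang,
    outputLang,
    // A .json input is treated as a pre-generated transcript.
    isTranscript: inputFile.toLowerCase().endsWith(".json"),
  };
}

// In a real script this would be parseArgs(process.argv.slice(2)).
console.log(parseArgs(["my_video.wav", "en", "es"]));
```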
This list may vary depending on the model used.
| Input Languages (Speech-to-Text) | Output Languages (Text-to-Speech) |
|---|---|
| Full list of Whisper v3 languages | OpenAudio S1 Mini languages |
| - English (en) | - English (en) |
| - Chinese (zh) | - Chinese (zh) |
| - German (de) | - Japanese (ja) |
| - Spanish (es) | - German (de) |
| - Russian (ru) | - French (fr) |
| - Korean (ko) | - Spanish (es) |
| - French (fr) | - Korean (ko) |
| - Japanese (ja) | - Arabic (ar) |
| - Portuguese (pt) | - Russian (ru) |
| - Turkish (tr) | - Dutch (nl) |
| - Polish (pl) | - Italian (it) |
| - Catalan (ca) | - Polish (pl) |
| - Dutch (nl) | - Portuguese (pt) |
| - Arabic (ar) | |
| - Swedish (sv) | |
| - Italian (it) | |
| - Indonesian (id) | |
| - Hindi (hi) | |
| - Finnish (fi) | |
| - Vietnamese (vi) | |
| - Hebrew (he) | |
| - Ukrainian (uk) | |
| - Greek (el) | |
| - Malay (ms) | |
| - Czech (cs) | |
| - Romanian (ro) | |
| - Danish (da) | |
| - Hungarian (hu) | |
| - Tamil (ta) | |
| - Norwegian (no) | |
| - Thai (th) | |
| - Urdu (ur) | |
| - Croatian (hr) | |
| - Bulgarian (bg) | |
| - Lithuanian (lt) | |
| - Latin (la) | |
| - Maori (mi) | |
| - Malayalam (ml) | |
| - Welsh (cy) | |
| - Slovak (sk) | |
| - Telugu (te) | |
| - Persian (fa) | |
| - Latvian (lv) | |
| - Bengali (bn) | |
| - Serbian (sr) | |
| - Azerbaijani (az) | |
| - Slovenian (sl) | |
| - Kannada (kn) | |
| - Estonian (et) | |
| - Macedonian (mk) | |
| - Breton (br) | |
| - Basque (eu) | |
| - Icelandic (is) | |
| - Armenian (hy) | |
| - Nepali (ne) | |
| - Mongolian (mn) | |
| - Bosnian (bs) | |
| - Kazakh (kk) | |
| - Albanian (sq) | |
| - Swahili (sw) | |
| - Galician (gl) | |
| - Marathi (mr) | |
| - Punjabi (pa) | |
| - Sinhala (si) | |
| - Khmer (km) | |
| - Shona (sn) | |
| - Yoruba (yo) | |
| - Somali (so) | |
| - Afrikaans (af) | |
| - Occitan (oc) | |
| - Georgian (ka) | |
| - Belarusian (be) | |
| - Tajik (tg) | |
| - Sindhi (sd) | |
| - Gujarati (gu) | |
| - Amharic (am) | |
| - Yiddish (yi) | |
| - Lao (lo) | |
| - Uzbek (uz) | |
| - Faroese (fo) | |
| - Haitian Creole (ht) | |
| - Pashto (ps) | |
| - Turkmen (tk) | |
| - Nynorsk (nn) | |
| - Maltese (mt) | |
| - Sanskrit (sa) | |
| - Luxembourgish (lb) | |
| - Burmese (my) | |
| - Tibetan (bo) | |
| - Tagalog (tl) | |
| - Malagasy (mg) | |
| - Assamese (as) | |
| - Tatar (tt) | |
| - Hawaiian (haw) | |
| - Lingala (ln) | |
| - Hausa (ha) | |
| - Bashkir (ba) | |
| - Javanese (jw) | |
| - Sundanese (su) | |
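Because the output column is much shorter than the input column, it can help to check the target language before starting a long run. The sketch below copies the output codes from the table above; the helper name `isSupportedOutputLang` is hypothetical, and the list would need updating if a different TTS model is used.

```javascript
// Codes copied from the output (Text-to-Speech) column of the table above.
const TTS_OUTPUT_LANGS = new Set([
  "en", "zh", "ja", "de", "fr", "es", "ko", "ar", "ru", "nl", "it", "pl", "pt",
]);

// Hypothetical helper: returns true when the code is in the output column.
function isSupportedOutputLang(code) {
  return TTS_OUTPUT_LANGS.has(String(code).toLowerCase());
}

console.log(isSupportedOutputLang("es")); // listed in the output column
console.log(isSupportedOutputLang("sv")); // listed only in the input column
```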
Uses local Docker containers for private, offline processing
- Start specific services:
  # Start only the services you need
  docker compose up -d whisper-stt libretranslate kokoro-tts
- Set environment variables:
Windows (CMD; in PowerShell, use $env:NAME="value" instead of set):
:: Use your machine's local IP (not localhost) for Docker on Windows
set STT_OPENAI_KEY=-
set STT_OPENAI_HOST=http://192.168.1.100:8881/v1
set TTS_OPENAI_KEY=-
set TTS_OPENAI_HOST=http://192.168.1.100:8882/v1
set TTS_OPENAI_VOICE=af_bella
set RETRANSLATE_HOST=http://192.168.1.100:8883
Linux/macOS:
export STT_OPENAI_KEY=-
export STT_OPENAI_HOST=http://localhost:8881/v1
export TTS_OPENAI_KEY=-
export TTS_OPENAI_HOST=http://localhost:8882/v1
export TTS_OPENAI_VOICE=af_bella
export RETRANSLATE_HOST=http://localhost:8883
Important: Even in local mode, you must set dummy values for both STT_OPENAI_KEY and TTS_OPENAI_KEY:
- STT_OPENAI_KEY=- (dummy value)
- TTS_OPENAI_KEY=- (dummy value)
These are required for the tool to work properly in local mode.
Windows Docker Note: Replace 192.168.1.100 with your actual local IP address. Find it with ipconfig (look for IPv4 Address). Localhost may not work with Docker on Windows.
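A forgotten variable is an easy mistake in local mode, so a small pre-flight check can save a failed run. The sketch below uses the variable names listed above; the helper name `missingLocalVars` is hypothetical, and whether the tool treats every one of these variables as mandatory is an assumption.

```javascript
// Variable names are the ones listed above for local mode.
const LOCAL_MODE_VARS = [
  "STT_OPENAI_KEY",
  "STT_OPENAI_HOST",
  "TTS_OPENAI_KEY",
  "TTS_OPENAI_HOST",
  "TTS_OPENAI_VOICE",
  "RETRANSLATE_HOST",
];

// Hypothetical helper: lists the variables that are unset or empty.
// The dummy key value "-" counts as set.
function missingLocalVars(env) {
  return LOCAL_MODE_VARS.filter((name) => !env[name]);
}

const missing = missingLocalVars(process.env);
if (missing.length > 0) {
  console.error("Missing environment variables: " + missing.join(", "));
}
```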
Use your own voice for translations with one of these models:
IndexTTS:
- Requires: Audio sample only
- Port: 8882
- Start services:
  docker compose up -d whisper-stt libretranslate indextts
- Set environment variables:
  :: Windows
  set CUSTOM_TTS=http://192.168.1.100:8882
  set CUSTOM_TTS_MODEL=indextts
  set CUSTOM_TTS_SAMPLE=C:\\path\\to\\your\\voice_sample.wav
  # Linux/macOS
  export CUSTOM_TTS=http://localhost:8882
  export CUSTOM_TTS_MODEL=indextts
  export CUSTOM_TTS_SAMPLE=/path/to/your/voice_sample.wav
Fish-Speech (OpenAudio S1 Mini):
- Requires: Audio sample + transcription file
- Port: 8882
- Start services:
  docker compose up -d whisper-stt libretranslate openaudio-s1-mini
- Set environment variables:
  :: Windows
  set CUSTOM_TTS=http://192.168.1.100:8882
  set CUSTOM_TTS_MODEL=fishspeech
  set CUSTOM_TTS_SAMPLE=C:\\path\\to\\your\\voice_sample.wav
  # Linux/macOS
  export CUSTOM_TTS=http://localhost:8882
  export CUSTOM_TTS_MODEL=fishspeech
  export CUSTOM_TTS_SAMPLE=/path/to/your/voice_sample.wav
Voice Sample Requirements:
- 8-20 seconds duration
- Clean, clear, noise-free audio
- WAV format
- Should be in the target output language (recommended)
Fish-Speech Additional Requirement:
- Create a transcription file with the same name as your audio sample, with a .txt extension appended
- Example: If your sample is voice_sample.wav, create voice_sample.wav.txt with the transcription
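The sample requirements above are simple enough to check before starting the containers. The sketch below is illustrative only: both helper names are hypothetical, and the duration is passed in as a number because reading it from the file would need an audio library.

```javascript
// Hypothetical helpers; the names are assumptions for illustration.

// Follows the example above: voice_sample.wav -> voice_sample.wav.txt
function transcriptPathFor(samplePath) {
  return samplePath + ".txt";
}

// Checks the WAV extension and the 8-20 second guideline.
function checkSample(samplePath, durationSeconds) {
  const problems = [];
  if (!samplePath.toLowerCase().endsWith(".wav")) {
    problems.push("not a WAV file");
  }
  if (durationSeconds < 8 || durationSeconds > 20) {
    problems.push("duration outside 8-20 s");
  }
  return problems;
}

console.log(transcriptPathFor("voice_sample.wav"));
console.log(checkSample("voice_sample.wav", 12)); // empty list: no problems found
```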
Uses OpenAI's cloud services (requires API keys)
Set these environment variables:
Windows:
set STT_OPENAI_KEY=sk-xxxxxxxx
set STT_OPENAI_HOST=https://api.openai.com/v1
set TTS_OPENAI_KEY=sk-xxxxxxxx
set TTS_OPENAI_HOST=https://api.openai.com/v1
:: Other voices: nova, shimmer, echo
set TTS_OPENAI_VOICE=alloy
set TRANSLATE_OPENAI_KEY=sk-xxxxxxxx
set TRANSLATE_OPENAI_HOST=https://api.openai.com/v1
set TRANSLATE_OPENAI_MODEL=gpt-4-turbo
Linux/macOS:
export STT_OPENAI_KEY=sk-xxxxxxxx
export STT_OPENAI_HOST=https://api.openai.com/v1
export TTS_OPENAI_KEY=sk-xxxxxxxx
export TTS_OPENAI_HOST=https://api.openai.com/v1
export TTS_OPENAI_VOICE=alloy
export TRANSLATE_OPENAI_KEY=sk-xxxxxxxx
export TRANSLATE_OPENAI_HOST=https://api.openai.com/v1
export TRANSLATE_OPENAI_MODEL=gpt-4-turbo
- Convert SRT to JSON:
  node convert_srt_to_json.js <input.srt> <output.json>
- Use JSON as input to skip transcription:
  node . transcript.json <inputLang> <outputLang>
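To give an idea of what an SRT-to-JSON conversion involves, here is a minimal parser sketch. It is not the code in convert_srt_to_json.js: the output shape (`start`, `end`, `text`, with times in seconds) and the function names are assumptions, and the JSON layout the tool expects may be different.

```javascript
// Hypothetical sketch of an SRT parser. The output shape
// ({ start, end, text } with times in seconds) is an assumption and may not
// match what convert_srt_to_json.js actually writes.
function toSeconds(stamp) {
  // Stamps look like HH:MM:SS,mmm
  const [h, m, rest] = stamp.trim().split(":");
  const [s, ms] = rest.split(",");
  return Number(h) * 3600 + Number(m) * 60 + Number(s) + Number(ms) / 1000;
}

function parseSrt(srtText) {
  return srtText
    .replace(/\r\n/g, "\n")
    .trim()
    .split(/\n{2,}/) // cues are separated by blank lines
    .map((block) => {
      const lines = block.split("\n");
      // lines[0] is the cue number, lines[1] the time range.
      const [start, end] = lines[1].split("-->").map(toSeconds);
      return { start, end, text: lines.slice(2).join(" ") };
    });
}

const sample =
  "1\n00:00:01,000 --> 00:00:02,500\nHello\n\n" +
  "2\n00:00:03,000 --> 00:00:04,000\nWorld";
console.log(JSON.stringify(parseSrt(sample), null, 2));
```

The sketch assumes well-formed input; a real converter would also need to handle malformed cues and styling tags.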
- Skip transcription: Provide JSON file instead of video
- Skip translation: Use skip as input language
  node . input.wav skip es # Keeps original speech, translates to Spanish
- Local processing: Browse available voices
  - English: af_bella
  - Spanish: ef_dora
  - Japanese: gf_kokoro
- OpenAI: alloy, echo, fable, nova, onyx, shimmer
- Custom TTS: Use your own voice sample
# Basic video translation (English to Spanish)
node . presentation.wav en es
# Use pre-generated transcript (skip speech-to-text)
node . transcript.json en fr
# Skip translation (keep original speech, translate to German)
node . interview.wav skip de
# Local processing with Windows Docker
node . demo.wav en ja
# Use custom voice cloning (after Docker setup)
node . vlog.wav en fr # Uses your voice_sample.wav for French
- For local processing:
  - Keep Docker running while processing audio
  - First run will download large models (5-10GB)
  - Requires powerful hardware (recommended 16GB+ RAM and 8GB+ VRAM)
- IP addresses:
  - Windows Docker: Must use machine's local IP (not localhost)
  - Find Windows IP: Run ipconfig and look for "IPv4 Address"
  - Linux/macOS can use localhost
- Output files:
  - Audio: <original>_<outputLang>.wav
  - Transcripts: <original>.srt and <original>.json
- Custom TTS requirements:
  - Voice samples must be high-quality recordings
  - Samples should match the target language
  - First run will take longer to train voice model
  - Requires additional GPU resources for best results
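The output file pattern listed in the notes above can be expressed as a small helper. The function name `outputNames` is hypothetical, and the sketch assumes that `<original>` means the input file name without its extension, which the notes do not state explicitly.

```javascript
// Hypothetical helper: derives output names from the pattern in the notes.
// It assumes <original> is the input path without its extension.
function outputNames(inputFile, outputLang) {
  const slash = Math.max(inputFile.lastIndexOf("/"), inputFile.lastIndexOf("\\"));
  const dot = inputFile.lastIndexOf(".");
  // Only strip an extension that belongs to the file name itself,
  // not a dot inside a directory name or a leading dot.
  const original = dot > slash + 1 ? inputFile.slice(0, dot) : inputFile;
  return {
    audio: `${original}_${outputLang}.wav`,
    srt: `${original}.srt`,
    json: `${original}.json`,
  };
}

console.log(outputNames("presentation.wav", "es"));
```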
- https://github.com/k2-fsa/ZipVoice - Good results (English, Chinese) - Apache-2.0 :D
- https://github.com/bytedance/MegaTTS3 - Good results (English, Chinese) - Apache-2.0 :D
- https://github.com/boson-ai/higgs-audio - Very good results (English, Chinese Only?) (Warning: Very big) - Apache-2.0 :D
- https://github.com/fishaudio/fish-speech - Very good results (Chinese, English, German, Japanese, French, Spanish, Korean, Arabic, Dutch, Russian, Italian, Polish, Portuguese) - cc-by-nc-sa-4.0 :c
Warning! The audio files in this repository may be used only for personal, private testing of the code. Any other use is illegal: this includes generating audio that is uploaded to the internet, shared, or sent to someone, whether or not it is meant to cause harm, and even if it is just for fun. You cannot use another person's voice without their consent (in this case, my voice). Improper use of someone's voice can constitute identity theft and can lead to serious legal problems, including prison sentences of several months or years. This message is only a warning about the use of my voice in this repository and in my YouTube videos, and it also applies to previous commits.