PDF scientific paper translation and bilingual comparison.
- 📊 Preserves formulas, charts, table of contents, and annotations (preview).
- 🌐 Supports multiple languages and diverse translation services.
- 🤖 Provides a command-line tool, an interactive user interface, and Docker support.
Feel free to provide feedback in GitHub Issues or the Telegram Group.
- [Nov. 20 2024] Supports Docker
- [Nov. 20 2024] Supports multi-threaded translation
- [Nov. 19 2024] Provides an interactive graphical user interface
- [Nov. 18 2024] Supports more services: DeepL, DeepLX, and Azure
We provide three methods for using this project: command line, GUI, and Docker.
Command line:
- Python installed (3.8 <= version <= 3.12)
- Install our package:
  pip install pdf2zh
- Use:
  pdf2zh document.pdf
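Both installation routes require Python 3.8 to 3.12. The following is a minimal sketch of a version check. It assumes a POSIX shell and version strings of the form MAJOR.MINOR[.PATCH]. The helper `python_version_supported` is hypothetical and is not part of pdf2zh.

```shell
# Hypothetical helper (not part of pdf2zh): succeed only if a
# "MAJOR.MINOR[.PATCH]" version lies in the documented 3.8 <= version <= 3.12 range.
python_version_supported() {
  major=${1%%.*}          # text before the first dot
  rest=${1#*.}            # text after the first dot
  minor=${rest%%.*}       # text before the next dot, if any
  [ "$major" -eq 3 ] && [ "$minor" -ge 8 ] && [ "$minor" -le 12 ]
}

for v in 3.7.17 3.8.10 3.12.1 3.13.0; do
  if python_version_supported "$v"; then
    echo "$v supported"
  else
    echo "$v unsupported"
  fi
done
```

To check the interpreter on your PATH, pass the version it reports: python_version_supported "$(python3 -c 'import platform; print(platform.python_version())')".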
GUI:
- Python installed (3.8 <= version <= 3.12)
- Install our package:
  pip install pdf2zh
- Start using in the browser:
  pdf2zh -i
- If your browser does not open automatically, go to
  http://localhost:7860/
See the GUI documentation for more details.
Docker:
- Pull and run:
  docker pull byaidu/pdf2zh
  docker run -p 7860:7860 byaidu/pdf2zh
- Open in browser:
  http://localhost:7860/
Running the translation command on the command line (e.g., pdf2zh example.pdf) generates the translated document example-zh.pdf and the bilingual document example-dual.pdf in the current directory. Google is the default translation service.
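The example above implies a naming pattern: example.pdf yields example-zh.pdf and example-dual.pdf. This sketch assumes the same <name>-zh.pdf / <name>-dual.pdf pattern holds for any input file, which is an extrapolation from that one example. Under that assumption, the hypothetical helper `output_names` predicts which files a run will write to the current directory. This can help you spot earlier results that a new run might overwrite.

```shell
# Hypothetical helper: print the two output files expected for an input PDF,
# ASSUMING the <name>-zh.pdf / <name>-dual.pdf pattern from the example.
output_names() {
  name=$(basename "$1" .pdf)   # drop the directory part and the .pdf suffix
  echo "${name}-zh.pdf"
  echo "${name}-dual.pdf"
}

output_names example.pdf
output_names papers/attention.pdf
```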
In the following table, we list all advanced options for reference:
| Option | Function | Example |
|---|---|---|
| -i | Enter GUI | pdf2zh -i |
| -p | Partial document translation | pdf2zh example.pdf -p 1 |
| -li | Source language | pdf2zh example.pdf -li en |
| -lo | Target language | pdf2zh example.pdf -lo zh |
| -s | Translation service | pdf2zh example.pdf -s deepl |
| -t | Number of threads | pdf2zh example.pdf -t 1 |
| -f, -c | Exceptions (fonts/characters to preserve) | pdf2zh example.pdf -f "(MS.*)" |
Some services require environment variables to be set; the variables each service expects are listed below. If you are unsure how to set environment variables on your system, you can ask ChatGPT.
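In POSIX shells (Linux, macOS) there are two common ways to set a variable such as DEEPL_AUTH_KEY. You can export it for the whole session, or you can prefix it to a single command. The sketch below contrasts the two. It uses `sh -c` as a stand-in for pdf2zh, and the key values are placeholders.

```shell
# Option 1: export the variable; every later command in this shell sees it.
export DEEPL_AUTH_KEY="xxx"

# Option 2: prefix a single command; only that command (and its children)
# sees this value, and the exported session value is left untouched.
DEEPL_AUTH_KEY="yyy" sh -c 'echo "one-off: $DEEPL_AUTH_KEY"'

echo "session: $DEEPL_AUTH_KEY"
```

In practice, the one-off form looks like DEEPL_AUTH_KEY=xxx pdf2zh example.pdf -s deepl. In Windows PowerShell, the session-wide equivalent is $env:DEEPL_AUTH_KEY = "xxx".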
- Entire document:
  pdf2zh example.pdf
- Part of the document:
  pdf2zh example.pdf -p 1-3,5
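A -p value combines single pages and ranges, separated by commas. The sketch below expands such a value into individual page numbers to show how 1-3,5 reads. It assumes that pages are 1-indexed and that ranges are inclusive. `expand_pages` is a hypothetical helper, not pdf2zh's own parser.

```shell
# Hypothetical helper: expand a page spec such as "1-3,5" into one page number
# per line, ASSUMING "a-b" denotes the inclusive range a..b.
expand_pages() {
  echo "$1" | tr ',' '\n' | while IFS= read -r part; do
    case "$part" in
      *-*) seq "${part%-*}" "${part#*-}" ;;  # range: start..end
      *)   echo "$part" ;;                   # single page
    esac
  done
}

expand_pages "1-3,5" | tr '\n' ' '; echo
```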
See Google Language Codes and DeepL Language Codes for the supported codes.
pdf2zh example.pdf -li en -lo ja
DeepL
See DeepL.
Set ENVs to construct an endpoint like {DEEPL_SERVER_URL}/translate:
- DEEPL_SERVER_URL (optional), e.g., export DEEPL_SERVER_URL=https://api.deepl.com
- DEEPL_AUTH_KEY, e.g., export DEEPL_AUTH_KEY=xxx
pdf2zh example.pdf -s deepl
DeepLX
See DeepLX.
Set ENVs to construct an endpoint like {DEEPLX_SERVER_URL}/translate:
- DEEPLX_SERVER_URL (optional), e.g., export DEEPLX_SERVER_URL=https://api.deeplx.org
- DEEPLX_AUTH_KEY, e.g., export DEEPLX_AUTH_KEY=xxx
pdf2zh example.pdf -s deeplx
Ollama
See Ollama.
Set ENVs to construct an endpoint like {OLLAMA_HOST}/api/chat:
- OLLAMA_HOST (optional), e.g., export OLLAMA_HOST=http://localhost:11434
pdf2zh example.pdf -s ollama:gemma2
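For Ollama, the -s value carries a model name after a colon (ollama:gemma2). OpenAI-compatible services follow the same form (openai:gpt-4o). The sketch below splits such a value at the first colon to show the two parts. It illustrates the notation only and is not pdf2zh's own argument parsing. The helper `split_service` is hypothetical.

```shell
# Hypothetical helper: split a "service:model" value at the FIRST colon.
# A value without a colon (e.g. "deepl") yields an empty model.
split_service() {
  case "$1" in
    *:*) service=${1%%:*}; model=${1#*:} ;;
    *)   service=$1;       model= ;;
  esac
  echo "service=$service model=$model"
}

split_service ollama:gemma2   # service=ollama model=gemma2
split_service openai:gpt-4o   # service=openai model=gpt-4o
split_service deepl           # service=deepl model=
```

Splitting at the first colon rather than the last keeps an Ollama model tag such as gemma2:2b intact in this sketch.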
LLM with OpenAI-compatible schemas (OpenAI / SiliconCloud / Zhipu)
See SiliconCloud, Zhipu.
Set ENVs to construct an endpoint like {OPENAI_BASE_URL}/chat/completions:
- OPENAI_BASE_URL (optional), e.g., export OPENAI_BASE_URL=https://api.openai.com/v1
- OPENAI_API_KEY, e.g., export OPENAI_API_KEY=xxx
pdf2zh example.pdf -s openai:gpt-4o
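The services above build their request URL by appending a fixed path to a base URL taken from an environment variable, for example {DEEPL_SERVER_URL}/translate or {OPENAI_BASE_URL}/chat/completions. The sketch below previews that concatenation without making a request. Because each template adds its own /, the sketch strips one trailing slash from the base. Whether pdf2zh normalizes this itself is not documented here. `join_endpoint` is a hypothetical helper.

```shell
# Hypothetical helper: join a base URL with the path from its endpoint template,
# dropping one trailing "/" from the base so the preview never contains "//".
join_endpoint() {
  printf '%s%s\n' "${1%/}" "$2"
}

export OPENAI_BASE_URL="https://api.openai.com/v1/"
join_endpoint "$OPENAI_BASE_URL" /chat/completions
join_endpoint "https://api.deepl.com" /translate
```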
Azure
The following ENVs are required:
- AZURE_APIKEY, e.g., export AZURE_APIKEY=xxx
- AZURE_ENDPOINT, e.g., export AZURE_ENDPOINT=https://api.translator.azure.cn/
- AZURE_REGION, e.g., export AZURE_REGION=chinaeast2
pdf2zh example.pdf -s azure
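All three Azure variables are required, so a quick pre-flight check can catch a missing one before a long translation starts. This is a hypothetical helper in POSIX sh, not something pdf2zh provides. It only tests that each variable is non-empty, not that its value is valid.

```shell
# Hypothetical pre-flight check: report which required Azure variables are
# unset or empty, and return non-zero if any are missing.
check_azure_env() {
  missing=0
  for var in AZURE_APIKEY AZURE_ENDPOINT AZURE_REGION; do
    eval "value=\${$var:-}"    # indirect lookup of the variable named by $var
    if [ -z "$value" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

export AZURE_APIKEY=xxx AZURE_ENDPOINT=https://api.translator.azure.cn/
unset AZURE_REGION
check_azure_env || echo "set the missing variables before running pdf2zh -s azure"
```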
Use regular expressions to specify the formula fonts (-f) and characters (-c) that need to be preserved:
pdf2zh example.pdf -f "(CM[^RT].*|MS.*|.*Ital)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"
Use -t to specify how many threads to use in translation:
pdf2zh example.pdf -t 1
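A font pattern for -f is easy to get subtly wrong, so it can help to try it on sample font names first. The sketch below runs the -f example through grep -E with a leading ^. This assumes the pattern is matched from the start of the font name, as Python's re.match does; both that anchoring and the sample font names are assumptions for illustration.

```shell
# Sample font names (illustrative only, not taken from a real PDF).
fonts='CMMI10
CMR10
CMSY10
MSBM10
Times-Roman
Times-Italic
NimbusRomNo9L-ReguItal'

# Print the names the -f example would preserve, ASSUMING start-anchored matching.
printf '%s\n' "$fonts" | grep -E '^(CM[^RT].*|MS.*|.*Ital)'
```

With these samples, [^RT] leaves CMR10 (Computer Modern Roman) out of the preserved set, so body text set in it would still be translated. Math fonts such as CMMI10 and CMSY10 are preserved.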
Acknowledgements:
- Document merging: PyMuPDF
- Document parsing: Pdfminer.six
- Document extraction: MinerU
- Multi-threaded translation: MathTranslate
- Layout parsing: DocLayout-YOLO
- Document standard: PDF Explained, PDF Cheat Sheets





