PDF scientific paper translation and bilingual comparison.
-
📊 Retain formulas and charts.
-
📄 Preserve table of contents.
-
🌐 Support multiple translation services.
Require Python version <=3.12
pip install pdf2zhExecute the translation command in the command line to generate the translated document example-zh.pdf and the bilingual document example-dual.pdf in the current directory.
pdf2zh example.pdfpdf2zh example.pdf -p 1-3,5See Google Languages Codes, DeepL Languages Codes.
pdf2zh example.pdf -li en -lo jaSee DeepLX.
Set ENVs to construct an endpoint like: {DEEPL_SERVER_URL}/{DEEPL_AUTH_KEY}/translate
DEEPL_SERVER_URL(Optional), e.g.,export DEEPL_SERVER_URL=https://api.deepl.comDEEPL_AUTH_KEY, e.g.,export DEEPL_AUTH_KEY=xxx
pdf2zh example.pdf -s deeplSee Ollama.
Set ENVs to construct an endpoint like: {OLLAMA_HOST}/api/chat
OLLAMA_HOST(Optional), e.g.,export OLLAMA_HOST=https://localhost:11434
pdf2zh example.pdf -s ollama:gemma2See OpenAI.
Set ENVs to construct an endpoint like: {OPENAI_BASE_URL}/chat/completions
OPENAI_BASE_URL(Optional), e.g.,export OPENAI_BASE_URL=https://api.openai.com/v1OPENAI_API_KEY, e.g.,export OPENAI_API_KEY=xxx
pdf2zh example.pdf -s openai:gpt-4opdf2zh example.pdf -f "(CM[^RT].*|MS.*|.*Ital)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"Document merging: PyMuPDF
Document parsing: Pdfminer.six
Document extraction: MinerU
Multi-threaded translation: MathTranslate
Layout parsing: DocLayout-YOLO
Document standard: PDF Explained, PDF Cheat Sheets


