Merged
104 commits
190c5ff
Add openaleph-procrastinate. Bump versions to satisfy dependencies (p…
catileptic Jun 11, 2025
a46a028
🧑‍💻 Add pre-commit, use requirements.txt, upgrade to python3.13
simonwoerpel Jun 16, 2025
109b53a
🧑‍💻 Add dev requirements only for test build
simonwoerpel Jun 16, 2025
43a02d1
🔥 (github) Drop daily cache job
simonwoerpel Jun 16, 2025
f977f90
✅ (tests/test_pdf) Fix whitespace errors from test results
simonwoerpel Jun 16, 2025
d6ad514
🔨 (make) Build before test
simonwoerpel Jun 16, 2025
8ef339f
👷 Inline base build
simonwoerpel Jun 16, 2025
9e376e4
🚧 Tweak builds and tags
simonwoerpel Jun 16, 2025
39a2385
👷 (github) Skip intermediate arm46 build for tests
simonwoerpel Jun 16, 2025
03f86fd
👷 (github) Skip cache-from [tmp]
simonwoerpel Jun 16, 2025
ff1a8c4
Revert "👷 (github) Skip cache-from [tmp]"
simonwoerpel Jun 16, 2025
4adcfb2
👷 (github/docker) Try this
simonwoerpel Jun 16, 2025
099aef5
🚨 Apply black
simonwoerpel Jun 17, 2025
4071354
👷 (github/docker) Don't use registry cache
simonwoerpel Jun 17, 2025
dd8bb0c
🧪 (test_image) Skip gif test
simonwoerpel Jun 17, 2025
9adf413
👷 (github/docker) maybe this
simonwoerpel Jun 17, 2025
53d0a8a
✨ Boilerplate ingest task for procrastinate
simonwoerpel Jun 17, 2025
6457ebd
🔧 Use pydantic_settings
simonwoerpel Jun 17, 2025
90292be
📌 Use openaleph-procrastinate from git
simonwoerpel Jun 18, 2025
99046a8
⚰️ Drop TranscriptionSupport
simonwoerpel Jun 18, 2025
33a33f0
🔥 Remove analysis part
simonwoerpel Jun 18, 2025
ef84cf7
🔥 Remove servicelayer worker
simonwoerpel Jun 18, 2025
b1dadd8
♻️ Refactor manager and supports to work with procrastinate
simonwoerpel Jun 18, 2025
e64425b
🚧 Make procrastinate task to work with manager
simonwoerpel Jun 18, 2025
13bbf14
➖ languagecodes, pantomime -> rigour
simonwoerpel Jun 18, 2025
0dacf45
🔊 Tweak global logging
simonwoerpel Jun 18, 2025
bf2a15e
♻️ Refactor cli with typer
simonwoerpel Jun 18, 2025
8873af6
🧪 Make tests work with procrastinate refactor
simonwoerpel Jun 18, 2025
bf26349
🩹 (ingestors/email) Use relative path
simonwoerpel Jun 18, 2025
bddb556
✨ (support/timestamp) Fall back to dateparser for unknown formats
simonwoerpel Jun 18, 2025
d91c43c
🙈 Ignore more
simonwoerpel Jun 18, 2025
6f7ff92
🔥 Remove unused lid model
simonwoerpel Jun 18, 2025
5bb5a8b
👷 (github) Tag base image properly
simonwoerpel Jun 19, 2025
cfe652e
📦 (docker) Use entrypoint and run procrastinate worker
simonwoerpel Jun 19, 2025
6e6fa8a
🧑‍💻 (contrib) Add non-docker debian install dependencies
simonwoerpel Jun 19, 2025
6653ac9
🧪 Add end-to-end testing setup
simonwoerpel Jun 19, 2025
ad775be
Merge branch 'main' into feat/procrastinate
simonwoerpel Jun 19, 2025
4c16741
🧪 (e2e) Working example
simonwoerpel Jun 19, 2025
e678b17
🔧 (settings) Move deferring settings up to openaleph-procrastinate
simonwoerpel Jun 19, 2025
8d1ea41
📌 requirements
simonwoerpel Jun 19, 2025
4143953
Pin Tesserocr to 2.6.2
catileptic Jun 26, 2025
bb5ea64
Add ENV LD_PRELOAD for Apple Silicone as comment
catileptic Jun 26, 2025
ca71d99
Solve minor errors
catileptic Jun 26, 2025
49f26c0
Bump openaleph_procrastinate version
catileptic Jun 27, 2025
1e948b5
🐛 (cli) Use defer settings correctly in debug mode
simonwoerpel Jul 1, 2025
b398811
⬆️ openaleph-procrastinate v0.0.7
simonwoerpel Jul 1, 2025
4dab40f
👽️ Adapt explicit defers from openaleph-procrastinate v0.0.7
simonwoerpel Jul 1, 2025
5b5f3d3
👷 Tweak compose settings
simonwoerpel Jul 1, 2025
7d61230
Bump openaleph-procrastinate version
catileptic Jul 16, 2025
15f2677
Add namespace to entities. Remove app user
catileptic Jul 16, 2025
00b64b1
Add namespace info to test setup
catileptic Jul 21, 2025
a2f0d14
Merge branch 'main' into feat/procrastinate
catileptic Jul 21, 2025
b16623e
Explicitly set the testing DB to sqlite
catileptic Jul 22, 2025
7c40dba
Pin procrastinate to 3.2.2 for tests
catileptic Jul 22, 2025
a7f2616
Add transcription procrastinate task
catileptic Jul 22, 2025
a19972b
📌 Pin procrastinate==3.2.2 for test docker build
simonwoerpel Jul 22, 2025
258b816
🚧 (docker) Cleanup duplicated RUN
simonwoerpel Jul 22, 2025
a8a948b
🚧 (cli) Adjust settings display
simonwoerpel Jul 23, 2025
aec12e6
🔧 (tests) Properly set FTM_STORE_URI
simonwoerpel Jul 23, 2025
a283898
⬆️ Dependencies
simonwoerpel Jul 23, 2025
e1e2200
✨ Documentation
simonwoerpel Jul 23, 2025
90dd7aa
🔥 Drop google cloud vision support
simonwoerpel Jul 24, 2025
49de026
Always index entities after ingesting
catileptic Jul 28, 2025
a7113e7
Replace get_dataset with get_fragments (ftmq.store)
catileptic Jul 28, 2025
1d7d261
⬆️ ftm(q) 4.1.x, openaleph-procrastinate 0.0.13
simonwoerpel Jul 29, 2025
1c72b9c
🐛 (support/email) Catch empty name
simonwoerpel Jul 29, 2025
6d12654
🎨 (support/transcription) Cleanup
simonwoerpel Jul 29, 2025
d0f6f04
🔥 Drop unused settings
simonwoerpel Jul 29, 2025
6536961
⬆️ openaleph-procrastinate 0.0.14
simonwoerpel Jul 29, 2025
9058612
🚧 (cli) Make foreign_id optional
simonwoerpel Jul 29, 2025
1501eb1
✅ Add e2e testing with minio
simonwoerpel Jul 29, 2025
1eae3cf
⬆️ openaleph-procrastinate 0.0.16
simonwoerpel Aug 2, 2025
ee94e1f
⬆️ openaleph-procrastinate 0.0.16
simonwoerpel Aug 3, 2025
67629db
💚 (github) Skip e2e
simonwoerpel Aug 3, 2025
ce7c91c
⬆️ ftmq 4.1.1, openaleph-procrastinate 0.0.18
simonwoerpel Aug 3, 2025
3c85e81
🚧 (tests/e2e) Adjustments
simonwoerpel Aug 3, 2025
27c158a
🔖 Bump version: 3.24.0 → 5.0.0rc1
simonwoerpel Aug 3, 2025
27996cb
💚 (github) Enable e2e again
simonwoerpel Aug 3, 2025
33d6417
⬆️ openaleph-procrastinate 0.0.20
simonwoerpel Aug 6, 2025
999b21d
🚧 (ingestors/image) Explicitly close PIL obj after processing
simonwoerpel Aug 7, 2025
6524ffc
🚧 (support/shell) Write to subprocess special DEVNULL
simonwoerpel Aug 7, 2025
76e0735
🚧 (ingestors/access) Wrap subprocess call in context manager
simonwoerpel Aug 7, 2025
854b1de
🚧 (ingestors/csv) Properly use context manager for file open
simonwoerpel Aug 7, 2025
bd75599
🚧 (support/ocr) Clean up OCR engine after use
simonwoerpel Aug 7, 2025
48b8dbc
⚗️ memray
simonwoerpel Aug 7, 2025
c2a5998
🔧 (settings) Properly configure servicelayer tags
simonwoerpel Aug 7, 2025
fdad9bf
🚧 (tasks) Collect garbage, just in case
simonwoerpel Aug 7, 2025
f06f42c
📌 Pin olefile<0.47 as this leaks crazy memory
simonwoerpel Aug 7, 2025
33ebebf
Merge pull request #14 from openaleph/fix/address-memory-issues
simonwoerpel Aug 7, 2025
8a67f36
📌 Fix RC version string
simonwoerpel Aug 11, 2025
a9b2b9b
⬆️ All the things
simonwoerpel Aug 11, 2025
b628c80
🔖 Bump version: 5.0.0-rc1 → 5.0.0-rc2
simonwoerpel Aug 11, 2025
80b34ef
⬆️ openaleph-procrastinate 0.0.25 and others
simonwoerpel Aug 14, 2025
538b3e7
🔖 Bump version: 5.0.0-rc2 → 5.0.0-rc3
simonwoerpel Aug 14, 2025
f7eb25f
🩹 (tasks) Pass through batch (formerly job_id)
simonwoerpel Aug 18, 2025
5743921
📌 Pin back tesserocr=2.6.2
simonwoerpel Aug 18, 2025
b4fa0d5
Dockerfile refactorings (tesserocr 2.6.2, openaleph-servicelayer etc.…
catileptic Aug 27, 2025
171d063
⬆️ followthemoney 4.2.0
simonwoerpel Aug 27, 2025
896685f
⬆️ ftmq 4.2.2 (psycopg3)
simonwoerpel Aug 28, 2025
83a8098
⬆️ openaleph-procrastinate 0.0.29
simonwoerpel Aug 28, 2025
d436fb5
🔧 Ensure psycopg3 for sl tags db
simonwoerpel Aug 28, 2025
a9f6af0
Temporarily disable daily ingest-file-base build
catileptic Aug 28, 2025
83f633b
Update poetry.lock
catileptic Aug 28, 2025
a83794b
🔖 Bump version: 5.0.0-rc4 → 5.0.0-rc5
catileptic Aug 28, 2025
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 3.24.0
current_version = 5.0.0-rc5
tag_name = {new_version}
commit = True
tag = True
5 changes: 0 additions & 5 deletions .dockerignore

This file was deleted.

1 change: 1 addition & 0 deletions .dockerignore
4 changes: 4 additions & 0 deletions .github/workflows/build.yml
@@ -82,3 +82,7 @@ jobs:
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max

- name: Run e2e test
working-directory: ./e2e
run: bash -c ./test_e2e.sh
9 changes: 5 additions & 4 deletions .github/workflows/docker-base.yml
@@ -2,8 +2,8 @@ name: Build ingest-file-base

on:
workflow_dispatch: {}
schedule:
- cron: "0 0 * * *"
# schedule:
# - cron: "0 0 * * *"
push:
paths:
- Dockerfile.base
@@ -28,7 +28,8 @@ jobs:
type=ref,event=branch
type=semver,pattern={{version}}
type=sha
type=raw,value=latest
type=raw,value=cache
type=raw,value=latest,enable=${{ startsWith(github.ref, 'refs/tags') }}
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
with:
@@ -44,7 +45,7 @@ jobs:
with:
context: .
file: ./Dockerfile.base
platforms: linux/amd64,linux/arm64
platforms: linux/amd64 #,linux/arm64
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
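Reading the tag rules in the hunks above: the `cache` raw tag is always emitted, while `latest` is now gated on the ref being a tag. A minimal Python sketch of just that condition, covering only the two `type=raw` rules and assuming nothing about how the other rules (`ref`, `semver`, `sha`) are evaluated. `raw_tags` is a hypothetical helper for illustration, not part of docker/metadata-action or this repository:

```python
def raw_tags(github_ref: str) -> list[str]:
    """Return the raw image tags the workflow would request for a given ref.

    Hypothetical illustration of the two `type=raw` rules only.
    """
    # `type=raw,value=cache` has no `enable` condition, so it always applies.
    tags = ["cache"]
    # `startsWith` in GitHub Actions expressions ignores case, hence casefold().
    if github_ref.casefold().startswith("refs/tags"):
        # `type=raw,value=latest,enable=...` only applies on tag refs.
        tags.append("latest")
    return tags


print(raw_tags("refs/heads/feat/procrastinate"))
print(raw_tags("refs/tags/5.0.0-rc5"))
```

In other words, branch pushes no longer move `latest` on the base image; only tagged releases do.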
5 changes: 5 additions & 0 deletions .gitignore
@@ -1,5 +1,10 @@
contrib/*.bin
data/archive
data/model_type_prediction.ftz
debug.sqlite3
data/servicelayer-archive
# documentation
site
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
7 changes: 4 additions & 3 deletions .pre-commit-config.yaml
@@ -5,20 +5,21 @@
# * Run "pre-commit install".
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
rev: v6.0.0
hooks:
- id: check-added-large-files
- id: check-case-conflict
- id: check-merge-conflict
- id: check-symlinks
- id: check-toml
- id: check-yaml
exclude: "mkdocs.yml"
- id: debug-statements
- id: end-of-file-fixer
- id: mixed-line-ending
args: ["--fix=lf"]
- id: trailing-whitespace
exclude: ".bumpversion.cfg" # wtf
exclude: ".bumpversion.cfg" # wtf

# - repo: https://github.com/asottile/pyupgrade
# rev: v3.10.1
@@ -78,7 +79,7 @@ repos:
rev: 1.9.0
hooks:
- id: poetry-export
args: ["--without-hashes", "-o", "requirements.txt"]
args: ["--without-hashes", "--with", "main", "-o", "requirements.txt"]
- id: poetry-export
args:
["--without-hashes", "--only", "dev", "-o", "requirements-dev.txt"]
17 changes: 12 additions & 5 deletions Dockerfile
@@ -1,15 +1,22 @@
FROM ghcr.io/openaleph/ingest-file-base:latest

# uncomment when running on Apple Silicon
# ENV LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1
ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libgomp.so.1

COPY . /ingestors
RUN rm -rf /ingestors/tests
WORKDIR /ingestors
RUN pip3 install --no-cache-dir -r /ingestors/requirements.txt
RUN pip3 install --no-cache-dir /ingestors

RUN pip3 install --no-cache-dir --no-deps -r /ingestors/requirements.txt
RUN pip3 install --no-deps --no-cache-dir /ingestors

ENV ARCHIVE_TYPE=file \
ARCHIVE_PATH=/data \
FTM_STORE_URI=postgresql://aleph:aleph@postgres/aleph \
OPENALEPH_DB_URI=postgresql://aleph:aleph@postgres/aleph \
REDIS_URL=redis://redis:6379/0 \
TESSDATA_PREFIX=/usr/share/tesseract-ocr/5/tessdata

USER app
CMD ingestors process
ENV PROCRASTINATE_APP="ingestors.tasks.app"

CMD ["procrastinate", "worker", "-q", "ingest"]
37 changes: 9 additions & 28 deletions Dockerfile.base
@@ -12,12 +12,12 @@ RUN apt-get -qq -y update \
# python deps (mostly to install their dependencies)
python3-pip python3-dev python3-pil \
# tesseract
tesseract-ocr libtesseract-dev libleptonica-dev pkg-config\
tesseract-ocr libtesseract-dev libleptonica-dev pkg-config \
# libraries
libxslt1-dev libpq-dev libldap2-dev libsasl2-dev \
zlib1g-dev libicu-dev libxml2-dev \
# package tools
unrar p7zip-full \
unrar \
# audio & video metadata
libmediainfo-dev \
# image processing, djvu
@@ -116,41 +116,22 @@ ENV LANG='en_US.UTF-8' \
OMP_THREAD_LIMIT='1' \
OPENBLAS_NUM_THREADS='1'

ENV LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libgomp.so.1"
# force compile tesserocr 2.6.2 with C++ 14
# to make it compatible with Tesseract 5
RUN pip download --no-binary=:all: "tesserocr==2.6.2" \
&& tar -xzf tesserocr-2.6.2.tar.gz \
&& sed -i "s/-std=c++11/-std=c++14/" tesserocr-2.6.2/setup.py \
&& cd tesserocr-2.6.2 \
&& CXXFLAGS="-std=c++14" pip install --no-cache-dir .

# tesseract 5
ENV TESSDATA_PREFIX=/usr/share/tesseract-ocr/5/tessdata

RUN groupadd -g 1000 -r app \
&& useradd -m -u 1000 -s /bin/false -g app app

# Download the ftm-typepredict model
RUN mkdir /models/ && \
curl -o "/models/model_type_prediction.ftz" "https://public.data.occrp.org/develop/models/types/type-08012020-7a69d1b.ftz"

RUN pip3 install --no-cache-dir --prefer-binary --upgrade pip
RUN pip3 install --no-cache-dir --prefer-binary --upgrade setuptools wheel

# Install spaCy
RUN pip3 install --no-cache-dir spacy
# Install PyICU
RUN pip3 install --no-binary=:pyicu: pyicu
# Install TesserOCR
RUN pip3 install --no-binary=:tesserocr: tesserocr

# Install default (small) spaCy models
RUN python3 -m spacy download en_core_web_sm
RUN python3 -m spacy download de_core_news_sm
RUN python3 -m spacy download fr_core_news_sm
RUN python3 -m spacy download es_core_news_sm
RUN python3 -m spacy download ru_core_news_sm
RUN python3 -m spacy download pt_core_news_sm
RUN python3 -m spacy download ro_core_news_sm
RUN python3 -m spacy download mk_core_news_sm
RUN python3 -m spacy download el_core_news_sm
RUN python3 -m spacy download pl_core_news_sm
RUN python3 -m spacy download it_core_news_sm
RUN python3 -m spacy download lt_core_news_sm
RUN python3 -m spacy download nl_core_news_sm
RUN python3 -m spacy download nb_core_news_sm
RUN python3 -m spacy download da_core_news_sm
18 changes: 12 additions & 6 deletions Dockerfile.test
@@ -1,20 +1,26 @@
FROM ghcr.io/openaleph/ingest-file-base:latest

# uncomment when running on Apple Silicon
# ENV LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1
ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libgomp.so.1

COPY . /ingestors
WORKDIR /ingestors
RUN pip3 install --no-cache-dir -r /ingestors/requirements.txt
RUN pip3 install --no-cache-dir /ingestors

RUN pip3 install -r /ingestors/requirements-dev.txt
RUN pip3 install --no-cache-dir --no-deps -r /ingestors/requirements.txt
RUN pip3 install --no-deps --no-cache-dir /ingestors

RUN pip3 install --no-deps -r /ingestors/requirements-dev.txt
RUN pip3 install --no-cache-dir procrastinate==3.2.2
RUN chown -R app:app /ingestors

ENV ARCHIVE_TYPE=file \
ARCHIVE_PATH=/data \
FTM_STORE_URI=postgresql://aleph:aleph@postgres/aleph \
REDIS_URL=redis://redis:6379/0 \
TESSDATA_PREFIX=/usr/share/tesseract-ocr/5/tessdata
TESSDATA_PREFIX=/usr/share/tesseract-ocr/5/tessdata \
DEBUG=1

ENV PROCRASTINATE_APP="ingestors.tasks.app"

USER app
CMD ingestors process
CMD ["pytest"]
25 changes: 20 additions & 5 deletions Makefile
@@ -1,5 +1,7 @@
INGEST=ghcr.io/openaleph/ingest-file
INGEST=ghcr.io/openaleph/ingest-file
COMPOSE=docker compose
COMPOSE_E2E=docker compose -f docker-compose.e2e.yml
DOCKER=$(COMPOSE) run --rm ingest-file

.PHONY: build
@@ -9,12 +11,18 @@ all: build shell
build:
$(COMPOSE) build --no-rm --parallel

build-macos:
DOCKER_BUILDKIT=0 COMPOSE_DOCKER_CLI_BUILD=0 $(COMPOSE) build --no-rm --parallel

build-base:
docker build . -f Dockerfile.base -t ghcr.io/openaleph/ingest-file-base:latest

build-cache:
docker build . --cache-from ghcr.io/openaleph/ingest-file:cache -t ghcr.io/openaleph/ingest-file:cache

build-test:
$(COMPOSE) build test-ingest-file

build-macos:
DOCKER_BUILDKIT=0 COMPOSE_DOCKER_CLI_BUILD=0 $(COMPOSE) build --no-rm --parallel

services:
$(COMPOSE) up -d --remove-orphans postgres redis

@@ -30,8 +38,11 @@ format:
format-check:
black --check .

test: build services
PYTHONDEVMODE=1 PYTHONTRACEMALLOC=1 $(DOCKER) pytest --cov=ingestors --cov-report html --cov-report term
test: build-test services
PYTHONDEVMODE=1 PYTHONTRACEMALLOC=1 $(COMPOSE) run --rm test-ingest-file pytest

test-e2e: build services
$(COMPOSE_E2E) run --rm ingest-file

restart: build
$(COMPOSE) up --force-recreate --no-deps --detach ingest-file
@@ -53,3 +64,7 @@ clean:
dev:
python3 -m pip install --upgrade pip
python3 -m pip install -q -r requirements-dev.txt

documentation:
mkdocs build
aws --profile nbg1 --endpoint-url https://s3.investigativedata.org s3 sync ./site s3://openaleph.org/docs/lib/ingest-file
29 changes: 10 additions & 19 deletions README.md
@@ -1,26 +1,17 @@
# ingestors
[![Docs](https://img.shields.io/badge/docs-live-brightgreen)](https://openaleph.org/docs/lib/ingest-file/)
[![Python test and package](https://github.com/openaleph/ingest-file/actions/workflows/build.yml/badge.svg)](https://github.com/openaleph/ingest-file/actions/workflows/build.yml)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)
[![Coverage Status](https://coveralls.io/repos/github/openaleph/ingest-file/badge.svg?branch=main)](https://coveralls.io/github/openaleph/ingest-file?branch=main)
[![AGPLv3+ License](https://img.shields.io/pypi/l/ftmq)](./LICENSE)
[![Pydantic v2](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/pydantic/pydantic/main/docs/badge/v2.json)](https://pydantic.dev)

``ingestors`` extract useful information from documents of different types in
a structured standard format. It retains folder structures across directories,
compressed archives and emails. The extracted data is formatted as Follow the
Money (FtM) entities, ready for import into Aleph, or processing as an object
graph.
# ingest-file

Supported file types:
``ingest-file`` extracts useful information from documents of different types in a structured standard format. It retains folder structures across directories, compressed archives and emails. The extracted data is formatted as [Follow the Money (FtM)](https://followthemoney.tech) entities, ready for import into [OpenAleph](https://openaleph.org), or processing as an object graph.

* Plain text
* Images
* Web pages, XML documents
* PDF files
* Emails (Outlook, plain text)
* Archive files (ZIP, Rar, etc.)
## Documentation

Other features:

* Extendable and composable using classes and mixins.
* Generates FollowTheMoney objects to a database as result objects.
* Lightweight worker-style support for logging, failures and callbacks.
* Thoroughly tested.
https://openaleph.org/docs/lib/ingest-file

## Development environment

2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
3.24.0
5.0.0-rc5
69 changes: 69 additions & 0 deletions contrib/install_deb.sh
@@ -0,0 +1,69 @@
#!/bin/bash
# works for current debian testing (trixie/sid)

apt-get install build-essential locales ca-certificates \
python3-pip python3-dev python3-pil \
tesseract-ocr libtesseract-dev libleptonica-dev pkg-config \
libxslt1-dev libpq-dev libldap2-dev libsasl2-dev \
zlib1g-dev libicu-dev libxml2-dev \
unrar p7zip-full \
libmediainfo-dev \
imagemagick-6-common imagemagick mdbtools djvulibre-bin \
libtiff5-dev libjpeg-dev libfreetype-dev libwebp-dev \
libtiff-tools ghostscript librsvg2-bin jbig2dec \
pst-utils \
tesseract-ocr-eng \
tesseract-ocr-swa \
tesseract-ocr-swe \
tesseract-ocr-fil \
tesseract-ocr-tur \
tesseract-ocr-ukr \
tesseract-ocr-nld \
tesseract-ocr-nor \
tesseract-ocr-pol \
tesseract-ocr-por \
tesseract-ocr-ron \
tesseract-ocr-rus \
tesseract-ocr-slk \
tesseract-ocr-slv \
tesseract-ocr-spa \
tesseract-ocr-sqi \
tesseract-ocr-srp \
tesseract-ocr-ind \
tesseract-ocr-isl \
tesseract-ocr-ita \
tesseract-ocr-kan \
tesseract-ocr-kat \
tesseract-ocr-khm \
tesseract-ocr-lav \
tesseract-ocr-lit \
tesseract-ocr-mkd \
tesseract-ocr-mya \
tesseract-ocr-mlt \
tesseract-ocr-msa \
tesseract-ocr-est \
tesseract-ocr-fin \
tesseract-ocr-fra \
tesseract-ocr-frk \
tesseract-ocr-heb \
tesseract-ocr-hin \
tesseract-ocr-hrv \
tesseract-ocr-hye \
tesseract-ocr-hun \
tesseract-ocr-bul \
tesseract-ocr-cat \
tesseract-ocr-ces \
tesseract-ocr-nep \
tesseract-ocr-dan \
tesseract-ocr-deu \
tesseract-ocr-ell \
tesseract-ocr-afr \
tesseract-ocr-ara \
tesseract-ocr-aze \
tesseract-ocr-bel \
tesseract-ocr-uzb \
libreoffice fonts-opensymbol hyphen-fr hyphen-de \
hyphen-en-us hyphen-it hyphen-ru fonts-dejavu fonts-dejavu-core fonts-dejavu-extra \
fonts-droid-fallback fonts-dustin fonts-f500 fonts-fanwood fonts-freefont-ttf \
fonts-liberation fonts-lmodern fonts-lyx fonts-sil-gentium fonts-texgyre \
fonts-tlwg-purisa