@catileptic (Collaborator) commented Feb 28, 2025

Changes in this PR:

  • LICENSE & README files were removed from .dockerignore so that they are copied into the container; pyproject.toml references them, so the build fails when they are missing
  • ingest-file container bumped up to use python:3.11-slim, because nomenklatura requires Python 3.11 or later
  • the Dockerfile was refactored to allow for all dependencies to be properly installed using this new base Python version
  • docker-compose.yml builds ingest-file locally instead of pulling the version from Alephdata repos
  • refactored code to accommodate the bumped fingerprints lib version (required for nomenklatura)
  • servicelayer was refactored to use the modern importlib.metadata package instead of the deprecated pkg_resources, so a modification was also necessary to keep the entry points in ingest-file discoverable
  • added pyproject.toml
  • ingest-file now uses the InvestigativeData version of servicelayer
  • substituted the unmaintained cchardet with faust-cchardet
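
For readers unfamiliar with the pkg_resources to importlib.metadata switch mentioned above, here is a minimal sketch of entry point discovery with the standard library. It is not the code from this PR: the helper name `load_entry_points` and the group name `example.plugins` are hypothetical, chosen only for illustration.

```python
from importlib.metadata import entry_points


def load_entry_points(group):
    """Return {name: loaded object} for every entry point registered in `group`.

    `group` is whatever name the package declares in its metadata;
    the value used in the usage line below is a made-up placeholder.
    """
    try:
        # Python 3.10+: entry_points() accepts selection keywords.
        eps = entry_points(group=group)
    except TypeError:
        # Python 3.8/3.9: entry_points() takes no arguments and
        # returns a dict keyed by group name.
        eps = entry_points().get(group, [])
    return {ep.name: ep.load() for ep in eps}


# Usage: an unregistered group simply yields an empty mapping.
plugins = load_entry_points("example.plugins")
```

With pkg_resources the equivalent loop was `iter_entry_points(group)`; the main practical difference is that importlib.metadata reads installed distribution metadata directly, so the package has to be installed with its entry points declared for them to be found.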

@catileptic merged commit 0fc3567 into main Feb 28, 2025
1 check passed
@catileptic deleted the feature/python-3.11-poetry branch February 28, 2025 12:30
catileptic added a commit that referenced this pull request Apr 21, 2025
* Refactor ingest-file to prepare it for using nomenklatura (#2)

* Remove README & LICENSE from .dockerignore

* Refactor ingest-file to be compatible with nomenklatura

* Make linter happy

* Add poetry.lock

* Build whisper.cpp

* Add Whispercpp to Dockerfile from multi-stage build

* Launch subprocess for transcription. Refactor error handling.

* Make image build architecture-independent. Set model.

* Aesthetic adjustments to Dockerfile

* Apply transcription logic to audio and video. Add tests.

* The processing of audio/video isn't a failure if transcription fails

* Make linter happy

* Transcription timeout as env var

* Fix wrong import from settings

* Cast env var timeout to int
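
The transcription commits above (subprocess launch, non-fatal failure, timeout from an environment variable cast to int) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the referenced commit: the variable name `TRANSCRIPTION_TIMEOUT`, its default, and the `transcribe` helper are hypothetical, and the command is passed in by the caller so that no whisper.cpp flags are assumed.

```python
import os
import subprocess

# Hypothetical variable name and default. Environment values are always
# strings, so the timeout is cast to int before use.
TRANSCRIPTION_TIMEOUT = int(os.environ.get("TRANSCRIPTION_TIMEOUT", "3600"))


def transcribe(command, timeout=TRANSCRIPTION_TIMEOUT):
    """Run a transcription command and return its stdout, or None on failure.

    Returning None instead of raising lets the caller keep processing the
    audio/video file even when transcription times out or exits non-zero.
    """
    try:
        result = subprocess.run(
            command,
            capture_output=True,  # collect stdout and stderr
            text=True,            # decode output to str
            timeout=timeout,      # raises TimeoutExpired when exceeded
            check=True,           # raises CalledProcessError on non-zero exit
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError):
        return None
    return result.stdout.strip()
```

A caller would pass the full command as a list and treat `None` as "no transcript available" rather than as an ingest error.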