pypdf is a library and hence its users are developers. This document is not for the users, but for people who want to work on pypdf itself.
pip install -r requirements/dev.txt
See testing pypdf with pytest.
The reason for having the submodule sample-files is that we want to keep the size of the pypdf repository small while we also want to have an extensive test suite. Those two goals contradict each other.
The resources folder should contain a select set of core examples that cover most cases we typically want to test for. The sample-files submodule covers many more edge cases, such as the behavior we get when file sizes get bigger and the output of different PDF producers.
In order to get the sample-files folder, you need to execute:
git submodule update --init
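A script or test that depends on the submodule can check whether it has been initialized before touching it. The following is a minimal sketch, not part of pypdf: the helper name sample_files_available is hypothetical, and the path assumes the submodule lives in a sample-files folder at the repository root.

```python
from pathlib import Path


def sample_files_available(root: Path) -> bool:
    """Return True if the submodule directory exists and contains entries.

    An uninitialized git submodule is typically left behind as an empty
    directory, so checking for existence alone is not enough.
    """
    return root.is_dir() and any(root.iterdir())


# Assumption: the submodule is checked out at "sample-files" in the repo root.
SAMPLE_ROOT = Path("sample-files")

if __name__ == "__main__":
    if sample_files_available(SAMPLE_ROOT):
        print("sample-files is initialized")
    else:
        print("sample-files is missing; run: git submodule update --init")
```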
Git is a command line application for version control. If you don't know it, you can play ohmygit to learn it.
GitHub is the service where the pypdf project is hosted. While git is free and open source, GitHub is a paid service by Microsoft, but free in a lot of cases.
pre-commit is a command line application that uses git hooks to automatically execute code. This allows you to avoid style issues and other code quality issues. After you have run pre-commit install once in your local copy of pypdf, it will be executed automatically whenever you run git commit.
Having a clean commit message helps people to quickly understand what the commit is about, without actually looking at the changes. The first line of the commit message is used to auto-generate the CHANGELOG. For this reason, the format should be:
PREFIX: DESCRIPTION
BODY
The PREFIX can be:
SEC: Security improvements. Typically an infinite loop that was possible.
BUG: A bug was fixed. Likely there are one or more issues for it. In that case, write Closes #123 in the BODY, where 123 is the issue number on GitHub. It would be absolutely amazing if you could write a regression test in those cases. That is a test that would fail without the fix. A bug is always an issue for pypdf users - test code or CI that was fixed is not considered a bug here.
ENH: A new feature! Describe in the body what it can be used for.
DEP: A deprecation. Either marking something as "this is going to be removed" or actually removing it.
PI: A performance improvement. This could also be a reduction in the file size of PDF files generated by pypdf.
ROB: A robustness change. Dealing better with broken PDF files.
DOC: A documentation change.
TST: Adding or adjusting tests.
DEV: Developer experience improvements, e.g. pre-commit or setting up CI.
MAINT: Quite a lot of different stuff. Performance improvements are for sure the most interesting changes in here. Refactorings as well.
STY: A style change. Something that makes pypdf code more consistent. Typically a small change. It could also be better error messages for end users.
The prefix is used to generate the CHANGELOG. Every PR must have exactly one. If you feel like several match, take the first one in this list that fits your PR.
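The format and the "first match in the list wins" rule can be checked mechanically. The following is a minimal sketch under stated assumptions, not the tooling pypdf uses to build the CHANGELOG: the names check_commit_message and pick_prefix are hypothetical, and the regular expression assumes an upper-case prefix followed by a colon and a single space.

```python
import re
from typing import Iterable

# Prefixes in the priority order given in the list above.
PREFIXES = ["SEC", "BUG", "ENH", "DEP", "PI", "ROB", "DOC", "TST", "DEV", "MAINT", "STY"]

# First line: an upper-case prefix, a colon, a space, a non-empty description.
FIRST_LINE = re.compile(r"^(?P<prefix>[A-Z]+): (?P<description>\S.*)$")


def check_commit_message(message: str) -> str:
    """Return the prefix of a well-formed message, or raise ValueError."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    match = FIRST_LINE.match(first_line)
    if match is None:
        raise ValueError(f"first line is not 'PREFIX: DESCRIPTION': {first_line!r}")
    prefix = match.group("prefix")
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return prefix


def pick_prefix(candidates: Iterable[str]) -> str:
    """Of several matching prefixes, return the one listed first."""
    wanted = set(candidates)
    for prefix in PREFIXES:
        if prefix in wanted:
            return prefix
    raise ValueError("no known prefix among the candidates")
```

For example, check_commit_message("BUG: Fix infinite loop\n\nCloses #123") returns "BUG", and a PR that both fixes a bug and tidies up style gets pick_prefix({"STY", "BUG"}), which is "BUG".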
Smaller Pull Requests (PRs) are preferred as it's typically easier to merge them. For example, if you have some typos, a few code-style changes, a new feature, and a bug-fix, that could be 3 or 4 PRs.
A PR must be complete. That means if you introduce a new feature it must be finished within the PR and have a test for that feature.
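To illustrate what "have a test" can look like, here is a minimal pytest-style sketch covering both an ordinary feature test and a regression test that would fail without the fix. Everything in it is hypothetical: skip_whitespace is not a pypdf function, and the bug it describes is invented for the example.

```python
# Whitespace bytes as defined for PDF syntax: NUL, TAB, LF, FF, CR, SPACE.
WHITESPACE = b"\x00\t\n\x0c\r "


def skip_whitespace(data: bytes, pos: int) -> int:
    """Return the index of the first non-whitespace byte at or after pos.

    Hypothetical helper, not part of pypdf. The imagined bug: an earlier
    version lacked the `pos < len(data)` check, so data ending in
    whitespace raised IndexError.
    """
    while pos < len(data) and data[pos] in WHITESPACE:
        pos += 1
    return pos


def test_skip_whitespace_stops_at_content():
    # Feature test: the ordinary case the helper was written for.
    assert skip_whitespace(b"  obj", 0) == 2


def test_skip_whitespace_at_end_of_data():
    # Regression test: without the length check this raises IndexError.
    assert skip_whitespace(b"obj  ", 3) == 5
```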
We need to keep an eye on performance and thus we have a few benchmarks.
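How the pypdf benchmarks themselves are written is not described here. As a generic illustration of measuring a change before and after, the following sketch uses only the standard library timeit module; the two functions being compared and the helper best_time are hypothetical stand-ins.

```python
import timeit


def join_with_concat(parts):
    """Build the result by repeated concatenation."""
    out = b""
    for part in parts:
        out += part
    return out


def join_with_join(parts):
    """Build the result with a single bytes.join call."""
    return b"".join(parts)


def best_time(func, *args, repeat=3, number=20):
    """Return the best observed time per call, in seconds.

    Taking the minimum over several repeats reduces noise from other
    processes running on the machine.
    """
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    parts = [b"x" * 10] * 1000
    for func in (join_with_concat, join_with_join):
        print(f"{func.__name__}: {best_time(func, parts):.6f} s per call")
```

Both variants must produce the same output for the comparison to be meaningful, so a benchmark like this is best paired with an ordinary correctness test.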