Commit 61769fe

Added blog post "docTR joins PyTorch Ecosystem: From Pixels to Data, Building a Recognition Pipeline with PyTorch and docTR"
Signed-off-by: Chris Abraham <cjyabraham@gmail.com>
1 parent d60d010 commit 61769fe

7 files changed

+169
-0
lines changed

---
layout: blog_detail
title: "docTR joins PyTorch Ecosystem: From Pixels to Data, Building a Recognition Pipeline with PyTorch and docTR"
author: Olivier Dulcy & Sebastian Olivera, Mindee
---

![docTR logo](/assets/images/doctr-joins-pytorch-ecosystem/fg1.png){:style="width:100%;display: block;max-width:400px; margin-left:auto; margin-right:auto;"}

We’re thrilled to announce that the docTR project has been integrated into the PyTorch ecosystem! This integration ensures that docTR aligns with PyTorch’s standards and practices, giving developers a reliable, community-backed solution for powerful OCR workflows.

**For more information on what it means to be a PyTorch ecosystem project, see the [PyTorch Ecosystem Tools page](https://pytorch.org/ecosystem/).**

## About docTR

docTR is an Apache 2.0 project developed and distributed by [Mindee](https://www.mindee.com/) to help developers integrate OCR capabilities into applications with no prior knowledge required.

To quickly and efficiently extract text information, docTR uses a two-stage approach:

* First, it performs text **detection** to localize words.
* Then, it conducts text **recognition** to identify all characters in a word.

**Detection** and **recognition** are performed by state-of-the-art models written in PyTorch. To learn more about this approach, you can refer to [the docTR documentation](https://mindee.github.io/doctr/using_doctr/using_models.html).

docTR enhances the user experience in PyTorch projects by providing high-performance OCR capabilities right out of the box. Its specially designed models require little to no fine-tuning for common use cases, allowing developers to quickly integrate advanced document analysis features.

## Local installation

docTR requires Python >= 3.10 and supports Windows, macOS, and Linux. Please refer to our [README](https://github.com/mindee/doctr?tab=readme-ov-file#installation) for the dependencies required on MacBooks with the M1 chip.

```
pip3 install -U pip
pip3 install "python-doctr[torch,viz]"
```

This will install docTR along with the latest version of PyTorch.

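Before or after installing, it can be useful to confirm that the interpreter meets the Python >= 3.10 requirement and to see which version of the package, if any, is present. The following is a minimal sketch using only the standard library. The helper names `python_is_supported` and `installed_doctr_version` are hypothetical and not part of docTR; the distribution name `python-doctr` is taken from the pip command above.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 10)  # minimum version stated in this post


def python_is_supported(version_info=None):
    """Return True if the (major, minor, ...) tuple is at least MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_doctr_version():
    """Return the installed python-doctr version string, or None if it is absent."""
    try:
        return metadata.version("python-doctr")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("python-doctr version:", installed_doctr_version())
```

Passing an explicit tuple, such as `python_is_supported((3, 9, 18))`, makes it possible to check a target environment without running the code inside it.
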
```
Note: docTR also provides Docker images for easy deployment, for example as part of a Kubernetes cluster.
```

## Text recognition

Now, let’s try docTR’s text recognition on this sample:

![OCR sample](/assets/images/doctr-joins-pytorch-ecosystem/fg2.png){:style="width:100%;display: block;max-width:300px; margin-left:auto; margin-right:auto;"}

The text recognition model expects an image containing a single word and outputs the predicted word with a confidence score. You can use the following snippet to test docTR’s OCR capabilities:

```python
from doctr.io import DocumentFile
from doctr.models import recognition_predictor

# Load the word crop from disk
doc = DocumentFile.from_images("/path/to/image")

# Load the OCR model
# This will download pre-trained models hosted by Mindee
model = recognition_predictor(pretrained=True)

# One (word, confidence) tuple is returned per input image
result = model(doc)
print(result)
```

Here, the most important line of code is `model = recognition_predictor(pretrained=True)`. This will load a default text recognition model, `crnn_vgg16_bn`, but you can select other models through the `arch` parameter. You can check out the [available architectures](https://mindee.github.io/doctr/using_doctr/using_models.html).

When run on the sample, the recognition predictor retrieves the following data: `[('MAGAZINE', 0.9872216582298279)]`

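Since the output is a list of `(word, confidence)` tuples, downstream code can filter on the confidence score. Here is a minimal sketch in plain Python that reuses the sample output above. The helper name `keep_confident` and the 0.5 default threshold are illustrative assumptions, not docTR defaults.

```python
def keep_confident(predictions, threshold=0.5):
    """Keep the words whose confidence is at or above the threshold."""
    return [word for word, confidence in predictions if confidence >= threshold]


# Output reported above for the sample image
predictions = [("MAGAZINE", 0.9872216582298279)]

print(keep_confident(predictions))        # ['MAGAZINE']
print(keep_confident(predictions, 0.99))  # []
```

A suitable threshold depends on the documents being processed, so it is worth tuning on a few representative samples.
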
```
Note: the DocumentFile object provides an easy way to manipulate PDFs or images.
```

## Text detection

The last example was a crop of a single word. Now, what about an image with several words on it, like this one?

![photo of magazines](/assets/images/doctr-joins-pytorch-ecosystem/fg3.jpg){:style="width:100%;display: block;max-width:200px; margin-left:auto; margin-right:auto;"}

A text detection model runs before text recognition and outputs a segmentation map representing the location of the text. Text recognition is then applied to every detected patch.

Below is a snippet to run only the detection part:

```python
from doctr.io import DocumentFile
from doctr.models import detection_predictor
from matplotlib import pyplot as plt
from doctr.utils.geometry import detach_scores
from doctr.utils.visualization import draw_boxes

doc = DocumentFile.from_images("path/to/my/file")

# Load the default pre-trained detection model
model = detection_predictor(pretrained=True)

# One entry per page; "words" holds the detected boxes with their scores
result = model(doc)

# Split the box coordinates from the scores, then draw the boxes of the first page
draw_boxes(detach_scores([result[0]["words"]])[0][0], doc[0])
plt.axis('off')
plt.show()
```

Running it on the full sample yields the following:

![photo of magazines](/assets/images/doctr-joins-pytorch-ecosystem/fg4.png){:style="width:100%;display: block;max-width:200px; margin-left:auto; margin-right:auto;"}

Similarly to text recognition, `detection_predictor` loads a default model (`fast_base` here). You can load another one by passing it through the `arch` parameter.

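To crop or post-process the detected regions yourself, the boxes usually need to be expressed in pixels. The sketch below assumes that each detected box is a row of relative coordinates `(xmin, ymin, xmax, ymax, score)` with values between 0 and 1, which is our reading of the output for straight pages; please check the documentation for the exact layout, especially for rotated pages. The helper `to_pixel_boxes` and the sample values are made up for illustration.

```python
def to_pixel_boxes(relative_boxes, image_height, image_width):
    """Convert (xmin, ymin, xmax, ymax, score) rows from relative to pixel coordinates."""
    pixel_boxes = []
    for xmin, ymin, xmax, ymax, score in relative_boxes:
        pixel_boxes.append((
            round(xmin * image_width),
            round(ymin * image_height),
            round(xmax * image_width),
            round(ymax * image_height),
            score,  # the confidence score is passed through unchanged
        ))
    return pixel_boxes


# Made-up detections for a page that is 200 pixels high and 400 pixels wide
boxes = [(0.1, 0.2, 0.5, 0.3, 0.93), (0.25, 0.5, 0.75, 0.6, 0.88)]
print(to_pixel_boxes(boxes, image_height=200, image_width=400))
```

Keeping the coordinates relative until the last moment means the same boxes stay valid if the page image is resized.
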
## The full implementation

Now, let’s plug both components into the same pipeline.

Conveniently, docTR provides a wrapper that does exactly that for us:

```python
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

doc = DocumentFile.from_images("/path/to/image")

# Chain detection and recognition; pages are not assumed to be straight
model = ocr_predictor(pretrained=True, assume_straight_pages=False)

result = model(doc)
# Open an interactive view of the detected words
result.show()
```

![photo of magazines](/assets/images/doctr-joins-pytorch-ecosystem/fg5.png){:style="width:100%;display: block;max-width:200px; margin-left:auto; margin-right:auto;"}

The last line should display a matplotlib window showing the detected patches. Hovering the mouse over a patch displays its contents.

You can also do more with this output, such as reconstituting a synthetic document like so:

```python
import matplotlib.pyplot as plt

# Rebuild one synthetic image per page from the recognized text
synthetic_pages = result.synthesize()
plt.imshow(synthetic_pages[0])
plt.axis('off')
plt.show()
```

![black text on white](/assets/images/doctr-joins-pytorch-ecosystem/fg6.png){:style="width:100%;display: block;max-width:200px; margin-left:auto; margin-right:auto;"}

The pipeline is highly customizable: you can modify the behavior of the detection or recognition model by passing arguments to `ocr_predictor`. Please refer to the [documentation](https://mindee.github.io/doctr/using_doctr/using_models.html) to learn more about it.

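The result can also be consumed programmatically rather than visually. The sketch below walks a nested dictionary of pages, blocks, lines, and words, which is how we understand the output of `result.export()` to be organized. The key names used here (`pages`, `blocks`, `lines`, `words`, `value`, `confidence`) are assumptions to verify against the documentation, and `sample_export` is a hand-written stand-in rather than real model output.

```python
def iter_words(exported):
    """Yield (value, confidence) for every word in a nested export dictionary."""
    for page in exported.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                for word in line.get("words", []):
                    yield word["value"], word["confidence"]


# Hand-written stand-in for the dictionary assumed to come from result.export()
sample_export = {
    "pages": [
        {
            "blocks": [
                {
                    "lines": [
                        {
                            "words": [
                                {"value": "MAGAZINE", "confidence": 0.99},
                                {"value": "ISSUE", "confidence": 0.42},
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}

words = list(iter_words(sample_export))
text = " ".join(value for value, _ in words)
print(text)
```

With a real pipeline, `sample_export` would be replaced by the exported result, and the same loop could feed a search index or a validation step.
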
## Conclusion

We’re excited to welcome docTR into the PyTorch Ecosystem, where it seamlessly integrates with PyTorch pipelines to deliver state-of-the-art OCR capabilities right out of the box.

By empowering developers to quickly extract text from images or PDFs using familiar tooling, docTR simplifies complex document analysis tasks and enhances the overall PyTorch experience.

We invite you to explore the [docTR GitHub repository](https://github.com/mindee/doctr), join the [docTR community on Slack](https://slack.mindee.com/), and reach out at contact@mindee.com for inquiries or collaboration opportunities.

Together, we can continue to push the boundaries of document understanding and develop even more powerful, accessible tools for everyone in the PyTorch community.