77 changes: 77 additions & 0 deletions PYTHON APPS/PDF-Text-Extractor/README.md
@@ -0,0 +1,77 @@
# PDF-Text-Extractor
This GUI application lets you extract the text from PDF files. The project is built using the PyPDF2 library for extracting text from PDFs and the tkinter library for creating the GUI.
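
As a rough illustration of the extraction step, the sketch below joins the text of every page. It assumes a reader object shaped like `PyPDF2.PdfReader` (a `pages` sequence whose items expose `extract_text()`); the helper name `extract_all_text` is hypothetical and not part of this project.

```python
def extract_all_text(reader):
    """Join the text of every page of a PdfReader-like object.

    Assumption: `reader.pages` is iterable and each page has extract_text().
    """
    parts = []
    for page in reader.pages:
        # Treat a missing result as an empty string so the join never fails.
        parts.append(page.extract_text() or "")
    return "\n".join(parts)
```

With PyPDF2 installed, this could be called as `extract_all_text(PyPDF2.PdfReader('random_text.pdf'))`.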

## Getting Started
To run the project, you will need to have Python and pip installed on your system.

### Installation
1. Clone or download the repository to your local machine.

```
git clone https://github.com/SamAddy/PDF-Extract-Text.git
```

2. Enter the working directory.

```
cd PDF-Extract-Text
```

3. Use pip to install the required libraries.

```
pip install -r requirements.txt
```

### Usage
1. Run the app using the following command:

```
python app.py
```

2. A GUI window will appear, with a button to select the PDF file you want to extract text from.

3. Once you have selected the file, the text will be extracted and displayed in the text box.

4. You can also save the text to a file by clicking the 'Save' button.
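
The save step in item 4 is not shown in `app.py`, so the following is only a minimal sketch of how such a handler might write the text to disk. The function name `save_text` and the UTF-8 encoding are assumptions, not the project's actual code.

```python
from pathlib import Path


def save_text(text, destination):
    """Write extracted text to `destination` as UTF-8 and return the path.

    Hypothetical helper: the app's real save handler is not shown.
    """
    path = Path(destination)
    path.write_text(text, encoding="utf-8")
    return path
```

In a tkinter app, `destination` could come from `tkinter.filedialog.asksaveasfilename`.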

<!--
<p align="center">
<img src="https://github.com/SamAddy/PDF-Extract-Text/blob/main/Stage1.png" width=50% alt="Browse file"/>
<img src="https://github.com/SamAddy/PDF-Extract-Text/blob/main/Stage2.png" width=50% alt="Display extracted text">
</p>


<p align="center">
![Browse file](https://github.com/SamAddy/PDF-Extract-Text/blob/main/Stage1.png)
![Display text in textbox](https://github.com/SamAddy/PDF-Extract-Text/blob/main/Stage2.png)
</p>
-->

<table align="center">
<tr>
<td>
<img src="https://github.com/SamAddy/PDF-Extract-Text/blob/main/Stage1.png" alt="image1" width="400"/>
</td>
<td>
<img src="https://github.com/SamAddy/PDF-Extract-Text/blob/main/Stage2.png" alt="image2" width="400"/>
</td>
</tr>
</table>



### Note
Please keep in mind that not all PDFs are created equal: some store their text as images or in other formats that PyPDF2 may not be able to extract.
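
One way to spot such files is to check which pages yield no text at all. The sketch below assumes a reader object shaped like `PyPDF2.PdfReader`; the helper name `pages_without_text` is hypothetical, and an empty result from `extract_text()` only suggests, rather than proves, that a page is image-based.

```python
def pages_without_text(reader):
    """Return 0-based indexes of pages whose extract_text() yields nothing.

    Assumption: `reader.pages` is iterable and each page has extract_text().
    """
    empty = []
    for index, page in enumerate(reader.pages):
        text = page.extract_text() or ""
        # Whitespace-only output is treated the same as no output.
        if not text.strip():
            empty.append(index)
    return empty
```

If the returned list covers every page, the PDF probably needs OCR rather than plain text extraction.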

### Built With
* [Python](https://www.python.org/) - The programming language used.
* [PyPDF2](https://pypi.org/project/PyPDF2/) - A library for extracting text from PDF files.
* [Tkinter](https://docs.python.org/3/library/tk.html) - A library for creating GUIs in Python.

### Contributing
Contributions are absolutely welcome. If you have an idea for an improvement, please open an issue or submit a pull request.

### Acknowledgement
* Inspiration: [Mariya Sha](https://github.com/MariyaSha/PDFextract_text)
Binary file added PYTHON APPS/PDF-Text-Extractor/Stage1.png
Binary file added PYTHON APPS/PDF-Text-Extractor/Stage2.png
60 changes: 60 additions & 0 deletions PYTHON APPS/PDF-Text-Extractor/app.py
@@ -0,0 +1,60 @@
import tkinter as tk
import PyPDF2
from PIL import Image, ImageTk
from tkinter.filedialog import askopenfile

root = tk.Tk()
root.title('PDF to TEXT')
root.iconphoto(False, tk.PhotoImage(file='./logo.png'))  # iconbitmap() does not load PNG files
root.resizable(False, False)


canvas = tk.Canvas(root, width=600, height=400)
canvas.grid(columnspan=3, rowspan=3)

# Insert logo into the window
logo = Image.open('logo2.png')
logo = ImageTk.PhotoImage(logo)
logo_label = tk.Label(image=logo)
logo_label.image = logo
logo_label.grid(column=1, row=0)

# instructions
instructions = tk.Label(root, text='Select a PDF file on your device to extract all its text.', font='calibre')
instructions.grid(columnspan=3, column=0, row=1)

# Get the PDF file on device
browse_text = tk.StringVar()
browse_btn = tk.Button(root, textvariable=browse_text, command=lambda: open_file(), font='calibre', bg='red', width=15, height=2)
browse_text.set('Browse')
browse_btn.grid(column=1, row=2)

canvas = tk.Canvas(root, width=600, height=200)
canvas.grid(columnspan=3, rowspan=3)


def open_file():
    browse_text.set('On it...')
    # Ask for a PDF file and read it with PyPDF2's PdfReader
    file = askopenfile(parent=root, mode='rb', title='Choose a file', filetypes=[('PDF file', '*.pdf')])
    text = ""

    if file:
        read_pdf = PyPDF2.PdfReader(file)
        for i in range(len(read_pdf.pages)):
            text += read_pdf.pages[i].extract_text()

        text_box = tk.Text(root, height=10, width=50, padx=15, pady=15)
        text_box.insert(1.0, text)
        text_box.tag_config('center', justify='center')
        text_box.tag_add('center', 1.0, 'end')
        text_box.grid(column=1, row=3)

    # Reset the button label whether or not a file was chosen
    browse_text.set('Browse')


def convert_to_docx():
    pass


root.mainloop()
Binary file added PYTHON APPS/PDF-Text-Extractor/logo2.png
Binary file added PYTHON APPS/PDF-Text-Extractor/random_text.pdf
Binary file not shown.
Binary file added PYTHON APPS/PDF-Text-Extractor/requirements.txt
Binary file not shown.
1 change: 1 addition & 0 deletions README.md
@@ -115,3 +115,4 @@ guide [HERE](https://github.com/larymak/Python-project-Scripts/blob/main/CONTRIB
| 64 | [Umbrella Reminder](https://github.com/larymak/Python-project-Scripts/tree/main/TIME%20SCRIPTS/Umbrella%20Reminder) | [Edula Vinay Kumar Reddy](https://github.com/vinayedula) |
| 65 | [Image to PDF](https://github.com/larymak/Python-project-Scripts/tree/main/IMAGES%20%26%20PHOTO%20SCRIPTS/Image%20to%20PDF) | [Vedant Chainani](https://github.com/Envoy-VC) |
| 66 | [KeyLogger](https://github.com/larymak/Python-project-Scripts/tree/main/OTHERS/KeyLogger) | [Akhil](https://github.com/akhil-chagarlamudi) |
| 67 | [PDF Text Extractor](https://github.com/SamAddy/Python-project-Scripts/tree/main/PYTHON%20APPS/PDF-Text-Extractor) | [Samuel Addison](https://github.com/SamAddy) |