pdf-extraction

Here are 29 public repositories matching this topic...

pytr-org / pytr

Use TradeRepublic in terminal and mass download all documents

portfolio finance terminal-app portfolio-performance pdf-extraction traderepublic-statements traderepublic

Updated Sep 28, 2025
Python

mateogon / pdf-narrator

Convert your PDFs and EPUBs into audiobooks effortlessly. Features intelligent text extraction, customizable text-to-speech settings, and efficient processing for low-resource systems.

pdf text-to-speech audiobook tts epub low-resource pdf-extraction pdf-to-audiobook immersive-reading kokoro-tts audiobook-generator pdf-audiobook

Updated Mar 28, 2025
Python

iamarunbrahma / pdf-to-markdown

Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing.

python information-retrieval document-conversion pdf-converter text-extraction pdf-parsing document-processing rag pdf-extraction retrieval-augmented-generation pdf-to-markdown

Updated Nov 22, 2024
Python

rrayhka / GRI-Extractor

A tool to automatically extract GRI disclosure codes from corporate sustainability reports, enabling efficient analysis of environmental, social, and governance (ESG) data. Supports English and Indonesian reports.

python nlp machine-learning pattern-matching tf-idf gri groq pdf-extraction streamlit sustainability-developoment-goals llm sustainability-reporting

Updated Jun 9, 2025
Python

billy-enrizky / pdf-extraction

Scalable PDF extraction using multimodal GPT-4o

pdf-extraction llm gpt-4o

Updated Aug 25, 2025
Python

souvik03-136 / TenderBot

kdeepak2001 / jobyaari-pdf-extractor

AI-powered PDF job data extractor using the free Google Gemini API

python automation data-extraction data-processing pdf-extraction

Updated Oct 5, 2025
Python

heijul / pdf2gtfs

A Python tool to extract schedule data from PDF timetables and output it in GTFS format.

gtfs pdf-extraction

Updated Sep 5, 2023
Python

bylickilabs / pdfAnalyzer

PDF Analyzer is an efficient Python tool for the automatic analysis of PDF documents.

python cli open-source metadata pdf text-mining automation reporting document-analysis document-processing file-analyzer pdf-extraction streamlit pdf-analysis file-inspector

Updated Jun 30, 2025
Python

vatsalmehta2001 / MLPapers_scraper-summarizer

A web application that scrapes ML research papers from arXiv and generates summaries using either the OpenAI or the Claude API.

flask machine-learning summarization research-papers pdf-extraction arxiv-papers openai-api claude-api

Updated Apr 21, 2025
Python

RaghuSharma14 / PDF-Reader

A PDF Reader application powered by AI, allowing users to upload PDF documents and extract meaningful information using advanced NLP models. Built with Streamlit, Transformers, and Langchain, this app provides a seamless interface for interacting with and analyzing PDF content.

machine-learning automation transformers text-extraction pdf-reader pdf-extraction streamlit pdf-analysis langchain natural-language-processing-nlp

Updated Apr 24, 2025
Python

LorysHamadache / pdf2txt-multipage-extractor

Fast batch tool to extract first-page text from all PDFs in a folder using Python. Optimized with multiprocessing to handle thousands of PDFs efficiently.

multiprocessing multithreading pdf-extraction

Updated Mar 29, 2025
Python

tracywong117 / extract-info-from-pdf-paper

This Python script uses pdfminer.six, PyPDF2, and pdf2image to extract information (text and images) from PDF papers.

pdf pdf-extraction

Updated Feb 2, 2024
Python

Amartya-007 / Pdf-Reader

An app for easily reading and extracting information from PDFs, or chatting with your PDFs.

pdf question-answering google-api-client pdf-extraction streamlit generative-ai

Updated Aug 11, 2024
Python

Vejandlachakrish / PersonaPrep-Persona-Aligned-Educational-PDF-Extractor

Extracts and ranks example problems, derivations, and formulas from physics PDFs using PyMuPDF. Fully containerized with Docker, no network access required.

python nlp docker json scripts pymupdf pdf-extraction adobe-hackathon

Updated Jul 28, 2025
Python

rishisolanke / PDF_Query_Langchain

PDF Query LangChain is a tool that extracts and queries information from PDF documents using advanced language processing. Leveraging LangChain, OpenAI, and Cassandra, this app enables efficient, interactive querying of PDF content. Ideal for data analysis, research, and automated reporting, it simplifies detailed document analysis with ease.

python nlp natural-language-processing artificial-intelligence openai data-analysis research-tool pdf-extraction pdf-analysis langchain document-query

Updated Jul 23, 2024
Python

Atul-vaibhav / OCR-Extraction-Using-Python

Extract text from images and PDFs using Python and store it in JSON format. The extracted data is also stored in a MySQL database.

mysql python json ocr image-processing python3 ocr-recognition ocr-text-reader ocr-python pdf-extraction

Updated Jun 15, 2025
Python

siddharth-nandagopal / billionaires-rag-query

Billionaires RAG Query uses LLMs and a RAG framework to analyze the world's billionaires list. Extracts tabular data from PDFs, converts to multiple formats, and enables precise queries about net worth, age, and more. Integrates with Poetry and asdf for easy setup and management.

Updated Oct 17, 2024
Python

looptech-ai / docling-extractor

Production-ready document processing API with Docker, Prometheus metrics, multilingual OCR, and Whisper transcription

multilingual docker ocr prometheus whisper document-processing fastapi pdf-extraction

Updated Oct 3, 2025
Python

FTiniNadhirah / Text-Preprocessing

python text-mining anaconda preprocessing merge-pdf pdf-extraction

Updated Sep 12, 2019
Python
