🤖 An intelligent n8n workflow that automatically processes PDF receipts/invoices from Gmail, extracts key data using Google Gemini AI, and logs the information to Google Sheets.
- 📧 Auto Email Monitoring: Watches Gmail for PDF attachments
- 🔍 AI-Powered Extraction: Uses Google Gemini to intelligently extract receipt data
- 📊 Google Sheets Integration: Automatically logs data to spreadsheet
- 🎯 Smart Field Detection: Extracts supplier name, order date, amount, currency, etc.
- 📱 Status Notifications: Sends confirmation emails after processing
- 🔄 Real-time Processing: Processes receipts as they arrive
The workflow runs these steps in order:
- Gmail Trigger - Monitors for PDF attachments from specific senders
- File Handler - Saves PDF to temporary processing directory
- PDF Text Extraction - Converts the PDF to text using the `pdftotext` command
- AI Agent - Orchestrates the data extraction and logging process
- Google Gemini - Intelligently extracts structured data from receipt text
- Google Sheets Tool - Appends extracted data to tracking spreadsheet
- Gmail Notification - Sends status confirmation email
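The save-and-extract steps above can be sketched in a few lines of shell. This is only an illustration: the helper names `txt_path_for` and `extract_text` are hypothetical, and the exact command the workflow runs may use different options. The `-layout` flag is a standard `pdftotext` option.

```shell
# Hypothetical helper: derive the .txt output path from a saved PDF path.
txt_path_for() {
  # Strip a trailing ".pdf" (if present) and append ".txt".
  printf '%s.txt\n' "${1%.pdf}"
}

# Sketch of the extraction step. -layout asks pdftotext to preserve the
# physical layout, which tends to keep labels next to their amounts.
extract_text() {
  pdftotext -layout "$1" "$(txt_path_for "$1")"
}
```

For example, `extract_text /tmp/n8n_pdf_processing/receipt.pdf` would write the text next to the PDF as `receipt.txt`.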
The AI extracts these 6 key fields from each receipt:
| Field | Description | Example |
|---|---|---|
| Supplier name | Company/vendor name | "Amazon", "Starbucks" |
| Order date | Transaction date | "2024-01-15" |
| Order number | Invoice/receipt number | "ORD-12345" |
| Supplier email | Vendor contact email | "noreply@vendor.com" |
| Total amount | Final amount paid | "25.99" |
| Currency | 3-letter ISO code | "USD", "EUR", "THB" |
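Because the AI output is free text, it can help to sanity-check a few fields before they reach the sheet. The sketch below assumes the formats shown in the examples above (ISO `YYYY-MM-DD` dates, plain decimal amounts, uppercase 3-letter currency codes). The function names are hypothetical and are not part of the workflow.

```shell
# Shape check only: accepts YYYY-MM-DD with month 01-12 and day 01-31.
# It does not verify that the day exists in that month.
is_iso_date() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$'
}

# Three uppercase ASCII letters, e.g. USD. Does not check the ISO code list.
is_currency() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Z]{3}$'
}

# Digits with an optional 1-2 digit decimal part, no symbols or separators.
is_amount() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[0-9]+(\.[0-9]{1,2})?$'
}
```

A row could then be rejected early, for instance with `is_amount "$total" || echo "suspicious total: $total" >&2`.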
- n8n instance (self-hosted or cloud)
- Linux/Mac system with command line access
- `pdftotext` utility installed
# Ubuntu/Debian
sudo apt-get install poppler-utils
# macOS
brew install poppler
# CentOS/RHEL
sudo yum install poppler-utils
- Visit Google Cloud Console
- Enable the Generative Language API (the API behind Gemini)
- Create API key for Gemini
⚠️ Important: Monitor billing - this API has usage costs!
- Create OAuth2 credentials in Google Cloud Console
- Add authorized redirect URIs for your n8n instance
- Enable Gmail API
- Enable Google Sheets API in Google Cloud Console
- Use the same OAuth2 credentials or create separate ones
- Create a new Google Spreadsheet
- Set up columns with these exact headers:
| Supplier name | Order date | Order number | Supplier email | Total amount | Currency |
- Copy the spreadsheet ID from the URL
- Share the sheet with your Google account used for n8n
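The spreadsheet ID is the path segment that follows `/d/` in the sheet URL. As a small convenience, it can be pulled out with `sed`. The function name and the sample ID below are made up for illustration.

```shell
# Print the spreadsheet ID from a Google Sheets URL of the form
# https://docs.google.com/spreadsheets/d/<ID>/edit...
# Prints nothing if the URL does not match that shape.
sheet_id_from_url() {
  printf '%s\n' "$1" |
    sed -n 's|^https://docs\.google\.com/spreadsheets/d/\([^/]*\).*$|\1|p'
}

# Usage with a made-up ID:
sheet_id_from_url 'https://docs.google.com/spreadsheets/d/1AbC_dEf-123/edit#gid=0'
```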
- Download `pdf-receipt-processor-workflow.json`
- In n8n: Workflows > Import from File
- Upload the JSON file
- Gmail OAuth2: Create and link in Gmail Trigger and Gmail Tool nodes
- Google Gemini API: Create and link in Google Gemini Chat Model node
- Google Sheets OAuth2: Create and link in Google Sheets Tool node
- Gmail Filter: Change `your-email@example.com` to the desired sender email
- File Path: Ensure the `/tmp/n8n_pdf_processing/` directory exists and is writable
- Google Sheets ID: Replace `YOUR_GOOGLE_SHEETS_DOCUMENT_ID` with your sheet ID
- Notification Email: Update the recipient email in the Gmail Tool node
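Before activating the workflow, a quick preflight check can confirm the two local requirements: `pdftotext` on the `PATH` and a writable processing directory. This is a sketch with hypothetical function names. Run it as the same user that runs n8n, since that is the user whose permissions matter.

```shell
# True if the named command can be found by the shell.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Create the directory if needed, then confirm it exists and is writable.
ensure_writable_dir() {
  mkdir -p "$1" && [ -d "$1" ] && [ -w "$1" ]
}

# Report the first failing requirement on stderr and return non-zero.
preflight() {
  have_cmd pdftotext || { echo "pdftotext not found on PATH" >&2; return 1; }
  ensure_writable_dir "$1" || { echo "cannot write to $1" >&2; return 1; }
}
```

Calling `preflight /tmp/n8n_pdf_processing` returns success only when both checks pass.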
- Send PDF Receipt: Email a PDF receipt to the monitored Gmail account
- Auto Processing: Workflow detects the email and processes the PDF
- Data Extraction: AI extracts key information from the receipt
- Sheet Update: Data is automatically added to your Google Sheet
- Confirmation: You receive an email confirming the processing
- Open the workflow in n8n
- Click "Execute Workflow"
- Send a test PDF to trigger processing
pdftotext Command Not Found
# Verify installation
which pdftotext
pdftotext -v
# If missing, install poppler-utils (the package is named poppler on macOS/Homebrew)
Permission Denied on File Path
# Create the directory as the user that runs n8n
# (a root-owned directory with mode 755 is not writable by other users)
mkdir -p /tmp/n8n_pdf_processing
chmod 755 /tmp/n8n_pdf_processing
AI Extraction Accuracy Issues
- Ensure PDF text is clear and readable
- Complex receipt layouts may need prompt tuning
- Check whether the PDF is image-based (scanned); `pdftotext` returns little or no text for such files, so they require OCR
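One rough way to spot an image-based PDF is to count the non-whitespace characters that `pdftotext` produced. The sketch below treats fewer than 20 characters as "no usable text". Both the function name and that threshold are arbitrary assumptions to tune for your receipts.

```shell
# has_text FILE [MIN]: succeed if FILE holds at least MIN (default 20)
# non-whitespace characters. Form feeds between pages count as whitespace.
has_text() {
  min="${2:-20}"
  count=$(( $(tr -d '[:space:]' < "$1" | wc -c) ))
  [ "$count" -ge "$min" ]
}
```

A typical use would be `pdftotext receipt.pdf out.txt && has_text out.txt || echo "probably needs OCR"`.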
Google Sheets Connection Failed
- Verify sheet ID is correct
- Check that the sheet is shared with the Google account used for n8n
- Confirm column headers match exactly
High API Costs
- Review Gemini API usage in Google Cloud Console
- Optimize prompts to reduce token usage
- Set up billing alerts and quotas
- 🏷️ Extract product categories and line items
- 🧾 Support for different receipt formats
- 💳 Add payment method detection
- 🏪 Store merchant category codes
- 📊 Add data validation and error handling
- 🔄 Implement retry logic for failed extractions
- 📈 Generate monthly expense reports
- 🎯 Smart categorization of expenses
- 💼 Connect to accounting software (QuickBooks, Xero)
- 📱 Send notifications to Slack/Teams
- 🗄️ Archive processed PDFs to cloud storage
- 📊 Create expense analytics dashboard
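As one possible shape for the retry idea listed above, here is a small retry wrapper with exponential backoff. It is a generic sketch, not something the workflow ships with. The name `retry` and its argument order are assumptions, and its variables are global, so avoid reusing `max`, `delay`, or `attempt` inside the wrapped command.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds or MAX attempts are
# used up, sleeping DELAY seconds after a failure and doubling it each time.
retry() {
  max="$1"
  delay="$2"
  shift 2
  attempt=1
  while :; do
    "$@" && return 0
    [ "$attempt" -ge "$max" ] && return 1
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example, `retry 3 2 pdftotext receipt.pdf out.txt` would try up to three times, waiting 2 and then 4 seconds between attempts.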
- Use regex patterns for simple receipt formats
- Create template-based extraction rules
- Manual field mapping for consistent suppliers
- Replace Gemini with OpenAI GPT
- Use local LLM models (Ollama)
- Try Claude API for different extraction styles
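For the regex option mentioned above, a minimal sketch might look like the following. It assumes a very simple receipt layout: the total sits on a line containing the word "Total" with a two-decimal amount as the first number on that line, and dates are written as `YYYY-MM-DD`. The function names are hypothetical, and real receipts will often need supplier-specific patterns.

```shell
# Read receipt text on stdin and print the amount from the first line that
# contains the word "total" (case-insensitive). "Subtotal" lines are skipped
# because "total" must not be preceded by a letter.
extract_total() {
  LC_ALL=C grep -Ei '(^|[^[:alpha:]])total[^[:alpha:]]' |
    sed -n 's/^[^0-9]*\([0-9][0-9]*\.[0-9][0-9]\).*$/\1/p' |
    head -n 1
}

# Print the first YYYY-MM-DD shaped string found on stdin.
extract_date() {
  LC_ALL=C grep -Eo '[0-9]{4}-[0-9]{2}-[0-9]{2}' | head -n 1
}
```

With text from `pdftotext`, this could be used as `extract_total < receipt.txt`.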
For image-based PDFs:
- Add Tesseract OCR step
- Use Google Vision API
- Integrate with Azure Cognitive Services
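For the Tesseract option, one common approach is to rasterize each page with `pdftoppm` (also part of poppler) and pass the images to `tesseract`. The sketch below assumes both tools are installed. The function name `ocr_pdf` and the 300 DPI setting are illustrative choices, not part of the workflow.

```shell
# ocr_pdf FILE: render each page of FILE to PNG and print the OCR text
# of all pages to stdout, in page order.
ocr_pdf() {
  workdir=$(mktemp -d) || return 1
  if pdftoppm -r 300 -png "$1" "$workdir/page"; then
    for img in "$workdir"/page-*.png; do
      [ -e "$img" ] || continue        # no pages were rendered
      tesseract "$img" stdout 2>/dev/null
    done
    status=0
  else
    status=1
  fi
  rm -rf "$workdir"                    # do not leave page images behind
  return "$status"
}
```

The output can be redirected to a file, for example `ocr_pdf scanned.pdf > scanned.txt`, and then handed to the same AI extraction step.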
We welcome contributions! Areas for improvement:
- 📈 Accuracy: Better extraction prompts and validation
- 🔧 Error Handling: Robust error recovery and logging
- 🎨 UI/UX: Better status reporting and user feedback
- 📊 Analytics: Usage statistics and processing insights
- 🧪 Testing: Automated testing with sample receipts
This workflow is provided as-is for educational and personal use. Please comply with:
- Google Cloud APIs terms of service
- Gmail API usage policies
- Google Sheets API guidelines
- Data privacy regulations (GDPR, etc.)
- PDFs are temporarily stored during processing
- Extracted data contains financial information
- Consider data retention policies
- Never commit real credentials to version control
- Use environment variables for sensitive data
- Regularly rotate API keys
- Monitor access logs and usage
- Implement proper access controls
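If any helper scripts around the workflow read secrets from environment variables, they can fail fast when one is missing. This is a sketch only. The function name and the variable name `GEMINI_API_KEY` are hypothetical, and n8n's own credential store remains the place for the workflow's credentials.

```shell
# require_env NAME...: succeed only if every named variable is set and
# non-empty. Names must be trusted identifiers because eval is used.
require_env() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required environment variable: $name" >&2
      return 1
    fi
  done
}
```

A script would call it once at the top, for example `require_env GEMINI_API_KEY || exit 1`.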
- This workflow processes financial documents
- Ensure compliance with local privacy laws
- Consider encryption for sensitive data
- Implement data deletion policies
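A simple deletion policy for the temporary files could be a scheduled cleanup of the processing directory. The sketch below removes PDFs older than a given number of minutes. The function name and the 60-minute default are assumptions, and `-mmin` is a `find` extension available in GNU and BSD (macOS) versions.

```shell
# purge_old_pdfs DIR [MINUTES]: delete *.pdf files under DIR that were
# last modified more than MINUTES (default 60) minutes ago.
purge_old_pdfs() {
  dir="$1"
  minutes="${2:-60}"
  find "$dir" -type f -name '*.pdf' -mmin "+$minutes" -exec rm -f {} +
}
```

It could be run from cron, for instance as `purge_old_pdfs /tmp/n8n_pdf_processing 60`.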