
Wrote a high-performance script that compares two massive Excel datasets by company codes, identifies duplicates and mismatches, and generates a summary report. Optimized for processing hundreds of thousands of rows. Tools: Python, pandas, openpyxl


fudik94/bigdata_list_comporing_script_v2


Excel File Comparator

This project compares two large Excel files by company code.
It finds which companies exist in each file, detects name mismatches, and creates a clean merged report.

Features

  • Reads two Excel files
  • Cleans and normalizes company codes and names
  • Detects which entries exist in each file
  • Flags different names for the same code
  • Handles hundreds of thousands of rows efficiently
  • Saves a final report in Excel format
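The features above can be sketched with pandas roughly as follows. This is a minimal illustration, not the repository's actual script: the column names `code` and `name`, the function names `normalize` and `compare`, and the `presence` / `name_mismatch` report columns are all assumptions made for the example.

```python
import pandas as pd


def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy with one row per company code.

    Assumes hypothetical columns "code" and "name".
    """
    out = df.copy()
    # Trim and upper-case codes so " a1 " and "A1" compare as equal.
    out["code"] = out["code"].astype("string").str.strip().str.upper()
    # Trim names and collapse runs of whitespace into a single space.
    out["name"] = (
        out["name"].astype("string").str.strip().str.replace(r"\s+", " ", regex=True)
    )
    # Drop rows without a usable code, then keep the first row per code.
    out = out[out["code"].notna() & (out["code"] != "")]
    return out.drop_duplicates(subset="code", keep="first")


def compare(df_a: pd.DataFrame, df_b: pd.DataFrame) -> pd.DataFrame:
    """Outer-join both files on code and flag presence and name mismatches."""
    merged = normalize(df_a).merge(
        normalize(df_b),
        on="code",
        how="outer",
        suffixes=("_a", "_b"),
        indicator=True,  # adds a "_merge" column: left_only / right_only / both
    )
    merged["presence"] = merged["_merge"].astype(str).map(
        {"left_only": "only_in_a", "right_only": "only_in_b", "both": "in_both"}
    )
    # Compare names case-insensitively, only for codes found in both files.
    name_a = merged["name_a"].fillna("").str.lower()
    name_b = merged["name_b"].fillna("").str.lower()
    merged["name_mismatch"] = (
        (merged["presence"] == "in_both") & (name_a != name_b)
    ).astype(bool)
    return merged.drop(columns="_merge")
```

In a real run, the two inputs would typically be loaded with `pd.read_excel(path, dtype=str)` so that codes keep any leading zeros, and the result saved with `report.to_excel(path, index=False)`. Every step here is a vectorized column operation or a single merge rather than a Python loop over rows, which is what usually keeps this kind of comparison practical at hundreds of thousands of rows.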

Requirements

  • Python 3.9 or higher
  • pandas
  • openpyxl (the engine pandas uses to read and write .xlsx files)

Install dependencies:

pip install pandas openpyxl
