Thank you for the great tool! While using Gitingest on large repositories, I found that the .txt output often contains irrelevant content from files such as:
- Jupyter Notebooks with embedded images.
- CSV files with results or large datasets.
- Possibly binaries or other non-textual data.
These files inflate the output size unnecessarily, making it difficult for downstream tools (e.g., ChatGPT) to process the output efficiently.
Feature Request:
- Add a minify option to exclude or summarize such files in the output. This option could work by:
- Stripping or summarizing Jupyter Notebooks (e.g., excluding image data or non-code cells).
- Skipping large files like CSVs or binaries entirely.
- Allowing configurable file-type or size exclusions (e.g., via arguments like --exclude "*.csv,*.ipynb,*.bin" or --max-file-size 5MB).
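To make the suggested arguments concrete, here is a minimal sketch of how --exclude and --max-file-size could be parsed and applied. Everything in it is hypothetical and not part of Gitingest: the program name `gitingest-sketch`, the helpers `parse_size`, `parse_patterns`, and `is_excluded`, and the assumption that sizes use binary units (1 MB = 1024 * 1024 bytes).

```python
import argparse
import fnmatch
import re

# Assumption: binary units, so "5MB" means 5 * 1024 * 1024 bytes.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text: str) -> int:
    """Convert a string such as '5MB' or '512 kb' into a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B)\s*", text, re.IGNORECASE)
    if match is None:
        raise argparse.ArgumentTypeError(f"invalid size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.upper()])


def parse_patterns(text: str) -> list[str]:
    """Split a comma-separated list; a bare '.csv' is treated as '*.csv'."""
    patterns = []
    for raw in text.split(","):
        item = raw.strip()
        if not item:
            continue
        patterns.append("*" + item if item.startswith(".") else item)
    return patterns


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags taken from the request, not an existing CLI.
    parser = argparse.ArgumentParser(prog="gitingest-sketch")
    parser.add_argument("--exclude", type=parse_patterns, default=[])
    parser.add_argument("--max-file-size", type=parse_size, default=None)
    return parser


def is_excluded(name: str, size: int, args: argparse.Namespace) -> bool:
    """Return True if the file is over the size limit or matches a pattern."""
    if args.max_file_size is not None and size > args.max_file_size:
        return True
    return any(fnmatch.fnmatch(name, pattern) for pattern in args.exclude)
```

With this sketch, `build_parser().parse_args(["--exclude", ".csv,*.bin", "--max-file-size", "5MB"])` yields a namespace that `is_excluded` can check for each file name and size while walking the repository.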
Use Case:
This would be particularly useful for repositories where only code and textual content are relevant, improving the usability and performance of gitingest outputs.
Proposed Approach:
A potential implementation could involve:
- Scanning file extensions or MIME types to exclude or process specific formats.
- Adding a size threshold to skip overly large files.
- Leveraging libraries like nbconvert for Jupyter Notebook minification.
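The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `should_skip`, `notebook_code_only`, `SKIP_SUFFIXES`, and the 5 MB default are hypothetical names and values, not existing Gitingest code. For the notebook part, the sketch reads the notebook JSON with the standard library instead of nbconvert, simply to stay dependency-free; nbconvert would be a reasonable alternative.

```python
import json
import mimetypes
from pathlib import Path

# Assumed defaults for illustration only.
SKIP_SUFFIXES = {".csv", ".bin"}
SKIP_MIME_FAMILIES = {"image", "audio", "video"}


def should_skip(path: Path, max_bytes: int = 5 * 1024 * 1024) -> bool:
    """Decide whether a file should be left out of the text output."""
    # 1. Extension check.
    if path.suffix.lower() in SKIP_SUFFIXES:
        return True
    # 2. MIME type guessed from the file name (None when unknown).
    mime, _ = mimetypes.guess_type(path.name)
    if mime is not None and mime.split("/")[0] in SKIP_MIME_FAMILIES:
        return True
    # 3. Size threshold.
    if path.stat().st_size > max_bytes:
        return True
    # 4. Crude binary sniff: a NUL byte in the first 8 KiB.
    with path.open("rb") as handle:
        return b"\0" in handle.read(8192)


def notebook_code_only(raw: str) -> str:
    """Return the source of code cells only, dropping outputs and other cells."""
    notebook = json.loads(raw)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        chunks.append(source if isinstance(source, str) else "".join(source))
    return "\n\n".join(chunks)
```

Because only the `source` field of code cells is kept, embedded image data in cell outputs never reaches the result. Keeping markdown cells as well would be a one-line change if that text turns out to be useful downstream.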
Thank you for considering this enhancement!