feat: add minify option to strip or compress large non-text files in digest #26

@escorciav

Description

Thank you for the great tool! While using Gitingest on large repositories, I found that the .txt output often includes content that is irrelevant to the digest. This includes files such as:

  • Jupyter Notebooks with embedded images.
  • CSV files with results or large datasets.
  • (possibly) Binaries or other non-textual data.

These files inflate the output size unnecessarily, making it difficult for downstream tools (e.g., ChatGPT) to process the digest efficiently.

Feature Request:

  • Add a minify option to exclude or summarize such files in the output. This option could work by:
    • Stripping or summarizing Jupyter Notebooks (e.g., excluding image data or non-code cells).
    • Skipping large files like CSVs or binaries entirely.
    • Allowing configurable file-type or size exclusions (e.g., via an argument like --exclude ".csv,.ipynb,*.bin" or --max-file-size 5MB).
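To make the exclusion idea concrete, here is a minimal sketch of how such arguments could be interpreted. Everything in it is an assumption for illustration: `parse_size`, `parse_patterns`, and `should_skip` are hypothetical names, not part of Gitingest, and sizes are assumed to use binary units (1 MB = 1024 * 1024 bytes). A bare extension such as `.csv` is treated as shorthand for `*.csv`.

```python
import fnmatch
import os
import re

# Hypothetical helpers for illustration; none of these names come from Gitingest.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text):
    """Turn a string such as '5MB' into a byte count (binary units assumed)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B)\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def parse_patterns(text):
    """Split a comma-separated list; a bare '.csv' becomes the glob '*.csv'."""
    patterns = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        patterns.append("*" + item if item.startswith(".") else item)
    return patterns


def should_skip(path, size_bytes, patterns, max_bytes):
    """Skip a file if its name matches an excluded pattern or it is too large."""
    name = os.path.basename(path)
    if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
        return True
    return size_bytes > max_bytes
```

For example, `should_skip("data/results.csv", 10, parse_patterns(".csv"), parse_size("5MB"))` would skip the file because of its extension, regardless of its size.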

Use Case:
This would be particularly useful for repositories where only code and textual content are relevant, making Gitingest output smaller and easier for downstream tools to use.

Proposed Approach:
A potential implementation could involve:

  • Scanning file extensions or MIME types to exclude or process specific formats.
  • Adding a size threshold to skip overly large files.
  • Leveraging libraries like nbconvert for Jupyter Notebook minification.
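As a rough illustration of notebook minification, the sketch below does not use nbconvert. It swaps in plain `json` handling, assuming the notebook follows the nbformat v4 layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`). The function name `minify_notebook` and its `keep_markdown` parameter are hypothetical.

```python
import json


def minify_notebook(raw, keep_markdown=True):
    """Return notebook JSON text with cell outputs (and embedded images) removed.

    Assumes the nbformat v4 layout; this is a sketch, not Gitingest's behavior.
    """
    nb = json.loads(raw)
    kept = []
    for cell in nb.get("cells", []):
        kind = cell.get("cell_type")
        if kind == "code":
            # Outputs hold rendered results, including base64-encoded images.
            cell["outputs"] = []
            cell["execution_count"] = None
        elif kind != "markdown" or not keep_markdown:
            # Drop raw cells, and markdown cells when they are not wanted.
            continue
        # Cell attachments can also carry embedded image data.
        cell.pop("attachments", None)
        kept.append(cell)
    nb["cells"] = kept
    return json.dumps(nb, indent=1)
```

Calling `minify_notebook(open("analysis.ipynb", encoding="utf-8").read())` on a hypothetical `analysis.ipynb` would keep the code and markdown sources while dropping the bulky output payloads.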

Thank you for considering this enhancement!
