Bugfix: Add max_line_length to search_code_advanced to prevent token flooding #22

marksolly · 2025-08-08T06:39:21Z

Introduces a new max_line_length parameter to the search_code_advanced tool to address an issue where search results containing very long lines (e.g., from minified JavaScript files) could lead to excessive token usage and unexpected behavior when consumed by Large Language Models (LLMs).

Problem:

When using the search_code_advanced tool with context_lines, if a match is found in a file with extremely long lines (such as a .min.js file), the entire line is returned as context. This can result in a massive, unexpected output, leading to:

Token Flooding: The large output consumes a significant number of tokens, which can be costly and inefficient.
LLM Input Issues: The excessive input can confuse or overwhelm LLMs, leading to poor or irrelevant responses.
Performance Degradation: Processing and transmitting large amounts of unnecessary data can slow down the entire workflow.

Solution:

This PR introduces a max_line_length parameter to the search_code_advanced tool, which defaults to 200 characters. Any line in the search results that exceeds the specified length is truncated, and the marker "... (truncated)" is appended to show that the line has been shortened.
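The truncation rule described above can be sketched roughly as follows. This is an illustration, not the PR's code: the helper name truncate_line and the constant TRUNCATION_SUFFIX are hypothetical, and the sketch assumes the marker is appended after the first max_line_length characters, so the returned string is slightly longer than the limit.

```python
# Hypothetical helper illustrating the truncation rule; not taken from the PR.
TRUNCATION_SUFFIX = "... (truncated)"


def truncate_line(line: str, max_line_length: int = 200) -> str:
    """Return the line unchanged if it fits, otherwise cut it and add a marker."""
    if len(line) <= max_line_length:
        return line
    # Assumption: the marker does not count against the limit.
    return line[:max_line_length] + TRUNCATION_SUFFIX


if __name__ == "__main__":
    minified = "var a=1;" * 1000  # stand-in for one very long minified line
    shortened = truncate_line(minified)
    print(len(minified), "->", len(shortened))
```

With this rule, a single line from a .min.js file contributes at most max_line_length characters plus the marker to the result, regardless of how long the original line is.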

Key Changes:

src/code_index_mcp/server.py: The search_code_advanced tool now accepts a max_line_length parameter with a default value of 200.
src/code_index_mcp/services/search_service.py: The search_code method now accepts and passes the max_line_length parameter to the underlying search strategies.
src/code_index_mcp/search/base.py: The parse_search_output function now includes logic to truncate lines based on max_line_length. The SearchStrategy abstract base class has been updated to include this parameter in the search method.
src/code_index_mcp/search/*.py: All concrete SearchStrategy implementations (ugrep, ripgrep, ag, grep, and basic) have been updated to accept and utilize the max_line_length parameter.
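To show how the parameter might travel through these layers, here is a simplified sketch. Only the names SearchStrategy, search, search_code, and max_line_length come from the description above; the signatures, the in-memory BasicStrategy stand-in, and the _truncate helper are assumptions made for illustration and will differ from the real modules.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

Match = Tuple[int, str]  # (line number, possibly truncated line text)


def _truncate(line: str, max_line_length: int) -> str:
    # Hypothetical helper: cut long lines and mark them as shortened.
    if len(line) <= max_line_length:
        return line
    return line[:max_line_length] + "... (truncated)"


class SearchStrategy(ABC):
    """Assumed shape of the abstract base class; the real signature may differ."""

    @abstractmethod
    def search(self, pattern: str, lines: List[str],
               max_line_length: int = 200) -> List[Match]:
        ...


class BasicStrategy(SearchStrategy):
    """In-memory stand-in for a concrete strategy (ripgrep, grep, and so on)."""

    def search(self, pattern: str, lines: List[str],
               max_line_length: int = 200) -> List[Match]:
        results: List[Match] = []
        for number, text in enumerate(lines, start=1):
            if pattern in text:
                results.append((number, _truncate(text, max_line_length)))
        return results


def search_code(strategy: SearchStrategy, pattern: str, lines: List[str],
                max_line_length: int = 200) -> List[Match]:
    # Service layer: forwards the limit to the strategy unchanged.
    return strategy.search(pattern, lines, max_line_length=max_line_length)


if __name__ == "__main__":
    sample = ["foo()", "x" * 300 + "foo", "bar"]
    for number, text in search_code(BasicStrategy(), "foo", sample, 10):
        print(number, text)
```

Declaring the parameter on the abstract method means every strategy has to accept it, so the tool can forward one value regardless of which search backend is in use.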

johnhuang316 · 2025-09-09T08:09:04Z

Suggestion: Default to No Limit for Backward Compatibility

Great work on this PR! The max_line_length parameter is a valuable addition that solves the token flooding issue with minified files.

However, I'd like to suggest changing the default value from 200 to None (no limit) for better backward compatibility:

Current PR: max_line_length: int = 200
Suggested: max_line_length: int = None

Rationale:

Preserves existing behavior - Users won't see unexpected truncation without explicitly opting in
Follows principle of least surprise - Tools should behave predictably by default
Optional enhancement - Users who need truncation can explicitly set the parameter
Gradual adoption - Can be documented with usage recommendations rather than forced

Implementation:

I've implemented this change along with comprehensive unit tests. The modification ensures:

Default behavior remains unchanged (no truncation)
All search strategies properly handle max_line_length=None
Truncation works correctly when explicitly set
Full test coverage for all scenarios
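
For readers who want a concrete picture, the behaviors listed above could be checked with something like the sketch below. It is not the actual test suite: truncate_line is a hypothetical helper, and the marker text is taken from the PR description.

```python
from typing import Optional


def truncate_line(line: str, max_line_length: Optional[int] = None) -> str:
    """Hypothetical helper: None means no limit, an int enables truncation."""
    if max_line_length is None:
        return line  # default: behave exactly as before the feature existed
    if len(line) <= max_line_length:
        return line
    return line[:max_line_length] + "... (truncated)"


def run_checks() -> None:
    long_line = "x" * 1000
    # Default behavior remains unchanged (no truncation).
    assert truncate_line(long_line) == long_line
    # Passing None explicitly is equivalent to the default.
    assert truncate_line(long_line, max_line_length=None) == long_line
    # Truncation applies only when a limit is set explicitly.
    assert truncate_line(long_line, 200) == "x" * 200 + "... (truncated)"
    # Lines within the limit are returned untouched.
    assert truncate_line("short", 200) == "short"


if __name__ == "__main__":
    run_checks()
    print("all checks passed")
```

Using None as the sentinel keeps "no limit" distinct from any numeric value, so existing callers that never pass the argument see identical output.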

Recommendation for Users:

The feature can be documented with a recommendation like:

Tip: When searching files that may contain very long lines (like minified JS), consider setting max_line_length=200 to prevent excessive token usage.

This approach gives users control while protecting them from unexpected behavior changes.

What do you think about this approach?

Limit response line length in search_code_advanced

4ed55d9

marksolly mentioned this pull request Aug 8, 2025

Results may contain extremely long lines marksolly/code-scope-mcp#2

Closed

johnhuang316 force-pushed the master branch from 98e22c8 to 6f7f8d0 August 11, 2025 05:01

johnhuang316 merged commit 4ed55d9 into johnhuang316:master Sep 9, 2025
