diff --git a/.gitignore b/.gitignore index 367a552..9539f72 100644 --- a/.gitignore +++ b/.gitignore @@ -48,4 +48,4 @@ COMMIT_MESSAGE.txt RELEASE_NOTE.txt .llm-context/ -.kiro/ \ No newline at end of file +AGENTS.md diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..886f335 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,25 @@ +# Repository Guidelines + +## Project Structure & Module Organization +Code Index MCP lives in `src/code_index_mcp/`, with `indexing/` managing builders, `services/` exposing MCP tool implementations, `search/` coordinating query utilities, and `utils/` housing cross-cutting helpers. The lightweight CLI bootstrapper is `run.py`, which adds `src/` to `PYTHONPATH` before invoking `code_index_mcp.server`. Sample corpora for language regression reside under `test/sample-projects/` (for example `python/user_management/`). Reserve `tests/` for runnable suites and avoid checking in generated `__pycache__` artifacts. + +## Build, Test, and Development Commands +Install dependencies with `uv sync` after cloning. Use `uv run code-index-mcp` to launch the MCP server directly, or `uv run python run.py` when you need the local sys.path shim. During development, `uv run code-index-mcp --help` will list available CLI flags, and `uv run python -m code_index_mcp.server` mirrors the published entry point for debugging. + +## Coding Style & Naming Conventions +Target Python 3.10+ and follow the `.pylintrc` configuration: 4-space indentation, 100-character line limit, and restrained function signatures (<= 7 parameters). Modules and functions stay `snake_case`, classes use `PascalCase`, and constants remain uppercase with underscores. Prefer explicit imports from sibling packages (`from .services import ...`) and keep logging to stderr as implemented in `server.py`. + +## Testing Guidelines +Automated tests should live under `tests/`, mirroring the package hierarchy (`tests/indexing/test_shallow_index.py`, etc.). 
Use `uv run pytest` (with optional `-k` selectors) for unit and integration coverage, and stage representative fixtures inside `test/sample-projects/` when exercising new language strategies. Document expected behaviors in fixtures' README files or inline comments, and make new strategies fail fast if tree-sitter support is not available for a language you add. + +## Commit & Pull Request Guidelines +Follow the Conventional Commits style seen in history (`feat`, `fix`, `refactor(scope): summary`). Reference issue numbers when relevant and keep subjects under 72 characters. Pull requests should include: 1) a concise problem statement, 2) before/after behavior or performance notes, 3) instructions for reproducing test runs (`uv run pytest`, `uv run code-index-mcp`). Attach updated screenshots or logs when touching developer experience flows, and confirm the file watcher still transitions to "active" in manual smoke tests. + +## Agent Workflow Tips +Always call `set_project_path` before invoking other tools, and prefer `search_code_advanced` with targeted `file_pattern` filters to minimize noise. When editing indexing strategies, run `refresh_index` between changes to confirm cache rebuilds. Clean up temporary directories via `clear_settings` if you notice stale metadata, and document any new tooling you introduce in this guide. + +## Release Preparation Checklist +- Update the project version everywhere it lives: `pyproject.toml`, `src/code_index_mcp/__init__.py`, and `uv.lock`. +- Add a release note entry to `RELEASE_NOTE.txt` for the new version. +- Commit the version bump (plus any release artifacts) and push the branch to `origin`. +- Create a git tag for the new version and push the tag to `origin`. 
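The `run.py` shim that the Build, Test, and Development Commands section mentions amounts to roughly the following. This is a minimal illustrative sketch, not the repository's actual file — the helper name `bootstrap` is invented here, and the real script may manipulate `PYTHONPATH` differently:

```python
import os
import sys

def bootstrap(project_root: str) -> str:
    """Prepend <project_root>/src to sys.path (idempotently) and return that path."""
    src = os.path.join(project_root, "src")
    if src not in sys.path:
        sys.path.insert(0, src)  # local checkout wins over any installed copy
    return src

# In the real run.py, something like this would follow:
#   bootstrap(os.path.dirname(os.path.abspath(__file__)))
#   from code_index_mcp.server import main
#   main()
```

The point of the shim is that `uv run python run.py` and `uv run code-index-mcp` both end up importing `code_index_mcp` — the former from `src/` without an install step.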
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md deleted file mode 100644 index f3b2d5b..0000000 --- a/ARCHITECTURE.md +++ /dev/null @@ -1,233 +0,0 @@ -# Code Index MCP System Architecture - -## Overview - -Code Index MCP is a Model Context Protocol (MCP) server that provides intelligent code indexing and analysis capabilities. The system follows SCIP (Source Code Intelligence Protocol) standards and uses a service-oriented architecture with clear separation of concerns. - -## High-Level Architecture - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ MCP Interface Layer │ -├─────────────────────────────────────────────────────────────────┤ -│ Service Layer │ -├─────────────────────────────────────────────────────────────────┤ -│ SCIP Core Layer │ -├─────────────────────────────────────────────────────────────────┤ -│ Language Strategies │ -├─────────────────────────────────────────────────────────────────┤ -│ Technical Tools Layer │ -└─────────────────────────────────────────────────────────────────┘ -``` - -## Layer Responsibilities - -### 1. MCP Interface Layer (`server.py`) -**Purpose**: Exposes MCP tools and handles protocol communication - -**Key Components**: -- MCP tool definitions (`@mcp.tool()`) -- Error handling and response formatting -- User interaction and guidance - -**MCP Tools**: -- `set_project_path` - Initialize project indexing -- `find_files` - File discovery with patterns -- `get_file_summary` - File analysis and metadata -- `search_code_advanced` - Content search across files -- `refresh_index` - Manual index rebuilding -- `get_file_watcher_status` - File monitoring status -- `configure_file_watcher` - File watcher settings - -### 2. 
Service Layer (`services/`) -**Purpose**: Business logic orchestration and workflow management - -**Key Services**: -- `ProjectManagementService` - Project lifecycle and initialization -- `FileWatcherService` - Real-time file monitoring and auto-refresh -- `IndexManagementService` - Index rebuild operations -- `CodeIntelligenceService` - File analysis and symbol intelligence -- `FileDiscoveryService` - File pattern matching and discovery -- `SearchService` - Advanced code search capabilities - -**Architecture Pattern**: Service delegation with clear business boundaries - -### 3. SCIP Core Layer (`scip/core/`) -**Purpose**: Language-agnostic SCIP protocol implementation - -**Core Components**: -- `SCIPSymbolManager` - Standard SCIP symbol ID generation -- `LocalReferenceResolver` - Cross-file reference resolution -- `PositionCalculator` - AST/Tree-sitter position conversion -- `MonikerManager` - External package dependency handling - -**Standards Compliance**: Full SCIP protocol buffer implementation - -### 4. Language Strategies (`scip/strategies/`) -**Purpose**: Language-specific code analysis using two-phase processing - -**Strategy Pattern Implementation**: -- `BaseStrategy` - Abstract interface and common functionality -- `PythonStrategy` - Python AST analysis -- `JavaScriptStrategy` - JavaScript/TypeScript Tree-sitter analysis -- `JavaStrategy` - Java Tree-sitter analysis -- `ObjectiveCStrategy` - Objective-C Tree-sitter analysis -- `FallbackStrategy` - Generic text-based analysis - -**Two-Phase Analysis**: -1. **Phase 1**: Symbol definition collection -2. **Phase 2**: Reference resolution and SCIP document generation - -### 5. 
Technical Tools Layer (`tools/`) -**Purpose**: Low-level technical capabilities - -**Tool Categories**: -- `filesystem/` - File system operations and pattern matching -- `scip/` - SCIP index operations and symbol analysis -- `config/` - Configuration and settings management -- `monitoring/` - File watching and system monitoring - -## Data Flow Architecture - -### File Analysis Workflow -``` -User Request → Service Layer → SCIP Strategy → Core Components → SCIP Documents -``` - -### Index Management Workflow -``` -File Changes → File Watcher → Index Management Service → Strategy Factory → Updated Index -``` - -### Search Workflow -``` -Search Query → Search Service → Advanced Search Tools → Filtered Results -``` - -## SCIP Implementation Details - -### Symbol ID Format -``` -scip-{language} {manager} {package} [version] {descriptors} -``` - -**Examples**: -- Local: `scip-python local myproject src/main.py/MyClass#method().` -- External: `scip-python pip requests 2.31.0 sessions/Session#get().` - -### Language Support Strategy - -**Parsing Approaches**: -- **Python**: Native AST module -- **JavaScript/TypeScript**: Tree-sitter -- **Java**: Tree-sitter -- **Objective-C**: Tree-sitter -- **Others**: Fallback text analysis - -**Supported Code Intelligence**: -- Symbol definitions (functions, classes, variables) -- Import/export tracking -- Cross-file reference resolution -- External dependency management -- Position-accurate symbol ranges - -## Configuration and Extensibility - -### Package Manager Integration -- **Python**: pip, conda, poetry detection -- **JavaScript**: npm, yarn package.json parsing -- **Java**: Maven pom.xml, Gradle build files -- **Configuration-driven**: Easy addition of new package managers - -### File Watcher System -- **Real-time monitoring**: Watchdog-based file system events -- **Debounced rebuilds**: 4-6 second batching of rapid changes -- **Configurable patterns**: Customizable include/exclude rules -- **Thread-safe**: ThreadPoolExecutor 
for concurrent rebuilds - -## Performance Characteristics - -### Indexing Performance -- **Incremental updates**: File-level granular rebuilds -- **Parallel processing**: Concurrent file analysis -- **Memory efficient**: Streaming SCIP document generation -- **Cache optimization**: Symbol table reuse across phases - -### Search Performance -- **Advanced tools**: ripgrep, ugrep, ag integration -- **Pattern optimization**: Glob-based file filtering -- **Result streaming**: Large result set handling - -## Error Handling and Reliability - -### Fault Tolerance -- **Graceful degradation**: Continue indexing on individual file failures -- **Error isolation**: Per-file error boundaries -- **Recovery mechanisms**: Automatic retry on transient failures -- **Comprehensive logging**: Debug and audit trail support - -### Validation -- **Input sanitization**: Path traversal protection -- **Range validation**: SCIP position boundary checking -- **Schema validation**: Protocol buffer structure verification - -## Future Architecture Considerations - -### Planned Enhancements -1. **Function Call Relationships**: Complete call graph analysis -2. **Type Information**: Enhanced semantic analysis -3. **Cross-repository Navigation**: Multi-project symbol resolution -4. **Language Server Protocol**: LSP compatibility layer -5. **Distributed Indexing**: Horizontal scaling support - -### Extension Points -- **Custom strategies**: Plugin architecture for new languages -- **Analysis plugins**: Custom symbol analyzers -- **Export formats**: Multiple output format support -- **Integration APIs**: External tool connectivity - -## Directory Structure - -``` -src/code_index_mcp/ -├── server.py # MCP interface layer -├── services/ # Business logic services -│ ├── project_management_service.py -│ ├── file_watcher_service.py -│ ├── index_management_service.py -│ ├── code_intelligence_service.py -│ └── ... 
-├── scip/ # SCIP implementation -│ ├── core/ # Language-agnostic core -│ │ ├── symbol_manager.py -│ │ ├── local_reference_resolver.py -│ │ ├── position_calculator.py -│ │ └── moniker_manager.py -│ ├── strategies/ # Language-specific strategies -│ │ ├── base_strategy.py -│ │ ├── python_strategy.py -│ │ ├── javascript_strategy.py -│ │ └── ... -│ └── factory.py # Strategy selection -├── tools/ # Technical capabilities -│ ├── filesystem/ -│ ├── scip/ -│ ├── config/ -│ └── monitoring/ -├── indexing/ # Index management -└── utils/ # Shared utilities -``` - -## Key Design Principles - -1. **Standards Compliance**: Full SCIP protocol adherence -2. **Language Agnostic**: Core components independent of specific languages -3. **Extensible**: Easy addition of new languages and features -4. **Performance**: Efficient indexing and search operations -5. **Reliability**: Fault-tolerant with comprehensive error handling -6. **Maintainability**: Clear separation of concerns and modular design - ---- - -*Last updated: 2025-01-14* -*Architecture version: 2.1.0* \ No newline at end of file diff --git a/CHANGELOG.md b/CHANGELOG.md deleted file mode 100644 index c3f9006..0000000 --- a/CHANGELOG.md +++ /dev/null @@ -1,162 +0,0 @@ -# Changelog - -All notable changes to this project will be documented in this file. 
- -## [2.1.1] - 2025-01-15 - -### Fixed -- **SCIP Java Strategy**: Simplified Java symbol analysis implementation - - Refactored JavaStrategy to use streamlined symbol registration methods - - Removed complex JavaAnalyzer and JavaRelationshipExtractor classes - - Fixed symbol creation with basic identifier extraction - - Removed relationships summary calculation that was causing issues - - Added back to_scip_relationships method for compatibility - - Streamlined Java AST processing to focus on core symbol definitions - -### Improved -- **Code Maintainability**: Significantly reduced complexity in Java SCIP processing -- **Performance**: Faster Java file analysis with simplified approach -- **Reliability**: More stable symbol extraction without complex relationship tracking - -## [2.1.0] - 2025-01-13 - -### Major SCIP Architecture Enhancement - -This release completes the migration to SCIP-based code indexing with significant improvements to the core infrastructure and API simplification. 
- -#### Core SCIP Infrastructure -- **Complete SCIP core components**: Added symbol_manager, position_calculator, reference_resolver, moniker_manager -- **Two-phase SCIP analysis**: Implemented symbol collection → reference resolution workflow -- **Unified index management**: New index_provider and unified_index_manager for seamless index operations -- **SCIP-compliant symbol IDs**: Standard symbol ID generation with cross-file reference support - -#### Enhanced Strategy System -- **All language strategies SCIP-compliant**: Refactored Python, Java, JavaScript, Objective-C strategies -- **External symbol extraction**: Added dependency tracking and external symbol resolution -- **Proper SCIP classifications**: Implemented symbol roles and syntax kind detection -- **Robust file handling**: Enhanced encoding detection and error recovery - -#### API Improvements -- **Simplified find_files response**: Returns clean file path lists instead of complex metadata objects -- **Enhanced SCIPSymbolAnalyzer**: Replaced legacy query tools with accurate symbol analysis -- **Improved logging**: Comprehensive logging throughout SCIP indexing pipeline - -#### Dependency Updates -- **pathspec integration**: Better .gitignore parsing and file filtering -- **Updated requirements**: Added comprehensive dependency list for cross-platform support - -#### Technical Improvements -- **Symbol analysis tools**: New inspection scripts for debugging and development -- **Enhanced error handling**: Better fallback strategies and error recovery -- **Testing improvements**: Updated sample projects for multilingual testing - -#### Breaking Changes -- **find_files API**: Now returns `List[str]` instead of complex metadata dictionary -- **Internal architecture**: Significant refactoring of internal components (no user-facing impact) - -## [2.0.0] - 2025-08-11 - -### 🚀 MAJOR RELEASE - SCIP Architecture Migration - -This release represents a **complete architectural overhaul** of the code indexing system, 
migrating from language-specific analyzers to a unified SCIP-based approach. - -#### ✨ New Architecture -- **Three-layer service architecture**: Service → Tool → Technical Components -- **Unified SCIP indexing**: Replace 8 language-specific analyzers with single SCIP protobuf system -- **Service-oriented design**: Clear separation of business logic, technical tools, and low-level operations -- **Composable components**: Modular design enabling easier testing and maintenance - -#### 🔧 Technical Improvements -- **Tree-sitter AST parsing**: Replace regex-based analysis with proper AST parsing -- **SCIP protobuf format**: Industry-standard code intelligence format -- **Reduced complexity**: Simplified from 40K+ lines to ~1K lines of core logic -- **Better error handling**: Improved exception handling and validation -- **Enhanced logging**: Better debugging and monitoring capabilities - -#### 📦 Backward Compatibility -- **MCP API unchanged**: All existing MCP tools work without modification -- **Automatic migration**: Legacy indexes automatically migrated to SCIP format -- **Same functionality**: All user-facing features preserved and enhanced -- **No breaking changes**: Seamless upgrade experience - -#### 🗑️ Removed Components -- Language-specific analyzers (C, C++, C#, Go, Java, JavaScript, Objective-C, Python) -- Legacy indexing models and relationship management -- Complex duplicate detection and qualified name systems -- Obsolete builder and scanner components -- Demo files and temporary utilities - -#### 🆕 New Services -- **ProjectManagementService**: Project lifecycle and configuration management -- **IndexManagementService**: Index building, rebuilding, and status monitoring -- **FileDiscoveryService**: Intelligent file discovery with pattern matching -- **CodeIntelligenceService**: Code analysis and summary generation -- **SystemManagementService**: File watcher and system configuration - -#### 🛠️ New Tool Layer -- **SCIPIndexTool & SCIPQueryTool**: SCIP 
operations and querying -- **FileMatchingTool & FileSystemTool**: File system operations -- **ProjectConfigTool & SettingsTool**: Configuration management -- **FileWatcherTool**: Enhanced file monitoring capabilities - -#### 📊 Performance Benefits -- **Faster indexing**: Tree-sitter parsing significantly faster than regex -- **Lower memory usage**: Streamlined data structures and processing -- **Better accuracy**: SCIP provides more precise code intelligence -- **Improved scalability**: Cleaner architecture supports larger codebases - -#### 🔄 Migration Guide -Existing users can upgrade seamlessly: -1. System automatically detects legacy index format -2. Migrates to new SCIP format on first run -3. All existing functionality preserved -4. No manual intervention required - -This release establishes a solid foundation for future enhancements while dramatically simplifying the codebase and improving performance. - -## [1.2.1] - 2024-08-06 - -### Fixed -- **File Watcher**: Enhanced move event handling for modern editors (VS Code, etc.) 
- - Fixed issue where files created via temp-then-move pattern weren't being detected - - Improved event processing logic to exclusively check destination path for move events - - Eliminated ambiguous fallback behavior that could cause inconsistent results - -### Improved -- **Code Quality**: Comprehensive Pylint compliance improvements - - Fixed all f-string logging warnings using lazy % formatting - - Added proper docstrings to fallback classes - - Fixed multiple-statements warnings - - Moved imports to top-level following PEP 8 conventions - - Added appropriate pylint disables for stub methods - -### Technical Details -- Unified path checking logic across all event types -- Reduced code complexity in `should_process_event()` method -- Better error handling with consistent exception management -- Enhanced debugging capabilities with improved logging - -## [1.2.0] - Previous Release - -### Added -- Enhanced find_files functionality with filename search -- Performance improvements to file discovery -- Auto-refresh troubleshooting documentation - -## [1.1.1] - Previous Release - -### Fixed -- Various bug fixes and stability improvements - -## [1.1.0] - Previous Release - -### Added -- Initial file watcher functionality -- Cross-platform file system monitoring - -## [1.0.0] - Initial Release - -### Added -- Core MCP server implementation -- Code indexing and analysis capabilities -- Multi-language support \ No newline at end of file diff --git a/README.md b/README.md index e893f5b..5cabcbe 100644 --- a/README.md +++ b/README.md @@ -44,7 +44,7 @@ The easiest way to get started with any MCP-compatible application: 2. **Restart your application** – `uvx` automatically handles installation and execution -3. **Start using**: +3. 
**Start using** (give these prompts to your AI assistant): ``` Set the project path to /Users/dev/my-react-app Find all TypeScript files in this project @@ -62,13 +62,16 @@ The easiest way to get started with any MCP-compatible application: ## Key Features ### 🔍 **Intelligent Search & Analysis** -- **SCIP-Powered**: Industry-standard code intelligence format used by major IDEs +- **Dual-Strategy Architecture**: Specialized tree-sitter parsing for 7 core languages, fallback strategy for 50+ file types +- **Direct Tree-sitter Integration**: No regex fallbacks for specialized languages - fail fast with clear errors - **Advanced Search**: Auto-detects and uses the best available tool (ugrep, ripgrep, ag, or grep) -- **Universal Understanding**: Single system comprehends all programming languages -- **File Analysis**: Deep insights into structure, imports, classes, methods, and complexity metrics +- **Universal File Support**: Comprehensive coverage from advanced AST parsing to basic file indexing +- **File Analysis**: Deep insights into structure, imports, classes, methods, and complexity metrics after running `build_deep_index` ### 🗂️ **Multi-Language Support** -- **50+ File Types**: Java, Python, JavaScript/TypeScript, C/C++, Go, Rust, C#, Swift, Kotlin, Ruby, PHP, and more +- **7 Languages with Tree-sitter AST Parsing**: Python, JavaScript, TypeScript, Java, Go, Objective-C, Zig +- **50+ File Types with Fallback Strategy**: C/C++, Rust, Ruby, PHP, and all other programming languages +- **Document & Config Files**: Markdown, JSON, YAML, XML with appropriate handling - **Web Frontend**: Vue, React, Svelte, HTML, CSS, SCSS - **Database**: SQL variants, NoSQL, stored procedures, migrations - **Configuration**: JSON, YAML, XML, Markdown @@ -78,39 +81,35 @@ The easiest way to get started with any MCP-compatible application: - **File Watcher**: Automatic index updates when files change - **Cross-platform**: Native OS file system monitoring - **Smart Processing**: Batches 
rapid changes to prevent excessive rebuilds -- **Rich Metadata**: Captures symbols, references, definitions, and relationships +- **Shallow Index Refresh**: Watches file changes and keeps the file list current; run a deep rebuild when you need symbol metadata ### ⚡ **Performance & Efficiency** -- **SCIP Indexing**: Fast protobuf-based unified indexing system +- **Tree-sitter AST Parsing**: Native syntax parsing for accurate symbol extraction - **Persistent Caching**: Stores indexes for lightning-fast subsequent access - **Smart Filtering**: Intelligent exclusion of build directories and temporary files - **Memory Efficient**: Optimized for large codebases +- **Direct Dependencies**: No fallback mechanisms - fail fast with clear error messages ## Supported File Types
📁 Programming Languages (Click to expand) -**System & Low-Level:** -- C/C++ (`.c`, `.cpp`, `.h`, `.hpp`) -- Rust (`.rs`) -- Zig (`.zig`, `.zon`) -- Go (`.go`) - -**Object-Oriented:** -- Java (`.java`) -- C# (`.cs`) -- Kotlin (`.kt`) -- Scala (`.scala`) -- Objective-C/C++ (`.m`, `.mm`) -- Swift (`.swift`) - -**Scripting & Dynamic:** -- Python (`.py`) -- JavaScript/TypeScript (`.js`, `.ts`, `.jsx`, `.tsx`, `.mjs`, `.cjs`) -- Ruby (`.rb`) -- PHP (`.php`) -- Shell (`.sh`, `.bash`) +**Languages with Specialized Tree-sitter Strategies:** +- **Python** (`.py`, `.pyw`) - Full AST analysis with class/method extraction and call tracking +- **JavaScript** (`.js`, `.jsx`, `.mjs`, `.cjs`) - ES6+ class and function parsing with tree-sitter +- **TypeScript** (`.ts`, `.tsx`) - Complete type-aware symbol extraction with interfaces +- **Java** (`.java`) - Full class hierarchy, method signatures, and call relationships +- **Go** (`.go`) - Struct methods, receiver types, and function analysis +- **Objective-C** (`.m`, `.mm`) - Class/instance method distinction with +/- notation +- **Zig** (`.zig`, `.zon`) - Function and struct parsing with tree-sitter AST + +**All Other Programming Languages:** +All other programming languages use the **FallbackParsingStrategy** which provides basic file indexing and metadata extraction. This includes: +- **System & Low-Level:** C/C++ (`.c`, `.cpp`, `.h`, `.hpp`), Rust (`.rs`) +- **Object-Oriented:** C# (`.cs`), Kotlin (`.kt`), Scala (`.scala`), Swift (`.swift`) +- **Scripting & Dynamic:** Ruby (`.rb`), PHP (`.php`), Shell (`.sh`, `.bash`) +- **And 40+ more file types** - All handled through the fallback strategy for basic indexing
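The dual-strategy split described above — seven languages with dedicated tree-sitter strategies, everything else through the fallback — boils down to extension-based dispatch. A hypothetical sketch (the names `TREE_SITTER_EXTENSIONS` and `pick_strategy` are illustrative, not the project's actual code; the extension table mirrors the lists in this README):

```python
import os

# Extensions handled by specialized tree-sitter strategies, per the README.
TREE_SITTER_EXTENSIONS = {
    ".py": "python", ".pyw": "python",
    ".js": "javascript", ".jsx": "javascript", ".mjs": "javascript", ".cjs": "javascript",
    ".ts": "typescript", ".tsx": "typescript",
    ".java": "java",
    ".go": "go",
    ".m": "objective-c", ".mm": "objective-c",
    ".zig": "zig", ".zon": "zig",
}

def pick_strategy(filename: str) -> str:
    """Map a filename to a strategy name; unknown extensions get the fallback."""
    ext = os.path.splitext(filename)[1].lower()
    return TREE_SITTER_EXTENSIONS.get(ext, "fallback")
```

Anything outside the table — Rust, C/C++, shell scripts, extensionless files — would land in the fallback strategy for basic indexing rather than AST-level analysis.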
@@ -212,21 +211,25 @@ Then configure: + ## Available Tools ### 🏗️ **Project Management** | Tool | Description | |------|-------------| | **`set_project_path`** | Initialize indexing for a project directory | -| **`refresh_index`** | Rebuild the project index after file changes | +| **`refresh_index`** | Rebuild the shallow file index after file changes | +| **`build_deep_index`** | Generate the full symbol index used by deep analysis | | **`get_settings_info`** | View current project configuration and status | +*Run `build_deep_index` when you need symbol-level data; the default shallow index powers quick file discovery.* + ### 🔍 **Search & Discovery** | Tool | Description | |------|-------------| | **`search_code_advanced`** | Smart search with regex, fuzzy matching, and file filtering | | **`find_files`** | Locate files using glob patterns (e.g., `**/*.py`) | -| **`get_file_summary`** | Analyze file structure, functions, imports, and complexity | +| **`get_file_summary`** | Analyze file structure, functions, imports, and complexity (requires deep index) | ### 🔄 **Monitoring & Auto-refresh** | Tool | Description | @@ -263,6 +266,7 @@ Find all TypeScript component files in src/components Give me a summary of src/api/userService.ts ``` *Uses: `get_file_summary` to show functions, imports, and complexity* +*Tip: run `build_deep_index` first if you get a `needs_deep_index` response.* ### 🔍 **Advanced Search Examples** diff --git a/README_ja.md b/README_ja.md index 2d33bde..79059b1 100644 --- a/README_ja.md +++ b/README_ja.md @@ -44,7 +44,7 @@ Code Index MCPは、AIモデルと複雑なコードベースの橋渡しをす 2. **アプリケーションを再起動** – `uvx`がインストールと実行を自動処理 -3. **使用開始**: +3. 
**使用開始**(AIアシスタントにこれらのプロンプトを与える): ``` プロジェクトパスを/Users/dev/my-react-appに設定 このプロジェクトのすべてのTypeScriptファイルを検索 @@ -62,13 +62,16 @@ Code Index MCPは、AIモデルと複雑なコードベースの橋渡しをす ## 主な機能 ### 🔍 **インテリジェント検索・解析** -- **SCIPパワー**:主要IDEで使用される業界標準コードインテリジェンスフォーマット +- **二重戦略アーキテクチャ**:7つのコア言語に特化したTree-sitter解析、50+ファイルタイプにフォールバック戦略 +- **直接Tree-sitter統合**:特化言語で正規表現フォールバックなし - 明確なエラーメッセージで高速フェイル - **高度な検索**:最適なツール(ugrep、ripgrep、ag、grep)を自動検出・使用 -- **汎用理解**:単一システムですべてのプログラミング言語を理解 -- **ファイル解析**:構造、インポート、クラス、メソッド、複雑度メトリクスへの深い洞察 +- **汎用ファイルサポート**:高度なAST解析から基本ファイルインデックスまでの包括的カバレッジ +- **ファイル解析**:`build_deep_index` 実行後に構造、インポート、クラス、メソッド、複雑度メトリクスを深く把握 ### 🗂️ **多言語サポート** -- **50+ファイルタイプ**:Java、Python、JavaScript/TypeScript、C/C++、Go、Rust、C#、Swift、Kotlin、Ruby、PHPなど +- **7言語でTree-sitter AST解析**:Python、JavaScript、TypeScript、Java、Go、Objective-C、Zig +- **50+ファイルタイプでフォールバック戦略**:C/C++、Rust、Ruby、PHPおよびすべての他のプログラミング言語 +- **文書・設定ファイル**:Markdown、JSON、YAML、XML適切な処理 - **Webフロントエンド**:Vue、React、Svelte、HTML、CSS、SCSS - **データベース**:SQLバリアント、NoSQL、ストアドプロシージャ、マイグレーション - **設定ファイル**:JSON、YAML、XML、Markdown @@ -78,39 +81,35 @@ Code Index MCPは、AIモデルと複雑なコードベースの橋渡しをす - **ファイルウォッチャー**:ファイル変更時の自動インデックス更新 - **クロスプラットフォーム**:ネイティブOSファイルシステム監視 - **スマート処理**:急速な変更をバッチ処理して過度な再構築を防止 -- **豊富なメタデータ**:シンボル、参照、定義、関連性をキャプチャ +- **浅いインデックス更新**:ファイル変更を監視して最新のファイル一覧を維持し、シンボルが必要な場合は `build_deep_index` を実行 ### ⚡ **パフォーマンス・効率性** -- **スマートインデックス作成**:ビルドディレクトリをインテリジェントにフィルタリングしながら再帰的スキャン +- **Tree-sitter AST解析**:正確なシンボル抽出のためのネイティブ構文解析 - **永続キャッシュ**:超高速な後続アクセスのためのインデックス保存 -- **遅延ロード**:最適化された起動のため必要時のみツール検出 -- **メモリ効率**:大規模コードベース向けのインテリジェントキャッシュ戦略 +- **スマートフィルタリング**:ビルドディレクトリと一時ファイルのインテリジェント除外 +- **メモリ効率**:大規模コードベース向けに最適化 +- **直接依存関係**:フォールバック機構なし - 明確なエラーメッセージで高速フェイル ## サポートされているファイルタイプ
📁 プログラミング言語(クリックで展開) -**システム・低レベル言語:** -- C/C++ (`.c`, `.cpp`, `.h`, `.hpp`) -- Rust (`.rs`) -- Zig (`.zig`) -- Go (`.go`) - -**オブジェクト指向言語:** -- Java (`.java`) -- C# (`.cs`) -- Kotlin (`.kt`) -- Scala (`.scala`) -- Objective-C/C++ (`.m`, `.mm`) -- Swift (`.swift`) - -**スクリプト・動的言語:** -- Python (`.py`) -- JavaScript/TypeScript (`.js`, `.ts`, `.jsx`, `.tsx`, `.mjs`, `.cjs`) -- Ruby (`.rb`) -- PHP (`.php`) -- Shell (`.sh`, `.bash`) +**特化Tree-sitter戦略言語:** +- **Python** (`.py`, `.pyw`) - クラス/メソッド抽出と呼び出し追跡を含む完全AST解析 +- **JavaScript** (`.js`, `.jsx`, `.mjs`, `.cjs`) - Tree-sitterを使用したES6+クラスと関数解析 +- **TypeScript** (`.ts`, `.tsx`) - インターフェースを含む完全な型認識シンボル抽出 +- **Java** (`.java`) - 完全なクラス階層、メソッドシグネチャ、呼び出し関係 +- **Go** (`.go`) - 構造体メソッド、レシーバータイプ、関数解析 +- **Objective-C** (`.m`, `.mm`) - +/-記法を使用したクラス/インスタンスメソッド区別 +- **Zig** (`.zig`, `.zon`) - Tree-sitter ASTを使用した関数と構造体解析 + +**すべての他のプログラミング言語:** +すべての他のプログラミング言語は**フォールバック解析戦略**を使用し、基本ファイルインデックスとメタデータ抽出を提供します。これには以下が含まれます: +- **システム・低レベル言語:** C/C++ (`.c`, `.cpp`, `.h`, `.hpp`)、Rust (`.rs`) +- **オブジェクト指向言語:** C# (`.cs`)、Kotlin (`.kt`)、Scala (`.scala`)、Swift (`.swift`) +- **スクリプト・動的言語:** Ruby (`.rb`)、PHP (`.php`)、Shell (`.sh`, `.bash`) +- **および40+ファイルタイプ** - すべてフォールバック戦略による基本インデックス処理
@@ -234,21 +233,25 @@ pip install code-index-mcp + ## 利用可能なツール ### 🏗️ **プロジェクト管理** | ツール | 説明 | |--------|------| | **`set_project_path`** | プロジェクトディレクトリのインデックス作成を初期化 | -| **`refresh_index`** | ファイル変更後にプロジェクトインデックスを再構築 | +| **`refresh_index`** | ファイル変更後に浅いファイルインデックスを再構築 | +| **`build_deep_index`** | 深い解析で使う完全なシンボルインデックスを生成 | | **`get_settings_info`** | 現在のプロジェクト設定と状態を表示 | +*シンボルレベルのデータが必要な場合は `build_deep_index` を実行してください。デフォルトの浅いインデックスは高速なファイル探索を担います。* + ### 🔍 **検索・発見** | ツール | 説明 | |--------|------| | **`search_code_advanced`** | 正規表現、ファジーマッチング、ファイルフィルタリング対応のスマート検索 | | **`find_files`** | globパターンを使用したファイル検索(例:`**/*.py`) | -| **`get_file_summary`** | ファイル構造、関数、インポート、複雑度の解析 | +| **`get_file_summary`** | ファイル構造、関数、インポート、複雑度の解析(深いインデックスが必要) | ### 🔄 **監視・自動更新** | ツール | 説明 | @@ -285,6 +288,7 @@ src/components で全てのTypeScriptコンポーネントファイルを見つ src/api/userService.ts の要約を教えてください ``` *使用ツール:`get_file_summary` で関数、インポート、複雑度を表示* +*ヒント:`needs_deep_index` が返った場合は `build_deep_index` を先に実行してください。* ### 🔍 **高度な検索例** diff --git a/README_ko.md b/README_ko.md new file mode 100644 index 0000000..6995b6a --- /dev/null +++ b/README_ko.md @@ -0,0 +1,284 @@ +# 코드 인덱스 MCP + +
+ +[![MCP Server](https://img.shields.io/badge/MCP-Server-blue)](https://modelcontextprotocol.io) +[![Python](https://img.shields.io/badge/Python-3.10%2B-green)](https://www.python.org/) +[![License](https://img.shields.io/badge/License-MIT-yellow)](LICENSE) + +**대규모 언어 모델을 위한 지능형 코드 인덱싱과 분석** + +고급 검색, 정밀 분석, 유연한 탐색 기능으로 AI가 코드베이스를 이해하고 활용하는 방식을 혁신하세요. + +
+ + + code-index-mcp MCP server + + +## 개요 + +Code Index MCP는 [Model Context Protocol](https://modelcontextprotocol.io) 기반 MCP 서버로, AI 어시스턴트와 복잡한 코드베이스 사이를 연결합니다. 빠른 인덱싱, 강력한 검색, 정밀한 코드 분석을 제공하여 AI가 프로젝트 구조를 정확히 파악하고 효과적으로 지원하도록 돕습니다. + +**이럴 때 안성맞춤:** 코드 리뷰, 리팩터링, 문서화, 디버깅 지원, 아키텍처 분석 + +## 빠른 시작 + +### 🚀 **권장 설정 (대부분의 사용자)** + +어떤 MCP 호환 애플리케이션에서도 몇 단계만으로 시작할 수 있습니다. + +**사전 준비:** Python 3.10+ 및 [uv](https://github.com/astral-sh/uv) + +1. **MCP 설정에 서버 추가** (예: `claude_desktop_config.json` 또는 `~/.claude.json`) + ```json + { + "mcpServers": { + "code-index": { + "command": "uvx", + "args": ["code-index-mcp"] + } + } + } + ``` + +2. **애플리케이션 재시작** – `uvx`가 설치와 실행을 자동으로 처리합니다. + +3. **사용 시작** (AI 어시스턴트에게 아래 프롬프트를 전달) + ``` + 프로젝트 경로를 /Users/dev/my-react-app 으로 설정해줘 + 이 프로젝트에서 모든 TypeScript 파일을 찾아줘 + "authentication" 관련 함수를 검색해줘 + src/App.tsx 파일을 분석해줘 + ``` + +## 대표 사용 사례 + +**코드 리뷰:** "예전 API를 사용하는 부분을 모두 찾아줘" +**리팩터링 지원:** "이 함수는 어디에서 호출되나요?" +**프로젝트 학습:** "이 React 프로젝트의 핵심 컴포넌트를 보여줘" +**디버깅:** "에러 처리 로직이 있는 파일을 찾아줘" + +## 주요 기능 + +### 🧠 **지능형 검색과 분석** +- **듀얼 전략 아키텍처:** 7개 핵심 언어는 전용 tree-sitter 파서를 사용하고, 그 외 50+ 파일 형식은 폴백 전략으로 처리 +- **직접 Tree-sitter 통합:** 특화 언어에 정규식 폴백 없음 – 문제 시 즉시 실패하고 명확한 오류 메시지 제공 +- **고급 검색:** ugrep, ripgrep, ag, grep 중 최적의 도구를 자동 선택해 활용 +- **범용 파일 지원:** 정교한 AST 분석부터 기본 파일 인덱싱까지 폭넓게 커버 +- **파일 분석:** `build_deep_index` 실행 후 구조, 임포트, 클래스, 메서드, 복잡도 지표를 심층적으로 파악 + +### 🗂️ **다중 언어 지원** +- **Tree-sitter AST 분석(7종):** Python, JavaScript, TypeScript, Java, Go, Objective-C, Zig +- **폴백 전략(50+ 형식):** C/C++, Rust, Ruby, PHP 등 대부분의 프로그래밍 언어 지원 +- **문서 및 설정 파일:** Markdown, JSON, YAML, XML 등 상황에 맞는 처리 +- **웹 프론트엔드:** Vue, React, Svelte, HTML, CSS, SCSS +- **데이터 계층:** SQL, NoSQL, 스토어드 프로시저, 마이그레이션 스크립트 +- **구성 파일:** JSON, YAML, XML, Markdown +- **[지원 파일 전체 목록 보기](#지원-파일-형식)** + +### 🔄 **실시간 모니터링 & 자동 새로고침** +- **파일 워처:** 파일 변경 시 자동으로 얕은 인덱스(파일 목록) 갱신 +- **크로스 플랫폼:** 운영체제 기본 파일시스템 이벤트 활용 +- **스마트 처리:** 빠른 변경을 묶어 과도한 재빌드를 방지 +- **얕은 인덱스 갱신:** 파일 목록을 최신 상태로 
유지하며, 심볼 데이터가 필요하면 `build_deep_index`를 실행 + +### ⚡ **성능 & 효율성** +- **Tree-sitter AST 파싱:** 정확한 심볼 추출을 위한 네이티브 구문 분석 +- **지속 캐싱:** 인덱스를 저장해 이후 응답 속도를 극대화 +- **스마트 필터링:** 빌드 디렉터리·임시 파일을 자동 제외 +- **메모리 효율:** 대규모 코드베이스를 염두에 둔 설계 +- **직접 의존성:** 불필요한 폴백 없이 명확한 오류 메시지 제공 + +## 지원 파일 형식 + +
+💻 프로그래밍 언어 (클릭하여 확장) + +**전용 Tree-sitter 전략 언어:** +- **Python** (`.py`, `.pyw`) – 클래스/메서드 추출 및 호출 추적이 포함된 완전 AST 분석 +- **JavaScript** (`.js`, `.jsx`, `.mjs`, `.cjs`) – ES6+ 클래스와 함수를 tree-sitter로 파싱 +- **TypeScript** (`.ts`, `.tsx`) – 인터페이스를 포함한 타입 인지 심볼 추출 +- **Java** (`.java`) – 클래스 계층, 메서드 시그니처, 호출 관계 분석 +- **Go** (`.go`) – 구조체 메서드, 리시버 타입, 함수 분석 +- **Objective-C** (`.m`, `.mm`) – 클래스/인스턴스 메서드를 +/- 표기로 구분 +- **Zig** (`.zig`, `.zon`) – 함수와 구조체를 tree-sitter AST로 분석 + +**기타 모든 프로그래밍 언어:** +나머지 언어는 **폴백 파싱 전략**으로 기본 메타데이터와 파일 인덱싱을 제공합니다. 예: +- **시스템/저수준:** C/C++ (`.c`, `.cpp`, `.h`, `.hpp`), Rust (`.rs`) +- **객체지향:** C# (`.cs`), Kotlin (`.kt`), Scala (`.scala`), Swift (`.swift`) +- **스크립트:** Ruby (`.rb`), PHP (`.php`), Shell (`.sh`, `.bash`) +- **그 외 40+ 형식** – 폴백 전략으로 빠른 탐색 가능 + +
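
위 듀얼 전략(전용 tree-sitter vs. 폴백) 선택은 확장자 기반 분기로 요약할 수 있습니다. 아래는 실제 `StrategyFactory` 구현을 크게 단순화한 설명용 스케치이며, `choose_strategy` 함수명과 확장자 집합 구성은 예시용 가정입니다:

```python
# 설명용 스케치: 실제 StrategyFactory를 단순화한 가정 기반 예시입니다.
# 전용 tree-sitter 전략이 있는 확장자 집합 (위 7개 언어 기준)
SPECIALIZED_EXTENSIONS = {
    ".py", ".pyw",                  # Python
    ".js", ".jsx", ".mjs", ".cjs",  # JavaScript
    ".ts", ".tsx",                  # TypeScript
    ".java", ".go",                 # Java, Go
    ".m", ".mm",                    # Objective-C
    ".zig", ".zon",                 # Zig
}

def choose_strategy(extension: str) -> str:
    """확장자에 따라 'tree-sitter' 전략 또는 'fallback' 전략을 고릅니다."""
    return "tree-sitter" if extension.lower() in SPECIALIZED_EXTENSIONS else "fallback"
```

전용 전략이 없는 확장자는 모두 폴백 전략으로 흘러가므로, 새 언어를 추가해도 기본 인덱싱은 항상 동작합니다.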
+ +
+🌐 웹 프론트엔드 & UI + +- 프레임워크: Vue (`.vue`), Svelte (`.svelte`), Astro (`.astro`) +- 스타일링: CSS (`.css`, `.scss`, `.less`, `.sass`, `.stylus`, `.styl`), HTML (`.html`) +- 템플릿: Handlebars (`.hbs`, `.handlebars`), EJS (`.ejs`), Pug (`.pug`) + +
+ +
+🗄️ 데이터 계층 & SQL + +- **SQL 변형:** 표준 SQL (`.sql`, `.ddl`, `.dml`), 데이터베이스별 방언 (`.mysql`, `.postgresql`, `.psql`, `.sqlite`, `.mssql`, `.oracle`, `.ora`, `.db2`) +- **DB 객체:** 프로시저/함수 (`.proc`, `.procedure`, `.func`, `.function`), 뷰/트리거/인덱스 (`.view`, `.trigger`, `.index`) +- **마이그레이션 도구:** 마이그레이션 파일 (`.migration`, `.seed`, `.fixture`, `.schema`), 도구 구성 (`.liquibase`, `.flyway`) +- **NoSQL & 그래프:** 질의 언어 (`.cql`, `.cypher`, `.sparql`, `.gql`) + +
+ +
+📄 문서 & 설정 파일 + +- Markdown (`.md`, `.mdx`) +- 구성 파일 (`.json`, `.xml`, `.yml`, `.yaml`) + +
+ +## 사용 가능한 도구 + +### 🏗️ **프로젝트 관리** +| 도구 | 설명 | +|------|------| +| **`set_project_path`** | 프로젝트 디렉터리의 인덱스를 초기화 | +| **`refresh_index`** | 파일 변경 후 얕은 파일 인덱스를 재생성 | +| **`build_deep_index`** | 심층 분석에 사용하는 전체 심볼 인덱스를 생성 | +| **`get_settings_info`** | 현재 프로젝트 설정과 상태를 확인 | + +*심볼 레벨 데이터가 필요하면 `build_deep_index`를 실행하세요. 기본 얕은 인덱스는 빠른 파일 탐색을 담당합니다.* + +### 🔍 **검색 & 탐색** +| 도구 | 설명 | +|------|------| +| **`search_code_advanced`** | 정규식, 퍼지 매칭, 파일 필터링을 지원하는 스마트 검색 | +| **`find_files`** | 글롭 패턴으로 파일 찾기 (예: `**/*.py`) | +| **`get_file_summary`** | 파일 구조, 함수, 임포트, 복잡도를 분석 (심층 인덱스 필요) | + +### 🔄 **모니터링 & 자동 새로고침** +| 도구 | 설명 | +|------|------| +| **`get_file_watcher_status`** | 파일 워처 상태와 구성을 확인 | +| **`configure_file_watcher`** | 자동 새로고침 설정 (활성/비활성, 지연 시간, 추가 제외 패턴) | + +### 🛠️ **시스템 & 유지 관리** +| 도구 | 설명 | +|------|------| +| **`create_temp_directory`** | 인덱스 저장용 임시 디렉터리를 생성 | +| **`check_temp_directory`** | 인덱스 저장 위치와 권한을 확인 | +| **`clear_settings`** | 모든 설정과 캐시 데이터를 초기화 | +| **`refresh_search_tools`** | 사용 가능한 검색 도구를 재검색 (ugrep, ripgrep 등) | + +## 사용 예시 + +### 🧭 **빠른 시작 워크플로** + +**1. 프로젝트 초기화** +``` +프로젝트 경로를 /Users/dev/my-react-app 으로 설정해줘 +``` +*프로젝트를 설정하고 얕은 인덱스를 생성합니다.* + +**2. 프로젝트 구조 탐색** +``` +src/components 안의 TypeScript 컴포넌트 파일을 모두 찾아줘 +``` +*사용 도구: `find_files` (`src/components/**/*.tsx`)* + +**3. 핵심 파일 분석** +``` +src/api/userService.ts 요약을 알려줘 +``` +*사용 도구: `get_file_summary` (함수, 임포트, 복잡도 표시)* +*팁: `needs_deep_index` 응답이 나오면 먼저 `build_deep_index`를 실행하세요.* + +### 🔍 **고급 검색 예시** + +
+코드 패턴 검색 + +``` +"get.*Data"에 해당하는 함수 호출을 정규식으로 찾아줘 +``` +*예: `getData()`, `getUserData()`, `getFormData()`* + +
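
위 예시의 정규식이 실제로 어떤 이름에 매칭되는지는 표준 라이브러리 `re`로 확인할 수 있습니다. 실제 검색은 ugrep/ripgrep 같은 외부 도구가 수행하므로, 아래는 패턴 동작만 보여주는 설명용 스케치입니다:

```python
import re

pattern = re.compile(r"get.*Data")  # 위 예시와 동일한 패턴
candidates = ["getData()", "getUserData()", "getFormData()", "setData()"]
matches = [name for name in candidates if pattern.search(name)]
# matches → ['getData()', 'getUserData()', 'getFormData()']
```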
+ +
+퍼지 함수 검색 + +``` +'authUser'와 유사한 인증 관련 함수를 찾아줘 +``` +*예: `authenticateUser`, `authUserToken`, `userAuthCheck`* + +
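
실제 퍼지 매칭은 ugrep 같은 외부 도구가 담당하지만, "유사한 이름 찾기"의 개념은 표준 라이브러리 `difflib`로도 확인할 수 있습니다. 아래는 설명용 스케치이며, 함수 이름 목록과 `cutoff` 값은 예시용 가정입니다:

```python
import difflib

# 가상의 심볼 이름 목록 (설명용)
functions = ["authenticateUser", "authUserToken", "userAuthCheck", "renderPage"]

# 'authUser'와 충분히 유사한 이름만 유사도 순으로 반환
similar = difflib.get_close_matches("authUser", functions, n=3, cutoff=0.5)
```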
+ +
+언어별 검색 + +``` +Python 파일에서만 "API_ENDPOINT" 를 찾아줘 +``` +*`search_code_advanced` + `file_pattern="*.py"`* + +
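
`file_pattern` 필터가 하는 일은 파일명에 대한 글롭 매칭으로 요약할 수 있습니다. 아래는 `fnmatch`를 이용한 설명용 스케치이며, 파일 목록은 예시용 가정입니다:

```python
import fnmatch

files = ["src/server.py", "src/app.ts", "tests/test_index.py", "README.md"]

# "*.py" 패턴을 파일명(경로 마지막 요소)에만 적용
py_only = [f for f in files if fnmatch.fnmatch(f.split("/")[-1], "*.py")]
# py_only → ['src/server.py', 'tests/test_index.py']
```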
+ +
+자동 새로고침 설정 + +``` +파일 변경 시 자동으로 인덱스를 새로고침하도록 설정해줘 +``` +*`configure_file_watcher`로 활성화 및 지연 시간 설정* + +
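
위에서 설명한 "빠른 변경을 묶어 과도한 재빌드 방지"는 디바운스 패턴으로 스케치할 수 있습니다. 아래는 실제 파일 워처 구현이 아니라 개념을 보여주는 가정 기반 예시이며, 클래스명 `DebouncedRebuild`는 설명용입니다:

```python
import threading
from typing import Callable, Optional

class DebouncedRebuild:
    """빠르게 이어지는 파일 변경 이벤트를 하나의 재빌드로 묶는 디바운스 스케치."""

    def __init__(self, delay_seconds: float, rebuild: Callable[[], None]) -> None:
        self._delay = delay_seconds
        self._rebuild = rebuild
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()

    def notify_change(self) -> None:
        """변경 이벤트마다 호출: 기존 예약을 취소하고 재빌드를 다시 예약합니다."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # 아직 실행되지 않은 예약을 취소해 이벤트를 병합
            self._timer = threading.Timer(self._delay, self._rebuild)
            self._timer.daemon = True
            self._timer.start()
```

연속된 저장 이벤트가 몰려도 마지막 이벤트 이후 `delay_seconds`가 지나야 재빌드가 한 번만 실행됩니다.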
+ +
+프로젝트 유지 관리 + +``` +새 컴포넌트를 추가했어. 프로젝트 인덱스를 다시 빌드해줘 +``` +*`refresh_index`로 빠르게 얕은 인덱스를 업데이트* + +
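
얕은 인덱스는 파일 내용을 읽지 않고 정규화된 상대 경로 목록만 유지합니다. 아래는 이 변경에 포함된 `build_shallow_file_list`가 수행하는 경로 정규화를 단순화한 설명용 스케치입니다(함수명 `to_shallow_entry`는 예시용):

```python
import os

def to_shallow_entry(abs_path: str, project_path: str) -> str:
    """절대 경로를 얕은 인덱스용 상대 경로('/' 구분, './' 제거)로 바꿉니다."""
    rel = os.path.relpath(abs_path, project_path).replace("\\", "/")
    return rel[2:] if rel.startswith("./") else rel
```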
+ +## 문제 해결 + +### 🔄 **자동 새로고침이 동작하지 않을 때** +- 환경 문제로 `watchdog`가 빠졌다면 설치: `pip install watchdog` +- 수동 새로고침: 변경 후 `refresh_index` 도구 실행 +- 워처 상태 확인: `get_file_watcher_status` 도구로 활성 여부 점검 + +## 개발 & 기여 + +### 🛠️ **소스에서 실행하기** +```bash +git clone https://github.com/johnhuang316/code-index-mcp.git +cd code-index-mcp +uv sync +uv run code-index-mcp +``` + +### 🧪 **디버깅 도구** +```bash +npx @modelcontextprotocol/inspector uvx code-index-mcp +``` + +### 🤝 **기여 안내** +Pull Request를 언제든 환영합니다. 변경 사항과 테스트 방법을 함께 공유해주세요. + +--- + +### 📄 **라이선스** +[MIT License](LICENSE) + +### 🌍 **번역본** +- [English](README.md) +- [繁體中文](README_zh.md) +- [日本語](README_ja.md) diff --git a/README_zh.md b/README_zh.md index 1700e89..1e9c5ae 100644 --- a/README_zh.md +++ b/README_zh.md @@ -44,7 +44,7 @@ 2. **重新啟動應用程式** – `uvx` 會自動處理安裝和執行 -3. **開始使用**: +3. **開始使用**(向您的 AI 助理提供這些提示): ``` 設定專案路徑為 /Users/dev/my-react-app 在這個專案中找到所有 TypeScript 檔案 @@ -62,13 +62,16 @@ ## 主要特性 ### 🔍 **智慧搜尋與分析** -- **SCIP 驅動**:業界標準程式碼智能格式,被主流 IDE 採用 +- **雙策略架構**:7 種核心語言使用專業化 Tree-sitter 解析,50+ 種檔案類型使用備用策略 +- **直接 Tree-sitter 整合**:專業化語言無正則表達式備用 - 快速失敗並提供清晰錯誤訊息 - **進階搜尋**:自動偵測並使用最佳工具(ugrep、ripgrep、ag 或 grep) -- **通用理解**:單一系統理解所有程式語言 -- **檔案分析**:深入了解結構、匯入、類別、方法和複雜度指標 +- **通用檔案支援**:從進階 AST 解析到基本檔案索引的全面覆蓋 +- **檔案分析**:執行 `build_deep_index` 後深入了解結構、匯入、類別、方法和複雜度指標 ### 🗂️ **多語言支援** -- **50+ 種檔案類型**:Java、Python、JavaScript/TypeScript、C/C++、Go、Rust、C#、Swift、Kotlin、Ruby、PHP 等 +- **7 種語言使用 Tree-sitter AST 解析**:Python、JavaScript、TypeScript、Java、Go、Objective-C、Zig +- **50+ 種檔案類型使用備用策略**:C/C++、Rust、Ruby、PHP 和所有其他程式語言 +- **文件與配置檔案**:Markdown、JSON、YAML、XML 適當處理 - **網頁前端**:Vue、React、Svelte、HTML、CSS、SCSS - **資料庫**:SQL 變體、NoSQL、存儲過程、遷移腳本 - **配置檔案**:JSON、YAML、XML、Markdown @@ -78,39 +81,35 @@ - **檔案監控器**:檔案變更時自動更新索引 - **跨平台**:原生作業系統檔案系統監控 - **智慧處理**:批次處理快速變更以防止過度重建 -- **豐富元資料**:捕獲符號、引用、定義和關聯性 +- **淺層索引更新**:監控檔案變更並維持檔案清單最新;需要符號資料時請執行 `build_deep_index` ### ⚡ **效能與效率** -- **智慧索引**:遞迴掃描並智慧篩選建構目錄 +- **Tree-sitter AST 解析**:原生語法解析以實現準確的符號提取 - 
**持久快取**:儲存索引以實現超快速的後續存取 -- **延遲載入**:僅在需要時偵測工具以優化啟動速度 -- **記憶體高效**:針對大型程式碼庫的智慧快取策略 +- **智慧篩選**:智能排除建構目錄和暫存檔案 +- **記憶體高效**:針對大型程式碼庫優化 +- **直接依賴**:無備用機制 - 快速失敗並提供清晰錯誤訊息 ## 支援的檔案類型
📁 程式語言(點擊展開) -**系統與低階語言:** -- C/C++ (`.c`, `.cpp`, `.h`, `.hpp`) -- Rust (`.rs`) -- Zig (`.zig`) -- Go (`.go`) - -**物件導向語言:** -- Java (`.java`) -- C# (`.cs`) -- Kotlin (`.kt`) -- Scala (`.scala`) -- Objective-C/C++ (`.m`, `.mm`) -- Swift (`.swift`) - -**腳本與動態語言:** -- Python (`.py`) -- JavaScript/TypeScript (`.js`, `.ts`, `.jsx`, `.tsx`, `.mjs`, `.cjs`) -- Ruby (`.rb`) -- PHP (`.php`) -- Shell (`.sh`, `.bash`) +**專業化 Tree-sitter 策略語言:** +- **Python** (`.py`, `.pyw`) - 完整 AST 分析,包含類別/方法提取和呼叫追蹤 +- **JavaScript** (`.js`, `.jsx`, `.mjs`, `.cjs`) - ES6+ 類別和函數解析使用 Tree-sitter +- **TypeScript** (`.ts`, `.tsx`) - 完整類型感知符號提取,包含介面 +- **Java** (`.java`) - 完整類別階層、方法簽名和呼叫關係 +- **Go** (`.go`) - 結構方法、接收者類型和函數分析 +- **Objective-C** (`.m`, `.mm`) - 類別/實例方法區分,使用 +/- 標記法 +- **Zig** (`.zig`, `.zon`) - 函數和結構解析使用 Tree-sitter AST + +**所有其他程式語言:** +所有其他程式語言使用 **備用解析策略**,提供基本檔案索引和元資料提取。包括: +- **系統與低階語言:** C/C++ (`.c`, `.cpp`, `.h`, `.hpp`)、Rust (`.rs`) +- **物件導向語言:** C# (`.cs`)、Kotlin (`.kt`)、Scala (`.scala`)、Swift (`.swift`) +- **腳本與動態語言:** Ruby (`.rb`)、PHP (`.php`)、Shell (`.sh`, `.bash`) +- **以及 40+ 種檔案類型** - 全部通過備用策略處理進行基本索引
@@ -234,21 +233,25 @@ pip install code-index-mcp + ## 可用工具 ### 🏗️ **專案管理** | 工具 | 描述 | |------|------| | **`set_project_path`** | 為專案目錄初始化索引 | -| **`refresh_index`** | 在檔案變更後重建專案索引 | +| **`refresh_index`** | 在檔案變更後重建淺層檔案索引 | +| **`build_deep_index`** | 產生供深度分析使用的完整符號索引 | | **`get_settings_info`** | 檢視目前專案配置和狀態 | +*需要符號層級資料時,請執行 `build_deep_index`;預設的淺層索引提供快速檔案探索。* + ### 🔍 **搜尋與探索** | 工具 | 描述 | |------|------| | **`search_code_advanced`** | 智慧搜尋,支援正規表達式、模糊匹配和檔案篩選 | | **`find_files`** | 使用萬用字元模式尋找檔案(例如 `**/*.py`) | -| **`get_file_summary`** | 分析檔案結構、函式、匯入和複雜度 | +| **`get_file_summary`** | 分析檔案結構、函式、匯入和複雜度(需要深度索引) | ### 🔄 **監控與自動刷新** | 工具 | 描述 | @@ -285,6 +288,7 @@ pip install code-index-mcp 給我 src/api/userService.ts 的摘要 ``` *使用:`get_file_summary` 顯示函式、匯入和複雜度* +*提示:若收到 `needs_deep_index` 回應,請先執行 `build_deep_index`。* ### 🔍 **進階搜尋範例** diff --git a/RELEASE_NOTE.txt b/RELEASE_NOTE.txt new file mode 100644 index 0000000..8a744bb --- /dev/null +++ b/RELEASE_NOTE.txt @@ -0,0 +1,7 @@ +## 2.4.1 - Search Filtering Alignment + +### Highlights +- Code search now shares the central FileFilter blacklist, keeping results consistent with indexing (no more `node_modules` noise). +- CLI search strategies emit the appropriate exclusion flags automatically (ripgrep, ugrep, ag, grep). +- Basic fallback search prunes excluded directories during traversal, avoiding unnecessary IO. +- Added regression coverage for the new filtering behaviour (`tests/search/test_search_filters.py`). 
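
The traversal pruning described above can be sketched with `os.walk`: excluded directories are removed from the `dirs` list in place, so the walker never descends into them (the same `dirs[:] = ...` pattern the index builder uses). The `EXCLUDED_DIRS` set below is an illustrative subset, not the full FileFilter blacklist:

```python
import os

# Illustrative subset of the exclusion blacklist (assumption for this sketch)
EXCLUDED_DIRS = {".git", "node_modules", "__pycache__", "dist", "build"}

def walk_pruned(root: str):
    """Yield file paths under root, skipping excluded directories entirely."""
    for dirpath, dirnames, filenames in os.walk(root):
        # In-place filtering prunes the walk, avoiding IO on excluded trees
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            yield os.path.join(dirpath, name)
```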
diff --git a/SCIP_OFFICIAL_STANDARDS.md b/SCIP_OFFICIAL_STANDARDS.md deleted file mode 100644 index 763b56c..0000000 --- a/SCIP_OFFICIAL_STANDARDS.md +++ /dev/null @@ -1,337 +0,0 @@ -# SCIP (Source Code Intelligence Protocol) Official Standards - -*This document contains only the official SCIP standards as defined by Sourcegraph, without any project-specific implementations.* - -## Overview - -SCIP (pronounced "skip") is a language-agnostic protocol for indexing source code to power code navigation functionality such as Go to definition, Find references, and Find implementations. It is a recursive acronym that stands for "SCIP Code Intelligence Protocol." - -**Official Repository**: https://github.com/sourcegraph/scip - -## Core Design Principles (Official) - -### Primary Goals -1. **Support code navigation at IDE-level fidelity** - Provide excellent code navigation experience -2. **Make indexer creation easy** by: - - Enabling cross-repository navigation - - Supporting file-level incremental indexing - - Facilitating parallel indexing - - Supporting multi-language indexer development - -### Design Philosophy -> "SCIP is meant to be a transmission format for sending data from some producers to some consumers -- it is not meant as a storage format for querying." - -### Technical Design Decisions -1. **Protobuf Schema** - - Relatively compact binary format - - Supports easy code generation - - Enables streaming reads/writes - - Maintains forward/backward compatibility - -2. **String-based Identifiers** - - Prefer human-readable string IDs for symbols - - Avoid integer ID mapping tables - - Improve debuggability - - Limit potential bug impact - -3. 
**Data Encoding Approach**
-   - Avoid direct graph encoding
-   - Use document and array-based approaches
-   - Enable streaming capabilities
-   - Minimize memory consumption during indexing
-
-### Non-Goals
-- Not focused on code modification tools
-- Not optimizing for consumer-side tooling
-- Not prioritizing uncompressed data compactness
-- Not serving as a standalone query engine
-
-## Protocol Buffer Schema (Official)
-
-### Main Message Types
-
-```protobuf
-syntax = "proto3";
-package scip;
-
-message Index {
-  Metadata metadata = 1;
-  repeated Document documents = 2;
-  repeated SymbolInformation external_symbols = 3;
-}
-
-message Metadata {
-  ProtocolVersion version = 1;
-  ToolInfo tool_info = 2;
-  string project_root = 3;
-  TextEncoding text_encoding = 4;
-}
-
-message Document {
-  string language = 4;
-  string relative_path = 1;
-  repeated Occurrence occurrences = 2;
-  repeated SymbolInformation symbols = 3;
-  string text = 5;
-}
-
-message Symbol {
-  string scheme = 1;
-  Package package = 2;
-  repeated Descriptor descriptors = 3;
-}
-
-message SymbolInformation {
-  string symbol = 1;
-  repeated string documentation = 3;
-  repeated Relationship relationships = 4;
-  SymbolKind kind = 5;
-  string display_name = 6;
-  Signature signature_documentation = 7;
-  repeated string enclosing_symbol = 8;
-}
-
-message Occurrence {
-  Range range = 1;
-  string symbol = 2;
-  int32 symbol_roles = 3;
-  repeated Diagnostic override_documentation = 4;
-  SyntaxKind syntax_kind = 5;
-}
-
-message Range {
-  repeated int32 start = 1; // [line, column]
-  repeated int32 end = 2; // [line, column]
-}
-```
-
-## Official Symbol Format Specification
-
-### Symbol Grammar (Official)
-```
-<symbol> ::= <scheme> ' ' <package> ' ' (<descriptor>)+ | 'local ' <local-id>
-<package> ::= <manager> ' ' <package-name> ' ' <version>
-<scheme> ::= UTF-8 string (escape spaces with double space)
-<descriptor> ::= <namespace> | <type> | <term> | <method> | <type-parameter> | <parameter> | <meta> | <macro>
-```
-
-### Symbol Components
-
-**Scheme**: Identifies the symbol's origin/context
-- UTF-8 string
-- Escape spaces with double space
-
-**Package**: Includes manager, name, and version
-- 
Manager: Package manager identifier -- Package name: Unique package identifier -- Version: Package version - -**Descriptors**: Represent nested/hierarchical symbol structure -- Form a fully qualified name -- Support various symbol types - -**Local Symbols**: Only for entities within a single Document -- Format: `local ` -- Used for file-scoped symbols - -### Encoding Rules (Official) -- Descriptors form a fully qualified name -- Local symbols are only for entities within a single Document -- Symbols must uniquely identify an entity across a package -- Supports escaping special characters in identifiers - -## Enumerations (Official) - -### ProtocolVersion -```protobuf -enum ProtocolVersion { - UnspecifiedProtocolVersion = 0; -} -``` - -### TextEncoding -```protobuf -enum TextEncoding { - UnspecifiedTextEncoding = 0; - UTF8 = 1; - UTF16 = 2; -} -``` - -### SymbolRole -```protobuf -enum SymbolRole { - UnspecifiedSymbolRole = 0; - Definition = 1; - Import = 2; - WriteAccess = 4; - ReadAccess = 8; - Generated = 16; - Test = 32; -} -``` - -### SymbolKind -```protobuf -enum SymbolKind { - UnspecifiedSymbolKind = 0; - Array = 1; - Boolean = 2; - Class = 3; - Constant = 4; - Constructor = 5; - Enum = 6; - EnumMember = 7; - Event = 8; - Field = 9; - File = 10; - Function = 11; - Interface = 12; - Key = 13; - Method = 14; - Module = 15; - Namespace = 16; - Null = 17; - Number = 18; - Object = 19; - Operator = 20; - Package = 21; - Property = 22; - String = 23; - Struct = 24; - TypeParameter = 25; - Variable = 26; - Macro = 27; -} -``` - -### SyntaxKind -```protobuf -enum SyntaxKind { - UnspecifiedSyntaxKind = 0; - Comment = 1; - PunctuationDelimiter = 2; - PunctuationBracket = 3; - Keyword = 4; - // ... 
(additional syntax kinds) - IdentifierKeyword = 13; - IdentifierOperator = 14; - IdentifierBuiltin = 15; - IdentifierNull = 16; - IdentifierConstant = 17; - IdentifierMutableGlobal = 18; - IdentifierParameter = 19; - IdentifierLocal = 20; - IdentifierShadowed = 21; - IdentifierNamespace = 22; - IdentifierFunction = 23; - IdentifierFunctionDefinition = 24; - IdentifierMacro = 25; - IdentifierMacroDefinition = 26; - IdentifierType = 27; - IdentifierBuiltinType = 28; - IdentifierAttribute = 29; -} -``` - -## Official Position and Range Specification - -### Coordinate System -- **Line numbers**: 0-indexed -- **Column numbers**: 0-indexed character positions -- **UTF-8/UTF-16 aware**: Proper Unicode handling - -### Range Format -```protobuf -message Range { - repeated int32 start = 1; // [line, column] - repeated int32 end = 2; // [line, column] -} -``` - -### Requirements -- Start position must be <= end position -- Ranges must be within document boundaries -- Character-level precision required - -## Official Language Support - -### Currently Supported (Official Implementations) -- **TypeScript/JavaScript**: scip-typescript -- **Java**: scip-java (also supports Scala, Kotlin) -- **Python**: In development - -### Language Bindings Available -- **Rich bindings**: Go, Rust -- **Auto-generated bindings**: TypeScript, Haskell -- **CLI tools**: scip CLI for index manipulation - -## Performance Characteristics (Official Claims) - -### Compared to LSIF -- **10x speedup** in CI environments -- **4x smaller** compressed payload size -- **Better streaming**: Enables processing without loading entire index -- **Lower memory usage**: Document-based processing - -### Design Benefits -- Static typing from Protobuf schema -- More ergonomic debugging -- Reduced runtime errors -- Smaller index files - -## Official Tools and Ecosystem - -### SCIP CLI -- Index manipulation and conversion -- LSIF compatibility support -- Debugging and inspection tools - -### Official Indexers -- 
**scip-typescript**: `npm install -g @sourcegraph/scip-typescript` -- **scip-java**: Available as Docker image, Java launcher, fat jar - -### Integration Support -- GitLab Code Intelligence (via LSIF conversion) -- Sourcegraph native support -- VS Code extensions (community) - -## Standards Compliance Requirements - -### For SCIP Index Producers -1. Must generate valid Protocol Buffer format -2. Must follow symbol ID format specification -3. Must provide accurate position information -4. Should support streaming output -5. Must handle UTF-8/UTF-16 encoding correctly - -### For SCIP Index Consumers -1. Must handle streaming input -2. Should support all standard symbol kinds -3. Must respect symbol role classifications -4. Should provide graceful error handling -5. Must support position range validation - -## Official Documentation Sources - -### Primary Sources -- **Main Repository**: https://github.com/sourcegraph/scip -- **Protocol Schema**: https://github.com/sourcegraph/scip/blob/main/scip.proto -- **Design Document**: https://github.com/sourcegraph/scip/blob/main/DESIGN.md -- **Announcement Blog**: https://sourcegraph.com/blog/announcing-scip - -### Language-Specific Documentation -- **Java**: https://github.com/sourcegraph/scip-java -- **TypeScript**: https://github.com/sourcegraph/scip-typescript - -### Community Resources -- **Bindings**: Available for Go, Rust, TypeScript, Haskell -- **Examples**: Implementation examples in official repositories -- **Issues**: Bug reports and feature requests on GitHub - ---- - -*This document contains only official SCIP standards as defined by Sourcegraph.* -*Last updated: 2025-01-14* -*SCIP Version: Compatible with official v0.3.x specification* -*Source: Official Sourcegraph SCIP repositories and documentation* \ No newline at end of file diff --git a/pyproject.toml b/pyproject.toml index 2c0d989..428e2d3 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta" [project] 
name = "code-index-mcp" -version = "2.1.2" +version = "2.4.1" description = "Code indexing and analysis tools for LLMs using MCP" readme = "README.md" requires-python = ">=3.10" @@ -15,14 +15,13 @@ authors = [ dependencies = [ "mcp>=0.3.0", "watchdog>=3.0.0", - "protobuf>=4.21.0", "tree-sitter>=0.20.0", "tree-sitter-javascript>=0.20.0", "tree-sitter-typescript>=0.20.0", "tree-sitter-java>=0.20.0", "tree-sitter-zig>=0.20.0", "pathspec>=0.12.1", - "libclang>=16.0.0", + "msgpack>=1.0.0", ] [project.urls] diff --git a/src/code_index_mcp/__init__.py b/src/code_index_mcp/__init__.py index e2fc513..f47ee02 100644 --- a/src/code_index_mcp/__init__.py +++ b/src/code_index_mcp/__init__.py @@ -3,4 +3,5 @@ A Model Context Protocol server for code indexing, searching, and analysis. """ -__version__ = "2.0.0" +__version__ = "2.4.1" + diff --git a/src/code_index_mcp/constants.py b/src/code_index_mcp/constants.py index 97713d1..159e31a 100644 --- a/src/code_index_mcp/constants.py +++ b/src/code_index_mcp/constants.py @@ -5,10 +5,8 @@ # Directory and file names SETTINGS_DIR = "code_indexer" CONFIG_FILE = "config.json" -SCIP_INDEX_FILE = "index.scip" # SCIP protobuf binary file -# Legacy files -INDEX_FILE = "index.json" # Legacy JSON index file (to be removed) -# CACHE_FILE removed - no longer needed with new indexing system +INDEX_FILE = "index.json" # JSON index file (deep index) +INDEX_FILE_SHALLOW = "index.shallow.json" # Minimal shallow index (file list) # Supported file extensions for code analysis # This is the authoritative list used by both old and new indexing systems @@ -77,3 +75,44 @@ '.liquibase', '.flyway', # Migration tools ] +# Centralized filtering configuration +FILTER_CONFIG = { + "exclude_directories": { + # Version control + '.git', '.svn', '.hg', '.bzr', + + # Package managers & dependencies + 'node_modules', '__pycache__', '.venv', 'venv', + 'vendor', 'bower_components', + + # Build outputs + 'dist', 'build', 'target', 'out', 'bin', 'obj', + + # IDE & editors 
+ '.idea', '.vscode', '.vs', '.sublime-workspace', + + # Testing & coverage + '.pytest_cache', '.coverage', '.tox', '.nyc_output', + 'coverage', 'htmlcov', + + # OS artifacts + '.DS_Store', 'Thumbs.db', 'desktop.ini' + }, + + "exclude_files": { + # Temporary files + '*.tmp', '*.temp', '*.swp', '*.swo', + + # Backup files + '*.bak', '*~', '*.orig', + + # Log files + '*.log', + + # Lock files + 'package-lock.json', 'yarn.lock', 'Pipfile.lock' + }, + + "supported_extensions": SUPPORTED_EXTENSIONS +} + diff --git a/src/code_index_mcp/indexing/__init__.py b/src/code_index_mcp/indexing/__init__.py index edbcf50..e779911 100644 --- a/src/code_index_mcp/indexing/__init__.py +++ b/src/code_index_mcp/indexing/__init__.py @@ -1,8 +1,7 @@ """ Code indexing utilities for the MCP server. -This module provides utility functions for duplicate detection and -qualified name generation used by the SCIP indexing system. +This module provides simple JSON-based indexing optimized for LLM consumption. """ # Import utility functions that are still used @@ -11,11 +10,23 @@ normalize_file_path ) -# SCIP builder is still used by the new architecture -from .scip_builder import SCIPIndexBuilder +# New JSON-based indexing system +from .json_index_builder import JSONIndexBuilder, IndexMetadata +from .json_index_manager import JSONIndexManager, get_index_manager +from .shallow_index_manager import ShallowIndexManager, get_shallow_index_manager +from .deep_index_manager import DeepIndexManager +from .models import SymbolInfo, FileInfo __all__ = [ 'generate_qualified_name', 'normalize_file_path', - 'SCIPIndexBuilder' + 'JSONIndexBuilder', + 'JSONIndexManager', + 'get_index_manager', + 'ShallowIndexManager', + 'get_shallow_index_manager', + 'DeepIndexManager', + 'SymbolInfo', + 'FileInfo', + 'IndexMetadata' ] \ No newline at end of file diff --git a/src/code_index_mcp/indexing/deep_index_manager.py b/src/code_index_mcp/indexing/deep_index_manager.py new file mode 100644 index 0000000..6558703 --- 
/dev/null +++ b/src/code_index_mcp/indexing/deep_index_manager.py @@ -0,0 +1,46 @@ +""" +Deep Index Manager - Wrapper around JSONIndexManager for deep indexing. + +This class provides a clear semantic separation from the shallow manager. +It delegates to the existing JSONIndexManager (symbols + files JSON index). +""" + +from __future__ import annotations + +from typing import Optional, Dict, Any, List + +from .json_index_manager import JSONIndexManager + + +class DeepIndexManager: + """Thin wrapper over JSONIndexManager to expose deep-index API.""" + + def __init__(self) -> None: + self._mgr = JSONIndexManager() + + # Expose a subset of API to keep callers simple + def set_project_path(self, project_path: str) -> bool: + return self._mgr.set_project_path(project_path) + + def build_index(self, force_rebuild: bool = False) -> bool: + return self._mgr.build_index(force_rebuild=force_rebuild) + + def load_index(self) -> bool: + return self._mgr.load_index() + + def refresh_index(self) -> bool: + return self._mgr.refresh_index() + + def find_files(self, pattern: str = "*") -> List[str]: + return self._mgr.find_files(pattern) + + def get_file_summary(self, file_path: str) -> Optional[Dict[str, Any]]: + return self._mgr.get_file_summary(file_path) + + def get_index_stats(self) -> Dict[str, Any]: + return self._mgr.get_index_stats() + + def cleanup(self) -> None: + self._mgr.cleanup() + + diff --git a/src/code_index_mcp/indexing/index_provider.py b/src/code_index_mcp/indexing/index_provider.py index a87ddcf..660bb8d 100644 --- a/src/code_index_mcp/indexing/index_provider.py +++ b/src/code_index_mcp/indexing/index_provider.py @@ -1,43 +1,18 @@ """ -索引提供者接口定义 +Index provider interface definitions. -定义所有索引访问的标准接口,确保不同实现的一致性。 +Defines standard interfaces for all index access, ensuring consistency across different implementations. 
""" from typing import List, Optional, Dict, Any, Protocol from dataclasses import dataclass - -@dataclass -class SymbolInfo: - """符号信息标准数据结构""" - name: str - kind: str # 'class', 'function', 'method', 'variable', etc. - location: Dict[str, int] # {'line': int, 'column': int} - scope: str - documentation: List[str] - - -# Define FileInfo here to avoid circular imports -@dataclass -class FileInfo: - """文件信息标准数据结构""" - relative_path: str - language: str - absolute_path: str - - def __hash__(self): - return hash(self.relative_path) - - def __eq__(self, other): - if isinstance(other, FileInfo): - return self.relative_path == other.relative_path - return False +from .models import SymbolInfo, FileInfo @dataclass class IndexMetadata: - """索引元数据标准结构""" + """Standard index metadata structure.""" version: str format_type: str created_at: float @@ -49,68 +24,68 @@ class IndexMetadata: class IIndexProvider(Protocol): """ - 索引提供者标准接口 + Standard index provider interface. - 所有索引实现都必须遵循这个接口,确保一致的访问方式。 + All index implementations must follow this interface to ensure consistent access patterns. """ def get_file_list(self) -> List[FileInfo]: """ - 获取所有索引文件列表 + Get list of all indexed files. Returns: - 文件信息列表 + List of file information objects """ ... def get_file_info(self, file_path: str) -> Optional[FileInfo]: """ - 获取特定文件信息 + Get information for a specific file. Args: - file_path: 文件相对路径 + file_path: Relative file path Returns: - 文件信息,如果文件不在索引中则返回None + File information, or None if file is not in index """ ... def query_symbols(self, file_path: str) -> List[SymbolInfo]: """ - 查询文件中的符号信息 + Query symbol information in a file. Args: - file_path: 文件相对路径 + file_path: Relative file path Returns: - 符号信息列表 + List of symbol information objects """ ... - def search_files(self, pattern: str) -> List[FileInfo]: + def search_files(self, pattern: str) -> List[str]: """ - 按模式搜索文件 + Search files by pattern. 
Args: - pattern: glob模式或正则表达式 + pattern: Glob pattern or regular expression Returns: - 匹配的文件列表 + List of matching file paths """ ... def get_metadata(self) -> IndexMetadata: """ - 获取索引元数据 + Get index metadata. Returns: - 索引元数据信息 + Index metadata information """ ... def is_available(self) -> bool: """ - 检查索引是否可用 + Check if index is available. Returns: True if index is available and functional @@ -120,31 +95,31 @@ def is_available(self) -> bool: class IIndexManager(Protocol): """ - 索引管理器接口 + Index manager interface. - 定义索引生命周期管理的标准接口。 + Defines standard interface for index lifecycle management. """ def initialize(self) -> bool: - """初始化索引管理器""" + """Initialize the index manager.""" ... def get_provider(self) -> Optional[IIndexProvider]: - """获取当前活跃的索引提供者""" + """Get the current active index provider.""" ... def refresh_index(self, force: bool = False) -> bool: - """刷新索引""" + """Refresh the index.""" ... def save_index(self) -> bool: - """保存索引状态""" + """Save index state.""" ... def clear_index(self) -> None: - """清理索引状态""" + """Clear index state.""" ... def get_index_status(self) -> Dict[str, Any]: - """获取索引状态信息""" + """Get index status information.""" ... diff --git a/src/code_index_mcp/indexing/json_index_builder.py b/src/code_index_mcp/indexing/json_index_builder.py new file mode 100644 index 0000000..c12d694 --- /dev/null +++ b/src/code_index_mcp/indexing/json_index_builder.py @@ -0,0 +1,430 @@ +""" +JSON Index Builder - Clean implementation using Strategy pattern. + +This replaces the monolithic parser implementation with a clean, +maintainable Strategy pattern architecture. 
+""" + +import logging +import os +import time +from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed +from dataclasses import dataclass, asdict +from pathlib import Path +from typing import Dict, List, Optional, Any, Tuple + +from .strategies import StrategyFactory +from .models import SymbolInfo, FileInfo + +logger = logging.getLogger(__name__) + + +@dataclass +class IndexMetadata: + """Metadata for the JSON index.""" + project_path: str + indexed_files: int + index_version: str + timestamp: str + languages: List[str] + total_symbols: int = 0 + specialized_parsers: int = 0 + fallback_files: int = 0 + + +class JSONIndexBuilder: + """ + Main index builder using Strategy pattern for language parsing. + + This class orchestrates the index building process by: + 1. Discovering files in the project + 2. Using StrategyFactory to get appropriate parsers + 3. Extracting symbols and metadata + 4. Assembling the final JSON index + """ + + def __init__(self, project_path: str, additional_excludes: Optional[List[str]] = None): + from ..utils import FileFilter + + # Input validation + if not isinstance(project_path, str): + raise ValueError(f"Project path must be a string, got {type(project_path)}") + + project_path = project_path.strip() + if not project_path: + raise ValueError("Project path cannot be empty") + + if not os.path.isdir(project_path): + raise ValueError(f"Project path does not exist: {project_path}") + + self.project_path = project_path + self.in_memory_index: Optional[Dict[str, Any]] = None + self.strategy_factory = StrategyFactory() + self.file_filter = FileFilter(additional_excludes) + + logger.info(f"Initialized JSON index builder for {project_path}") + strategy_info = self.strategy_factory.get_strategy_info() + logger.info(f"Available parsing strategies: {len(strategy_info)} types") + + # Log specialized vs fallback coverage + specialized = len(self.strategy_factory.get_specialized_extensions()) + fallback = 
len(self.strategy_factory.get_fallback_extensions()) + logger.info(f"Specialized parsers: {specialized} extensions, Fallback coverage: {fallback} extensions") + + def _process_file(self, file_path: str, specialized_extensions: set) -> Optional[Tuple[Dict, Dict, str, bool]]: + """ + Process a single file - designed for parallel execution. + + Args: + file_path: Path to the file to process + specialized_extensions: Set of extensions with specialized parsers + + Returns: + Tuple of (symbols, file_info, language, is_specialized) or None on error + """ + try: + with open(file_path, 'r', encoding='utf-8', errors='ignore') as f: + content = f.read() + + ext = Path(file_path).suffix.lower() + rel_path = os.path.relpath(file_path, self.project_path).replace('\\', '/') + + # Get appropriate strategy + strategy = self.strategy_factory.get_strategy(ext) + + # Track strategy usage + is_specialized = ext in specialized_extensions + + # Parse file using strategy + symbols, file_info = strategy.parse_file(rel_path, content) + + logger.debug(f"Parsed {rel_path}: {len(symbols)} symbols ({file_info.language})") + + return (symbols, {rel_path: file_info}, file_info.language, is_specialized) + + except Exception as e: + logger.warning(f"Error processing {file_path}: {e}") + return None + + def build_index(self, parallel: bool = True, max_workers: Optional[int] = None) -> Dict[str, Any]: + """ + Build the complete index using Strategy pattern with parallel processing. 
+
+        Args:
+            parallel: Whether to use parallel processing (default: True)
+            max_workers: Maximum number of worker processes/threads (default: CPU count)
+
+        Returns:
+            Complete JSON index with metadata, symbols, and file information
+        """
+        logger.info(f"Building JSON index using Strategy pattern (parallel={parallel})...")
+        start_time = time.time()
+
+        all_symbols = {}
+        all_files = {}
+        languages = set()
+        specialized_count = 0
+        fallback_count = 0
+
+        # Get specialized extensions for tracking
+        specialized_extensions = set(self.strategy_factory.get_specialized_extensions())
+
+        # Get list of files to process
+        files_to_process = self._get_supported_files()
+        total_files = len(files_to_process)
+
+        if total_files == 0:
+            logger.warning("No files to process")
+            return self._create_empty_index()
+
+        logger.info(f"Processing {total_files} files...")
+
+        if parallel and total_files > 1:
+            # Use ThreadPoolExecutor for I/O-bound file reading
+            # ProcessPoolExecutor has issues with strategy sharing
+            if max_workers is None:
+                max_workers = min(os.cpu_count() or 4, total_files)
+
+            logger.info(f"Using parallel processing with {max_workers} workers")
+
+            with ThreadPoolExecutor(max_workers=max_workers) as executor:
+                # Submit all tasks
+                future_to_file = {
+                    executor.submit(self._process_file, file_path, specialized_extensions): file_path
+                    for file_path in files_to_process
+                }
+
+                # Process completed tasks
+                processed = 0
+                for future in as_completed(future_to_file):
+                    file_path = future_to_file[future]
+                    result = future.result()
+
+                    if result:
+                        symbols, file_info_dict, language, is_specialized = result
+                        all_symbols.update(symbols)
+                        all_files.update(file_info_dict)
+                        languages.add(language)
+
+                        if is_specialized:
+                            specialized_count += 1
+                        else:
+                            fallback_count += 1
+
+                    processed += 1
+                    if processed % 100 == 0:
+                        logger.debug(f"Processed {processed}/{total_files} files")
+        else:
+            # Sequential processing
+            logger.info("Using sequential processing")
+            for file_path in files_to_process:
+                result = self._process_file(file_path, specialized_extensions)
+                if result:
+                    symbols, file_info_dict, language, is_specialized = result
+                    all_symbols.update(symbols)
+                    all_files.update(file_info_dict)
+                    languages.add(language)
+
+                    if is_specialized:
+                        specialized_count += 1
+                    else:
+                        fallback_count += 1
+
+        # Build index metadata
+        metadata = IndexMetadata(
+            project_path=self.project_path,
+            indexed_files=len(all_files),
+            index_version="2.0.0-strategy",
+            timestamp=time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
+            languages=sorted(list(languages)),
+            total_symbols=len(all_symbols),
+            specialized_parsers=specialized_count,
+            fallback_files=fallback_count
+        )
+
+        # Assemble final index
+        index = {
+            "metadata": asdict(metadata),
+            "symbols": {k: asdict(v) for k, v in all_symbols.items()},
+            "files": {k: asdict(v) for k, v in all_files.items()}
+        }
+
+        # Cache in memory
+        self.in_memory_index = index
+
+        elapsed = time.time() - start_time
+        logger.info(f"Built index with {len(all_symbols)} symbols from {len(all_files)} files in {elapsed:.2f}s")
+        logger.info(f"Languages detected: {sorted(languages)}")
+        logger.info(f"Strategy usage: {specialized_count} specialized, {fallback_count} fallback")
+
+        return index
+
+    def _create_empty_index(self) -> Dict[str, Any]:
+        """Create an empty index structure."""
+        metadata = IndexMetadata(
+            project_path=self.project_path,
+            indexed_files=0,
+            index_version="2.0.0-strategy",
+            timestamp=time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
+            languages=[],
+            total_symbols=0,
+            specialized_parsers=0,
+            fallback_files=0
+        )
+
+        return {
+            "metadata": asdict(metadata),
+            "symbols": {},
+            "files": {}
+        }
+
+    def get_index(self) -> Optional[Dict[str, Any]]:
+        """Get the current in-memory index."""
+        return self.in_memory_index
+
+    def clear_index(self):
+        """Clear the in-memory index."""
+        self.in_memory_index = None
+        logger.debug("Cleared in-memory index")
+
+    def _get_supported_files(self) -> List[str]:
+        """
+        Get all supported files in the project using centralized filtering.
+
+        Returns:
+            List of file paths that can be parsed
+        """
+        supported_files = []
+        base_path = Path(self.project_path)
+
+        try:
+            for root, dirs, files in os.walk(self.project_path):
+                # Filter directories in-place using centralized logic
+                dirs[:] = [d for d in dirs if not self.file_filter.should_exclude_directory(d)]
+
+                # Filter files using centralized logic
+                for file in files:
+                    file_path = Path(root) / file
+                    if self.file_filter.should_process_path(file_path, base_path):
+                        supported_files.append(str(file_path))
+
+        except Exception as e:
+            logger.error(f"Error scanning directory {self.project_path}: {e}")
+
+        logger.debug(f"Found {len(supported_files)} supported files")
+        return supported_files
+
+    def build_shallow_file_list(self) -> List[str]:
+        """
+        Build a minimal shallow index consisting of relative file paths only.
+
+        This method does not read file contents. It enumerates supported files
+        using centralized filtering and returns normalized relative paths with
+        forward slashes for cross-platform consistency.
+
+        Returns:
+            List of relative file paths (using '/').
+        """
+        try:
+            absolute_files = self._get_supported_files()
+            result: List[str] = []
+            for abs_path in absolute_files:
+                rel_path = os.path.relpath(abs_path, self.project_path).replace('\\', '/')
+                # Normalize leading './'
+                if rel_path.startswith('./'):
+                    rel_path = rel_path[2:]
+                result.append(rel_path)
+            return result
+        except Exception as e:
+            logger.error(f"Failed to build shallow file list: {e}")
+            return []
+
+    def save_index(self, index: Dict[str, Any], index_path: str) -> bool:
+        """
+        Save index to disk.
+
+        Args:
+            index: Index data to save
+            index_path: Path where the index will be saved
+
+        Returns:
+            True if successful, False otherwise
+        """
+        try:
+            import json
+            with open(index_path, 'w', encoding='utf-8') as f:
+                json.dump(index, f, indent=2, ensure_ascii=False)
+            logger.info(f"Saved index to {index_path}")
+            return True
+        except Exception as e:
+            logger.error(f"Failed to save index to {index_path}: {e}")
+            return False
+
+    def load_index(self, index_path: str) -> Optional[Dict[str, Any]]:
+        """
+        Load index from disk.
+
+        Args:
+            index_path: Path to the index file
+
+        Returns:
+            Index data if successful, None otherwise
+        """
+        try:
+            if not os.path.exists(index_path):
+                logger.debug(f"Index file not found: {index_path}")
+                return None
+
+            import json
+            with open(index_path, 'r', encoding='utf-8') as f:
+                index = json.load(f)
+
+            # Cache in memory
+            self.in_memory_index = index
+            logger.info(f"Loaded index from {index_path}")
+            return index
+
+        except Exception as e:
+            logger.error(f"Failed to load index from {index_path}: {e}")
+            return None
+
+    def get_parsing_statistics(self) -> Dict[str, Any]:
+        """
+        Get detailed statistics about parsing capabilities.
+
+        Returns:
+            Dictionary with parsing statistics and strategy information
+        """
+        strategy_info = self.strategy_factory.get_strategy_info()
+
+        return {
+            "total_strategies": len(strategy_info),
+            "specialized_languages": [lang for lang in strategy_info.keys() if not lang.startswith('fallback_')],
+            "fallback_languages": [lang.replace('fallback_', '') for lang in strategy_info.keys() if lang.startswith('fallback_')],
+            "total_extensions": len(self.strategy_factory.get_all_supported_extensions()),
+            "specialized_extensions": len(self.strategy_factory.get_specialized_extensions()),
+            "fallback_extensions": len(self.strategy_factory.get_fallback_extensions()),
+            "strategy_details": strategy_info
+        }
+
+    def get_file_symbols(self, file_path: str) -> List[Dict[str, Any]]:
+        """
+        Get symbols for a specific file.
+
+        Args:
+            file_path: Relative path to the file
+
+        Returns:
+            List of symbols in the file
+        """
+        if not self.in_memory_index:
+            logger.warning("Index not loaded")
+            return []
+
+        try:
+            # Normalize file path
+            file_path = file_path.replace('\\', '/')
+            if file_path.startswith('./'):
+                file_path = file_path[2:]
+
+            # Get file info
+            file_info = self.in_memory_index["files"].get(file_path)
+            if not file_info:
+                logger.warning(f"File not found in index: {file_path}")
+                return []
+
+            # Work directly with global symbols for this file
+            global_symbols = self.in_memory_index.get("symbols", {})
+            result = []
+
+            # Find all symbols for this file directly from global symbols
+            for symbol_id, symbol_data in global_symbols.items():
+                symbol_file = symbol_data.get("file", "").replace("\\", "/")
+
+                # Check if this symbol belongs to our file
+                if symbol_file == file_path:
+                    symbol_type = symbol_data.get("type", "unknown")
+                    symbol_name = symbol_id.split("::")[-1]  # Extract symbol name from ID
+
+                    # Create symbol info
+                    symbol_info = {
+                        "name": symbol_name,
+                        "called_by": symbol_data.get("called_by", []),
+                        "line": symbol_data.get("line"),
+                        "signature": symbol_data.get("signature")
+                    }
+
+                    # Keep functions, methods, and classes; skip other symbol kinds
+                    if symbol_type in ("function", "method", "class"):
+                        result.append(symbol_info)
+
+            # Sort by line number for consistent ordering; treat missing line numbers as 0
+            result.sort(key=lambda x: x.get("line") or 0)
+
+            return result
+
+        except Exception as e:
+            logger.error(f"Error getting file symbols for {file_path}: {e}")
+            return []
diff --git a/src/code_index_mcp/indexing/json_index_manager.py b/src/code_index_mcp/indexing/json_index_manager.py
new file mode 100644
index 0000000..ec320e4
--- /dev/null
+++ b/src/code_index_mcp/indexing/json_index_manager.py
@@ -0,0 +1,465 @@
+"""
+JSON Index Manager - Manages the lifecycle of the JSON-based index.
+
+This replaces the SCIP unified_index_manager with a simpler approach
+focused on fast JSON-based indexing and querying.
+"""
+
+import hashlib
+import json
+import logging
+import os
+import re
+import tempfile
+import threading
+from pathlib import Path
+from typing import Dict, List, Optional, Any
+
+from .json_index_builder import JSONIndexBuilder
+from ..constants import SETTINGS_DIR, INDEX_FILE, INDEX_FILE_SHALLOW
+
+logger = logging.getLogger(__name__)
+
+
+class JSONIndexManager:
+    """Manages JSON-based code index lifecycle and storage."""
+
+    def __init__(self):
+        self.project_path: Optional[str] = None
+        self.index_builder: Optional[JSONIndexBuilder] = None
+        self.temp_dir: Optional[str] = None
+        self.index_path: Optional[str] = None
+        self.shallow_index_path: Optional[str] = None
+        self._shallow_file_list: Optional[List[str]] = None
+        self._lock = threading.RLock()
+        logger.info("Initialized JSON Index Manager")
+
+    def set_project_path(self, project_path: str) -> bool:
+        """Set the project path and initialize index storage."""
+        with self._lock:
+            try:
+                # Input validation
+                if not project_path or not isinstance(project_path, str):
+                    logger.error(f"Invalid project path: {project_path}")
+                    return False
+
+                project_path = project_path.strip()
+                if not project_path:
+                    logger.error("Project path cannot be empty")
+                    return False
+
+                if not os.path.isdir(project_path):
+                    logger.error(f"Project path does not exist: {project_path}")
+                    return False
+
+                self.project_path = project_path
+                self.index_builder = JSONIndexBuilder(project_path)
+
+                # Create temp directory for index storage
+                project_hash = hashlib.md5(project_path.encode()).hexdigest()[:12]
+                self.temp_dir = os.path.join(tempfile.gettempdir(), SETTINGS_DIR, project_hash)
+                os.makedirs(self.temp_dir, exist_ok=True)
+
+                self.index_path = os.path.join(self.temp_dir, INDEX_FILE)
+                self.shallow_index_path = os.path.join(self.temp_dir, INDEX_FILE_SHALLOW)
+
+                logger.info(f"Set project path: {project_path}")
+                logger.info(f"Index storage: {self.index_path}")
+                return True
+
+            except Exception as e:
+                logger.error(f"Failed to set project path: {e}")
+                return False
+
+    def build_index(self, force_rebuild: bool = False) -> bool:
+        """Build or rebuild the index."""
+        with self._lock:
+            if not self.index_builder or not self.project_path:
+                logger.error("Index builder not initialized")
+                return False
+
+            try:
+                # Check if we need to rebuild
+                if not force_rebuild and self._is_index_fresh():
+                    logger.info("Index is fresh, skipping rebuild")
+                    return True
+
+                logger.info("Building JSON index...")
+                index = self.index_builder.build_index()
+
+                # Save to disk
+                self.index_builder.save_index(index, self.index_path)
+
+                logger.info(f"Successfully built index with {len(index['symbols'])} symbols")
+                return True
+
+            except Exception as e:
+                logger.error(f"Failed to build index: {e}")
+                return False
+
+    def load_index(self) -> bool:
+        """Load existing index from disk."""
+        with self._lock:
+            if not self.index_builder or not self.index_path:
+                logger.error("Index manager not initialized")
+                return False
+
+            try:
+                index = self.index_builder.load_index(self.index_path)
+                if index:
+                    logger.info(f"Loaded index with {len(index['symbols'])} symbols")
+                    return True
+                else:
+                    logger.warning("No existing index found")
+                    return False
+
+            except Exception as e:
+                logger.error(f"Failed to load index: {e}")
+                return False
+
+    def build_shallow_index(self) -> bool:
+        """Build and save the minimal shallow index (file list)."""
+        with self._lock:
+            if not self.index_builder or not self.project_path or not self.shallow_index_path:
+                logger.error("Index builder not initialized for shallow index")
+                return False
+
+            try:
+                file_list = self.index_builder.build_shallow_file_list()
+                # Persist as a JSON array for minimal overhead
+                with open(self.shallow_index_path, 'w', encoding='utf-8') as f:
+                    json.dump(file_list, f, ensure_ascii=False)
+                self._shallow_file_list = file_list
+                logger.info(f"Saved shallow index with {len(file_list)} files to {self.shallow_index_path}")
+                return True
+            except Exception as e:
+                logger.error(f"Failed to build shallow index: {e}")
+                return False
+
+    def load_shallow_index(self) -> bool:
+        """Load shallow index (file list) from disk into memory."""
+        with self._lock:
+            try:
+                if not self.shallow_index_path or not os.path.exists(self.shallow_index_path):
+                    logger.warning("No existing shallow index found")
+                    return False
+                with open(self.shallow_index_path, 'r', encoding='utf-8') as f:
+                    data = json.load(f)
+                if not isinstance(data, list):
+                    logger.error("Shallow index format invalid (expected list)")
+                    return False
+                # Normalize paths
+                normalized = []
+                for p in data:
+                    if isinstance(p, str):
+                        q = p.replace('\\\\', '/').replace('\\', '/')
+                        if q.startswith('./'):
+                            q = q[2:]
+                        normalized.append(q)
+                self._shallow_file_list = normalized
+                logger.info(f"Loaded shallow index with {len(normalized)} files")
+                return True
+            except Exception as e:
+                logger.error(f"Failed to load shallow index: {e}")
+                return False
+
+    def refresh_index(self) -> bool:
+        """Refresh the index (rebuild and reload)."""
+        with self._lock:
+            logger.info("Refreshing index...")
+            if self.build_index(force_rebuild=True):
+                return self.load_index()
+            return False
+
+    def find_files(self, pattern: str = "*") -> List[str]:
+        """
+        Find files matching a glob pattern using the SHALLOW file list only.
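+
+        Example (illustrative; assumes 'src/app.py' and 'src/pkg/util.py'
+        are present in the shallow index):
+            find_files('src/*.py')   # matches 'src/app.py', not 'src/pkg/util.py'
+            find_files('**/*.py')    # matches '.py' files at any directory depth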
+
+        Notes:
+            - '*' does not cross '/'
+            - '**' matches across directories
+            - Always sources from the shallow index for consistency and speed
+        """
+        with self._lock:
+            # Input validation
+            if not isinstance(pattern, str):
+                logger.error(f"Pattern must be a string, got {type(pattern)}")
+                return []
+
+            pattern = pattern.strip()
+            if not pattern:
+                pattern = "*"
+
+            # Normalize to forward slashes
+            norm_pattern = pattern.replace('\\\\', '/').replace('\\', '/')
+
+            # Build glob regex: '*' does not cross '/', '**' crosses directories
+            regex = self._compile_glob_regex(norm_pattern)
+
+            # Always use shallow index for file discovery
+            try:
+                if self._shallow_file_list is None:
+                    # Try load existing shallow index; if missing, build then load
+                    if not self.load_shallow_index():
+                        # If still not available, attempt to build
+                        if self.build_shallow_index():
+                            self.load_shallow_index()
+
+                files = list(self._shallow_file_list or [])
+
+                if norm_pattern == "*":
+                    return files
+
+                return [f for f in files if regex.match(f) is not None]
+
+            except Exception as e:
+                logger.error(f"Error finding files: {e}")
+                return []
+
+    def get_file_summary(self, file_path: str) -> Optional[Dict[str, Any]]:
+        """
+        Get summary information for a file.
+
+        This method attempts to retrieve comprehensive file information including
+        symbol counts, functions, classes, methods, and imports. If the index
+        is not loaded, it will attempt auto-initialization to restore from the
+        most recent index state.
+
+        Args:
+            file_path: Relative path to the file
+
+        Returns:
+            Dictionary containing file summary information, or None if not found
+        """
+        with self._lock:
+            # Input validation
+            if not isinstance(file_path, str):
+                logger.error(f"File path must be a string, got {type(file_path)}")
+                return None
+
+            file_path = file_path.strip()
+            if not file_path:
+                logger.error("File path cannot be empty")
+                return None
+
+            # Try to load cached index if not ready
+            if not self.index_builder or not self.index_builder.in_memory_index:
+                if not self._try_load_cached_index():
+                    logger.warning("Index not loaded and no cached index available")
+                    return None
+
+            try:
+                # Normalize file path
+                file_path = file_path.replace('\\', '/')
+                if file_path.startswith('./'):
+                    file_path = file_path[2:]
+
+                # Get file info
+                file_info = self.index_builder.in_memory_index["files"].get(file_path)
+                if not file_info:
+                    logger.warning(f"File not found in index: {file_path}")
+                    return None
+
+                # Get symbols in file
+                symbols = self.index_builder.get_file_symbols(file_path)
+
+                # Categorize symbols by signature
+                functions = []
+                classes = []
+                methods = []
+
+                for s in symbols:
+                    signature = s.get("signature", "")
+                    if signature:
+                        if signature.startswith("def ") and "::" in signature:
+                            # Method: contains class context
+                            methods.append(s)
+                        elif signature.startswith("def "):
+                            # Function: starts with def but no class context
+                            functions.append(s)
+                        elif signature.startswith("class "):
+                            # Class definition
+                            classes.append(s)
+                        else:
+                            # Default to function for unknown signatures
+                            functions.append(s)
+                    else:
+                        # No signature - try to infer from name patterns or default to function
+                        name = s.get("name", "")
+                        if name and name[0].isupper():
+                            # Capitalized names are likely classes
+                            classes.append(s)
+                        else:
+                            # Default to function
+                            functions.append(s)
+
+                return {
+                    "file_path": file_path,
+                    "language": file_info["language"],
+                    "line_count": file_info["line_count"],
+                    "symbol_count": len(symbols),
+                    "functions": functions,
+                    "classes": classes,
+                    "methods": methods,
+                    "imports": file_info.get("imports", []),
+                    "exports": file_info.get("exports", [])
+                }
+
+            except Exception as e:
+                logger.error(f"Error getting file summary: {e}")
+                return None
+
+    def get_index_stats(self) -> Dict[str, Any]:
+        """Get statistics about the current index."""
+        with self._lock:
+            if not self.index_builder or not self.index_builder.in_memory_index:
+                return {"status": "not_loaded"}
+
+            try:
+                index = self.index_builder.in_memory_index
+                metadata = index["metadata"]
+
+                symbol_counts = {}
+                for symbol_data in index["symbols"].values():
+                    symbol_type = symbol_data.get("type", "unknown")
+                    symbol_counts[symbol_type] = symbol_counts.get(symbol_type, 0) + 1
+
+                return {
+                    "status": "loaded",
+                    "project_path": metadata["project_path"],
+                    "indexed_files": metadata["indexed_files"],
+                    "total_symbols": len(index["symbols"]),
+                    "symbol_types": symbol_counts,
+                    "languages": metadata["languages"],
+                    "index_version": metadata["index_version"],
+                    "timestamp": metadata["timestamp"]
+                }
+
+            except Exception as e:
+                logger.error(f"Error getting index stats: {e}")
+                return {"status": "error", "error": str(e)}
+
+    def _is_index_fresh(self) -> bool:
+        """Check if the current index is fresh."""
+        if not self.index_path or not os.path.exists(self.index_path):
+            return False
+
+        try:
+            from code_index_mcp.utils.file_filter import FileFilter as _FileFilter  # pylint: disable=C0415
+            file_filter = _FileFilter()
+
+            # Simple freshness check - index exists and is recent
+            index_mtime = os.path.getmtime(self.index_path)
+            base_path = Path(self.project_path)
+
+            # Check if any source files are newer than index
+            for root, dirs, files in os.walk(self.project_path):
+                # Filter directories using centralized logic
+                dirs[:] = [d for d in dirs if not file_filter.should_exclude_directory(d)]
+
+                for file in files:
+                    file_path = Path(root) / file
+                    if file_filter.should_process_path(file_path, base_path):
+                        if os.path.getmtime(str(file_path)) > index_mtime:
+                            return False
+
+            return True
+
+        except Exception as e:
+            logger.warning(f"Error checking index freshness: {e}")
+            return False
+
+    def _try_load_cached_index(self, expected_project_path: Optional[str] = None) -> bool:
+        """
+        Try to load a cached index file if available.
+
+        This is a simplified version of auto-initialization that only loads
+        a cached index if we can verify it matches the expected project.
+
+        Args:
+            expected_project_path: Optional path to verify against cached index
+
+        Returns:
+            True if cached index was loaded successfully, False otherwise.
+        """
+        try:
+            # First try to load from current index_path if set
+            if self.index_path and os.path.exists(self.index_path):
+                return self.load_index()
+
+            # If expected project path provided, try to find its cache
+            if expected_project_path:
+                project_hash = hashlib.md5(expected_project_path.encode()).hexdigest()[:12]
+                temp_dir = os.path.join(tempfile.gettempdir(), SETTINGS_DIR, project_hash)
+                index_path = os.path.join(temp_dir, INDEX_FILE)
+
+                if os.path.exists(index_path):
+                    # Verify the cached index matches the expected project
+                    with open(index_path, 'r', encoding='utf-8') as f:
+                        index_data = json.load(f)
+                        cached_project = index_data.get('metadata', {}).get('project_path')
+
+                    if cached_project == expected_project_path:
+                        self.temp_dir = temp_dir
+                        self.index_path = index_path
+                        return self.load_index()
+                    else:
+                        logger.warning(f"Cached index project mismatch: {cached_project} != {expected_project_path}")
+
+            return False
+
+        except Exception as e:
+            logger.debug(f"Failed to load cached index: {e}")
+            return False
+
+    def cleanup(self):
+        """Clean up resources."""
+        with self._lock:
+            self.project_path = None
+            self.index_builder = None
+            self.temp_dir = None
+            self.index_path = None
+            logger.info("Cleaned up JSON Index Manager")
+
+    @staticmethod
+    def _compile_glob_regex(pattern: str) -> re.Pattern:
+        """
+        Compile a glob pattern where '*' does not match '/', and '**' matches across directories.
+
+        Examples:
+            src/*.py -> direct children .py under src
+            **/*.py -> .py at any depth
+        """
+        # Translate glob to regex
+        i = 0
+        out = []
+        special = ".^$+{}[]|()"
+        while i < len(pattern):
+            c = pattern[i]
+            if c == '*':
+                if i + 1 < len(pattern) and pattern[i + 1] == '*':
+                    # '**' -> match across directories
+                    out.append('.*')
+                    i += 2
+                    continue
+                else:
+                    out.append('[^/]*')
+            elif c == '?':
+                out.append('[^/]')
+            elif c in special:
+                out.append('\\' + c)
+            else:
+                out.append(c)
+            i += 1
+        regex_str = '^' + ''.join(out) + '$'
+        return re.compile(regex_str)
+
+
+# Global instance
+_index_manager = JSONIndexManager()
+
+
+def get_index_manager() -> JSONIndexManager:
+    """Get the global index manager instance."""
+    return _index_manager
diff --git a/src/code_index_mcp/indexing/models/__init__.py b/src/code_index_mcp/indexing/models/__init__.py
new file mode 100644
index 0000000..b120a34
--- /dev/null
+++ b/src/code_index_mcp/indexing/models/__init__.py
@@ -0,0 +1,8 @@
+"""
+Model classes for the indexing system.
+"""
+
+from .symbol_info import SymbolInfo
+from .file_info import FileInfo
+
+__all__ = ['SymbolInfo', 'FileInfo']
\ No newline at end of file
diff --git a/src/code_index_mcp/indexing/models/file_info.py b/src/code_index_mcp/indexing/models/file_info.py
new file mode 100644
index 0000000..0678774
--- /dev/null
+++ b/src/code_index_mcp/indexing/models/file_info.py
@@ -0,0 +1,24 @@
+"""
+FileInfo model for representing file metadata.
+"""
+
+from dataclasses import dataclass
+from typing import Dict, List, Optional, Any
+
+
+@dataclass
+class FileInfo:
+    """Information about a source code file."""
+
+    language: str  # programming language
+    line_count: int  # total lines in file
+    symbols: Dict[str, List[str]]  # symbol categories (functions, classes, etc.)
+ imports: List[str] # imported modules/packages + exports: Optional[List[str]] = None # exported symbols (for JS/TS modules) + package: Optional[str] = None # package name (for Java, Go, etc.) + docstring: Optional[str] = None # file-level documentation + + def __post_init__(self): + """Initialize mutable defaults.""" + if self.exports is None: + self.exports = [] \ No newline at end of file diff --git a/src/code_index_mcp/indexing/models/symbol_info.py b/src/code_index_mcp/indexing/models/symbol_info.py new file mode 100644 index 0000000..1659330 --- /dev/null +++ b/src/code_index_mcp/indexing/models/symbol_info.py @@ -0,0 +1,23 @@ +""" +SymbolInfo model for representing code symbols. +""" + +from dataclasses import dataclass +from typing import Optional, List + + +@dataclass +class SymbolInfo: + """Information about a code symbol (function, class, method, etc.).""" + + type: str # function, class, method, interface, etc. + file: str # file path where symbol is defined + line: int # line number where symbol starts + signature: Optional[str] = None # function/method signature + docstring: Optional[str] = None # documentation string + called_by: Optional[List[str]] = None # list of symbols that call this symbol + + def __post_init__(self): + """Initialize mutable defaults.""" + if self.called_by is None: + self.called_by = [] \ No newline at end of file diff --git a/src/code_index_mcp/indexing/scip_builder.py b/src/code_index_mcp/indexing/scip_builder.py deleted file mode 100644 index 828d378..0000000 --- a/src/code_index_mcp/indexing/scip_builder.py +++ /dev/null @@ -1,381 +0,0 @@ -"""SCIP Index Builder - main orchestrator for SCIP-based indexing.""" - -import os -import fnmatch -import pathspec -import logging -from pathlib import Path -from datetime import datetime -from typing import List, Dict, Any, Optional, Tuple -from concurrent.futures import ThreadPoolExecutor, as_completed -from dataclasses import dataclass, field - -from ..scip.factory import 
SCIPIndexerFactory, SCIPIndexingError -from ..scip.proto import scip_pb2 - - -logger = logging.getLogger(__name__) - - - -@dataclass -class ValidationResult: - """Result of SCIP index validation.""" - is_valid: bool - errors: List[str] = field(default_factory=list) - warnings: List[str] = field(default_factory=list) - - -@dataclass -class ScanResult: - """Result of a project scan.""" - file_list: List[Dict[str, Any]] - project_metadata: Dict[str, Any] - - -class SCIPIndexBuilder: - """Main builder class that orchestrates SCIP-based indexing.""" - - def __init__(self, max_workers: Optional[int] = None): - self.max_workers = max_workers - self.scip_factory = SCIPIndexerFactory() - self.project_path = "" - - def build_scip_index(self, project_path: str) -> scip_pb2.Index: - """Build complete SCIP index for a project.""" - # Build index without timing logs - start_time = datetime.now() - self.project_path = project_path - - logger.info("🚀 Starting SCIP index build for project: %s", project_path) - logger.debug("Build configuration: max_workers=%s", self.max_workers) - - try: - logger.info("📁 Phase 1: Scanning project files...") - # Phase 1: scan files - scan_result = self._scan_project_files(project_path) - total_files_considered = len(scan_result.file_list) - logger.info("✅ File scan completed, found %d valid files", total_files_considered) - - logger.info("🏷️ Phase 2: Grouping files by strategy...") - file_paths = [str(f['path']) for f in scan_result.file_list] - strategy_files = self.scip_factory.group_files_by_strategy(file_paths) - - for strategy, files in strategy_files.items(): - logger.info(" 📋 %s: %d files", strategy.__class__.__name__, len(files)) - logger.debug("File grouping completed") - - logger.info("⚙️ Phase 3: Processing files with strategies...") - all_documents = self._process_files(strategy_files, project_path) - logger.info("✅ File processing completed, generated %d documents", len(all_documents)) - - logger.info("🔗 Phase 4: Assembling SCIP 
index...") - scip_index = self._assemble_scip_index(all_documents, scan_result, start_time) - logger.debug("Index assembly completed") - - logger.info("🎉 SCIP index build completed successfully") - - logger.info("🔍 Phase 5: Validating SCIP index...") - validation_result = self._validate_scip_index(scip_index) - if not validation_result.is_valid: - logger.warning("⚠️ Index validation found issues: %s", validation_result.errors) - else: - logger.info("✅ Index validation passed") - - return scip_index - except Exception as e: - logger.error("❌ SCIP index build failed: %s", e, exc_info=True) - return self._create_fallback_scip_index(project_path, str(e)) - - def _scan_project_files(self, project_path: str) -> ScanResult: - """Scan project directory to get a list of files and metadata.""" - logger.debug("📂 Starting file system scan of: %s", project_path) - files = [] - - # Use project settings for exclude patterns - logger.debug("🚫 Loading exclude patterns...") - ignored_dirs = self._get_exclude_patterns() - logger.debug("Ignored directories: %s", ignored_dirs) - - # Load gitignore patterns - logger.debug("📋 Loading .gitignore patterns...") - gitignore_spec = self._load_gitignore_patterns(project_path) - if hasattr(gitignore_spec, 'patterns'): - logger.debug("Found %d gitignore patterns", len(gitignore_spec.patterns)) - elif gitignore_spec: - logger.debug("Loaded gitignore specification") - else: - logger.debug("No gitignore patterns found") - - scan_count = 0 - gitignore_skipped = 0 - hidden_files_skipped = 0 - ignored_dir_time = 0 - gitignore_check_time = 0 - - for root, dirs, filenames in os.walk(project_path): - scan_count += 1 - if scan_count % 100 == 0: - logger.debug("📊 Scanned %d directories, found %d files so far...", scan_count, len(files)) - - # Check if current root path contains any ignored directories - ignored_dir_start = datetime.now() - root_parts = Path(root).parts - project_parts = Path(project_path).parts - relative_parts = 
root_parts[len(project_parts):] - - # Skip if any part of the path is in ignored_dirs - if any(part in ignored_dirs for part in relative_parts): - ignored_dir_time += (datetime.now() - ignored_dir_start).total_seconds() - logger.debug("🚫 Skipping ignored directory: %s", root) - dirs[:] = [] # Don't descend further - continue - - # Modify dirs in-place to prune the search - original_dirs = len(dirs) - dirs[:] = [d for d in dirs if d not in ignored_dirs] - if len(dirs) < original_dirs: - ignored_dir_time += (datetime.now() - ignored_dir_start).total_seconds() - logger.debug("🚫 Filtered %d ignored subdirectories in %s", original_dirs - len(dirs), root) - else: - ignored_dir_time += (datetime.now() - ignored_dir_start).total_seconds() - - # Apply gitignore filtering to directories - gitignore_dir_start = datetime.now() - pre_gitignore_dirs = len(dirs) - dirs[:] = [d for d in dirs if not self._is_gitignored(os.path.join(root, d), project_path, gitignore_spec)] - gitignore_filtered_dirs = pre_gitignore_dirs - len(dirs) - gitignore_check_time += (datetime.now() - gitignore_dir_start).total_seconds() - - if gitignore_filtered_dirs > 0: - logger.debug("📋 .gitignore filtered %d directories in %s", gitignore_filtered_dirs, root) - - for filename in filenames: - file_check_start = datetime.now() - - # Ignore hidden files (but allow .gitignore itself) - if filename.startswith('.') and filename != '.gitignore': - hidden_files_skipped += 1 - gitignore_check_time += (datetime.now() - file_check_start).total_seconds() - continue - - full_path = os.path.join(root, filename) - - # Apply gitignore filtering to files - if self._is_gitignored(full_path, project_path, gitignore_spec): - gitignore_skipped += 1 - gitignore_check_time += (datetime.now() - file_check_start).total_seconds() - continue - - gitignore_check_time += (datetime.now() - file_check_start).total_seconds() - files.append(full_path) - - logger.info("📊 File scan summary: scanned %d directories, found %d valid files", 
scan_count, len(files)) - logger.info("🚫 Filtered files: %d gitignored, %d hidden files", gitignore_skipped, hidden_files_skipped) - - file_list = [{'path': f, 'is_binary': False} for f in files] - project_metadata = {"project_name": os.path.basename(project_path)} - return ScanResult(file_list=file_list, project_metadata=project_metadata) - - def _get_exclude_patterns(self) -> set: - """Get exclude patterns from project settings.""" - try: - from ..project_settings import ProjectSettings - # Try to get patterns from project settings - settings = ProjectSettings(self.project_path, skip_load=False) - exclude_patterns = settings.config.get("file_watcher", {}).get("exclude_patterns", []) - return set(exclude_patterns) - except Exception: - # Fallback to basic patterns if settings not available - return {'.git', '.svn', '.hg', '__pycache__', 'node_modules', '.venv', 'venv', - 'build', 'dist', 'target', '.idea', '.vscode'} - - def _load_gitignore_patterns(self, project_path: str): - """Load patterns from .gitignore file using pathspec (required).""" - gitignore_path = os.path.join(project_path, '.gitignore') - - if os.path.exists(gitignore_path): - try: - with open(gitignore_path, 'r', encoding='utf-8') as f: - spec = pathspec.PathSpec.from_lines('gitignorestyle', f) - return spec - except Exception: - logger.debug("Failed to load .gitignore via pathspec") - return None - - return None - - - - def _is_gitignored(self, file_path: str, project_path: str, gitignore_spec) -> bool: - """Check if a file or directory is ignored by .gitignore patterns using pathspec.""" - if not gitignore_spec: - return False - - try: - # Get relative path from project root - rel_path = os.path.relpath(file_path, project_path) - # Normalize path separators for cross-platform compatibility - rel_path = rel_path.replace('\\', '/') - - return gitignore_spec.match_file(rel_path) - except Exception: - return False - - - - def _process_files(self, strategy_files: Dict, project_path: str) -> 
-                                                             List[scip_pb2.Document]:
-        """Process files using appropriate strategies, either sequentially or in parallel."""
-        if self.max_workers and self.max_workers > 1:
-            return self._process_files_parallel(strategy_files, project_path)
-        return self._process_files_sequential(strategy_files, project_path)
-
-    def _process_files_sequential(self, strategy_files: Dict, project_path: str) -> List[scip_pb2.Document]:
-        """Process files sequentially."""
-        logger.debug("🔄 Processing files sequentially (single-threaded)")
-        all_documents = []
-
-        for strategy, files in strategy_files.items():
-            strategy_name = strategy.__class__.__name__
-            logger.info("⚙️ Processing %d files with %s...", len(files), strategy_name)
-
-            try:
-                documents = strategy.generate_scip_documents(files, project_path)
-                logger.info("✅ %s completed, generated %d documents", strategy_name, len(documents))
-                all_documents.extend(documents)
-            except Exception as e:
-                logger.error("❌ %s failed: %s", strategy_name, e, exc_info=True)
-                logger.info("🔄 Trying fallback strategies for %d files...", len(files))
-                fallback_docs = self._try_fallback_strategies(files, strategy, project_path)
-                all_documents.extend(fallback_docs)
-                logger.info("📄 Fallback generated %d documents", len(fallback_docs))
-
-        return all_documents
-
-    def _process_files_parallel(self, strategy_files: Dict, project_path: str) -> List[scip_pb2.Document]:
-        """Process files in parallel."""
-        all_documents = []
-        with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
-            future_to_strategy = {
-                executor.submit(s.generate_scip_documents, f, project_path): (s, f)
-                for s, f in strategy_files.items()
-            }
-            for future in as_completed(future_to_strategy):
-                strategy, files = future_to_strategy[future]
-                try:
-                    documents = future.result()
-                    all_documents.extend(documents)
-                except Exception as e:
-                    all_documents.extend(self._try_fallback_strategies(files, strategy, project_path))
-        return all_documents
-
-    def _try_fallback_strategies(self,
-                                 failed_files: List[str], failed_strategy, project_path: str) -> List[scip_pb2.Document]:
-        """Try fallback strategies for files that failed."""
-        fallback_documents = []
-
-        for file_path in failed_files:
-            extension = self._get_file_extension(file_path)
-            strategies = self.scip_factory.get_strategies_for_extension(extension)
-            fallback_strategies = [s for s in strategies if s != failed_strategy]
-
-            success = False
-            for fallback in fallback_strategies:
-                try:
-                    docs = fallback.generate_scip_documents([file_path], project_path)
-                    fallback_documents.extend(docs)
-                    success = True
-                    break
-                except Exception:
-                    pass
-
-            if not success:
-                pass
-        return fallback_documents
-
-    def _assemble_scip_index(self, documents: List[scip_pb2.Document], scan_result: ScanResult, start_time: datetime) -> scip_pb2.Index:
-        """Assemble the final SCIP index."""
-        scip_index = scip_pb2.Index()
-        scip_index.metadata.CopyFrom(self._create_metadata(scan_result.project_metadata, start_time))
-        scip_index.documents.extend(documents)
-        external_symbols = self._extract_external_symbols(documents)
-        scip_index.external_symbols.extend(external_symbols)
-
-        return scip_index
-
-    def _create_metadata(self, project_metadata: Dict[str, Any], start_time: datetime) -> scip_pb2.Metadata:
-        """Create SCIP metadata."""
-        metadata = scip_pb2.Metadata()
-        metadata.version = scip_pb2.ProtocolVersion.UnspecifiedProtocolVersion
-        metadata.tool_info.name = "code-index-mcp"
-        metadata.tool_info.version = "1.2.1"
-        metadata.tool_info.arguments.extend(["scip-indexing"])
-        metadata.project_root = self.project_path
-        metadata.text_document_encoding = scip_pb2.TextDocumentEncoding.UTF8
-        return metadata
-
-    def _extract_external_symbols(self, documents: List[scip_pb2.Document]) -> List[scip_pb2.SymbolInformation]:
-        """Extract and deduplicate external symbols from strategies."""
-        external_symbols = []
-        seen_symbols = set()
-
-        # Collect external symbols from all strategies
-        for strategy in
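The fallback machinery being removed here boils down to one pattern: try each candidate strategy in priority order, swallow its failure, and stop at the first success. A standalone sketch of that pattern (function names hypothetical):

```python
def first_successful(strategies, file_path):
    """Try parsers in priority order; return the first result that
    does not raise, or None when every candidate fails (illustrative)."""
    for strategy in strategies:
        try:
            return strategy(file_path)
        except Exception:
            continue  # fall through to the next candidate
    return None
```

The per-file loop in `_try_fallback_strategies` is this pattern applied with the failed strategy filtered out of the candidate list.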
-                        self.scip_factory.strategies:
-            try:
-                strategy_external_symbols = strategy.get_external_symbols()
-                for symbol_info in strategy_external_symbols:
-                    symbol_id = symbol_info.symbol
-                    if symbol_id not in seen_symbols:
-                        external_symbols.append(symbol_info)
-                        seen_symbols.add(symbol_id)
-            except Exception:
-                # Strategy might not support external symbols yet
-                continue
-
-        return external_symbols
-
-    def _validate_scip_index(self, scip_index: scip_pb2.Index) -> ValidationResult:
-        """Validate the completed SCIP index."""
-        errors, warnings = [], []
-        if not scip_index.metadata.project_root:
-            errors.append("Missing project_root in metadata")
-        if not scip_index.documents:
-            warnings.append("No documents in SCIP index")
-        for i, doc in enumerate(scip_index.documents):
-            if not doc.relative_path:
-                errors.append(f"Document {i} missing relative_path")
-            if not doc.language:
-                warnings.append(f"Document {i} ({doc.relative_path}) missing language")
-        if not scip_index.metadata.tool_info.name:
-            warnings.append("Missing tool name in metadata")
-        return ValidationResult(is_valid=not errors, errors=errors, warnings=warnings)
-
-    def _create_fallback_scip_index(self, project_path: str, error_message: str) -> scip_pb2.Index:
-        """Create a minimal fallback SCIP index on failure."""
-        scip_index = scip_pb2.Index()
-        metadata = scip_pb2.Metadata()
-        metadata.tool_info.name = "code-index-mcp"
-        metadata.tool_info.version = "1.2.1"
-        metadata.project_root = project_path
-        metadata.text_document_encoding = scip_pb2.TextDocumentEncoding.UTF8
-        scip_index.metadata.CopyFrom(metadata)
-
-        error_doc = scip_pb2.Document()
-        error_doc.relative_path = "BUILD_ERROR.md"
-        error_doc.language = "markdown"
-        error_doc.text = f"# Build Error\n\nSCIP indexing failed: {error_message}\n"
-        scip_index.documents.append(error_doc)
-
-        return scip_index
-
-    def _get_file_extension(self, file_path: str) -> str:
-        """Extract file extension."""
-        return os.path.splitext(file_path)[1].lower()
-
-    def
-        get_strategy_summary(self) -> Dict[str, Any]:
-        """Get a summary of available strategies."""
-        return {
-            'total_strategies': len(self.scip_factory.strategies),
-            'registered_strategies': [s.get_strategy_name() for s in self.scip_factory.strategies]
-        }
diff --git a/src/code_index_mcp/indexing/shallow_index_manager.py b/src/code_index_mcp/indexing/shallow_index_manager.py
new file mode 100644
index 0000000..530c593
--- /dev/null
+++ b/src/code_index_mcp/indexing/shallow_index_manager.py
@@ -0,0 +1,155 @@
+"""
+Shallow Index Manager - Manages a minimal file-list-only index.
+
+This manager builds and loads a shallow index consisting of relative file
+paths only. It is optimized for fast initialization and filename-based
+search/browsing. Content parsing and symbol extraction are not performed.
+"""
+
+from __future__ import annotations
+
+import hashlib
+import json
+import logging
+import os
+import re
+import tempfile
+import threading
+from typing import List, Optional
+
+from .json_index_builder import JSONIndexBuilder
+from ..constants import SETTINGS_DIR, INDEX_FILE_SHALLOW
+
+logger = logging.getLogger(__name__)
+
+
+class ShallowIndexManager:
+    """Manage shallow (file-list) index lifecycle and storage."""
+
+    def __init__(self) -> None:
+        self.project_path: Optional[str] = None
+        self.index_builder: Optional[JSONIndexBuilder] = None
+        self.temp_dir: Optional[str] = None
+        self.index_path: Optional[str] = None
+        self._file_list: Optional[List[str]] = None
+        self._lock = threading.RLock()
+
+    def set_project_path(self, project_path: str) -> bool:
+        with self._lock:
+            try:
+                if not isinstance(project_path, str) or not project_path.strip():
+                    logger.error("Invalid project path for shallow index")
+                    return False
+                project_path = project_path.strip()
+                if not os.path.isdir(project_path):
+                    logger.error(f"Project path does not exist: {project_path}")
+                    return False
+
+                self.project_path = project_path
+                self.index_builder = JSONIndexBuilder(project_path)
+
+                project_hash = hashlib.md5(project_path.encode()).hexdigest()[:12]
+                self.temp_dir = os.path.join(tempfile.gettempdir(), SETTINGS_DIR, project_hash)
+                os.makedirs(self.temp_dir, exist_ok=True)
+                self.index_path = os.path.join(self.temp_dir, INDEX_FILE_SHALLOW)
+                return True
+            except Exception as e:  # noqa: BLE001 - centralized logging
+                logger.error(f"Failed to set project path (shallow): {e}")
+                return False
+
+    def build_index(self) -> bool:
+        """Build and persist the shallow file list index."""
+        with self._lock:
+            if not self.index_builder or not self.index_path:
+                logger.error("ShallowIndexManager not initialized")
+                return False
+            try:
+                file_list = self.index_builder.build_shallow_file_list()
+                with open(self.index_path, 'w', encoding='utf-8') as f:
+                    json.dump(file_list, f, ensure_ascii=False)
+                self._file_list = file_list
+                logger.info(f"Built shallow index with {len(file_list)} files")
+                return True
+            except Exception as e:  # noqa: BLE001
+                logger.error(f"Failed to build shallow index: {e}")
+                return False
+
+    def load_index(self) -> bool:
+        """Load shallow index from disk to memory."""
+        with self._lock:
+            try:
+                if not self.index_path or not os.path.exists(self.index_path):
+                    return False
+                with open(self.index_path, 'r', encoding='utf-8') as f:
+                    data = json.load(f)
+                if isinstance(data, list):
+                    # Normalize slashes/prefix
+                    normalized: List[str] = []
+                    for p in data:
+                        if isinstance(p, str):
+                            q = p.replace('\\\\', '/').replace('\\', '/')
+                            if q.startswith('./'):
+                                q = q[2:]
+                            normalized.append(q)
+                    self._file_list = normalized
+                    return True
+                return False
+            except Exception as e:  # noqa: BLE001
+                logger.error(f"Failed to load shallow index: {e}")
+                return False
+
+    def get_file_list(self) -> List[str]:
+        with self._lock:
+            return list(self._file_list or [])
+
+    def find_files(self, pattern: str = "*") -> List[str]:
+        with self._lock:
+            if not isinstance(pattern, str):
+                return []
+            norm = (pattern.strip() or "*").replace('\\\\', '/').replace('\\', '/')
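`set_project_path` derives a stable, collision-unlikely cache directory from an MD5 digest of the project path, so every project gets its own index location under the system temp dir. A standalone sketch of that layout (the `"code_indexer"` default is an assumption for illustration; the real value comes from `SETTINGS_DIR` in `constants.py`):

```python
import hashlib
import os
import tempfile


def shallow_index_dir(project_path: str, settings_dir: str = "code_indexer") -> str:
    """Stable per-project cache dir: <tmp>/<settings_dir>/<md5(path)[:12]>.
    MD5 is used only as a cheap stable hash here, not for security."""
    project_hash = hashlib.md5(project_path.encode()).hexdigest()[:12]
    return os.path.join(tempfile.gettempdir(), settings_dir, project_hash)
```

The same project path always maps to the same directory, and distinct projects get distinct directories with overwhelming probability.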
+            regex = self._compile_glob_regex(norm)
+            files = self._file_list or []
+            if norm == "*":
+                return list(files)
+            return [f for f in files if regex.match(f) is not None]
+
+    @staticmethod
+    def _compile_glob_regex(pattern: str) -> re.Pattern:
+        i = 0
+        out = []
+        special = ".^$+{}[]|()"
+        while i < len(pattern):
+            c = pattern[i]
+            if c == '*':
+                if i + 1 < len(pattern) and pattern[i + 1] == '*':
+                    out.append('.*')
+                    i += 2
+                    continue
+                else:
+                    out.append('[^/]*')
+            elif c == '?':
+                out.append('[^/]')
+            elif c in special:
+                out.append('\\' + c)
+            else:
+                out.append(c)
+            i += 1
+        return re.compile('^' + ''.join(out) + '$')
+
+    def cleanup(self) -> None:
+        with self._lock:
+            self.project_path = None
+            self.index_builder = None
+            self.temp_dir = None
+            self.index_path = None
+            self._file_list = None
+
+
+# Global singleton
+_shallow_manager = ShallowIndexManager()
+
+
+def get_shallow_index_manager() -> ShallowIndexManager:
+    return _shallow_manager
diff --git a/src/code_index_mcp/indexing/strategies/__init__.py b/src/code_index_mcp/indexing/strategies/__init__.py
new file mode 100644
index 0000000..0f51274
--- /dev/null
+++ b/src/code_index_mcp/indexing/strategies/__init__.py
@@ -0,0 +1,8 @@
+"""
+Parsing strategies for different programming languages.
+"""
+
+from .base_strategy import ParsingStrategy
+from .strategy_factory import StrategyFactory
+
+__all__ = ['ParsingStrategy', 'StrategyFactory']
\ No newline at end of file
diff --git a/src/code_index_mcp/indexing/strategies/base_strategy.py b/src/code_index_mcp/indexing/strategies/base_strategy.py
new file mode 100644
index 0000000..691dce0
--- /dev/null
+++ b/src/code_index_mcp/indexing/strategies/base_strategy.py
@@ -0,0 +1,87 @@
+"""
+Abstract base class for language parsing strategies.
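`_compile_glob_regex` gives the shallow index gitignore-like semantics: `**` crosses directory separators, `*` stays within one path segment, and `?` matches exactly one non-separator character. The same translation, extracted into a standalone function for clarity:

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Same scheme as _compile_glob_regex: '**' -> '.*', '*' -> '[^/]*',
    '?' -> '[^/]', and regex metacharacters escaped literally."""
    i, out, special = 0, [], ".^$+{}[]|()"
    while i < len(pattern):
        c = pattern[i]
        if c == '*':
            if pattern[i + 1:i + 2] == '*':
                out.append('.*')   # '**' may span '/' boundaries
                i += 2
                continue
            out.append('[^/]*')    # single '*' stops at '/'
        elif c == '?':
            out.append('[^/]')
        elif c in special:
            out.append('\\' + c)
        else:
            out.append(c)
        i += 1
    return re.compile('^' + ''.join(out) + '$')
```

So `src/**/*.py` matches files at any depth under `src/`, while `*.py` only matches top-level files.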
+"""
+
+import os
+from abc import ABC, abstractmethod
+from typing import Dict, List, Tuple, Optional
+from ..models import SymbolInfo, FileInfo
+
+
+class ParsingStrategy(ABC):
+    """Abstract base class for language parsing strategies."""
+
+    @abstractmethod
+    def get_language_name(self) -> str:
+        """Return the language name this strategy handles."""
+
+    @abstractmethod
+    def get_supported_extensions(self) -> List[str]:
+        """Return list of file extensions this strategy supports."""
+
+    @abstractmethod
+    def parse_file(self, file_path: str, content: str) -> Tuple[Dict[str, SymbolInfo], FileInfo]:
+        """
+        Parse file content and extract symbols.
+
+        Args:
+            file_path: Path to the file being parsed
+            content: File content as string
+
+        Returns:
+            Tuple of (symbols_dict, file_info)
+            - symbols_dict: Maps symbol_id -> SymbolInfo
+            - file_info: FileInfo with metadata about the file
+        """
+
+    def _create_symbol_id(self, file_path: str, symbol_name: str) -> str:
+        """
+        Create a unique symbol ID.
+
+        Args:
+            file_path: Path to the file containing the symbol
+            symbol_name: Name of the symbol
+
+        Returns:
+            Unique symbol identifier in format "relative_path::symbol_name"
+        """
+        relative_path = self._get_relative_path(file_path)
+        return f"{relative_path}::{symbol_name}"
+
+    def _get_relative_path(self, file_path: str) -> str:
+        """Convert absolute file path to relative path."""
+        parts = file_path.replace('\\', '/').split('/')
+
+        # Priority order: test > src (outermost project roots first)
+        for root_dir in ['test', 'src']:
+            if root_dir in parts:
+                root_index = parts.index(root_dir)
+                relative_parts = parts[root_index:]
+                return '/'.join(relative_parts)
+
+        # Fallback: use just filename
+        return os.path.basename(file_path)
+
+    def _extract_line_number(self, content: str, symbol_position: int) -> int:
+        """
+        Extract line number from character position in content.
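The base strategy identifies every symbol as `relative_path::symbol_name`, with the relative path anchored at the first `test` or `src` path segment. The same logic in standalone form:

```python
import os


def symbol_id(file_path: str, symbol_name: str) -> str:
    """Replicates _create_symbol_id/_get_relative_path: anchor at the
    first 'test' or 'src' segment, else fall back to the bare filename."""
    parts = file_path.replace('\\', '/').split('/')
    for root_dir in ('test', 'src'):
        if root_dir in parts:
            rel = '/'.join(parts[parts.index(root_dir):])
            break
    else:
        rel = os.path.basename(file_path)
    return f"{rel}::{symbol_name}"
```

Note the trade-off: two symbols with the same name in files outside any recognized root collapse to the same `filename::name` ID.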
+
+        Args:
+            content: File content
+            symbol_position: Character position in content
+
+        Returns:
+            Line number (1-based)
+        """
+        return content[:symbol_position].count('\n') + 1
+
+    def _get_file_name(self, file_path: str) -> str:
+        """Get just the filename from a full path."""
+        return os.path.basename(file_path)
+
+    def _safe_extract_text(self, content: str, start: int, end: int) -> str:
+        """Safely extract text from content, handling bounds."""
+        try:
+            return content[start:end].strip()
+        except (IndexError, TypeError):
+            return ""
diff --git a/src/code_index_mcp/indexing/strategies/fallback_strategy.py b/src/code_index_mcp/indexing/strategies/fallback_strategy.py
new file mode 100644
index 0000000..21653bd
--- /dev/null
+++ b/src/code_index_mcp/indexing/strategies/fallback_strategy.py
@@ -0,0 +1,46 @@
+"""
+Fallback parsing strategy for unsupported languages and file types.
+"""
+
+import os
+from typing import Dict, List, Tuple
+from .base_strategy import ParsingStrategy
+from ..models import SymbolInfo, FileInfo
+
+
+class FallbackParsingStrategy(ParsingStrategy):
+    """Fallback parser for unsupported languages and file types."""
+
+    def __init__(self, language_name: str = "unknown"):
+        self.language_name = language_name
+
+    def get_language_name(self) -> str:
+        return self.language_name
+
+    def get_supported_extensions(self) -> List[str]:
+        return []  # Fallback supports any extension
+
+    def parse_file(self, file_path: str, content: str) -> Tuple[Dict[str, SymbolInfo], FileInfo]:
+        """Basic parsing: extract file information without symbol parsing."""
+        symbols = {}
+
+        # For document files, we can at least index their existence
+        file_info = FileInfo(
+            language=self.language_name,
+            line_count=len(content.splitlines()),
+            symbols={"functions": [], "classes": []},
+            imports=[]
+        )
+
+        # For document files (e.g.
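`_extract_line_number` converts a character offset into a 1-based line number by counting newlines in the prefix. A quick standalone check of that rule:

```python
def line_of_offset(content: str, offset: int) -> int:
    """1-based line number for a character offset, as in
    _extract_line_number: count newlines before the offset, plus one."""
    return content[:offset].count('\n') + 1
```

This is O(n) per call; strategies that need many lookups typically precompute line-start offsets instead, but for a handful of symbols per file the simple form is fine.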
+        # .md, .txt, .json), we can add a symbol representing the file itself
+        if self.language_name in ['markdown', 'text', 'json', 'yaml', 'xml', 'config', 'css', 'html']:
+            filename = os.path.basename(file_path)
+            symbol_id = self._create_symbol_id(file_path, f"file:{filename}")
+            symbols[symbol_id] = SymbolInfo(
+                type="file",
+                file=file_path,
+                line=1,
+                signature=f"{self.language_name} file: {filename}"
+            )
+
+        return symbols, file_info
diff --git a/src/code_index_mcp/indexing/strategies/go_strategy.py b/src/code_index_mcp/indexing/strategies/go_strategy.py
new file mode 100644
index 0000000..b3a95cb
--- /dev/null
+++ b/src/code_index_mcp/indexing/strategies/go_strategy.py
@@ -0,0 +1,164 @@
+"""
+Go parsing strategy using regex patterns.
+"""
+
+import re
+from typing import Dict, List, Tuple, Optional
+from .base_strategy import ParsingStrategy
+from ..models import SymbolInfo, FileInfo
+
+
+class GoParsingStrategy(ParsingStrategy):
+    """Go-specific parsing strategy using regex patterns."""
+
+    def get_language_name(self) -> str:
+        return "go"
+
+    def get_supported_extensions(self) -> List[str]:
+        return ['.go']
+
+    def parse_file(self, file_path: str, content: str) -> Tuple[Dict[str, SymbolInfo], FileInfo]:
+        """Parse Go file using regex patterns."""
+        symbols = {}
+        functions = []
+        classes = []  # Go doesn't have classes, but we'll track structs/interfaces
+        imports = []
+        package = None
+
+        lines = content.splitlines()
+
+        for i, line in enumerate(lines):
+            line = line.strip()
+
+            # Package declaration
+            if line.startswith('package '):
+                package = line.split('package ')[1].strip()
+
+            # Import statements
+            elif line.startswith('import '):
+                import_match = re.search(r'import\s+"([^"]+)"', line)
+                if import_match:
+                    imports.append(import_match.group(1))
+
+            # Function declarations
+            elif line.startswith('func '):
+                func_match = re.match(r'func\s+(\w+)\s*\(', line)
+                if func_match:
+                    func_name = func_match.group(1)
+                    symbol_id = self._create_symbol_id(file_path,
+                                                       func_name)
+                    symbols[symbol_id] = SymbolInfo(
+                        type="function",
+                        file=file_path,
+                        line=i + 1,
+                        signature=line
+                    )
+                    functions.append(func_name)
+
+                # Method declarations (func (receiver) methodName)
+                method_match = re.match(r'func\s+\([^)]+\)\s+(\w+)\s*\(', line)
+                if method_match:
+                    method_name = method_match.group(1)
+                    symbol_id = self._create_symbol_id(file_path, method_name)
+                    symbols[symbol_id] = SymbolInfo(
+                        type="method",
+                        file=file_path,
+                        line=i + 1,
+                        signature=line
+                    )
+                    functions.append(method_name)
+
+            # Struct declarations
+            elif re.match(r'type\s+\w+\s+struct\s*\{', line):
+                struct_match = re.match(r'type\s+(\w+)\s+struct', line)
+                if struct_match:
+                    struct_name = struct_match.group(1)
+                    symbol_id = self._create_symbol_id(file_path, struct_name)
+                    symbols[symbol_id] = SymbolInfo(
+                        type="struct",
+                        file=file_path,
+                        line=i + 1
+                    )
+                    classes.append(struct_name)
+
+            # Interface declarations
+            elif re.match(r'type\s+\w+\s+interface\s*\{', line):
+                interface_match = re.match(r'type\s+(\w+)\s+interface', line)
+                if interface_match:
+                    interface_name = interface_match.group(1)
+                    symbol_id = self._create_symbol_id(file_path, interface_name)
+                    symbols[symbol_id] = SymbolInfo(
+                        type="interface",
+                        file=file_path,
+                        line=i + 1
+                    )
+                    classes.append(interface_name)
+
+        # Phase 2: Add call relationship analysis
+        self._analyze_go_calls(content, symbols, file_path)
+
+        file_info = FileInfo(
+            language=self.get_language_name(),
+            line_count=len(lines),
+            symbols={"functions": functions, "classes": classes},
+            imports=imports,
+            package=package
+        )
+
+        return symbols, file_info
+
+    def _analyze_go_calls(self, content: str, symbols: Dict[str, SymbolInfo], file_path: str):
+        """Analyze Go function calls for relationships."""
+        lines = content.splitlines()
+        current_function = None
+        is_function_declaration_line = False
+
+        for i, line in enumerate(lines):
+            original_line = line
+            line = line.strip()
+
+            # Track current function context
+            if
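The Go strategy is regex-only, so its declaration patterns can be exercised directly. A few sanity checks on the exact patterns used above (the classifier function is illustrative; note that the plain-function pattern cannot match a method line, because `(\w+)` fails on the receiver's opening parenthesis):

```python
import re

# The same patterns the Go strategy matches line-by-line.
GO_METHOD = re.compile(r'func\s+\([^)]+\)\s+(\w+)\s*\(')
GO_FUNC = re.compile(r'func\s+(\w+)\s*\(')
GO_STRUCT = re.compile(r'type\s+(\w+)\s+struct\s*\{')


def go_decl_name(line: str):
    """Classify one stripped Go source line (method/function/struct)."""
    for kind, rx in (("method", GO_METHOD), ("function", GO_FUNC), ("struct", GO_STRUCT)):
        m = rx.match(line)
        if m:
            return kind, m.group(1)
    return None
```

Line-based regex parsing like this misses multi-line declarations and anything inside comments or strings, which is the usual trade-off versus a tree-sitter grammar.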
+               line.startswith('func '):
+                func_name = self._extract_go_function_name(line)
+                if func_name:
+                    current_function = self._create_symbol_id(file_path, func_name)
+                    is_function_declaration_line = True
+            else:
+                is_function_declaration_line = False
+
+            # Find function calls: functionName() or obj.methodName()
+            # Skip the function declaration line itself to avoid false self-calls
+            if current_function and not is_function_declaration_line and ('(' in line and ')' in line):
+                called_functions = self._extract_go_called_functions(line)
+                for called_func in called_functions:
+                    # Find the called function in symbols and add relationship
+                    for symbol_id, symbol_info in symbols.items():
+                        if called_func in symbol_id.split("::")[-1]:
+                            if current_function not in symbol_info.called_by:
+                                symbol_info.called_by.append(current_function)
+
+    def _extract_go_function_name(self, line: str) -> Optional[str]:
+        """Extract function name from Go function declaration."""
+        try:
+            # func functionName(...) or func (receiver) methodName(...)
+            match = re.match(r'func\s+(?:\([^)]*\)\s+)?(\w+)\s*\(', line)
+            if match:
+                return match.group(1)
+        except Exception:
+            pass
+        return None
+
+    def _extract_go_called_functions(self, line: str) -> List[str]:
+        """Extract function names that are being called in this line."""
+        called_functions = []
+
+        # Find patterns like: functionName( or obj.methodName(
+        patterns = [
+            r'(\w+)\s*\(',    # functionName(
+            r'\.(\w+)\s*\(',  # .methodName(
+        ]
+
+        for pattern in patterns:
+            matches = re.findall(pattern, line)
+            called_functions.extend(matches)
+
+        return called_functions
diff --git a/src/code_index_mcp/indexing/strategies/java_strategy.py b/src/code_index_mcp/indexing/strategies/java_strategy.py
new file mode 100644
index 0000000..af2ff8e
--- /dev/null
+++ b/src/code_index_mcp/indexing/strategies/java_strategy.py
@@ -0,0 +1,209 @@
+"""
+Java parsing strategy using tree-sitter - Optimized single-pass version.
+"""
+
+import logging
+from typing import Dict, List, Tuple, Optional
+
+import tree_sitter
+from tree_sitter_java import language
+
+from .base_strategy import ParsingStrategy
+from ..models import SymbolInfo, FileInfo
+
+logger = logging.getLogger(__name__)
+
+
+class JavaParsingStrategy(ParsingStrategy):
+    """Java-specific parsing strategy - Single Pass Optimized."""
+
+    def __init__(self):
+        self.java_language = tree_sitter.Language(language())
+
+    def get_language_name(self) -> str:
+        return "java"
+
+    def get_supported_extensions(self) -> List[str]:
+        return ['.java']
+
+    def parse_file(self, file_path: str, content: str) -> Tuple[Dict[str, SymbolInfo], FileInfo]:
+        """Parse Java file using tree-sitter with single-pass optimization."""
+        symbols = {}
+        functions = []
+        classes = []
+        imports = []
+        package = None
+
+        # Symbol lookup index for O(1) access
+        symbol_lookup = {}  # name -> symbol_id mapping
+
+        parser = tree_sitter.Parser(self.java_language)
+
+        try:
+            tree = parser.parse(content.encode('utf8'))
+
+            # Extract package info first
+            for node in tree.root_node.children:
+                if node.type == 'package_declaration':
+                    package = self._extract_java_package(node, content)
+                    break
+
+            # Single-pass traversal that handles everything
+            context = TraversalContext(
+                content=content,
+                file_path=file_path,
+                symbols=symbols,
+                functions=functions,
+                classes=classes,
+                imports=imports,
+                symbol_lookup=symbol_lookup
+            )
+
+            self._traverse_node_single_pass(tree.root_node, context)
+
+        except Exception as e:
+            logger.warning(f"Error parsing Java file {file_path}: {e}")
+
+        file_info = FileInfo(
+            language=self.get_language_name(),
+            line_count=len(content.splitlines()),
+            symbols={"functions": functions, "classes": classes},
+            imports=imports,
+            package=package
+        )
+
+        return symbols, file_info
+
+    def _traverse_node_single_pass(self, node, context: 'TraversalContext',
+                                   current_class: Optional[str] = None,
+                                   current_method: Optional[str] = None):
+        """Single-pass
+        traversal that extracts symbols and analyzes calls."""
+
+        # Handle class declarations
+        if node.type == 'class_declaration':
+            name = self._get_java_class_name(node, context.content)
+            if name:
+                symbol_id = self._create_symbol_id(context.file_path, name)
+                symbol_info = SymbolInfo(
+                    type="class",
+                    file=context.file_path,
+                    line=node.start_point[0] + 1
+                )
+                context.symbols[symbol_id] = symbol_info
+                context.symbol_lookup[name] = symbol_id
+                context.classes.append(name)
+
+                # Traverse class body with updated context
+                for child in node.children:
+                    self._traverse_node_single_pass(child, context, current_class=name,
+                                                    current_method=current_method)
+                return
+
+        # Handle method declarations
+        elif node.type == 'method_declaration':
+            name = self._get_java_method_name(node, context.content)
+            if name:
+                # Build full method name with class context
+                if current_class:
+                    full_name = f"{current_class}.{name}"
+                else:
+                    full_name = name
+
+                symbol_id = self._create_symbol_id(context.file_path, full_name)
+                symbol_info = SymbolInfo(
+                    type="method",
+                    file=context.file_path,
+                    line=node.start_point[0] + 1,
+                    signature=self._get_java_method_signature(node, context.content)
+                )
+                context.symbols[symbol_id] = symbol_info
+                context.symbol_lookup[full_name] = symbol_id
+                context.symbol_lookup[name] = symbol_id  # Also index by method name alone
+                context.functions.append(full_name)
+
+                # Traverse method body with updated context
+                for child in node.children:
+                    self._traverse_node_single_pass(child, context, current_class=current_class,
+                                                    current_method=symbol_id)
+                return
+
+        # Handle method invocations (calls)
+        elif node.type == 'method_invocation':
+            if current_method:
+                called_method = self._get_called_method_name(node, context.content)
+                if called_method:
+                    # Use O(1) lookup instead of O(n) iteration
+                    if called_method in context.symbol_lookup:
+                        symbol_id = context.symbol_lookup[called_method]
+                        symbol_info = context.symbols[symbol_id]
+                        if current_method not in
+                                                symbol_info.called_by:
+                            symbol_info.called_by.append(current_method)
+                    else:
+                        # Try to find method with class prefix
+                        for name, sid in context.symbol_lookup.items():
+                            if name.endswith(f".{called_method}"):
+                                symbol_info = context.symbols[sid]
+                                if current_method not in symbol_info.called_by:
+                                    symbol_info.called_by.append(current_method)
+                                break
+
+        # Handle import declarations
+        elif node.type == 'import_declaration':
+            import_text = context.content[node.start_byte:node.end_byte]
+            # Extract the import path (remove 'import' keyword and semicolon)
+            import_path = import_text.replace('import', '').replace(';', '').strip()
+            if import_path:
+                context.imports.append(import_path)
+
+        # Continue traversing children for other node types
+        for child in node.children:
+            self._traverse_node_single_pass(child, context, current_class=current_class,
+                                            current_method=current_method)
+
+    def _get_java_class_name(self, node, content: str) -> Optional[str]:
+        for child in node.children:
+            if child.type == 'identifier':
+                return content[child.start_byte:child.end_byte]
+        return None
+
+    def _get_java_method_name(self, node, content: str) -> Optional[str]:
+        for child in node.children:
+            if child.type == 'identifier':
+                return content[child.start_byte:child.end_byte]
+        return None
+
+    def _get_java_method_signature(self, node, content: str) -> str:
+        return content[node.start_byte:node.end_byte].split('\n')[0].strip()
+
+    def _extract_java_package(self, node, content: str) -> Optional[str]:
+        for child in node.children:
+            if child.type == 'scoped_identifier':
+                return content[child.start_byte:child.end_byte]
+        return None
+
+    def _get_called_method_name(self, node, content: str) -> Optional[str]:
+        """Extract called method name from method invocation node."""
+        # Handle obj.method() pattern - look for the method name after the dot
+        for child in node.children:
+            if child.type == 'field_access':
+                # For field_access nodes, get the field (method) name
+                for subchild in
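The call-resolution step above is a two-tier lookup: an O(1) dict hit on the exact name first, then a linear scan for any `Class.method` key whose suffix matches. That resolution logic in isolation (illustrative names):

```python
from typing import Dict, Optional


def resolve_call(called: str, symbol_lookup: Dict[str, str]) -> Optional[str]:
    """Exact-name hit first (O(1)); otherwise scan for any
    'Class.method' key ending in the called name."""
    if called in symbol_lookup:
        return symbol_lookup[called]
    for name, sid in symbol_lookup.items():
        if name.endswith(f".{called}"):
            return sid
    return None
```

Because the strategy also indexes each method under its bare name, the slow suffix path only runs for names that were never declared in the file being parsed.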
+                                 child.children:
+                    if subchild.type == 'identifier' and subchild.start_byte > child.start_byte:
+                        # Get the rightmost identifier (the method name)
+                        return content[subchild.start_byte:subchild.end_byte]
+            elif child.type == 'identifier':
+                # Direct method call without object reference
+                return content[child.start_byte:child.end_byte]
+        return None
+
+
+class TraversalContext:
+    """Context object to pass state during single-pass traversal."""
+
+    def __init__(self, content: str, file_path: str, symbols: Dict,
+                 functions: List, classes: List, imports: List, symbol_lookup: Dict):
+        self.content = content
+        self.file_path = file_path
+        self.symbols = symbols
+        self.functions = functions
+        self.classes = classes
+        self.imports = imports
+        self.symbol_lookup = symbol_lookup
\ No newline at end of file
diff --git a/src/code_index_mcp/indexing/strategies/javascript_strategy.py b/src/code_index_mcp/indexing/strategies/javascript_strategy.py
new file mode 100644
index 0000000..63c78f7
--- /dev/null
+++ b/src/code_index_mcp/indexing/strategies/javascript_strategy.py
@@ -0,0 +1,154 @@
+"""
+JavaScript parsing strategy using tree-sitter.
+"""
+
+import logging
+from typing import Dict, List, Tuple, Optional
+import tree_sitter
+from tree_sitter_javascript import language
+from .base_strategy import ParsingStrategy
+from ..models import SymbolInfo, FileInfo
+
+logger = logging.getLogger(__name__)
+
+
+class JavaScriptParsingStrategy(ParsingStrategy):
+    """JavaScript-specific parsing strategy using tree-sitter."""
+
+    def __init__(self):
+        self.js_language = tree_sitter.Language(language())
+
+    def get_language_name(self) -> str:
+        return "javascript"
+
+    def get_supported_extensions(self) -> List[str]:
+        return ['.js', '.jsx', '.mjs', '.cjs']
+
+    def parse_file(self, file_path: str, content: str) -> Tuple[Dict[str, SymbolInfo], FileInfo]:
+        """Parse JavaScript file using tree-sitter."""
+        symbols = {}
+        functions = []
+        classes = []
+        imports = []
+        exports = []
+
+        parser = tree_sitter.Parser(self.js_language)
+        tree = parser.parse(content.encode('utf8'))
+        self._traverse_js_node(tree.root_node, content, file_path, symbols, functions, classes, imports, exports)
+
+        file_info = FileInfo(
+            language=self.get_language_name(),
+            line_count=len(content.splitlines()),
+            symbols={"functions": functions, "classes": classes},
+            imports=imports,
+            exports=exports
+        )
+
+        return symbols, file_info
+
+    def _traverse_js_node(self, node, content: str, file_path: str, symbols: Dict[str, SymbolInfo],
+                          functions: List[str], classes: List[str], imports: List[str], exports: List[str]):
+        """Traverse JavaScript AST node."""
+        if node.type == 'function_declaration':
+            name = self._get_function_name(node, content)
+            if name:
+                symbol_id = self._create_symbol_id(file_path, name)
+                signature = self._get_js_function_signature(node, content)
+                symbols[symbol_id] = SymbolInfo(
+                    type="function",
+                    file=file_path,
+                    line=node.start_point[0] + 1,
+                    signature=signature
+                )
+                functions.append(name)
+
+        # Handle arrow functions and function expressions in lexical declarations (const/let)
+        elif node.type in
+                          ['lexical_declaration', 'variable_declaration']:
+            # Look for const/let/var name = arrow_function or function_expression
+            for child in node.children:
+                if child.type == 'variable_declarator':
+                    name_node = None
+                    value_node = None
+                    for declarator_child in child.children:
+                        if declarator_child.type == 'identifier':
+                            name_node = declarator_child
+                        elif declarator_child.type in ['arrow_function', 'function_expression', 'function']:
+                            value_node = declarator_child
+
+                    if name_node and value_node:
+                        name = content[name_node.start_byte:name_node.end_byte]
+                        symbol_id = self._create_symbol_id(file_path, name)
+                        # Create signature from the declaration
+                        signature = content[child.start_byte:child.end_byte].split('\n')[0].strip()
+                        symbols[symbol_id] = SymbolInfo(
+                            type="function",
+                            file=file_path,
+                            line=child.start_point[0] + 1,  # Use child position, not parent
+                            signature=signature
+                        )
+                        functions.append(name)
+
+        elif node.type == 'class_declaration':
+            name = self._get_class_name(node, content)
+            if name:
+                symbol_id = self._create_symbol_id(file_path, name)
+                symbols[symbol_id] = SymbolInfo(
+                    type="class",
+                    file=file_path,
+                    line=node.start_point[0] + 1
+                )
+                classes.append(name)
+
+        elif node.type == 'method_definition':
+            method_name = self._get_method_name(node, content)
+            class_name = self._find_parent_class(node, content)
+            if method_name and class_name:
+                full_name = f"{class_name}.{method_name}"
+                symbol_id = self._create_symbol_id(file_path, full_name)
+                signature = self._get_js_function_signature(node, content)
+                symbols[symbol_id] = SymbolInfo(
+                    type="method",
+                    file=file_path,
+                    line=node.start_point[0] + 1,
+                    signature=signature
+                )
+                # Add method to functions list for consistency
+                functions.append(full_name)
+
+        # Continue traversing children
+        for child in node.children:
+            self._traverse_js_node(child, content, file_path, symbols, functions, classes, imports, exports)
+
+    def _get_function_name(self, node, content: str) -> Optional[str]:
"""Extract function name from tree-sitter node.""" + for child in node.children: + if child.type == 'identifier': + return content[child.start_byte:child.end_byte] + return None + + def _get_class_name(self, node, content: str) -> Optional[str]: + """Extract class name from tree-sitter node.""" + for child in node.children: + if child.type == 'identifier': + return content[child.start_byte:child.end_byte] + return None + + def _get_method_name(self, node, content: str) -> Optional[str]: + """Extract method name from tree-sitter node.""" + for child in node.children: + if child.type == 'property_identifier': + return content[child.start_byte:child.end_byte] + return None + + def _find_parent_class(self, node, content: str) -> Optional[str]: + """Find the parent class of a method.""" + parent = node.parent + while parent: + if parent.type == 'class_declaration': + return self._get_class_name(parent, content) + parent = parent.parent + return None + + def _get_js_function_signature(self, node, content: str) -> str: + """Extract JavaScript function signature.""" + return content[node.start_byte:node.end_byte].split('\n')[0].strip() diff --git a/src/code_index_mcp/indexing/strategies/objective_c_strategy.py b/src/code_index_mcp/indexing/strategies/objective_c_strategy.py new file mode 100644 index 0000000..4226f1c --- /dev/null +++ b/src/code_index_mcp/indexing/strategies/objective_c_strategy.py @@ -0,0 +1,154 @@ +""" +Objective-C parsing strategy using regex patterns. 
+""" + +import re +from typing import Dict, List, Tuple, Optional +from .base_strategy import ParsingStrategy +from ..models import SymbolInfo, FileInfo + + +class ObjectiveCParsingStrategy(ParsingStrategy): + """Objective-C parsing strategy using regex patterns.""" + + def get_language_name(self) -> str: + return "objective-c" + + def get_supported_extensions(self) -> List[str]: + return ['.m', '.mm'] + + def parse_file(self, file_path: str, content: str) -> Tuple[Dict[str, SymbolInfo], FileInfo]: + """Parse Objective-C file using regex patterns.""" + symbols = {} + functions = [] + classes = [] + imports = [] + + lines = content.splitlines() + current_class = None + + for i, line in enumerate(lines): + line = line.strip() + + # Import statements + if line.startswith('#import ') or line.startswith('#include '): + import_match = re.search(r'#(?:import|include)\s+[<"]([^>"]+)[>"]', line) + if import_match: + imports.append(import_match.group(1)) + + # Interface declarations + elif line.startswith('@interface '): + interface_match = re.match(r'@interface\s+(\w+)', line) + if interface_match: + class_name = interface_match.group(1) + current_class = class_name + symbol_id = self._create_symbol_id(file_path, class_name) + symbols[symbol_id] = SymbolInfo( + type="class", + file=file_path, + line=i + 1 + ) + classes.append(class_name) + + # Implementation declarations + elif line.startswith('@implementation '): + impl_match = re.match(r'@implementation\s+(\w+)', line) + if impl_match: + current_class = impl_match.group(1) + + # Method declarations + elif line.startswith(('- (', '+ (')): + method_match = re.search(r'[+-]\s*\([^)]+\)\s*(\w+)', line) + if method_match: + method_name = method_match.group(1) + full_name = f"{current_class}.{method_name}" if current_class else method_name + symbol_id = self._create_symbol_id(file_path, full_name) + symbols[symbol_id] = SymbolInfo( + type="method", + file=file_path, + line=i + 1, + signature=line + ) + 
functions.append(full_name) + + # C function declarations + elif re.match(r'\w+.*\s+\w+\s*\([^)]*\)\s*\{?', line) and not line.startswith(('if', 'for', 'while')): + func_match = re.search(r'\s(\w+)\s*\([^)]*\)', line) + if func_match: + func_name = func_match.group(1) + symbol_id = self._create_symbol_id(file_path, func_name) + symbols[symbol_id] = SymbolInfo( + type="function", + file=file_path, + line=i + 1, + signature=line + ) + functions.append(func_name) + + # End of class + elif line == '@end': + current_class = None + + # Phase 2: Add call relationship analysis + self._analyze_objc_calls(content, symbols, file_path) + + file_info = FileInfo( + language=self.get_language_name(), + line_count=len(lines), + symbols={"functions": functions, "classes": classes}, + imports=imports + ) + + return symbols, file_info + + def _analyze_objc_calls(self, content: str, symbols: Dict[str, SymbolInfo], file_path: str): + """Analyze Objective-C method calls for relationships.""" + lines = content.splitlines() + current_function = None + + for i, line in enumerate(lines): + original_line = line + line = line.strip() + + # Track current method context + if line.startswith('- (') or line.startswith('+ ('): + func_name = self._extract_objc_method_name(line) + if func_name: + current_function = self._create_symbol_id(file_path, func_name) + + # Find method calls: [obj methodName] or functionName() + if current_function and ('[' in line and ']' in line or ('(' in line and ')' in line)): + called_functions = self._extract_objc_called_functions(line) + for called_func in called_functions: + # Find the called function in symbols and add relationship + for symbol_id, symbol_info in symbols.items(): + if called_func in symbol_id.split("::")[-1]: + if current_function not in symbol_info.called_by: + symbol_info.called_by.append(current_function) + + def _extract_objc_method_name(self, line: str) -> Optional[str]: + """Extract method name from Objective-C method declaration.""" + try: + 
# - (returnType)methodName:(params) or + (returnType)methodName + match = re.search(r'[+-]\s*\([^)]*\)\s*(\w+)', line) + if match: + return match.group(1) + except Exception: + pass + return None + + def _extract_objc_called_functions(self, line: str) -> List[str]: + """Extract method names that are being called in this line.""" + called_functions = [] + + # Find patterns like: [obj methodName] or functionName( + patterns = [ + r'\[\s*\w+\s+(\w+)\s*[\]:]', # [obj methodName] + r'(\w+)\s*\(', # functionName( + ] + + for pattern in patterns: + matches = re.findall(pattern, line) + called_functions.extend(matches) + + return called_functions diff --git a/src/code_index_mcp/indexing/strategies/python_strategy.py b/src/code_index_mcp/indexing/strategies/python_strategy.py new file mode 100644 index 0000000..a09d00c --- /dev/null +++ b/src/code_index_mcp/indexing/strategies/python_strategy.py @@ -0,0 +1,264 @@ +""" +Python parsing strategy using AST - Optimized single-pass version. +""" + +import ast +import logging +from typing import Dict, List, Tuple, Optional, Set +from .base_strategy import ParsingStrategy +from ..models import SymbolInfo, FileInfo + +logger = logging.getLogger(__name__) + + +class PythonParsingStrategy(ParsingStrategy): + """Python-specific parsing strategy using Python's built-in AST - Single Pass Optimized.""" + + def get_language_name(self) -> str: + return "python" + + def get_supported_extensions(self) -> List[str]: + return ['.py', '.pyw'] + + def parse_file(self, file_path: str, content: str) -> Tuple[Dict[str, SymbolInfo], FileInfo]: + """Parse Python file using AST with single-pass optimization.""" + symbols = {} + functions = [] + classes = [] + imports = [] + + try: + tree = ast.parse(content) + # Single-pass visitor that handles everything at once + visitor = SinglePassVisitor(symbols, functions, classes, imports, file_path) + visitor.visit(tree) + except SyntaxError as e: + logger.warning(f"Syntax error in Python file {file_path}: {e}") + except
Exception as e: + logger.warning(f"Error parsing Python file {file_path}: {e}") + + file_info = FileInfo( + language=self.get_language_name(), + line_count=len(content.splitlines()), + symbols={"functions": functions, "classes": classes}, + imports=imports + ) + + return symbols, file_info + + +class SinglePassVisitor(ast.NodeVisitor): + """Single-pass AST visitor that extracts symbols and analyzes calls in one traversal.""" + + def __init__(self, symbols: Dict[str, SymbolInfo], functions: List[str], + classes: List[str], imports: List[str], file_path: str): + self.symbols = symbols + self.functions = functions + self.classes = classes + self.imports = imports + self.file_path = file_path + + # Context tracking for call analysis + self.current_function_stack = [] + self.current_class = None + + # Symbol lookup index for O(1) access + self.symbol_lookup = {} # name -> symbol_id mapping for fast lookups + + # Track processed nodes to avoid duplicates + self.processed_nodes: Set[int] = set() + + def visit_ClassDef(self, node: ast.ClassDef): + """Visit class definition - extract symbol and analyze in single pass.""" + class_name = node.name + symbol_id = self._create_symbol_id(self.file_path, class_name) + + # Extract docstring + docstring = ast.get_docstring(node) + + # Create symbol info + symbol_info = SymbolInfo( + type="class", + file=self.file_path, + line=node.lineno, + docstring=docstring + ) + + # Store in symbols and lookup index + self.symbols[symbol_id] = symbol_info + self.symbol_lookup[class_name] = symbol_id + self.classes.append(class_name) + + # Track class context for method processing + old_class = self.current_class + self.current_class = class_name + + # Process class body (including methods) + for child in node.body: + if isinstance(child, ast.FunctionDef): + self._handle_method(child, class_name) + else: + # Visit other nodes in class body + self.visit(child) + + # Restore previous class context + self.current_class = old_class + + def 
visit_FunctionDef(self, node: ast.FunctionDef): + """Visit function definition - extract symbol and track context.""" + # Skip if this is a method (already handled by ClassDef) + if self.current_class: + return + + # Skip if already processed + node_id = id(node) + if node_id in self.processed_nodes: + return + self.processed_nodes.add(node_id) + + func_name = node.name + symbol_id = self._create_symbol_id(self.file_path, func_name) + + # Extract function signature and docstring + signature = self._extract_function_signature(node) + docstring = ast.get_docstring(node) + + # Create symbol info + symbol_info = SymbolInfo( + type="function", + file=self.file_path, + line=node.lineno, + signature=signature, + docstring=docstring + ) + + # Store in symbols and lookup index + self.symbols[symbol_id] = symbol_info + self.symbol_lookup[func_name] = symbol_id + self.functions.append(func_name) + + # Track function context for call analysis + function_id = f"{self.file_path}::{func_name}" + self.current_function_stack.append(function_id) + + # Visit function body to analyze calls + self.generic_visit(node) + + # Pop function from stack + self.current_function_stack.pop() + + def _handle_method(self, node: ast.FunctionDef, class_name: str): + """Handle method definition within a class.""" + method_name = f"{class_name}.{node.name}" + method_symbol_id = self._create_symbol_id(self.file_path, method_name) + + method_signature = self._extract_function_signature(node) + method_docstring = ast.get_docstring(node) + + # Create symbol info + symbol_info = SymbolInfo( + type="method", + file=self.file_path, + line=node.lineno, + signature=method_signature, + docstring=method_docstring + ) + + # Store in symbols and lookup index + self.symbols[method_symbol_id] = symbol_info + self.symbol_lookup[method_name] = method_symbol_id + self.symbol_lookup[node.name] = method_symbol_id # Also index by method name alone + self.functions.append(method_name) + + # Track method context for call 
analysis + function_id = f"{self.file_path}::{method_name}" + self.current_function_stack.append(function_id) + + # Visit method body to analyze calls + for child in node.body: + self.visit(child) + + # Pop method from stack + self.current_function_stack.pop() + + def visit_Import(self, node: ast.Import): + """Handle import statements.""" + for alias in node.names: + self.imports.append(alias.name) + self.generic_visit(node) + + def visit_ImportFrom(self, node: ast.ImportFrom): + """Handle from...import statements.""" + if node.module: + for alias in node.names: + self.imports.append(f"{node.module}.{alias.name}") + self.generic_visit(node) + + def visit_Call(self, node: ast.Call): + """Visit function call and record relationship using O(1) lookup.""" + if not self.current_function_stack: + self.generic_visit(node) + return + + try: + # Get the function name being called + called_function = None + + if isinstance(node.func, ast.Name): + # Direct function call: function_name() + called_function = node.func.id + elif isinstance(node.func, ast.Attribute): + # Method call: obj.method() or module.function() + called_function = node.func.attr + + if called_function: + # Get the current calling function + caller_function = self.current_function_stack[-1] + + # Use O(1) lookup instead of O(n) iteration + # First try exact match + if called_function in self.symbol_lookup: + symbol_id = self.symbol_lookup[called_function] + symbol_info = self.symbols[symbol_id] + if symbol_info.type in ["function", "method"]: + if caller_function not in symbol_info.called_by: + symbol_info.called_by.append(caller_function) + else: + # Try method name match for any class + for name, symbol_id in self.symbol_lookup.items(): + if name.endswith(f".{called_function}"): + symbol_info = self.symbols[symbol_id] + if symbol_info.type in ["function", "method"]: + if caller_function not in symbol_info.called_by: + symbol_info.called_by.append(caller_function) + break + except Exception: + # Silently 
handle parsing errors for complex call patterns + pass + + # Continue visiting child nodes + self.generic_visit(node) + + def _create_symbol_id(self, file_path: str, symbol_name: str) -> str: + """Create a unique symbol ID.""" + return f"{file_path}::{symbol_name}" + + def _extract_function_signature(self, node: ast.FunctionDef) -> str: + """Extract function signature from AST node.""" + # Build basic signature + args = [] + + # Regular arguments + for arg in node.args.args: + args.append(arg.arg) + + # Varargs (*args) + if node.args.vararg: + args.append(f"*{node.args.vararg.arg}") + + # Keyword arguments (**kwargs) + if node.args.kwarg: + args.append(f"**{node.args.kwarg.arg}") + + signature = f"def {node.name}({', '.join(args)}):" + return signature \ No newline at end of file diff --git a/src/code_index_mcp/indexing/strategies/strategy_factory.py b/src/code_index_mcp/indexing/strategies/strategy_factory.py new file mode 100644 index 0000000..c7116d9 --- /dev/null +++ b/src/code_index_mcp/indexing/strategies/strategy_factory.py @@ -0,0 +1,201 @@ +""" +Strategy factory for creating appropriate parsing strategies. 
+""" + +import threading +from typing import Dict, List +from .base_strategy import ParsingStrategy +from .python_strategy import PythonParsingStrategy +from .javascript_strategy import JavaScriptParsingStrategy +from .typescript_strategy import TypeScriptParsingStrategy +from .java_strategy import JavaParsingStrategy +from .go_strategy import GoParsingStrategy +from .objective_c_strategy import ObjectiveCParsingStrategy +from .zig_strategy import ZigParsingStrategy +from .fallback_strategy import FallbackParsingStrategy + + +class StrategyFactory: + """Factory for creating appropriate parsing strategies.""" + + def __init__(self): + # Initialize all strategies with thread safety + self._strategies: Dict[str, ParsingStrategy] = {} + self._initialized = False + self._lock = threading.RLock() + self._initialize_strategies() + + # File type mappings for fallback parser + self._file_type_mappings = { + # Web and markup + '.html': 'html', '.htm': 'html', + '.css': 'css', '.scss': 'css', '.sass': 'css', + '.less': 'css', '.stylus': 'css', '.styl': 'css', + '.md': 'markdown', '.mdx': 'markdown', + '.json': 'json', '.jsonc': 'json', + '.xml': 'xml', + '.yml': 'yaml', '.yaml': 'yaml', + + # Frontend frameworks + '.vue': 'vue', + '.svelte': 'svelte', + '.astro': 'astro', + + # Template engines + '.hbs': 'handlebars', '.handlebars': 'handlebars', + '.ejs': 'ejs', + '.pug': 'pug', + + # Database and SQL + '.sql': 'sql', '.ddl': 'sql', '.dml': 'sql', + '.mysql': 'sql', '.postgresql': 'sql', '.psql': 'sql', + '.sqlite': 'sql', '.mssql': 'sql', '.oracle': 'sql', + '.ora': 'sql', '.db2': 'sql', + '.proc': 'sql', '.procedure': 'sql', + '.func': 'sql', '.function': 'sql', + '.view': 'sql', '.trigger': 'sql', '.index': 'sql', + '.migration': 'sql', '.seed': 'sql', '.fixture': 'sql', + '.schema': 'sql', + '.cql': 'sql', '.cypher': 'sql', '.sparql': 'sql', + '.gql': 'graphql', + '.liquibase': 'sql', '.flyway': 'sql', + + # Config and text files + '.txt': 'text', + '.ini': 'config', 
'.cfg': 'config', '.conf': 'config', + '.toml': 'config', + '.properties': 'config', + '.env': 'config', + '.gitignore': 'config', + '.dockerignore': 'config', + '.editorconfig': 'config', + + # Other programming languages (will use fallback) + '.c': 'c', '.cpp': 'cpp', '.h': 'h', '.hpp': 'hpp', + '.cxx': 'cpp', '.cc': 'cpp', '.hxx': 'hpp', '.hh': 'hpp', + '.cs': 'csharp', + '.rb': 'ruby', + '.php': 'php', + '.swift': 'swift', + '.kt': 'kotlin', '.kts': 'kotlin', + '.rs': 'rust', + '.scala': 'scala', + '.sh': 'shell', '.bash': 'shell', '.zsh': 'shell', + '.ps1': 'powershell', + '.bat': 'batch', '.cmd': 'batch', + '.r': 'r', '.R': 'r', + '.pl': 'perl', '.pm': 'perl', + '.lua': 'lua', + '.dart': 'dart', + '.hs': 'haskell', + '.ml': 'ocaml', '.mli': 'ocaml', + '.fs': 'fsharp', '.fsx': 'fsharp', + '.clj': 'clojure', '.cljs': 'clojure', + '.vim': 'vim', + } + + def _initialize_strategies(self): + """Initialize all parsing strategies with thread safety.""" + with self._lock: + if self._initialized: + return + + try: + # Python + python_strategy = PythonParsingStrategy() + for ext in python_strategy.get_supported_extensions(): + self._strategies[ext] = python_strategy + + # JavaScript + js_strategy = JavaScriptParsingStrategy() + for ext in js_strategy.get_supported_extensions(): + self._strategies[ext] = js_strategy + + # TypeScript + ts_strategy = TypeScriptParsingStrategy() + for ext in ts_strategy.get_supported_extensions(): + self._strategies[ext] = ts_strategy + + # Java + java_strategy = JavaParsingStrategy() + for ext in java_strategy.get_supported_extensions(): + self._strategies[ext] = java_strategy + + # Go + go_strategy = GoParsingStrategy() + for ext in go_strategy.get_supported_extensions(): + self._strategies[ext] = go_strategy + + # Objective-C + objc_strategy = ObjectiveCParsingStrategy() + for ext in objc_strategy.get_supported_extensions(): + self._strategies[ext] = objc_strategy + + # Zig + zig_strategy = ZigParsingStrategy() + for ext in 
zig_strategy.get_supported_extensions(): + self._strategies[ext] = zig_strategy + + self._initialized = True + + except Exception as e: + # Reset state on failure to allow retry + self._strategies.clear() + self._initialized = False + raise e + + def get_strategy(self, file_extension: str) -> ParsingStrategy: + """ + Get appropriate strategy for file extension. + + Args: + file_extension: File extension (e.g., '.py', '.js') + + Returns: + Appropriate parsing strategy + """ + with self._lock: + # Ensure initialization is complete + if not self._initialized: + self._initialize_strategies() + + # Check for specialized strategies first + if file_extension in self._strategies: + return self._strategies[file_extension] + + # Use fallback strategy with appropriate language name + language_name = self._file_type_mappings.get(file_extension, 'unknown') + return FallbackParsingStrategy(language_name) + + def get_all_supported_extensions(self) -> List[str]: + """Get all supported extensions across strategies.""" + specialized = list(self._strategies.keys()) + fallback = list(self._file_type_mappings.keys()) + return specialized + fallback + + def get_specialized_extensions(self) -> List[str]: + """Get extensions that have specialized parsers.""" + return list(self._strategies.keys()) + + def get_fallback_extensions(self) -> List[str]: + """Get extensions that use fallback parsing.""" + return list(self._file_type_mappings.keys()) + + def get_strategy_info(self) -> Dict[str, List[str]]: + """Get information about available strategies.""" + info = {} + + # Group extensions by strategy type + for ext, strategy in self._strategies.items(): + strategy_name = strategy.get_language_name() + if strategy_name not in info: + info[strategy_name] = [] + info[strategy_name].append(ext) + + # Add fallback info + fallback_languages = set(self._file_type_mappings.values()) + for lang in fallback_languages: + extensions = [ext for ext, mapped_lang in self._file_type_mappings.items() if 
mapped_lang == lang] + info[f"fallback_{lang}"] = extensions + + return info diff --git a/src/code_index_mcp/indexing/strategies/typescript_strategy.py b/src/code_index_mcp/indexing/strategies/typescript_strategy.py new file mode 100644 index 0000000..05ed04d --- /dev/null +++ b/src/code_index_mcp/indexing/strategies/typescript_strategy.py @@ -0,0 +1,251 @@ +""" +TypeScript parsing strategy using tree-sitter - Optimized single-pass version. +""" + +import logging +from typing import Dict, List, Tuple, Optional + +import tree_sitter +from tree_sitter_typescript import language_typescript +from .base_strategy import ParsingStrategy +from ..models import SymbolInfo, FileInfo + +logger = logging.getLogger(__name__) + + +class TypeScriptParsingStrategy(ParsingStrategy): + """TypeScript-specific parsing strategy using tree-sitter - Single Pass Optimized.""" + + def __init__(self): + self.ts_language = tree_sitter.Language(language_typescript()) + + def get_language_name(self) -> str: + return "typescript" + + def get_supported_extensions(self) -> List[str]: + return ['.ts', '.tsx'] + + def parse_file(self, file_path: str, content: str) -> Tuple[Dict[str, SymbolInfo], FileInfo]: + """Parse TypeScript file using tree-sitter with single-pass optimization.""" + symbols = {} + functions = [] + classes = [] + imports = [] + exports = [] + + # Symbol lookup index for O(1) access + symbol_lookup = {} # name -> symbol_id mapping + + parser = tree_sitter.Parser(self.ts_language) + tree = parser.parse(content.encode('utf8')) + + # Single-pass traversal that handles everything + context = TraversalContext( + content=content, + file_path=file_path, + symbols=symbols, + functions=functions, + classes=classes, + imports=imports, + exports=exports, + symbol_lookup=symbol_lookup + ) + + self._traverse_node_single_pass(tree.root_node, context) + + file_info = FileInfo( + language=self.get_language_name(), + line_count=len(content.splitlines()), + symbols={"functions": functions,
"classes": classes}, + imports=imports, + exports=exports + ) + + return symbols, file_info + + def _traverse_node_single_pass(self, node, context: 'TraversalContext', + current_function: Optional[str] = None, + current_class: Optional[str] = None): + """Single-pass traversal that extracts symbols and analyzes calls.""" + + # Handle function declarations + if node.type == 'function_declaration': + name = self._get_function_name(node, context.content) + if name: + symbol_id = self._create_symbol_id(context.file_path, name) + signature = self._get_ts_function_signature(node, context.content) + symbol_info = SymbolInfo( + type="function", + file=context.file_path, + line=node.start_point[0] + 1, + signature=signature + ) + context.symbols[symbol_id] = symbol_info + context.symbol_lookup[name] = symbol_id + context.functions.append(name) + + # Traverse function body with updated context + func_context = f"{context.file_path}::{name}" + for child in node.children: + self._traverse_node_single_pass(child, context, current_function=func_context, + current_class=current_class) + return + + # Handle class declarations + elif node.type == 'class_declaration': + name = self._get_class_name(node, context.content) + if name: + symbol_id = self._create_symbol_id(context.file_path, name) + symbol_info = SymbolInfo( + type="class", + file=context.file_path, + line=node.start_point[0] + 1 + ) + context.symbols[symbol_id] = symbol_info + context.symbol_lookup[name] = symbol_id + context.classes.append(name) + + # Traverse class body with updated context + for child in node.children: + self._traverse_node_single_pass(child, context, current_function=current_function, + current_class=name) + return + + # Handle interface declarations + elif node.type == 'interface_declaration': + name = self._get_interface_name(node, context.content) + if name: + symbol_id = self._create_symbol_id(context.file_path, name) + symbol_info = SymbolInfo( + type="interface", + file=context.file_path, + 
line=node.start_point[0] + 1 + ) + context.symbols[symbol_id] = symbol_info + context.symbol_lookup[name] = symbol_id + context.classes.append(name) # Group interfaces with classes + + # Traverse interface body with updated context + for child in node.children: + self._traverse_node_single_pass(child, context, current_function=current_function, + current_class=name) + return + + # Handle method definitions + elif node.type == 'method_definition': + method_name = self._get_method_name(node, context.content) + if method_name and current_class: + full_name = f"{current_class}.{method_name}" + symbol_id = self._create_symbol_id(context.file_path, full_name) + signature = self._get_ts_function_signature(node, context.content) + symbol_info = SymbolInfo( + type="method", + file=context.file_path, + line=node.start_point[0] + 1, + signature=signature + ) + context.symbols[symbol_id] = symbol_info + context.symbol_lookup[full_name] = symbol_id + context.symbol_lookup[method_name] = symbol_id # Also index by method name alone + context.functions.append(full_name) + + # Traverse method body with updated context + method_context = f"{context.file_path}::{full_name}" + for child in node.children: + self._traverse_node_single_pass(child, context, current_function=method_context, + current_class=current_class) + return + + # Handle function calls + elif node.type == 'call_expression' and current_function: + # Extract the function being called + called_function = None + if node.children: + func_node = node.children[0] + if func_node.type == 'identifier': + # Direct function call + called_function = context.content[func_node.start_byte:func_node.end_byte] + elif func_node.type == 'member_expression': + # Method call (obj.method or this.method) + for child in func_node.children: + if child.type == 'property_identifier': + called_function = context.content[child.start_byte:child.end_byte] + break + + # Add relationship using O(1) lookup + if called_function: + if called_function in 
context.symbol_lookup: + symbol_id = context.symbol_lookup[called_function] + symbol_info = context.symbols[symbol_id] + if current_function not in symbol_info.called_by: + symbol_info.called_by.append(current_function) + else: + # Try to find method with class prefix + for name, sid in context.symbol_lookup.items(): + if name.endswith(f".{called_function}"): + symbol_info = context.symbols[sid] + if current_function not in symbol_info.called_by: + symbol_info.called_by.append(current_function) + break + + # Handle import declarations + elif node.type == 'import_statement': + import_text = context.content[node.start_byte:node.end_byte] + context.imports.append(import_text) + + # Handle export declarations + elif node.type in ['export_statement', 'export_default_declaration']: + export_text = context.content[node.start_byte:node.end_byte] + context.exports.append(export_text) + + # Continue traversing children for other node types + for child in node.children: + self._traverse_node_single_pass(child, context, current_function=current_function, + current_class=current_class) + + def _get_function_name(self, node, content: str) -> Optional[str]: + """Extract function name from tree-sitter node.""" + for child in node.children: + if child.type == 'identifier': + return content[child.start_byte:child.end_byte] + return None + + def _get_class_name(self, node, content: str) -> Optional[str]: + """Extract class name from tree-sitter node.""" + for child in node.children: + if child.type == 'identifier': + return content[child.start_byte:child.end_byte] + return None + + def _get_interface_name(self, node, content: str) -> Optional[str]: + """Extract interface name from tree-sitter node.""" + for child in node.children: + if child.type == 'type_identifier': + return content[child.start_byte:child.end_byte] + return None + + def _get_method_name(self, node, content: str) -> Optional[str]: + """Extract method name from tree-sitter node.""" + for child in node.children: + if 
child.type == 'property_identifier': + return content[child.start_byte:child.end_byte] + return None + + def _get_ts_function_signature(self, node, content: str) -> str: + """Extract TypeScript function signature.""" + return content[node.start_byte:node.end_byte].split('\n')[0].strip() + + +class TraversalContext: + """Context object to pass state during single-pass traversal.""" + + def __init__(self, content: str, file_path: str, symbols: Dict, + functions: List, classes: List, imports: List, exports: List, symbol_lookup: Dict): + self.content = content + self.file_path = file_path + self.symbols = symbols + self.functions = functions + self.classes = classes + self.imports = imports + self.exports = exports + self.symbol_lookup = symbol_lookup \ No newline at end of file diff --git a/src/code_index_mcp/indexing/strategies/zig_strategy.py b/src/code_index_mcp/indexing/strategies/zig_strategy.py new file mode 100644 index 0000000..658ca2b --- /dev/null +++ b/src/code_index_mcp/indexing/strategies/zig_strategy.py @@ -0,0 +1,99 @@ +""" +Zig parsing strategy using tree-sitter. 
+""" + +import logging +from typing import Dict, List, Tuple, Optional +from .base_strategy import ParsingStrategy +from ..models import SymbolInfo, FileInfo + +logger = logging.getLogger(__name__) + +import tree_sitter +from tree_sitter_zig import language + + +class ZigParsingStrategy(ParsingStrategy): + """Zig parsing strategy using tree-sitter.""" + + def __init__(self): + self.zig_language = tree_sitter.Language(language()) + + def get_language_name(self) -> str: + return "zig" + + def get_supported_extensions(self) -> List[str]: + return ['.zig', '.zon'] + + def parse_file(self, file_path: str, content: str) -> Tuple[Dict[str, SymbolInfo], FileInfo]: + """Parse Zig file using tree-sitter.""" + return self._tree_sitter_parse(file_path, content) + + + def _tree_sitter_parse(self, file_path: str, content: str) -> Tuple[Dict[str, SymbolInfo], FileInfo]: + """Parse Zig file using tree-sitter.""" + symbols = {} + functions = [] + classes = [] + imports = [] + + parser = tree_sitter.Parser(self.zig_language) + tree = parser.parse(content.encode('utf8')) + + # Phase 1: Extract symbols using tree-sitter + self._traverse_zig_node(tree.root_node, content, file_path, symbols, functions, classes, imports) + + file_info = FileInfo( + language=self.get_language_name(), + line_count=len(content.splitlines()), + symbols={"functions": functions, "classes": classes}, + imports=imports + ) + + return symbols, file_info + + def _traverse_zig_node(self, node, content: str, file_path: str, symbols: Dict, functions: List, classes: List, imports: List): + """Traverse Zig AST node and extract symbols.""" + if node.type == 'function_declaration': + func_name = self._extract_zig_function_name_from_node(node, content) + if func_name: + line_number = self._extract_line_number(content, node.start_byte) + symbol_id = self._create_symbol_id(file_path, func_name) + symbols[symbol_id] = SymbolInfo( + type="function", + file=file_path, + line=line_number, + 
signature=self._safe_extract_text(content, node.start_byte, node.end_byte) + ) + functions.append(func_name) + + elif node.type in ['struct_declaration', 'union_declaration', 'enum_declaration']: + type_name = self._extract_zig_type_name_from_node(node, content) + if type_name: + line_number = self._extract_line_number(content, node.start_byte) + symbol_id = self._create_symbol_id(file_path, type_name) + symbols[symbol_id] = SymbolInfo( + type=node.type.replace('_declaration', ''), + file=file_path, + line=line_number + ) + classes.append(type_name) + + # Recurse through children + for child in node.children: + self._traverse_zig_node(child, content, file_path, symbols, functions, classes, imports) + + def _extract_zig_function_name_from_node(self, node, content: str) -> Optional[str]: + """Extract function name from tree-sitter node.""" + for child in node.children: + if child.type == 'identifier': + return self._safe_extract_text(content, child.start_byte, child.end_byte) + return None + + def _extract_zig_type_name_from_node(self, node, content: str) -> Optional[str]: + """Extract type name from tree-sitter node.""" + for child in node.children: + if child.type == 'identifier': + return self._safe_extract_text(content, child.start_byte, child.end_byte) + return None + diff --git a/src/code_index_mcp/indexing/unified_index_manager.py b/src/code_index_mcp/indexing/unified_index_manager.py deleted file mode 100644 index 052f3ed..0000000 --- a/src/code_index_mcp/indexing/unified_index_manager.py +++ /dev/null @@ -1,433 +0,0 @@ -""" -统一索引管理器 - 提供项目索引的统一访问接口 - -这个模块实现了一个中央化的索引管理器,统一处理所有索引相关操作, -包括SCIP索引、遗留索引格式的兼容,以及内存缓存管理。 -""" - -import os -import logging -import time -from typing import Dict, Any, List, Optional, Union -from pathlib import Path - -from .index_provider import IIndexProvider, IIndexManager, IndexMetadata, SymbolInfo, FileInfo -from ..project_settings import ProjectSettings - -# Try to import SCIP proto, handle if not available -try: - from 
..scip.proto.scip_pb2 import Index as SCIPIndex, Document as SCIPDocument - SCIP_PROTO_AVAILABLE = True -except ImportError: - SCIPIndex = None - SCIPDocument = None - SCIP_PROTO_AVAILABLE = False - -logger = logging.getLogger(__name__) - - -class UnifiedIndexManager: - """ - Unified index manager - - Coordinates the different index formats, exposes a unified access interface, - and manages the index lifecycle. - """ - - def __init__(self, project_path: str, settings: Optional[ProjectSettings] = None): - self.project_path = project_path - self.settings = settings or ProjectSettings(project_path) - - # Core components - imported lazily to avoid circular dependencies - self._scip_tool = None - self._current_provider: Optional[IIndexProvider] = None - self._metadata: Optional[IndexMetadata] = None - - # State management - self._is_initialized = False - self._last_check_time = 0 - self._check_interval = 30 # 30-second check interval - - def _get_scip_tool(self): - """Lazily import the SCIP tool to avoid circular dependencies""" - if self._scip_tool is None: - from ..tools.scip.scip_index_tool import SCIPIndexTool - self._scip_tool = SCIPIndexTool() - return self._scip_tool - - def initialize(self) -> bool: - """ - Initialize the index manager - - Returns: - True if initialization successful - """ - try: - # 1. Try to load an existing index - if self._load_existing_index(): - logger.info("Successfully loaded existing index") - self._is_initialized = True - return True - - # 2. 
No existing index, so build a new one - if self._build_new_index(): - logger.info("Successfully built new index") - self._is_initialized = True - return True - - logger.warning("Failed to initialize index") - return False - - except Exception as e: - logger.error(f"Index initialization failed: {e}") - return False - - def get_provider(self) -> Optional[IIndexProvider]: - """ - Get the current index provider - - Returns: - The currently active index provider, or None if the index is unavailable - """ - if not self._is_initialized: - self.initialize() - - # Periodically check index health - current_time = time.time() - if current_time - self._last_check_time > self._check_interval: - self._check_index_health() - self._last_check_time = current_time - - return self._current_provider - - def refresh_index(self, force: bool = False) -> bool: - """ - Refresh the index - - Args: - force: Whether to force a full index rebuild - - Returns: - True if refresh successful - """ - try: - if force or self._needs_rebuild(): - return self._build_new_index() - else: - # Try an incremental update - return self._incremental_update() - except Exception as e: - logger.error(f"Index refresh failed: {e}") - return False - - def save_index(self) -> bool: - """ - Save the current index state - - Returns: - True if save successful - """ - try: - if self._current_provider and isinstance(self._current_provider, SCIPIndexProvider): - return self._get_scip_tool().save_index() - return False - except Exception as e: - logger.error(f"Index save failed: {e}") - return False - - def clear_index(self) -> None: - """Clear index state""" - try: - if self._scip_tool: - self._scip_tool.clear_index() - self._current_provider = None - self._metadata = None - self._is_initialized = False - logger.info("Index cleared successfully") - except Exception as e: - logger.error(f"Index clear failed: {e}") - - def get_index_status(self) -> Dict[str, Any]: - """ - Get index status information - - Returns: - Dictionary containing index status - """ - status = { - 'is_initialized': self._is_initialized, - 'is_available': self._current_provider is not None, - 'provider_type': type(self._current_provider).__name__ if self._current_provider else None, - 'metadata': 
self._metadata.__dict__ if self._metadata else None, - 'last_check': self._last_check_time - } - - if self._current_provider: - status['file_count'] = len(self._current_provider.get_file_list()) - - return status - - def _load_existing_index(self) -> bool: - """Try to load an existing index""" - try: - # 1. Try the SCIP index - scip_tool = self._get_scip_tool() - if scip_tool.load_existing_index(self.project_path): - self._current_provider = SCIPIndexProvider(scip_tool) - self._metadata = self._create_metadata_from_scip() - logger.info("Loaded SCIP index") - return True - - # 2. Try the legacy index (if backward compatibility is needed) - legacy_data = self.settings.load_existing_index() - if legacy_data and self._is_valid_legacy_index(legacy_data): - self._current_provider = LegacyIndexProvider(legacy_data) - self._metadata = self._create_metadata_from_legacy(legacy_data) - logger.info("Loaded legacy index") - return True - - return False - - except Exception as e: - logger.error(f"Failed to load existing index: {e}") - return False - - def _build_new_index(self) -> bool: - """Build a new index""" - try: - scip_tool = self._get_scip_tool() - file_count = scip_tool.build_index(self.project_path) - if file_count > 0: - self._current_provider = SCIPIndexProvider(scip_tool) - self._metadata = self._create_metadata_from_scip() - - # Save the index - scip_tool.save_index() - - logger.info(f"Built new SCIP index with {file_count} files") - return True - - return False - - except Exception as e: - logger.error(f"Failed to build new index: {e}") - return False - - def _check_index_health(self) -> None: - """Check index health""" - if self._current_provider and not self._current_provider.is_available(): - logger.warning("Index provider became unavailable, attempting recovery") - self.initialize() - - def _needs_rebuild(self) -> bool: - """Check whether the index needs rebuilding""" - if not self._metadata: - return True - - # Check whether any project files have changed - try: - latest_mtime = 0 - for root, _, files in os.walk(self.project_path): - for file in files: - file_path = os.path.join(root, file) - mtime = 
os.path.getmtime(file_path) - latest_mtime = max(latest_mtime, mtime) - - return latest_mtime > self._metadata.last_updated - - except Exception: - return True # If the check fails, rebuild conservatively - - def _incremental_update(self) -> bool: - """Incrementally update the index (if supported)""" - # Currently simplified to a full rebuild - # True incremental updates could be implemented in a future version - return self._build_new_index() - - def _create_metadata_from_scip(self) -> IndexMetadata: - """Create metadata from the SCIP index""" - scip_tool = self._get_scip_tool() - metadata_dict = scip_tool.get_project_metadata() - return IndexMetadata( - version="4.0-scip", - format_type="scip", - created_at=time.time(), - last_updated=time.time(), - file_count=metadata_dict.get('total_files', 0), - project_root=metadata_dict.get('project_root', self.project_path), - tool_version=metadata_dict.get('tool_version', 'unknown') - ) - - def _create_metadata_from_legacy(self, legacy_data: Dict[str, Any]) -> IndexMetadata: - """Create metadata from the legacy index""" - return IndexMetadata( - version="3.0-legacy", - format_type="legacy", - created_at=legacy_data.get('created_at', time.time()), - last_updated=legacy_data.get('last_updated', time.time()), - file_count=legacy_data.get('project_metadata', {}).get('total_files', 0), - project_root=self.project_path, - tool_version="legacy" - ) - - def _is_valid_legacy_index(self, index_data: Dict[str, Any]) -> bool: - """Validate that the legacy index is usable""" - return ( - isinstance(index_data, dict) and - 'index_metadata' in index_data and - index_data.get('index_metadata', {}).get('version', '') >= '3.0' - ) - - -class SCIPIndexProvider: - """SCIP index provider implementation""" - - def __init__(self, scip_tool): - self._scip_tool = scip_tool - - def get_file_list(self) -> List[FileInfo]: - return self._scip_tool.get_file_list() - - def get_file_info(self, file_path: str) -> Optional[FileInfo]: - file_list = self.get_file_list() - for file_info in file_list: - if file_info.relative_path == file_path: - return file_info - return None - - def query_symbols(self, file_path: str) -> List[SymbolInfo]: - # This method is deprecated - use CodeIntelligenceService 
for symbol analysis - return [] - - def search_files(self, pattern: str) -> List[FileInfo]: - # Lazy import to avoid circular dependencies - from ..tools.filesystem.file_matching_tool import FileMatchingTool - matcher = FileMatchingTool() - return matcher.match_glob_pattern(self.get_file_list(), pattern) - - def get_metadata(self) -> IndexMetadata: - metadata_dict = self._scip_tool.get_project_metadata() - return IndexMetadata( - version="4.0-scip", - format_type="scip", - created_at=time.time(), - last_updated=time.time(), - file_count=metadata_dict.get('total_files', 0), - project_root=metadata_dict.get('project_root', ''), - tool_version=metadata_dict.get('tool_version', 'unknown') - ) - - def is_available(self) -> bool: - return self._scip_tool.is_index_available() - - -class LegacyIndexProvider: - """Legacy index provider implementation (compatibility support)""" - - def __init__(self, legacy_data: Dict[str, Any]): - self._data = legacy_data - - def get_file_list(self) -> List[FileInfo]: - # Convert legacy data to the standard format - files = [] - file_dict = self._data.get('files', {}) - - for file_path, file_data in file_dict.items(): - file_info = FileInfo( - relative_path=file_path, - language=file_data.get('language', 'unknown'), - absolute_path=file_data.get('absolute_path', '') - ) - files.append(file_info) - - return files - - def get_file_info(self, file_path: str) -> Optional[FileInfo]: - file_dict = self._data.get('files', {}) - if file_path in file_dict: - file_data = file_dict[file_path] - return FileInfo( - relative_path=file_path, - language=file_data.get('language', 'unknown'), - absolute_path=file_data.get('absolute_path', '') - ) - return None - - def query_symbols(self, file_path: str) -> List[SymbolInfo]: - # Symbol information in the legacy format is limited; convert it to the standard format - file_dict = self._data.get('files', {}) - if file_path in file_dict: - legacy_symbols = file_dict[file_path].get('symbols', []) - symbols = [] - for symbol_data in legacy_symbols: - if isinstance(symbol_data, dict): - symbol = SymbolInfo( - name=symbol_data.get('name', ''), - kind=symbol_data.get('kind', 'unknown'), 
- location=symbol_data.get('location', {'line': 1, 'column': 1}), - scope=symbol_data.get('scope', 'global'), - documentation=symbol_data.get('documentation', []) - ) - symbols.append(symbol) - return symbols - return [] - - def search_files(self, pattern: str) -> List[FileInfo]: - import fnmatch - matched_files = [] - - for file_info in self.get_file_list(): - if fnmatch.fnmatch(file_info.relative_path, pattern): - matched_files.append(file_info) - - return matched_files - - def get_metadata(self) -> IndexMetadata: - meta = self._data.get('index_metadata', {}) - return IndexMetadata( - version=meta.get('version', '3.0-legacy'), - format_type="legacy", - created_at=meta.get('created_at', time.time()), - last_updated=meta.get('last_updated', time.time()), - file_count=len(self._data.get('files', {})), - project_root=meta.get('project_root', ''), - tool_version="legacy" - ) - - def is_available(self) -> bool: - return bool(self._data.get('files')) - - -# Global index manager instance -_global_index_manager: Optional[UnifiedIndexManager] = None - - -def get_unified_index_manager(project_path: str = None, settings: ProjectSettings = None) -> UnifiedIndexManager: - """ - Get the global unified index manager instance - - Args: - project_path: Project path (required on first initialization) - settings: Project settings (optional) - - Returns: - UnifiedIndexManager instance - """ - global _global_index_manager - - if _global_index_manager is None and project_path: - _global_index_manager = UnifiedIndexManager(project_path, settings) - - if _global_index_manager and project_path and _global_index_manager.project_path != project_path: - # Project path changed; recreate the manager - _global_index_manager = UnifiedIndexManager(project_path, settings) - - return _global_index_manager - - -def clear_global_index_manager() -> None: - """Clear the global index manager""" - global _global_index_manager - if _global_index_manager: - _global_index_manager.clear_index() - _global_index_manager = None diff --git a/src/code_index_mcp/project_settings.py b/src/code_index_mcp/project_settings.py index 5ad2c04..d3c3965 100644 --- 
a/src/code_index_mcp/project_settings.py +++ b/src/code_index_mcp/project_settings.py @@ -13,16 +13,9 @@ from datetime import datetime -# SCIP protobuf import -try: - from .scip.proto.scip_pb2 import Index as SCIPIndex - SCIP_AVAILABLE = True -except ImportError: - SCIPIndex = None - SCIP_AVAILABLE = False from .constants import ( - SETTINGS_DIR, CONFIG_FILE, SCIP_INDEX_FILE, INDEX_FILE + SETTINGS_DIR, CONFIG_FILE, INDEX_FILE ) from .search.base import SearchStrategy from .search.ugrep import UgrepStrategy @@ -188,35 +181,6 @@ def get_config_path(self): else: return os.path.join(os.path.expanduser("~"), CONFIG_FILE) - def get_scip_index_path(self): - """Get the path to the SCIP index file""" - try: - path = os.path.join(self.settings_path, SCIP_INDEX_FILE) - # Ensure directory exists - os.makedirs(os.path.dirname(path), exist_ok=True) - return path - except Exception: - # If error occurs, use file in project or home directory as fallback - if self.base_path and os.path.exists(self.base_path): - return os.path.join(self.base_path, SCIP_INDEX_FILE) - else: - return os.path.join(os.path.expanduser("~"), SCIP_INDEX_FILE) - - def get_index_path(self): - """Get the path to the legacy index file (for backward compatibility)""" - try: - path = os.path.join(self.settings_path, INDEX_FILE) - # Ensure directory exists - os.makedirs(os.path.dirname(path), exist_ok=True) - return path - except Exception: - # If error occurs, use file in project or home directory as fallback - if self.base_path and os.path.exists(self.base_path): - return os.path.join(self.base_path, INDEX_FILE) - else: - return os.path.join(os.path.expanduser("~"), INDEX_FILE) - - # get_cache_path method removed - no longer needed with new indexing system def _get_timestamp(self): """Get current timestamp""" @@ -367,214 +331,25 @@ def load_index(self): except Exception: return None - def save_scip_index(self, scip_index): - """Save SCIP index in protobuf binary format - Args: - scip_index: SCIP Index protobuf 
object - """ - if not SCIP_AVAILABLE: - raise RuntimeError("SCIP protobuf not available. Cannot save SCIP index.") - - if not isinstance(scip_index, SCIPIndex): - raise ValueError("scip_index must be a SCIP Index protobuf object") + def cleanup_legacy_files(self) -> None: + """Clean up any legacy index files found.""" try: - scip_path = self.get_scip_index_path() - - # Ensure directory exists - dir_path = os.path.dirname(scip_path) - if not os.path.exists(dir_path): - os.makedirs(dir_path, exist_ok=True) - - # Serialize to binary format - binary_data = scip_index.SerializeToString() - - # Save binary data - with open(scip_path, 'wb') as f: - f.write(binary_data) - + legacy_files = [ + os.path.join(self.settings_path, "file_index.pickle"), + os.path.join(self.settings_path, "content_cache.pickle"), + os.path.join(self.settings_path, INDEX_FILE) # Legacy JSON + ] - - except Exception: - # Try saving to project or home directory - try: - if self.base_path and os.path.exists(self.base_path): - fallback_path = os.path.join(self.base_path, SCIP_INDEX_FILE) - else: - fallback_path = os.path.join(os.path.expanduser("~"), SCIP_INDEX_FILE) - - - binary_data = scip_index.SerializeToString() - with open(fallback_path, 'wb') as f: - f.write(binary_data) - except Exception: - raise - - def load_scip_index(self): - """Load SCIP index from protobuf binary format - - Returns: - SCIP Index object, or None if file doesn't exist or has errors - """ - if not SCIP_AVAILABLE: - return None - - # If skip_load is set, return None directly - if self.skip_load: - return None - - try: - scip_path = self.get_scip_index_path() - - if os.path.exists(scip_path): - - try: - with open(scip_path, 'rb') as f: - binary_data = f.read() - - # Deserialize from binary format - scip_index = SCIPIndex() - scip_index.ParseFromString(binary_data) - - - return scip_index - - except Exception: - return None - else: - # Try fallback paths - fallback_paths = [] - if self.base_path and 
os.path.exists(self.base_path): - fallback_paths.append(os.path.join(self.base_path, SCIP_INDEX_FILE)) - fallback_paths.append(os.path.join(os.path.expanduser("~"), SCIP_INDEX_FILE)) - - for fallback_path in fallback_paths: - if os.path.exists(fallback_path): - - try: - with open(fallback_path, 'rb') as f: - binary_data = f.read() - - scip_index = SCIPIndex() - scip_index.ParseFromString(binary_data) - - - return scip_index - except Exception: - continue - - return None - - except Exception: - return None - - # save_cache and load_cache methods removed - no longer needed with new indexing system - - def detect_index_version(self): - """Detect the version of the existing index - - Returns: - str: Version string ('legacy', '3.0', or None if no index exists) - """ - try: - # Check for new JSON format first - index_path = self.get_index_path() - if os.path.exists(index_path): - try: - with open(index_path, 'r', encoding='utf-8') as f: - index_data = json.load(f) - - # Check if it has the new structure - if isinstance(index_data, dict) and 'index_metadata' in index_data: - version = index_data.get('index_metadata', {}).get('version', '3.0') - return version - else: - return 'legacy' - except (json.JSONDecodeError, UnicodeDecodeError): - return 'legacy' - - # Check for old pickle format - old_pickle_path = os.path.join(self.settings_path, "file_index.pickle") - if os.path.exists(old_pickle_path): - return 'legacy' - - # Check fallback locations - if self.base_path and os.path.exists(self.base_path): - fallback_json = os.path.join(self.base_path, INDEX_FILE) - fallback_pickle = os.path.join(self.base_path, "file_index.pickle") - else: - fallback_json = os.path.join(os.path.expanduser("~"), INDEX_FILE) - fallback_pickle = os.path.join(os.path.expanduser("~"), "file_index.pickle") - - if os.path.exists(fallback_json): - try: - with open(fallback_json, 'r', encoding='utf-8') as f: - index_data = json.load(f) - if isinstance(index_data, dict) and 'index_metadata' in 
index_data: - version = index_data.get('index_metadata', {}).get('version', '3.0') - return version - else: - return 'legacy' - except Exception: - return 'legacy' - - if os.path.exists(fallback_pickle): - return 'legacy' - - return None - - except Exception: - return None - - def migrate_legacy_index(self): - """Migrate legacy index format to new format - - Returns: - bool: True if migration was successful or not needed, False if failed - """ - try: - version = self.detect_index_version() - - if version is None: - return True - - if version == '3.0' or (isinstance(version, str) and version >= '3.0'): - return True - - if version == 'legacy': - - # Clean up legacy files - legacy_files = [ - os.path.join(self.settings_path, "file_index.pickle"), - os.path.join(self.settings_path, "content_cache.pickle") - ] - - # Add fallback locations - if self.base_path and os.path.exists(self.base_path): - legacy_files.extend([ - os.path.join(self.base_path, "file_index.pickle"), - os.path.join(self.base_path, "content_cache.pickle") - ]) - else: - legacy_files.extend([ - os.path.join(os.path.expanduser("~"), "file_index.pickle"), - os.path.join(os.path.expanduser("~"), "content_cache.pickle") - ]) - - for legacy_file in legacy_files: - if os.path.exists(legacy_file): - try: - os.remove(legacy_file) - except Exception: - pass - - return False # Indicate that manual rebuild is needed - - return True - + for legacy_file in legacy_files: + if os.path.exists(legacy_file): + try: + os.remove(legacy_file) + except Exception: + pass except Exception: - return False + pass def clear(self): """Clear config and index files""" diff --git a/src/code_index_mcp/scip/__init__.py b/src/code_index_mcp/scip/__init__.py deleted file mode 100644 index 30ace0d..0000000 --- a/src/code_index_mcp/scip/__init__.py +++ /dev/null @@ -1,10 +0,0 @@ -""" -SCIP (Source Code Intelligence Protocol) indexing module. 
- -This module provides SCIP-based code indexing capabilities using a multi-strategy -approach to support various programming languages and tools. -""" - -from .factory import SCIPIndexerFactory, SCIPIndexingError - -__all__ = ['SCIPIndexerFactory', 'SCIPIndexingError'] \ No newline at end of file diff --git a/src/code_index_mcp/scip/core/__init__.py b/src/code_index_mcp/scip/core/__init__.py deleted file mode 100644 index cbd4fc0..0000000 --- a/src/code_index_mcp/scip/core/__init__.py +++ /dev/null @@ -1 +0,0 @@ -"""SCIP core components for standard-compliant indexing.""" \ No newline at end of file diff --git a/src/code_index_mcp/scip/core/local_reference_resolver.py b/src/code_index_mcp/scip/core/local_reference_resolver.py deleted file mode 100644 index cef4da8..0000000 --- a/src/code_index_mcp/scip/core/local_reference_resolver.py +++ /dev/null @@ -1,470 +0,0 @@ -"""Local Reference Resolver - Cross-file reference resolution within a project.""" - -import logging -from typing import Dict, List, Optional, Set, Tuple, Any -from dataclasses import dataclass -from pathlib import Path - -from ..proto import scip_pb2 - - -logger = logging.getLogger(__name__) - - -@dataclass -class SymbolDefinition: - """Information about a symbol definition.""" - symbol_id: str - file_path: str - definition_range: scip_pb2.Range - symbol_kind: int - display_name: str - documentation: List[str] - - -@dataclass -class SymbolReference: - """Information about a symbol reference.""" - symbol_id: str - file_path: str - reference_range: scip_pb2.Range - context_scope: List[str] - - -@dataclass -class SymbolRelationship: - """Information about a relationship between symbols.""" - source_symbol_id: str - target_symbol_id: str - relationship_type: str # InternalRelationshipType enum value - relationship_data: Dict[str, Any] # Additional relationship metadata - - -class LocalReferenceResolver: - """ - Resolves references within a local project. 
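The deleted `LocalReferenceResolver` keeps three mutually consistent lookup tables: definitions by symbol ID, symbol IDs by file, and symbol IDs by display name. A minimal stdlib-only sketch of that bookkeeping (the `MiniSymbolTable` class and its simplified fields are hypothetical stand-ins for illustration, not the removed class itself):

```python
from collections import defaultdict

class MiniSymbolTable:
    """Toy version of the removed resolver's symbol tables."""

    def __init__(self):
        self.definitions = {}                 # symbol_id -> (file_path, display_name)
        self.file_symbols = defaultdict(set)  # file_path -> {symbol_id, ...}
        self.by_name = defaultdict(list)      # display_name -> [symbol_id, ...]

    def register(self, symbol_id, file_path, display_name):
        # Every registration updates all three indexes together.
        self.definitions[symbol_id] = (file_path, display_name)
        self.file_symbols[file_path].add(symbol_id)
        if symbol_id not in self.by_name[display_name]:
            self.by_name[display_name].append(symbol_id)

    def resolve_by_name(self, name):
        # Unambiguous only when exactly one candidate exists;
        # the real resolver falls back to scope-based scoring otherwise.
        candidates = self.by_name.get(name, [])
        return candidates[0] if len(candidates) == 1 else None

table = MiniSymbolTable()
table.register("pkg/mod/foo().", "mod.py", "foo")
table.register("pkg/mod/Bar#", "mod.py", "Bar")
```

The duplicate-suppressing append mirrors the original's check before adding a symbol ID to the name index.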
- - This class maintains a symbol table for all definitions in the project - and helps resolve references to their definitions. - """ - - def __init__(self, project_path: str): - """ - Initialize reference resolver for a project. - - Args: - project_path: Absolute path to project root - """ - self.project_path = Path(project_path).resolve() - - # Symbol tables - self.symbol_definitions: Dict[str, SymbolDefinition] = {} - self.symbol_references: Dict[str, List[SymbolReference]] = {} - - # Relationship storage - self.symbol_relationships: Dict[str, List[SymbolRelationship]] = {} # source_symbol_id -> relationships - self.reverse_relationships: Dict[str, List[SymbolRelationship]] = {} # target_symbol_id -> relationships - - # File-based indexes for faster lookup - self.file_symbols: Dict[str, Set[str]] = {} # file_path -> symbol_ids - self.symbol_by_name: Dict[str, List[str]] = {} # display_name -> symbol_ids - - logger.debug(f"LocalReferenceResolver initialized for project: {project_path}") - - def register_symbol_definition(self, - symbol_id: str, - file_path: str, - definition_range: scip_pb2.Range, - symbol_kind: int, - display_name: str, - documentation: List[str] = None) -> None: - """ - Register a symbol definition. 
- - Args: - symbol_id: SCIP symbol ID - file_path: File path relative to project root - definition_range: SCIP Range of definition - symbol_kind: SCIP symbol kind - display_name: Human-readable symbol name - documentation: Optional documentation - """ - definition = SymbolDefinition( - symbol_id=symbol_id, - file_path=file_path, - definition_range=definition_range, - symbol_kind=symbol_kind, - display_name=display_name, - documentation=documentation or [] - ) - - self.symbol_definitions[symbol_id] = definition - - # Update file index - if file_path not in self.file_symbols: - self.file_symbols[file_path] = set() - self.file_symbols[file_path].add(symbol_id) - - # Update name index - if display_name not in self.symbol_by_name: - self.symbol_by_name[display_name] = [] - if symbol_id not in self.symbol_by_name[display_name]: - self.symbol_by_name[display_name].append(symbol_id) - - logger.debug(f"Registered symbol definition: {display_name} -> {symbol_id}") - - def register_symbol_reference(self, - symbol_id: str, - file_path: str, - reference_range: scip_pb2.Range, - context_scope: List[str] = None) -> None: - """ - Register a symbol reference. - - Args: - symbol_id: SCIP symbol ID being referenced - file_path: File path where reference occurs - reference_range: SCIP Range of reference - context_scope: Scope context where reference occurs - """ - reference = SymbolReference( - symbol_id=symbol_id, - file_path=file_path, - reference_range=reference_range, - context_scope=context_scope or [] - ) - - if symbol_id not in self.symbol_references: - self.symbol_references[symbol_id] = [] - self.symbol_references[symbol_id].append(reference) - - logger.debug(f"Registered symbol reference: {symbol_id} in {file_path}") - - def resolve_reference_by_name(self, - symbol_name: str, - context_file: str, - context_scope: List[str] = None) -> Optional[str]: - """ - Resolve a symbol reference by name to its definition symbol ID. 
- - Args: - symbol_name: Name of symbol to resolve - context_file: File where reference occurs - context_scope: Scope context of reference - - Returns: - Symbol ID of definition or None if not found - """ - context_scope = context_scope or [] - - # Look for exact name matches - if symbol_name not in self.symbol_by_name: - return None - - candidate_symbols = self.symbol_by_name[symbol_name] - - if len(candidate_symbols) == 1: - return candidate_symbols[0] - - # Multiple candidates - use scope-based resolution - return self._resolve_with_scope(candidate_symbols, context_file, context_scope) - - def get_symbol_definition(self, symbol_id: str) -> Optional[SymbolDefinition]: - """ - Get symbol definition by ID. - - Args: - symbol_id: SCIP symbol ID - - Returns: - SymbolDefinition or None if not found - """ - return self.symbol_definitions.get(symbol_id) - - def get_symbol_references(self, symbol_id: str) -> List[SymbolReference]: - """ - Get all references to a symbol. - - Args: - symbol_id: SCIP symbol ID - - Returns: - List of SymbolReference objects - """ - return self.symbol_references.get(symbol_id, []) - - def get_file_symbols(self, file_path: str) -> Set[str]: - """ - Get all symbols defined in a file. - - Args: - file_path: File path relative to project root - - Returns: - Set of symbol IDs defined in the file - """ - return self.file_symbols.get(file_path, set()) - - def find_symbols_by_pattern(self, pattern: str) -> List[SymbolDefinition]: - """ - Find symbols matching a pattern. 
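`find_symbols_by_pattern` above is a case-insensitive substring match over both display names and full symbol IDs. A standalone sketch of that behavior, using plain dicts in place of `SymbolDefinition` (the sample data is invented for illustration):

```python
def find_by_pattern(definitions, pattern):
    """Case-insensitive substring match over display names and symbol IDs,
    mirroring the removed find_symbols_by_pattern."""
    p = pattern.lower()
    return [d for d in definitions
            if p in d["display_name"].lower() or p in d["symbol_id"].lower()]

defs = [
    {"symbol_id": "pkg/auth/login().", "display_name": "login"},
    {"symbol_id": "pkg/db/connect().", "display_name": "connect"},
]
```

Note that matching against the symbol ID means a package-prefix query such as `"pkg/"` matches every symbol under that package.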
- - Args: - pattern: Search pattern (simple substring match) - - Returns: - List of matching SymbolDefinition objects - """ - matches = [] - pattern_lower = pattern.lower() - - for symbol_def in self.symbol_definitions.values(): - if (pattern_lower in symbol_def.display_name.lower() or - pattern_lower in symbol_def.symbol_id.lower()): - matches.append(symbol_def) - - return matches - - def get_project_statistics(self) -> Dict[str, int]: - """ - Get statistics about the symbol table including relationships. - - Returns: - Dictionary with statistics - """ - total_references = sum(len(refs) for refs in self.symbol_references.values()) - total_relationships = sum(len(rels) for rels in self.symbol_relationships.values()) - - return { - 'total_definitions': len(self.symbol_definitions), - 'total_references': total_references, - 'total_relationships': total_relationships, - 'files_with_symbols': len(self.file_symbols), - 'unique_symbol_names': len(self.symbol_by_name), - 'symbols_with_relationships': len(self.symbol_relationships) - } - - def _resolve_with_scope(self, - candidate_symbols: List[str], - context_file: str, - context_scope: List[str]) -> Optional[str]: - """ - Resolve symbol using scope-based heuristics. 
- - Args: - candidate_symbols: List of candidate symbol IDs - context_file: File where reference occurs - context_scope: Scope context - - Returns: - Best matching symbol ID or None - """ - # Scoring system for symbol resolution - scored_candidates = [] - - for symbol_id in candidate_symbols: - definition = self.symbol_definitions.get(symbol_id) - if not definition: - continue - - score = 0 - - # Prefer symbols from the same file - if definition.file_path == context_file: - score += 100 - - # Prefer symbols from similar scope depth - symbol_scope_depth = symbol_id.count('/') - context_scope_depth = len(context_scope) - scope_diff = abs(symbol_scope_depth - context_scope_depth) - score += max(0, 50 - scope_diff * 10) - - # Prefer symbols with matching scope components - for scope_component in context_scope: - if scope_component in symbol_id: - score += 20 - - scored_candidates.append((score, symbol_id)) - - if not scored_candidates: - return None - - # Return highest scoring candidate - scored_candidates.sort(key=lambda x: x[0], reverse=True) - best_symbol = scored_candidates[0][1] - - logger.debug(f"Resolved '{candidate_symbols}' to '{best_symbol}' " - f"(score: {scored_candidates[0][0]})") - - return best_symbol - - def clear(self) -> None: - """Clear all symbol tables.""" - self.symbol_definitions.clear() - self.symbol_references.clear() - self.file_symbols.clear() - self.symbol_by_name.clear() - - logger.debug("Symbol tables cleared") - - def export_symbol_table(self) -> Dict[str, any]: - """ - Export symbol table for debugging or persistence. 
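The scoring in `_resolve_with_scope` above weighs three signals: +100 for a definition in the same file, up to +50 for scope-depth proximity (decaying by 10 per level of difference), and +20 per context-scope component found in the symbol ID. A self-contained sketch of that heuristic (candidate data is invented; the real method reads definitions from the symbol table):

```python
def score_candidate(symbol_id, definition_file, context_file, context_scope):
    """Scoring heuristic from the removed _resolve_with_scope."""
    score = 0
    if definition_file == context_file:
        score += 100                                   # same-file bonus
    depth_diff = abs(symbol_id.count('/') - len(context_scope))
    score += max(0, 50 - depth_diff * 10)              # scope-depth proximity
    score += sum(20 for part in context_scope if part in symbol_id)
    return score

def resolve(candidates, context_file, context_scope):
    """Pick the highest-scoring (symbol_id, definition_file) candidate."""
    scored = sorted(
        ((score_candidate(sid, f, context_file, context_scope), sid)
         for sid, f in candidates),
        reverse=True,
    )
    return scored[0][1] if scored else None

candidates = [("a/b/util/helper().", "util.py"), ("x/helper().", "main.py")]
best = resolve(candidates, "util.py", ["util"])  # → "a/b/util/helper()."
```

The same-file bonus dominates, so a local definition wins over a shallower match elsewhere.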
- - Returns: - Dictionary representation of symbol table - """ - return { - 'definitions': { - symbol_id: { - 'file_path': defn.file_path, - 'display_name': defn.display_name, - 'symbol_kind': defn.symbol_kind, - 'documentation': defn.documentation - } - for symbol_id, defn in self.symbol_definitions.items() - }, - 'references': { - symbol_id: len(refs) - for symbol_id, refs in self.symbol_references.items() - }, - 'relationships': { - symbol_id: len(rels) - for symbol_id, rels in self.symbol_relationships.items() - }, - 'statistics': self.get_project_statistics() - } - - def add_symbol_relationship(self, - source_symbol_id: str, - target_symbol_id: str, - relationship_type: str, - relationship_data: Dict[str, Any] = None) -> None: - """ - Add a relationship between symbols. - - Args: - source_symbol_id: Source symbol ID - target_symbol_id: Target symbol ID - relationship_type: Type of relationship (enum value as string) - relationship_data: Additional relationship metadata - """ - relationship = SymbolRelationship( - source_symbol_id=source_symbol_id, - target_symbol_id=target_symbol_id, - relationship_type=relationship_type, - relationship_data=relationship_data or {} - ) - - # Add to forward relationships - if source_symbol_id not in self.symbol_relationships: - self.symbol_relationships[source_symbol_id] = [] - self.symbol_relationships[source_symbol_id].append(relationship) - - # Add to reverse relationships for quick lookup - if target_symbol_id not in self.reverse_relationships: - self.reverse_relationships[target_symbol_id] = [] - self.reverse_relationships[target_symbol_id].append(relationship) - - logger.debug(f"Added relationship: {source_symbol_id} --{relationship_type}--> {target_symbol_id}") - - def get_symbol_relationships(self, symbol_id: str) -> List[SymbolRelationship]: - """ - Get all relationships where the symbol is the source. 
- - Args: - symbol_id: Symbol ID - - Returns: - List of relationships - """ - return self.symbol_relationships.get(symbol_id, []) - - def get_reverse_relationships(self, symbol_id: str) -> List[SymbolRelationship]: - """ - Get all relationships where the symbol is the target. - - Args: - symbol_id: Symbol ID - - Returns: - List of relationships where this symbol is the target - """ - return self.reverse_relationships.get(symbol_id, []) - - def get_all_relationships_for_symbol(self, symbol_id: str) -> Dict[str, List[SymbolRelationship]]: - """ - Get both forward and reverse relationships for a symbol. - - Args: - symbol_id: Symbol ID - - Returns: - Dictionary with 'outgoing' and 'incoming' relationship lists - """ - return { - 'outgoing': self.get_symbol_relationships(symbol_id), - 'incoming': self.get_reverse_relationships(symbol_id) - } - - def find_relationships_by_type(self, relationship_type: str) -> List[SymbolRelationship]: - """ - Find all relationships of a specific type. - - Args: - relationship_type: Type of relationship to find - - Returns: - List of matching relationships - """ - matches = [] - for relationships in self.symbol_relationships.values(): - for rel in relationships: - if rel.relationship_type == relationship_type: - matches.append(rel) - return matches - - def remove_symbol_relationships(self, symbol_id: str) -> None: - """ - Remove all relationships for a symbol (both as source and target). 
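The resolver above maintains both a forward map (source → relationships) and a reverse map (target → relationships), and `remove_symbol_relationships` must scrub a symbol from both sides plus any stale edges pointing at it. A minimal sketch of that double-index pattern (class and field names are hypothetical simplifications):

```python
from collections import defaultdict

class RelGraph:
    """Forward/reverse relationship indexes, as in the removed resolver."""

    def __init__(self):
        self.outgoing = defaultdict(list)  # source_id -> [(type, target_id)]
        self.incoming = defaultdict(list)  # target_id -> [(type, source_id)]

    def add(self, source_id, target_id, rel_type):
        # Record each edge in both directions for O(1) lookups either way.
        self.outgoing[source_id].append((rel_type, target_id))
        self.incoming[target_id].append((rel_type, source_id))

    def remove_symbol(self, symbol_id):
        """Drop the symbol as source and as target, then scrub stale edges."""
        self.outgoing.pop(symbol_id, None)
        self.incoming.pop(symbol_id, None)
        for src in self.outgoing:
            self.outgoing[src] = [
                edge for edge in self.outgoing[src] if edge[1] != symbol_id
            ]

g = RelGraph()
g.add("Child#", "Base#", "inherits")
g.add("Child#", "Mixin#", "inherits")
```

The duplicated storage trades memory for fast "who references me" queries, the same trade-off the original made.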
- - Args: - symbol_id: Symbol ID to remove relationships for - """ - # Remove as source - if symbol_id in self.symbol_relationships: - del self.symbol_relationships[symbol_id] - - # Remove as target - if symbol_id in self.reverse_relationships: - del self.reverse_relationships[symbol_id] - - # Remove from other symbols' relationships where this symbol is referenced - for source_id, relationships in self.symbol_relationships.items(): - self.symbol_relationships[source_id] = [ - rel for rel in relationships if rel.target_symbol_id != symbol_id - ] - - logger.debug(f"Removed all relationships for symbol: {symbol_id}") - - def get_relationship_statistics(self) -> Dict[str, Any]: - """ - Get statistics about relationships. - - Returns: - Dictionary with relationship statistics - """ - total_relationships = sum(len(rels) for rels in self.symbol_relationships.values()) - relationship_types = {} - - for relationships in self.symbol_relationships.values(): - for rel in relationships: - rel_type = rel.relationship_type - relationship_types[rel_type] = relationship_types.get(rel_type, 0) + 1 - - return { - 'total_relationships': total_relationships, - 'symbols_with_outgoing_relationships': len(self.symbol_relationships), - 'symbols_with_incoming_relationships': len(self.reverse_relationships), - 'relationship_types': relationship_types - } \ No newline at end of file diff --git a/src/code_index_mcp/scip/core/moniker_manager.py b/src/code_index_mcp/scip/core/moniker_manager.py deleted file mode 100644 index 64fe640..0000000 --- a/src/code_index_mcp/scip/core/moniker_manager.py +++ /dev/null @@ -1,375 +0,0 @@ -""" -Moniker Manager - handles import/export monikers for cross-repository navigation. - -Monikers in SCIP enable cross-repository symbol resolution by providing standardized -identifiers for external packages, modules, and dependencies. 
-""" - -import logging -import re -from typing import Dict, List, Optional, Set, Tuple, NamedTuple -from pathlib import Path -from dataclasses import dataclass, field - -from ..proto import scip_pb2 - - -logger = logging.getLogger(__name__) - - -@dataclass -class PackageInfo: - """Information about an external package.""" - manager: str # e.g., "npm", "pip", "maven", "cargo" - name: str # Package name - version: str # Package version (optional) - - def to_scip_package(self) -> str: - """Convert to SCIP package format.""" - if self.version: - return f"{self.manager} {self.name} {self.version}" - return f"{self.manager} {self.name}" - - -@dataclass -class ImportedSymbol: - """Represents an imported symbol from external package.""" - package_info: PackageInfo - module_path: str # Module path within package - symbol_name: str # Symbol name - alias: Optional[str] = None # Local alias if any - import_kind: str = "default" # "default", "named", "namespace", "side_effect" - - @property - def local_name(self) -> str: - """Get the local name used in code.""" - return self.alias or self.symbol_name - - -@dataclass -class ExportedSymbol: - """Represents a symbol exported by this package.""" - symbol_name: str - symbol_kind: str # "function", "class", "variable", "type", etc. - module_path: str # Path within this package - is_default: bool = False - - -class MonikerManager: - """ - Manages import/export monikers for cross-repository symbol resolution. - - Key responsibilities: - 1. Track external package dependencies - 2. Generate SCIP symbols for imported symbols - 3. Create external symbol information - 4. Support package manager integration (npm, pip, maven, etc.) - """ - - def __init__(self, project_path: str, project_name: str): - """ - Initialize moniker manager. 
- - Args: - project_path: Root path of the current project - project_name: Name of the current project - """ - self.project_path = project_path - self.project_name = project_name - - # Track imported symbols from external packages - self.imported_symbols: Dict[str, ImportedSymbol] = {} - - # Track symbols exported by this project - self.exported_symbols: Dict[str, ExportedSymbol] = {} - - # Package dependency information - self.dependencies: Dict[str, PackageInfo] = {} - - # Cache for generated SCIP symbol IDs - self._symbol_cache: Dict[str, str] = {} - - # Registry of known package managers and their patterns - self.package_managers = { - "npm": PackageManagerConfig( - name="npm", - config_files=["package.json", "package-lock.json", "yarn.lock"], - import_patterns=[ - r"import\s+.*?from\s+['\"]([^'\"]+)['\"]", - r"require\s*\(\s*['\"]([^'\"]+)['\"]\s*\)" - ] - ), - "pip": PackageManagerConfig( - name="pip", - config_files=["requirements.txt", "pyproject.toml", "setup.py", "Pipfile"], - import_patterns=[ - r"from\s+([a-zA-Z_][a-zA-Z0-9_]*(?:\.[a-zA-Z_][a-zA-Z0-9_]*)*)", - r"import\s+([a-zA-Z_][a-zA-Z0-9_]*(?:\.[a-zA-Z_][a-zA-Z0-9_]*)*)" - ] - ), - "maven": PackageManagerConfig( - name="maven", - config_files=["pom.xml", "build.gradle", "build.gradle.kts"], - import_patterns=[ - r"import\s+([a-zA-Z_][a-zA-Z0-9_.]*)" - ] - ), - "cargo": PackageManagerConfig( - name="cargo", - config_files=["Cargo.toml", "Cargo.lock"], - import_patterns=[ - r"use\s+([a-zA-Z_][a-zA-Z0-9_]*(?:::[a-zA-Z_][a-zA-Z0-9_]*)*)" - ] - ) - } - - # Detect project package manager - self.detected_manager = self._detect_package_manager() - - logger.debug(f"Initialized MonikerManager for {project_name} with {self.detected_manager or 'no'} package manager") - - def register_import(self, - package_name: str, - symbol_name: str, - module_path: str = "", - alias: Optional[str] = None, - import_kind: str = "named", - version: Optional[str] = None) -> str: - """ - Register an imported symbol from external 
package. - - Args: - package_name: Name of the external package - symbol_name: Name of the imported symbol - module_path: Module path within package - alias: Local alias for the symbol - import_kind: Type of import (default, named, namespace, side_effect) - version: Package version if known - - Returns: - SCIP symbol ID for the imported symbol - """ - # Create package info - manager = self.detected_manager or "unknown" - package_info = PackageInfo(manager, package_name, version or "") - - # Create imported symbol - imported_symbol = ImportedSymbol( - package_info=package_info, - module_path=module_path, - symbol_name=symbol_name, - alias=alias, - import_kind=import_kind - ) - - # Generate cache key - cache_key = f"{package_name}.{module_path}.{symbol_name}" - - # Store imported symbol - self.imported_symbols[cache_key] = imported_symbol - self.dependencies[package_name] = package_info - - # Generate SCIP symbol ID - symbol_id = self._generate_external_symbol_id(imported_symbol) - self._symbol_cache[cache_key] = symbol_id - - logger.debug(f"Registered import: {cache_key} -> {symbol_id}") - return symbol_id - - def register_export(self, - symbol_name: str, - symbol_kind: str, - module_path: str, - is_default: bool = False) -> str: - """ - Register a symbol exported by this project. - - Args: - symbol_name: Name of the exported symbol - symbol_kind: Kind of symbol (function, class, etc.) 
- module_path: Module path within this project - is_default: Whether this is a default export - - Returns: - SCIP symbol ID for the exported symbol - """ - exported_symbol = ExportedSymbol( - symbol_name=symbol_name, - symbol_kind=symbol_kind, - module_path=module_path, - is_default=is_default - ) - - cache_key = f"export.{module_path}.{symbol_name}" - self.exported_symbols[cache_key] = exported_symbol - - # Generate local symbol ID (this will be accessible to other projects) - symbol_id = self._generate_export_symbol_id(exported_symbol) - self._symbol_cache[cache_key] = symbol_id - - logger.debug(f"Registered export: {cache_key} -> {symbol_id}") - return symbol_id - - def get_external_symbol_information(self) -> List[scip_pb2.SymbolInformation]: - """ - Generate external symbol information for all imported symbols. - - Returns: - List of SymbolInformation for external symbols - """ - external_symbols = [] - - for cache_key, imported_symbol in self.imported_symbols.items(): - symbol_id = self._symbol_cache.get(cache_key) - if not symbol_id: - continue - - symbol_info = scip_pb2.SymbolInformation() - symbol_info.symbol = symbol_id - symbol_info.display_name = imported_symbol.local_name - symbol_info.kind = self._infer_symbol_kind(imported_symbol.symbol_name) - - # Add package information to documentation - pkg = imported_symbol.package_info - documentation = [ - f"External symbol from {pkg.name}", - f"Package manager: {pkg.manager}" - ] - if pkg.version: - documentation.append(f"Version: {pkg.version}") - if imported_symbol.module_path: - documentation.append(f"Module: {imported_symbol.module_path}") - - symbol_info.documentation.extend(documentation) - - external_symbols.append(symbol_info) - - logger.info(f"Generated {len(external_symbols)} external symbol information entries") - return external_symbols - - def resolve_import_reference(self, symbol_name: str, context_file: str) -> Optional[str]: - """ - Resolve a symbol reference to an imported symbol. 
- - Args: - symbol_name: Name of the symbol being referenced - context_file: File where the reference occurs - - Returns: - SCIP symbol ID if the symbol is an import, None otherwise - """ - # Look for exact matches first - for cache_key, imported_symbol in self.imported_symbols.items(): - if imported_symbol.local_name == symbol_name: - return self._symbol_cache.get(cache_key) - - # Look for partial matches (e.g., module.symbol) - for cache_key, imported_symbol in self.imported_symbols.items(): - if symbol_name.startswith(imported_symbol.local_name + "."): - # This might be a member access on imported module - base_symbol_id = self._symbol_cache.get(cache_key) - if base_symbol_id: - # Create symbol ID for the member - member_name = symbol_name[len(imported_symbol.local_name) + 1:] - return self._generate_member_symbol_id(imported_symbol, member_name) - - return None - - def get_dependency_info(self) -> Dict[str, PackageInfo]: - """Get information about all detected dependencies.""" - return self.dependencies.copy() - - def _detect_package_manager(self) -> Optional[str]: - """Detect which package manager this project uses.""" - project_root = Path(self.project_path) - - for manager_name, config in self.package_managers.items(): - for config_file in config.config_files: - if (project_root / config_file).exists(): - logger.info(f"Detected {manager_name} package manager") - return manager_name - - return None - - def _generate_external_symbol_id(self, imported_symbol: ImportedSymbol) -> str: - """Generate SCIP symbol ID for external symbol.""" - pkg = imported_symbol.package_info - - # SCIP format: scheme manager package version descriptors - parts = ["scip-python" if pkg.manager == "pip" else f"scip-{pkg.manager}"] - parts.append(pkg.manager) - parts.append(pkg.name) - - if pkg.version: - parts.append(pkg.version) - - # Add module path if present - if imported_symbol.module_path: - parts.append(imported_symbol.module_path.replace("/", ".")) - - # Add symbol descriptor 
- if imported_symbol.symbol_name: - parts.append(f"{imported_symbol.symbol_name}.") - - return " ".join(parts) - - def _generate_export_symbol_id(self, exported_symbol: ExportedSymbol) -> str: - """Generate SCIP symbol ID for exported symbol.""" - # For exports, use local scheme but make it accessible - manager = self.detected_manager or "local" - - parts = [f"scip-{manager}", manager, self.project_name] - - if exported_symbol.module_path: - parts.append(exported_symbol.module_path.replace("/", ".")) - - # Add appropriate descriptor based on symbol kind - descriptor = self._get_symbol_descriptor(exported_symbol.symbol_kind) - parts.append(f"{exported_symbol.symbol_name}{descriptor}") - - return " ".join(parts) - - def _generate_member_symbol_id(self, imported_symbol: ImportedSymbol, member_name: str) -> str: - """Generate symbol ID for a member of an imported symbol.""" - base_id = self._generate_external_symbol_id(imported_symbol) - - # Remove the trailing descriptor and add member - if base_id.endswith("."): - base_id = base_id[:-1] - - return f"{base_id}#{member_name}." - - def _get_symbol_descriptor(self, symbol_kind: str) -> str: - """Get SCIP descriptor suffix for symbol kind.""" - descriptors = { - "function": "().", - "method": "().", - "class": "#", - "interface": "#", - "type": "#", - "variable": ".", - "constant": ".", - "module": "/", - "namespace": "/" - } - return descriptors.get(symbol_kind.lower(), ".") - - def _infer_symbol_kind(self, symbol_name: str) -> int: - """Infer SCIP symbol kind from symbol name.""" - # Simple heuristics - could be enhanced with actual type information - if symbol_name.istitle(): # CamelCase suggests class/type - return scip_pb2.Class - elif symbol_name.isupper(): # UPPER_CASE suggests constant - return scip_pb2.Constant - elif "." 
in symbol_name: # Dotted suggests module/namespace - return scip_pb2.Module - else: - return scip_pb2.Function # Default assumption - - -@dataclass -class PackageManagerConfig: - """Configuration for a specific package manager.""" - name: str - config_files: List[str] = field(default_factory=list) - import_patterns: List[str] = field(default_factory=list) \ No newline at end of file diff --git a/src/code_index_mcp/scip/core/position_calculator.py b/src/code_index_mcp/scip/core/position_calculator.py deleted file mode 100644 index 1f46139..0000000 --- a/src/code_index_mcp/scip/core/position_calculator.py +++ /dev/null @@ -1,306 +0,0 @@ -"""SCIP Position Calculator - Accurate position calculation for SCIP ranges.""" - -import ast -import logging -from typing import Tuple, List, Optional -try: - import tree_sitter - TREE_SITTER_AVAILABLE = True -except ImportError: - TREE_SITTER_AVAILABLE = False - -from ..proto import scip_pb2 - - -logger = logging.getLogger(__name__) - - -class PositionCalculator: - """ - Accurate position calculator for SCIP ranges. - - Handles conversion from various source positions (AST nodes, Tree-sitter nodes, - line/column positions) to precise SCIP Range objects. - """ - - def __init__(self, content: str, encoding: str = 'utf-8'): - """ - Initialize position calculator with file content. 
- - Args: - content: File content as string - encoding: File encoding (default: utf-8) - """ - self.content = content - self.encoding = encoding - self.lines = content.split('\n') - - # Build byte offset mapping for accurate position calculation - self._build_position_maps() - - logger.debug(f"PositionCalculator initialized for {len(self.lines)} lines") - - def _build_position_maps(self): - """Build mapping tables for efficient position conversion.""" - # Build line start byte offsets - self.line_start_bytes: List[int] = [0] - - content_bytes = self.content.encode(self.encoding) - current_byte = 0 - - for line in self.lines[:-1]: # Exclude last line - line_bytes = line.encode(self.encoding) - current_byte += len(line_bytes) + 1 # +1 for newline - self.line_start_bytes.append(current_byte) - - def ast_node_to_range(self, node: ast.AST) -> scip_pb2.Range: - """ - Convert Python AST node to SCIP Range. - - Args: - node: Python AST node - - Returns: - SCIP Range object - """ - range_obj = scip_pb2.Range() - - if hasattr(node, 'lineno') and hasattr(node, 'col_offset'): - # Python AST uses 1-based line numbers, SCIP uses 0-based - start_line = node.lineno - 1 - start_col = node.col_offset - - # Try to get end position - if hasattr(node, 'end_lineno') and hasattr(node, 'end_col_offset'): - end_line = node.end_lineno - 1 - end_col = node.end_col_offset - else: - # Estimate end position - end_line, end_col = self._estimate_ast_end_position(node, start_line, start_col) - - range_obj.start.extend([start_line, start_col]) - range_obj.end.extend([end_line, end_col]) - else: - # Fallback for nodes without position info - range_obj.start.extend([0, 0]) - range_obj.end.extend([0, 1]) - - return range_obj - - def tree_sitter_node_to_range(self, node) -> scip_pb2.Range: - """ - Convert Tree-sitter node to SCIP Range. 
- - Args: - node: Tree-sitter Node object - - Returns: - SCIP Range object - """ - if not TREE_SITTER_AVAILABLE: - logger.warning("Tree-sitter not available, using fallback range") - range_obj = scip_pb2.Range() - range_obj.start.extend([0, 0]) - range_obj.end.extend([0, 1]) - return range_obj - - range_obj = scip_pb2.Range() - - # Tree-sitter provides byte offsets, convert to line/column - start_line, start_col = self.byte_to_line_col(node.start_byte) - end_line, end_col = self.byte_to_line_col(node.end_byte) - - range_obj.start.extend([start_line, start_col]) - range_obj.end.extend([end_line, end_col]) - - return range_obj - - def line_col_to_range(self, - start_line: int, - start_col: int, - end_line: Optional[int] = None, - end_col: Optional[int] = None, - name_length: int = 1) -> scip_pb2.Range: - """ - Create SCIP Range from line/column positions. - - Args: - start_line: Start line (0-based) - start_col: Start column (0-based) - end_line: End line (optional) - end_col: End column (optional) - name_length: Length of symbol name for end position estimation - - Returns: - SCIP Range object - """ - range_obj = scip_pb2.Range() - - # Use provided end position or estimate - if end_line is not None and end_col is not None: - final_end_line = end_line - final_end_col = end_col - else: - final_end_line = start_line - final_end_col = start_col + name_length - - range_obj.start.extend([start_line, start_col]) - range_obj.end.extend([final_end_line, final_end_col]) - - return range_obj - - def byte_to_line_col(self, byte_offset: int) -> Tuple[int, int]: - """ - Convert byte offset to line/column position. 
- - Args: - byte_offset: Byte offset in file - - Returns: - Tuple of (line, column) - both 0-based - """ - if byte_offset < 0: - return (0, 0) - - # Find the line containing this byte offset - line_num = 0 - for i, line_start in enumerate(self.line_start_bytes): - if byte_offset < line_start: - line_num = i - 1 - break - else: - line_num = len(self.line_start_bytes) - 1 - - # Ensure line_num is valid - line_num = max(0, min(line_num, len(self.lines) - 1)) - - # Calculate column within the line - line_start_byte = self.line_start_bytes[line_num] - byte_in_line = byte_offset - line_start_byte - - # Convert byte offset to character offset within line - if line_num < len(self.lines): - line_content = self.lines[line_num] - try: - # Convert byte offset to character offset - line_bytes = line_content.encode(self.encoding) - if byte_in_line <= len(line_bytes): - char_offset = len(line_bytes[:byte_in_line].decode(self.encoding, errors='ignore')) - else: - char_offset = len(line_content) - except (UnicodeDecodeError, UnicodeEncodeError): - # Fallback to byte offset as character offset - char_offset = min(byte_in_line, len(line_content)) - else: - char_offset = 0 - - return (line_num, char_offset) - - def find_name_in_line(self, line_num: int, name: str) -> Tuple[int, int]: - """ - Find the position of a name within a line. 
- - Args: - line_num: Line number (0-based) - name: Name to find - - Returns: - Tuple of (start_col, end_col) or (0, len(name)) if not found - """ - if line_num < 0 or line_num >= len(self.lines): - return (0, len(name)) - - line_content = self.lines[line_num] - start_col = line_content.find(name) - - if start_col == -1: - # Try to find word boundary match - import re - pattern = r'\b' + re.escape(name) + r'\b' - match = re.search(pattern, line_content) - if match: - start_col = match.start() - else: - start_col = 0 - - end_col = start_col + len(name) - return (start_col, end_col) - - def _estimate_ast_end_position(self, - node: ast.AST, - start_line: int, - start_col: int) -> Tuple[int, int]: - """ - Estimate end position for AST nodes without end position info. - - Args: - node: AST node - start_line: Start line - start_col: Start column - - Returns: - Tuple of (end_line, end_col) - """ - # Try to get name length from common node types - name_length = 1 - - if hasattr(node, 'id'): # Name nodes - name_length = len(node.id) - elif hasattr(node, 'name'): # Function/Class definition nodes - name_length = len(node.name) - elif hasattr(node, 'arg'): # Argument nodes - name_length = len(node.arg) - elif hasattr(node, 'attr'): # Attribute nodes - name_length = len(node.attr) - elif isinstance(node, ast.Constant) and isinstance(node.value, str): - name_length = len(str(node.value)) + 2 # Add quotes - - # For most cases, end position is on the same line - end_line = start_line - end_col = start_col + name_length - - # Ensure end position doesn't exceed line length - if start_line < len(self.lines): - line_length = len(self.lines[start_line]) - end_col = min(end_col, line_length) - - return (end_line, end_col) - - def validate_range(self, range_obj: scip_pb2.Range) -> bool: - """ - Validate that a SCIP Range is within file bounds. 
- - Args: - range_obj: SCIP Range to validate - - Returns: - True if range is valid - """ - if len(range_obj.start) != 2 or len(range_obj.end) != 2: - return False - - start_line, start_col = range_obj.start[0], range_obj.start[1] - end_line, end_col = range_obj.end[0], range_obj.end[1] - - # Check line bounds - if start_line < 0 or start_line >= len(self.lines): - return False - if end_line < 0 or end_line >= len(self.lines): - return False - - # Check column bounds - if start_line < len(self.lines): - if start_col < 0 or start_col > len(self.lines[start_line]): - return False - - if end_line < len(self.lines): - if end_col < 0 or end_col > len(self.lines[end_line]): - return False - - # Check that start <= end - if start_line > end_line: - return False - if start_line == end_line and start_col > end_col: - return False - - return True \ No newline at end of file diff --git a/src/code_index_mcp/scip/core/relationship_manager.py b/src/code_index_mcp/scip/core/relationship_manager.py deleted file mode 100644 index e16c33f..0000000 --- a/src/code_index_mcp/scip/core/relationship_manager.py +++ /dev/null @@ -1,286 +0,0 @@ -"""SCIP relationship manager - converts internal relationships into standard SCIP Relationships.""" - -import logging -from typing import List, Dict, Optional, Any, Set -from enum import Enum - -from ..proto import scip_pb2 - -logger = logging.getLogger(__name__) - - -class RelationshipType(Enum): - """Internal relationship type definitions.""" - CALLS = "calls" # Function call relationship - CALLED_BY = "called_by" # Reverse call relationship - INHERITS = "inherits" # Inheritance relationship - IMPLEMENTS = "implements" # Implementation relationship - REFERENCES = "references" # Reference relationship - TYPE_DEFINITION = "type_definition" # Type definition relationship - DEFINITION = "definition" # Definition relationship - - -class SCIPRelationshipManager: - """ - Core of SCIP relationship conversion and management. - - Converts internal relationship formats into standard SCIP Relationship objects - and manages the various relationship types between symbols. - """ - - def __init__(self): - """Initialize the relationship manager.""" - self.relationship_cache: Dict[str, List[scip_pb2.Relationship]] = {} - self.symbol_relationships: Dict[str, Set[str]] = {} - - logger.debug("SCIPRelationshipManager initialized") - 
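The byte-offset bookkeeping that `PositionCalculator` performs (precompute each line's starting byte offset once, then locate any byte offset and decode the byte prefix so multi-byte characters count as single columns) is easy to get subtly wrong. The sketch below is not the deleted implementation: it uses `bisect` instead of the original linear scan and collapses the class into two helpers, but it illustrates the same conversion idea.

```python
# Minimal sketch of byte-offset -> (line, column) conversion, assuming
# UTF-8 content and 0-based SCIP-style positions. Not the project's code.
import bisect


def build_line_starts(content: str, encoding: str = "utf-8") -> list[int]:
    """Byte offset at which each line begins."""
    starts = [0]
    offset = 0
    for line in content.split("\n")[:-1]:  # last line has no trailing newline
        offset += len(line.encode(encoding)) + 1  # +1 for the newline byte
        starts.append(offset)
    return starts


def byte_to_line_col(content: str, byte_offset: int, encoding: str = "utf-8") -> tuple[int, int]:
    starts = build_line_starts(content, encoding)
    # Binary search for the line containing this byte offset.
    line = bisect.bisect_right(starts, max(byte_offset, 0)) - 1
    line_bytes = content.split("\n")[line].encode(encoding)
    in_line = min(byte_offset - starts[line], len(line_bytes))
    # Decode the byte prefix so a multi-byte character counts as one column.
    col = len(line_bytes[:in_line].decode(encoding, errors="ignore"))
    return line, col
```

Note how `"héllo"` occupies six bytes but `byte_to_line_col` still reports character columns, which is the behavior the deleted `byte_to_line_col` aimed for with its per-line decode step.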
def create_relationship(self, - target_symbol: str, - relationship_type: RelationshipType) -> scip_pb2.Relationship: - """ - Create a standard SCIP Relationship object. - - Args: - target_symbol: SCIP symbol ID of the target symbol - relationship_type: Relationship type - - Returns: - SCIP Relationship object - """ - relationship = scip_pb2.Relationship() - relationship.symbol = target_symbol - - # Set the corresponding boolean flag based on the relationship type - if relationship_type == RelationshipType.REFERENCES: - relationship.is_reference = True - elif relationship_type == RelationshipType.IMPLEMENTS: - relationship.is_implementation = True - elif relationship_type == RelationshipType.TYPE_DEFINITION: - relationship.is_type_definition = True - elif relationship_type == RelationshipType.DEFINITION: - relationship.is_definition = True - else: - # CALLS, CALLED_BY, INHERITS and similar relationships use is_reference, - # since the SCIP standard represents them primarily through is_reference - relationship.is_reference = True - - logger.debug(f"Created SCIP relationship: {target_symbol} ({relationship_type.value})") - return relationship - - def add_relationships_to_symbol(self, - symbol_info: scip_pb2.SymbolInformation, - relationships: List[scip_pb2.Relationship]) -> None: - """ - Add a list of relationships to SCIP symbol information. - - Args: - symbol_info: SCIP SymbolInformation object - relationships: List of relationships to add - """ - if not relationships: - return - - # Clear existing relationships (if any) - del symbol_info.relationships[:] - - # Add the new relationships - symbol_info.relationships.extend(relationships) - - logger.debug(f"Added {len(relationships)} relationships to symbol {symbol_info.symbol}") - - def convert_call_relationships(self, - call_relationships: Any, - symbol_manager: Any) -> List[scip_pb2.Relationship]: - """ - Convert internal CallRelationships into a list of SCIP Relationships. - - Args: - call_relationships: Internal CallRelationships object - symbol_manager: Symbol manager used to generate symbol IDs - - Returns: - List of SCIP Relationship objects - """ - relationships = [] - - # Handle local call relationships - if hasattr(call_relationships, 'local') and call_relationships.local: - for function_name in call_relationships.local: - # Try to generate the target symbol ID - target_symbol_id = 
self._generate_local_symbol_id( - function_name, symbol_manager - ) - if target_symbol_id: - relationship = self.create_relationship( - target_symbol_id, RelationshipType.CALLS - ) - relationships.append(relationship) - - # Handle external call relationships - if hasattr(call_relationships, 'external') and call_relationships.external: - for call_info in call_relationships.external: - if isinstance(call_info, dict) and 'name' in call_info: - # Generate a symbol ID for the external call - target_symbol_id = self._generate_external_symbol_id( - call_info, symbol_manager - ) - if target_symbol_id: - relationship = self.create_relationship( - target_symbol_id, RelationshipType.CALLS - ) - relationships.append(relationship) - - logger.debug(f"Converted call relationships: {len(relationships)} relationships") - return relationships - - def add_inheritance_relationship(self, - child_symbol_id: str, - parent_symbol_id: str) -> scip_pb2.Relationship: - """ - Add an inheritance relationship. - - Args: - child_symbol_id: Child class symbol ID - parent_symbol_id: Parent class symbol ID - - Returns: - SCIP Relationship object - """ - relationship = self.create_relationship(parent_symbol_id, RelationshipType.INHERITS) - - # Record the relationship in the cache - if child_symbol_id not in self.symbol_relationships: - self.symbol_relationships[child_symbol_id] = set() - self.symbol_relationships[child_symbol_id].add(parent_symbol_id) - - logger.debug(f"Added inheritance: {child_symbol_id} -> {parent_symbol_id}") - return relationship - - def add_implementation_relationship(self, - implementer_symbol_id: str, - interface_symbol_id: str) -> scip_pb2.Relationship: - """ - Add an implementation relationship (interface implementation). - - Args: - implementer_symbol_id: Implementer symbol ID - interface_symbol_id: Interface symbol ID - - Returns: - SCIP Relationship object - """ - relationship = self.create_relationship(interface_symbol_id, RelationshipType.IMPLEMENTS) - - # Record the relationship in the cache - if implementer_symbol_id not in self.symbol_relationships: - self.symbol_relationships[implementer_symbol_id] = set() - self.symbol_relationships[implementer_symbol_id].add(interface_symbol_id) - - logger.debug(f"Added 
implementation: {implementer_symbol_id} -> {interface_symbol_id}") - return relationship - - def get_symbol_relationships(self, symbol_id: str) -> List[scip_pb2.Relationship]: - """ - Get all relationships for a symbol. - - Args: - symbol_id: Symbol ID - - Returns: - List of relationships - """ - if symbol_id in self.relationship_cache: - return self.relationship_cache[symbol_id] - return [] - - def cache_relationships(self, symbol_id: str, relationships: List[scip_pb2.Relationship]) -> None: - """ - Cache a symbol's relationships. - - Args: - symbol_id: Symbol ID - relationships: List of relationships - """ - self.relationship_cache[symbol_id] = relationships - logger.debug(f"Cached {len(relationships)} relationships for {symbol_id}") - - def clear_cache(self) -> None: - """Clear the relationship cache.""" - self.relationship_cache.clear() - self.symbol_relationships.clear() - logger.debug("Relationship cache cleared") - - def get_statistics(self) -> Dict[str, int]: - """ - Get relationship statistics. - - Returns: - Dictionary of statistics - """ - total_relationships = sum(len(rels) for rels in self.relationship_cache.values()) - return { - 'symbols_with_relationships': len(self.relationship_cache), - 'total_relationships': total_relationships, - 'cached_symbol_connections': len(self.symbol_relationships) - } - - def _generate_local_symbol_id(self, function_name: str, symbol_manager: Any) -> Optional[str]: - """ - Generate a symbol ID for a local function. - - Args: - function_name: Function name - symbol_manager: Symbol manager - - Returns: - Symbol ID or None - """ - try: - if hasattr(symbol_manager, 'create_local_symbol'): - # Assume this is a local symbol and use a basic path - return symbol_manager.create_local_symbol( - language="unknown", # The concrete strategy sets the correct language - file_path="", # The concrete strategy sets the correct file path - symbol_path=[function_name], - descriptor="()." 
# Function descriptor - ) - except Exception as e: - logger.warning(f"Failed to generate local symbol ID for {function_name}: {e}") - return None - - def _generate_external_symbol_id(self, call_info: Dict[str, Any], symbol_manager: Any) -> Optional[str]: - """ - Generate a symbol ID for an external call. - - Args: - call_info: External call information - symbol_manager: Symbol manager - - Returns: - Symbol ID or None - """ - try: - function_name = call_info.get('name', '') - file_path = call_info.get('file', '') - - if function_name and hasattr(symbol_manager, 'create_local_symbol'): - return symbol_manager.create_local_symbol( - language="unknown", # The concrete strategy sets the correct language - file_path=file_path, - symbol_path=[function_name], - descriptor="()." # Function descriptor - ) - except Exception as e: - logger.warning(f"Failed to generate external symbol ID for {call_info}: {e}") - return None - - -class RelationshipError(Exception): - """Error related to relationship processing.""" - pass - - -class RelationshipConversionError(RelationshipError): - """Relationship conversion error.""" - pass \ No newline at end of file diff --git a/src/code_index_mcp/scip/core/relationship_types.py b/src/code_index_mcp/scip/core/relationship_types.py deleted file mode 100644 index 7088448..0000000 --- a/src/code_index_mcp/scip/core/relationship_types.py +++ /dev/null @@ -1,389 +0,0 @@ -"""SCIP relationship type definitions and mappings. - -This module defines the mapping from internal relationship types to standard -SCIP relationships and provides relationship validation and normalization. -""" - -import logging -from typing import Dict, List, Optional, Set, Any -from enum import Enum -from dataclasses import dataclass - -from ..proto import scip_pb2 - -logger = logging.getLogger(__name__) - - -class InternalRelationshipType(Enum): - """Internal relationship type definitions - extended to support more relationship types.""" - - # Function call relationships - CALLS = "calls" # A calls B - CALLED_BY = "called_by" # A is called by B - - # Type relationships - INHERITS = "inherits" # A inherits from B - INHERITED_BY = "inherited_by" # A is inherited by B - IMPLEMENTS = "implements" # A implements B (interface) - IMPLEMENTED_BY = "implemented_by" # A is implemented by B - - # Definition and reference relationships - DEFINES = "defines" # A defines B - DEFINED_BY = "defined_by" # A is defined by B - REFERENCES = "references" # A references B - REFERENCED_BY = "referenced_by" # A is referenced by B - - 
# Type-related relationships - TYPE_OF = "type_of" # A is the type of B - HAS_TYPE = "has_type" # A has type B - - # Module and package relationships - IMPORTS = "imports" # A imports B - IMPORTED_BY = "imported_by" # A is imported by B - EXPORTS = "exports" # A exports B - EXPORTED_BY = "exported_by" # A is exported by B - - # Composition relationships - CONTAINS = "contains" # A contains B (e.g. a class contains methods) - CONTAINED_BY = "contained_by" # A is contained by B - - # Override relationships - OVERRIDES = "overrides" # A overrides B - OVERRIDDEN_BY = "overridden_by" # A is overridden by B - - -@dataclass -class RelationshipMapping: - """Relationship mapping configuration.""" - scip_is_reference: bool = False - scip_is_implementation: bool = False - scip_is_type_definition: bool = False - scip_is_definition: bool = False - description: str = "" - - -class SCIPRelationshipMapper: - """ - SCIP relationship mapper. - - Maps internal relationship types to the standard SCIP Relationship format - and provides relationship validation and query functionality. - """ - - # Mapping table from internal relationship types to the SCIP standard - RELATIONSHIP_MAPPINGS: Dict[InternalRelationshipType, RelationshipMapping] = { - # Function call relationships - use is_reference - InternalRelationshipType.CALLS: RelationshipMapping( - scip_is_reference=True, - description="Function call relationship" - ), - InternalRelationshipType.CALLED_BY: RelationshipMapping( - scip_is_reference=True, - description="Reverse function call relationship" - ), - - # Inheritance relationships - use is_reference - InternalRelationshipType.INHERITS: RelationshipMapping( - scip_is_reference=True, - description="Class inheritance relationship" - ), - InternalRelationshipType.INHERITED_BY: RelationshipMapping( - scip_is_reference=True, - description="Reverse inheritance relationship" - ), - - # Implementation relationships - use is_implementation - InternalRelationshipType.IMPLEMENTS: RelationshipMapping( - scip_is_implementation=True, - description="Interface implementation relationship" - ), - InternalRelationshipType.IMPLEMENTED_BY: RelationshipMapping( - scip_is_implementation=True, - description="Reverse implementation relationship" - ), - - # Definition relationships - use is_definition - InternalRelationshipType.DEFINES: RelationshipMapping( - scip_is_definition=True, - description="Symbol definition relationship" - ), - 
InternalRelationshipType.DEFINED_BY: RelationshipMapping( - scip_is_definition=True, - description="Reverse definition relationship" - ), - - # Reference relationships - use is_reference - InternalRelationshipType.REFERENCES: RelationshipMapping( - scip_is_reference=True, - description="Symbol reference relationship" - ), - InternalRelationshipType.REFERENCED_BY: RelationshipMapping( - scip_is_reference=True, - description="Reverse reference relationship" - ), - - # Type relationships - use is_type_definition - InternalRelationshipType.TYPE_OF: RelationshipMapping( - scip_is_type_definition=True, - description="Type definition relationship" - ), - InternalRelationshipType.HAS_TYPE: RelationshipMapping( - scip_is_type_definition=True, - description="Has type relationship" - ), - - # Import/export relationships - use is_reference - InternalRelationshipType.IMPORTS: RelationshipMapping( - scip_is_reference=True, - description="Module import relationship" - ), - InternalRelationshipType.IMPORTED_BY: RelationshipMapping( - scip_is_reference=True, - description="Reverse import relationship" - ), - InternalRelationshipType.EXPORTS: RelationshipMapping( - scip_is_reference=True, - description="Module export relationship" - ), - InternalRelationshipType.EXPORTED_BY: RelationshipMapping( - scip_is_reference=True, - description="Reverse export relationship" - ), - - # Containment relationships - use is_reference - InternalRelationshipType.CONTAINS: RelationshipMapping( - scip_is_reference=True, - description="Containment relationship" - ), - InternalRelationshipType.CONTAINED_BY: RelationshipMapping( - scip_is_reference=True, - description="Reverse containment relationship" - ), - - # Override relationships - use is_implementation - InternalRelationshipType.OVERRIDES: RelationshipMapping( - scip_is_implementation=True, - description="Method override relationship" - ), - InternalRelationshipType.OVERRIDDEN_BY: RelationshipMapping( - scip_is_implementation=True, - description="Reverse override relationship" - ), - } - - def __init__(self): - """Initialize the relationship mapper.""" - 
self.custom_mappings: Dict[str, RelationshipMapping] = {} - logger.debug("SCIPRelationshipMapper initialized") - - def map_to_scip_relationship(self, - target_symbol: str, - relationship_type: InternalRelationshipType) -> scip_pb2.Relationship: - """ - Map an internal relationship type to a SCIP Relationship object. - - Args: - target_symbol: Target symbol ID - relationship_type: Internal relationship type - - Returns: - A SCIP Relationship object - - Raises: - ValueError: If the relationship type is not supported - """ - if relationship_type not in self.RELATIONSHIP_MAPPINGS: - raise ValueError(f"Unsupported relationship type: {relationship_type}") - - mapping = self.RELATIONSHIP_MAPPINGS[relationship_type] - - relationship = scip_pb2.Relationship() - relationship.symbol = target_symbol - relationship.is_reference = mapping.scip_is_reference - relationship.is_implementation = mapping.scip_is_implementation - relationship.is_type_definition = mapping.scip_is_type_definition - relationship.is_definition = mapping.scip_is_definition - - logger.debug(f"Mapped {relationship_type.value} -> SCIP relationship for {target_symbol}") - return relationship - - def batch_map_relationships(self, - relationships: List[tuple]) -> List[scip_pb2.Relationship]: - """ - Map relationships in batch. - - Args: - relationships: List of (target_symbol, relationship_type) tuples - - Returns: - List of SCIP Relationship objects - """ - scip_relationships = [] - - for target_symbol, relationship_type in relationships: - try: - scip_rel = self.map_to_scip_relationship(target_symbol, relationship_type) - scip_relationships.append(scip_rel) - except ValueError as e: - logger.warning(f"Failed to map relationship: {e}") - continue - - logger.debug(f"Batch mapped {len(scip_relationships)} relationships") - return scip_relationships - - def validate_relationship_type(self, relationship_type: str) -> bool: - """ - Check whether a relationship type is supported. - - Args: - relationship_type: Relationship type string - - Returns: - True if the relationship type is supported - """ - try: - InternalRelationshipType(relationship_type) - return True - except ValueError: - return relationship_type in self.custom_mappings - - def 
get_supported_relationship_types(self) -> List[str]: - """ - Get all supported relationship types. - - Returns: - List of relationship type strings - """ - builtin_types = [rt.value for rt in InternalRelationshipType] - custom_types = list(self.custom_mappings.keys()) - return builtin_types + custom_types - - def get_relationship_description(self, relationship_type: InternalRelationshipType) -> str: - """ - Get the description for a relationship type. - - Args: - relationship_type: Relationship type - - Returns: - Description string - """ - mapping = self.RELATIONSHIP_MAPPINGS.get(relationship_type) - return mapping.description if mapping else "Unknown relationship" - - def add_custom_mapping(self, - relationship_type: str, - mapping: RelationshipMapping) -> None: - """ - Add a custom relationship mapping. - - Args: - relationship_type: Name of the custom relationship type - mapping: Relationship mapping configuration - """ - self.custom_mappings[relationship_type] = mapping - logger.debug(f"Added custom relationship mapping: {relationship_type}") - - def get_reverse_relationship(self, relationship_type: InternalRelationshipType) -> Optional[InternalRelationshipType]: - """ - Get the reverse of a relationship. - - Args: - relationship_type: Relationship type - - Returns: - The reverse relationship type, or None - """ - reverse_mappings = { - InternalRelationshipType.CALLS: InternalRelationshipType.CALLED_BY, - InternalRelationshipType.CALLED_BY: InternalRelationshipType.CALLS, - InternalRelationshipType.INHERITS: InternalRelationshipType.INHERITED_BY, - InternalRelationshipType.INHERITED_BY: InternalRelationshipType.INHERITS, - InternalRelationshipType.IMPLEMENTS: InternalRelationshipType.IMPLEMENTED_BY, - InternalRelationshipType.IMPLEMENTED_BY: InternalRelationshipType.IMPLEMENTS, - InternalRelationshipType.DEFINES: InternalRelationshipType.DEFINED_BY, - InternalRelationshipType.DEFINED_BY: InternalRelationshipType.DEFINES, - InternalRelationshipType.REFERENCES: InternalRelationshipType.REFERENCED_BY, - InternalRelationshipType.REFERENCED_BY: InternalRelationshipType.REFERENCES, - InternalRelationshipType.TYPE_OF: InternalRelationshipType.HAS_TYPE, - InternalRelationshipType.HAS_TYPE: 
InternalRelationshipType.TYPE_OF, - InternalRelationshipType.IMPORTS: InternalRelationshipType.IMPORTED_BY, - InternalRelationshipType.IMPORTED_BY: InternalRelationshipType.IMPORTS, - InternalRelationshipType.EXPORTS: InternalRelationshipType.EXPORTED_BY, - InternalRelationshipType.EXPORTED_BY: InternalRelationshipType.EXPORTS, - InternalRelationshipType.CONTAINS: InternalRelationshipType.CONTAINED_BY, - InternalRelationshipType.CONTAINED_BY: InternalRelationshipType.CONTAINS, - InternalRelationshipType.OVERRIDES: InternalRelationshipType.OVERRIDDEN_BY, - InternalRelationshipType.OVERRIDDEN_BY: InternalRelationshipType.OVERRIDES, - } - - return reverse_mappings.get(relationship_type) - - def is_directional_relationship(self, relationship_type: InternalRelationshipType) -> bool: - """ - Check whether a relationship is directional. - - Args: - relationship_type: Relationship type - - Returns: - True if the relationship is directional - """ - # Most relationships are directional - non_directional = { - # Non-directional relationship types can be added here - } - return relationship_type not in non_directional - - def group_relationships_by_type(self, - relationships: List[scip_pb2.Relationship]) -> Dict[str, List[scip_pb2.Relationship]]: - """ - Group relationships by their SCIP flags. - - Args: - relationships: List of SCIP relationships - - Returns: - Dictionary of relationships grouped by type - """ - groups = { - 'references': [], - 'implementations': [], - 'type_definitions': [], - 'definitions': [] - } - - for rel in relationships: - if rel.is_reference: - groups['references'].append(rel) - if rel.is_implementation: - groups['implementations'].append(rel) - if rel.is_type_definition: - groups['type_definitions'].append(rel) - if rel.is_definition: - groups['definitions'].append(rel) - - return groups - - def get_statistics(self) -> Dict[str, Any]: - """ - Get mapper statistics. - - Returns: - Statistics dictionary - """ - return { - 'builtin_relationship_types': len(InternalRelationshipType), - 'custom_relationship_types': len(self.custom_mappings), - 'total_supported_types': len(InternalRelationshipType) + len(self.custom_mappings) - } - - -class RelationshipTypeError(Exception): - """Error related to relationship types.""" - pass 
- - -class UnsupportedRelationshipError(RelationshipTypeError): - """Unsupported relationship type error.""" - pass \ No newline at end of file diff --git a/src/code_index_mcp/scip/core/symbol_manager.py b/src/code_index_mcp/scip/core/symbol_manager.py deleted file mode 100644 index 73a2e99..0000000 --- a/src/code_index_mcp/scip/core/symbol_manager.py +++ /dev/null @@ -1,323 +0,0 @@ -"""SCIP Symbol Manager - Standard-compliant symbol ID generation with moniker support.""" - -import os -import logging -from typing import List, Optional, Dict, Any -from pathlib import Path -from dataclasses import dataclass - -from .moniker_manager import MonikerManager, PackageInfo - - -logger = logging.getLogger(__name__) - - -@dataclass -class SCIPSymbolInfo: - """Information about a SCIP symbol.""" - scheme: str # scip-python, scip-javascript, etc. - manager: str # local, pypi, npm, maven, etc. - package: str # package/project name - version: str # version (for external packages) - descriptors: str # symbol path with descriptors - - -class SCIPSymbolManager: - """ - Standard SCIP Symbol Manager for local projects with cross-repository support. - - Generates symbol IDs that comply with SCIP specification: - Format: {scheme} {manager} {package} {version} {descriptors} - - For local projects: - - scheme: scip-{language} - - manager: local - - package: project name - - version: empty (local projects don't have versions) - - descriptors: file_path/symbol_path{descriptor} - - For external packages: - - scheme: scip-{language} - - manager: npm, pip, maven, etc. - - package: external package name - - version: package version - - descriptors: module_path/symbol_path{descriptor} - """ - - def __init__(self, project_path: str, project_name: Optional[str] = None): - """ - Initialize symbol manager for a project. 
- - Args: - project_path: Absolute path to project root - project_name: Project name (defaults to directory name) - """ - self.project_path = Path(project_path).resolve() - self.project_name = project_name or self.project_path.name - - # Normalize project name for SCIP (replace invalid characters) - self.project_name = self._normalize_package_name(self.project_name) - - # Initialize moniker manager for cross-repository support - self.moniker_manager = MonikerManager(str(self.project_path), self.project_name) - - logger.debug(f"SCIPSymbolManager initialized for project: {self.project_name}") - - def create_local_symbol(self, - language: str, - file_path: str, - symbol_path: List[str], - descriptor: str = "") -> str: - """ - Create a local symbol ID following SCIP standard. - - Args: - language: Programming language (python, javascript, java, etc.) - file_path: File path relative to project root - symbol_path: List of symbol components (module, class, function, etc.) - descriptor: SCIP descriptor ((), #, ., etc.) - - Returns: - Standard SCIP symbol ID - - Example: - create_local_symbol("python", "src/main.py", ["MyClass", "method"], "()") - -> "scip-python local myproject src/main.py/MyClass#method()." 
- """ - # Normalize inputs - scheme = f"scip-{language.lower()}" - manager = "local" - package = self.project_name - version = "" # Local projects don't have versions - - # Build descriptors path - normalized_file_path = self._normalize_file_path(file_path) - symbol_components = symbol_path.copy() - - if symbol_components: - # Last component gets the descriptor - last_symbol = symbol_components[-1] + descriptor - symbol_components[-1] = last_symbol - - descriptors = f"{normalized_file_path}/{'/'.join(symbol_components)}" - else: - descriptors = normalized_file_path - - # Build final symbol ID - parts = [scheme, manager, package] - if version: - parts.append(version) - parts.append(descriptors) - - symbol_id = " ".join(parts) - - logger.debug(f"Created local symbol: {symbol_id}") - return symbol_id - - def create_builtin_symbol(self, language: str, builtin_name: str) -> str: - """ - Create a symbol ID for built-in language constructs. - - Args: - language: Programming language - builtin_name: Name of built-in (str, int, Object, etc.) - - Returns: - SCIP symbol ID for built-in - """ - scheme = f"scip-{language.lower()}" - manager = "builtin" - package = language.lower() - descriptors = builtin_name - - return f"{scheme} {manager} {package} {descriptors}" - - def create_stdlib_symbol(self, - language: str, - module_name: str, - symbol_name: str, - descriptor: str = "") -> str: - """ - Create a symbol ID for standard library symbols. 
- - Args: - language: Programming language - module_name: Standard library module name - symbol_name: Symbol name within module - descriptor: SCIP descriptor - - Returns: - SCIP symbol ID for standard library symbol - """ - scheme = f"scip-{language.lower()}" - manager = "stdlib" - package = language.lower() - descriptors = f"{module_name}/{symbol_name}{descriptor}" - - return f"{scheme} {manager} {package} {descriptors}" - - def create_external_symbol(self, - language: str, - package_name: str, - module_path: str, - symbol_name: str, - descriptor: str = "", - version: Optional[str] = None, - alias: Optional[str] = None) -> str: - """ - Create a symbol ID for external package symbols using moniker manager. - - Args: - language: Programming language - package_name: External package name - module_path: Module path within package - symbol_name: Symbol name - descriptor: SCIP descriptor - version: Package version - alias: Local alias for the symbol - - Returns: - SCIP symbol ID for external symbol - """ - return self.moniker_manager.register_import( - package_name=package_name, - symbol_name=symbol_name, - module_path=module_path, - alias=alias, - version=version - ) - - def register_export(self, - symbol_name: str, - symbol_kind: str, - file_path: str, - is_default: bool = False) -> str: - """ - Register a symbol as exportable from this project. - - Args: - symbol_name: Name of the exported symbol - symbol_kind: Kind of symbol (function, class, etc.) 
- file_path: File path where symbol is defined - is_default: Whether this is a default export - - Returns: - SCIP symbol ID for the exported symbol - """ - normalized_file_path = self._normalize_file_path(file_path) - return self.moniker_manager.register_export( - symbol_name=symbol_name, - symbol_kind=symbol_kind, - module_path=normalized_file_path, - is_default=is_default - ) - - def resolve_import_reference(self, symbol_name: str, context_file: str) -> Optional[str]: - """ - Resolve a symbol reference to an imported external symbol. - - Args: - symbol_name: Name of the symbol being referenced - context_file: File where the reference occurs - - Returns: - SCIP symbol ID if resolved to external import, None otherwise - """ - return self.moniker_manager.resolve_import_reference(symbol_name, context_file) - - def get_external_symbols(self): - """Get external symbol information for the index.""" - return self.moniker_manager.get_external_symbol_information() - - def get_dependencies(self) -> Dict[str, PackageInfo]: - """Get information about detected external dependencies.""" - return self.moniker_manager.get_dependency_info() - - def parse_symbol(self, symbol_id: str) -> Optional[SCIPSymbolInfo]: - """ - Parse a SCIP symbol ID into components. 
- - Args: - symbol_id: SCIP symbol ID to parse - - Returns: - SCIPSymbolInfo object or None if parsing fails - """ - try: - parts = symbol_id.split(" ", 4) - if len(parts) < 4: - return None - - scheme = parts[0] - manager = parts[1] - package = parts[2] - - # Handle version (optional) - if len(parts) == 5: - version = parts[3] - descriptors = parts[4] - else: - version = "" - descriptors = parts[3] - - return SCIPSymbolInfo( - scheme=scheme, - manager=manager, - package=package, - version=version, - descriptors=descriptors - ) - - except Exception as e: - logger.warning(f"Failed to parse symbol ID '{symbol_id}': {e}") - return None - - def get_file_path_from_symbol(self, symbol_id: str) -> Optional[str]: - """ - Extract file path from a local symbol ID. - - Args: - symbol_id: SCIP symbol ID - - Returns: - File path or None if not a local symbol - """ - symbol_info = self.parse_symbol(symbol_id) - if not symbol_info or symbol_info.manager != "local": - return None - - # Extract file path from descriptors (before first '/') - descriptors = symbol_info.descriptors - if "/" in descriptors: - return descriptors.split("/", 1)[0] - - return descriptors - - def _normalize_package_name(self, name: str) -> str: - """Normalize package name for SCIP compatibility.""" - # Replace invalid characters with underscores - import re - normalized = re.sub(r'[^a-zA-Z0-9_-]', '_', name) - - # Ensure it starts with a letter or underscore - if normalized and not normalized[0].isalpha() and normalized[0] != '_': - normalized = f"_{normalized}" - - return normalized.lower() - - def _normalize_file_path(self, file_path: str) -> str: - """Normalize file path for SCIP descriptors.""" - # Convert to forward slashes and remove leading slash - normalized = file_path.replace('\\', '/') - if normalized.startswith('/'): - normalized = normalized[1:] - - return normalized - - def get_project_info(self) -> Dict[str, Any]: - """Get project information.""" - return { - 'project_path': 
str(self.project_path), - 'project_name': self.project_name, - 'normalized_name': self.project_name - } \ No newline at end of file diff --git a/src/code_index_mcp/scip/factory.py b/src/code_index_mcp/scip/factory.py deleted file mode 100644 index 1620d8b..0000000 --- a/src/code_index_mcp/scip/factory.py +++ /dev/null @@ -1,200 +0,0 @@ -"""SCIP Indexer Factory - manages and selects appropriate indexing strategies.""" - -import logging -from typing import List, Dict, Set, Optional -from .strategies.base_strategy import SCIPIndexerStrategy, StrategyError -from .strategies.python_strategy import PythonStrategy -from .strategies.javascript_strategy import JavaScriptStrategy -from .strategies.java_strategy import JavaStrategy -from .strategies.objective_c_strategy import ObjectiveCStrategy -# Optional strategies - import only if available -try: - from .strategies.zig_strategy import ZigStrategy - ZIG_AVAILABLE = True -except ImportError: - ZigStrategy = None - ZIG_AVAILABLE = False -from .strategies.fallback_strategy import FallbackStrategy -from ..constants import SUPPORTED_EXTENSIONS - - -logger = logging.getLogger(__name__) - - -class SCIPIndexerFactory: - """Factory for creating and managing SCIP indexing strategies.""" - - def __init__(self): - """Initialize the factory with all available strategies.""" - self.strategies: List[SCIPIndexerStrategy] = [] - self.strategy_cache: Dict[str, SCIPIndexerStrategy] = {} - self._register_all_strategies() - self._validate_coverage() - - def _register_all_strategies(self): - """Register all available strategies in priority order.""" - logger.info("Registering SCIP indexing strategies (SCIP compliant)...") - - # Language-specific strategies (high priority: 95) - strategy_classes = [ - (PythonStrategy, 95), - (JavaScriptStrategy, 95), - (JavaStrategy, 95), - (ObjectiveCStrategy, 95), - ] - - # Add optional strategies if available - if ZIG_AVAILABLE and ZigStrategy: - strategy_classes.append((ZigStrategy, 95)) - - for 
strategy_class, priority in strategy_classes: - try: - strategy = strategy_class(priority=priority) - if strategy.is_available(): - self.register_strategy(strategy) - logger.debug(f"Registered {strategy.get_strategy_name()}") - else: - logger.warning(f"Strategy {strategy_class.__name__} is not available") - except Exception as e: - logger.warning(f"Failed to initialize {strategy_class.__name__}: {e}") - continue - - # Fallback strategy (lowest priority: 10) - fallback = FallbackStrategy(priority=10) - self.register_strategy(fallback) - logger.debug(f"Registered {fallback.get_strategy_name()}") - - logger.info(f"Registered {len(self.strategies)} strategies") - - def register_strategy(self, strategy: SCIPIndexerStrategy): - """ - Register a new strategy. - - Args: - strategy: The strategy to register - """ - self.strategies.append(strategy) - # Sort strategies by priority (highest first) - self.strategies.sort(key=lambda s: s.get_priority(), reverse=True) - - def get_strategy(self, extension: str, file_path: str = "") -> SCIPIndexerStrategy: - """ - Get the best strategy for a file type. - - Args: - extension: File extension (e.g., '.py') - file_path: Optional full file path for context - - Returns: - Best available strategy for the file type - - Raises: - StrategySelectionError: If no suitable strategy is found - """ - # Check cache first - cache_key = f"{extension}:{file_path}" - if cache_key in self.strategy_cache: - return self.strategy_cache[cache_key] - - # Find the highest priority strategy that can handle this file - for strategy in self.strategies: - if strategy.can_handle(extension, file_path): - self.strategy_cache[cache_key] = strategy - return strategy - - # No strategy found - raise StrategySelectionError(f"No strategy available for extension '{extension}'") - - def get_strategies_for_extension(self, extension: str) -> List[SCIPIndexerStrategy]: - """ - Get all strategies that can handle a file extension. 
- - Args: - extension: File extension to check - - Returns: - List of strategies, ordered by priority - """ - return [s for s in self.strategies if s.can_handle(extension, "")] - - def list_supported_extensions(self) -> Set[str]: - """ - Get all file extensions supported by registered strategies. - - Returns: - Set of supported file extensions - """ - supported = set() - - # Add extensions from all registered strategies - for strategy in self.strategies: - if isinstance(strategy, PythonStrategy): - supported.update({'.py', '.pyw'}) - elif isinstance(strategy, JavaScriptStrategy): - supported.update({'.js', '.jsx', '.ts', '.tsx', '.mjs', '.cjs'}) - elif isinstance(strategy, JavaStrategy): - supported.update({'.java'}) - elif isinstance(strategy, ObjectiveCStrategy): - supported.update({'.m', '.mm'}) - elif ZIG_AVAILABLE and isinstance(strategy, ZigStrategy): - supported.update({'.zig', '.zon'}) - elif isinstance(strategy, FallbackStrategy): - # Fallback supports everything, but we don't want to list everything here - pass - - return supported - - def group_files_by_strategy(self, file_paths: List[str]) -> Dict[SCIPIndexerStrategy, List[str]]: - """ - Group files by the strategy that should handle them. - - Args: - file_paths: List of file paths to group - - Returns: - Dictionary mapping strategies to their file lists - """ - strategy_files = {} - - for file_path in file_paths: - # Get file extension - extension = self._get_file_extension(file_path) - - try: - strategy = self.get_strategy(extension, file_path) - if strategy not in strategy_files: - strategy_files[strategy] = [] - strategy_files[strategy].append(file_path) - except StrategySelectionError: - # Skip files we can't handle - logger.debug(f"No strategy available for file: {file_path}") - continue - - return strategy_files - - def _get_file_extension(self, file_path: str) -> str: - """Extract file extension from path.""" - if '.' not in file_path: - return '' - return '.' 
+ file_path.split('.')[-1].lower() - - def _validate_coverage(self): - """Validate that we have reasonable coverage of supported file types.""" - if not self.strategies: - logger.warning("No SCIP strategies registered - indexing will not work") - return - - logger.info(f"SCIP factory initialized with {len(self.strategies)} strategies") - - -# Exception classes -class SCIPIndexingError(Exception): - """Base exception for SCIP indexing errors.""" - - -class StrategySelectionError(SCIPIndexingError): - """Raised when no suitable strategy can be found for a file.""" - - -class IndexingFailedError(SCIPIndexingError): - """Raised when indexing fails for a file or project.""" \ No newline at end of file diff --git a/src/code_index_mcp/scip/proto/__init__.py b/src/code_index_mcp/scip/proto/__init__.py deleted file mode 100644 index 479e6fc..0000000 --- a/src/code_index_mcp/scip/proto/__init__.py +++ /dev/null @@ -1 +0,0 @@ -"""SCIP Protocol Buffer definitions and utilities.""" \ No newline at end of file diff --git a/src/code_index_mcp/scip/proto/scip.proto b/src/code_index_mcp/scip/proto/scip.proto deleted file mode 100644 index 9519306..0000000 --- a/src/code_index_mcp/scip/proto/scip.proto +++ /dev/null @@ -1,265 +0,0 @@ -// SCIP (Source Code Intelligence Protocol) schema definition. -// This is a direct copy from: https://github.com/sourcegraph/scip/blob/main/scip.proto - -syntax = "proto3"; - -package scip; - -option go_package = "github.com/sourcegraph/scip/bindings/go/scip/v1"; -option java_package = "com.sourcegraph.scip_java"; - -// An Index message payload represents a complete SCIP index for a workspace -// rooted at a single directory. An Index payload may have a large memory -// footprint and it's recommended to emit and consume an Index payload one -// field value at a time. To permit such streaming usage, the `metadata` and -// `documents` fields should preferably come first and each `documents` field -// should be emitted as a separate message. 
-// -// To reduce the memory footprint of Index messages, all Symbol values that -// are referenced from `documents` should be de-duplicated and stored in the -// `external_symbols` field. When consuming Index messages, the client should -// construct a symbol table from these `external_symbols` to correctly resolve -// Symbol references that appear in `documents`. -message Index { - Metadata metadata = 1; - repeated Document documents = 2; - repeated SymbolInformation external_symbols = 3; -} - -// ProtocolVersion specifies the protocol version that should be used to -// interpret this SCIP index. Different versions of the protocol may not -// be backwards compatible with each other. -enum ProtocolVersion { - UnspecifiedProtocolVersion = 0; -} - -// Metadata contains information about the producer of the SCIP index. -message Metadata { - ProtocolVersion version = 1; - ToolInfo tool_info = 2; - string project_root = 3; - TextDocumentEncoding text_document_encoding = 4; -} - -enum TextDocumentEncoding { - UnspecifiedTextDocumentEncoding = 0; - // Use UTF-8 encoding where a 'character' corresponds to a Unicode scalar - // value and a 'character offset' corresponds to a byte offset in the - // underlying byte array. - UTF8 = 1; - // Use UTF-16 encoding where a 'character' corresponds to a Unicode code unit - // (which may be a high or low surrogate), and a 'character offset' - // corresponds to the Unicode code unit offset in the underlying byte array. - UTF16 = 2; - // Use UTF-32 encoding where a 'character' corresponds to a Unicode scalar - // value and a 'character offset' corresponds to a byte offset in the - // underlying byte array. - UTF32 = 3; -} - -// Information about the tool that produced the SCIP index. -message ToolInfo { - string name = 1; - string version = 2; - repeated string arguments = 3; -} - -// A Document represents the metadata about one source file on disk. 
-message Document { - string relative_path = 1; - string language = 2; - repeated Occurrence occurrences = 3; - repeated SymbolInformation symbols = 4; - // Optional: the text contents of this document. - string text = 5; - // Used to indicate the encoding used for the text. Should be UTF-8 - // if unspecified, to be compatible with editors and the JVM ecosystem. - PositionEncoding position_encoding = 6; -} - -enum PositionEncoding { - UnspecifiedPositionEncoding = 0; - // The position encoding where columns are measured in UTF-8 byte - // offsets. This is the default encoding if unspecified. - UTF8Bytes = 1; - // The position encoding where columns are measured in UTF-16 code - // units. This encoding is supported by the Language Server Protocol - // and is part of many Microsoft/web ecosystems. - UTF16CodeUnits = 2; - // The position encoding where columns are measured in UTF-32 Unicode - // scalar values (also known as Unicode codepoints). This encoding is - // supported by some text editors like Emacs and the Neovim ecosystem. - UTF32CodeUnits = 3; -} - -// An Occurrence associates source positions with symbols. -message Occurrence { - Range range = 1; - string symbol = 2; - int32 symbol_roles = 3; - SyntaxKind syntax_kind = 4; - repeated Diagnostic diagnostics = 5; - repeated string enclosing_range = 6; -} - -enum SyntaxKind { - UnspecifiedSyntaxKind = 0; - Comment = 1; - PunctuationDelimiter = 2; - PunctuationBracket = 3; - Keyword = 4; - // IdentifierKeyword corresponds to identifiers that are treated as keywords. - // This is needed for languages such as Go where built-in functions like - // `println` are identifiers but have special meaning. 
- IdentifierKeyword = 5; - IdentifierOperator = 6; - Identifier = 7; - IdentifierBuiltin = 8; - IdentifierNull = 9; - IdentifierConstant = 10; - IdentifierMutableGlobal = 11; - IdentifierParameter = 12; - IdentifierLocal = 13; - IdentifierShadowed = 14; - IdentifierNamespace = 15; - IdentifierFunction = 16; - IdentifierFunctionDefinition = 17; - IdentifierMacro = 18; - IdentifierMacroDefinition = 19; - IdentifierType = 20; - IdentifierBuiltinType = 21; - IdentifierAttribute = 22; - RegexEscape = 23; - RegexRepeated = 24; - RegexWildcard = 25; - RegexDelimiter = 26; - RegexJoin = 27; - StringLiteral = 28; - StringLiteralEscape = 29; - StringLiteralSpecial = 30; - StringLiteralKey = 31; - CharacterLiteral = 32; - NumericLiteral = 33; - BooleanLiteral = 34; - Tag = 35; - TagAttribute = 36; - TagDelimiter = 37; -} - -// A Range represents source positions. -message Range { - repeated int32 start = 1; - repeated int32 end = 2; -} - -// A Diagnostic is a message associated with source positions. -message Diagnostic { - Severity severity = 1; - string code = 2; - string message = 3; - string source = 4; - repeated DiagnosticTag tags = 5; -} - -enum Severity { - UnspecifiedSeverity = 0; - Error = 1; - Warning = 2; - Information = 3; - Hint = 4; -} - -enum DiagnosticTag { - UnspecifiedDiagnosticTag = 0; - Unnecessary = 1; - Deprecated = 2; -} - -// SymbolInformation provides rich metadata about symbols in the index. 
-message SymbolInformation { - string symbol = 1; - repeated string documentation = 2; - repeated Relationship relationships = 3; - SymbolKind kind = 4; - string display_name = 5; - string signature_documentation = 6; - repeated string enclosing_symbol = 7; -} - -enum SymbolKind { - UnspecifiedSymbolKind = 0; - Array = 1; - Boolean = 2; - Class = 3; - Constant = 4; - Constructor = 5; - Enum = 6; - EnumMember = 7; - Event = 8; - Field = 9; - File = 10; - Function = 11; - Interface = 12; - Key = 13; - Method = 14; - Module = 15; - Namespace = 16; - Null = 17; - Number = 18; - Object = 19; - Operator = 20; - Package = 21; - Parameter = 22; - Property = 23; - String = 24; - Struct = 25; - TypeParameter = 26; - Unit = 27; - Value = 28; - Variable = 29; - // Language-specific symbol kinds. Use the `display_name` field to give - // the symbol a generic name. - AssociatedType = 30; - SelfParameter = 31; - UnknownKind = 32; - Trait = 33; - Union = 34; - Macro = 35; -} - -// Represents a relationship between symbols. -message Relationship { - string symbol = 1; - bool is_reference = 2; - bool is_implementation = 3; - bool is_type_definition = 4; - bool is_definition = 5; -} - -// Symbol roles encode the relationship a symbol has to its containing document. -// A symbol can have multiple roles. For example, a function that is being defined -// can have both the `definition` role and the `import` role if it's imported from -// another package. -enum SymbolRole { - UnspecifiedSymbolRole = 0; - // Is the symbol defined here? If yes, this is considered a symbol definition. - Definition = 1; - // Is the symbol imported here? For example, the symbol `fmt` is imported in - // the Go code `import "fmt"`. - Import = 2; - // Is the symbol written here? For example, the symbol `variable` is written - // in the Go code `variable := value`. - Write = 4; - // Is the symbol read here? This is the default role for a symbol that is - // being referenced. 
- Read = 8; - // Is the symbol generated here? For example, in the Go code `type Foo struct { Name string }`, - // the symbol `Name` has the role `Generated | Read` for the getter function `func (x Foo) Name() string`. - Generated = 16; - // Is the symbol tested here? For example, in the Go code `func TestSomething(t *testing.T) { t.Errorf("got %s") }`, - // the symbols `TestSomething` and `t.Errorf` have the role `Test`. - Test = 32; - // Is the symbol for a type reference? For example, in the Go code `var x []User`, - // the symbol `User` has the role `Type | Read`. - Type = 64; -} \ No newline at end of file diff --git a/src/code_index_mcp/scip/proto/scip_pb2.py b/src/code_index_mcp/scip/proto/scip_pb2.py deleted file mode 100644 index 06f63d9..0000000 --- a/src/code_index_mcp/scip/proto/scip_pb2.py +++ /dev/null @@ -1,69 +0,0 @@ -# -*- coding: utf-8 -*- -# Generated by the protocol buffer compiler. DO NOT EDIT! -# NO CHECKED-IN PROTOBUF GENCODE -# source: code_index_mcp/scip/proto/scip.proto -# Protobuf Python Version: 6.31.1 -"""Generated protocol buffer code.""" -from google.protobuf import descriptor as _descriptor -from google.protobuf import descriptor_pool as _descriptor_pool -from google.protobuf import runtime_version as _runtime_version -from google.protobuf import symbol_database as _symbol_database -from google.protobuf.internal import builder as _builder -_runtime_version.ValidateProtobufRuntimeVersion( - _runtime_version.Domain.PUBLIC, - 6, - 31, - 1, - '', - 'code_index_mcp/scip/proto/scip.proto' -) -# @@protoc_insertion_point(imports) - -_sym_db = _symbol_database.Default() - - - - -DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n$code_index_mcp/scip/proto/scip.proto\x12\x04scip\"\x7f\n\x05Index\x12 \n\x08metadata\x18\x01 \x01(\x0b\x32\x0e.scip.Metadata\x12!\n\tdocuments\x18\x02 \x03(\x0b\x32\x0e.scip.Document\x12\x31\n\x10\x65xternal_symbols\x18\x03 
\x03(\x0b\x32\x17.scip.SymbolInformation\"\xa7\x01\n\x08Metadata\x12&\n\x07version\x18\x01 \x01(\x0e\x32\x15.scip.ProtocolVersion\x12!\n\ttool_info\x18\x02 \x01(\x0b\x32\x0e.scip.ToolInfo\x12\x14\n\x0cproject_root\x18\x03 \x01(\t\x12:\n\x16text_document_encoding\x18\x04 \x01(\x0e\x32\x1a.scip.TextDocumentEncoding\"<\n\x08ToolInfo\x12\x0c\n\x04name\x18\x01 \x01(\t\x12\x0f\n\x07version\x18\x02 \x01(\t\x12\x11\n\targuments\x18\x03 \x03(\t\"\xc5\x01\n\x08\x44ocument\x12\x15\n\rrelative_path\x18\x01 \x01(\t\x12\x10\n\x08language\x18\x02 \x01(\t\x12%\n\x0boccurrences\x18\x03 \x03(\x0b\x32\x10.scip.Occurrence\x12(\n\x07symbols\x18\x04 \x03(\x0b\x32\x17.scip.SymbolInformation\x12\x0c\n\x04text\x18\x05 \x01(\t\x12\x31\n\x11position_encoding\x18\x06 \x01(\x0e\x32\x16.scip.PositionEncoding\"\xb5\x01\n\nOccurrence\x12\x1a\n\x05range\x18\x01 \x01(\x0b\x32\x0b.scip.Range\x12\x0e\n\x06symbol\x18\x02 \x01(\t\x12\x14\n\x0csymbol_roles\x18\x03 \x01(\x05\x12%\n\x0bsyntax_kind\x18\x04 \x01(\x0e\x32\x10.scip.SyntaxKind\x12%\n\x0b\x64iagnostics\x18\x05 \x03(\x0b\x32\x10.scip.Diagnostic\x12\x17\n\x0f\x65nclosing_range\x18\x06 \x03(\t\"#\n\x05Range\x12\r\n\x05start\x18\x01 \x03(\x05\x12\x0b\n\x03\x65nd\x18\x02 \x03(\x05\"\x80\x01\n\nDiagnostic\x12 \n\x08severity\x18\x01 \x01(\x0e\x32\x0e.scip.Severity\x12\x0c\n\x04\x63ode\x18\x02 \x01(\t\x12\x0f\n\x07message\x18\x03 \x01(\t\x12\x0e\n\x06source\x18\x04 \x01(\t\x12!\n\x04tags\x18\x05 \x03(\x0e\x32\x13.scip.DiagnosticTag\"\xd6\x01\n\x11SymbolInformation\x12\x0e\n\x06symbol\x18\x01 \x01(\t\x12\x15\n\rdocumentation\x18\x02 \x03(\t\x12)\n\rrelationships\x18\x03 \x03(\x0b\x32\x12.scip.Relationship\x12\x1e\n\x04kind\x18\x04 \x01(\x0e\x32\x10.scip.SymbolKind\x12\x14\n\x0c\x64isplay_name\x18\x05 \x01(\t\x12\x1f\n\x17signature_documentation\x18\x06 \x01(\t\x12\x18\n\x10\x65nclosing_symbol\x18\x07 \x03(\t\"\x82\x01\n\x0cRelationship\x12\x0e\n\x06symbol\x18\x01 \x01(\t\x12\x14\n\x0cis_reference\x18\x02 \x01(\x08\x12\x19\n\x11is_implementation\x18\x03 
\x01(\x08\x12\x1a\n\x12is_type_definition\x18\x04 \x01(\x08\x12\x15\n\ris_definition\x18\x05 \x01(\x08*1\n\x0fProtocolVersion\x12\x1e\n\x1aUnspecifiedProtocolVersion\x10\x00*[\n\x14TextDocumentEncoding\x12#\n\x1fUnspecifiedTextDocumentEncoding\x10\x00\x12\x08\n\x04UTF8\x10\x01\x12\t\n\x05UTF16\x10\x02\x12\t\n\x05UTF32\x10\x03*j\n\x10PositionEncoding\x12\x1f\n\x1bUnspecifiedPositionEncoding\x10\x00\x12\r\n\tUTF8Bytes\x10\x01\x12\x12\n\x0eUTF16CodeUnits\x10\x02\x12\x12\n\x0eUTF32CodeUnits\x10\x03*\xc8\x06\n\nSyntaxKind\x12\x19\n\x15UnspecifiedSyntaxKind\x10\x00\x12\x0b\n\x07\x43omment\x10\x01\x12\x18\n\x14PunctuationDelimiter\x10\x02\x12\x16\n\x12PunctuationBracket\x10\x03\x12\x0b\n\x07Keyword\x10\x04\x12\x15\n\x11IdentifierKeyword\x10\x05\x12\x16\n\x12IdentifierOperator\x10\x06\x12\x0e\n\nIdentifier\x10\x07\x12\x15\n\x11IdentifierBuiltin\x10\x08\x12\x12\n\x0eIdentifierNull\x10\t\x12\x16\n\x12IdentifierConstant\x10\n\x12\x1b\n\x17IdentifierMutableGlobal\x10\x0b\x12\x17\n\x13IdentifierParameter\x10\x0c\x12\x13\n\x0fIdentifierLocal\x10\r\x12\x16\n\x12IdentifierShadowed\x10\x0e\x12\x17\n\x13IdentifierNamespace\x10\x0f\x12\x16\n\x12IdentifierFunction\x10\x10\x12 \n\x1cIdentifierFunctionDefinition\x10\x11\x12\x13\n\x0fIdentifierMacro\x10\x12\x12\x1d\n\x19IdentifierMacroDefinition\x10\x13\x12\x12\n\x0eIdentifierType\x10\x14\x12\x19\n\x15IdentifierBuiltinType\x10\x15\x12\x17\n\x13IdentifierAttribute\x10\x16\x12\x0f\n\x0bRegexEscape\x10\x17\x12\x11\n\rRegexRepeated\x10\x18\x12\x11\n\rRegexWildcard\x10\x19\x12\x12\n\x0eRegexDelimiter\x10\x1a\x12\r\n\tRegexJoin\x10\x1b\x12\x11\n\rStringLiteral\x10\x1c\x12\x17\n\x13StringLiteralEscape\x10\x1d\x12\x18\n\x14StringLiteralSpecial\x10\x1e\x12\x14\n\x10StringLiteralKey\x10\x1f\x12\x14\n\x10\x43haracterLiteral\x10 
\x12\x12\n\x0eNumericLiteral\x10!\x12\x12\n\x0e\x42ooleanLiteral\x10\"\x12\x07\n\x03Tag\x10#\x12\x10\n\x0cTagAttribute\x10$\x12\x10\n\x0cTagDelimiter\x10%*V\n\x08Severity\x12\x17\n\x13UnspecifiedSeverity\x10\x00\x12\t\n\x05\x45rror\x10\x01\x12\x0b\n\x07Warning\x10\x02\x12\x0f\n\x0bInformation\x10\x03\x12\x08\n\x04Hint\x10\x04*N\n\rDiagnosticTag\x12\x1c\n\x18UnspecifiedDiagnosticTag\x10\x00\x12\x0f\n\x0bUnnecessary\x10\x01\x12\x0e\n\nDeprecated\x10\x02*\xf1\x03\n\nSymbolKind\x12\x19\n\x15UnspecifiedSymbolKind\x10\x00\x12\t\n\x05\x41rray\x10\x01\x12\x0b\n\x07\x42oolean\x10\x02\x12\t\n\x05\x43lass\x10\x03\x12\x0c\n\x08\x43onstant\x10\x04\x12\x0f\n\x0b\x43onstructor\x10\x05\x12\x08\n\x04\x45num\x10\x06\x12\x0e\n\nEnumMember\x10\x07\x12\t\n\x05\x45vent\x10\x08\x12\t\n\x05\x46ield\x10\t\x12\x08\n\x04\x46ile\x10\n\x12\x0c\n\x08\x46unction\x10\x0b\x12\r\n\tInterface\x10\x0c\x12\x07\n\x03Key\x10\r\x12\n\n\x06Method\x10\x0e\x12\n\n\x06Module\x10\x0f\x12\r\n\tNamespace\x10\x10\x12\x08\n\x04Null\x10\x11\x12\n\n\x06Number\x10\x12\x12\n\n\x06Object\x10\x13\x12\x0c\n\x08Operator\x10\x14\x12\x0b\n\x07Package\x10\x15\x12\r\n\tParameter\x10\x16\x12\x0c\n\x08Property\x10\x17\x12\n\n\x06String\x10\x18\x12\n\n\x06Struct\x10\x19\x12\x11\n\rTypeParameter\x10\x1a\x12\x08\n\x04Unit\x10\x1b\x12\t\n\x05Value\x10\x1c\x12\x0c\n\x08Variable\x10\x1d\x12\x12\n\x0e\x41ssociatedType\x10\x1e\x12\x11\n\rSelfParameter\x10\x1f\x12\x0f\n\x0bUnknownKind\x10 \x12\t\n\x05Trait\x10!\x12\t\n\x05Union\x10\"\x12\t\n\x05Macro\x10#*{\n\nSymbolRole\x12\x19\n\x15UnspecifiedSymbolRole\x10\x00\x12\x0e\n\nDefinition\x10\x01\x12\n\n\x06Import\x10\x02\x12\t\n\x05Write\x10\x04\x12\x08\n\x04Read\x10\x08\x12\r\n\tGenerated\x10\x10\x12\x08\n\x04Test\x10 \x12\x08\n\x04Type\x10@BL\n\x19\x63om.sourcegraph.scip_javaZ/github.com/sourcegraph/scip/bindings/go/scip/v1b\x06proto3') - -_globals = globals() -_builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, _globals) -_builder.BuildTopDescriptorsAndMessages(DESCRIPTOR, 
'code_index_mcp.scip.proto.scip_pb2', _globals)
-if not _descriptor._USE_C_DESCRIPTORS:
-  _globals['DESCRIPTOR']._loaded_options = None
-  _globals['DESCRIPTOR']._serialized_options = b'\n\031com.sourcegraph.scip_javaZ/github.com/sourcegraph/scip/bindings/go/scip/v1'
-  _globals['_PROTOCOLVERSION']._serialized_start=1309
-  _globals['_PROTOCOLVERSION']._serialized_end=1358
-  _globals['_TEXTDOCUMENTENCODING']._serialized_start=1360
-  _globals['_TEXTDOCUMENTENCODING']._serialized_end=1451
-  _globals['_POSITIONENCODING']._serialized_start=1453
-  _globals['_POSITIONENCODING']._serialized_end=1559
-  _globals['_SYNTAXKIND']._serialized_start=1562
-  _globals['_SYNTAXKIND']._serialized_end=2402
-  _globals['_SEVERITY']._serialized_start=2404
-  _globals['_SEVERITY']._serialized_end=2490
-  _globals['_DIAGNOSTICTAG']._serialized_start=2492
-  _globals['_DIAGNOSTICTAG']._serialized_end=2570
-  _globals['_SYMBOLKIND']._serialized_start=2573
-  _globals['_SYMBOLKIND']._serialized_end=3070
-  _globals['_SYMBOLROLE']._serialized_start=3072
-  _globals['_SYMBOLROLE']._serialized_end=3195
-  _globals['_INDEX']._serialized_start=46
-  _globals['_INDEX']._serialized_end=173
-  _globals['_METADATA']._serialized_start=176
-  _globals['_METADATA']._serialized_end=343
-  _globals['_TOOLINFO']._serialized_start=345
-  _globals['_TOOLINFO']._serialized_end=405
-  _globals['_DOCUMENT']._serialized_start=408
-  _globals['_DOCUMENT']._serialized_end=605
-  _globals['_OCCURRENCE']._serialized_start=608
-  _globals['_OCCURRENCE']._serialized_end=789
-  _globals['_RANGE']._serialized_start=791
-  _globals['_RANGE']._serialized_end=826
-  _globals['_DIAGNOSTIC']._serialized_start=829
-  _globals['_DIAGNOSTIC']._serialized_end=957
-  _globals['_SYMBOLINFORMATION']._serialized_start=960
-  _globals['_SYMBOLINFORMATION']._serialized_end=1174
-  _globals['_RELATIONSHIP']._serialized_start=1177
-  _globals['_RELATIONSHIP']._serialized_end=1307
-# @@protoc_insertion_point(module_scope)
diff --git a/src/code_index_mcp/scip/strategies/__init__.py b/src/code_index_mcp/scip/strategies/__init__.py
deleted file mode 100644
index 3fb54fa..0000000
--- a/src/code_index_mcp/scip/strategies/__init__.py
+++ /dev/null
@@ -1,5 +0,0 @@
-"""SCIP indexing strategies."""
-
-from .base_strategy import SCIPIndexerStrategy
-
-__all__ = ['SCIPIndexerStrategy']
\ No newline at end of file
diff --git a/src/code_index_mcp/scip/strategies/base_strategy.py b/src/code_index_mcp/scip/strategies/base_strategy.py
deleted file mode 100644
index 56972ef..0000000
--- a/src/code_index_mcp/scip/strategies/base_strategy.py
+++ /dev/null
@@ -1,432 +0,0 @@
-"""Base strategy interface for SCIP indexing - SCIP standard compliant."""
-
-from abc import ABC, abstractmethod
-from typing import List, Optional, Dict, Any
-import logging
-
-from ..proto import scip_pb2
-from ..core.symbol_manager import SCIPSymbolManager
-from ..core.position_calculator import PositionCalculator
-from ..core.local_reference_resolver import LocalReferenceResolver
-from ..core.relationship_manager import SCIPRelationshipManager
-from ..core.relationship_types import SCIPRelationshipMapper, InternalRelationshipType
-
-
-logger = logging.getLogger(__name__)
-
-
-class SCIPIndexerStrategy(ABC):
-    """
-    Base class for all SCIP indexing strategies.
-
-    This version is fully compliant with SCIP standards and includes:
-    - Standard SCIP symbol ID generation
-    - Accurate position calculation
-    - Local cross-file reference resolution
-    - Two-phase analysis (symbol collection + reference resolution)
-    """
-
-    def __init__(self, priority: int = 50):
-        """
-        Initialize the strategy with a priority level.
-
-        Args:
-            priority: Strategy priority (higher = more preferred)
-                      100 = Official tools (highest)
-                      90 = Language-specific strategies
-                      50 = Custom strategies (primary)
-                      25 = Language-specialized defaults
-                      10 = Generic defaults
-                      1 = Fallback (lowest)
-        """
-        self.priority = priority
-
-        # Core components (initialized per project)
-        self.symbol_manager: Optional[SCIPSymbolManager] = None
-        self.reference_resolver: Optional[LocalReferenceResolver] = None
-        self.position_calculator: Optional[PositionCalculator] = None
-        self.relationship_manager: Optional[SCIPRelationshipManager] = None
-        self.relationship_mapper: Optional[SCIPRelationshipMapper] = None
-
-    @abstractmethod
-    def can_handle(self, extension: str, file_path: str) -> bool:
-        """
-        Check if this strategy can handle the given file type.
-
-        Args:
-            extension: File extension (e.g., '.py')
-            file_path: Full path to the file
-
-        Returns:
-            True if this strategy can handle the file
-        """
-
-    @abstractmethod
-    def get_language_name(self) -> str:
-        """
-        Get the language name for SCIP symbol generation.
-
-        Returns:
-            Language name (e.g., 'python', 'javascript', 'java')
-        """
-
-    def generate_scip_documents(self, files: List[str], project_path: str) -> List[scip_pb2.Document]:
-        """
-        Generate SCIP documents for the given files using two-phase analysis.
-
-        Args:
-            files: List of file paths to index
-            project_path: Root path of the project
-
-        Returns:
-            List of SCIP Document objects
-
-        Raises:
-            StrategyError: If the strategy cannot process the files
-        """
-        import os
-        from datetime import datetime
-        strategy_name = self.__class__.__name__
-
-        logger.info(f"🐍 {strategy_name}: Starting indexing of {len(files)} files")
-        logger.debug(f"Files to process: {[os.path.basename(f) for f in files[:5]]}" +
-                     (f" ... and {len(files)-5} more" if len(files) > 5 else ""))
-
-        try:
-            # Initialize core components for this project
-            logger.debug(f"🔧 {strategy_name}: Initializing components...")
-            self._initialize_components(project_path)
-            logger.debug(f"✅ {strategy_name}: Component initialization completed")
-
-            # Phase 1: Collect all symbol definitions
-            logger.info(f"📋 {strategy_name}: Phase 1 - Collecting symbol definitions from {len(files)} files")
-            self._collect_symbol_definitions(files, project_path)
-            logger.info(f"✅ {strategy_name}: Phase 1 completed")
-
-            # Phase 2: Build symbol relationships
-            logger.info(f"🔗 {strategy_name}: Phase 2 - Building symbol relationships")
-            relationships = self._build_symbol_relationships(files, project_path)
-            total_relationships = sum(len(rels) for rels in relationships.values())
-            logger.info(f"✅ {strategy_name}: Phase 2 completed, built {total_relationships} relationships for {len(relationships)} symbols")
-
-            # Phase 3: Generate complete SCIP documents with resolved references and relationships
-            logger.info(f"📄 {strategy_name}: Phase 3 - Generating SCIP documents with resolved references and relationships")
-            documents = self._generate_documents_with_references(files, project_path, relationships)
-            logger.info(f"✅ {strategy_name}: Phase 3 completed, generated {len(documents)} documents")
-
-            # Log statistics
-            if self.reference_resolver:
-                stats = self.reference_resolver.get_project_statistics()
-                logger.info(f"📊 {strategy_name}: Statistics - {stats['total_definitions']} definitions, "
-                            f"{stats['total_references']} references, {stats['files_with_symbols']} files")
-
-            logger.info(f"🎉 {strategy_name}: Indexing completed")
-
-            return documents
-
-        except Exception as e:
-            logger.error(f"❌ {strategy_name}: Failed: {e}")
-            raise StrategyError(f"Failed to generate SCIP documents: {e}") from e
-
-    def get_external_symbols(self):
-        """Get external symbol information from symbol manager."""
-        if self.symbol_manager:
-            return self.symbol_manager.get_external_symbols()
-        return []
-
-    def get_dependencies(self):
-        """Get dependency information from symbol manager."""
-        if self.symbol_manager:
-            return self.symbol_manager.get_dependencies()
-        return {}
-
-    def _initialize_components(self, project_path: str) -> None:
-        """Initialize core components for the project."""
-        import os
-        project_name = os.path.basename(project_path)
-
-        self.symbol_manager = SCIPSymbolManager(project_path, project_name)
-        self.reference_resolver = LocalReferenceResolver(project_path)
-        self.relationship_manager = SCIPRelationshipManager()
-        self.relationship_mapper = SCIPRelationshipMapper()
-
-        logger.debug(f"Initialized components for project: {project_name}")
-
-    @abstractmethod
-    def _collect_symbol_definitions(self, files: List[str], project_path: str) -> None:
-        """
-        Phase 1: Collect all symbol definitions from files.
-
-        This phase should:
-        1. Parse each file
-        2. Extract symbol definitions
-        3. Register them with the reference resolver
-
-        Args:
-            files: List of file paths to process
-            project_path: Project root path
-        """
-
-    @abstractmethod
-    def _generate_documents_with_references(self, files: List[str], project_path: str, relationships: Optional[Dict[str, List[tuple]]] = None) -> List[scip_pb2.Document]:
-        """
-        Phase 3: Generate complete SCIP documents with resolved references and relationships.
-
-        This phase should:
-        1. Parse each file again
-        2. Generate occurrences for definitions and references
-        3. Resolve references using the reference resolver
-        4. Add relationships to symbol information
-        5. Create complete SCIP documents
-
-        Args:
-            files: List of file paths to process
-            project_path: Project root path
-            relationships: Optional dictionary mapping symbol_id -> [(target_symbol_id, relationship_type), ...]
-
-        Returns:
-            List of complete SCIP documents
-        """
-
-    @abstractmethod
-    def _build_symbol_relationships(self, files: List[str], project_path: str) -> Dict[str, List[tuple]]:
-        """
-        Build relationships between symbols.
-
-        This method should analyze symbol relationships and return a mapping
-        from symbol IDs to their relationships.
-
-        Args:
-            files: List of file paths to process
-            project_path: Project root path
-
-        Returns:
-            Dictionary mapping symbol_id -> [(target_symbol_id, relationship_type), ...]
-        """
-
-    def _create_scip_relationships(self, symbol_relationships: List[tuple]) -> List[scip_pb2.Relationship]:
-        """
-        Create SCIP relationships from symbol relationship tuples.
-
-        Args:
-            symbol_relationships: List of (target_symbol, relationship_type) tuples
-
-        Returns:
-            List of SCIP Relationship objects
-        """
-        if not self.relationship_mapper:
-            logger.warning("Relationship mapper not initialized, returning empty relationships")
-            return []
-
-        try:
-            relationships = []
-            for target_symbol, relationship_type in symbol_relationships:
-                if isinstance(relationship_type, str):
-                    # Convert string to enum if needed
-                    try:
-                        relationship_type = InternalRelationshipType(relationship_type)
-                    except ValueError:
-                        logger.warning(f"Unknown relationship type: {relationship_type}")
-                        continue
-
-                scip_rel = self.relationship_mapper.map_to_scip_relationship(
-                    target_symbol, relationship_type
-                )
-                relationships.append(scip_rel)
-
-            logger.debug(f"Created {len(relationships)} SCIP relationships")
-            return relationships
-
-        except Exception as e:
-            logger.error(f"Failed to create SCIP relationships: {e}")
-            return []
-
-    def get_priority(self) -> int:
-        """Return the strategy priority."""
-        return self.priority
-
-    def get_strategy_name(self) -> str:
-        """Return a human-readable name for this strategy."""
-        class_name = self.__class__.__name__
-        return class_name
-
-    def is_available(self) -> bool:
-        """
-        Check if this strategy is available and ready to use.
-
-        Returns:
-            True if the strategy can be used
-        """
-        return True
-
-    def _read_file_content(self, file_path: str) -> Optional[str]:
-        """
-        Read file content with encoding detection.
-
-        Args:
-            file_path: Path to file
-
-        Returns:
-            File content or None if reading fails
-        """
-        try:
-            # Try different encodings
-            encodings = ['utf-8', 'utf-8-sig', 'latin-1', 'cp1252']
-
-            for encoding in encodings:
-                try:
-                    with open(file_path, 'r', encoding=encoding) as f:
-                        return f.read()
-                except UnicodeDecodeError:
-                    continue
-
-            logger.warning(f"Could not decode {file_path} with any encoding")
-            return None
-
-        except (OSError, PermissionError, FileNotFoundError) as e:
-            logger.warning(f"Could not read {file_path}: {e}")
-            return None
-
-    def _get_relative_path(self, file_path: str, project_path: str) -> str:
-        """
-        Get relative path from project root.
-
-        Args:
-            file_path: Absolute or relative file path
-            project_path: Project root path
-
-        Returns:
-            Relative path from project root
-        """
-        try:
-            from pathlib import Path
-            path = Path(file_path)
-            if path.is_absolute():
-                return str(path.relative_to(Path(project_path)))
-            return file_path
-        except ValueError:
-            # If path is not under project_path, return as-is
-            return file_path
-
-    def _create_scip_occurrence(self,
-                                symbol_id: str,
-                                range_obj: scip_pb2.Range,
-                                symbol_roles: int,
-                                syntax_kind: int) -> scip_pb2.Occurrence:
-        """
-        Create a SCIP occurrence.
-
-        Args:
-            symbol_id: SCIP symbol ID
-            range_obj: SCIP Range object
-            symbol_roles: SCIP symbol roles
-            syntax_kind: SCIP syntax kind
-
-        Returns:
-            SCIP Occurrence object
-        """
-        occurrence = scip_pb2.Occurrence()
-        occurrence.symbol = symbol_id
-        occurrence.symbol_roles = symbol_roles
-        occurrence.syntax_kind = syntax_kind
-        occurrence.range.CopyFrom(range_obj)
-
-        return occurrence
-
-    def _create_scip_symbol_information(self,
-                                        symbol_id: str,
-                                        display_name: str,
-                                        symbol_kind: int,
-                                        documentation: List[str] = None,
-                                        relationships: List[scip_pb2.Relationship] = None) -> scip_pb2.SymbolInformation:
-        """
-        Create SCIP symbol information with relationships.
-
-        Args:
-            symbol_id: SCIP symbol ID
-            display_name: Human-readable name
-            symbol_kind: SCIP symbol kind
-            documentation: Optional documentation
-            relationships: Optional relationships
-
-        Returns:
-            SCIP SymbolInformation object with relationships
-        """
-        symbol_info = scip_pb2.SymbolInformation()
-        symbol_info.symbol = symbol_id
-        symbol_info.display_name = display_name
-        symbol_info.kind = symbol_kind
-
-        if documentation:
-            symbol_info.documentation.extend(documentation)
-
-        # Add relationships if provided
-        if relationships and self.relationship_manager:
-            self.relationship_manager.add_relationships_to_symbol(symbol_info, relationships)
-
-        return symbol_info
-
-    def _register_symbol_definition(self, symbol_id: str, file_path: str,
-                                    definition_range: scip_pb2.Range, symbol_kind: int,
-                                    display_name: str, documentation: List[str] = None) -> None:
-        """
-        Register a symbol definition with the reference resolver.
-
-        Args:
-            symbol_id: SCIP symbol ID
-            file_path: File path where symbol is defined
-            definition_range: SCIP range object for definition
-            symbol_kind: SCIP symbol kind
-            display_name: Human-readable name
-            documentation: Optional documentation
-        """
-        if not self.reference_resolver:
-            logger.warning("Reference resolver not initialized, skipping symbol registration")
-            return
-
-        self.reference_resolver.register_symbol_definition(
-            symbol_id=symbol_id,
-            file_path=file_path,
-            definition_range=definition_range,
-            symbol_kind=symbol_kind,
-            display_name=display_name,
-            documentation=documentation or []
-        )
-
-    def _check_components_initialized(self) -> bool:
-        """
-        Check if all required components are initialized.
-
-        Returns:
-            True if all components are ready
-
-        Raises:
-            StrategyError: If required components are not initialized
-        """
-        missing_components = []
-
-        if not self.symbol_manager:
-            missing_components.append("symbol_manager")
-        if not self.reference_resolver:
-            missing_components.append("reference_resolver")
-        if not self.relationship_manager:
-            missing_components.append("relationship_manager")
-        if not self.relationship_mapper:
-            missing_components.append("relationship_mapper")
-
-        if missing_components:
-            raise StrategyError(f"Required components not initialized: {', '.join(missing_components)}")
-
-        return True
-
-
-class StrategyError(Exception):
-    """Base exception for strategy-related errors."""
-
-
-class ToolUnavailableError(StrategyError):
-    """Raised when a required tool is not available."""
-
-
-class ConversionError(StrategyError):
-    """Raised when conversion to SCIP format fails."""
\ No newline at end of file
diff --git a/src/code_index_mcp/scip/strategies/fallback_strategy.py b/src/code_index_mcp/scip/strategies/fallback_strategy.py
deleted file mode 100644
index 416ebcb..0000000
--- a/src/code_index_mcp/scip/strategies/fallback_strategy.py
+++ /dev/null
@@ -1,539 +0,0 @@
-"""Fallback SCIP indexing strategy - SCIP standard compliant."""
-
-import logging
-import os
-import re
-from typing import List, Optional, Dict, Any, Set
-from pathlib import Path
-
-from .base_strategy import SCIPIndexerStrategy, StrategyError
-from ..proto import scip_pb2
-from ..core.position_calculator import PositionCalculator
-from ..core.relationship_types import InternalRelationshipType
-from ...constants import SUPPORTED_EXTENSIONS
-
-
-logger = logging.getLogger(__name__)
-
-
-class FallbackStrategy(SCIPIndexerStrategy):
-    """SCIP-compliant fallback strategy for files without specific language support."""
-
-    def __init__(self, priority: int = 10):
-        """Initialize the fallback strategy with low priority."""
-        super().__init__(priority)
-
-    def can_handle(self, extension: str, file_path: str) -> bool:
-        """This strategy can handle supported file extensions as a last resort."""
-        return extension.lower() in SUPPORTED_EXTENSIONS
-
-    def get_language_name(self) -> str:
-        """Get the language name for SCIP symbol generation."""
-        return "text"  # Generic text language
-
-    def is_available(self) -> bool:
-        """Check if this strategy is available."""
-        return True  # Always available as fallback
-
-    def _collect_symbol_definitions(self, files: List[str], project_path: str) -> None:
-        """Phase 1: Collect all symbol definitions from text files."""
-        logger.debug(f"FallbackStrategy Phase 1: Processing {len(files)} files for symbol collection")
-        processed_count = 0
-        error_count = 0
-
-        for i, file_path in enumerate(files, 1):
-            relative_path = os.path.relpath(file_path, project_path)
-
-            try:
-                self._collect_symbols_from_file(file_path, project_path)
-                processed_count += 1
-
-                if i % 10 == 0 or i == len(files):
-                    logger.debug(f"Phase 1 progress: {i}/{len(files)} files, last file: {relative_path}")
-
-            except Exception as e:
-                error_count += 1
-                logger.warning(f"Phase 1 failed for {relative_path}: {e}")
-                continue
-
-        logger.info(f"Phase 1 summary: {processed_count} files processed, {error_count} errors")
-
-    def _generate_documents_with_references(self, files: List[str], project_path: str, relationships: Optional[Dict[str, List[tuple]]] = None) -> List[scip_pb2.Document]:
-        """Phase 2: Generate complete SCIP documents with resolved references."""
-        documents = []
-        logger.debug(f"FallbackStrategy Phase 2: Generating documents for {len(files)} files")
-        processed_count = 0
-        error_count = 0
-        total_occurrences = 0
-        total_symbols = 0
-
-        for i, file_path in enumerate(files, 1):
-            relative_path = os.path.relpath(file_path, project_path)
-
-            try:
-                document = self._analyze_text_file(file_path, project_path, relationships)
-                if document:
-                    documents.append(document)
-                    total_occurrences += len(document.occurrences)
-                    total_symbols += len(document.symbols)
-                    processed_count += 1
-
-                if i % 10 == 0 or i == len(files):
-                    logger.debug(f"Phase 2 progress: {i}/{len(files)} files, "
-                                 f"last file: {relative_path}, "
-                                 f"{len(document.occurrences) if document else 0} occurrences")
-
-            except Exception as e:
-                error_count += 1
-                logger.error(f"Phase 2 failed for {relative_path}: {e}")
-                continue
-
-        logger.info(f"Phase 2 summary: {processed_count} documents generated, {error_count} errors, "
-                    f"{total_occurrences} total occurrences, {total_symbols} total symbols")
-
-        return documents
-
-    def _collect_symbols_from_file(self, file_path: str, project_path: str) -> None:
-        """Collect symbol definitions from a single text file."""
-        # Read file content
-        content = self._read_file_content(file_path)
-        if not content:
-            logger.debug(f"Empty file skipped: {os.path.relpath(file_path, project_path)}")
-            return
-
-        # Collect symbols using pattern matching
-        relative_path = self._get_relative_path(file_path, project_path)
-        self._collect_symbols_from_text(relative_path, content)
-        logger.debug(f"Symbol collection - {relative_path}")
-
-    def _analyze_text_file(self, file_path: str, project_path: str, relationships: Optional[Dict[str, List[tuple]]] = None) -> Optional[scip_pb2.Document]:
-        """Analyze a single text file and generate complete SCIP document."""
-        # Read file content
-        content = self._read_file_content(file_path)
-        if not content:
-            return None
-
-        # Create SCIP document
-        document = scip_pb2.Document()
-        document.relative_path = self._get_relative_path(file_path, project_path)
-        document.language = self._detect_language_from_extension(Path(file_path).suffix)
-
-        # Analyze content and generate occurrences
-        self.position_calculator = PositionCalculator(content)
-        occurrences, symbols = self._analyze_text_content_for_document(document.relative_path, content, document.language, relationships)
-
-        # Add results to document
-        document.occurrences.extend(occurrences)
-        document.symbols.extend(symbols)
-
-        logger.debug(f"Analyzed text file {document.relative_path}: "
-                     f"{len(document.occurrences)} occurrences, {len(document.symbols)} symbols")
-
-        return document
-
-    def _build_symbol_relationships(self, files: List[str], project_path: str) -> Dict[str, List[tuple]]:
-        """
-        Build basic relationships using generic patterns.
-
-        Args:
-            files: List of file paths to process
-            project_path: Project root path
-
-        Returns:
-            Dictionary mapping symbol_id -> [(target_symbol_id, relationship_type), ...]
- """ - logger.debug(f"FallbackStrategy: Building symbol relationships for {len(files)} files") - all_relationships = {} - - for file_path in files: - try: - file_relationships = self._extract_relationships_from_file(file_path, project_path) - all_relationships.update(file_relationships) - except Exception as e: - logger.warning(f"Failed to extract relationships from {file_path}: {e}") - - total_symbols_with_relationships = len(all_relationships) - total_relationships = sum(len(rels) for rels in all_relationships.values()) - - logger.debug(f"FallbackStrategy: Built {total_relationships} relationships for {total_symbols_with_relationships} symbols") - return all_relationships - - def _extract_relationships_from_file(self, file_path: str, project_path: str) -> Dict[str, List[tuple]]: - """Extract basic relationships using generic patterns.""" - content = self._read_file_content(file_path) - if not content: - return {} - - relationships = {} - relative_path = self._get_relative_path(file_path, project_path) - - # Generic function call patterns - function_call_pattern = r"(\w+)\s*\(" - function_def_patterns = [ - r"function\s+(\w+)\s*\(", # JavaScript - r"def\s+(\w+)\s*\(", # Python - r"fn\s+(\w+)\s*\(", # Rust/Zig - r"func\s+(\w+)\s*\(", # Go/Swift - ] - - # Basic function definition extraction - for pattern in function_def_patterns: - for match in re.finditer(pattern, content): - function_name = match.group(1) - # Could expand to extract calls within function context - - logger.debug(f"Extracted {len(relationships)} relationships from {relative_path}") - return relationships - - def _detect_language_from_extension(self, extension: str) -> str: - """Detect specific language from extension.""" - extension_mapping = { - # Programming languages - '.c': 'c', - '.cpp': 'cpp', '.cc': 'cpp', '.cxx': 'cpp', '.c++': 'cpp', - '.h': 'c', '.hpp': 'cpp', '.hh': 'cpp', '.hxx': 'cpp', - '.go': 'go', - '.rs': 'rust', - '.rb': 'ruby', - '.cs': 'csharp', - '.php': 'php', - '.swift': 
'swift', - '.kt': 'kotlin', '.kts': 'kotlin', - '.scala': 'scala', - '.r': 'r', - '.lua': 'lua', - '.perl': 'perl', '.pl': 'perl', - '.zig': 'zig', - '.dart': 'dart', - - # Web and markup - '.html': 'html', '.htm': 'html', - '.css': 'css', - '.scss': 'scss', '.sass': 'sass', - '.less': 'less', - '.vue': 'vue', - '.svelte': 'svelte', - '.astro': 'astro', - - # Data and config - '.json': 'json', - '.xml': 'xml', - '.yaml': 'yaml', '.yml': 'yaml', - '.toml': 'toml', - '.ini': 'ini', - '.cfg': 'ini', - '.conf': 'ini', - - # Documentation - '.md': 'markdown', '.markdown': 'markdown', - '.mdx': 'mdx', - '.tex': 'latex', - '.rst': 'rst', - - # Database and query - '.sql': 'sql', - '.cql': 'cql', - '.cypher': 'cypher', - '.sparql': 'sparql', - '.graphql': 'graphql', '.gql': 'graphql', - - # Shell and scripts - '.sh': 'shell', '.bash': 'bash', - '.zsh': 'zsh', '.fish': 'fish', - '.ps1': 'powershell', - '.bat': 'batch', '.cmd': 'batch', - - # Template languages - '.handlebars': 'handlebars', '.hbs': 'handlebars', - '.ejs': 'ejs', - '.pug': 'pug', - '.mustache': 'mustache', - - # Other - '.dockerfile': 'dockerfile', - '.gitignore': 'gitignore', - '.env': 'dotenv', - } - - return extension_mapping.get(extension.lower(), 'text') - - # Symbol collection methods (Phase 1) - def _collect_symbols_from_text(self, file_path: str, content: str) -> None: - """Collect symbols from text content using pattern matching.""" - lines = content.split('\n') - - # Determine if this looks like code - if self._is_code_like(content): - self._collect_code_symbols(file_path, lines) - else: - # For non-code files, just create a basic file symbol - self._collect_file_symbol(file_path) - - def _collect_code_symbols(self, file_path: str, lines: List[str]): - """Collect symbols from code-like content.""" - patterns = { - 'function_like': [ - re.compile(r'(?:^|\s)(?:function|def|fn|func)\s+(\w+)', re.IGNORECASE | re.MULTILINE), - re.compile(r'(?:^|\s)(\w+)\s*\([^)]*\)\s*[{:]', re.MULTILINE), # Function 
definitions - re.compile(r'(?:^|\s)(\w+)\s*:=?\s*function', re.IGNORECASE | re.MULTILINE), # JS functions - ], - 'class_like': [ - re.compile(r'(?:^|\s)(?:class|struct|interface|enum)\s+(\w+)', re.IGNORECASE | re.MULTILINE), - ], - 'constant_like': [ - re.compile(r'(?:^|\s)(?:const|let|var|#define)\s+(\w+)', re.IGNORECASE | re.MULTILINE), - re.compile(r'(?:^|\s)(\w+)\s*[:=]\s*[^=]', re.MULTILINE), # Simple assignments - ], - 'config_like': [ - re.compile(r'^(\w+)\s*[:=]', re.MULTILINE), # Config keys - re.compile(r'^\[(\w+)\]', re.MULTILINE), # INI sections - ] - } - - for line_num, line in enumerate(lines): - line = line.strip() - if not line or line.startswith(('#', '//', '/*', '*', '--', ';')): - continue - - # Look for function-like patterns - for pattern in patterns['function_like']: - match = pattern.search(line) - if match: - name = match.group(1) - if name and name.isidentifier() and len(name) > 1: - self._register_symbol(name, file_path, "().", "Function-like construct") - - # Look for class-like patterns - for pattern in patterns['class_like']: - match = pattern.search(line) - if match: - name = match.group(1) - if name and name.isidentifier() and len(name) > 1: - self._register_symbol(name, file_path, "#", "Type definition") - - # Look for constant-like patterns - for pattern in patterns['constant_like']: - match = pattern.search(line) - if match: - name = match.group(1) - if name and name.isidentifier() and len(name) > 1: - self._register_symbol(name, file_path, "", "Variable or constant") - - # Look for config-like patterns - for pattern in patterns['config_like']: - match = pattern.search(line) - if match: - name = match.group(1) - if name and len(name) > 1: - self._register_symbol(name, file_path, "", "Configuration key") - - def _collect_file_symbol(self, file_path: str): - """Create a basic file-level symbol for non-code files.""" - file_name = Path(file_path).stem - self._register_symbol(file_name, file_path, "", "File") - - def 
_register_symbol(self, name: str, file_path: str, descriptor: str, description: str): - """Register a symbol with the reference resolver.""" - symbol_id = self.symbol_manager.create_local_symbol( - language="text", - file_path=file_path, - symbol_path=[name], - descriptor=descriptor - ) - dummy_range = scip_pb2.Range() - dummy_range.start.extend([0, 0]) - dummy_range.end.extend([0, 1]) - self.reference_resolver.register_symbol_definition( - symbol_id=symbol_id, - file_path=file_path, - definition_range=dummy_range, - symbol_kind=scip_pb2.UnspecifiedSymbolKind, - display_name=name, - documentation=[description] - ) - - # Document analysis methods (Phase 2) - def _analyze_text_content_for_document(self, file_path: str, content: str, language: str, relationships: Optional[Dict[str, List[tuple]]] = None) -> tuple: - """Analyze text content and generate SCIP data.""" - lines = content.split('\n') - - # Determine if this looks like code - if self._is_code_like(content): - return self._analyze_code_for_document(file_path, lines, language, relationships) - else: - # For non-code files, just create a basic file symbol - return self._analyze_file_for_document(file_path, language) - - def _analyze_code_for_document(self, file_path: str, lines: List[str], language: str, relationships: Optional[Dict[str, List[tuple]]] = None) -> tuple: - """Analyze code patterns and create symbols for document.""" - occurrences = [] - symbols = [] - - patterns = { - 'function_like': [ - re.compile(r'(?:^|\s)(?:function|def|fn|func)\s+(\w+)', re.IGNORECASE | re.MULTILINE), - re.compile(r'(?:^|\s)(\w+)\s*\([^)]*\)\s*[{:]', re.MULTILINE), # Function definitions - re.compile(r'(?:^|\s)(\w+)\s*:=?\s*function', re.IGNORECASE | re.MULTILINE), # JS functions - ], - 'class_like': [ - re.compile(r'(?:^|\s)(?:class|struct|interface|enum)\s+(\w+)', re.IGNORECASE | re.MULTILINE), - ], - 'constant_like': [ - re.compile(r'(?:^|\s)(?:const|let|var|#define)\s+(\w+)', re.IGNORECASE | re.MULTILINE), - 
re.compile(r'(?:^|\s)(\w+)\s*[:=]\s*[^=]', re.MULTILINE), # Simple assignments - ], - 'config_like': [ - re.compile(r'^(\w+)\s*[:=]', re.MULTILINE), # Config keys - re.compile(r'^\[(\w+)\]', re.MULTILINE), # INI sections - ] - } - - for line_num, line in enumerate(lines): - line = line.strip() - if not line or line.startswith(('#', '//', '/*', '*', '--', ';')): - continue - - # Look for function-like patterns - for pattern in patterns['function_like']: - match = pattern.search(line) - if match: - name = match.group(1) - if name and name.isidentifier() and len(name) > 1: - occ, sym = self._create_symbol_for_document( - line_num, name, file_path, scip_pb2.Function, "().", - f"Function-like construct in {language}", - relationships - ) - if occ: occurrences.append(occ) - if sym: symbols.append(sym) - - # Look for class-like patterns - for pattern in patterns['class_like']: - match = pattern.search(line) - if match: - name = match.group(1) - if name and name.isidentifier() and len(name) > 1: - occ, sym = self._create_symbol_for_document( - line_num, name, file_path, scip_pb2.Class, "#", - f"Type definition in {language}", - relationships - ) - if occ: occurrences.append(occ) - if sym: symbols.append(sym) - - # Look for constant-like patterns - for pattern in patterns['constant_like']: - match = pattern.search(line) - if match: - name = match.group(1) - if name and name.isidentifier() and len(name) > 1: - occ, sym = self._create_symbol_for_document( - line_num, name, file_path, scip_pb2.Variable, "", - f"Variable or constant in {language}", - relationships - ) - if occ: occurrences.append(occ) - if sym: symbols.append(sym) - - # Look for config-like patterns - for pattern in patterns['config_like']: - match = pattern.search(line) - if match: - name = match.group(1) - if name and len(name) > 1: - occ, sym = self._create_symbol_for_document( - line_num, name, file_path, scip_pb2.Constant, "", - f"Configuration key in {language}", - relationships - ) - if occ: 
occurrences.append(occ) - if sym: symbols.append(sym) - - return occurrences, symbols - - def _analyze_file_for_document(self, file_path: str, language: str) -> tuple: - """Create a basic file-level symbol for non-code files.""" - file_name = Path(file_path).stem - - symbol_id = self.symbol_manager.create_local_symbol( - language="text", - file_path=file_path, - symbol_path=[file_name], - descriptor="" - ) - - # Create symbol information only (no occurrence for file-level symbols) - symbol_info = self._create_symbol_information( - symbol_id, file_name, scip_pb2.File, f"{language.title()} file" - ) - - return [], [symbol_info] - - def _create_symbol_for_document(self, line_num: int, name: str, file_path: str, - symbol_kind: int, descriptor: str, description: str, relationships: Optional[Dict[str, List[tuple]]] = None) -> tuple: - """Create a symbol with occurrence and information for document.""" - symbol_id = self.symbol_manager.create_local_symbol( - language="text", - file_path=file_path, - symbol_path=[name], - descriptor=descriptor - ) - - # Create definition occurrence - start_col, end_col = self.position_calculator.find_name_in_line(line_num, name) - range_obj = self.position_calculator.line_col_to_range( - line_num, start_col, line_num, end_col - ) - - occurrence = self._create_occurrence( - symbol_id, range_obj, scip_pb2.Definition, scip_pb2.Identifier - ) - - # Create symbol information - symbol_relationships = relationships.get(symbol_id, []) if relationships else [] - scip_relationships = self._create_scip_relationships(symbol_relationships) if symbol_relationships else [] - symbol_info = self._create_symbol_information( - symbol_id, name, symbol_kind, description, scip_relationships - ) - - return occurrence, symbol_info - - # Utility methods - def _is_code_like(self, content: str) -> bool: - """Determine if the file appears to be code-like.""" - # Check for common code indicators - code_indicators = [ - r'\bfunction\b', r'\bdef\b', r'\bclass\b', 
r'\binterface\b', - r'\bstruct\b', r'\benum\b', r'\bconst\b', r'\bvar\b', r'\blet\b', - r'[{}();]', r'=\s*function', r'=>', r'\bif\b', r'\bfor\b', r'\bwhile\b' - ] - - code_score = 0 - for pattern in code_indicators: - if re.search(pattern, content, re.IGNORECASE): - code_score += 1 - - # If we find multiple code indicators, treat as code - return code_score >= 3 - - def _create_occurrence(self, symbol_id: str, range_obj: scip_pb2.Range, - symbol_roles: int, syntax_kind: int) -> scip_pb2.Occurrence: - """Create a SCIP occurrence.""" - occurrence = scip_pb2.Occurrence() - occurrence.symbol = symbol_id - occurrence.symbol_roles = symbol_roles - occurrence.syntax_kind = syntax_kind - occurrence.range.CopyFrom(range_obj) - return occurrence - - def _create_symbol_information(self, symbol_id: str, display_name: str, - symbol_kind: int, description: str, relationships: Optional[List[scip_pb2.Relationship]] = None) -> scip_pb2.SymbolInformation: - """Create SCIP symbol information.""" - symbol_info = scip_pb2.SymbolInformation() - symbol_info.symbol = symbol_id - symbol_info.display_name = display_name - symbol_info.kind = symbol_kind - symbol_info.documentation.append(description) - if relationships and self.relationship_manager: - self.relationship_manager.add_relationships_to_symbol(symbol_info, relationships) - return symbol_info \ No newline at end of file diff --git a/src/code_index_mcp/scip/strategies/java_strategy.py b/src/code_index_mcp/scip/strategies/java_strategy.py deleted file mode 100644 index ea2409a..0000000 --- a/src/code_index_mcp/scip/strategies/java_strategy.py +++ /dev/null @@ -1,624 +0,0 @@ -"""Java SCIP indexing strategy v4 - Tree-sitter based with Python strategy architecture.""" - -import logging -import os -from typing import List, Optional, Dict, Any, Set - -try: - import tree_sitter - from tree_sitter_java import language as java_language - TREE_SITTER_AVAILABLE = True -except ImportError: - TREE_SITTER_AVAILABLE = False - -from .base_strategy 
import SCIPIndexerStrategy, StrategyError -from ..proto import scip_pb2 -from ..core.position_calculator import PositionCalculator -from ..core.relationship_types import InternalRelationshipType - - -logger = logging.getLogger(__name__) - - -class JavaStrategy(SCIPIndexerStrategy): - """SCIP-compliant Java indexing strategy using Tree-sitter with Python strategy architecture.""" - - SUPPORTED_EXTENSIONS = {'.java'} - - def __init__(self, priority: int = 95): - """Initialize the Java strategy v4.""" - super().__init__(priority) - - if not TREE_SITTER_AVAILABLE: - raise StrategyError("Tree-sitter not available for Java strategy") - - # Initialize Java parser - java_lang = tree_sitter.Language(java_language()) - self.parser = tree_sitter.Parser(java_lang) - - def can_handle(self, extension: str, file_path: str) -> bool: - """Check if this strategy can handle the file type.""" - return extension.lower() in self.SUPPORTED_EXTENSIONS and TREE_SITTER_AVAILABLE - - def get_language_name(self) -> str: - """Get the language name for SCIP symbol generation.""" - return "java" - - def is_available(self) -> bool: - """Check if this strategy is available.""" - return TREE_SITTER_AVAILABLE - - def _collect_symbol_definitions(self, files: List[str], project_path: str) -> None: - """Phase 1: Collect all symbol definitions from Java files.""" - logger.debug(f"JavaStrategy Phase 1: Processing {len(files)} files for symbol collection") - processed_count = 0 - error_count = 0 - - for i, file_path in enumerate(files, 1): - relative_path = os.path.relpath(file_path, project_path) - - try: - self._collect_symbols_from_file(file_path, project_path) - processed_count += 1 - - if i % 10 == 0 or i == len(files): - logger.debug(f"Phase 1 progress: {i}/{len(files)} files, last file: {relative_path}") - - except Exception as e: - error_count += 1 - logger.warning(f"Phase 1 failed for {relative_path}: {e}") - continue - - logger.info(f"Phase 1 summary: {processed_count} files processed, 
{error_count} errors") - - def _generate_documents_with_references(self, files: List[str], project_path: str, relationships: Optional[Dict[str, List[tuple]]] = None) -> List[scip_pb2.Document]: - """Phase 2: Generate complete SCIP documents with resolved references.""" - documents = [] - logger.debug(f"JavaStrategy Phase 2: Generating documents for {len(files)} files") - processed_count = 0 - error_count = 0 - total_occurrences = 0 - total_symbols = 0 - - for i, file_path in enumerate(files, 1): - relative_path = os.path.relpath(file_path, project_path) - - try: - document = self._analyze_java_file(file_path, project_path, relationships) - if document: - documents.append(document) - total_occurrences += len(document.occurrences) - total_symbols += len(document.symbols) - processed_count += 1 - - if i % 10 == 0 or i == len(files): - logger.debug(f"Phase 2 progress: {i}/{len(files)} files, " - f"last file: {relative_path}, " - f"{len(document.occurrences) if document else 0} occurrences") - - except Exception as e: - error_count += 1 - logger.error(f"Phase 2 failed for {relative_path}: {e}") - continue - - logger.info(f"Phase 2 summary: {processed_count} documents generated, {error_count} errors, " - f"{total_occurrences} total occurrences, {total_symbols} total symbols") - - return documents - - def _build_symbol_relationships(self, files: List[str], project_path: str) -> Dict[str, List[tuple]]: - """ - Build relationships between Java symbols. - - Args: - files: List of file paths to process - project_path: Project root path - - Returns: - Dictionary mapping symbol_id -> [(target_symbol_id, relationship_type), ...] 
- """ - logger.debug(f"JavaStrategy: Building symbol relationships for {len(files)} files") - - all_relationships = {} - - for file_path in files: - try: - file_relationships = self._extract_java_relationships_from_file(file_path, project_path) - all_relationships.update(file_relationships) - except Exception as e: - logger.warning(f"Failed to extract relationships from {file_path}: {e}") - - total_symbols_with_relationships = len(all_relationships) - total_relationships = sum(len(rels) for rels in all_relationships.values()) - - logger.debug(f"JavaStrategy: Built {total_relationships} relationships for {total_symbols_with_relationships} symbols") - return all_relationships - - def _collect_symbols_from_file(self, file_path: str, project_path: str) -> None: - """Collect symbol definitions from a single Java file.""" - content = self._read_file_content(file_path) - if not content: - return - - tree = self._parse_content(content) - if not tree: - return - - relative_path = self._get_relative_path(file_path, project_path) - self._collect_symbols_from_tree(tree, relative_path, content) - - def _analyze_java_file(self, file_path: str, project_path: str, relationships: Optional[Dict[str, List[tuple]]] = None) -> Optional[scip_pb2.Document]: - """Analyze a single Java file and generate complete SCIP document.""" - content = self._read_file_content(file_path) - if not content: - return None - - tree = self._parse_content(content) - if not tree: - return None - - # Create SCIP document - document = scip_pb2.Document() - document.relative_path = self._get_relative_path(file_path, project_path) - document.language = self.get_language_name() - - # Analyze Tree-sitter AST and generate occurrences - self.position_calculator = PositionCalculator(content) - occurrences, symbols = self._analyze_tree_for_document(tree, document.relative_path, content, relationships) - - # Add results to document - document.occurrences.extend(occurrences) - document.symbols.extend(symbols) - - 
logger.debug(f"Analyzed Java file {document.relative_path}: " - f"{len(document.occurrences)} occurrences, {len(document.symbols)} symbols") - - return document - - def _parse_content(self, content: str) -> Optional[tree_sitter.Tree]: - """Parse Java content with Tree-sitter.""" - try: - return self.parser.parse(bytes(content, "utf8")) - except Exception as e: - logger.error(f"Failed to parse Java content: {e}") - return None - - def _collect_symbols_from_tree(self, tree: tree_sitter.Tree, file_path: str, content: str) -> None: - """Collect symbols from Tree-sitter tree using integrated visitor (Phase 1).""" - root = tree.root_node - - for node in self._walk_tree(root): - if node.type == "class_declaration": - self._register_class_symbol(node, file_path, content) - elif node.type == "interface_declaration": - self._register_interface_symbol(node, file_path, content) - elif node.type == "enum_declaration": - self._register_enum_symbol(node, file_path, content) - elif node.type == "method_declaration": - self._register_method_symbol(node, file_path, content) - elif node.type == "constructor_declaration": - self._register_constructor_symbol(node, file_path, content) - - def _analyze_tree_for_document(self, tree: tree_sitter.Tree, file_path: str, content: str, relationships: Optional[Dict[str, List[tuple]]] = None) -> tuple[List[scip_pb2.Occurrence], List[scip_pb2.SymbolInformation]]: - """Analyze Tree-sitter tree to generate occurrences and symbols for SCIP document (Phase 2).""" - occurrences = [] - symbols = [] - root = tree.root_node - - for node in self._walk_tree(root): - if node.type == "class_declaration": - symbol_id = self._create_class_symbol_id(node, file_path, content) - occurrence = self._create_class_occurrence(node, symbol_id) - symbol_relationships = relationships.get(symbol_id, []) if relationships else [] - scip_relationships = self._create_scip_relationships(symbol_relationships) if symbol_relationships else [] - - symbol_info = 
self._create_class_symbol_info(node, symbol_id, content, scip_relationships) - - if occurrence: - occurrences.append(occurrence) - if symbol_info: - symbols.append(symbol_info) - - elif node.type == "interface_declaration": - symbol_id = self._create_interface_symbol_id(node, file_path, content) - occurrence = self._create_interface_occurrence(node, symbol_id) - symbol_relationships = relationships.get(symbol_id, []) if relationships else [] - scip_relationships = self._create_scip_relationships(symbol_relationships) if symbol_relationships else [] - - symbol_info = self._create_interface_symbol_info(node, symbol_id, content, scip_relationships) - - if occurrence: - occurrences.append(occurrence) - if symbol_info: - symbols.append(symbol_info) - - elif node.type in ["method_declaration", "constructor_declaration"]: - symbol_id = self._create_method_symbol_id(node, file_path, content) - occurrence = self._create_method_occurrence(node, symbol_id) - symbol_relationships = relationships.get(symbol_id, []) if relationships else [] - scip_relationships = self._create_scip_relationships(symbol_relationships) if symbol_relationships else [] - - symbol_info = self._create_method_symbol_info(node, symbol_id, content, scip_relationships) - - if occurrence: - occurrences.append(occurrence) - if symbol_info: - symbols.append(symbol_info) - - return occurrences, symbols - - def _extract_java_relationships_from_file(self, file_path: str, project_path: str) -> Dict[str, List[tuple]]: - """Extract relationships from a single Java file.""" - logger.debug(f"JavaStrategy: Starting relationship extraction from {file_path}") - - content = self._read_file_content(file_path) - if not content: - logger.debug(f"JavaStrategy: No content found in {file_path}") - return {} - - tree = self._parse_content(content) - if not tree: - logger.debug(f"JavaStrategy: Failed to parse {file_path} with Tree-sitter") - return {} - - relative_path = self._get_relative_path(file_path, project_path) - 
relationships = self._extract_relationships_from_tree(tree, relative_path, content) - - logger.debug(f"JavaStrategy: Extracted {len(relationships)} relationships from {relative_path}") - return relationships - - def _extract_relationships_from_tree(self, tree: tree_sitter.Tree, file_path: str, content: str) -> Dict[str, List[tuple]]: - """Extract relationships from Tree-sitter AST.""" - relationships = {} - root = tree.root_node - - for node in self._walk_tree(root): - if node.type == "class_declaration": - # Extract inheritance relationships - class_symbol_id = self._create_class_symbol_id(node, file_path, content) - - # Find extends clause - for child in node.children: - if child.type == "superclass": - for grandchild in child.children: - if grandchild.type == "type_identifier": - parent_name = grandchild.text.decode() - parent_symbol_id = self._create_class_symbol_id_by_name(parent_name, file_path) - if class_symbol_id not in relationships: - relationships[class_symbol_id] = [] - relationships[class_symbol_id].append((parent_symbol_id, InternalRelationshipType.INHERITS)) - - # Find implements clause - for child in node.children: - if child.type == "super_interfaces": - for interface_list in child.children: - if interface_list.type == "type_list": - for interface_type in interface_list.children: - if interface_type.type == "type_identifier": - interface_name = interface_type.text.decode() - interface_symbol_id = self._create_interface_symbol_id_by_name(interface_name, file_path) - if class_symbol_id not in relationships: - relationships[class_symbol_id] = [] - relationships[class_symbol_id].append((interface_symbol_id, InternalRelationshipType.IMPLEMENTS)) - - return relationships - - # Helper methods for Tree-sitter node processing - def _walk_tree(self, node: tree_sitter.Node): - """Walk through all nodes in a Tree-sitter tree.""" - yield node - for child in node.children: - yield from self._walk_tree(child) - - def _get_node_identifier(self, node: 
tree_sitter.Node) -> Optional[str]: - """Get the identifier name from a Tree-sitter node.""" - for child in node.children: - if child.type == "identifier": - return child.text.decode() - return None - - def _get_package_name(self, tree: tree_sitter.Tree) -> str: - """Extract package name from Tree-sitter tree.""" - root = tree.root_node - for node in self._walk_tree(root): - if node.type == "package_declaration": - for child in node.children: - if child.type == "scoped_identifier": - return child.text.decode() - return "" - - # Symbol creation methods (similar to Python strategy) - def _register_class_symbol(self, node: tree_sitter.Node, file_path: str, content: str) -> None: - """Register a class symbol definition.""" - name = self._get_node_identifier(node) - if not name: - return - - symbol_id = self.symbol_manager.create_local_symbol( - language="java", - file_path=file_path, - symbol_path=[name], - descriptor="#" - ) - - # Create a dummy range for registration - dummy_range = scip_pb2.Range() - dummy_range.start.extend([0, 0]) - dummy_range.end.extend([0, 1]) - - self.reference_resolver.register_symbol_definition( - symbol_id=symbol_id, - file_path=file_path, - definition_range=dummy_range, - symbol_kind=scip_pb2.Class, - display_name=name, - documentation=["Java class"] - ) - - def _register_interface_symbol(self, node: tree_sitter.Node, file_path: str, content: str) -> None: - """Register an interface symbol definition.""" - name = self._get_node_identifier(node) - if not name: - return - - symbol_id = self.symbol_manager.create_local_symbol( - language="java", - file_path=file_path, - symbol_path=[name], - descriptor="#" - ) - - dummy_range = scip_pb2.Range() - dummy_range.start.extend([0, 0]) - dummy_range.end.extend([0, 1]) - - self.reference_resolver.register_symbol_definition( - symbol_id=symbol_id, - file_path=file_path, - definition_range=dummy_range, - symbol_kind=scip_pb2.Interface, - display_name=name, - documentation=["Java interface"] - ) - - def 
_register_enum_symbol(self, node: tree_sitter.Node, file_path: str, content: str) -> None: - """Register an enum symbol definition.""" - name = self._get_node_identifier(node) - if not name: - return - - symbol_id = self.symbol_manager.create_local_symbol( - language="java", - file_path=file_path, - symbol_path=[name], - descriptor="#" - ) - - dummy_range = scip_pb2.Range() - dummy_range.start.extend([0, 0]) - dummy_range.end.extend([0, 1]) - - self.reference_resolver.register_symbol_definition( - symbol_id=symbol_id, - file_path=file_path, - definition_range=dummy_range, - symbol_kind=scip_pb2.Enum, - display_name=name, - documentation=["Java enum"] - ) - - def _register_method_symbol(self, node: tree_sitter.Node, file_path: str, content: str) -> None: - """Register a method symbol definition.""" - name = self._get_node_identifier(node) - if not name: - return - - symbol_id = self.symbol_manager.create_local_symbol( - language="java", - file_path=file_path, - symbol_path=[name], - descriptor="()." - ) - - dummy_range = scip_pb2.Range() - dummy_range.start.extend([0, 0]) - dummy_range.end.extend([0, 1]) - - self.reference_resolver.register_symbol_definition( - symbol_id=symbol_id, - file_path=file_path, - definition_range=dummy_range, - symbol_kind=scip_pb2.Method, - display_name=name, - documentation=["Java method"] - ) - - def _register_constructor_symbol(self, node: tree_sitter.Node, file_path: str, content: str) -> None: - """Register a constructor symbol definition.""" - name = self._get_node_identifier(node) - if not name: - return - - symbol_id = self.symbol_manager.create_local_symbol( - language="java", - file_path=file_path, - symbol_path=[name], - descriptor="()." 
- ) - - dummy_range = scip_pb2.Range() - dummy_range.start.extend([0, 0]) - dummy_range.end.extend([0, 1]) - - self.reference_resolver.register_symbol_definition( - symbol_id=symbol_id, - file_path=file_path, - definition_range=dummy_range, - symbol_kind=scip_pb2.Method, - display_name=name, - documentation=["Java constructor"] - ) - - # Symbol ID creation methods - def _create_class_symbol_id(self, node: tree_sitter.Node, file_path: str, content: str) -> str: - """Create symbol ID for a class.""" - name = self._get_node_identifier(node) - if not name: - return "" - return self.symbol_manager.create_local_symbol( - language="java", - file_path=file_path, - symbol_path=[name], - descriptor="#" - ) - - def _create_class_symbol_id_by_name(self, name: str, file_path: str) -> str: - """Create symbol ID for a class by name.""" - return self.symbol_manager.create_local_symbol( - language="java", - file_path=file_path, - symbol_path=[name], - descriptor="#" - ) - - def _create_interface_symbol_id(self, node: tree_sitter.Node, file_path: str, content: str) -> str: - """Create symbol ID for an interface.""" - name = self._get_node_identifier(node) - if not name: - return "" - return self.symbol_manager.create_local_symbol( - language="java", - file_path=file_path, - symbol_path=[name], - descriptor="#" - ) - - def _create_interface_symbol_id_by_name(self, name: str, file_path: str) -> str: - """Create symbol ID for an interface by name.""" - return self.symbol_manager.create_local_symbol( - language="java", - file_path=file_path, - symbol_path=[name], - descriptor="#" - ) - - def _create_method_symbol_id(self, node: tree_sitter.Node, file_path: str, content: str) -> str: - """Create symbol ID for a method.""" - name = self._get_node_identifier(node) - if not name: - return "" - return self.symbol_manager.create_local_symbol( - language="java", - file_path=file_path, - symbol_path=[name], - descriptor="()." 
- ) - - # Occurrence creation methods (using PositionCalculator) - def _create_class_occurrence(self, node: tree_sitter.Node, symbol_id: str) -> Optional[scip_pb2.Occurrence]: - """Create SCIP occurrence for class.""" - if not self.position_calculator: - return None - - try: - range_obj = self.position_calculator.tree_sitter_node_to_range(node) - occurrence = scip_pb2.Occurrence() - occurrence.symbol = symbol_id - occurrence.symbol_roles = scip_pb2.Definition - occurrence.syntax_kind = scip_pb2.IdentifierType - occurrence.range.CopyFrom(range_obj) - return occurrence - except: - return None - - def _create_interface_occurrence(self, node: tree_sitter.Node, symbol_id: str) -> Optional[scip_pb2.Occurrence]: - """Create SCIP occurrence for interface.""" - if not self.position_calculator: - return None - - try: - range_obj = self.position_calculator.tree_sitter_node_to_range(node) - occurrence = scip_pb2.Occurrence() - occurrence.symbol = symbol_id - occurrence.symbol_roles = scip_pb2.Definition - occurrence.syntax_kind = scip_pb2.IdentifierType - occurrence.range.CopyFrom(range_obj) - return occurrence - except: - return None - - def _create_method_occurrence(self, node: tree_sitter.Node, symbol_id: str) -> Optional[scip_pb2.Occurrence]: - """Create SCIP occurrence for method.""" - if not self.position_calculator: - return None - - try: - range_obj = self.position_calculator.tree_sitter_node_to_range(node) - occurrence = scip_pb2.Occurrence() - occurrence.symbol = symbol_id - occurrence.symbol_roles = scip_pb2.Definition - occurrence.syntax_kind = scip_pb2.IdentifierFunction - occurrence.range.CopyFrom(range_obj) - return occurrence - except: - return None - - # Symbol information creation methods (with relationships) - def _create_class_symbol_info(self, node: tree_sitter.Node, symbol_id: str, content: str, relationships: Optional[List[scip_pb2.Relationship]] = None) -> scip_pb2.SymbolInformation: - """Create SCIP symbol information for class.""" - symbol_info = 
scip_pb2.SymbolInformation() - symbol_info.symbol = symbol_id - symbol_info.display_name = self._get_node_identifier(node) or "Unknown" - symbol_info.kind = scip_pb2.Class - - # Add documentation - symbol_info.documentation.append("Java class") - - # Add relationships if provided - if relationships and self.relationship_manager: - self.relationship_manager.add_relationships_to_symbol(symbol_info, relationships) - - return symbol_info - - def _create_interface_symbol_info(self, node: tree_sitter.Node, symbol_id: str, content: str, relationships: Optional[List[scip_pb2.Relationship]] = None) -> scip_pb2.SymbolInformation: - """Create SCIP symbol information for interface.""" - symbol_info = scip_pb2.SymbolInformation() - symbol_info.symbol = symbol_id - symbol_info.display_name = self._get_node_identifier(node) or "Unknown" - symbol_info.kind = scip_pb2.Interface - - symbol_info.documentation.append("Java interface") - - if relationships and self.relationship_manager: - self.relationship_manager.add_relationships_to_symbol(symbol_info, relationships) - - return symbol_info - - def _create_method_symbol_info(self, node: tree_sitter.Node, symbol_id: str, content: str, relationships: Optional[List[scip_pb2.Relationship]] = None) -> scip_pb2.SymbolInformation: - """Create SCIP symbol information for method.""" - symbol_info = scip_pb2.SymbolInformation() - symbol_info.symbol = symbol_id - symbol_info.display_name = self._get_node_identifier(node) or "Unknown" - symbol_info.kind = scip_pb2.Method - - # Determine if it's a constructor or method - if node.type == "constructor_declaration": - symbol_info.documentation.append("Java constructor") - else: - symbol_info.documentation.append("Java method") - - if relationships and self.relationship_manager: - self.relationship_manager.add_relationships_to_symbol(symbol_info, relationships) - - return symbol_info - - def _create_scip_relationships(self, symbol_relationships: List[tuple]) -> List[scip_pb2.Relationship]: - 
"""Convert internal relationships to SCIP relationships.""" - scip_relationships = [] - for target_symbol_id, relationship_type in symbol_relationships: - relationship = scip_pb2.Relationship() - relationship.symbol = target_symbol_id - relationship.is_reference = True - # Map relationship types to SCIP if needed - scip_relationships.append(relationship) - return scip_relationships \ No newline at end of file diff --git a/src/code_index_mcp/scip/strategies/javascript_strategy.py b/src/code_index_mcp/scip/strategies/javascript_strategy.py deleted file mode 100644 index 489fd37..0000000 --- a/src/code_index_mcp/scip/strategies/javascript_strategy.py +++ /dev/null @@ -1,974 +0,0 @@ -"""JavaScript/TypeScript SCIP indexing strategy - SCIP standard compliant.""" - -import logging -import os -from typing import List, Optional, Dict, Any, Set - -from .base_strategy import SCIPIndexerStrategy, StrategyError -from ..proto import scip_pb2 -from ..core.position_calculator import PositionCalculator -from ..core.relationship_types import InternalRelationshipType - -# Tree-sitter imports -import tree_sitter -from tree_sitter_javascript import language as js_language -from tree_sitter_typescript import language_typescript as ts_language - - -logger = logging.getLogger(__name__) - - -class JavaScriptStrategy(SCIPIndexerStrategy): - """SCIP-compliant JavaScript/TypeScript indexing strategy using Tree-sitter.""" - - SUPPORTED_EXTENSIONS = {'.js', '.jsx', '.ts', '.tsx', '.mjs', '.cjs'} - - def __init__(self, priority: int = 95): - """Initialize the JavaScript/TypeScript strategy.""" - super().__init__(priority) - - # Initialize parsers - try: - js_lang = tree_sitter.Language(js_language()) - ts_lang = tree_sitter.Language(ts_language()) - - self.js_parser = tree_sitter.Parser(js_lang) - self.ts_parser = tree_sitter.Parser(ts_lang) - logger.info("JavaScript strategy initialized with Tree-sitter support") - except Exception as e: - logger.error(f"Failed to initialize JavaScript 
strategy: {e}") - self.js_parser = None - self.ts_parser = None - - # Initialize dependency tracking - self.dependencies = { - 'imports': { - 'standard_library': [], - 'third_party': [], - 'local': [] - } - } - - def can_handle(self, extension: str, file_path: str) -> bool: - """Check if this strategy can handle the file type.""" - return extension.lower() in self.SUPPORTED_EXTENSIONS - - def get_language_name(self) -> str: - """Get the language name for SCIP symbol generation.""" - return "javascript" - - def is_available(self) -> bool: - """Check if this strategy is available.""" - return self.js_parser is not None and self.ts_parser is not None - - def _collect_symbol_definitions(self, files: List[str], project_path: str) -> None: - """Phase 1: Collect all symbol definitions from JavaScript/TypeScript files.""" - logger.debug(f"JavaScriptStrategy Phase 1: Processing {len(files)} files for symbol collection") - processed_count = 0 - error_count = 0 - - for i, file_path in enumerate(files, 1): - relative_path = os.path.relpath(file_path, project_path) - - try: - self._collect_symbols_from_file(file_path, project_path) - processed_count += 1 - - if i % 10 == 0 or i == len(files): # Progress every 10 files or at end - logger.debug(f"Phase 1 progress: {i}/{len(files)} files, last file: {relative_path}") - - except Exception as e: - error_count += 1 - logger.warning(f"Phase 1 failed for {relative_path}: {e}") - continue - - logger.info(f"Phase 1 summary: {processed_count} files processed, {error_count} errors") - - def _generate_documents_with_references(self, files: List[str], project_path: str, relationships: Optional[Dict[str, List[tuple]]] = None) -> List[scip_pb2.Document]: - """Phase 2: Generate complete SCIP documents with resolved references.""" - documents = [] - logger.debug(f"JavaScriptStrategy Phase 2: Generating documents for {len(files)} files") - processed_count = 0 - error_count = 0 - total_occurrences = 0 - total_symbols = 0 - - for i, file_path in 
enumerate(files, 1): - relative_path = os.path.relpath(file_path, project_path) - - try: - document = self._analyze_javascript_file(file_path, project_path, relationships) - if document: - documents.append(document) - total_occurrences += len(document.occurrences) - total_symbols += len(document.symbols) - processed_count += 1 - - if i % 10 == 0 or i == len(files): # Progress every 10 files or at end - logger.debug(f"Phase 2 progress: {i}/{len(files)} files, " - f"last file: {relative_path}, " - f"{len(document.occurrences) if document else 0} occurrences") - - except Exception as e: - error_count += 1 - logger.error(f"Phase 2 failed for {relative_path}: {e}") - continue - - logger.info(f"Phase 2 summary: {processed_count} documents generated, {error_count} errors, " - f"{total_occurrences} total occurrences, {total_symbols} total symbols") - - return documents - - def _build_symbol_relationships(self, files: List[str], project_path: str) -> Dict[str, List[tuple]]: - """ - Build relationships between JavaScript/TypeScript symbols. - - Args: - files: List of file paths to process - project_path: Project root path - - Returns: - Dictionary mapping symbol_id -> [(target_symbol_id, relationship_type), ...] 
-        """
-        logger.debug(f"JavaScriptStrategy: Building symbol relationships for {len(files)} files")
-
-        all_relationships = {}
-
-        for file_path in files:
-            try:
-                file_relationships = self._extract_relationships_from_file(file_path, project_path)
-                all_relationships.update(file_relationships)
-            except Exception as e:
-                logger.warning(f"Failed to extract relationships from {file_path}: {e}")
-
-        total_symbols_with_relationships = len(all_relationships)
-        total_relationships = sum(len(rels) for rels in all_relationships.values())
-
-        logger.debug(f"JavaScriptStrategy: Built {total_relationships} relationships for {total_symbols_with_relationships} symbols")
-        return all_relationships
-
-    def _collect_symbols_from_file(self, file_path: str, project_path: str) -> None:
-        """Collect symbol definitions from a single JavaScript/TypeScript file."""
-
-        # Reset dependencies for this file
-        self._reset_dependencies()
-
-        # Read file content
-        content = self._read_file_content(file_path)
-        if not content:
-            logger.debug(f"Empty file skipped: {os.path.relpath(file_path, project_path)}")
-            return
-
-        # Parse with Tree-sitter
-        try:
-            tree = self._parse_js_content(content, file_path)
-            if not tree or not tree.root_node:
-                raise StrategyError(f"Failed to parse {os.path.relpath(file_path, project_path)}")
-        except Exception as e:
-            logger.warning(f"Parse error in {os.path.relpath(file_path, project_path)}: {e}")
-            return
-
-        # Collect symbols using integrated visitor
-        relative_path = self._get_relative_path(file_path, project_path)
-        self._collect_symbols_from_tree(tree, relative_path, content)
-        logger.debug(f"Symbol collection - {relative_path}")
-
-    def _analyze_javascript_file(self, file_path: str, project_path: str, relationships: Optional[Dict[str, List[tuple]]] = None) -> Optional[scip_pb2.Document]:
-        """Analyze a single JavaScript/TypeScript file and generate complete SCIP document."""
-        relative_path = self._get_relative_path(file_path, project_path)
-
-        # Read file content
-        content = self._read_file_content(file_path)
-        if not content:
-            logger.debug(f"Empty file skipped: {relative_path}")
-            return None
-
-        # Parse with Tree-sitter
-        try:
-            tree = self._parse_js_content(content, file_path)
-            if not tree or not tree.root_node:
-                raise StrategyError(f"Failed to parse {relative_path}")
-        except Exception as e:
-            logger.warning(f"Parse error in {relative_path}: {e}")
-            return None
-
-        # Create SCIP document
-        document = scip_pb2.Document()
-        document.relative_path = relative_path
-        document.language = self.get_language_name()
-
-        # Analyze tree and generate occurrences
-        self.position_calculator = PositionCalculator(content)
-
-        occurrences, symbols = self._analyze_tree_for_document(tree, relative_path, content, relationships)
-
-        # Add results to document
-        document.occurrences.extend(occurrences)
-        document.symbols.extend(symbols)
-
-        logger.debug(f"Document analysis - {relative_path}: "
-                     f"-> {len(document.occurrences)} occurrences, {len(document.symbols)} symbols")
-
-        return document
-
-    def _extract_relationships_from_file(self, file_path: str, project_path: str) -> Dict[str, List[tuple]]:
-        """
-        Extract relationships from a single JavaScript/TypeScript file.
-
-        Args:
-            file_path: File to analyze
-            project_path: Project root path
-
-        Returns:
-            Dictionary mapping symbol_id -> [(target_symbol_id, relationship_type), ...]
-        """
-        content = self._read_file_content(file_path)
-        if not content:
-            return {}
-
-        try:
-            tree = self._parse_js_content(content, file_path)
-            if not tree or not tree.root_node:
-                raise StrategyError(f"Failed to parse {file_path} for relationship extraction")
-        except Exception as e:
-            logger.warning(f"Parse error in {file_path}: {e}")
-            return {}
-
-        return self._extract_relationships_from_tree(tree, file_path, project_path)
-
-    def _parse_js_content(self, content: str, file_path: str):
-        """Parse JavaScript/TypeScript content using Tree-sitter parser."""
-        # Determine parser based on file extension
-        extension = os.path.splitext(file_path)[1].lower()
-
-        if extension in {'.ts', '.tsx'}:
-            parser = self.ts_parser
-        else:
-            parser = self.js_parser
-
-        if not parser:
-            raise StrategyError(f"No parser available for {extension}")
-
-        content_bytes = content.encode('utf-8')
-        return parser.parse(content_bytes)
-
-    def _collect_symbols_from_tree(self, tree, file_path: str, content: str) -> None:
-        """Collect symbols from Tree-sitter tree using integrated visitor."""
-        # Use a set to track processed nodes and avoid duplicates
-        self._processed_nodes = set()
-        scope_stack = []
-
-        def visit_node(node, current_scope_stack=None):
-            if current_scope_stack is None:
-                current_scope_stack = scope_stack[:]
-
-            # Skip if already processed (by memory address)
-            node_id = id(node)
-            if node_id in self._processed_nodes:
-                return
-            self._processed_nodes.add(node_id)
-
-            node_type = node.type
-
-            # Traditional function and class declarations
-            if node_type in ['function_declaration', 'method_definition', 'arrow_function']:
-                name = self._get_js_function_name(node)
-                if name:
-                    self._register_function_symbol(node, name, file_path, current_scope_stack)
-            elif node_type in ['class_declaration']:
-                name = self._get_js_class_name(node)
-                if name:
-                    self._register_class_symbol(node, name, file_path, current_scope_stack)
-
-            # Assignment expressions with function expressions (obj.method = function() {})
-            elif node_type == 'assignment_expression':
-                self._handle_assignment_expression(node, file_path, current_scope_stack)
-
-            # Lexical declarations (const, let, var)
-            elif node_type == 'lexical_declaration':
-                self._handle_lexical_declaration(node, file_path, current_scope_stack)
-
-            # Expression statements (might contain method chains)
-            elif node_type == 'expression_statement':
-                self._handle_expression_statement(node, file_path, current_scope_stack)
-
-            # Recursively visit children
-            for child in node.children:
-                visit_node(child, current_scope_stack)
-
-        visit_node(tree.root_node)
-
-    def _analyze_tree_for_document(self, tree, file_path: str, content: str, relationships: Optional[Dict[str, List[tuple]]] = None) -> tuple[List[scip_pb2.Occurrence], List[scip_pb2.SymbolInformation]]:
-        """Analyze Tree-sitter tree to generate occurrences and symbols for SCIP document."""
-        occurrences = []
-        symbols = []
-        scope_stack = []
-
-        # Use the same processed nodes set to avoid duplicates
-        if not hasattr(self, '_processed_nodes'):
-            self._processed_nodes = set()
-
-        def visit_node(node, current_scope_stack=None):
-            if current_scope_stack is None:
-                current_scope_stack = scope_stack[:]
-
-            node_type = node.type
-
-            # Traditional function and class declarations
-            if node_type in ['function_declaration', 'method_definition', 'arrow_function']:
-                name = self._get_js_function_name(node)
-                if name:
-                    symbol_id = self._create_function_symbol_id(name, file_path, current_scope_stack)
-                    occurrence = self._create_function_occurrence(node, symbol_id)
-                    symbol_relationships = relationships.get(symbol_id, []) if relationships else []
-                    scip_relationships = self._create_scip_relationships(symbol_relationships) if symbol_relationships else []
-                    symbol_info = self._create_function_symbol_info(node, symbol_id, name, scip_relationships)
-
-                    if occurrence:
-                        occurrences.append(occurrence)
-                    if symbol_info:
-                        symbols.append(symbol_info)
-
-            elif node_type in ['class_declaration']:
-                name = self._get_js_class_name(node)
-                if name:
-                    symbol_id = self._create_class_symbol_id(name, file_path, current_scope_stack)
-                    occurrence = self._create_class_occurrence(node, symbol_id)
-                    symbol_relationships = relationships.get(symbol_id, []) if relationships else []
-                    scip_relationships = self._create_scip_relationships(symbol_relationships) if symbol_relationships else []
-                    symbol_info = self._create_class_symbol_info(node, symbol_id, name, scip_relationships)
-
-                    if occurrence:
-                        occurrences.append(occurrence)
-                    if symbol_info:
-                        symbols.append(symbol_info)
-
-            # Assignment expressions with function expressions
-            elif node_type == 'assignment_expression':
-                occurrence, symbol_info = self._handle_assignment_for_document(node, file_path, current_scope_stack, relationships)
-                if occurrence:
-                    occurrences.append(occurrence)
-                if symbol_info:
-                    symbols.append(symbol_info)
-
-            # Lexical declarations
-            elif node_type == 'lexical_declaration':
-                document_symbols = self._handle_lexical_for_document(node, file_path, current_scope_stack, relationships)
-                for occ, sym in document_symbols:
-                    if occ:
-                        occurrences.append(occ)
-                    if sym:
-                        symbols.append(sym)
-
-            # Recursively visit children only if not in assignment or lexical that we handle above
-            if node_type not in ['assignment_expression', 'lexical_declaration']:
-                for child in node.children:
-                    visit_node(child, current_scope_stack)
-
-        visit_node(tree.root_node)
-        return occurrences, symbols
-
-    def _extract_relationships_from_tree(self, tree, file_path: str, project_path: str) -> Dict[str, List[tuple]]:
-        """Extract relationships from Tree-sitter tree."""
-        relationships = {}
-        scope_stack = []
-        relative_path = self._get_relative_path(file_path, project_path)
-
-        def visit_node(node, current_scope_stack=None):
-            if current_scope_stack is None:
-                current_scope_stack = scope_stack[:]
-
-            node_type = node.type
-
-            if node_type == 'class_declaration':
-                # Extract inheritance relationships
-                class_name = self._get_js_class_name(node)
-                if class_name:
-                    class_symbol_id = self._create_class_symbol_id(class_name, relative_path, current_scope_stack)
-
-                    # Look for extends clause
-                    for child in node.children:
-                        if child.type == 'class_heritage':
-                            for heritage_child in child.children:
-                                if heritage_child.type == 'identifier':
-                                    parent_name = self._get_node_text(heritage_child)
-                                    if parent_name:
-                                        parent_symbol_id = self._create_class_symbol_id(parent_name, relative_path, current_scope_stack)
-                                        if class_symbol_id not in relationships:
-                                            relationships[class_symbol_id] = []
-                                        relationships[class_symbol_id].append((parent_symbol_id, InternalRelationshipType.INHERITS))
-
-            elif node_type in ['function_declaration', 'method_definition', 'arrow_function']:
-                # Extract function call relationships
-                function_name = self._get_js_function_name(node)
-                if function_name:
-                    function_symbol_id = self._create_function_symbol_id(function_name, relative_path, current_scope_stack)
-
-                    # Find call expressions within this function
-                    self._extract_calls_from_node(node, function_symbol_id, relationships, relative_path, current_scope_stack)
-
-            # Recursively visit children
-            for child in node.children:
-                visit_node(child, current_scope_stack)
-
-        visit_node(tree.root_node)
-        return relationships
-
-    def _extract_calls_from_node(self, node, source_symbol_id: str, relationships: Dict, file_path: str, scope_stack: List):
-        """Extract function calls from a node."""
-
-        def visit_for_calls(n):
-            if n.type == 'call_expression':
-                # Get the function being called
-                function_node = n.children[0] if n.children else None
-                if function_node:
-                    if function_node.type == 'identifier':
-                        target_name = self._get_node_text(function_node)
-                        if target_name:
-                            target_symbol_id = self._create_function_symbol_id(target_name, file_path, scope_stack)
-                            if source_symbol_id not in relationships:
-                                relationships[source_symbol_id] = []
-                            relationships[source_symbol_id].append((target_symbol_id, InternalRelationshipType.CALLS))
-
-            for child in n.children:
-                visit_for_calls(child)
-
-        visit_for_calls(node)
-
-    # Helper methods for Tree-sitter node processing
-    def _get_node_text(self, node) -> Optional[str]:
-        """Get text content of a Tree-sitter node."""
-        if hasattr(node, 'text'):
-            try:
-                return node.text.decode('utf-8')
-            except:
-                pass
-        return None
-
-    def _get_js_function_name(self, node) -> Optional[str]:
-        """Extract function name from function node."""
-        for child in node.children:
-            if child.type == 'identifier':
-                return self._get_node_text(child)
-        return None
-
-    def _get_js_class_name(self, node) -> Optional[str]:
-        """Extract class name from class node."""
-        for child in node.children:
-            if child.type == 'identifier':
-                return self._get_node_text(child)
-        return None
-
-    # Helper methods
-    def _register_function_symbol(self, node, name: str, file_path: str, scope_stack: List[str]) -> None:
-        """Register a function symbol definition."""
-        symbol_id = self._create_function_symbol_id(name, file_path, scope_stack)
-
-        # Create a dummy range for registration
-        dummy_range = scip_pb2.Range()
-        dummy_range.start.extend([0, 0])
-        dummy_range.end.extend([0, 1])
-
-        self.reference_resolver.register_symbol_definition(
-            symbol_id=symbol_id,
-            file_path=file_path,
-            definition_range=dummy_range,
-            symbol_kind=scip_pb2.Function,
-            display_name=name,
-            documentation=["JavaScript function"]
-        )
-
-    def _register_class_symbol(self, node, name: str, file_path: str, scope_stack: List[str]) -> None:
-        """Register a class symbol definition."""
-        symbol_id = self._create_class_symbol_id(name, file_path, scope_stack)
-
-        # Create a dummy range for registration
-        dummy_range = scip_pb2.Range()
-        dummy_range.start.extend([0, 0])
-        dummy_range.end.extend([0, 1])
-
-        self.reference_resolver.register_symbol_definition(
-            symbol_id=symbol_id,
-            file_path=file_path,
-            definition_range=dummy_range,
-            symbol_kind=scip_pb2.Class,
-            display_name=name,
-            documentation=["JavaScript class"]
-        )
-
-    def _create_function_symbol_id(self, name: str, file_path: str, scope_stack: List[str]) -> str:
-        """Create symbol ID for function."""
-        # SCIP standard: local
-        local_id = ".".join(scope_stack + [name]) if scope_stack else name
-        return f"local {local_id}()."
-
-    def _create_class_symbol_id(self, name: str, file_path: str, scope_stack: List[str]) -> str:
-        """Create symbol ID for class."""
-        # SCIP standard: local
-        local_id = ".".join(scope_stack + [name]) if scope_stack else name
-        return f"local {local_id}#"
-
-    def _create_function_occurrence(self, node, symbol_id: str) -> Optional[scip_pb2.Occurrence]:
-        """Create SCIP occurrence for function."""
-        if not self.position_calculator:
-            return None
-
-        try:
-            # Use Tree-sitter position calculation method
-            range_obj = self.position_calculator.tree_sitter_node_to_range(node)
-            occurrence = scip_pb2.Occurrence()
-            occurrence.symbol = symbol_id
-            occurrence.symbol_roles = scip_pb2.Definition
-            occurrence.syntax_kind = scip_pb2.IdentifierFunction
-            occurrence.range.CopyFrom(range_obj)
-            return occurrence
-        except:
-            return None
-
-    def _create_class_occurrence(self, node, symbol_id: str) -> Optional[scip_pb2.Occurrence]:
-        """Create SCIP occurrence for class."""
-        if not self.position_calculator:
-            return None
-
-        try:
-            # Use Tree-sitter position calculation method
-            range_obj = self.position_calculator.tree_sitter_node_to_range(node)
-            occurrence = scip_pb2.Occurrence()
-            occurrence.symbol = symbol_id
-            occurrence.symbol_roles = scip_pb2.Definition
-            occurrence.syntax_kind = scip_pb2.IdentifierType
-            occurrence.range.CopyFrom(range_obj)
-            return occurrence
-        except:
-            return None
-
-    def _create_function_symbol_info(self, node, symbol_id: str, name: str, relationships: Optional[List[scip_pb2.Relationship]] = None) -> scip_pb2.SymbolInformation:
-        """Create SCIP symbol information for function."""
-        symbol_info = scip_pb2.SymbolInformation()
-        symbol_info.symbol = symbol_id
-        symbol_info.display_name = name
-        symbol_info.kind = scip_pb2.Function
-
-        # Add documentation - check for JSDoc or comments
-        symbol_info.documentation.append("JavaScript function")
-
-        # Add relationships if provided
-        if relationships and self.relationship_manager:
-            self.relationship_manager.add_relationships_to_symbol(symbol_info, relationships)
-
-        return symbol_info
-
-    def _create_class_symbol_info(self, node, symbol_id: str, name: str, relationships: Optional[List[scip_pb2.Relationship]] = None) -> scip_pb2.SymbolInformation:
-        """Create SCIP symbol information for class."""
-        symbol_info = scip_pb2.SymbolInformation()
-        symbol_info.symbol = symbol_id
-        symbol_info.display_name = name
-        symbol_info.kind = scip_pb2.Class
-
-        # Add documentation - check for JSDoc or comments
-        symbol_info.documentation.append("JavaScript class")
-
-        # Add relationships if provided
-        if relationships and self.relationship_manager:
-            self.relationship_manager.add_relationships_to_symbol(symbol_info, relationships)
-
-        return symbol_info
-
-    # JavaScript-specific syntax handlers
-    def _handle_assignment_expression(self, node, file_path: str, scope_stack: List[str]) -> None:
-        """Handle assignment expressions like obj.method = function() {}"""
-        left_child = None
-        right_child = None
-
-        for child in node.children:
-            if child.type == 'member_expression':
-                left_child = child
-            elif child.type in ['function_expression', 'arrow_function']:
-                right_child = child
-
-        if left_child and right_child:
-            # Extract method name from member expression
-            method_name = self._extract_member_expression_name(left_child)
-            if method_name:
-                # Use just the last part as function name for cleaner identification
-                clean_name = method_name.split('.')[-1] if '.' in method_name else method_name
-                # Register as function symbol
-                self._register_function_symbol(right_child, clean_name, file_path, scope_stack + method_name.split('.')[:-1])
-
-    def _handle_lexical_declaration(self, node, file_path: str, scope_stack: List[str]) -> None:
-        """Handle lexical declarations like const VAR = value"""
-        for child in node.children:
-            if child.type == 'variable_declarator':
-                # Get variable name and value
-                var_name = None
-                var_value = None
-
-                for declarator_child in child.children:
-                    if declarator_child.type == 'identifier':
-                        var_name = self._get_node_text(declarator_child)
-                    elif declarator_child.type in ['object_expression', 'new_expression', 'call_expression']:
-                        var_value = declarator_child
-                    elif declarator_child.type == 'object_pattern':
-                        # Handle destructuring like const { v4: uuidv4 } = require('uuid')
-                        self._handle_destructuring_pattern(declarator_child, file_path, scope_stack)
-
-                if var_name:
-                    # Check if this is an import/require statement
-                    if var_value and var_value.type == 'call_expression':
-                        # Check if it's a require() call
-                        is_require = False
-                        for cc in var_value.children:
-                            if cc.type == 'identifier' and self._get_node_text(cc) == 'require':
-                                is_require = True
-                                break
-
-                        if is_require:
-                            self._handle_import_statement(var_name, var_value, file_path, scope_stack)
-                        else:
-                            # Register as variable (like const limiter = rateLimit(...))
-                            self._register_variable_symbol(child, var_name, file_path, scope_stack, var_value)
-
-                            # Extract functions from call_expression (like rateLimit config)
-                            self._extract_functions_from_call_expression(var_value, var_name, file_path, scope_stack)
-                    else:
-                        # Register as constant/variable symbol
-                        self._register_variable_symbol(child, var_name, file_path, scope_stack, var_value)
-                        # Extract functions from object expressions
-                        if var_value and var_value.type == 'object_expression':
-                            self._extract_functions_from_object(var_value, var_name, file_path, scope_stack)
-
-    def _handle_expression_statement(self, node, file_path: str, scope_stack: List[str]) -> None:
-        """Handle expression statements that might contain method chains"""
-        for child in node.children:
-            if child.type == 'call_expression':
-                # Look for method chain patterns like schema.virtual().get()
-                self._handle_method_chain(child, file_path, scope_stack)
-            elif child.type == 'assignment_expression':
-                # Handle nested assignment expressions
-                self._handle_assignment_expression(child, file_path, scope_stack)
-
-    def _handle_method_chain(self, node, file_path: str, scope_stack: List[str]) -> None:
-        """Handle method chains like schema.virtual('name').get(function() {})"""
-        # Look for chained calls that end with function expressions
-        for child in node.children:
-            if child.type == 'member_expression':
-                # This could be a chained method call
-                member_name = self._extract_member_expression_name(child)
-                if member_name:
-                    # Look for function arguments
-                    for sibling in node.children:
-                        if sibling.type == 'arguments':
-                            for arg in sibling.children:
-                                if arg.type in ['function_expression', 'arrow_function']:
-                                    # Register the function with a descriptive name
-                                    func_name = f"{member_name}_callback"
-                                    self._register_function_symbol(arg, func_name, file_path, scope_stack)
-
-    def _extract_member_expression_name(self, node) -> Optional[str]:
-        """Extract name from member expression like obj.prop.method"""
-        parts = []
-
-        def extract_parts(n):
-            if n.type == 'member_expression':
-                # Process children in order: object first, then property
-                object_child = None
-                property_child = None
-
-                for child in n.children:
-                    if child.type in ['identifier', 'member_expression']:
-                        object_child = child
-                    elif child.type == 'property_identifier':
-                        property_child = child
-
-                # Recursively extract object part first
-                if object_child:
-                    if object_child.type == 'member_expression':
-                        extract_parts(object_child)
-                    elif object_child.type == 'identifier':
-                        parts.append(self._get_node_text(object_child))
-
-                # Then add the property
-                if property_child:
-                    parts.append(self._get_node_text(property_child))
-
-            elif n.type == 'identifier':
-                parts.append(self._get_node_text(n))
-
-        extract_parts(node)
-        return '.'.join(parts) if parts else None
-
-    def _register_variable_symbol(self, node, name: str, file_path: str, scope_stack: List[str], value_node=None) -> None:
-        """Register a variable/constant symbol definition."""
-        symbol_id = self._create_variable_symbol_id(name, file_path, scope_stack, value_node)
-
-        # Determine symbol type based on value
-        symbol_kind = scip_pb2.Variable
-        doc_type = "JavaScript variable"
-
-        if value_node:
-            if value_node.type == 'object_expression':
-                symbol_kind = scip_pb2.Object
-                doc_type = "JavaScript object"
-            elif value_node.type == 'new_expression':
-                symbol_kind = scip_pb2.Variable  # new expressions create variables, not classes
-                doc_type = "JavaScript instance"
-            elif value_node.type == 'call_expression':
-                # Check if it's a require call vs regular function call
-                is_require = False
-                for child in value_node.children:
-                    if child.type == 'identifier' and self._get_node_text(child) == 'require':
-                        is_require = True
-                        break
-                if is_require:
-                    symbol_kind = scip_pb2.Namespace
-                    doc_type = "JavaScript import"
-                else:
-                    symbol_kind = scip_pb2.Variable
-                    doc_type = "JavaScript constant"
-
-        # Create a dummy range for registration
-        dummy_range = scip_pb2.Range()
-        dummy_range.start.extend([0, 0])
-        dummy_range.end.extend([0, 1])
-
-        self.reference_resolver.register_symbol_definition(
-            symbol_id=symbol_id,
-            file_path=file_path,
-            definition_range=dummy_range,
-            symbol_kind=symbol_kind,
-            display_name=name,
-            documentation=[doc_type]
-        )
-
-    def _handle_destructuring_pattern(self, node, file_path: str, scope_stack: List[str]) -> None:
-        """Handle destructuring patterns like { v4: uuidv4 }"""
-        for child in node.children:
-            if child.type == 'shorthand_property_identifier_pattern':
-                # Simple destructuring like { prop }
-                var_name = self._get_node_text(child)
-                if var_name:
-                    self._register_variable_symbol(child, var_name, file_path, scope_stack)
-            elif child.type == 'pair_pattern':
-                # Renamed destructuring like { v4: uuidv4 }
-                for pair_child in child.children:
-                    if pair_child.type == 'identifier':
-                        var_name = self._get_node_text(pair_child)
-                        if var_name:
-                            self._register_variable_symbol(pair_child, var_name, file_path, scope_stack)
-
-    def _handle_import_statement(self, var_name: str, call_node, file_path: str, scope_stack: List[str]) -> None:
-        """Handle import statements like const lib = require('module')"""
-        # Check if this is a require() call
-        callee = None
-        module_name = None
-
-        for child in call_node.children:
-            if child.type == 'identifier':
-                callee = self._get_node_text(child)
-            elif child.type == 'arguments':
-                # Get the module name from arguments
-                for arg in child.children:
-                    if arg.type == 'string':
-                        module_name = self._get_node_text(arg).strip('"\'')
-                        break
-
-        if callee == 'require' and module_name:
-            # Classify dependency type
-            self._classify_and_store_dependency(module_name)
-
-            # Create SCIP standard symbol ID
-            local_id = ".".join(scope_stack + [var_name]) if scope_stack else var_name
-            symbol_id = f"local {local_id}(import)"
-
-            dummy_range = scip_pb2.Range()
-            dummy_range.start.extend([0, 0])
-            dummy_range.end.extend([0, 1])
-
-            self.reference_resolver.register_symbol_definition(
-                symbol_id=symbol_id,
-                file_path=file_path,
-                definition_range=dummy_range,
-                symbol_kind=scip_pb2.Namespace,
-                display_name=var_name,
-                documentation=[f"Import from {module_name}"]
-            )
-
-    def _handle_assignment_for_document(self, node, file_path: str, scope_stack: List[str], relationships: Optional[Dict[str, List[tuple]]]) -> tuple[Optional[scip_pb2.Occurrence], Optional[scip_pb2.SymbolInformation]]:
-        """Handle assignment expressions for document generation"""
-        left_child = None
-        right_child = None
-
-        for child in node.children:
-            if child.type == 'member_expression':
-                left_child = child
-            elif child.type in ['function_expression', 'arrow_function']:
-                right_child = child
-
-        if left_child and right_child:
-            method_name = self._extract_member_expression_name(left_child)
-            if method_name:
-                symbol_id = self._create_function_symbol_id(method_name, file_path, scope_stack)
-                occurrence = self._create_function_occurrence(right_child, symbol_id)
-                symbol_relationships = relationships.get(symbol_id, []) if relationships else []
-                scip_relationships = self._create_scip_relationships(symbol_relationships) if symbol_relationships else []
-                symbol_info = self._create_function_symbol_info(right_child, symbol_id, method_name, scip_relationships)
-                return occurrence, symbol_info
-
-        return None, None
-
-    def _handle_lexical_for_document(self, node, file_path: str, scope_stack: List[str], relationships: Optional[Dict[str, List[tuple]]]) -> List[tuple]:
-        """Handle lexical declarations for document generation"""
-        results = []
-
-        for child in node.children:
-            if child.type == 'variable_declarator':
-                var_name = None
-                var_value = None
-
-                for declarator_child in child.children:
-                    if declarator_child.type == 'identifier':
-                        var_name = self._get_node_text(declarator_child)
-                    elif declarator_child.type in ['object_expression', 'new_expression', 'call_expression']:
-                        var_value = declarator_child
-
-                if var_name:
-                    # Create occurrence and symbol info for variable
-                    symbol_id = self._create_variable_symbol_id(var_name, file_path, scope_stack, var_value)
-                    occurrence = self._create_variable_occurrence(child, symbol_id)
-                    symbol_info = self._create_variable_symbol_info(child, symbol_id, var_name, var_value)
-                    results.append((occurrence, symbol_info))
-
-        return results
-
-    def _create_variable_symbol_id(self, name: str, file_path: str, scope_stack: List[str], value_node=None) -> str:
-        """Create symbol ID for variable."""
-        # SCIP standard: local
-        local_id = ".".join(scope_stack + [name]) if scope_stack else name
-
-        # Determine descriptor based on value type
-        descriptor = "."  # Default for variables
-        if value_node:
-            if value_node.type == 'object_expression':
-                descriptor = "{}"
-            elif value_node.type == 'new_expression':
-                descriptor = "."  # new expressions are still variables, not classes
-            elif value_node.type == 'call_expression':
-                # Check if it's a require call vs regular function call
-                is_require = False
-                for child in value_node.children:
-                    if child.type == 'identifier' and hasattr(self, '_get_node_text'):
-                        if self._get_node_text(child) == 'require':
-                            is_require = True
-                            break
-                descriptor = "(import)" if is_require else "."
-
-        return f"local {local_id}{descriptor}"
-
-    def _create_variable_occurrence(self, node, symbol_id: str) -> Optional[scip_pb2.Occurrence]:
-        """Create SCIP occurrence for variable."""
-        if not self.position_calculator:
-            return None
-
-        try:
-            range_obj = self.position_calculator.tree_sitter_node_to_range(node)
-            occurrence = scip_pb2.Occurrence()
-            occurrence.symbol = symbol_id
-            occurrence.symbol_roles = scip_pb2.Definition
-            occurrence.syntax_kind = scip_pb2.IdentifierConstant
-            occurrence.range.CopyFrom(range_obj)
-            return occurrence
-        except:
-            return None
-
-    def _create_variable_symbol_info(self, node, symbol_id: str, name: str, value_node=None) -> scip_pb2.SymbolInformation:
-        """Create SCIP symbol information for variable."""
-        symbol_info = scip_pb2.SymbolInformation()
-        symbol_info.symbol = symbol_id
-        symbol_info.display_name = name
-
-        # Determine kind based on value - correct classification
-        if value_node:
-            if value_node.type == 'object_expression':
-                symbol_info.kind = scip_pb2.Object
-                symbol_info.documentation.append("JavaScript object literal")
-            elif value_node.type == 'new_expression':
-                symbol_info.kind = scip_pb2.Variable  # new expressions create variables, not classes
-                symbol_info.documentation.append("JavaScript instance variable")
-            elif value_node.type == 'call_expression':
-                symbol_info.kind = scip_pb2.Namespace
-                symbol_info.documentation.append("JavaScript import")
-            elif value_node.type == 'function_expression':
-                symbol_info.kind = scip_pb2.Function
-                symbol_info.documentation.append("JavaScript function variable")
-            else:
-                symbol_info.kind = scip_pb2.Variable
-                symbol_info.documentation.append("JavaScript variable")
-        else:
-            symbol_info.kind = scip_pb2.Variable
-            symbol_info.documentation.append("JavaScript variable")
-
-        return symbol_info
-
-    def _extract_functions_from_object(self, object_node, parent_name: str, file_path: str, scope_stack: List[str]) -> None:
-        """Extract functions from object expressions like { handler: function() {} }"""
-        for child in object_node.children:
-            if child.type == 'pair':
-                prop_name = None
-                prop_value = None
-
-                for pair_child in child.children:
-                    if pair_child.type in ['identifier', 'property_identifier']:
-                        prop_name = self._get_node_text(pair_child)
-                    elif pair_child.type in ['function_expression', 'arrow_function']:
-                        prop_value = pair_child
-
-                if prop_name and prop_value:
-                    # Register function with context-aware name
-                    func_scope = scope_stack + [parent_name]
-                    self._register_function_symbol(prop_value, prop_name, file_path, func_scope)
-
-    def _extract_functions_from_call_expression(self, call_node, parent_name: str, file_path: str, scope_stack: List[str]) -> None:
-        """Extract functions from call expressions arguments like rateLimit({ handler: function() {} })"""
-        for child in call_node.children:
-            if child.type == 'arguments':
-                for arg in child.children:
-                    if arg.type == 'object_expression':
-                        self._extract_functions_from_object(arg, parent_name, file_path, scope_stack)
-                    elif arg.type in ['function_expression', 'arrow_function']:
-                        # Anonymous function in call - give it a descriptive name
-                        func_name = f"{parent_name}_callback"
-                        self._register_function_symbol(arg, func_name, file_path, scope_stack)
-
-    def _classify_and_store_dependency(self, module_name: str) -> None:
-        """Classify and store dependency based on module name."""
-        # Standard Node.js built-in modules
-        node_builtins = {
-            'fs', 'path', 'http', 'https', 'url', 'crypto', 'os', 'util', 'events',
-            'stream', 'buffer', 'child_process', 'cluster', 'dgram', 'dns', 'net',
-            'tls', 'zlib', 'readline', 'repl', 'vm', 'worker_threads', 'async_hooks'
-        }
-
-        if module_name in node_builtins:
-            category = 'standard_library'
-        elif module_name.startswith('./') or module_name.startswith('../') or module_name.startswith('/'):
-            category = 'local'
-        else:
-            category = 'third_party'
-
-        # Avoid duplicates
-        if module_name not in self.dependencies['imports'][category]:
-            self.dependencies['imports'][category].append(module_name)
-
-    def get_dependencies(self) -> Dict[str, Any]:
-        """Get collected dependencies for MCP response."""
-        return self.dependencies
-
-    def _reset_dependencies(self) -> None:
-        """Reset dependency tracking for new file analysis."""
-        self.dependencies = {
-            'imports': {
-                'standard_library': [],
-                'third_party': [],
-                'local': []
-            }
-        }
\ No newline at end of file
diff --git a/src/code_index_mcp/scip/strategies/javascript_strategy_backup.py b/src/code_index_mcp/scip/strategies/javascript_strategy_backup.py
deleted file mode 100644
index 93c2273..0000000
--- a/src/code_index_mcp/scip/strategies/javascript_strategy_backup.py
+++ /dev/null
@@ -1,869 +0,0 @@
-"""JavaScript/TypeScript SCIP indexing strategy v2 - SCIP standard compliant."""
-
-import logging
-import os
-from typing import List, Optional, Dict, Any, Set
-from pathlib import Path
-
-try:
-    import tree_sitter
-    from tree_sitter_javascript import language as js_language
-    from tree_sitter_typescript import language_typescript as ts_language
-    TREE_SITTER_AVAILABLE = True
-except ImportError:
-    TREE_SITTER_AVAILABLE = False
-
-from .base_strategy import SCIPIndexerStrategy, StrategyError
-from ..proto import scip_pb2
-from ..core.position_calculator import PositionCalculator
-from ..core.relationship_types import InternalRelationshipType
-
-
-logger = logging.getLogger(__name__)
-
-
-class JavaScriptStrategy(SCIPIndexerStrategy):
-    """SCIP-compliant JavaScript/TypeScript indexing strategy using Tree-sitter."""
-
-    SUPPORTED_EXTENSIONS = {'.js', '.jsx', '.ts', '.tsx', '.mjs', '.cjs'}
-
-    def __init__(self, priority: int = 95):
-        """Initialize the JavaScript/TypeScript strategy v2."""
-        super().__init__(priority)
-
-        if not TREE_SITTER_AVAILABLE:
-            raise StrategyError("Tree-sitter not available for JavaScript/TypeScript strategy")
-
-        # Initialize parsers
-        js_lang = tree_sitter.Language(js_language())
-        ts_lang = tree_sitter.Language(ts_language())
-
-        self.js_parser = tree_sitter.Parser(js_lang)
-        self.ts_parser = tree_sitter.Parser(ts_lang)
-
-    def can_handle(self, extension: str, file_path: str) -> bool:
-        """Check if this strategy can handle the file type."""
-        return extension.lower() in self.SUPPORTED_EXTENSIONS and TREE_SITTER_AVAILABLE
-
-    def get_language_name(self) -> str:
-        """Get the language name for SCIP symbol generation."""
-        return "javascript"  # Use 'javascript' for both JS and TS
-
-    def is_available(self) -> bool:
-        """Check if this strategy is available."""
-        return TREE_SITTER_AVAILABLE
-
-    def _collect_symbol_definitions(self, files: List[str], project_path: str) -> None:
-        """Phase 1: Collect all symbol definitions from JavaScript/TypeScript files."""
-        for file_path in files:
-            try:
-                self._collect_symbols_from_file(file_path, project_path)
-            except Exception as e:
-                logger.warning(f"Failed to collect symbols from {file_path}: {e}")
-                continue
-
-    def _generate_documents_with_references(self, files: List[str], project_path: str) -> List[scip_pb2.Document]:
-        """Phase 2: Generate complete SCIP documents with resolved references."""
-        documents = []
-
-        for file_path in files:
-            try:
-                document = self._analyze_js_file(file_path, project_path)
-                if document:
-                    documents.append(document)
-            except Exception as e:
-                logger.error(f"Failed to analyze JavaScript/TypeScript file {file_path}: {e}")
- continue - - return documents - - def _collect_symbols_from_file(self, file_path: str, project_path: str) -> None: - """Collect symbol definitions from a single JavaScript/TypeScript file.""" - # Read file content - content = self._read_file_content(file_path) - if not content: - return - - # Parse with Tree-sitter - tree = self._parse_content(content, file_path) - if not tree: - return - - # Collect symbols - relative_path = self._get_relative_path(file_path, project_path) - collector = JavaScriptSymbolCollector( - relative_path, content, tree, self.symbol_manager, self.reference_resolver - ) - collector.analyze() - - def _analyze_js_file(self, file_path: str, project_path: str) -> Optional[scip_pb2.Document]: - """Analyze a single JavaScript/TypeScript file and generate complete SCIP document.""" - # Read file content - content = self._read_file_content(file_path) - if not content: - return None - - # Parse with Tree-sitter - tree = self._parse_content(content, file_path) - if not tree: - return None - - # Create SCIP document - document = scip_pb2.Document() - document.relative_path = self._get_relative_path(file_path, project_path) - document.language = self._detect_specific_language(Path(file_path).suffix) - - # Analyze AST and generate occurrences - self.position_calculator = PositionCalculator(content) - analyzer = JavaScriptAnalyzer( - document.relative_path, - content, - tree, - document.language, - self.symbol_manager, - self.position_calculator, - self.reference_resolver - ) - analyzer.analyze() - - # Add results to document - document.occurrences.extend(analyzer.occurrences) - document.symbols.extend(analyzer.symbols) - - logger.debug(f"Analyzed JavaScript/TypeScript file {document.relative_path}: " - f"{len(document.occurrences)} occurrences, {len(document.symbols)} symbols") - - return document - - def _parse_content(self, content: str, file_path: str) -> Optional[tree_sitter.Tree]: - """Parse content with appropriate parser.""" - try: - 
content_bytes = content.encode('utf-8') - - # Choose parser based on file extension - extension = Path(file_path).suffix.lower() - if extension in ['.ts', '.tsx']: - return self.ts_parser.parse(content_bytes) - else: - return self.js_parser.parse(content_bytes) - - except Exception as e: - logger.error(f"Failed to parse content: {e}") - return None - - def _detect_specific_language(self, extension: str) -> str: - """Detect specific language from extension.""" - ext_to_lang = { - '.js': 'javascript', - '.jsx': 'jsx', - '.mjs': 'javascript', - '.cjs': 'javascript', - '.ts': 'typescript', - '.tsx': 'tsx' - } - return ext_to_lang.get(extension.lower(), 'javascript') - - -class JavaScriptSymbolCollector: - """Tree-sitter based symbol collector for JavaScript/TypeScript (Phase 1).""" - - def __init__(self, file_path: str, content: str, tree: tree_sitter.Tree, symbol_manager, reference_resolver): - self.file_path = file_path - self.content = content - self.tree = tree - self.symbol_manager = symbol_manager - self.reference_resolver = reference_resolver - self.scope_stack: List[str] = [] - - def analyze(self): - """Analyze the tree-sitter AST to collect symbols.""" - root = self.tree.root_node - self._analyze_node(root) - - def _analyze_node(self, node: tree_sitter.Node): - """Recursively analyze AST nodes.""" - node_type = node.type - - if node_type == 'function_declaration': - self._register_function_symbol(node) - elif node_type == 'method_definition': - self._register_method_symbol(node) - elif node_type == 'class_declaration': - self._register_class_symbol(node) - elif node_type == 'interface_declaration': - self._register_interface_symbol(node) - elif node_type == 'type_alias_declaration': - self._register_type_alias_symbol(node) - elif node_type == 'variable_declarator': - self._register_variable_symbol(node) - - # Recursively analyze child nodes - for child in node.children: - self._analyze_node(child) - - def _register_function_symbol(self, node: 
tree_sitter.Node): - """Register a function symbol definition.""" - name_node = self._find_child_by_type(node, 'identifier') - if name_node: - name = self._get_node_text(name_node) - if name: - symbol_id = self.symbol_manager.create_local_symbol( - language="javascript", - file_path=self.file_path, - symbol_path=self.scope_stack + [name], - descriptor="()." - ) - - self._register_symbol(symbol_id, name, scip_pb2.Function, ["JavaScript function"]) - - def _register_method_symbol(self, node: tree_sitter.Node): - """Register a method symbol definition.""" - name_node = (self._find_child_by_type(node, 'property_identifier') or - self._find_child_by_type(node, 'identifier')) - if name_node: - name = self._get_node_text(name_node) - if name: - symbol_id = self.symbol_manager.create_local_symbol( - language="javascript", - file_path=self.file_path, - symbol_path=self.scope_stack + [name], - descriptor="()." - ) - - self._register_symbol(symbol_id, name, scip_pb2.Method, ["JavaScript method"]) - - def _register_class_symbol(self, node: tree_sitter.Node): - """Register a class symbol definition.""" - name_node = self._find_child_by_type(node, 'identifier') - if name_node: - name = self._get_node_text(name_node) - if name: - symbol_id = self.symbol_manager.create_local_symbol( - language="javascript", - file_path=self.file_path, - symbol_path=self.scope_stack + [name], - descriptor="#" - ) - - self._register_symbol(symbol_id, name, scip_pb2.Class, ["JavaScript class"]) - - def _register_interface_symbol(self, node: tree_sitter.Node): - """Register a TypeScript interface symbol definition.""" - name_node = self._find_child_by_type(node, 'type_identifier') - if name_node: - name = self._get_node_text(name_node) - if name: - symbol_id = self.symbol_manager.create_local_symbol( - language="javascript", - file_path=self.file_path, - symbol_path=self.scope_stack + [name], - descriptor="#" - ) - - self._register_symbol(symbol_id, name, scip_pb2.Interface, ["TypeScript interface"]) 
- - def _register_type_alias_symbol(self, node: tree_sitter.Node): - """Register a TypeScript type alias symbol definition.""" - name_node = self._find_child_by_type(node, 'type_identifier') - if name_node: - name = self._get_node_text(name_node) - if name: - symbol_id = self.symbol_manager.create_local_symbol( - language="javascript", - file_path=self.file_path, - symbol_path=self.scope_stack + [name], - descriptor="#" - ) - - self._register_symbol(symbol_id, name, scip_pb2.TypeParameter, ["TypeScript type alias"]) - - def _register_variable_symbol(self, node: tree_sitter.Node): - """Register a variable symbol definition.""" - name_node = self._find_child_by_type(node, 'identifier') - if name_node: - name = self._get_node_text(name_node) - if name: - symbol_id = self.symbol_manager.create_local_symbol( - language="javascript", - file_path=self.file_path, - symbol_path=self.scope_stack + [name], - descriptor="" - ) - - self._register_symbol(symbol_id, name, scip_pb2.Variable, ["JavaScript variable"]) - - def _register_symbol(self, symbol_id: str, name: str, symbol_kind: int, documentation: List[str]): - """Register a symbol with the reference resolver.""" - dummy_range = scip_pb2.Range() - dummy_range.start.extend([0, 0]) - dummy_range.end.extend([0, 1]) - - self.reference_resolver.register_symbol_definition( - symbol_id=symbol_id, - file_path=self.file_path, - definition_range=dummy_range, - symbol_kind=symbol_kind, - display_name=name, - documentation=documentation - ) - - def _find_child_by_type(self, node: tree_sitter.Node, node_type: str) -> Optional[tree_sitter.Node]: - """Find first child node of the given type.""" - for child in node.children: - if child.type == node_type: - return child - return None - - def _get_node_text(self, node: tree_sitter.Node) -> str: - """Get text content of a node.""" - return self.content[node.start_byte:node.end_byte] - - -class JavaScriptAnalyzer: - """Tree-sitter based analyzer for JavaScript/TypeScript AST (Phase 2).""" - - 
def __init__(self, file_path: str, content: str, tree: tree_sitter.Tree, language: str, - symbol_manager, position_calculator, reference_resolver): - self.file_path = file_path - self.content = content - self.tree = tree - self.language = language - self.symbol_manager = symbol_manager - self.position_calculator = position_calculator - self.reference_resolver = reference_resolver - self.scope_stack: List[str] = [] - - # Results - self.occurrences: List[scip_pb2.Occurrence] = [] - self.symbols: List[scip_pb2.SymbolInformation] = [] - - def analyze(self): - """Analyze the tree-sitter AST.""" - root = self.tree.root_node - self._analyze_node(root) - - def _analyze_node(self, node: tree_sitter.Node): - """Recursively analyze AST nodes.""" - node_type = node.type - - if node_type == 'function_declaration': - self._handle_function_declaration(node) - elif node_type == 'method_definition': - self._handle_method_definition(node) - elif node_type == 'class_declaration': - self._handle_class_declaration(node) - elif node_type == 'interface_declaration': - self._handle_interface_declaration(node) - elif node_type == 'type_alias_declaration': - self._handle_type_alias_declaration(node) - elif node_type == 'variable_declarator': - self._handle_variable_declarator(node) - elif node_type == 'identifier': - self._handle_identifier_reference(node) - - # Recursively analyze child nodes - for child in node.children: - self._analyze_node(child) - - def _handle_function_declaration(self, node: tree_sitter.Node): - """Handle function declarations.""" - name_node = self._find_child_by_type(node, 'identifier') - if name_node: - name = self._get_node_text(name_node) - if name: - self._create_function_symbol(node, name_node, name, False) - - def _handle_method_definition(self, node: tree_sitter.Node): - """Handle method definitions.""" - name_node = (self._find_child_by_type(node, 'property_identifier') or - self._find_child_by_type(node, 'identifier')) - if name_node: - name = 
self._get_node_text(name_node) - if name: - self._create_function_symbol(node, name_node, name, True) - - def _handle_class_declaration(self, node: tree_sitter.Node): - """Handle class declarations.""" - name_node = self._find_child_by_type(node, 'identifier') - if name_node: - name = self._get_node_text(name_node) - if name: - self._create_class_symbol(node, name_node, name, scip_pb2.Class, "JavaScript class") - - # Enter class scope - self.scope_stack.append(name) - - # Analyze class body - class_body = self._find_child_by_type(node, 'class_body') - if class_body: - self._analyze_node(class_body) - - # Exit class scope - self.scope_stack.pop() - - def _handle_interface_declaration(self, node: tree_sitter.Node): - """Handle TypeScript interface declarations.""" - name_node = self._find_child_by_type(node, 'type_identifier') - if name_node: - name = self._get_node_text(name_node) - if name: - self._create_class_symbol(node, name_node, name, scip_pb2.Interface, "TypeScript interface") - - def _handle_type_alias_declaration(self, node: tree_sitter.Node): - """Handle TypeScript type alias declarations.""" - name_node = self._find_child_by_type(node, 'type_identifier') - if name_node: - name = self._get_node_text(name_node) - if name: - self._create_class_symbol(node, name_node, name, scip_pb2.TypeParameter, "TypeScript type alias") - - def _handle_variable_declarator(self, node: tree_sitter.Node): - """Handle variable declarations.""" - name_node = self._find_child_by_type(node, 'identifier') - if name_node: - name = self._get_node_text(name_node) - if name: - self._create_variable_symbol(node, name_node, name) - - def _handle_identifier_reference(self, node: tree_sitter.Node): - """Handle identifier references.""" - # Only handle if it's not part of a declaration - parent = node.parent - if parent and parent.type not in [ - 'function_declaration', 'class_declaration', 'variable_declarator', - 'method_definition', 'interface_declaration', 'type_alias_declaration' - ]: 
- name = self._get_node_text(node) - if name and len(name) > 1: # Avoid single letters - self._handle_name_reference(node, name) - - def _create_function_symbol(self, node: tree_sitter.Node, name_node: tree_sitter.Node, name: str, is_method: bool): - """Create a function or method symbol.""" - symbol_id = self.symbol_manager.create_local_symbol( - language="javascript", - file_path=self.file_path, - symbol_path=self.scope_stack + [name], - descriptor="()." - ) - - # Create definition occurrence - range_obj = self.position_calculator.tree_sitter_node_to_range(name_node) - occurrence = self._create_occurrence( - symbol_id, range_obj, scip_pb2.Definition, scip_pb2.IdentifierFunction - ) - self.occurrences.append(occurrence) - - # Create symbol information - kind = scip_pb2.Method if is_method else scip_pb2.Function - doc_type = "method" if is_method else "function" - documentation = [f"JavaScript {doc_type} in {self.language}"] - - symbol_info = self._create_symbol_information( - symbol_id, name, kind, documentation - ) - self.symbols.append(symbol_info) - - def _create_class_symbol(self, node: tree_sitter.Node, name_node: tree_sitter.Node, - name: str, symbol_kind: int, description: str): - """Create a class, interface, or type symbol.""" - symbol_id = self.symbol_manager.create_local_symbol( - language="javascript", - file_path=self.file_path, - symbol_path=self.scope_stack + [name], - descriptor="#" - ) - - # Create definition occurrence - range_obj = self.position_calculator.tree_sitter_node_to_range(name_node) - occurrence = self._create_occurrence( - symbol_id, range_obj, scip_pb2.Definition, scip_pb2.IdentifierType - ) - self.occurrences.append(occurrence) - - # Create symbol information - symbol_info = self._create_symbol_information( - symbol_id, name, symbol_kind, [description] - ) - self.symbols.append(symbol_info) - - def _create_variable_symbol(self, node: tree_sitter.Node, name_node: tree_sitter.Node, name: str): - """Create a variable symbol.""" - 
symbol_id = self.symbol_manager.create_local_symbol( - language="javascript", - file_path=self.file_path, - symbol_path=self.scope_stack + [name], - descriptor="" - ) - - # Create definition occurrence - range_obj = self.position_calculator.tree_sitter_node_to_range(name_node) - occurrence = self._create_occurrence( - symbol_id, range_obj, scip_pb2.Definition, scip_pb2.IdentifierLocal - ) - self.occurrences.append(occurrence) - - # Create symbol information - symbol_info = self._create_symbol_information( - symbol_id, name, scip_pb2.Variable, [f"JavaScript variable in {self.language}"] - ) - self.symbols.append(symbol_info) - - def _handle_name_reference(self, node: tree_sitter.Node, name: str): - """Handle name reference.""" - # Try to resolve the reference - resolved_symbol_id = self.reference_resolver.resolve_reference_by_name( - symbol_name=name, - context_file=self.file_path, - context_scope=self.scope_stack - ) - - if resolved_symbol_id: - # Create reference occurrence - range_obj = self.position_calculator.tree_sitter_node_to_range(node) - occurrence = self._create_occurrence( - resolved_symbol_id, range_obj, 0, scip_pb2.Identifier # 0 = reference role - ) - self.occurrences.append(occurrence) - - # Register the reference - self.reference_resolver.register_symbol_reference( - symbol_id=resolved_symbol_id, - file_path=self.file_path, - reference_range=range_obj, - context_scope=self.scope_stack - ) - - def _find_child_by_type(self, node: tree_sitter.Node, node_type: str) -> Optional[tree_sitter.Node]: - """Find first child node of the given type.""" - for child in node.children: - if child.type == node_type: - return child - return None - - def _get_node_text(self, node: tree_sitter.Node) -> str: - """Get text content of a node.""" - return self.content[node.start_byte:node.end_byte] - - def _create_occurrence(self, symbol_id: str, range_obj: scip_pb2.Range, - symbol_roles: int, syntax_kind: int) -> scip_pb2.Occurrence: - """Create a SCIP occurrence.""" - 
occurrence = scip_pb2.Occurrence() - occurrence.symbol = symbol_id - occurrence.symbol_roles = symbol_roles - occurrence.syntax_kind = syntax_kind - occurrence.range.CopyFrom(range_obj) - return occurrence - - def _create_symbol_information(self, symbol_id: str, display_name: str, - symbol_kind: int, documentation: List[str] = None) -> scip_pb2.SymbolInformation: - """Create SCIP symbol information.""" - symbol_info = scip_pb2.SymbolInformation() - symbol_info.symbol = symbol_id - symbol_info.display_name = display_name - symbol_info.kind = symbol_kind - - if documentation: - symbol_info.documentation.extend(documentation) - - return symbol_info - - def _build_symbol_relationships(self, files: List[str], project_path: str) -> Dict[str, List[tuple]]: - """ - Build relationships between JavaScript/TypeScript symbols. - - Args: - files: List of file paths to process - project_path: Project root path - - Returns: - Dictionary mapping symbol_id -> [(target_symbol_id, relationship_type), ...] - """ - logger.debug(f"🔗 JavaScriptStrategy: Building symbol relationships for {len(files)} files") - - all_relationships = {} - - for file_path in files: - try: - file_relationships = self._extract_js_relationships_from_file(file_path, project_path) - all_relationships.update(file_relationships) - except Exception as e: - logger.warning(f"Failed to extract relationships from {file_path}: {e}") - - total_symbols_with_relationships = len(all_relationships) - total_relationships = sum(len(rels) for rels in all_relationships.values()) - - logger.debug(f"✅ JavaScriptStrategy: Built {total_relationships} relationships for {total_symbols_with_relationships} symbols") - return all_relationships - - def _extract_js_relationships_from_file(self, file_path: str, project_path: str) -> Dict[str, List[tuple]]: - """ - Extract relationships from a single JavaScript/TypeScript file. 
- - Args: - file_path: File to analyze - project_path: Project root path - - Returns: - Dictionary mapping symbol_id -> [(target_symbol_id, relationship_type), ...] - """ - content = self._read_file_content(file_path) - if not content: - return {} - - # Determine language based on file extension - file_ext = Path(file_path).suffix.lower() - is_typescript = file_ext in {'.ts', '.tsx'} - - if TREE_SITTER_AVAILABLE: - return self._extract_tree_sitter_relationships(content, file_path, is_typescript) - else: - return self._extract_regex_relationships(content, file_path) - - def _extract_tree_sitter_relationships(self, content: str, file_path: str, is_typescript: bool) -> Dict[str, List[tuple]]: - """Extract relationships using tree-sitter parser.""" - try: - # Choose appropriate language - language = ts_language() if is_typescript else js_language() - parser = tree_sitter.Parser() - parser.set_language(tree_sitter.Language(language)) - - tree = parser.parse(bytes(content, "utf8")) - - extractor = JSRelationshipExtractor( - file_path=file_path, - content=content, - symbol_manager=self.symbol_manager, - is_typescript=is_typescript - ) - - extractor.extract_from_tree(tree.root_node) - return extractor.get_relationships() - - except Exception as e: - logger.warning(f"Tree-sitter relationship extraction failed for {file_path}: {e}") - return self._extract_regex_relationships(content, file_path) - - def _extract_regex_relationships(self, content: str, file_path: str) -> Dict[str, List[tuple]]: - """Extract relationships using regex patterns (fallback).""" - import re - - relationships = {} - - # Simple regex patterns for basic relationship extraction - # This is a fallback when tree-sitter is not available - - # Class inheritance patterns - class_extends_pattern = r'class\s+(\w+)\s+extends\s+(\w+)' - for match in re.finditer(class_extends_pattern, content): - child_class = match.group(1) - parent_class = match.group(2) - - child_symbol_id = self._generate_symbol_id(file_path, 
[child_class], "#") - parent_symbol_id = self._generate_symbol_id(file_path, [parent_class], "#") - - if child_symbol_id not in relationships: - relationships[child_symbol_id] = [] - relationships[child_symbol_id].append((parent_symbol_id, InternalRelationshipType.INHERITS)) - - # Function calls patterns (basic) - function_call_pattern = r'(\w+)\s*\(' - current_function = None - - # Simple function definition detection - function_def_pattern = r'function\s+(\w+)\s*\(' - for match in re.finditer(function_def_pattern, content): - current_function = match.group(1) - # Extract calls within this function context (simplified) - - logger.debug(f"Regex extraction found {len(relationships)} relationships in {file_path}") - return relationships - - def _generate_symbol_id(self, file_path: str, symbol_path: List[str], descriptor: str) -> str: - """Generate SCIP symbol ID for a JavaScript symbol.""" - if self.symbol_manager: - return self.symbol_manager.create_local_symbol( - language="javascript", - file_path=file_path, - symbol_path=symbol_path, - descriptor=descriptor - ) - return f"local {'/'.join(symbol_path)}{descriptor}" - - -class JSRelationshipExtractor: - """ - Tree-sitter based relationship extractor for JavaScript/TypeScript. 
- """ - - def __init__(self, file_path: str, content: str, symbol_manager, is_typescript: bool = False): - self.file_path = file_path - self.content = content - self.symbol_manager = symbol_manager - self.is_typescript = is_typescript - self.relationships = {} - self.current_scope = [] - - def get_relationships(self) -> Dict[str, List[tuple]]: - """Get extracted relationships.""" - return self.relationships - - def _add_relationship(self, source_symbol_id: str, target_symbol_id: str, relationship_type: InternalRelationshipType): - """Add a relationship to the collection.""" - if source_symbol_id not in self.relationships: - self.relationships[source_symbol_id] = [] - self.relationships[source_symbol_id].append((target_symbol_id, relationship_type)) - - def extract_from_tree(self, node): - """Extract relationships from tree-sitter AST.""" - self._visit_node(node) - - def _visit_node(self, node): - """Visit a tree-sitter node recursively.""" - if node.type == "class_declaration": - self._handle_class_declaration(node) - elif node.type == "function_declaration": - self._handle_function_declaration(node) - elif node.type == "method_definition": - self._handle_method_definition(node) - elif node.type == "call_expression": - self._handle_call_expression(node) - elif node.type == "import_statement": - self._handle_import_statement(node) - - # Visit child nodes - for child in node.children: - self._visit_node(child) - - def _handle_class_declaration(self, node): - """Handle class declaration and inheritance.""" - class_name = None - parent_class = None - - for child in node.children: - if child.type == "identifier" and class_name is None: - class_name = self._get_node_text(child) - elif child.type == "class_heritage": - # Find extends clause - for heritage_child in child.children: - if heritage_child.type == "extends_clause": - for extends_child in heritage_child.children: - if extends_child.type == "identifier": - parent_class = self._get_node_text(extends_child) - break 
- - if class_name and parent_class: - class_symbol_id = self._generate_symbol_id([class_name], "#") - parent_symbol_id = self._generate_symbol_id([parent_class], "#") - self._add_relationship(class_symbol_id, parent_symbol_id, InternalRelationshipType.INHERITS) - - def _handle_function_declaration(self, node): - """Handle function declaration.""" - function_name = None - - for child in node.children: - if child.type == "identifier": - function_name = self._get_node_text(child) - break - - if function_name: - self.current_scope.append(function_name) - # Extract calls within function body - self._extract_function_calls(node, function_name) - self.current_scope.pop() - - def _handle_method_definition(self, node): - """Handle method definition within a class.""" - method_name = None - - for child in node.children: - if child.type == "property_identifier": - method_name = self._get_node_text(child) - break - - if method_name: - full_scope = self.current_scope + [method_name] - self._extract_function_calls(node, method_name) - - def _handle_call_expression(self, node): - """Handle function/method calls.""" - if self.current_scope: - current_function = self.current_scope[-1] - - # Extract called function name - called_function = None - - for child in node.children: - if child.type == "identifier": - called_function = self._get_node_text(child) - break - elif child.type == "member_expression": - # Handle method calls like obj.method() - called_function = self._extract_member_expression(child) - break - - if called_function and current_function: - source_symbol_id = self._generate_symbol_id([current_function], "().") - target_symbol_id = self._generate_symbol_id([called_function], "().") - self._add_relationship(source_symbol_id, target_symbol_id, InternalRelationshipType.CALLS) - - def _handle_import_statement(self, node): - """Handle import statements.""" - # Extract import relationships - imported_module = None - imported_symbols = [] - - for child in node.children: - if 
child.type == "import_clause": - # Extract imported symbols - pass - elif child.type == "string": - # Extract module path - imported_module = self._get_node_text(child).strip('"\'') - - # Add import relationships if needed - # This could be expanded to track module dependencies - - def _extract_function_calls(self, function_node, function_name: str): - """Extract all function calls within a function.""" - old_scope = self.current_scope.copy() - if function_name not in self.current_scope: - self.current_scope.append(function_name) - - self._visit_calls_in_node(function_node) - - self.current_scope = old_scope - - def _visit_calls_in_node(self, node): - """Visit all call expressions in a node.""" - if node.type == "call_expression": - self._handle_call_expression(node) - - for child in node.children: - self._visit_calls_in_node(child) - - def _extract_member_expression(self, node) -> str: - """Extract full name from member expression (e.g., 'obj.method').""" - parts = [] - - for child in node.children: - if child.type == "identifier": - parts.append(self._get_node_text(child)) - elif child.type == "property_identifier": - parts.append(self._get_node_text(child)) - - return ".".join(parts) if parts else "" - - def _get_node_text(self, node) -> str: - """Get text content of a tree-sitter node.""" - return self.content[node.start_byte:node.end_byte] - - def _generate_symbol_id(self, symbol_path: List[str], descriptor: str) -> str: - """Generate SCIP symbol ID.""" - if self.symbol_manager: - return self.symbol_manager.create_local_symbol( - language="javascript", - file_path=self.file_path, - symbol_path=symbol_path, - descriptor=descriptor - ) - return f"local {'/'.join(symbol_path)}{descriptor}" diff --git a/src/code_index_mcp/scip/strategies/objective_c_strategy.py b/src/code_index_mcp/scip/strategies/objective_c_strategy.py deleted file mode 100644 index c27dc87..0000000 --- a/src/code_index_mcp/scip/strategies/objective_c_strategy.py +++ /dev/null @@ -1,1083 +0,0 @@ 
-""" -Objective-C Strategy for SCIP indexing using libclang. - -This strategy uses libclang to parse Objective-C source files (.m, .mm, .h) -and extract symbol information following SCIP standards. -""" - -import logging -import os -from typing import List, Set, Optional, Tuple, Dict, Any -from pathlib import Path - -try: - import clang.cindex as clang - from clang.cindex import CursorKind, TypeKind - LIBCLANG_AVAILABLE = True -except ImportError: - LIBCLANG_AVAILABLE = False - clang = None - CursorKind = None - TypeKind = None - -from .base_strategy import SCIPIndexerStrategy, StrategyError -from ..proto import scip_pb2 -from ..core.position_calculator import PositionCalculator -from ..core.relationship_types import InternalRelationshipType - -logger = logging.getLogger(__name__) - - -class ObjectiveCStrategy(SCIPIndexerStrategy): - """SCIP indexing strategy for Objective-C using libclang.""" - - SUPPORTED_EXTENSIONS = {'.m', '.mm', '.h'} - - def __init__(self, priority: int = 95): - """Initialize the Objective-C strategy.""" - super().__init__(priority) - self._processed_symbols: Set[str] = set() - self._symbol_counter = 0 - self.project_path: Optional[str] = None - - def can_handle(self, extension: str, file_path: str) -> bool: - """Check if this strategy can handle the file type.""" - if not LIBCLANG_AVAILABLE: - logger.warning("libclang not available for Objective-C processing") - return False - return extension.lower() in self.SUPPORTED_EXTENSIONS - - def get_language_name(self) -> str: - """Get the language name for SCIP symbol generation.""" - return "objc" - - def is_available(self) -> bool: - """Check if this strategy is available.""" - return LIBCLANG_AVAILABLE - - def _collect_symbol_definitions(self, files: List[str], project_path: str) -> None: - """Phase 1: Collect all symbol definitions from Objective-C files.""" - logger.debug(f"ObjectiveCStrategy Phase 1: Processing {len(files)} files for symbol collection") - - # Store project path for use in 
import classification - self.project_path = project_path - - processed_count = 0 - error_count = 0 - - for i, file_path in enumerate(files, 1): - relative_path = os.path.relpath(file_path, project_path) - - try: - self._collect_symbols_from_file(file_path, project_path) - processed_count += 1 - - if i % 10 == 0 or i == len(files): - logger.debug(f"Phase 1 progress: {i}/{len(files)} files, last file: {relative_path}") - - except Exception as e: - error_count += 1 - logger.warning(f"Phase 1 failed for {relative_path}: {e}") - continue - - logger.info(f"Phase 1 summary: {processed_count} files processed, {error_count} errors") - - def _generate_documents_with_references(self, files: List[str], project_path: str, relationships: Optional[Dict[str, List[tuple]]] = None) -> List[scip_pb2.Document]: - """Phase 3: Generate complete SCIP documents with resolved references.""" - documents = [] - logger.debug(f"ObjectiveCStrategy Phase 3: Generating documents for {len(files)} files") - processed_count = 0 - error_count = 0 - total_occurrences = 0 - total_symbols = 0 - - for i, file_path in enumerate(files, 1): - relative_path = os.path.relpath(file_path, project_path) - - try: - document = self._analyze_objc_file(file_path, project_path, relationships) - if document: - documents.append(document) - total_occurrences += len(document.occurrences) - total_symbols += len(document.symbols) - processed_count += 1 - - if i % 10 == 0 or i == len(files): - logger.debug(f"Phase 3 progress: {i}/{len(files)} files, " - f"last file: {relative_path}, " - f"{len(document.occurrences) if document else 0} occurrences") - - except Exception as e: - error_count += 1 - logger.error(f"Phase 3 failed for {relative_path}: {e}") - continue - - logger.info(f"Phase 3 summary: {processed_count} documents generated, {error_count} errors, " - f"{total_occurrences} total occurrences, {total_symbols} total symbols") - - return documents - - def _build_symbol_relationships(self, files: List[str], 
project_path: str) -> Dict[str, List[tuple]]: - """Phase 2: Build relationships between Objective-C symbols.""" - logger.debug(f"ObjectiveCStrategy: Building symbol relationships for {len(files)} files") - all_relationships = {} - - for file_path in files: - try: - file_relationships = self._extract_relationships_from_file(file_path, project_path) - all_relationships.update(file_relationships) - except Exception as e: - logger.warning(f"Failed to extract relationships from {file_path}: {e}") - - total_symbols_with_relationships = len(all_relationships) - total_relationships = sum(len(rels) for rels in all_relationships.values()) - - logger.debug(f"ObjectiveCStrategy: Built {total_relationships} relationships for {total_symbols_with_relationships} symbols") - return all_relationships - - def _collect_symbols_from_file(self, file_path: str, project_path: str) -> None: - """Collect symbol definitions from a single Objective-C file using libclang.""" - content = self._read_file_content(file_path) - if not content: - logger.debug(f"Empty file skipped: {os.path.relpath(file_path, project_path)}") - return - - try: - # Parse with libclang - index = clang.Index.create() - translation_unit = index.parse( - file_path, - args=['-ObjC', '-x', 'objective-c'], - options=clang.TranslationUnit.PARSE_DETAILED_PROCESSING_RECORD - ) - - if not translation_unit: - logger.debug(f"Parse failed: {os.path.relpath(file_path, project_path)}") - return - - # Reset processed symbols for each file - self._processed_symbols.clear() - self._symbol_counter = 0 - - # Traverse AST to collect symbols - relative_path = self._get_relative_path(file_path, project_path) - self._traverse_clang_ast_for_symbols(translation_unit.cursor, relative_path, content, file_path) - - # Extract imports/dependencies and register with symbol manager - self._extract_and_register_imports(translation_unit.cursor, file_path, project_path) - - logger.debug(f"Symbol collection completed - {relative_path}") - - except 
Exception as e: - logger.error(f"Error processing {file_path} with libclang: {e}") - - def _extract_and_register_imports(self, cursor: 'clang.Cursor', file_path: str, project_path: str) -> None: - """Extract imports from AST and register them with the symbol manager.""" - try: - # Traverse AST to find all import statements - self._traverse_ast_for_import_registration(cursor, file_path, project_path) - - except Exception as e: - logger.error(f"Error extracting imports from {file_path}: {e}") - - def _traverse_ast_for_import_registration(self, cursor: 'clang.Cursor', file_path: str, project_path: str) -> None: - """Traverse AST specifically to register imports with the symbol manager.""" - try: - # Process current cursor for import registration - if cursor.kind == CursorKind.INCLUSION_DIRECTIVE: - self._register_import_with_symbol_manager(cursor, file_path, project_path) - - # Recursively process children - for child in cursor.get_children(): - self._traverse_ast_for_import_registration(child, file_path, project_path) - - except Exception as e: - logger.error(f"Error traversing AST for import registration: {e}") - - def _register_import_with_symbol_manager(self, cursor: 'clang.Cursor', file_path: str, project_path: str) -> None: - """Register a single import with the symbol manager.""" - try: - # Try to get the included file path - include_path = None - framework_name = None - - # Method 1: Try to get the included file (may fail for system headers) - try: - included_file = cursor.get_included_file() - if included_file: - include_path = str(included_file) - logger.debug(f"Got include path from file: {include_path}") - except Exception as e: - logger.debug(f"Failed to get included file: {e}") - - # Method 2: Try to get from cursor spelling (the actual #import statement) - spelling = cursor.spelling - if spelling: - logger.debug(f"Got cursor spelling: {spelling}") - # Extract framework name from spelling like "Foundation/Foundation.h" or "Person.h" - framework_name = 
self._extract_framework_name_from_spelling(spelling) - if framework_name: - logger.debug(f"Extracted framework name from spelling: {framework_name}") - - # Classify based on spelling pattern - import_type = self._classify_import_from_spelling(spelling) - logger.debug(f"Classified import as: {import_type}") - - # Only register external dependencies (not local files) - if import_type in ['standard_library', 'third_party']: - if not self.symbol_manager: - logger.error("Symbol manager is None!") - return - - # Determine version if possible (for now, leave empty) - version = "" - - logger.debug(f"Registering external symbol: {framework_name}") - - # Register the import with the moniker manager - symbol_id = self.symbol_manager.create_external_symbol( - language="objc", - package_name=framework_name, - module_path=framework_name, - symbol_name="*", # Framework-level import - version=version, - alias=None - ) - - logger.debug(f"Registered external dependency: {framework_name} ({import_type}) -> {symbol_id}") - return - else: - logger.debug(f"Skipping local import: {framework_name} ({import_type})") - return - - # Method 3: Fallback to include_path if we have it - if include_path: - logger.debug(f"Processing include path: {include_path}") - - # Extract framework/module name - framework_name = self._extract_framework_name(include_path, cursor) - if not framework_name: - logger.debug(f"No framework name extracted from {include_path}") - return - - logger.debug(f"Extracted framework name: {framework_name}") - - # Classify the import type - import_type = self._classify_objc_import(include_path) - logger.debug(f"Classified import as: {import_type}") - - # Only register external dependencies (not local files) - if import_type in ['standard_library', 'third_party']: - if not self.symbol_manager: - logger.error("Symbol manager is None!") - return - - # Determine version if possible (for now, leave empty) - version = self._extract_framework_version(include_path) - - 
logger.debug(f"Registering external symbol: {framework_name}") - - # Register the import with the moniker manager - symbol_id = self.symbol_manager.create_external_symbol( - language="objc", - package_name=framework_name, - module_path=framework_name, - symbol_name="*", # Framework-level import - version=version, - alias=None - ) - - logger.debug(f"Registered external dependency: {framework_name} ({import_type}) -> {symbol_id}") - else: - logger.debug(f"Skipping local import: {framework_name} ({import_type})") - else: - logger.debug("No include path or spelling found for cursor") - - except Exception as e: - logger.error(f"Error registering import with symbol manager: {e}") - import traceback - logger.error(f"Traceback: {traceback.format_exc()}") - - def _extract_framework_name_from_spelling(self, spelling: str) -> Optional[str]: - """Extract framework name from cursor spelling.""" - try: - # Remove quotes and angle brackets - clean_spelling = spelling.strip('"<>') - - # For framework imports like "Foundation/Foundation.h" - if '/' in clean_spelling: - parts = clean_spelling.split('/') - if len(parts) >= 2: - framework_name = parts[0] - return framework_name - - # For simple includes like "MyHeader.h" - header_name = clean_spelling.replace('.h', '').replace('.m', '').replace('.mm', '') - return header_name - - except Exception as e: - logger.debug(f"Error extracting framework name from spelling {spelling}: {e}") - return None - - def _classify_import_from_spelling(self, spelling: str) -> str: - """Classify import based on spelling pattern.""" - try: - # Remove quotes and angle brackets - clean_spelling = spelling.strip('"<>') - - # Check if it's a known system framework by name (since cursor.spelling doesn't include brackets) - if '/' in clean_spelling: - framework_name = clean_spelling.split('/')[0] - system_frameworks = { - 'Foundation', 'UIKit', 'CoreData', 'CoreGraphics', 'QuartzCore', - 'AVFoundation', 'CoreLocation', 'MapKit', 'CoreAnimation', - 'Security', 
'SystemConfiguration', 'CFNetwork', 'CoreFoundation', - 'AppKit', 'Cocoa', 'WebKit', 'JavaScriptCore', 'Metal', 'MetalKit', - 'GameplayKit', 'SpriteKit', 'SceneKit', 'ARKit', 'Vision', 'CoreML' - } - if framework_name in system_frameworks: - return 'standard_library' - - # Check for single framework names (like just "Foundation.h") - framework_name_only = clean_spelling.replace('.h', '').replace('.framework', '') - system_frameworks = { - 'Foundation', 'UIKit', 'CoreData', 'CoreGraphics', 'QuartzCore', - 'AVFoundation', 'CoreLocation', 'MapKit', 'CoreAnimation', - 'Security', 'SystemConfiguration', 'CFNetwork', 'CoreFoundation', - 'AppKit', 'Cocoa', 'WebKit', 'JavaScriptCore', 'Metal', 'MetalKit', - 'GameplayKit', 'SpriteKit', 'SceneKit', 'ARKit', 'Vision', 'CoreML' - } - if framework_name_only in system_frameworks: - return 'standard_library' - - # Angle brackets indicate system headers (if we had them) - if spelling.startswith('<') and spelling.endswith('>'): - return 'standard_library' - - # Quotes indicate local or third-party headers - elif spelling.startswith('"') and spelling.endswith('"'): - # Check for common third-party patterns - if any(pattern in clean_spelling.lower() for pattern in ['pods/', 'carthage/', 'node_modules/']): - return 'third_party' - - # Default for quoted imports - return 'local' - - # Check for common third-party patterns in the path - if any(pattern in clean_spelling.lower() for pattern in ['pods/', 'carthage/', 'node_modules/']): - return 'third_party' - - # Check if it looks like a local header (simple filename) - if '/' not in clean_spelling and clean_spelling.endswith('.h'): - return 'local' - - # Fallback: if it contains system-like paths, classify as standard_library - if any(pattern in clean_spelling.lower() for pattern in ['/system/', '/usr/', '/applications/xcode']): - return 'standard_library' - - # Default fallback - return 'local' - - except Exception as e: - logger.debug(f"Error classifying import from spelling 
{spelling}: {e}") - return 'local' - - def _extract_framework_version(self, include_path: str) -> str: - """Extract framework version from include path if available.""" - # For now, return empty string. Could be enhanced to detect versions - # from CocoaPods Podfile.lock, Carthage, or other dependency managers - return "" - - def _analyze_objc_file(self, file_path: str, project_path: str, relationships: Optional[Dict[str, List[tuple]]] = None) -> Optional[scip_pb2.Document]: - """Analyze a single Objective-C file and generate complete SCIP document.""" - content = self._read_file_content(file_path) - if not content: - return None - - try: - # Parse with libclang - index = clang.Index.create() - translation_unit = index.parse( - file_path, - args=['-ObjC', '-x', 'objective-c'], - options=clang.TranslationUnit.PARSE_DETAILED_PROCESSING_RECORD - ) - - if not translation_unit: - return None - - # Create SCIP document - document = scip_pb2.Document() - document.relative_path = self._get_relative_path(file_path, project_path) - document.language = self._get_document_language(file_path) - - # Initialize position calculator - self.position_calculator = PositionCalculator(content) - - # Reset processed symbols for each file - self._processed_symbols.clear() - self._symbol_counter = 0 - - # Generate occurrences and symbols - occurrences = [] - symbols = [] - - # Traverse AST for document generation - self._traverse_clang_ast_for_document(translation_unit.cursor, content, occurrences, symbols, relationships) - - # Add results to document - document.occurrences.extend(occurrences) - document.symbols.extend(symbols) - - logger.debug(f"Analyzed Objective-C file {document.relative_path}: " - f"{len(document.occurrences)} occurrences, {len(document.symbols)} symbols") - - return document - - except Exception as e: - logger.error(f"Error analyzing {file_path} with libclang: {e}") - return None - - def _traverse_clang_ast_for_symbols(self, cursor: 'clang.Cursor', file_path: str, 
content: str, full_file_path: str) -> None: - """Traverse libclang AST for symbol definitions (Phase 1).""" - try: - # Process current cursor - self._process_cursor_for_symbols(cursor, file_path, content, full_file_path) - - # Recursively process children - for child in cursor.get_children(): - self._traverse_clang_ast_for_symbols(child, file_path, content, full_file_path) - - except Exception as e: - logger.error(f"Error traversing AST for symbols: {e}") - - def _traverse_clang_ast_for_imports(self, cursor: 'clang.Cursor', file_path: str, imports: 'ImportGroup') -> None: - """Traverse libclang AST specifically for import/include statements.""" - try: - # Process current cursor for imports - self._process_cursor_for_imports(cursor, file_path, imports) - - # Recursively process children - for child in cursor.get_children(): - self._traverse_clang_ast_for_imports(child, file_path, imports) - - except Exception as e: - logger.error(f"Error traversing AST for imports: {e}") - - def _traverse_clang_ast_for_document(self, cursor: 'clang.Cursor', content: str, occurrences: List, symbols: List, relationships: Optional[Dict[str, List[tuple]]] = None) -> None: - """Traverse libclang AST for document generation (Phase 3).""" - try: - # Process current cursor - self._process_cursor_for_document(cursor, content, occurrences, symbols, relationships) - - # Recursively process children - for child in cursor.get_children(): - self._traverse_clang_ast_for_document(child, content, occurrences, symbols, relationships) - - except Exception as e: - logger.error(f"Error traversing AST for document: {e}") - - def _process_cursor_for_symbols(self, cursor: 'clang.Cursor', file_path: str, content: str, full_file_path: str) -> None: - """Process a cursor for symbol registration (Phase 1).""" - try: - # Skip invalid cursors or those outside our file - if not cursor.location.file or cursor.spelling == "": - return - - # Check if cursor is in the file we're processing - cursor_file = 
str(cursor.location.file) - if not cursor_file.endswith(os.path.basename(full_file_path)): - return - - cursor_kind = cursor.kind - symbol_name = cursor.spelling - - # Map libclang cursor kinds to SCIP symbols - symbol_info = self._map_cursor_to_symbol(cursor, symbol_name) - if not symbol_info: - return - - symbol_id, symbol_kind, symbol_roles = symbol_info - - # Avoid duplicates - duplicate_key = f"{symbol_id}:{cursor.location.line}:{cursor.location.column}" - if duplicate_key in self._processed_symbols: - return - self._processed_symbols.add(duplicate_key) - - # Calculate position - location = cursor.location - if location.line is not None and location.column is not None: - # libclang uses 1-based indexing, convert to 0-based - line = location.line - 1 - column = location.column - 1 - - # Calculate end position (approximate) - end_line = line - end_column = column + len(symbol_name) - - # Register symbol with reference resolver - if self.position_calculator: - range_obj = self.position_calculator.line_col_to_range(line, column, end_line, end_column) - else: - # Create a simple range object if position_calculator is not available - from ..proto.scip_pb2 import Range - range_obj = Range() - range_obj.start.extend([line, column]) - range_obj.end.extend([end_line, end_column]) - self.reference_resolver.register_symbol_definition( - symbol_id=symbol_id, - file_path=file_path, - definition_range=range_obj, - symbol_kind=symbol_kind, - display_name=symbol_name, - documentation=[f"Objective-C {cursor_kind.name}"] - ) - - logger.debug(f"Registered Objective-C symbol: {symbol_name} ({cursor_kind.name}) at {line}:{column}") - - except Exception as e: - logger.error(f"Error processing cursor for symbols {cursor.spelling}: {e}") - - def _process_cursor_for_document(self, cursor: 'clang.Cursor', content: str, occurrences: List, symbols: List, relationships: Optional[Dict[str, List[tuple]]] = None) -> None: - """Process a cursor for document generation (Phase 3).""" - try: - # 
Skip invalid cursors or those outside our file - if not cursor.location.file or cursor.spelling == "": - return - - cursor_kind = cursor.kind - symbol_name = cursor.spelling - - # Map libclang cursor kinds to SCIP symbols - symbol_info = self._map_cursor_to_symbol(cursor, symbol_name) - if not symbol_info: - return - - symbol_id, symbol_kind, symbol_roles = symbol_info - - # Avoid duplicates - duplicate_key = f"{symbol_id}:{cursor.location.line}:{cursor.location.column}" - if duplicate_key in self._processed_symbols: - return - self._processed_symbols.add(duplicate_key) - - # Calculate position - location = cursor.location - if location.line is not None and location.column is not None: - # libclang uses 1-based indexing, convert to 0-based - line = location.line - 1 - column = location.column - 1 - - # Calculate end position (approximate) - end_line = line - end_column = column + len(symbol_name) - - # Create SCIP occurrence - occurrence = self._create_occurrence(symbol_id, line, column, end_line, end_column, symbol_roles) - if occurrence: - occurrences.append(occurrence) - - # Get relationships for this symbol - symbol_relationships = relationships.get(symbol_id, []) if relationships else [] - scip_relationships = self._create_scip_relationships(symbol_relationships) if symbol_relationships else [] - - # Create SCIP symbol information with relationships - symbol_info_obj = self._create_symbol_information_with_relationships(symbol_id, symbol_name, symbol_kind, scip_relationships) - if symbol_info_obj: - symbols.append(symbol_info_obj) - - logger.debug(f"Added Objective-C symbol: {symbol_name} ({cursor_kind.name}) at {line}:{column} with {len(scip_relationships)} relationships") - - except Exception as e: - logger.error(f"Error processing cursor for document {cursor.spelling}: {e}") - - def _process_cursor_for_imports(self, cursor: 'clang.Cursor', file_path: str, imports: 'ImportGroup') -> None: - """Process a cursor for import/include statements.""" - try: - # Skip 
invalid cursors or those outside our file - if not cursor.location.file: - return - - cursor_kind = cursor.kind - - # Process inclusion directives (#import, #include, @import) - if cursor_kind == CursorKind.INCLUSION_DIRECTIVE: - self._process_inclusion_directive(cursor, file_path, imports) - - except Exception as e: - logger.error(f"Error processing cursor for imports: {e}") - - def _process_inclusion_directive(self, cursor: 'clang.Cursor', file_path: str, imports: 'ImportGroup') -> None: - """Process a single #import/#include/@import directive.""" - try: - # Get the included file - included_file = cursor.get_included_file() - if not included_file: - return - - include_path = str(included_file) - - # Extract framework/module name - framework_name = self._extract_framework_name(include_path, cursor) - if not framework_name: - return - - # Classify the import type - import_type = self._classify_objc_import(include_path) - - # Add to imports - imports.add_import(framework_name, import_type) - - # Register with moniker manager for external dependencies - if import_type in ['standard_library', 'third_party'] and self.symbol_manager: - self._register_framework_dependency(framework_name, import_type, include_path) - - logger.debug(f"Processed import: {framework_name} ({import_type}) from {include_path}") - - except Exception as e: - logger.error(f"Error processing inclusion directive: {e}") - - def _extract_framework_name(self, include_path: str, cursor: 'clang.Cursor') -> Optional[str]: - """Extract framework/module name from include path.""" - try: - # Get the original spelling from the cursor (what was actually written) - spelling = cursor.spelling - if spelling: - # Remove quotes and angle brackets - clean_spelling = spelling.strip('"<>') - - # For framework imports like - if '/' in clean_spelling: - parts = clean_spelling.split('/') - if len(parts) >= 2: - framework_name = parts[0] - # Common iOS/macOS frameworks - if framework_name in ['Foundation', 'UIKit', 
'CoreData', 'CoreGraphics', - 'QuartzCore', 'AVFoundation', 'CoreLocation', 'MapKit']: - return framework_name - # For other frameworks, use the framework name - return framework_name - - # For simple includes like "MyHeader.h" - header_name = clean_spelling.replace('.h', '').replace('.m', '').replace('.mm', '') - return header_name - - # Fallback: extract from full path - if '/' in include_path: - path_parts = include_path.split('/') - - # Look for .framework in path - for i, part in enumerate(path_parts): - if part.endswith('.framework') and i + 1 < len(path_parts): - return part.replace('.framework', '') - - # Look for Headers directory (common in frameworks) - if 'Headers' in path_parts: - headers_idx = path_parts.index('Headers') - if headers_idx > 0: - framework_part = path_parts[headers_idx - 1] - if framework_part.endswith('.framework'): - return framework_part.replace('.framework', '') - - # Use the filename without extension - filename = path_parts[-1] - return filename.replace('.h', '').replace('.m', '').replace('.mm', '') - - return None - - except Exception as e: - logger.debug(f"Error extracting framework name from {include_path}: {e}") - return None - - def _classify_objc_import(self, include_path: str) -> str: - """Classify Objective-C import as system, third-party, or local.""" - try: - # System frameworks (typical macOS/iOS system paths) - system_indicators = [ - '/Applications/Xcode.app/', - '/System/Library/', - '/usr/include/', - 'Platforms/iPhoneOS.platform/', - 'Platforms/iPhoneSimulator.platform/', - 'Platforms/MacOSX.platform/' - ] - - for indicator in system_indicators: - if indicator in include_path: - return 'standard_library' - - # Common system frameworks by name - system_frameworks = { - 'Foundation', 'UIKit', 'CoreData', 'CoreGraphics', 'QuartzCore', - 'AVFoundation', 'CoreLocation', 'MapKit', 'CoreAnimation', - 'Security', 'SystemConfiguration', 'CFNetwork', 'CoreFoundation', - 'AppKit', 'Cocoa', 'WebKit', 'JavaScriptCore' - } - - 
for framework in system_frameworks: - if f'/{framework}.framework/' in include_path or f'{framework}/' in include_path: - return 'standard_library' - - # Third-party dependency managers - third_party_indicators = [ - '/Pods/', # CocoaPods - '/Carthage/', # Carthage - '/node_modules/', # React Native - '/DerivedData/', # Sometimes used for third-party - ] - - for indicator in third_party_indicators: - if indicator in include_path: - return 'third_party' - - # Check if it's within the project directory - if hasattr(self, 'project_path') and self.project_path: - if include_path.startswith(str(self.project_path)): - return 'local' - - # Check for relative paths (usually local) - if include_path.startswith('./') or include_path.startswith('../'): - return 'local' - - # If path contains common local indicators - if any(indicator in include_path.lower() for indicator in ['src/', 'source/', 'include/', 'headers/']): - return 'local' - - # Default to third-party for unknown external dependencies - return 'third_party' - - except Exception as e: - logger.debug(f"Error classifying import {include_path}: {e}") - return 'third_party' - - def _register_framework_dependency(self, framework_name: str, import_type: str, include_path: str) -> None: - """Register framework dependency with moniker manager.""" - try: - if not self.symbol_manager: - return - - # Determine package manager based on import type and path - if import_type == 'standard_library': - manager = 'system' - elif '/Pods/' in include_path: - manager = 'cocoapods' - elif '/Carthage/' in include_path: - manager = 'carthage' - else: - manager = 'unknown' - - # Register the external symbol for the framework - self.symbol_manager.create_external_symbol( - language="objc", - package_name=framework_name, - module_path=framework_name, - symbol_name="*", # Framework-level import - version="", # Version detection could be added later - alias=None - ) - - logger.debug(f"Registered framework dependency: {framework_name} via 
{manager}") - - except Exception as e: - logger.error(f"Error registering framework dependency {framework_name}: {e}") - - def _map_cursor_to_symbol(self, cursor: 'clang.Cursor', symbol_name: str) -> Optional[Tuple[str, int, int]]: - """Map libclang cursor to SCIP symbol information.""" - try: - cursor_kind = cursor.kind - - # Map Objective-C specific cursors - if cursor_kind == CursorKind.OBJC_INTERFACE_DECL: - # @interface ClassName - symbol_id = f"local {self._get_local_id_for_cursor(cursor)}" - return (symbol_id, scip_pb2.SymbolKind.Class, scip_pb2.SymbolRole.Definition) - - elif cursor_kind == CursorKind.OBJC_PROTOCOL_DECL: - # @protocol ProtocolName - symbol_id = f"local {self._get_local_id_for_cursor(cursor)}" - return (symbol_id, scip_pb2.SymbolKind.Interface, scip_pb2.SymbolRole.Definition) - - elif cursor_kind == CursorKind.OBJC_CATEGORY_DECL: - # @interface ClassName (CategoryName) - symbol_id = f"local {self._get_local_id_for_cursor(cursor)}" - return (symbol_id, scip_pb2.SymbolKind.Class, scip_pb2.SymbolRole.Definition) - - elif cursor_kind == CursorKind.OBJC_INSTANCE_METHOD_DECL: - # Instance method: - (void)methodName - symbol_id = f"local {self._get_local_id_for_cursor(cursor)}" - return (symbol_id, scip_pb2.SymbolKind.Method, scip_pb2.SymbolRole.Definition) - - elif cursor_kind == CursorKind.OBJC_CLASS_METHOD_DECL: - # Class method: + (void)methodName - symbol_id = f"local {self._get_local_id_for_cursor(cursor)}" - return (symbol_id, scip_pb2.SymbolKind.Method, scip_pb2.SymbolRole.Definition) - - elif cursor_kind == CursorKind.OBJC_PROPERTY_DECL: - # @property declaration - symbol_id = f"local {self._get_local_id_for_cursor(cursor)}" - return (symbol_id, scip_pb2.SymbolKind.Property, scip_pb2.SymbolRole.Definition) - - elif cursor_kind == CursorKind.OBJC_IVAR_DECL: - # Instance variable - symbol_id = f"local {self._get_local_id_for_cursor(cursor)}" - return (symbol_id, scip_pb2.SymbolKind.Field, scip_pb2.SymbolRole.Definition) - - elif cursor_kind 
== CursorKind.OBJC_IMPLEMENTATION_DECL: - # @implementation ClassName - symbol_id = f"local {self._get_local_id_for_cursor(cursor)}" - return (symbol_id, scip_pb2.SymbolKind.Class, scip_pb2.SymbolRole.Definition) - - elif cursor_kind == CursorKind.OBJC_CATEGORY_IMPL_DECL: - # @implementation ClassName (CategoryName) - symbol_id = f"local {self._get_local_id_for_cursor(cursor)}" - return (symbol_id, scip_pb2.SymbolKind.Class, scip_pb2.SymbolRole.Definition) - - elif cursor_kind == CursorKind.FUNCTION_DECL: - # Regular C function - symbol_id = f"local {self._get_local_id_for_cursor(cursor)}" - return (symbol_id, scip_pb2.SymbolKind.Function, scip_pb2.SymbolRole.Definition) - - elif cursor_kind == CursorKind.VAR_DECL: - # Variable declaration - symbol_id = f"local {self._get_local_id_for_cursor(cursor)}" - return (symbol_id, scip_pb2.SymbolKind.Variable, scip_pb2.SymbolRole.Definition) - - elif cursor_kind == CursorKind.TYPEDEF_DECL: - # Type definition - symbol_id = f"local {self._get_local_id_for_cursor(cursor)}" - return (symbol_id, scip_pb2.SymbolKind.TypeParameter, scip_pb2.SymbolRole.Definition) - - # Add more cursor mappings as needed - return None - - except Exception as e: - logger.error(f"Error mapping cursor {symbol_name}: {e}") - return None - - def _get_local_id(self) -> str: - """Generate unique local symbol ID.""" - self._symbol_counter += 1 - return f"objc_{self._symbol_counter}" - - def _get_local_id_for_cursor(self, cursor: 'clang.Cursor') -> str: - """Generate consistent local symbol ID based on cursor properties.""" - # Create deterministic ID based on cursor type, name, and location - cursor_type = cursor.kind.name.lower() - symbol_name = cursor.spelling or "unnamed" - line = cursor.location.line - - return f"{cursor_type}_{symbol_name}_{line}" - - def _create_occurrence(self, symbol_id: str, start_line: int, start_col: int, - end_line: int, end_col: int, symbol_roles: int) -> Optional[scip_pb2.Occurrence]: - """Create SCIP occurrence.""" - try: - 
occurrence = scip_pb2.Occurrence() - occurrence.symbol = symbol_id - occurrence.symbol_roles = symbol_roles - occurrence.range.start.extend([start_line, start_col]) - occurrence.range.end.extend([end_line, end_col]) - - return occurrence - - except Exception as e: - logger.error(f"Error creating occurrence: {e}") - return None - - def _create_symbol_information(self, symbol_id: str, display_name: str, symbol_kind: int) -> Optional[scip_pb2.SymbolInformation]: - """Create SCIP symbol information.""" - try: - symbol_info = scip_pb2.SymbolInformation() - symbol_info.symbol = symbol_id - symbol_info.kind = symbol_kind - symbol_info.display_name = display_name - - return symbol_info - - except Exception as e: - logger.error(f"Error creating symbol information: {e}") - return None - - def _create_symbol_information_with_relationships(self, symbol_id: str, display_name: str, symbol_kind: int, relationships: List['scip_pb2.Relationship']) -> Optional[scip_pb2.SymbolInformation]: - """Create SCIP symbol information with relationships.""" - try: - symbol_info = scip_pb2.SymbolInformation() - symbol_info.symbol = symbol_id - symbol_info.kind = symbol_kind - symbol_info.display_name = display_name - - # Add relationships if provided - if relationships: - symbol_info.relationships.extend(relationships) - - return symbol_info - - except Exception as e: - logger.error(f"Error creating symbol information with relationships: {e}") - return None - - def _extract_relationships_from_file(self, file_path: str, project_path: str) -> Dict[str, List[tuple]]: - """Extract relationships from a single Objective-C file using libclang.""" - content = self._read_file_content(file_path) - if not content: - return {} - - try: - # Parse with libclang - index = clang.Index.create() - translation_unit = index.parse( - file_path, - args=['-ObjC', '-x', 'objective-c'], - options=clang.TranslationUnit.PARSE_DETAILED_PROCESSING_RECORD - ) - - if not translation_unit: - return {} - - return 
self._extract_relationships_from_ast(translation_unit.cursor, file_path, project_path)
-
-        except Exception as e:
-            logger.error(f"Error extracting relationships from {file_path}: {e}")
-            return {}
-
-    def _extract_relationships_from_ast(self, cursor: 'clang.Cursor', file_path: str, project_path: str) -> Dict[str, List[tuple]]:
-        """Extract relationships from libclang AST."""
-        relationships = {}
-        relative_path = self._get_relative_path(file_path, project_path)
-
-        # Track current method context for method calls
-        current_method_symbol = None
-
-        def traverse_for_relationships(cursor_node, parent_method=None):
-            """Recursively traverse AST to find relationships."""
-            nonlocal current_method_symbol
-
-            try:
-                # Skip if cursor is not in our file
-                if not cursor_node.location.file or cursor_node.spelling == "":
-                    pass
-                else:
-                    cursor_file = str(cursor_node.location.file)
-                    if cursor_file.endswith(os.path.basename(file_path)):
-                        cursor_kind = cursor_node.kind
-
-                        # Track method context
-                        if cursor_kind in (CursorKind.OBJC_INSTANCE_METHOD_DECL, CursorKind.OBJC_CLASS_METHOD_DECL):
-                            method_symbol_id = f"local {self._get_local_id_for_cursor(cursor_node)}"
-                            current_method_symbol = method_symbol_id
-                            parent_method = method_symbol_id
-
-                        # Detect Objective-C method calls
-                        elif cursor_kind == CursorKind.OBJC_MESSAGE_EXPR:
-                            if parent_method:
-                                # Get the method being called
-                                called_method = self._extract_method_from_message_expr(cursor_node)
-                                if called_method:
-                                    target_symbol_id = f"local objc_call_{called_method}_{cursor_node.location.line}"
-
-                                    if parent_method not in relationships:
-                                        relationships[parent_method] = []
-                                    relationships[parent_method].append((target_symbol_id, InternalRelationshipType.CALLS))
-
-                                    logger.debug(f"Found method call: {parent_method} -> {target_symbol_id}")
-
-                        # Detect C function calls
-                        elif cursor_kind == CursorKind.CALL_EXPR:
-                            if parent_method:
-                                function_name = cursor_node.spelling
-                                if function_name:
-                                    target_symbol_id = f"local c_func_{function_name}_{cursor_node.location.line}"
-
-                                    if parent_method not in relationships:
-                                        relationships[parent_method] = []
-                                    relationships[parent_method].append((target_symbol_id, InternalRelationshipType.CALLS))
-
-                                    logger.debug(f"Found function call: {parent_method} -> {target_symbol_id}")
-
-                # Recursively process children
-                for child in cursor_node.get_children():
-                    traverse_for_relationships(child, parent_method)
-
-            except Exception as e:
-                logger.error(f"Error processing cursor for relationships: {e}")
-
-        # Start traversal
-        traverse_for_relationships(cursor)
-
-        return relationships
-
-    def _extract_method_from_message_expr(self, cursor: 'clang.Cursor') -> Optional[str]:
-        """Extract method name from Objective-C message expression."""
-        try:
-            # Get the selector/method name from the message expression
-            # This is a simplified extraction - could be enhanced
-            for child in cursor.get_children():
-                if child.kind == CursorKind.OBJC_MESSAGE_EXPR:
-                    return child.spelling
-                elif child.spelling and len(child.spelling) > 0:
-                    # Try to get method name from any meaningful child
-                    return child.spelling
-
-            # Fallback: use the cursor's own spelling if available
-            return cursor.spelling if cursor.spelling else None
-
-        except Exception as e:
-            logger.error(f"Error extracting method from message expression: {e}")
-            return None
-
-    def _create_scip_relationships(self, relationships: List[tuple]) -> List['scip_pb2.Relationship']:
-        """Convert internal relationships to SCIP relationships."""
-        scip_relationships = []
-
-        for target_symbol, relationship_type in relationships:
-            try:
-                relationship = scip_pb2.Relationship()
-                relationship.symbol = target_symbol
-
-                # Map relationship type to SCIP flags
-                if relationship_type == InternalRelationshipType.CALLS:
-                    relationship.is_reference = True
-                elif relationship_type == InternalRelationshipType.INHERITS:
-                    relationship.is_reference = True
-                elif relationship_type == InternalRelationshipType.IMPLEMENTS:
-                    relationship.is_implementation = True
-                else:
-                    relationship.is_reference = True  # Default fallback
-
-                scip_relationships.append(relationship)
-
-            except Exception as e:
-                logger.error(f"Error creating SCIP relationship: {e}")
-                continue
-
-        return scip_relationships
-
-    def _get_document_language(self, file_path: str) -> str:
-        """Get the document language identifier."""
-        if file_path.endswith('.mm'):
-            return 'objcpp'
-        return 'objc'
-
-    # Utility methods from base strategy
-    def _read_file_content(self, file_path: str) -> Optional[str]:
-        """Read file content safely."""
-        try:
-            with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
-                return f.read()
-        except Exception as e:
-            logger.warning(f"Failed to read file {file_path}: {e}")
-            return None
-
-    def _get_relative_path(self, file_path: str, project_path: str) -> str:
-        """Get relative path from project root."""
-        return os.path.relpath(file_path, project_path).replace(os.sep, '/')
-
-    def get_supported_languages(self) -> List[str]:
-        """Return list of supported language identifiers."""
-        return ["objective-c", "objc", "objective-c-header"]
-
-
-class StrategyError(Exception):
-    """Exception raised when a strategy cannot process files."""
-    pass
\ No newline at end of file
diff --git a/src/code_index_mcp/scip/strategies/python_strategy.py b/src/code_index_mcp/scip/strategies/python_strategy.py
deleted file mode 100644
index b14da42..0000000
--- a/src/code_index_mcp/scip/strategies/python_strategy.py
+++ /dev/null
@@ -1,413 +0,0 @@
-"""Python SCIP indexing strategy - SCIP standard compliant."""
-
-import ast
-import logging
-import os
-from typing import List, Optional, Dict, Any, Set
-from pathlib import Path
-
-from .base_strategy import SCIPIndexerStrategy, StrategyError
-from ..proto import scip_pb2
-from ..core.position_calculator import PositionCalculator
-from ..core.relationship_types import InternalRelationshipType
-
-
-logger = logging.getLogger(__name__)
-
-
-class PythonStrategy(SCIPIndexerStrategy):
-    """SCIP-compliant Python indexing strategy using AST analysis."""
-
-    SUPPORTED_EXTENSIONS = {'.py', '.pyw'}
-
-    def __init__(self, priority: int = 90):
-        """Initialize the Python strategy."""
-        super().__init__(priority)
-
-    def can_handle(self, extension: str, file_path: str) -> bool:
-        """Check if this strategy can handle the file type."""
-        return extension.lower() in self.SUPPORTED_EXTENSIONS
-
-    def get_language_name(self) -> str:
-        """Get the language name for SCIP symbol generation."""
-        return "python"
-
-    def _collect_symbol_definitions(self, files: List[str], project_path: str) -> None:
-        """Phase 1: Collect all symbol definitions from Python files."""
-        logger.debug(f"PythonStrategy Phase 1: Processing {len(files)} files for symbol collection")
-        processed_count = 0
-        error_count = 0
-
-        for i, file_path in enumerate(files, 1):
-            relative_path = os.path.relpath(file_path, project_path)
-
-            try:
-                self._collect_symbols_from_file(file_path, project_path)
-                processed_count += 1
-
-                if i % 10 == 0 or i == len(files):  # Progress every 10 files or at end
-                    logger.debug(f"Phase 1 progress: {i}/{len(files)} files, last file: {relative_path}")
-
-            except Exception as e:
-                error_count += 1
-                logger.warning(f"Phase 1 failed for {relative_path}: {e}")
-                continue
-
-        logger.info(f"Phase 1 summary: {processed_count} files processed, {error_count} errors")
-
-    def _generate_documents_with_references(self, files: List[str], project_path: str, relationships: Optional[Dict[str, List[tuple]]] = None) -> List[scip_pb2.Document]:
-        """Phase 2: Generate complete SCIP documents with resolved references."""
-        documents = []
-        logger.debug(f"PythonStrategy Phase 2: Generating documents for {len(files)} files")
-        processed_count = 0
-        error_count = 0
-        total_occurrences = 0
-        total_symbols = 0
-
-        for i, file_path in enumerate(files, 1):
-            relative_path = os.path.relpath(file_path, project_path)
-
-            try:
-                document = self._analyze_python_file(file_path, project_path, relationships)
-                if document:
-                    documents.append(document)
-                    total_occurrences += len(document.occurrences)
-                    total_symbols += len(document.symbols)
-                    processed_count += 1
-
-                if i % 10 == 0 or i == len(files):  # Progress every 10 files or at end
-                    logger.debug(f"Phase 2 progress: {i}/{len(files)} files, "
-                                f"last file: {relative_path}, "
-                                f"{len(document.occurrences) if document else 0} occurrences")
-
-            except Exception as e:
-                error_count += 1
-                logger.error(f"Phase 2 failed for {relative_path}: {e}")
-                continue
-
-        logger.info(f"Phase 2 summary: {processed_count} documents generated, {error_count} errors, "
-                   f"{total_occurrences} total occurrences, {total_symbols} total symbols")
-
-        return documents
-
-    def _build_symbol_relationships(self, files: List[str], project_path: str) -> Dict[str, List[tuple]]:
-        """
-        Build relationships between Python symbols.
-
-        Args:
-            files: List of file paths to process
-            project_path: Project root path
-
-        Returns:
-            Dictionary mapping symbol_id -> [(target_symbol_id, relationship_type), ...]
-        """
-        logger.debug(f"PythonStrategy: Building symbol relationships for {len(files)} files")
-
-        all_relationships = {}
-
-        for file_path in files:
-            try:
-                file_relationships = self._extract_relationships_from_file(file_path, project_path)
-                all_relationships.update(file_relationships)
-            except Exception as e:
-                logger.warning(f"Failed to extract relationships from {file_path}: {e}")
-
-        total_symbols_with_relationships = len(all_relationships)
-        total_relationships = sum(len(rels) for rels in all_relationships.values())
-
-        logger.debug(f"PythonStrategy: Built {total_relationships} relationships for {total_symbols_with_relationships} symbols")
-        return all_relationships
-
-    def _collect_symbols_from_file(self, file_path: str, project_path: str) -> None:
-        """Collect symbol definitions from a single Python file."""
-
-        # Read file content
-        content = self._read_file_content(file_path)
-        if not content:
-            logger.debug(f"Empty file skipped: {os.path.relpath(file_path, project_path)}")
-            return
-
-        # Parse AST
-        try:
-            tree = ast.parse(content, filename=file_path)
-        except SyntaxError as e:
-            logger.warning(f"Syntax error in {os.path.relpath(file_path, project_path)}: {e}")
-            return
-
-        # Collect symbols using integrated visitor
-        relative_path = self._get_relative_path(file_path, project_path)
-        self._collect_symbols_from_ast(tree, relative_path, content)
-        logger.debug(f"Symbol collection - {relative_path}")
-
-    def _analyze_python_file(self, file_path: str, project_path: str, relationships: Optional[Dict[str, List[tuple]]] = None) -> Optional[scip_pb2.Document]:
-        """Analyze a single Python file and generate complete SCIP document."""
-        relative_path = self._get_relative_path(file_path, project_path)
-
-        # Read file content
-        content = self._read_file_content(file_path)
-        if not content:
-            logger.debug(f"Empty file skipped: {relative_path}")
-            return None
-
-        # Parse AST
-        try:
-            tree = ast.parse(content, filename=file_path)
-        except SyntaxError as e:
-            logger.warning(f"Syntax error in {relative_path}: {e}")
-            return None
-
-        # Create SCIP document
-        document = scip_pb2.Document()
-        document.relative_path = relative_path
-        document.language = self.get_language_name()
-
-        # Analyze AST and generate occurrences
-        self.position_calculator = PositionCalculator(content)
-
-        occurrences, symbols = self._analyze_ast_for_document(tree, relative_path, content, relationships)
-
-        # Add results to document
-        document.occurrences.extend(occurrences)
-        document.symbols.extend(symbols)
-
-        logger.debug(f"Document analysis - {relative_path}: "
-                    f"-> {len(document.occurrences)} occurrences, {len(document.symbols)} symbols")
-
-        return document
-
-    def _extract_relationships_from_file(self, file_path: str, project_path: str) -> Dict[str, List[tuple]]:
-        """
-        Extract relationships from a single Python file.
-
-        Args:
-            file_path: File to analyze
-            project_path: Project root path
-
-        Returns:
-            Dictionary mapping symbol_id -> [(target_symbol_id, relationship_type), ...]
-        """
-        content = self._read_file_content(file_path)
-        if not content:
-            return {}
-
-        try:
-            tree = ast.parse(content)
-        except SyntaxError as e:
-            logger.warning(f"Syntax error in {file_path}: {e}")
-            return {}
-
-        return self._extract_relationships_from_ast(tree, file_path, project_path)
-
-    def _collect_symbols_from_ast(self, tree: ast.AST, file_path: str, content: str) -> None:
-        """Collect symbols from AST using integrated visitor."""
-        scope_stack = []
-
-        for node in ast.walk(tree):
-            if isinstance(node, ast.FunctionDef) or isinstance(node, ast.AsyncFunctionDef):
-                self._register_function_symbol(node, node.name, file_path, scope_stack)
-            elif isinstance(node, ast.ClassDef):
-                self._register_class_symbol(node, node.name, file_path, scope_stack)
-
-    def _analyze_ast_for_document(self, tree: ast.AST, file_path: str, content: str, relationships: Optional[Dict[str, List[tuple]]] = None) -> tuple[List[scip_pb2.Occurrence], List[scip_pb2.SymbolInformation]]:
-        """Analyze AST to generate occurrences and symbols for SCIP document."""
-        occurrences = []
-        symbols = []
-        scope_stack = []
-
-        # Simple implementation - can be enhanced later
-        for node in ast.walk(tree):
-            if isinstance(node, ast.FunctionDef) or isinstance(node, ast.AsyncFunctionDef):
-                symbol_id = self._create_function_symbol_id(node.name, file_path, scope_stack)
-                occurrence = self._create_function_occurrence(node, symbol_id)
-                # Get relationships for this symbol
-                symbol_relationships = relationships.get(symbol_id, []) if relationships else []
-                scip_relationships = self._create_scip_relationships(symbol_relationships) if symbol_relationships else []
-
-                symbol_info = self._create_function_symbol_info(node, symbol_id, scip_relationships)
-
-                if occurrence:
-                    occurrences.append(occurrence)
-                if symbol_info:
-                    symbols.append(symbol_info)
-
-            elif isinstance(node, ast.ClassDef):
-                symbol_id = self._create_class_symbol_id(node.name, file_path, scope_stack)
-                occurrence = self._create_class_occurrence(node, symbol_id)
-                # Get relationships for this symbol
-                symbol_relationships = relationships.get(symbol_id, []) if relationships else []
-                scip_relationships = self._create_scip_relationships(symbol_relationships) if symbol_relationships else []
-
-                symbol_info = self._create_class_symbol_info(node, symbol_id, scip_relationships)
-
-                if occurrence:
-                    occurrences.append(occurrence)
-                if symbol_info:
-                    symbols.append(symbol_info)
-
-        return occurrences, symbols
-
-    def _extract_relationships_from_ast(self, tree: ast.AST, file_path: str, project_path: str) -> Dict[str, List[tuple]]:
-        """Extract relationships from AST."""
-        relationships = {}
-        scope_stack = []
-
-        for node in ast.walk(tree):
-            if isinstance(node, ast.ClassDef):
-                # Extract inheritance relationships
-                relative_path = self._get_relative_path(file_path, project_path)
-                class_symbol_id = self._create_class_symbol_id(node.name, relative_path, scope_stack)
-
-                for base in node.bases:
-                    if isinstance(base, ast.Name):
-                        parent_symbol_id = self._create_class_symbol_id(base.id, relative_path, scope_stack)
-                        if class_symbol_id not in relationships:
-                            relationships[class_symbol_id] = []
-                        relationships[class_symbol_id].append((parent_symbol_id, InternalRelationshipType.INHERITS))
-
-            elif isinstance(node, ast.FunctionDef) or isinstance(node, ast.AsyncFunctionDef):
-                # Extract function call relationships
-                relative_path = self._get_relative_path(file_path, project_path)
-                function_symbol_id = self._create_function_symbol_id(node.name, relative_path, scope_stack)
-
-                for child in ast.walk(node):
-                    if isinstance(child, ast.Call):
-                        if isinstance(child.func, ast.Name):
-                            target_symbol_id = self._create_function_symbol_id(child.func.id, relative_path, scope_stack)
-                            if function_symbol_id not in relationships:
-                                relationships[function_symbol_id] = []
-                            relationships[function_symbol_id].append((target_symbol_id, InternalRelationshipType.CALLS))
-
-        return relationships
-
-    # Helper methods
-    def _register_function_symbol(self, node: ast.AST, name: str, file_path: str, scope_stack: List[str]) -> None:
-        """Register a function symbol definition."""
-        symbol_id = self.symbol_manager.create_local_symbol(
-            language="python",
-            file_path=file_path,
-            symbol_path=scope_stack + [name],
-            descriptor="()."
-        )
-
-        # Create a dummy range for registration
-        dummy_range = scip_pb2.Range()
-        dummy_range.start.extend([0, 0])
-        dummy_range.end.extend([0, 1])
-
-        self.reference_resolver.register_symbol_definition(
-            symbol_id=symbol_id,
-            file_path=file_path,
-            definition_range=dummy_range,
-            symbol_kind=scip_pb2.Function,
-            display_name=name,
-            documentation=["Python function"]
-        )
-
-    def _register_class_symbol(self, node: ast.AST, name: str, file_path: str, scope_stack: List[str]) -> None:
-        """Register a class symbol definition."""
-        symbol_id = self.symbol_manager.create_local_symbol(
-            language="python",
-            file_path=file_path,
-            symbol_path=scope_stack + [name],
-            descriptor="#"
-        )
-
-        # Create a dummy range for registration
-        dummy_range = scip_pb2.Range()
-        dummy_range.start.extend([0, 0])
-        dummy_range.end.extend([0, 1])
-
-        self.reference_resolver.register_symbol_definition(
-            symbol_id=symbol_id,
-            file_path=file_path,
-            definition_range=dummy_range,
-            symbol_kind=scip_pb2.Class,
-            display_name=name,
-            documentation=["Python class"]
-        )
-
-    def _create_function_symbol_id(self, name: str, file_path: str, scope_stack: List[str]) -> str:
-        """Create symbol ID for function."""
-        return self.symbol_manager.create_local_symbol(
-            language="python",
-            file_path=file_path,
-            symbol_path=scope_stack + [name],
-            descriptor="()."
-        )
-
-    def _create_class_symbol_id(self, name: str, file_path: str, scope_stack: List[str]) -> str:
-        """Create symbol ID for class."""
-        return self.symbol_manager.create_local_symbol(
-            language="python",
-            file_path=file_path,
-            symbol_path=scope_stack + [name],
-            descriptor="#"
-        )
-
-    def _create_function_occurrence(self, node: ast.AST, symbol_id: str) -> Optional[scip_pb2.Occurrence]:
-        """Create SCIP occurrence for function."""
-        if not self.position_calculator:
-            return None
-
-        try:
-            range_obj = self.position_calculator.ast_node_to_range(node)
-            occurrence = scip_pb2.Occurrence()
-            occurrence.symbol = symbol_id
-            occurrence.symbol_roles = scip_pb2.Definition
-            occurrence.syntax_kind = scip_pb2.IdentifierFunction
-            occurrence.range.CopyFrom(range_obj)
-            return occurrence
-        except:
-            return None
-
-    def _create_class_occurrence(self, node: ast.AST, symbol_id: str) -> Optional[scip_pb2.Occurrence]:
-        """Create SCIP occurrence for class."""
-        if not self.position_calculator:
-            return None
-
-        try:
-            range_obj = self.position_calculator.ast_node_to_range(node)
-            occurrence = scip_pb2.Occurrence()
-            occurrence.symbol = symbol_id
-            occurrence.symbol_roles = scip_pb2.Definition
-            occurrence.syntax_kind = scip_pb2.IdentifierType
-            occurrence.range.CopyFrom(range_obj)
-            return occurrence
-        except:
-            return None
-
-    def _create_function_symbol_info(self, node: ast.AST, symbol_id: str, relationships: Optional[List[scip_pb2.Relationship]] = None) -> scip_pb2.SymbolInformation:
-        """Create SCIP symbol information for function."""
-        symbol_info = scip_pb2.SymbolInformation()
-        symbol_info.symbol = symbol_id
-        symbol_info.display_name = node.name
-        symbol_info.kind = scip_pb2.Function
-
-        # Add docstring if available
-        docstring = ast.get_docstring(node)
-        if docstring:
-            symbol_info.documentation.append(docstring)
-
-        # Add relationships if provided
-        if relationships and self.relationship_manager:
-            self.relationship_manager.add_relationships_to_symbol(symbol_info, relationships)
-
-        return symbol_info
-
-    def _create_class_symbol_info(self, node: ast.AST, symbol_id: str, relationships: Optional[List[scip_pb2.Relationship]] = None) -> scip_pb2.SymbolInformation:
-        """Create SCIP symbol information for class."""
-        symbol_info = scip_pb2.SymbolInformation()
-        symbol_info.symbol = symbol_id
-        symbol_info.display_name = node.name
-        symbol_info.kind = scip_pb2.Class
-
-        # Add docstring if available
-        docstring = ast.get_docstring(node)
-        if docstring:
-            symbol_info.documentation.append(docstring)
-
-        # Add relationships if provided
-        if relationships and self.relationship_manager:
-            self.relationship_manager.add_relationships_to_symbol(symbol_info, relationships)
-
-        return symbol_info
\ No newline at end of file
diff --git a/src/code_index_mcp/scip/strategies/python_strategy_backup.py b/src/code_index_mcp/scip/strategies/python_strategy_backup.py
deleted file mode 100644
index f1d0000..0000000
--- a/src/code_index_mcp/scip/strategies/python_strategy_backup.py
+++ /dev/null
@@ -1,830 +0,0 @@
-"""Python SCIP indexing strategy - SCIP standard compliant."""
-
-import ast
-import logging
-import os
-from typing import List, Optional, Dict, Any, Set
-from pathlib import Path
-
-from .base_strategy import SCIPIndexerStrategy, StrategyError
-from ..proto import scip_pb2
-from ..core.position_calculator import PositionCalculator
-from ..core.relationship_types import InternalRelationshipType
-
-
-logger = logging.getLogger(__name__)
-
-
-class PythonStrategy(SCIPIndexerStrategy):
-    """SCIP-compliant Python indexing strategy using AST analysis."""
-
-    SUPPORTED_EXTENSIONS = {'.py', '.pyw'}
-
-    def __init__(self, priority: int = 90):
-        """Initialize the Python strategy."""
-        super().__init__(priority)
-
-    def can_handle(self, extension: str, file_path: str) -> bool:
-        """Check if this strategy can handle the file type."""
-        return extension.lower() in self.SUPPORTED_EXTENSIONS
-
-    def get_language_name(self) -> str:
-        """Get the language name for SCIP symbol generation."""
-        return "python"
-
-    def _collect_symbol_definitions(self, files: List[str], project_path: str) -> None:
-        """Phase 1: Collect all symbol definitions from Python files."""
-        logger.debug(f"PythonStrategy Phase 1: Processing {len(files)} files for symbol collection")
-        processed_count = 0
-        error_count = 0
-
-        for i, file_path in enumerate(files, 1):
-            relative_path = os.path.relpath(file_path, project_path)
-
-            try:
-                self._collect_symbols_from_file(file_path, project_path)
-                processed_count += 1
-
-                if i % 10 == 0 or i == len(files):  # Progress every 10 files or at end
-                    logger.debug(f"Phase 1 progress: {i}/{len(files)} files, last file: {relative_path}")
-
-            except Exception as e:
-                error_count += 1
-                logger.warning(f"Phase 1 failed for {relative_path}: {e}")
-                continue
-
-        logger.info(f"Phase 1 summary: {processed_count} files processed, {error_count} errors")
-
-    def _generate_documents_with_references(self, files: List[str], project_path: str) -> List[scip_pb2.Document]:
-        """Phase 2: Generate complete SCIP documents with resolved references."""
-        documents = []
-        logger.debug(f"PythonStrategy Phase 2: Generating documents for {len(files)} files")
-        processed_count = 0
-        error_count = 0
-        total_occurrences = 0
-        total_symbols = 0
-
-        for i, file_path in enumerate(files, 1):
-            relative_path = os.path.relpath(file_path, project_path)
-
-            try:
-                document = self._analyze_python_file(file_path, project_path)
-                if document:
-                    documents.append(document)
-                    total_occurrences += len(document.occurrences)
-                    total_symbols += len(document.symbols)
-                    processed_count += 1
-
-                if i % 10 == 0 or i == len(files):  # Progress every 10 files or at end
-                    logger.debug(f"Phase 2 progress: {i}/{len(files)} files, "
-                                f"last file: {relative_path}, "
-                                f"{len(document.occurrences) if document else 0} occurrences")
-
-            except Exception as e:
-                error_count += 1
-                logger.error(f"Phase 2 failed for {relative_path}: {e}")
-                continue
-
-        logger.info(f"Phase 2 summary: {processed_count} documents generated, {error_count} errors, "
-                   f"{total_occurrences} total occurrences, {total_symbols} total symbols")
-
-        return documents
-
-    def _collect_symbols_from_file(self, file_path: str, project_path: str) -> None:
-        """Collect symbol definitions from a single Python file."""
-
-        # Read file content
-        content = self._read_file_content(file_path)
-        if not content:
-            logger.debug(f"Empty file skipped: {os.path.relpath(file_path, project_path)}")
-            return
-
-        # Parse AST
-        try:
-            tree = ast.parse(content, filename=file_path)
-        except SyntaxError as e:
-            logger.warning(f"Syntax error in {os.path.relpath(file_path, project_path)}: {e}")
-            return
-
-        # Collect symbols
-        relative_path = self._get_relative_path(file_path, project_path)
-        collector = PythonSymbolCollector(
-            relative_path, content, self.symbol_manager, self.reference_resolver
-        )
-        collector.visit(tree)
-        logger.debug(f"Symbol collection - {relative_path}")
-
-    def _analyze_python_file(self, file_path: str, project_path: str) -> Optional[scip_pb2.Document]:
-        """Analyze a single Python file and generate complete SCIP document."""
-        relative_path = self._get_relative_path(file_path, project_path)
-
-        # Read file content
-        content = self._read_file_content(file_path)
-        if not content:
-            logger.debug(f"Empty file skipped: {relative_path}")
-            return None
-
-        # Parse AST
-        try:
-            tree = ast.parse(content, filename=file_path)
-        except SyntaxError as e:
-            logger.warning(f"Syntax error in {relative_path}: {e}")
-            return None
-
-        # Create SCIP document
-        document = scip_pb2.Document()
-        document.relative_path = relative_path
-        document.language = self.get_language_name()
-
-        # Analyze AST and generate occurrences
-        self.position_calculator = PositionCalculator(content)
-
-        analyzer = PythonAnalyzer(
-            document.relative_path,
-            content,
-            self.symbol_manager,
-            self.position_calculator,
-            self.reference_resolver
-        )
-
-        analyzer.visit(tree)
-
-        # Add results to document
-        document.occurrences.extend(analyzer.occurrences)
-        document.symbols.extend(analyzer.symbols)
-
-        logger.debug(f"Document analysis - {relative_path}: "
-                    f"-> {len(document.occurrences)} occurrences, {len(document.symbols)} symbols")
-
-        return document
-
-
-class PythonSymbolCollector(ast.NodeVisitor):
-    """AST visitor that collects Python symbol definitions (Phase 1)."""
-
-    def __init__(self, file_path: str, content: str, symbol_manager, reference_resolver):
-        self.file_path = file_path
-        self.content = content
-        self.symbol_manager = symbol_manager
-        self.reference_resolver = reference_resolver
-        self.scope_stack: List[str] = []  # Track current scope
-
-    def visit_FunctionDef(self, node: ast.FunctionDef):
-        """Visit function definition."""
-        self._register_function_symbol(node, node.name, is_async=False)
-
-        # Enter function scope
-        self.scope_stack.append(node.name)
-        self.generic_visit(node)
-        self.scope_stack.pop()
-
-    def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef):
-        """Visit async function definition."""
-        self._register_function_symbol(node, node.name, is_async=True)
-
-        # Enter function scope
-        self.scope_stack.append(node.name)
-        self.generic_visit(node)
-        self.scope_stack.pop()
-
-    def visit_ClassDef(self, node: ast.ClassDef):
-        """Visit class definition."""
-        self._register_class_symbol(node, node.name)
-
-        # Enter class scope
-        self.scope_stack.append(node.name)
-        self.generic_visit(node)
-        self.scope_stack.pop()
-
-    def _register_function_symbol(self, node: ast.AST, name: str, is_async: bool = False):
-        """Register a function symbol definition."""
-        symbol_id = self.symbol_manager.create_local_symbol(
-            language="python",
-            file_path=self.file_path,
-            symbol_path=self.scope_stack + [name],
-            descriptor="()."
-        )
-
-        # Create a dummy range for registration (will be calculated properly in Phase 2)
-        dummy_range = scip_pb2.Range()
-        dummy_range.start.extend([0, 0])
-        dummy_range.end.extend([0, 1])
-
-        documentation = []
-        if is_async:
-            documentation.append("Async function")
-
-        self.reference_resolver.register_symbol_definition(
-            symbol_id=symbol_id,
-            file_path=self.file_path,
-            definition_range=dummy_range,
-            symbol_kind=scip_pb2.Function,
-            display_name=name,
-            documentation=documentation
-        )
-
-    def _register_class_symbol(self, node: ast.AST, name: str):
-        """Register a class symbol definition."""
-        symbol_id = self.symbol_manager.create_local_symbol(
-            language="python",
-            file_path=self.file_path,
-            symbol_path=self.scope_stack + [name],
-            descriptor="#"
-        )
-
-        # Create a dummy range for registration
-        dummy_range = scip_pb2.Range()
-        dummy_range.start.extend([0, 0])
-        dummy_range.end.extend([0, 1])
-
-        self.reference_resolver.register_symbol_definition(
-            symbol_id=symbol_id,
-            file_path=self.file_path,
-            definition_range=dummy_range,
-            symbol_kind=scip_pb2.Class,
-            display_name=name,
-            documentation=["Python class"]
-        )
-
-
-class PythonAnalyzer(ast.NodeVisitor):
-    """AST visitor that generates complete SCIP data (Phase 2)."""
-
-    def __init__(self, file_path: str, content: str, symbol_manager, position_calculator, reference_resolver):
-        self.file_path = file_path
-        self.content = content
-        self.symbol_manager = symbol_manager
-        self.position_calculator = position_calculator
-        self.reference_resolver = reference_resolver
-        self.scope_stack: List[str] = []
-
-        # Results
-        self.occurrences: List[scip_pb2.Occurrence] = []
-        self.symbols: List[scip_pb2.SymbolInformation] = []
-
-    def visit_FunctionDef(self, node: ast.FunctionDef):
-        """Visit function definition."""
-        self._handle_function_definition(node, node.name, is_async=False)
-
-        # Enter function scope
-        self.scope_stack.append(node.name)
-        self.generic_visit(node)
-        self.scope_stack.pop()
-
-    def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef):
-        """Visit async function definition."""
-        self._handle_function_definition(node, node.name, is_async=True)
-
-        # Enter function scope
-        self.scope_stack.append(node.name)
-        self.generic_visit(node)
-        self.scope_stack.pop()
-
-    def visit_ClassDef(self, node: ast.ClassDef):
-        """Visit class definition."""
-        self._handle_class_definition(node, node.name)
-
-        # Enter class scope
-        self.scope_stack.append(node.name)
-        self.generic_visit(node)
-        self.scope_stack.pop()
-
-    def visit_Import(self, node: ast.Import):
-        """Visit import statement."""
-        for alias in node.names:
-            self._handle_import(node, alias.name, alias.asname)
-        self.generic_visit(node)
-
-    def visit_ImportFrom(self, node: ast.ImportFrom):
-        """Visit from ... import ... statement."""
-        module_name = node.module or ""
-        for alias in node.names:
-            self._handle_from_import(node, module_name, alias.name, alias.asname)
-        self.generic_visit(node)
-
-    def visit_Name(self, node: ast.Name):
-        """Visit name references."""
-        if isinstance(node.ctx, ast.Load):
-            # This is a reference to a name
-            self._handle_name_reference(node, node.id)
-        self.generic_visit(node)
-
-    def visit_Attribute(self, node: ast.Attribute):
-        """Visit attribute access."""
-        if isinstance(node.ctx, ast.Load):
-            self._handle_attribute_reference(node, node.attr)
-        self.generic_visit(node)
-
-    def _handle_function_definition(self, node: ast.AST, name: str, is_async: bool = False):
-        """Handle function definition."""
-        symbol_id = self.symbol_manager.create_local_symbol(
-            language="python",
-            file_path=self.file_path,
-            symbol_path=self.scope_stack + [name],
-            descriptor="()."
-        )
-
-        # Create definition occurrence
-        range_obj = self.position_calculator.ast_node_to_range(node)
-        occurrence = self._create_occurrence(
-            symbol_id, range_obj, scip_pb2.Definition, scip_pb2.IdentifierFunction
-        )
-        self.occurrences.append(occurrence)
-
-        # Create symbol information
-        documentation = []
-        if is_async:
-            documentation.append("Async function")
-
-        # Add docstring if available
-        docstring = ast.get_docstring(node)
-        if docstring:
-            documentation.append(docstring)
-
-        # Add parameter information
-        if hasattr(node, 'args') and node.args.args:
-            params = [arg.arg for arg in node.args.args]
-            documentation.append(f"Parameters: {', '.join(params)}")
-
-        symbol_info = self._create_symbol_information(
-            symbol_id, name, scip_pb2.Function, documentation
-        )
-        self.symbols.append(symbol_info)
-
-    def _handle_class_definition(self, node: ast.AST, name: str):
-        """Handle class definition."""
-        symbol_id = self.symbol_manager.create_local_symbol(
-            language="python",
-            file_path=self.file_path,
-            symbol_path=self.scope_stack + [name],
-            descriptor="#"
-        )
-
-        # Create definition occurrence
-        range_obj = self.position_calculator.ast_node_to_range(node)
-        occurrence = self._create_occurrence(
-            symbol_id, range_obj, scip_pb2.Definition, scip_pb2.IdentifierType
-        )
-        self.occurrences.append(occurrence)
-
-        # Create symbol information
-        documentation = ["Python class"]
-
-        # Add docstring if available
-        docstring = ast.get_docstring(node)
-        if docstring:
-            documentation.append(docstring)
-
-        # Add base class information
-        if hasattr(node, 'bases') and node.bases:
-            base_names = []
-            for base in node.bases:
-                if isinstance(base, ast.Name):
-                    base_names.append(base.id)
-                elif isinstance(base, ast.Attribute):
-                    base_names.append(ast.unparse(base))
-            if base_names:
-                documentation.append(f"Inherits from: {', '.join(base_names)}")
-
-        symbol_info = self._create_symbol_information(
-            symbol_id, name, scip_pb2.Class, documentation
-        )
-        self.symbols.append(symbol_info)
-
-    def _handle_import(self, node: ast.AST, module_name: str, alias_name: Optional[str]):
-        """Handle import statement with moniker support."""
-        display_name = alias_name or module_name
-
-        # Determine if this is a standard library or external package import
-        if self._is_stdlib_module(module_name):
-            # Standard library import
-            symbol_id = self.symbol_manager.create_stdlib_symbol(
-                language="python",
-                module_name=module_name,
-                symbol_name="",
-                descriptor=""
-            )
-        elif self._is_external_package(module_name):
-            # External package import using moniker system
-            symbol_id = self.symbol_manager.create_external_symbol(
-                language="python",
-                package_name=self._extract_package_name(module_name),
-                module_path=self._extract_module_path(module_name),
-                symbol_name="",
-                alias=alias_name
-            )
-        else:
-            # Local project import
-            symbol_id = self.symbol_manager.create_local_symbol(
-                language="python",
-                file_path=f"{module_name.replace('.', '/')}.py",
-                symbol_path=[],
-                descriptor=""
-            )
-
-        range_obj = self.position_calculator.ast_node_to_range(node)
-        occurrence = self._create_occurrence(
-            symbol_id, range_obj, scip_pb2.Import, scip_pb2.IdentifierNamespace
-        )
-        self.occurrences.append(occurrence)
-
-    def _handle_from_import(self, node: ast.AST, module_name: str, import_name: str, alias_name: Optional[str]):
-        """Handle from ... import ... statement with moniker support."""
-        display_name = alias_name or import_name
-
-        # Determine if this is a standard library or external package import
-        if self._is_stdlib_module(module_name):
-            # Standard library import
-            symbol_id = self.symbol_manager.create_stdlib_symbol(
-                language="python",
-                module_name=module_name,
-                symbol_name=import_name,
-                descriptor=""
-            )
-        elif self._is_external_package(module_name):
-            # External package import using moniker system
-            symbol_id = self.symbol_manager.create_external_symbol(
-                language="python",
-                package_name=self._extract_package_name(module_name),
-                module_path=self._extract_module_path(module_name),
-                symbol_name=import_name,
-                alias=alias_name,
-                descriptor=self._infer_descriptor_from_name(import_name)
-            )
-        else:
-            # Local project import
-            symbol_id = self.symbol_manager.create_local_symbol(
-                language="python",
-                file_path=f"{module_name.replace('.', '/')}.py",
-                symbol_path=[import_name],
-                descriptor=self._infer_descriptor_from_name(import_name)
-            )
-
-        range_obj = self.position_calculator.ast_node_to_range(node)
-        occurrence = self._create_occurrence(
-            symbol_id, range_obj, scip_pb2.Import, scip_pb2.Identifier
-        )
-        self.occurrences.append(occurrence)
-
-    def _handle_name_reference(self, node: ast.AST, name: str):
-        """Handle name reference with import resolution."""
-        # First try to resolve to imported external symbol
-        imported_symbol_id = self.symbol_manager.resolve_import_reference(name, self.file_path)
-
-        if imported_symbol_id:
-            # This is a reference to an imported symbol
-            range_obj = self.position_calculator.ast_node_to_range(node)
-            occurrence = self._create_occurrence(
-                imported_symbol_id, range_obj, 0, scip_pb2.Identifier  # 0 = reference role
-            )
-            self.occurrences.append(occurrence)
-            return
-
-        # Try to resolve local reference
-        resolved_symbol_id = self.reference_resolver.resolve_reference_by_name(
-            symbol_name=name,
-            context_file=self.file_path,
-            context_scope=self.scope_stack
-        )
-
-        if resolved_symbol_id:
-            # Create reference occurrence
-            range_obj = self.position_calculator.ast_node_to_range(node)
-            occurrence = self._create_occurrence(
-                resolved_symbol_id, range_obj, 0, scip_pb2.Identifier  # 0 = reference role
-            )
-            self.occurrences.append(occurrence)
-
-            # Register the reference
-            self.reference_resolver.register_symbol_reference(
-                symbol_id=resolved_symbol_id,
-                file_path=self.file_path,
-                reference_range=range_obj,
-                context_scope=self.scope_stack
-            )
-
-    def _handle_attribute_reference(self, node: ast.AST, attr_name: str):
-        """Handle attribute reference."""
-        # For now, create a simple local reference
-        # In a full implementation, this would resolve through the object type
-        range_obj = self.position_calculator.ast_node_to_range(node)
-
-        # Try to create a local symbol for the attribute
-        symbol_id = self.symbol_manager.create_local_symbol(
-            language="python",
-            file_path=self.file_path,
-            symbol_path=self.scope_stack + [attr_name],
-            descriptor=""
-        )
-
-        occurrence = self._create_occurrence(
-            symbol_id, range_obj, 0, scip_pb2.Identifier
-        )
-        self.occurrences.append(occurrence)
-
-    def _create_occurrence(self, symbol_id: str, range_obj: scip_pb2.Range,
-                          symbol_roles: int, syntax_kind: int) -> scip_pb2.Occurrence:
-        """Create a SCIP occurrence."""
-        occurrence = scip_pb2.Occurrence()
-        occurrence.symbol = symbol_id
-        occurrence.symbol_roles = symbol_roles
-        occurrence.syntax_kind = syntax_kind
-        occurrence.range.CopyFrom(range_obj)
-        return occurrence
-
-    def _create_symbol_information(self, symbol_id: str, display_name: str,
-                                  symbol_kind: int, documentation: List[str] = None) -> scip_pb2.SymbolInformation:
-        """Create SCIP symbol information."""
-        symbol_info = scip_pb2.SymbolInformation()
-        symbol_info.symbol = symbol_id
-        symbol_info.display_name = display_name
-        symbol_info.kind = symbol_kind
-
-        if documentation:
-            symbol_info.documentation.extend(documentation)
-
-        return symbol_info
-
-    def _is_stdlib_module(self, module_name: str) -> bool:
-        """Check if module is part of Python standard library."""
-        # Standard library modules (partial list - could be expanded)
-        stdlib_modules = {
-            'os', 'sys', 'json', 'datetime', 'collections', 'itertools',
-            'functools', 'typing', 're', 'math', 'random', 'pathlib',
-            'urllib', 'http', 'email', 'csv', 'xml', 'html', 'sqlite3',
-            'threading', 'asyncio', 'multiprocessing', 'subprocess',
-            'unittest', 'logging', 'configparser', 'argparse', 'io',
-            'shutil', 'glob', 'tempfile', 'zipfile', 'tarfile',
-            'pickle', 'base64', 'hashlib', 'hmac', 'secrets', 'uuid',
-            'time', 'calendar', 'zoneinfo', 'locale', 'gettext',
-            'decimal', 'fractions', 'statistics', 'cmath', 'bisect',
-            'heapq', 'queue', 'weakref', 'copy', 'pprint', 'reprlib',
-            'enum', 'dataclasses', 'contextlib', 'abc', 'atexit',
-            'traceback', 'gc', 'inspect', 'site', 'warnings', 'keyword',
-            'builtins', '__future__', 'imp', 'importlib', 'pkgutil',
-            'modulefinder', 'runpy', 'ast', 'dis', 'pickletools'
-        }
-
-        # Get the root module name (e.g., 'os.path' -> 'os')
-        root_module = module_name.split('.')[0]
-        return root_module in stdlib_modules
-
-    def _is_external_package(self, module_name: str) -> bool:
-        """Check if module is from an external package (not stdlib, not local)."""
-        # If it's stdlib, it's not external
-        if self._is_stdlib_module(module_name):
-            return False
-
-        # Check if it starts with known external package patterns
-        # (This could be enhanced with actual dependency parsing)
-        external_patterns = [
-            'numpy', 'pandas', 'scipy', 'matplotlib', 'seaborn',
-            'sklearn', 'torch', 'tensorflow', 'keras', 'cv2',
-            'requests', 'urllib3', 'httpx', 'aiohttp',
-            'flask', 'django', 'fastapi', 'starlette',
-            'sqlalchemy', 'psycopg2', 'pymongo', 'redis',
-            'pytest', 'unittest2', 'mock', 'nose',
-            'click', 'typer', 'argparse', 'fire',
-            'pyyaml', 'toml', 'configparser', 'python-dotenv',
-            'pillow', 'imageio', 'opencv', 'scikit',
-            'beautifulsoup4', 'lxml', 'scrapy',
-            'celery', 'rq',
'dramatiq', - 'pydantic', 'marshmallow', 'cerberus', - 'cryptography', 'bcrypt', 'passlib' - ] - - root_module = module_name.split('.')[0] - return any(root_module.startswith(pattern) for pattern in external_patterns) - - def _extract_package_name(self, module_name: str) -> str: - """Extract package name from module path.""" - # For most packages, the root module is the package name - root_module = module_name.split('.')[0] - - # Handle special cases where module name differs from package name - package_mapping = { - 'cv2': 'opencv-python', - 'sklearn': 'scikit-learn', - 'PIL': 'Pillow', - 'bs4': 'beautifulsoup4', - 'yaml': 'PyYAML', - } - - return package_mapping.get(root_module, root_module) - - def _extract_module_path(self, module_name: str) -> str: - """Extract module path within package.""" - parts = module_name.split('.') - if len(parts) > 1: - # Return submodule path (everything after package name) - return '/'.join(parts[1:]) - return "" - - def _infer_descriptor_from_name(self, name: str) -> str: - """Infer SCIP descriptor from symbol name.""" - # Simple heuristics for Python symbols - if name.isupper(): # Constants like MAX_SIZE - return "." - elif name.istitle(): # Classes like MyClass - return "#" - elif name.endswith('Error') or name.endswith('Exception'): # Exception classes - return "#" - else: # Functions, variables, etc. - return "()." if name.islower() else "." - - def _build_symbol_relationships(self, files: List[str], project_path: str) -> Dict[str, List[tuple]]: - """ - Build relationships between Python symbols. - - Args: - files: List of file paths to process - project_path: Project root path - - Returns: - Dictionary mapping symbol_id -> [(target_symbol_id, relationship_type), ...] 
-        """
-        logger.debug(f"PythonStrategy: Building symbol relationships for {len(files)} files")
-
-        all_relationships = {}
-
-        for file_path in files:
-            try:
-                file_relationships = self._extract_relationships_from_file(file_path, project_path)
-                all_relationships.update(file_relationships)
-            except Exception as e:
-                logger.warning(f"Failed to extract relationships from {file_path}: {e}")
-
-        total_symbols_with_relationships = len(all_relationships)
-        total_relationships = sum(len(rels) for rels in all_relationships.values())
-
-        logger.debug(f"PythonStrategy: Built {total_relationships} relationships for {total_symbols_with_relationships} symbols")
-        return all_relationships
-
-    def _extract_relationships_from_file(self, file_path: str, project_path: str) -> Dict[str, List[tuple]]:
-        """
-        Extract relationships from a single Python file.
-
-        Args:
-            file_path: File to analyze
-            project_path: Project root path
-
-        Returns:
-            Dictionary mapping symbol_id -> [(target_symbol_id, relationship_type), ...]
-        """
-        content = self._read_file_content(file_path)
-        if not content:
-            return {}
-
-        try:
-            tree = ast.parse(content)
-        except SyntaxError as e:
-            logger.warning(f"Syntax error in {file_path}: {e}")
-            return {}
-
-        extractor = PythonRelationshipExtractor(
-            file_path=file_path,
-            project_path=project_path,
-            symbol_manager=self.symbol_manager
-        )
-
-        extractor.visit(tree)
-        return extractor.get_relationships()
-
-
-class PythonRelationshipExtractor(ast.NodeVisitor):
-    """
-    AST visitor for extracting Python symbol relationships.
-    """
-
-    def __init__(self, file_path: str, project_path: str, symbol_manager):
-        self.file_path = file_path
-        self.project_path = project_path
-        self.symbol_manager = symbol_manager
-        self.relationships = {}
-        self.current_scope = []
-        self.current_class = None
-        self.current_function = None
-
-    def get_relationships(self) -> Dict[str, List[tuple]]:
-        """Get extracted relationships."""
-        return self.relationships
-
-    def _add_relationship(self, source_symbol_id: str, target_symbol_id: str, relationship_type: InternalRelationshipType):
-        """Add a relationship to the collection."""
-        if source_symbol_id not in self.relationships:
-            self.relationships[source_symbol_id] = []
-        self.relationships[source_symbol_id].append((target_symbol_id, relationship_type))
-
-    def visit_ClassDef(self, node: ast.ClassDef):
-        """Visit class definition and extract inheritance relationships."""
-        old_class = self.current_class
-        self.current_class = node.name
-        self.current_scope.append(node.name)
-
-        # Generate class symbol ID
-        class_symbol_id = self._generate_symbol_id(self.current_scope, "#")
-
-        # Extract inheritance relationships
-        for base in node.bases:
-            if isinstance(base, ast.Name):
-                # Direct inheritance: class Child(Parent)
-                parent_symbol_id = self._generate_symbol_id([base.id], "#")
-                self._add_relationship(
-                    class_symbol_id,
-                    parent_symbol_id,
-                    InternalRelationshipType.INHERITS
-                )
-            elif isinstance(base, ast.Attribute):
-                # Module-qualified inheritance: class Child(module.Parent)
-                parent_name = self._extract_attribute_name(base)
-                parent_symbol_id = self._generate_symbol_id([parent_name], "#")
-                self._add_relationship(
-                    class_symbol_id,
-                    parent_symbol_id,
-                    InternalRelationshipType.INHERITS
-                )
-
-        # Visit class body
-        self.generic_visit(node)
-
-        self.current_scope.pop()
-        self.current_class = old_class
-
-    def visit_FunctionDef(self, node: ast.FunctionDef):
-        """Visit function definition and extract call relationships."""
-        old_function = self.current_function
-        self.current_function = node.name
-        self.current_scope.append(node.name)
-
-        # Generate function symbol ID
-        function_symbol_id = self._generate_symbol_id(self.current_scope, "().")
-
-        # Extract function calls from body
-        call_extractor = FunctionCallExtractor(function_symbol_id, self)
-        for stmt in node.body:
-            call_extractor.visit(stmt)
-
-        # Visit function body
-        self.generic_visit(node)
-
-        self.current_scope.pop()
-        self.current_function = old_function
-
-    def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef):
-        """Visit async function definition."""
-        # Treat async functions the same as regular functions
-        self.visit_FunctionDef(node)
-
-    def _generate_symbol_id(self, symbol_path: List[str], descriptor: str) -> str:
-        """Generate SCIP symbol ID for a symbol."""
-        if self.symbol_manager:
-            return self.symbol_manager.create_local_symbol(
-                language="python",
-                file_path=self.file_path,
-                symbol_path=symbol_path,
-                descriptor=descriptor
-            )
-        return f"local {'/'.join(symbol_path)}{descriptor}"
-
-    def _extract_attribute_name(self, node: ast.Attribute) -> str:
-        """Extract full name from attribute node (e.g., 'module.Class')."""
-        if isinstance(node.value, ast.Name):
-            return f"{node.value.id}.{node.attr}"
-        elif isinstance(node.value, ast.Attribute):
-            return f"{self._extract_attribute_name(node.value)}.{node.attr}"
-        return node.attr
-
-
-class FunctionCallExtractor(ast.NodeVisitor):
-    """
-    Specialized visitor for extracting function calls within a function.
-    """
-
-    def __init__(self, source_function_id: str, parent_extractor):
-        self.source_function_id = source_function_id
-        self.parent_extractor = parent_extractor
-
-    def visit_Call(self, node: ast.Call):
-        """Visit function call and extract relationship."""
-        target_name = None
-
-        if isinstance(node.func, ast.Name):
-            # Simple function call: func()
-            target_name = node.func.id
-        elif isinstance(node.func, ast.Attribute):
-            # Method call or module function call: obj.method() or module.func()
-            target_name = self.parent_extractor._extract_attribute_name(node.func)
-
-        if target_name:
-            # Generate target symbol ID
-            target_symbol_id = self.parent_extractor._generate_symbol_id([target_name], "().")
-
-            # Add call relationship
-            self.parent_extractor._add_relationship(
-                self.source_function_id,
-                target_symbol_id,
-                InternalRelationshipType.CALLS
-            )
-
-        # Continue visiting nested calls
-        self.generic_visit(node)
\ No newline at end of file
diff --git a/src/code_index_mcp/scip/strategies/zig_strategy.py b/src/code_index_mcp/scip/strategies/zig_strategy.py
deleted file mode 100644
index a277923..0000000
--- a/src/code_index_mcp/scip/strategies/zig_strategy.py
+++ /dev/null
@@ -1,309 +0,0 @@
-"""Zig SCIP indexing strategy - SCIP standard compliant."""
-
-import logging
-import os
-import re
-from typing import List, Optional, Dict, Any, Set
-from pathlib import Path
-
-import tree_sitter
-from tree_sitter_zig import language as zig_language
-
-from .base_strategy import SCIPIndexerStrategy, StrategyError
-from ..proto import scip_pb2
-from ..core.position_calculator import PositionCalculator
-from ..core.relationship_types import InternalRelationshipType
-
-
-logger = logging.getLogger(__name__)
-
-
-class ZigStrategy(SCIPIndexerStrategy):
-    """SCIP-compliant Zig indexing strategy."""
-
-    SUPPORTED_EXTENSIONS = {'.zig', '.zon'}
-
-    def __init__(self, priority: int = 95):
-        """Initialize the Zig strategy."""
-        super().__init__(priority)
-
-        # Initialize parser
-        lang = tree_sitter.Language(zig_language())
-        self.parser = tree_sitter.Parser(lang)
-        self.use_tree_sitter = True
-
-    def can_handle(self, extension: str, file_path: str) -> bool:
-        """Check if this strategy can handle the file type."""
-        return extension.lower() in self.SUPPORTED_EXTENSIONS
-
-    def get_language_name(self) -> str:
-        """Get the language name for SCIP symbol generation."""
-        return "zig"
-
-    def is_available(self) -> bool:
-        """Check if this strategy is available."""
-        return self.use_tree_sitter and self.parser is not None
-
-    def _collect_symbol_definitions(self, files: List[str], project_path: str) -> None:
-        """Phase 1: Collect all symbol definitions from Zig files."""
-        logger.debug(f"ZigStrategy Phase 1: Processing {len(files)} files for symbol collection")
-        processed_count = 0
-        error_count = 0
-
-        for i, file_path in enumerate(files, 1):
-            relative_path = os.path.relpath(file_path, project_path)
-
-            try:
-                self._collect_symbols_from_file(file_path, project_path)
-                processed_count += 1
-
-                if i % 10 == 0 or i == len(files):
-                    logger.debug(f"Phase 1 progress: {i}/{len(files)} files, last file: {relative_path}")
-
-            except Exception as e:
-                error_count += 1
-                logger.warning(f"Phase 1 failed for {relative_path}: {e}")
-                continue
-
-        logger.info(f"Phase 1 summary: {processed_count} files processed, {error_count} errors")
-
-    def _generate_documents_with_references(self, files: List[str], project_path: str, relationships: Optional[Dict[str, List[tuple]]] = None) -> List[scip_pb2.Document]:
-        """Phase 2: Generate complete SCIP documents with resolved references."""
-        documents = []
-        logger.debug(f"ZigStrategy Phase 2: Generating documents for {len(files)} files")
-        processed_count = 0
-        error_count = 0
-        total_occurrences = 0
-        total_symbols = 0
-
-        for i, file_path in enumerate(files, 1):
-            relative_path = os.path.relpath(file_path, project_path)
-
-            try:
-                document = self._analyze_zig_file(file_path, project_path, relationships)
-                if document:
-                    documents.append(document)
-                    total_occurrences += len(document.occurrences)
-                    total_symbols += len(document.symbols)
-                    processed_count += 1
-
-                if i % 10 == 0 or i == len(files):
-                    logger.debug(f"Phase 2 progress: {i}/{len(files)} files, "
-                                 f"last file: {relative_path}, "
-                                 f"{len(document.occurrences) if document else 0} occurrences")
-
-            except Exception as e:
-                error_count += 1
-                logger.error(f"Phase 2 failed for {relative_path}: {e}")
-                continue
-
-        logger.info(f"Phase 2 summary: {processed_count} documents generated, {error_count} errors, "
-                    f"{total_occurrences} total occurrences, {total_symbols} total symbols")
-
-        return documents
-
-    def _collect_symbols_from_file(self, file_path: str, project_path: str) -> None:
-        """Collect symbol definitions from a single Zig file."""
-        # Read file content
-        content = self._read_file_content(file_path)
-        if not content:
-            logger.debug(f"Empty file skipped: {os.path.relpath(file_path, project_path)}")
-            return
-
-        relative_path = self._get_relative_path(file_path, project_path)
-
-        if self.use_tree_sitter and self.parser:
-            # Parse with Tree-sitter
-            tree = self._parse_content(content)
-            if tree:
-                self._collect_symbols_from_tree_sitter(tree, relative_path, content)
-                logger.debug(f"Tree-sitter symbol collection - {relative_path}")
-                return
-
-        raise StrategyError(f"Failed to parse {relative_path} with tree-sitter for symbol collection")
-
-    def _analyze_zig_file(self, file_path: str, project_path: str, relationships: Optional[Dict[str, List[tuple]]] = None) -> Optional[scip_pb2.Document]:
-        """Analyze a single Zig file and generate complete SCIP document."""
-        # Read file content
-        content = self._read_file_content(file_path)
-        if not content:
-            return None
-
-        # Create SCIP document
-        document = scip_pb2.Document()
-        document.relative_path = self._get_relative_path(file_path, project_path)
-        document.language = "zig"
-
-        # Initialize position calculator
-        self.position_calculator = PositionCalculator(content)
-
-        if self.use_tree_sitter and self.parser:
-            # Parse with Tree-sitter
-            tree = self._parse_content(content)
-            if tree:
-                occurrences, symbols = self._analyze_tree_sitter_for_document(tree, document.relative_path, content, relationships)
-                document.occurrences.extend(occurrences)
-                document.symbols.extend(symbols)
-
-                logger.debug(f"Analyzed Zig file {document.relative_path}: "
-                             f"{len(document.occurrences)} occurrences, {len(document.symbols)} symbols")
-                return document
-
-            raise StrategyError(f"Failed to parse {document.relative_path} with tree-sitter for document analysis")
-
-        return document
-
-    def _parse_content(self, content: str) -> Optional:
-        """Parse content with tree-sitter parser."""
-        if not self.parser:
-            return None
-
-        try:
-            content_bytes = content.encode('utf-8')
-            return self.parser.parse(content_bytes)
-        except Exception as e:
-            logger.error(f"Failed to parse content with tree-sitter: {e}")
-            return None
-
-    def _build_symbol_relationships(self, files: List[str], project_path: str) -> Dict[str, List[tuple]]:
-        """
-        Build relationships between Zig symbols.
-
-        Args:
-            files: List of file paths to process
-            project_path: Project root path
-
-        Returns:
-            Dictionary mapping symbol_id -> [(target_symbol_id, relationship_type), ...]
-        """
-        logger.debug(f"ZigStrategy: Building symbol relationships for {len(files)} files")
-
-        all_relationships = {}
-
-        for file_path in files:
-            try:
-                file_relationships = self._extract_relationships_from_file(file_path, project_path)
-                all_relationships.update(file_relationships)
-            except Exception as e:
-                logger.warning(f"Failed to extract relationships from {file_path}: {e}")
-
-        total_symbols_with_relationships = len(all_relationships)
-        total_relationships = sum(len(rels) for rels in all_relationships.values())
-
-        logger.debug(f"ZigStrategy: Built {total_relationships} relationships for {total_symbols_with_relationships} symbols")
-        return all_relationships
-
-    def _extract_relationships_from_file(self, file_path: str, project_path: str) -> Dict[str, List[tuple]]:
-        """Extract relationships from a single Zig file."""
-        content = self._read_file_content(file_path)
-        if not content:
-            return {}
-
-        relative_path = self._get_relative_path(file_path, project_path)
-
-        if self.use_tree_sitter and self.parser:
-            tree = self._parse_content(content)
-            if tree:
-                return self._extract_relationships_from_tree_sitter(tree, relative_path, content)
-
-        raise StrategyError(f"Failed to parse {relative_path} with tree-sitter for relationship extraction")
-
-    # Tree-sitter based methods
-    def _collect_symbols_from_tree_sitter(self, tree, file_path: str, content: str) -> None:
-        """Collect symbols using Tree-sitter AST."""
-        scope_stack = []
-
-        def visit_node(node):
-            node_type = node.type
-
-            # Function declarations
-            if node_type == 'function_declaration':
-                self._register_function_symbol_ts(node, file_path, scope_stack, content)
-            # Struct declarations
-            elif node_type == 'struct_declaration':
-                self._register_struct_symbol_ts(node, file_path, scope_stack, content)
-            # Enum declarations
-            elif node_type == 'enum_declaration':
-                self._register_enum_symbol_ts(node, file_path, scope_stack, content)
-            # Const/var declarations
-            elif node_type in ['const_declaration', 'var_declaration']:
-                self._register_variable_symbol_ts(node, file_path, scope_stack, content)
-            # Test declarations
-            elif node_type == 'test_declaration':
-                self._register_test_symbol_ts(node, file_path, scope_stack, content)
-
-            # Recursively analyze child nodes
-            for child in node.children:
-                visit_node(child)
-
-        visit_node(tree.root_node)
-
-    def _analyze_tree_sitter_for_document(self, tree, file_path: str, content: str) -> tuple:
-        """Analyze Tree-sitter AST to generate SCIP occurrences and symbols."""
-        occurrences = []
-        symbols = []
-        scope_stack = []
-
-        def visit_node(node):
-            node_type = node.type
-
-            # Process different node types
-            if node_type == 'function_declaration':
-                occ, sym = self._process_function_ts(node, file_path, scope_stack, content)
-                if occ: occurrences.append(occ)
-                if sym: symbols.append(sym)
-            elif node_type == 'struct_declaration':
-                occ, sym = self._process_struct_ts(node, file_path, scope_stack, content)
-                if occ: occurrences.append(occ)
-                if sym: symbols.append(sym)
-            elif node_type == 'enum_declaration':
-                occ, sym = self._process_enum_ts(node, file_path, scope_stack, content)
-                if occ: occurrences.append(occ)
-                if sym: symbols.append(sym)
-            elif node_type in ['const_declaration', 'var_declaration']:
-                occ, sym = self._process_variable_ts(node, file_path, scope_stack, content)
-                if occ: occurrences.append(occ)
-                if sym: symbols.append(sym)
-            elif node_type == 'test_declaration':
-                occ, sym = self._process_test_ts(node, file_path, scope_stack, content)
-                if occ: occurrences.append(occ)
-                if sym: symbols.append(sym)
-            elif node_type == 'identifier':
-                occ = self._process_identifier_ts(node, file_path, scope_stack, content)
-                if occ: occurrences.append(occ)
-
-            # Recursively analyze child nodes
-            for child in node.children:
-                visit_node(child)
-
-        visit_node(tree.root_node)
-        return occurrences, symbols
-
-    def _extract_relationships_from_tree_sitter(self, tree, file_path: str, content: str) -> Dict[str, List[tuple]]:
-        """Extract relationships from Tree-sitter AST."""
-        relationships = {}
-        scope_stack = []
-
-        def visit_node(node):
-            node_type = node.type
-
-            if node_type in ['function_declaration', 'test_declaration']:
-                # Extract function call relationships within this function
-                function_name = self._get_function_name_ts(node, content)
-                if function_name:
-                    function_symbol_id = self.symbol_manager.create_local_symbol(
-                        language="zig",
-                        file_path=file_path,
-                        symbol_path=scope_stack + [function_name],
-                        descriptor="()."
-                    )
-
-                    # Find call expressions within this function
-                    self._extract_calls_from_node_ts(node, function_symbol_id, relationships, file_path, scope_stack, content)
-
-            # Recursively visit children
-            for child in node.children:
-                visit_node(child)
-
-        visit_node(tree.root_node)
-        return relationships
diff --git a/src/code_index_mcp/search/ag.py b/src/code_index_mcp/search/ag.py
index e2506a2..aa3eb33 100644
--- a/src/code_index_mcp/search/ag.py
+++ b/src/code_index_mcp/search/ag.py
@@ -27,7 +27,8 @@ def search(
         context_lines: int = 0,
         file_pattern: Optional[str] = None,
         fuzzy: bool = False,
-        regex: bool = False
+        regex: bool = False,
+        max_line_length: Optional[int] = None
     ) -> Dict[str, List[Tuple[int, str]]]:
         """
         Execute a search using The Silver Searcher (ag).
@@ -40,6 +41,7 @@ def search(
            file_pattern: File pattern to filter
            fuzzy: Enable word boundary matching (not true fuzzy search)
            regex: Enable regex pattern matching
+           max_line_length: Optional. Limit the length of lines when context_lines is used
        """
        # ag prints line numbers and groups by file by default, which is good.
        # --noheading is used to be consistent with other tools' output format.
@@ -93,6 +95,26 @@ def search(
             cmd.extend(['-G', regex_pattern])
 
+        processed_patterns = set()
+        exclude_dirs = getattr(self, 'exclude_dirs', [])
+        exclude_file_patterns = getattr(self, 'exclude_file_patterns', [])
+
+        for directory in exclude_dirs:
+            normalized = directory.strip()
+            if not normalized or normalized in processed_patterns:
+                continue
+            cmd.extend(['--ignore', normalized])
+            processed_patterns.add(normalized)
+
+        for pattern in exclude_file_patterns:
+            normalized = pattern.strip()
+            if not normalized or normalized in processed_patterns:
+                continue
+            if normalized.startswith('!'):
+                normalized = normalized[1:]
+            cmd.extend(['--ignore', normalized])
+            processed_patterns.add(normalized)
+
         # Add -- to treat pattern as a literal argument, preventing injection
         cmd.append('--')
         cmd.append(search_pattern)
@@ -116,10 +138,10 @@ def search(
             if process.returncode > 1:
                 raise RuntimeError(f"ag failed with exit code {process.returncode}: {process.stderr}")
 
-            return parse_search_output(process.stdout, base_path)
+            return parse_search_output(process.stdout, base_path, max_line_length)
 
         except FileNotFoundError:
             raise RuntimeError("'ag' (The Silver Searcher) not found. Please install it and ensure it's in your PATH.")
         except Exception as e:
             # Re-raise other potential exceptions like permission errors
-            raise RuntimeError(f"An error occurred while running ag: {e}")
+            raise RuntimeError(f"An error occurred while running ag: {e}")
diff --git a/src/code_index_mcp/search/base.py b/src/code_index_mcp/search/base.py
index 038e6b5..5e4c63b 100644
--- a/src/code_index_mcp/search/base.py
+++ b/src/code_index_mcp/search/base.py
@@ -10,17 +10,25 @@ import subprocess
 import sys
 from abc import ABC, abstractmethod
-from typing import Dict, List, Optional, Tuple, Any
+from typing import Any, Dict, List, Optional, Tuple, TYPE_CHECKING
 
 from ..indexing.qualified_names import normalize_file_path
 
-def parse_search_output(output: str, base_path: str) -> Dict[str, List[Tuple[int, str]]]:
+if TYPE_CHECKING:  # pragma: no cover
+    from ..utils.file_filter import FileFilter
+
+def parse_search_output(
+    output: str,
+    base_path: str,
+    max_line_length: Optional[int] = None
+) -> Dict[str, List[Tuple[int, str]]]:
     """
     Parse the output of command-line search tools (grep, ag, rg).
 
     Args:
         output: The raw output from the command-line tool.
         base_path: The base path of the project to make file paths relative.
+        max_line_length: Optional maximum line length to truncate long lines.
 
     Returns:
         A dictionary where keys are file paths and values are lists of
         (line_number, line_content) tuples.
@@ -33,26 +41,53 @@ def parse_search_output(output: str, base_path: str) -> Dict[str, List[Tuple[int
         if not line.strip():
             continue
         try:
-            # Handle Windows paths which might have a drive letter, e.g., C:
+            # Try to parse as a matched line first (format: path:linenum:content)
             parts = line.split(':', 2)
-            if sys.platform == "win32" and len(parts[0]) == 1 and parts[1].startswith('\\'):
-                # Re-join drive letter with the rest of the path
+
+            # Check if this might be a context line (format: path-linenum-content)
+            # Context lines use '-' as separator in grep/ag output
+            if len(parts) < 3 and '-' in line:
+                # Try to parse as context line
+                # Match pattern: path-linenum-content or path-linenum-\tcontent
+                match = re.match(r'^(.*?)-(\d+)[-\t](.*)$', line)
+                if match:
+                    file_path_abs = match.group(1)
+                    line_number_str = match.group(2)
+                    content = match.group(3)
+                else:
+                    # If regex doesn't match, skip this line
+                    continue
+            elif sys.platform == "win32" and len(parts) >= 3 and len(parts[0]) == 1 and parts[1].startswith('\\'):
+                # Handle Windows paths with drive letter (e.g., C:\path\file.txt)
                 file_path_abs = f"{parts[0]}:{parts[1]}"
                 line_number_str = parts[2].split(':', 1)[0]
-                content = parts[2].split(':', 1)[1]
-            else:
+                content = parts[2].split(':', 1)[1] if ':' in parts[2] else parts[2]
+            elif len(parts) >= 3:
+                # Standard format: path:linenum:content
                 file_path_abs = parts[0]
                 line_number_str = parts[1]
                 content = parts[2]
+            else:
+                # Line doesn't match any expected format
+                continue
 
             line_number = int(line_number_str)
-            # Make the file path relative to the base_path
-            relative_path = os.path.relpath(file_path_abs, normalized_base_path)
+            # If the path is already relative (doesn't start with /), keep it as is
+            # Otherwise, make it relative to the base_path
+            if os.path.isabs(file_path_abs):
+                relative_path = os.path.relpath(file_path_abs, normalized_base_path)
+            else:
+                # Path is already relative, use it as is
+                relative_path = file_path_abs
 
             # Normalize path separators for consistency
             relative_path = normalize_file_path(relative_path)
 
+            # Truncate content if it exceeds max_line_length
+            if max_line_length and len(content) > max_line_length:
+                content = content[:max_line_length] + '... (truncated)'
+
             if relative_path not in results:
                 results[relative_path] = []
             results[relative_path].append((line_number, content))
@@ -150,6 +185,16 @@ class SearchStrategy(ABC):
     Each strategy is responsible for searching code using a specific tool or method.
     """
 
+    def configure_excludes(self, file_filter: Optional['FileFilter']) -> None:
+        """Configure shared exclusion settings for the strategy."""
+        self.file_filter = file_filter
+        if file_filter:
+            self.exclude_dirs = sorted(set(file_filter.exclude_dirs))
+            self.exclude_file_patterns = sorted(set(file_filter.exclude_files))
+        else:
+            self.exclude_dirs = []
+            self.exclude_file_patterns = []
+
     @property
     @abstractmethod
     def name(self) -> str:
@@ -175,7 +220,8 @@ def search(
         context_lines: int = 0,
         file_pattern: Optional[str] = None,
         fuzzy: bool = False,
-        regex: bool = False
+        regex: bool = False,
+        max_line_length: Optional[int] = None
     ) -> Dict[str, List[Tuple[int, str]]]:
         """
         Execute a search using the specific strategy.
@@ -193,4 +239,3 @@ def search(
             A dictionary mapping filenames to lists of (line_number, line_content) tuples.
         """
         pass
-
diff --git a/src/code_index_mcp/search/basic.py b/src/code_index_mcp/search/basic.py
index 57aab77..9ef1846 100644
--- a/src/code_index_mcp/search/basic.py
+++ b/src/code_index_mcp/search/basic.py
@@ -1,9 +1,10 @@
 """
 Basic, pure-Python search strategy.
""" +import fnmatch import os import re -import fnmatch +from pathlib import Path from typing import Dict, List, Optional, Tuple from .base import SearchStrategy, create_word_boundary_pattern, is_safe_regex_pattern @@ -46,7 +47,8 @@ def search( context_lines: int = 0, file_pattern: Optional[str] = None, fuzzy: bool = False, - regex: bool = False + regex: bool = False, + max_line_length: Optional[int] = None ) -> Dict[str, List[Tuple[int, str]]]: """ Execute a basic, line-by-line search. @@ -60,6 +62,7 @@ def search( file_pattern: File pattern to filter fuzzy: Enable word boundary matching regex: Enable regex pattern matching + max_line_length: Optional. Limit the length of lines when context_lines is used """ results: Dict[str, List[Tuple[int, str]]] = {} @@ -81,28 +84,38 @@ def search( except re.error as e: raise ValueError(f"Invalid regex pattern: {pattern}, error: {e}") - for root, _, files in os.walk(base_path): + file_filter = getattr(self, 'file_filter', None) + base = Path(base_path) + + for root, dirs, files in os.walk(base_path): + if file_filter: + dirs[:] = [d for d in dirs if not file_filter.should_exclude_directory(d)] + for file in files: - # Improved file pattern matching with glob support if file_pattern and not self._matches_pattern(file, file_pattern): continue - file_path = os.path.join(root, file) + file_path = Path(root) / file + + if file_filter and not file_filter.should_process_path(file_path, base): + continue + rel_path = os.path.relpath(file_path, base_path) - + try: with open(file_path, 'r', encoding='utf-8', errors='ignore') as f: for line_num, line in enumerate(f, 1): if search_regex.search(line): + content = line.rstrip('\n') + if max_line_length and len(content) > max_line_length: + content = content[:max_line_length] + '... 
(truncated)' + if rel_path not in results: results[rel_path] = [] - # Strip newline for consistent output - results[rel_path].append((line_num, line.rstrip('\n'))) + results[rel_path].append((line_num, content)) except (UnicodeDecodeError, PermissionError, OSError): - # Ignore files that can't be opened or read due to encoding/permission issues continue except Exception: - # Ignore any other unexpected exceptions to maintain robustness continue - return results \ No newline at end of file + return results diff --git a/src/code_index_mcp/search/grep.py b/src/code_index_mcp/search/grep.py index cd2d18e..f24c469 100644 --- a/src/code_index_mcp/search/grep.py +++ b/src/code_index_mcp/search/grep.py @@ -32,7 +32,8 @@ def search( context_lines: int = 0, file_pattern: Optional[str] = None, fuzzy: bool = False, - regex: bool = False + regex: bool = False, + max_line_length: Optional[int] = None ) -> Dict[str, List[Tuple[int, str]]]: """ Execute a search using standard grep. @@ -45,6 +46,7 @@ def search( file_pattern: File pattern to filter fuzzy: Enable word boundary matching regex: Enable regex pattern matching + max_line_length: Optional. 
Limit the length of lines when context_lines is used """ # -r: recursive, -n: line number cmd = ['grep', '-r', '-n'] @@ -81,6 +83,27 @@ def search( # Note: grep's --include uses glob patterns, not regex cmd.append(f'--include={file_pattern}') + exclude_dirs = getattr(self, 'exclude_dirs', []) + exclude_file_patterns = getattr(self, 'exclude_file_patterns', []) + + processed_dirs = set() + for directory in exclude_dirs: + normalized = directory.strip() + if not normalized or normalized in processed_dirs: + continue + cmd.append(f'--exclude-dir={normalized}') + processed_dirs.add(normalized) + + processed_files = set() + for pattern in exclude_file_patterns: + normalized = pattern.strip() + if not normalized or normalized in processed_files: + continue + if normalized.startswith('!'): + normalized = normalized[1:] + cmd.append(f'--exclude={normalized}') + processed_files.add(normalized) + # Add -- to treat pattern as a literal argument, preventing injection cmd.append('--') cmd.append(search_pattern) @@ -102,9 +125,9 @@ def search( if process.returncode > 1: raise RuntimeError(f"grep failed with exit code {process.returncode}: {process.stderr}") - return parse_search_output(process.stdout, base_path) + return parse_search_output(process.stdout, base_path, max_line_length) except FileNotFoundError: raise RuntimeError("'grep' not found. 
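The exclusion flags added to the grep strategy above deduplicate after whitespace stripping and drop any leading `!` (grep's `--exclude` globs are never negated). A standalone sketch of that argument builder (function name is illustrative):

```python
def build_grep_exclude_args(exclude_dirs, exclude_file_patterns):
    """Build deduplicated --exclude-dir/--exclude flags, as the grep strategy does."""
    args = []
    processed_dirs = set()
    for directory in exclude_dirs:
        normalized = directory.strip()
        if not normalized or normalized in processed_dirs:
            continue
        args.append(f'--exclude-dir={normalized}')
        processed_dirs.add(normalized)
    processed_files = set()
    for pattern in exclude_file_patterns:
        normalized = pattern.strip()
        if not normalized or normalized in processed_files:
            continue
        if normalized.startswith('!'):
            normalized = normalized[1:]  # grep globs carry no negation prefix
        args.append(f'--exclude={normalized}')
        processed_files.add(normalized)
    return args
```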
Please install it and ensure it's in your PATH.") except Exception as e: - raise RuntimeError(f"An error occurred while running grep: {e}") + raise RuntimeError(f"An error occurred while running grep: {e}") diff --git a/src/code_index_mcp/search/ripgrep.py b/src/code_index_mcp/search/ripgrep.py index 15dd6c0..8a5c325 100644 --- a/src/code_index_mcp/search/ripgrep.py +++ b/src/code_index_mcp/search/ripgrep.py @@ -27,7 +27,8 @@ def search( context_lines: int = 0, file_pattern: Optional[str] = None, fuzzy: bool = False, - regex: bool = False + regex: bool = False, + max_line_length: Optional[int] = None ) -> Dict[str, List[Tuple[int, str]]]: """ Execute a search using ripgrep. @@ -40,6 +41,7 @@ def search( file_pattern: File pattern to filter fuzzy: Enable word boundary matching (not true fuzzy search) regex: Enable regex pattern matching + max_line_length: Optional. Limit the length of lines when context_lines is used """ cmd = ['rg', '--line-number', '--no-heading', '--color=never', '--no-ignore'] @@ -67,6 +69,31 @@ def search( if file_pattern: cmd.extend(['--glob', file_pattern]) + exclude_dirs = getattr(self, 'exclude_dirs', []) + exclude_file_patterns = getattr(self, 'exclude_file_patterns', []) + + processed_patterns = set() + + for directory in exclude_dirs: + normalized = directory.strip() + if not normalized or normalized in processed_patterns: + continue + cmd.extend(['--glob', f'!**/{normalized}/**']) + processed_patterns.add(normalized) + + for pattern in exclude_file_patterns: + normalized = pattern.strip() + if not normalized or normalized in processed_patterns: + continue + if normalized.startswith('!'): + glob_pattern = normalized + elif any(ch in normalized for ch in '*?[') or '/' in normalized: + glob_pattern = f'!{normalized}' + else: + glob_pattern = f'!**/{normalized}' + cmd.extend(['--glob', glob_pattern]) + processed_patterns.add(normalized) + # Add -- to treat pattern as a literal argument, preventing injection cmd.append('--') 
cmd.append(search_pattern) @@ -87,10 +114,10 @@ def search( if process.returncode > 1: raise RuntimeError(f"ripgrep failed with exit code {process.returncode}: {process.stderr}") - return parse_search_output(process.stdout, base_path) + return parse_search_output(process.stdout, base_path, max_line_length) except FileNotFoundError: raise RuntimeError("ripgrep (rg) not found. Please install it and ensure it's in your PATH.") except Exception as e: # Re-raise other potential exceptions like permission errors - raise RuntimeError(f"An error occurred while running ripgrep: {e}") + raise RuntimeError(f"An error occurred while running ripgrep: {e}") diff --git a/src/code_index_mcp/search/ugrep.py b/src/code_index_mcp/search/ugrep.py index 69f2cc4..d4302c1 100644 --- a/src/code_index_mcp/search/ugrep.py +++ b/src/code_index_mcp/search/ugrep.py @@ -27,7 +27,8 @@ def search( context_lines: int = 0, file_pattern: Optional[str] = None, fuzzy: bool = False, - regex: bool = False + regex: bool = False, + max_line_length: Optional[int] = None ) -> Dict[str, List[Tuple[int, str]]]: """ Execute a search using the 'ug' command-line tool. @@ -40,11 +41,12 @@ def search( file_pattern: File pattern to filter fuzzy: Enable true fuzzy search (ugrep native support) regex: Enable regex pattern matching + max_line_length: Optional. 
Limit the length of lines when context_lines is used """ if not self.is_available(): return {"error": "ugrep (ug) command not found."} - cmd = ['ug', '--line-number', '--no-heading'] + cmd = ['ug', '-r', '--line-number', '--no-heading'] if fuzzy: # ugrep has native fuzzy search support @@ -65,7 +67,31 @@ def search( cmd.extend(['-A', str(context_lines), '-B', str(context_lines)]) if file_pattern: - cmd.extend(['-g', file_pattern]) # Correct parameter for file patterns + cmd.extend(['--include', file_pattern]) + + processed_patterns = set() + exclude_dirs = getattr(self, 'exclude_dirs', []) + exclude_file_patterns = getattr(self, 'exclude_file_patterns', []) + + for directory in exclude_dirs: + normalized = directory.strip() + if not normalized or normalized in processed_patterns: + continue + cmd.extend(['--ignore', f'**/{normalized}/**']) + processed_patterns.add(normalized) + + for pattern in exclude_file_patterns: + normalized = pattern.strip() + if not normalized or normalized in processed_patterns: + continue + if normalized.startswith('!'): + ignore_pattern = normalized[1:] + elif any(ch in normalized for ch in '*?[') or '/' in normalized: + ignore_pattern = normalized + else: + ignore_pattern = f'**/{normalized}' + cmd.extend(['--ignore', ignore_pattern]) + processed_patterns.add(normalized) # Add '--' to treat pattern as a literal argument, preventing injection cmd.append('--') @@ -89,7 +115,7 @@ def search( error_output = process.stderr.strip() return {"error": f"ugrep execution failed with code {process.returncode}", "details": error_output} - return parse_search_output(process.stdout, base_path) + return parse_search_output(process.stdout, base_path, max_line_length) except FileNotFoundError: return {"error": "ugrep (ug) command not found. 
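The ugrep strategy uses the same branching, but its `--ignore` values are positive patterns, so a leading `!` is stripped rather than preserved. Sketched standalone (helper name is illustrative):

```python
def ugrep_ignore_value(pattern: str) -> str:
    """Map one exclusion entry to a ugrep --ignore value, per the branching above."""
    normalized = pattern.strip()
    if normalized.startswith('!'):
        return normalized[1:]         # ignore patterns are positive in ugrep
    if any(ch in normalized for ch in '*?[') or '/' in normalized:
        return normalized
    return f'**/{normalized}'         # bare name: ignore at any depth
```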
Please ensure it's installed and in your PATH."} diff --git a/src/code_index_mcp/server.py b/src/code_index_mcp/server.py index da36810..2d1eb80 100644 --- a/src/code_index_mcp/server.py +++ b/src/code_index_mcp/server.py @@ -13,10 +13,9 @@ import logging from contextlib import asynccontextmanager from dataclasses import dataclass -from typing import AsyncIterator, Dict, Any, Optional, List +from typing import AsyncIterator, Dict, Any, List # Third-party imports -from mcp import types from mcp.server.fastmcp import FastMCP, Context # Local imports @@ -24,7 +23,6 @@ from .services import ( SearchService, FileService, SettingsService, FileWatcherService ) -from .indexing.unified_index_manager import UnifiedIndexManager from .services.settings_service import manage_temp_directory from .services.file_discovery_service import FileDiscoveryService from .services.project_management_service import ProjectManagementService @@ -61,7 +59,6 @@ class CodeIndexerContext: base_path: str settings: ProjectSettings file_count: int = 0 - index_manager: Optional['UnifiedIndexManager'] = None file_watcher_service: FileWatcherService = None @asynccontextmanager @@ -88,10 +85,6 @@ async def indexer_lifespan(_server: FastMCP) -> AsyncIterator[CodeIndexerContext if context.file_watcher_service: context.file_watcher_service.stop_monitoring() - # Only save index if project path has been set - if context.base_path and context.index_manager: - context.index_manager.save_index() - # Create the MCP server with lifespan manager mcp = FastMCP("CodeIndexer", lifespan=indexer_lifespan, dependencies=["pathlib"]) @@ -112,13 +105,7 @@ def get_file_content(file_path: str) -> str: # Use FileService for simple file reading - this is appropriate for a resource return FileService(ctx).get_file_content(file_path) -@mcp.resource("structure://project") -@handle_mcp_resource_errors -def get_project_structure() -> str: - """Get the structure of the project as a JSON tree.""" - ctx = mcp.get_context() - return 
ProjectManagementService(ctx).get_project_structure() - +# Removed: structure://project resource - not necessary for most workflows # Removed: settings://stats resource - this information is available via get_settings_info() tool # and is more of a debugging/technical detail rather than context AI needs @@ -139,7 +126,8 @@ def search_code_advanced( context_lines: int = 0, file_pattern: str = None, fuzzy: bool = False, - regex: bool = None + regex: bool = None, + max_line_length: int = None ) -> Dict[str, Any]: """ Search for a code pattern in the project using an advanced, fast tool. @@ -153,6 +141,7 @@ def search_code_advanced( context_lines: Number of lines to show before and after the match. file_pattern: A glob pattern to filter files to search in (e.g., "*.py", "*.js", "test_*.py"). + max_line_length: Optional. Default None (no limit). Limits the length of lines when context_lines is used. All search tools now handle glob patterns consistently: - ugrep: Uses glob patterns (*.py, *.{js,ts}) - ripgrep: Uses glob patterns (*.py, *.{js,ts}) @@ -181,7 +170,8 @@ def search_code_advanced( context_lines=context_lines, file_pattern=file_pattern, fuzzy=fuzzy, - regex=regex + regex=regex, + max_line_length=max_line_length ) @mcp.tool() @@ -247,6 +237,16 @@ def refresh_index(ctx: Context) -> str: """ return IndexManagementService(ctx).rebuild_index() +@mcp.tool() +@handle_mcp_tool_errors(return_type='str') +def build_deep_index(ctx: Context) -> str: + """ + Build the deep index (full symbol extraction) for the current project. + + This performs a complete re-index and loads it into memory. 
+ """ + return IndexManagementService(ctx).rebuild_deep_index() + @mcp.tool() @handle_mcp_tool_errors(return_type='dict') def get_settings_info(ctx: Context) -> Dict[str, Any]: @@ -298,62 +298,7 @@ def configure_file_watcher( return SystemManagementService(ctx).configure_file_watcher(enabled, debounce_seconds, additional_exclude_patterns) # ----- PROMPTS ----- - -@mcp.prompt() -def analyze_code(file_path: str = "", query: str = "") -> list[types.PromptMessage]: - """Prompt for analyzing code in the project.""" - messages = [ - types.PromptMessage(role="user", content=types.TextContent(type="text", text=f"""I need you to analyze some code from my project. - -{f'Please analyze the file: {file_path}' if file_path else ''} -{f'I want to understand: {query}' if query else ''} - -First, let me give you some context about the project structure. Then, I'll provide the code to analyze. -""")), - types.PromptMessage( - role="assistant", - content=types.TextContent( - type="text", - text="I'll help you analyze the code. Let me first examine the project structure to get a better understanding of the codebase." - ) - ) - ] - return messages - -@mcp.prompt() -def code_search(query: str = "") -> types.TextContent: - """Prompt for searching code in the project.""" - search_text = "\"query\"" if not query else f"\"{query}\"" - return types.TextContent( - type="text", - text=f"""I need to search through my codebase for {search_text}. - -Please help me find all occurrences of this query and explain what each match means in its context. -Focus on the most relevant files and provide a brief explanation of how each match is used in the code. 
- -If there are too many results, prioritize the most important ones and summarize the patterns you see.""" - ) - -@mcp.prompt() -def set_project() -> list[types.PromptMessage]: - """Prompt for setting the project path.""" - messages = [ - types.PromptMessage(role="user", content=types.TextContent(type="text", text=""" - I need to analyze code from a project, but I haven't set the project path yet. Please help me set up the project path and index the code. - - First, I need to specify which project directory to analyze. - """)), - types.PromptMessage(role="assistant", content=types.TextContent(type="text", text=""" - Before I can help you analyze any code, we need to set up the project path. This is a required first step. - - Please provide the full path to your project folder. For example: - - Windows: "C:/Users/username/projects/my-project" - - macOS/Linux: "/home/username/projects/my-project" - - Once you provide the path, I'll use the `set_project_path` tool to configure the code analyzer to work with your project. - """)) - ] - return messages +# Removed: analyze_code, code_search, set_project prompts def main(): """Main function to run the MCP server.""" diff --git a/src/code_index_mcp/services/base_service.py b/src/code_index_mcp/services/base_service.py index f9931e7..a29e6bf 100644 --- a/src/code_index_mcp/services/base_service.py +++ b/src/code_index_mcp/services/base_service.py @@ -132,9 +132,9 @@ def index_provider(self): @property def index_manager(self): """ - Convenient access to the unified index manager. + Convenient access to the index manager. 
Returns: - The UnifiedIndexManager instance, or None if not available + The index manager instance, or None if not available """ return self.helper.index_manager diff --git a/src/code_index_mcp/services/code_intelligence_service.py b/src/code_index_mcp/services/code_intelligence_service.py index fec4b3f..af0f1a2 100644 --- a/src/code_index_mcp/services/code_intelligence_service.py +++ b/src/code_index_mcp/services/code_intelligence_service.py @@ -1,33 +1,32 @@ """ Code Intelligence Service - Business logic for code analysis and understanding. -This service handles the business logic for analyzing code files, extracting -intelligence, and providing comprehensive code insights. It composes technical -tools to achieve business goals. +This service handles the business logic for analyzing code files using the new +JSON-based indexing system optimized for LLM consumption. """ +import logging import os from typing import Dict, Any from .base_service import BaseService from ..tools.filesystem import FileSystemTool +from ..indexing import get_index_manager + +logger = logging.getLogger(__name__) class CodeIntelligenceService(BaseService): """ - Business service for code analysis and intelligence. + Business service for code analysis and intelligence using JSON indexing. - This service orchestrates code analysis workflows by composing - technical tools to achieve business goals like understanding code - structure, extracting insights, and providing comprehensive analysis. + This service provides comprehensive code analysis using the optimized + JSON-based indexing system for fast LLM-friendly responses. 
""" def __init__(self, ctx): super().__init__(ctx) self._filesystem_tool = FileSystemTool() - # Use new enhanced symbol analyzer instead of legacy SCIPQueryTool - from ..tools.scip.scip_symbol_analyzer import SCIPSymbolAnalyzer - self._symbol_analyzer = SCIPSymbolAnalyzer() def analyze_file(self, file_path: str) -> Dict[str, Any]: """ @@ -49,11 +48,29 @@ def analyze_file(self, file_path: str) -> Dict[str, Any]: # Business validation self._validate_analysis_request(file_path) - # Use enhanced SCIP analysis - analysis = self._perform_enhanced_scip_analysis(file_path) + # Use the global index manager + index_manager = get_index_manager() + + # Debug logging + logger.info(f"Getting file summary for: {file_path}") + logger.info(f"Index manager state - Project path: {index_manager.project_path}") + logger.info(f"Index manager state - Has builder: {index_manager.index_builder is not None}") + if index_manager.index_builder: + logger.info(f"Index manager state - Has index: {index_manager.index_builder.in_memory_index is not None}") + + # Get file summary from JSON index + summary = index_manager.get_file_summary(file_path) + logger.info(f"Summary result: {summary is not None}") + + # If deep index isn't available yet, return a helpful hint instead of error + if not summary: + return { + "status": "needs_deep_index", + "message": "Deep index not available. 
Please run build_deep_index before calling get_file_summary.", + "file_path": file_path + } - # Direct conversion to output format (no intermediate transformations) - return analysis.to_dict() + return summary def _validate_analysis_request(self, file_path: str) -> None: """ @@ -65,47 +82,23 @@ def _validate_analysis_request(self, file_path: str) -> None: Raises: ValueError: If validation fails """ - # Business rule: Project must be set up - self._require_project_setup() + # Business rule: Project must be set up OR auto-initialization must be possible + if self.base_path: + # Standard validation if project is set up in context + self._require_valid_file_path(file_path) + full_path = os.path.join(self.base_path, file_path) + if not os.path.exists(full_path): + raise ValueError(f"File does not exist: {file_path}") + else: + # Allow proceeding if auto-initialization might work + # The index manager will handle project discovery + logger.info("Project not set in context, relying on index auto-initialization") + + # Basic file path validation only + if not file_path or '..' in file_path: + raise ValueError(f"Invalid file path: {file_path}") - # Business rule: File path must be valid - self._require_valid_file_path(file_path) - # Business rule: File must exist - full_path = os.path.join(self.base_path, file_path) - if not os.path.exists(full_path): - raise ValueError(f"File does not exist: {file_path}") - - def _get_scip_tool(self): - """Get SCIP tool instance from the index manager.""" - if self.index_manager: - # Access the SCIP tool from unified index manager - return self.index_manager._get_scip_tool() - return None - - def _perform_enhanced_scip_analysis(self, file_path: str): - """ - Enhanced SCIP analysis using the new symbol analyzer. 
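The graceful-degradation pattern above — return a structured hint instead of raising when no deep index exists — is easy to exercise with a stand-in manager. `FakeManager` below is purely illustrative; the real manager comes from `get_index_manager()`:

```python
def summarize_file(index_manager, file_path: str) -> dict:
    """Return the file summary, or a 'needs_deep_index' hint when none exists."""
    summary = index_manager.get_file_summary(file_path)
    if not summary:
        return {
            "status": "needs_deep_index",
            "message": "Deep index not available. Please run build_deep_index "
                       "before calling get_file_summary.",
            "file_path": file_path,
        }
    return summary

class FakeManager:
    """Stand-in for the JSON index manager, for demonstration only."""
    def __init__(self, summaries):
        self._summaries = summaries
    def get_file_summary(self, path):
        return self._summaries.get(path)
```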
- - Args: - file_path: File path to analyze - - Returns: - FileAnalysis object with accurate symbol information - """ - # Get SCIP tool for index access - scip_tool = self._get_scip_tool() - if not scip_tool: - raise RuntimeError("SCIP tool is not available for file analysis") - - # Get raw SCIP index - scip_index = scip_tool.get_raw_index() - if not scip_index: - raise RuntimeError("SCIP index is not available for file analysis") - - # Use enhanced analyzer for accurate symbol analysis - return self._symbol_analyzer.analyze_file(file_path, scip_index) - diff --git a/src/code_index_mcp/services/file_discovery_service.py b/src/code_index_mcp/services/file_discovery_service.py index 0f03011..d777511 100644 --- a/src/code_index_mcp/services/file_discovery_service.py +++ b/src/code_index_mcp/services/file_discovery_service.py @@ -1,17 +1,15 @@ """ File Discovery Service - Business logic for intelligent file discovery. -This service handles the business logic for finding files in a project, -including pattern matching, relevance scoring, and result optimization. -It composes technical tools to achieve business goals. +This service handles the business logic for finding files using the new +JSON-based indexing system optimized for LLM consumption. """ from typing import Dict, Any, List, Optional from dataclasses import dataclass from .base_service import BaseService -from ..tools.filesystem import FileMatchingTool -from ..utils import ValidationHelper +from ..indexing import get_shallow_index_manager @dataclass @@ -26,24 +24,19 @@ class FileDiscoveryResult: class FileDiscoveryService(BaseService): """ - Business service for intelligent file discovery. + Business service for intelligent file discovery using JSON indexing. - This service orchestrates file discovery workflows by composing - technical tools to achieve business goals like finding relevant - files, optimizing search results, and providing meaningful metadata. 
+ This service provides fast file discovery using the optimized JSON + indexing system for efficient LLM-oriented responses. """ def __init__(self, ctx): super().__init__(ctx) - self._matcher_tool = FileMatchingTool() + self._index_manager = get_shallow_index_manager() def find_files(self, pattern: str, max_results: Optional[int] = None) -> List[str]: """ - Find files matching the given pattern using intelligent discovery. - - This is the main business method that orchestrates the file discovery - workflow, ensuring the index is available, applying business rules, - and optimizing results for the user. + Find files matching the given pattern using JSON indexing. Args: pattern: Glob pattern to search for (e.g., "*.py", "test_*.js") @@ -58,14 +51,14 @@ def find_files(self, pattern: str, max_results: Optional[int] = None) -> List[st # Business validation self._validate_discovery_request(pattern) - # Business logic: Ensure index is ready - self._ensure_index_available() - - # Business workflow: Execute discovery - discovery_result = self._execute_discovery_workflow(pattern, max_results) - - # Business result formatting - return self._format_discovery_result(discovery_result) + # Get files from JSON index + files = self._index_manager.find_files(pattern) + + # Apply max_results limit if specified + if max_results and len(files) > max_results: + files = files[:max_results] + + return files def _validate_discovery_request(self, pattern: str) -> None: """ @@ -83,213 +76,3 @@ def _validate_discovery_request(self, pattern: str) -> None: # Validate pattern if not pattern or not pattern.strip(): raise ValueError("Search pattern cannot be empty") - - # Business rule: Validate glob pattern - error = ValidationHelper.validate_glob_pattern(pattern) - if error: - raise ValueError(f"Invalid search pattern: {error}") - - def _ensure_index_available(self) -> None: - """ - Business logic to ensure index is available for discovery. 
- - Now uses unified index manager instead of direct SCIP tool access. - - Raises: - RuntimeError: If index cannot be made available - """ - # Business rule: Check if unified index manager is available - if not self.index_manager: - raise RuntimeError("Index manager not available. Please initialize project first.") - - # Business rule: Check if index provider is available - provider = self.index_provider - if provider and provider.is_available(): - return - - # Business logic: Initialize or refresh index - try: - if not self.index_manager.initialize(): - raise RuntimeError("Failed to initialize index manager") - - # Update context with file count - provider = self.index_provider - if provider: - file_count = len(provider.get_file_list()) - self.helper.update_file_count(file_count) - - except Exception as e: - raise RuntimeError(f"Failed to ensure index availability: {e}") from e - - def _execute_discovery_workflow(self, pattern: str, max_results: Optional[int]) -> FileDiscoveryResult: - """ - Execute the core file discovery business workflow. - - Args: - pattern: Search pattern - max_results: Maximum results limit - - Returns: - FileDiscoveryResult with discovery data - """ - # Get all indexed files through unified interface - provider = self.index_provider - if not provider: - raise RuntimeError("Index provider not available. 
Please initialize project first.") - - all_files = provider.get_file_list() - - # Apply pattern matching using technical tool - matched_files = self._matcher_tool.match_glob_pattern(all_files, pattern) - - # Business logic: Apply relevance sorting - sorted_files = self._matcher_tool.sort_by_relevance(matched_files, pattern) - - # Business logic: Apply result limits if specified - if max_results: - limited_files = self._matcher_tool.limit_results(sorted_files, max_results) - else: - limited_files = sorted_files - - # Business logic: Determine search strategy used - search_strategy = self._determine_search_strategy(pattern, len(all_files), len(matched_files)) - - # Extract file paths for result - file_paths = [file_info.relative_path for file_info in limited_files] - - # Gather business metadata - metadata = self._gather_discovery_metadata(all_files, matched_files, limited_files, pattern) - - return FileDiscoveryResult( - files=file_paths, - total_count=len(matched_files), - pattern_used=pattern, - search_strategy=search_strategy, - metadata=metadata - ) - - def _determine_search_strategy(self, pattern: str, total_files: int, matched_files: int) -> str: - """ - Business logic to determine what search strategy was most effective. - - Args: - pattern: Search pattern used - total_files: Total files in index - matched_files: Number of files matched - - Returns: - String describing the search strategy - """ - is_glob_pattern = '*' in pattern or '?' 
in pattern - - if is_glob_pattern: - # Glob pattern strategy determination - if matched_files == 0: - strategy = "glob_pattern_no_matches" - elif matched_files < 10: - strategy = "glob_pattern_focused" - elif matched_files > total_files * 0.5: # More than 50% of files matched - strategy = "glob_pattern_very_broad" - else: - strategy = "glob_pattern_broad" - else: - # Exact filename strategy determination - if matched_files == 0: - strategy = "exact_filename_not_found" - elif matched_files == 1: - strategy = "exact_filename_found" - else: - strategy = "exact_filename_multiple_matches" - - return strategy - - def _get_project_metadata_from_index_manager(self) -> Dict[str, Any]: - """ - Get project metadata from unified index manager. - - Returns: - Dictionary with project metadata, or default values if not available - """ - if self.index_manager: - try: - status = self.index_manager.get_index_status() - if status and status.get('metadata'): - metadata = status['metadata'] - return { - 'project_root': metadata.get('project_root', self.base_path), - 'total_files': status.get('file_count', 0), - 'tool_version': metadata.get('tool_version', 'unified-manager'), - 'languages': [] # Languages info not available in current IndexMetadata - } - elif status: - # Fallback to status info - return { - 'project_root': self.base_path, - 'total_files': status.get('file_count', 0), - 'tool_version': 'unified-manager', - 'languages': [] - } - except (AttributeError, KeyError, TypeError): - pass # Fall through to default if metadata access fails - - # Fallback to default metadata if index manager not available - return { - 'project_root': self.base_path, - 'total_files': 0, - 'tool_version': 'unknown', - 'languages': [] - } - - def _gather_discovery_metadata(self, all_files, matched_files, limited_files, pattern: str) -> Dict[str, Any]: - """ - Gather business metadata about the discovery operation. 
- - Args: - all_files: All files in index - matched_files: Files that matched the pattern - limited_files: Final limited result set - pattern: Search pattern used - - Returns: - Dictionary with business metadata - """ - # Get project metadata from unified index manager - project_metadata = self._get_project_metadata_from_index_manager() - - # Calculate business metrics - match_ratio = len(matched_files) / len(all_files) if all_files else 0 - - # Analyze file types in results - file_languages = {} - for file_info in matched_files: - lang = file_info.language - file_languages[lang] = file_languages.get(lang, 0) + 1 - - # Analyze pattern characteristics - pattern_type = 'glob' if ('*' in pattern or '?' in pattern) else 'exact' - pattern_complexity = 'simple' if pattern.count('*') <= 1 else 'complex' - - return { - 'total_indexed_files': len(all_files), - 'total_matches': len(matched_files), - 'returned_results': len(limited_files), - 'match_ratio': round(match_ratio, 3), - 'languages_found': file_languages, - 'project_languages': project_metadata.get('languages', []), - 'search_efficiency': 'high' if match_ratio < 0.1 else 'medium' if match_ratio < 0.5 else 'low', - 'pattern_type': pattern_type, - 'pattern_complexity': pattern_complexity, - 'original_pattern': pattern - } - - def _format_discovery_result(self, discovery_result: FileDiscoveryResult) -> List[str]: - """ - Format the discovery result according to business requirements. 
- - Args: - discovery_result: Raw discovery result - - Returns: - Simple list of file paths - """ - return discovery_result.files diff --git a/src/code_index_mcp/services/file_watcher_service.py b/src/code_index_mcp/services/file_watcher_service.py index 7526fdd..c2ef64c 100644 --- a/src/code_index_mcp/services/file_watcher_service.py +++ b/src/code_index_mcp/services/file_watcher_service.py @@ -11,7 +11,7 @@ import os import traceback from threading import Timer -from typing import Optional, Callable +from typing import Optional, Callable, List from pathlib import Path try: @@ -311,7 +311,7 @@ class DebounceEventHandler(FileSystemEventHandler): """ def __init__(self, debounce_seconds: float, rebuild_callback: Callable, - base_path: Path, logger: logging.Logger): + base_path: Path, logger: logging.Logger, additional_excludes: Optional[List[str]] = None): """ Initialize the debounce event handler. @@ -320,7 +320,10 @@ def __init__(self, debounce_seconds: float, rebuild_callback: Callable, rebuild_callback: Function to call when rebuild is needed base_path: Base project path for filtering logger: Logger instance for debug messages + additional_excludes: Additional patterns to exclude """ + from ..utils import FileFilter + super().__init__() self.debounce_seconds = debounce_seconds self.rebuild_callback = rebuild_callback @@ -328,18 +331,8 @@ def __init__(self, debounce_seconds: float, rebuild_callback: Callable, self.debounce_timer: Optional[Timer] = None self.logger = logger - # Exclusion patterns for directories and files to ignore - self.exclude_patterns = { - '.git', '.svn', '.hg', - 'node_modules', '__pycache__', '.venv', 'venv', - '.DS_Store', 'Thumbs.db', - 'dist', 'build', 'target', '.idea', '.vscode', - '.pytest_cache', '.coverage', '.tox', - 'bin', 'obj' # Additional build directories - } - - # Convert supported extensions to set for faster lookup - self.supported_extensions = set(SUPPORTED_EXTENSIONS) + # Use centralized file filtering + self.file_filter = 
FileFilter(additional_excludes) def on_any_event(self, event: FileSystemEvent) -> None: """ @@ -360,7 +353,7 @@ def on_any_event(self, event: FileSystemEvent) -> None: def should_process_event(self, event: FileSystemEvent) -> bool: """ - Determine if event should trigger index rebuild. + Determine if event should trigger index rebuild using centralized filtering. Args: event: The file system event to evaluate @@ -381,139 +374,23 @@ def should_process_event(self, event: FileSystemEvent) -> bool: else: target_path = event.src_path - # Fast path exclusion - check if path is in excluded directory before any processing - if self._is_path_in_excluded_directory(target_path): - return False - - # Unified path checking + # Use centralized filtering logic try: path = Path(target_path) - return self._should_process_path(path) - except Exception: - return False - - def _should_process_path(self, path: Path) -> bool: - """ - Check if a specific path should trigger index rebuild. - - Args: - path: The file path to check - - Returns: - True if path should trigger rebuild, False otherwise - """ - # Skip excluded paths - if self.is_excluded_path(path): - return False - - # Only process supported file types - if not self.is_supported_file_type(path): - return False - - # Skip temporary files - if self.is_temporary_file(path): - return False - - return True - - def _is_path_in_excluded_directory(self, file_path: str) -> bool: - """ - Fast check if a file path is within an excluded directory. - - This method performs a quick string-based check to avoid expensive - Path operations for files in excluded directories like .venv. 
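The debounce mechanism the handler relies on (reset a `threading.Timer` on every qualifying event so only the last event in a burst triggers a rebuild) can be sketched on its own; the class name `Debouncer` is illustrative:

```python
from threading import Timer

class Debouncer:
    """Collapse bursts of file-system events into one callback invocation."""
    def __init__(self, seconds: float, callback):
        self.seconds = seconds
        self.callback = callback
        self._timer = None

    def trigger(self) -> None:
        # Each new event cancels the pending timer and restarts the window,
        # so the callback fires only after `seconds` of quiet.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = Timer(self.seconds, self.callback)
        self._timer.daemon = True
        self._timer.start()
```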
- - Args: - file_path: The file path to check + should_process = self.file_filter.should_process_path(path, self.base_path) - Returns: - True if the path is in an excluded directory, False otherwise - """ - try: - # Normalize path separators for cross-platform compatibility - normalized_path = file_path.replace('\\', '/') - base_path_normalized = str(self.base_path).replace('\\', '/') - - # Get relative path string - if not normalized_path.startswith(base_path_normalized): - return True # Path outside project - exclude it + # Skip temporary files using centralized logic + if not should_process or self.file_filter.is_temporary_file(path): + return False - relative_path = normalized_path[len(base_path_normalized):].lstrip('/') - - # Quick check: if any excluded pattern appears as a path component - path_parts = relative_path.split('/') - for part in path_parts: - if part in self.exclude_patterns: - return True - - return False - except Exception: - # If any error occurs, err on the side of exclusion - return True - - def is_excluded_path(self, path: Path) -> bool: - """ - Check if path should be excluded from monitoring. - - Args: - path: The file path to check - - Returns: - True if path should be excluded, False otherwise - """ - try: - relative_path = path.relative_to(self.base_path) - parts = relative_path.parts - - # Check if any part of the path matches exclusion patterns - return any(part in self.exclude_patterns for part in parts) - except ValueError: - # Path is not relative to base_path - exclude it return True except Exception: - # Handle any other path processing issues - return True - - def is_supported_file_type(self, path: Path) -> bool: - """ - Check if file type is supported for indexing. - - Args: - path: The file path to check - - Returns: - True if file type is supported, False otherwise - """ - return path.suffix.lower() in self.supported_extensions - - def is_temporary_file(self, path: Path) -> bool: - """ - Check if file is a temporary file. 
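The temporary-file heuristics removed here now live in the centralized `FileFilter`; assuming it preserves the old handler's behavior, the predicate amounts to:

```python
def is_temporary_file(name: str) -> bool:
    """Heuristics matching the removed handler logic: editor swap and backup files."""
    name = name.lower()
    if name.endswith(('.tmp', '.swp', '.swo', '.bak', '.orig')):
        return True
    return '~' in name  # backup copies such as file.py~
```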
- - Args: - path: The file path to check + return False - Returns: - True if file appears to be temporary, False otherwise - """ - name = path.name.lower() - # Common temporary file patterns - temp_patterns = ['.tmp', '.swp', '.swo', '~', '.bak', '.orig'] - # Check for temporary file extensions - if any(name.endswith(pattern) for pattern in temp_patterns): - return True - # Check for vim/editor temporary files - if name.startswith('.') and (name.endswith('.swp') or name.endswith('.swo')): - return True - - # Check for backup files (e.g., file.py~, file.py.bak) - if '~' in name or '.bak' in name: - return True - return False def reset_debounce_timer(self) -> None: """Reset the debounce timer, canceling any existing timer.""" diff --git a/src/code_index_mcp/services/index_management_service.py b/src/code_index_mcp/services/index_management_service.py index 7c9f42b..f56c760 100644 --- a/src/code_index_mcp/services/index_management_service.py +++ b/src/code_index_mcp/services/index_management_service.py @@ -2,10 +2,12 @@ Index Management Service - Business logic for index lifecycle management. This service handles the business logic for index rebuilding, status monitoring, -and index-related operations. It composes technical tools to achieve business goals. +and index-related operations using the new JSON-based indexing system. """ import time import logging +import os +import json from typing import Dict, Any from dataclasses import dataclass @@ -13,8 +15,7 @@ logger = logging.getLogger(__name__) from .base_service import BaseService -from ..tools.scip import SCIPIndexTool -from ..tools.config import ProjectConfigTool +from ..indexing import get_index_manager, get_shallow_index_manager, DeepIndexManager @dataclass @@ -30,23 +31,24 @@ class IndexManagementService(BaseService): """ Business service for index lifecycle management. 
- This service orchestrates index management workflows by composing - technical tools to achieve business goals like rebuilding indexes, - monitoring index status, and managing index lifecycle. + This service orchestrates index management workflows using the new + JSON-based indexing system for optimal LLM performance. """ def __init__(self, ctx): super().__init__(ctx) - self._scip_tool = SCIPIndexTool() - self._config_tool = ProjectConfigTool() + # Deep manager (symbols/files, legacy JSON index manager) + self._index_manager = get_index_manager() + # Shallow manager (file-list only) for default workflows + self._shallow_manager = get_shallow_index_manager() + # Optional wrapper for explicit deep builds + self._deep_wrapper = DeepIndexManager() def rebuild_index(self) -> str: """ - Rebuild the project index using business logic. + Rebuild the project index (DEFAULT: shallow file list). - This is the main business method that orchestrates the index - rebuild workflow, ensuring proper validation, cleanup, and - state management. + For deep/symbol rebuilds, use build_deep_index() tool instead. Returns: Success message with rebuild information @@ -57,11 +59,17 @@ def rebuild_index(self) -> str: # Business validation self._validate_rebuild_request() - # Business workflow: Execute rebuild - result = self._execute_rebuild_workflow() + # Shallow rebuild only (fast path) + if not self._shallow_manager.set_project_path(self.base_path): + raise RuntimeError("Failed to set project path (shallow) in index manager") + if not self._shallow_manager.build_index(): + raise RuntimeError("Failed to rebuild shallow index") - # Business result formatting - return self._format_rebuild_result(result) + try: + count = len(self._shallow_manager.get_file_list()) + except Exception: + count = 0 + return f"Shallow index re-built with {count} files." 
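The rewritten `rebuild_index` above takes the shallow fast path by default. A sketch of that control flow with a stub manager standing in for the assumed `ShallowIndexManager` interface (`set_project_path` / `build_index` / `get_file_list`):

```python
# Stub mimicking the assumed shallow-index-manager interface from the diff;
# the real manager scans the project instead of returning a fixed list.
class ShallowIndexManager:
    def __init__(self):
        self._files = []

    def set_project_path(self, path):
        self._root = path
        return True

    def build_index(self):
        self._files = ['src/app.py', 'src/utils.py']  # stand-in for a real scan
        return True

    def get_file_list(self):
        return list(self._files)

def rebuild_index(manager, base_path):
    """Shallow rebuild fast path, as in the patched service method."""
    if not manager.set_project_path(base_path):
        raise RuntimeError("Failed to set project path (shallow) in index manager")
    if not manager.build_index():
        raise RuntimeError("Failed to rebuild shallow index")
    try:
        count = len(manager.get_file_list())
    except Exception:
        count = 0  # count is best-effort; the rebuild itself already succeeded
    return f"Shallow index re-built with {count} files."
```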
def get_rebuild_status(self) -> Dict[str, Any]: """ @@ -78,29 +86,20 @@ def get_rebuild_status(self) -> Dict[str, Any]: 'is_rebuilding': False } - # Get index availability status - try to load existing index first - if not self._scip_tool.is_index_available(): - self._scip_tool.load_existing_index(self.base_path) - is_available = self._scip_tool.is_index_available() - - # Get basic status information - status = { - 'status': 'ready' if is_available else 'needs_rebuild', - 'index_available': is_available, - 'is_rebuilding': False, # We don't track background rebuilds in this simplified version - 'project_path': self.base_path + # Get index stats from the new JSON system + stats = self._index_manager.get_index_stats() + + return { + 'status': 'ready' if stats.get('status') == 'loaded' else 'needs_rebuild', + 'index_available': stats.get('status') == 'loaded', + 'is_rebuilding': False, + 'project_path': self.base_path, + 'file_count': stats.get('indexed_files', 0), + 'total_symbols': stats.get('total_symbols', 0), + 'symbol_types': stats.get('symbol_types', {}), + 'languages': stats.get('languages', []) } - # Add file count if index is available - if is_available: - try: - status['file_count'] = self._scip_tool.get_file_count() - status['metadata'] = self._scip_tool.get_project_metadata() - except Exception as e: - status['error'] = f"Failed to get index metadata: {e}" - - return status - def _validate_rebuild_request(self) -> None: """ Validate the index rebuild request according to business rules. 
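The new `get_rebuild_status` body reduces to a mapping from index-manager stats to a status payload. A sketch of that mapping, assuming the stats dict shape used in the diff (`status`, `indexed_files`, `total_symbols`, etc.):

```python
def get_rebuild_status(base_path, stats):
    """Translate index-manager stats (assumed dict shape) into the status payload."""
    loaded = stats.get('status') == 'loaded'
    return {
        'status': 'ready' if loaded else 'needs_rebuild',
        'index_available': loaded,
        'is_rebuilding': False,  # background rebuilds are not tracked here
        'project_path': base_path,
        'file_count': stats.get('indexed_files', 0),
        'total_symbols': stats.get('total_symbols', 0),
        'symbol_types': stats.get('symbol_types', {}),
        'languages': stats.get('languages', []),
    }
```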
@@ -118,20 +117,19 @@ def _execute_rebuild_workflow(self) -> IndexRebuildResult: Returns: IndexRebuildResult with rebuild data """ - start_time = time.time() - # Business step 1: Clear existing index state - self._clear_existing_index() - - # Business step 2: Rebuild index using technical tool - file_count = self._rebuild_index_data() + # Set project path in index manager + if not self._index_manager.set_project_path(self.base_path): + raise RuntimeError("Failed to set project path in index manager") - # Business step 3: Update system state - self._update_index_state(file_count) + # Rebuild the index + if not self._index_manager.refresh_index(): + raise RuntimeError("Failed to rebuild index") - # Business step 4: Save updated configuration - self._save_rebuild_metadata() + # Get stats for result + stats = self._index_manager.get_index_stats() + file_count = stats.get('indexed_files', 0) rebuild_time = time.time() - start_time @@ -142,105 +140,59 @@ def _execute_rebuild_workflow(self) -> IndexRebuildResult: message=f"Index rebuilt successfully with {file_count} files" ) - def _clear_existing_index(self) -> None: - """Business logic to clear existing index state.""" - # Clear unified index manager - self.helper.clear_index_cache() + def _format_rebuild_result(self, result: IndexRebuildResult) -> str: + """ + Format the rebuild result according to business requirements. + + Args: + result: Rebuild result data - # No logging + Returns: + Formatted result string for MCP response + """ + return f"Project re-indexed. Found {result.file_count} files." - def _rebuild_index_data(self) -> int: + def build_shallow_index(self) -> str: """ - Business logic to rebuild index data using technical tools. + Build and persist the shallow index (file list only). Returns: - Number of files indexed + Success message including file count if available. 
Raises: - RuntimeError: If rebuild fails + ValueError/RuntimeError on validation or build failure """ - try: - # Business logic: Manual rebuild through unified manager - if not self.index_manager: - raise RuntimeError("Index manager not available") - - # Force rebuild - success = self.index_manager.refresh_index(force=True) - if not success: - raise RuntimeError("Index rebuild failed") - - # Get file count from provider - provider = self.index_provider - if provider: - file_count = len(provider.get_file_list()) - - # Save the rebuilt index - if not self.index_manager.save_index(): - logger.warning("Manual rebuild: Index built but save failed") - - return file_count - else: - raise RuntimeError("No index provider available after rebuild") - - except Exception as e: - raise RuntimeError(f"Failed to rebuild index: {e}") from e - - def _update_index_state(self, file_count: int) -> None: - """Business logic to update system state after rebuild.""" - # No logging - - # Update context with new file count - self.helper.update_file_count(file_count) - - # No logging - - def _save_rebuild_metadata(self) -> None: - """Business logic to save SCIP index and metadata.""" - - try: - # Initialize config tool if needed - if not self._config_tool.get_project_path(): - self._config_tool.initialize_settings(self.base_path) - - # Get the SCIP index from the tool - scip_index = self._scip_tool.get_raw_index() - if scip_index is None: - raise RuntimeError("No SCIP index available to save") - - # Save the actual SCIP protobuf index - settings = self._config_tool._settings - settings.save_scip_index(scip_index) - # Also save legacy JSON metadata for compatibility - index_data = { - 'index_metadata': { - 'version': '4.0-scip', - 'source_format': 'scip', - 'last_rebuilt': time.time(), - 'rebuild_trigger': 'manual' - }, - 'project_metadata': self._scip_tool.get_project_metadata() - } - - # Save metadata (legacy format) - self._config_tool.save_index_data(index_data) - - # Update project 
configuration - config = self._config_tool.create_default_config(self.base_path) - config['last_indexed'] = time.time() - self._config_tool.save_project_config(config) + # Ensure project is set up + self._require_project_setup() - except Exception: - pass + # Initialize manager with current base path + if not self._shallow_manager.set_project_path(self.base_path): + raise RuntimeError("Failed to set project path in index manager") - def _format_rebuild_result(self, result: IndexRebuildResult) -> str: - """ - Format the rebuild result according to business requirements. + # Build shallow index + if not self._shallow_manager.build_index(): + raise RuntimeError("Failed to build shallow index") - Args: - result: Rebuild result data + # Try to report count + count = 0 + try: + shallow_path = getattr(self._shallow_manager, 'index_path', None) + if shallow_path and os.path.exists(shallow_path): + with open(shallow_path, 'r', encoding='utf-8') as f: + data = json.load(f) + if isinstance(data, list): + count = len(data) + except Exception as e: # noqa: BLE001 - safe fallback to zero + logger.debug(f"Unable to read shallow index count: {e}") + + return f"Shallow index built{f' with {count} files' if count else ''}." + + def rebuild_deep_index(self) -> str: + """Rebuild the deep index using the original workflow.""" + # Business validation + self._validate_rebuild_request() - Returns: - Formatted result string for MCP response - """ - return f"Project re-indexed. Found {result.file_count} files." 
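The count-reporting step in `build_shallow_index` above assumes the persisted shallow index is a JSON list of file paths. A self-contained sketch of that read-with-safe-fallback logic:

```python
import json
import os

def shallow_index_file_count(index_path):
    """Count entries in a shallow index file (assumed: a JSON list of paths).

    Any failure falls back to zero, mirroring the service's safe default.
    """
    count = 0
    try:
        if index_path and os.path.exists(index_path):
            with open(index_path, 'r', encoding='utf-8') as f:
                data = json.load(f)
            if isinstance(data, list):
                count = len(data)
    except Exception:
        pass  # unreadable or malformed index: report zero rather than fail
    return count
```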
+ # Deep rebuild via existing workflow + result = self._execute_rebuild_workflow() + return self._format_rebuild_result(result) diff --git a/src/code_index_mcp/services/project_management_service.py b/src/code_index_mcp/services/project_management_service.py index ac3013b..c0f3a63 100644 --- a/src/code_index_mcp/services/project_management_service.py +++ b/src/code_index_mcp/services/project_management_service.py @@ -2,30 +2,20 @@ Project Management Service - Business logic for project lifecycle management. This service handles the business logic for project initialization, configuration, -and lifecycle management. It composes technical tools to achieve business goals. +and lifecycle management using the new JSON-based indexing system. """ -import json import logging from typing import Dict, Any from dataclasses import dataclass from contextlib import contextmanager from .base_service import BaseService -from ..tools.config import ProjectConfigTool from ..utils.response_formatter import ResponseFormatter from ..constants import SUPPORTED_EXTENSIONS -from ..indexing.unified_index_manager import UnifiedIndexManager +from ..indexing import get_index_manager, get_shallow_index_manager logger = logging.getLogger(__name__) -# Optional SCIP tools import -try: - from ..tools.scip import SCIPIndexTool - SCIP_AVAILABLE = True -except ImportError: - SCIPIndexTool = None - SCIP_AVAILABLE = False - @dataclass class ProjectInitializationResult: @@ -49,12 +39,16 @@ class ProjectManagementService(BaseService): def __init__(self, ctx): super().__init__(ctx) + # Deep index manager (legacy full index) + self._index_manager = get_index_manager() + # Shallow index manager (default for initialization) + self._shallow_manager = get_shallow_index_manager() + from ..tools.config import ProjectConfigTool self._config_tool = ProjectConfigTool() - self._scip_tool = SCIPIndexTool() if SCIP_AVAILABLE else None # Import FileWatcherTool locally to avoid circular import from ..tools.monitoring 
import FileWatcherTool self._watcher_tool = FileWatcherTool(ctx) - + @contextmanager def _noop_operation(self, *_args, **_kwargs): @@ -111,22 +105,25 @@ def _execute_initialization_workflow(self, path: str) -> ProjectInitializationRe Returns: ProjectInitializationResult with initialization data """ + # Business step 1: Initialize config tool + self._config_tool.initialize_settings(path) + # Normalize path for consistent processing normalized_path = self._config_tool.normalize_project_path(path) - # Business step 1: Cleanup existing project state + # Business step 2: Cleanup existing project state self._cleanup_existing_project() - # Business step 2: Initialize project configuration - self._initialize_project_configuration(normalized_path) + # Business step 3: Initialize shallow index by default (fast path) + index_result = self._initialize_shallow_index_manager(normalized_path) - # Business step 3: Initialize unified index manager - index_result = self._initialize_index_manager(normalized_path) + # Business step 3.1: Store index manager in context for other services + self.helper.update_index_manager(self._index_manager) # Business step 4: Setup file monitoring monitoring_result = self._setup_file_monitoring(normalized_path) - # Business step 5: Update system state + # Business step 4: Update system state self._update_project_state(normalized_path, index_result['file_count']) # Business step 6: Get search capabilities info @@ -150,25 +147,48 @@ def _cleanup_existing_project(self) -> None: # Clear existing index cache self.helper.clear_index_cache() - # Clear SCIP tool state - self._scip_tool.clear_index() + # Clear any existing index state + pass - def _initialize_project_configuration(self, project_path: str) -> None: - """Business logic to initialize project configuration.""" - with self._noop_operation(): + def _initialize_json_index_manager(self, project_path: str) -> Dict[str, Any]: + """ + Business logic to initialize JSON index manager. 
- # Initialize settings using config tool - settings = self._config_tool.initialize_settings(project_path) + Args: + project_path: Project path - # Update context with new settings - self.helper.update_settings(settings) - self.helper.update_base_path(project_path) + Returns: + Dictionary with initialization results + """ + # Set project path in index manager + if not self._index_manager.set_project_path(project_path): + raise RuntimeError(f"Failed to set project path: {project_path}") + + # Update context + self.helper.update_base_path(project_path) + + # Try to load existing index or build new one + if self._index_manager.load_index(): + source = "loaded_existing" + else: + if not self._index_manager.build_index(): + raise RuntimeError("Failed to build index") + source = "built_new" - self._config_tool.get_settings_path() + # Get stats + stats = self._index_manager.get_index_stats() + file_count = stats.get('indexed_files', 0) + + return { + 'file_count': file_count, + 'source': source, + 'total_symbols': stats.get('total_symbols', 0), + 'languages': stats.get('languages', []) + } - def _initialize_index_manager(self, project_path: str) -> Dict[str, Any]: + def _initialize_shallow_index_manager(self, project_path: str) -> Dict[str, Any]: """ - Business logic to initialize unified index manager. + Business logic to initialize the shallow index manager by default. 
Args: project_path: Project path @@ -176,28 +196,35 @@ def _initialize_index_manager(self, project_path: str) -> Dict[str, Any]: Returns: Dictionary with initialization results """ - with self._noop_operation(): - # Create unified index manager - index_manager = UnifiedIndexManager(project_path, self.helper.settings) - - # Store in context - self.helper.update_index_manager(index_manager) - - # Initialize the manager (this will load existing or build new) - if index_manager.initialize(): - provider = index_manager.get_provider() - if provider: - file_count = len(provider.get_file_list()) - return { - 'file_count': file_count, - 'source': 'unified_manager' - } - - # Fallback if initialization fails - return { - 'file_count': 0, - 'source': 'failed' - } + # Set project path in shallow manager + if not self._shallow_manager.set_project_path(project_path): + raise RuntimeError(f"Failed to set project path (shallow): {project_path}") + + # Update context + self.helper.update_base_path(project_path) + + # Try to load existing shallow index or build new one + if self._shallow_manager.load_index(): + source = "loaded_existing" + else: + if not self._shallow_manager.build_index(): + raise RuntimeError("Failed to build shallow index") + source = "built_new" + + # Determine file count from shallow list + try: + files = self._shallow_manager.get_file_list() + file_count = len(files) + except Exception: # noqa: BLE001 - safe fallback + file_count = 0 + + return { + 'file_count': file_count, + 'source': source, + 'total_symbols': 0, + 'languages': [] + } + def _is_valid_existing_index(self, index_data: Dict[str, Any]) -> bool: """ @@ -230,7 +257,7 @@ def _load_existing_index(self, index_data: Dict[str, Any]) -> Dict[str, Any]: Returns: Dictionary with loading results """ - + # Note: Legacy index loading is now handled by UnifiedIndexManager # This method is kept for backward compatibility but functionality moved @@ -238,60 +265,13 @@ def _load_existing_index(self, index_data: 
Dict[str, Any]) -> Dict[str, Any]: # Extract file count from metadata file_count = index_data.get('project_metadata', {}).get('total_files', 0) - + return { 'file_count': file_count, 'source': 'loaded_existing' } - def _build_new_index(self, project_path: str) -> Dict[str, Any]: - """ - Business logic to build new project index. - - Args: - project_path: Project path to index - - Returns: - Dictionary with build results - """ - - - try: - # Use SCIP tool to build index - file_count = self._scip_tool.build_index(project_path) - - # Save the new index using config tool - # Note: This is a simplified approach - in a full implementation, - # we would need to convert SCIP data to the expected format - index_data = { - 'index_metadata': { - 'version': '4.0-scip', - 'source_format': 'scip', - 'created_at': __import__('time').time() - }, - 'project_metadata': { - 'project_root': project_path, - 'total_files': file_count, - 'tool_version': 'scip-builder' - } - } - - self._config_tool.save_index_data(index_data) - - # Save project configuration - config = self._config_tool.create_default_config(project_path) - self._config_tool.save_project_config(config) - - # No logging - - return { - 'file_count': file_count, - 'source': 'built_new' - } - - except Exception as e: - raise ValueError(f"Failed to build project index: {e}") from e def _setup_file_monitoring(self, project_path: str) -> str: """ @@ -303,25 +283,31 @@ def _setup_file_monitoring(self, project_path: str) -> str: Returns: String describing monitoring setup result """ - + try: - # Create rebuild callback that uses our SCIP tool + # Create rebuild callback that uses the JSON index manager def rebuild_callback(): logger.info("File watcher triggered rebuild callback") try: - logger.debug(f"Starting index rebuild for: {project_path}") - # Business logic: File changed, rebuild through unified manager - if self.helper.index_manager: - success = self.helper.index_manager.refresh_index(force=True) - if success: - provider = 
self.helper.index_manager.get_provider() - file_count = len(provider.get_file_list()) if provider else 0 - logger.info(f"File watcher rebuild completed successfully - indexed {file_count} files") + logger.debug(f"Starting shallow index rebuild for: {project_path}") + # Business logic: File changed, rebuild using SHALLOW index manager + try: + if not self._shallow_manager.set_project_path(project_path): + logger.warning("Shallow manager set_project_path failed") + return False + if self._shallow_manager.build_index(): + files = self._shallow_manager.get_file_list() + logger.info(f"File watcher shallow rebuild completed successfully - files {len(files)}") return True - - logger.warning("File watcher rebuild failed - no index manager available") - return False + else: + logger.warning("File watcher shallow rebuild failed") + return False + except Exception as e: + import traceback + logger.error(f"File watcher shallow rebuild failed: {e}") + logger.error(f"Traceback: {traceback.format_exc()}") + return False except Exception as e: import traceback logger.error(f"File watcher rebuild failed: {e}") @@ -347,7 +333,7 @@ def rebuild_callback(): def _update_project_state(self, project_path: str, file_count: int) -> None: """Business logic to update system state after project initialization.""" - + # Update context with file count self.helper.update_file_count(file_count) @@ -422,39 +408,4 @@ def get_project_config(self) -> str: return ResponseFormatter.config_response(config_data) - def get_project_structure(self) -> str: - """ - Get the project directory structure for MCP resource. - - Returns: - JSON formatted project structure - """ - - # Check if project is configured - if not self.helper.base_path: - structure_data = { - "status": "not_configured", - "message": ("Project path not set. 
Please use set_project_path " - "to set a project directory first.") - } - return json.dumps(structure_data, indent=2) - - # Check if we have index cache with directory tree - if (hasattr(self.ctx.request_context.lifespan_context, 'index_cache') and - self.ctx.request_context.lifespan_context.index_cache and - 'directory_tree' in self.ctx.request_context.lifespan_context.index_cache): - - directory_tree = self.ctx.request_context.lifespan_context.index_cache['directory_tree'] - return json.dumps(directory_tree, indent=2) - - # If no directory tree available, try to build basic structure - try: - # Use config tool to get basic project structure - basic_structure = self._config_tool.get_basic_project_structure(self.helper.base_path) - return json.dumps(basic_structure, indent=2) - except Exception as e: - error_data = { - "error": f"Unable to get project structure: {e}", - "status": "error" - } - return json.dumps(error_data, indent=2) + # Removed: get_project_structure; the project structure resource is deprecated diff --git a/src/code_index_mcp/services/search_service.py b/src/code_index_mcp/services/search_service.py index 7daa3c9..a2c2799 100644 --- a/src/code_index_mcp/services/search_service.py +++ b/src/code_index_mcp/services/search_service.py @@ -5,24 +5,20 @@ and search strategy selection. """ -from typing import Dict, Any, Optional +from pathlib import Path +from typing import Any, Dict, List, Optional from .base_service import BaseService -from ..utils import ValidationHelper, ResponseFormatter +from ..utils import FileFilter, ResponseFormatter, ValidationHelper from ..search.base import is_safe_regex_pattern class SearchService(BaseService): - """ - Service for managing code search operations. 
- - This service handles: - - Code search with various parameters and options - - Search tool management and detection - - Search strategy selection and optimization - - Search capabilities reporting - """ + """Service for managing code search operations.""" + def __init__(self, ctx): + super().__init__(ctx) + self.file_filter = self._create_file_filter() def search_code( # pylint: disable=too-many-arguments self, @@ -31,47 +27,24 @@ def search_code( # pylint: disable=too-many-arguments context_lines: int = 0, file_pattern: Optional[str] = None, fuzzy: bool = False, - regex: Optional[bool] = None + regex: Optional[bool] = None, + max_line_length: Optional[int] = None ) -> Dict[str, Any]: - """ - Search for code patterns in the project. - - Handles the logic for search_code_advanced MCP tool. - - Args: - pattern: The search pattern - case_sensitive: Whether search should be case-sensitive - context_lines: Number of context lines to show - file_pattern: Glob pattern to filter files - fuzzy: Whether to enable fuzzy matching - regex: Regex mode - True/False to force, None for auto-detection - - Returns: - Dictionary with search results or error information - - Raises: - ValueError: If project is not set up or search parameters are invalid - """ + """Search for code patterns in the project.""" self._require_project_setup() - # Smart regex detection if regex parameter is None if regex is None: regex = is_safe_regex_pattern(pattern) - if regex: - pass - # Validate search pattern error = ValidationHelper.validate_search_pattern(pattern, regex) if error: raise ValueError(error) - # Validate file pattern if provided if file_pattern: error = ValidationHelper.validate_glob_pattern(file_pattern) if error: raise ValueError(f"Invalid file pattern: {error}") - # Get search strategy from settings if not self.settings: raise ValueError("Settings not available") @@ -79,7 +52,7 @@ def search_code( # pylint: disable=too-many-arguments if not strategy: raise ValueError("No search 
strategies available") - + self._configure_strategy(strategy) try: results = strategy.search( @@ -89,25 +62,16 @@ def search_code( # pylint: disable=too-many-arguments context_lines=context_lines, file_pattern=file_pattern, fuzzy=fuzzy, - regex=regex + regex=regex, + max_line_length=max_line_length ) - return ResponseFormatter.search_results_response(results) - except Exception as e: - raise ValueError(f"Search failed using '{strategy.name}': {e}") from e - + filtered = self._filter_results(results) + return ResponseFormatter.search_results_response(filtered) + except Exception as exc: + raise ValueError(f"Search failed using '{strategy.name}': {exc}") from exc def refresh_search_tools(self) -> str: - """ - Refresh the available search tools. - - Handles the logic for refresh_search_tools MCP tool. - - Returns: - Success message with available tools information - - Raises: - ValueError: If refresh operation fails - """ + """Refresh the available search tools.""" if not self.settings: raise ValueError("Settings not available") @@ -118,14 +82,8 @@ def refresh_search_tools(self) -> str: preferred = config['preferred_tool'] return f"Search tools refreshed. Available: {available}. Preferred: {preferred}." - def get_search_capabilities(self) -> Dict[str, Any]: - """ - Get information about search capabilities and available tools. 
- - Returns: - Dictionary with search tool information and capabilities - """ + """Get information about search capabilities and available tools.""" if not self.settings: return {"error": "Settings not available"} @@ -142,3 +100,73 @@ def get_search_capabilities(self) -> Dict[str, Any]: } return capabilities + + def _configure_strategy(self, strategy) -> None: + """Apply shared exclusion configuration to the strategy if supported.""" + configure = getattr(strategy, 'configure_excludes', None) + if not configure: + return + + try: + configure(self.file_filter) + except Exception: # pragma: no cover - defensive fallback + pass + + def _create_file_filter(self) -> FileFilter: + """Build a shared file filter drawing from project settings.""" + additional_dirs: List[str] = [] + additional_file_patterns: List[str] = [] + + settings = self.settings + if settings: + try: + config = settings.get_file_watcher_config() + except Exception: # pragma: no cover - fallback if config fails + config = {} + + for key in ('exclude_patterns', 'additional_exclude_patterns'): + patterns = config.get(key) or [] + for pattern in patterns: + if not isinstance(pattern, str): + continue + normalized = pattern.strip() + if not normalized: + continue + additional_dirs.append(normalized) + additional_file_patterns.append(normalized) + + file_filter = FileFilter(additional_dirs or None) + + if additional_file_patterns: + file_filter.exclude_files.update(additional_file_patterns) + + return file_filter + + def _filter_results(self, results: Dict[str, Any]) -> Dict[str, Any]: + """Filter out matches that reside under excluded paths.""" + if not isinstance(results, dict) or not results: + return results + + if 'error' in results or not self.file_filter or not self.base_path: + return results + + base_path = Path(self.base_path) + filtered: Dict[str, Any] = {} + + for rel_path, matches in results.items(): + if not isinstance(rel_path, str): + continue + + normalized = Path(rel_path.replace('\\', 
'/')) + try: + absolute = (base_path / normalized).resolve() + except Exception: # pragma: no cover - invalid path safety + continue + + try: + if self.file_filter.should_process_path(absolute, base_path): + filtered[rel_path] = matches + except Exception: # pragma: no cover - defensive fallback + continue + + return filtered diff --git a/src/code_index_mcp/services/settings_service.py b/src/code_index_mcp/services/settings_service.py index 74b21ff..bd641c4 100644 --- a/src/code_index_mcp/services/settings_service.py +++ b/src/code_index_mcp/services/settings_service.py @@ -13,6 +13,7 @@ from ..utils import ResponseFormatter from ..constants import SETTINGS_DIR from ..project_settings import ProjectSettings +from ..indexing import get_index_manager def manage_temp_directory(action: str) -> Dict[str, Any]: @@ -34,7 +35,12 @@ def manage_temp_directory(action: str) -> Dict[str, Any]: if action not in ['create', 'check']: raise ValueError(f"Invalid action: {action}. Must be 'create' or 'check'") - temp_dir = os.path.join(tempfile.gettempdir(), SETTINGS_DIR) + # Try to get the actual temp directory from index manager, fallback to default + try: + index_manager = get_index_manager() + temp_dir = index_manager.temp_dir if index_manager.temp_dir else os.path.join(tempfile.gettempdir(), SETTINGS_DIR) + except: + temp_dir = os.path.join(tempfile.gettempdir(), SETTINGS_DIR) if action == 'create': existed_before = os.path.exists(temp_dir) @@ -118,13 +124,17 @@ def get_settings_info(self) -> Dict[str, Any]: Dictionary with settings directory, config, stats, and status information """ temp_dir = os.path.join(tempfile.gettempdir(), SETTINGS_DIR) + + # Get the actual index directory from the index manager + index_manager = get_index_manager() + actual_temp_dir = index_manager.temp_dir if index_manager.temp_dir else temp_dir # Check if base_path is set if not self.base_path: return ResponseFormatter.settings_info_response( settings_directory="", - temp_directory=temp_dir, - 
temp_directory_exists=os.path.exists(temp_dir), + temp_directory=actual_temp_dir, + temp_directory_exists=os.path.exists(actual_temp_dir), config={}, stats={}, exists=False, @@ -136,13 +146,13 @@ def get_settings_info(self) -> Dict[str, Any]: # Get config and stats config = self.settings.load_config() if self.settings else {} stats = self.settings.get_stats() if self.settings else {} - settings_directory = self.settings.settings_path if self.settings else "" + settings_directory = actual_temp_dir exists = os.path.exists(settings_directory) if settings_directory else False return ResponseFormatter.settings_info_response( settings_directory=settings_directory, - temp_directory=temp_dir, - temp_directory_exists=os.path.exists(temp_dir), + temp_directory=actual_temp_dir, + temp_directory_exists=os.path.exists(actual_temp_dir), config=config, stats=stats, exists=exists diff --git a/src/code_index_mcp/tools/__init__.py b/src/code_index_mcp/tools/__init__.py index 7242df9..f69d664 100644 --- a/src/code_index_mcp/tools/__init__.py +++ b/src/code_index_mcp/tools/__init__.py @@ -6,14 +6,11 @@ business layer to achieve business goals. 
 """
-from .scip import SCIPIndexTool, SCIPSymbolAnalyzer
 from .filesystem import FileMatchingTool, FileSystemTool
 from .config import ProjectConfigTool, SettingsTool
 from .monitoring import FileWatcherTool

 __all__ = [
-    'SCIPIndexTool',
-    'SCIPSymbolAnalyzer',
     'FileMatchingTool',
     'FileSystemTool',
     'ProjectConfigTool',
diff --git a/src/code_index_mcp/tools/config/project_config_tool.py b/src/code_index_mcp/tools/config/project_config_tool.py
index 812dd93..c2738dd 100644
--- a/src/code_index_mcp/tools/config/project_config_tool.py
+++ b/src/code_index_mcp/tools/config/project_config_tool.py
@@ -9,7 +9,6 @@ from pathlib import Path

 from ...project_settings import ProjectSettings
-from ...constants import SUPPORTED_EXTENSIONS


 class ProjectConfigTool:
@@ -96,12 +95,12 @@ def save_index_data(self, index_data: Dict[str, Any]) -> None:
         self._settings.save_index(index_data)

-    def check_index_version(self) -> Optional[str]:
+    def check_index_version(self) -> bool:
         """
-        Check the version of existing index.
+        Check whether the JSON index is current.

         Returns:
-            Version string or None if no index exists
+            True if the JSON index exists and is up to date, False if a rebuild is needed

         Raises:
             RuntimeError: If settings not initialized
@@ -109,14 +108,21 @@
         if not self._settings:
             raise RuntimeError("Settings not initialized")

-        return self._settings.detect_index_version()
-
-    def migrate_legacy_index(self) -> bool:
+        # Check if the JSON index exists and is fresh
+        from ...indexing import get_index_manager
+        index_manager = get_index_manager()
+
+        # Set project path if available
+        if self._settings.base_path:
+            index_manager.set_project_path(self._settings.base_path)
+            stats = index_manager.get_index_stats()
+            return stats.get('status') == 'loaded'
+
+        return False
+
+    def cleanup_legacy_files(self) -> None:
         """
-        Migrate legacy index format if needed.
-
-        Returns:
-            True if migration successful or not needed, False if manual rebuild required
+        Clean up legacy index files.

         Raises:
             RuntimeError: If settings not initialized
@@ -124,7 +130,7 @@
         if not self._settings:
             raise RuntimeError("Settings not initialized")

-        return self._settings.migrate_legacy_index()
+        self._settings.cleanup_legacy_files()

     def get_search_tool_info(self) -> Dict[str, Any]:
         """
@@ -171,9 +177,12 @@ def create_default_config(self, project_path: str) -> Dict[str, Any]:
         Returns:
             Default configuration dictionary
         """
+        from ...utils import FileFilter
+
+        file_filter = FileFilter()
         return {
             "base_path": project_path,
-            "supported_extensions": SUPPORTED_EXTENSIONS,
+            "supported_extensions": list(file_filter.supported_extensions),
             "last_indexed": None,
             "file_watcher": self.get_file_watcher_config() if self._settings else {}
         }
@@ -246,8 +255,12 @@ def get_basic_project_structure(self, project_path: str) -> Dict[str, Any]:
         Returns:
             Basic directory structure dictionary
         """
+        from ...utils import FileFilter
+
+        file_filter = FileFilter()
+
         def build_tree(path: str, max_depth: int = 3, current_depth: int = 0) -> Dict[str, Any]:
-            """Build directory tree with limited depth."""
+            """Build directory tree with limited depth using centralized filtering."""
             if current_depth >= max_depth:
                 return {"type": "directory", "truncated": True}
@@ -255,24 +268,18 @@ def build_tree(path: str, max_depth: int = 3, current_depth: int = 0) -> Dict[st
             items = []
             path_obj = Path(path)

-            # Skip hidden directories and common ignore patterns
-            skip_patterns = {'.git', '.svn', '__pycache__', 'node_modules', '.vscode', '.idea'}
-
             for item in sorted(path_obj.iterdir()):
-                if item.name.startswith('.') and item.name not in {'.gitignore', '.env'}:
-                    continue
-                if item.name in skip_patterns:
-                    continue
-
                 if item.is_dir():
-                    items.append({
-                        "name": item.name,
-                        "type": "directory",
-                        "children": build_tree(str(item), max_depth, current_depth + 1)
-                    })
+                    # Use centralized directory filtering
+                    if not file_filter.should_exclude_directory(item.name):
+                        items.append({
+                            "name": item.name,
+                            "type": "directory",
+                            "children": build_tree(str(item), max_depth, current_depth + 1)
+                        })
                 else:
-                    # Only include supported file types
-                    if item.suffix.lower() in SUPPORTED_EXTENSIONS:
+                    # Use centralized file filtering
+                    if not file_filter.should_exclude_file(item):
                         items.append({
                             "name": item.name,
                             "type": "file",
diff --git a/src/code_index_mcp/tools/filesystem/file_matching_tool.py b/src/code_index_mcp/tools/filesystem/file_matching_tool.py
index 8e66b92..22ebdf6 100644
--- a/src/code_index_mcp/tools/filesystem/file_matching_tool.py
+++ b/src/code_index_mcp/tools/filesystem/file_matching_tool.py
@@ -9,7 +9,14 @@ from typing import List, Set
 from pathlib import Path

-from ..scip.scip_index_tool import FileInfo
+# FileInfo defined locally for file matching operations
+from dataclasses import dataclass
+
+@dataclass
+class FileInfo:
+    """File information structure."""
+    relative_path: str
+    language: str


 class FileMatchingTool:
diff --git a/src/code_index_mcp/tools/scip/__init__.py b/src/code_index_mcp/tools/scip/__init__.py
deleted file mode 100644
index d2e86d3..0000000
--- a/src/code_index_mcp/tools/scip/__init__.py
+++ /dev/null
@@ -1,8 +0,0 @@
-"""
-SCIP Tools - Technical components for SCIP operations.
-"""
-
-from .scip_index_tool import SCIPIndexTool
-from .scip_symbol_analyzer import SCIPSymbolAnalyzer
-
-__all__ = ['SCIPIndexTool', 'SCIPSymbolAnalyzer']
diff --git a/src/code_index_mcp/tools/scip/relationship_info.py b/src/code_index_mcp/tools/scip/relationship_info.py
deleted file mode 100644
index 8076f4a..0000000
--- a/src/code_index_mcp/tools/scip/relationship_info.py
+++ /dev/null
@@ -1,358 +0,0 @@
-"""
-Relationship Information - New unified relationship data structures
-
-This module defines the new relationship data structures for enhanced
-symbol relationship analysis with complete SCIP standard support.
-"""
-
-from dataclasses import dataclass, field
-from typing import Dict, List, Optional, Any
-from enum import Enum
-
-
-class RelationshipType(Enum):
-    """Unified relationship types for all programming languages"""
-
-    # Function relationships
-    FUNCTION_CALL = "function_call"
-    METHOD_CALL = "method_call"
-
-    # Type relationships
-    INHERITANCE = "inheritance"
-    INTERFACE_IMPLEMENTATION = "interface_implementation"
-    TYPE_REFERENCE = "type_reference"
-
-    # Variable relationships
-    VARIABLE_REFERENCE = "variable_reference"
-    VARIABLE_ASSIGNMENT = "variable_assignment"
-
-    # Module relationships
-    MODULE_IMPORT = "module_import"
-    MODULE_EXPORT = "module_export"
-
-    # Generic relationships (fallback)
-    REFERENCE = "reference"
-    DEFINITION = "definition"
-
-
-@dataclass
-class RelationshipInfo:
-    """Complete information about a single relationship"""
-
-    target: str  # Target symbol name
-    target_symbol_id: str  # Complete SCIP symbol ID
-    line: int  # Line where relationship occurs
-    column: int  # Column where relationship occurs
-    relationship_type: RelationshipType  # Type of relationship
-    source: Optional[str] = None  # Source symbol name (for reverse relationships)
-    source_symbol_id: Optional[str] = None  # Source symbol ID (for reverse relationships)
-
-    def to_dict(self) -> Dict[str, Any]:
-        """Convert to dictionary format for JSON output"""
-        result = {
-            "target": self.target,
-            "target_symbol_id": self.target_symbol_id,
-            "line": self.line,
-            "column": self.column,
-            "relationship_type": self.relationship_type.value
-        }
-
-        if self.source:
-            result["source"] = self.source
-        if self.source_symbol_id:
-            result["source_symbol_id"] = self.source_symbol_id
-
-        return result
-
-
-@dataclass
-class SymbolRelationships:
-    """Container for all relationships of a symbol"""
-
-    # Active relationships (this symbol to others)
-    calls: List[RelationshipInfo] = field(default_factory=list)
-    inherits_from: List[RelationshipInfo] = field(default_factory=list)
-    implements: List[RelationshipInfo] = field(default_factory=list)
-    references: List[RelationshipInfo] = field(default_factory=list)
-
-    # Passive relationships (others to this symbol)
-    called_by: List[RelationshipInfo] = field(default_factory=list)
-    inherited_by: List[RelationshipInfo] = field(default_factory=list)
-    implemented_by: List[RelationshipInfo] = field(default_factory=list)
-    referenced_by: List[RelationshipInfo] = field(default_factory=list)
-
-    def add_relationship(self, relationship: RelationshipInfo, is_reverse: bool = False):
-        """Add a relationship to the appropriate category"""
-        rel_type = relationship.relationship_type
-
-        if is_reverse:
-            # This is a reverse relationship (others -> this symbol)
-            if rel_type in [RelationshipType.FUNCTION_CALL, RelationshipType.METHOD_CALL]:
-                self.called_by.append(relationship)
-            elif rel_type == RelationshipType.INHERITANCE:
-                self.inherited_by.append(relationship)
-            elif rel_type == RelationshipType.INTERFACE_IMPLEMENTATION:
-                self.implemented_by.append(relationship)
-            else:
-                self.referenced_by.append(relationship)
-        else:
-            # This is a forward relationship (this symbol -> others)
-            if rel_type in [RelationshipType.FUNCTION_CALL, RelationshipType.METHOD_CALL]:
-                self.calls.append(relationship)
-            elif rel_type == RelationshipType.INHERITANCE:
-                self.inherits_from.append(relationship)
-            elif rel_type == RelationshipType.INTERFACE_IMPLEMENTATION:
-                self.implements.append(relationship)
-            else:
-                self.references.append(relationship)
-
-    def get_total_count(self) -> int:
-        """Get total number of relationships"""
-        return (len(self.calls) + len(self.called_by) +
-                len(self.inherits_from) + len(self.inherited_by) +
-                len(self.implements) + len(self.implemented_by) +
-                len(self.references) + len(self.referenced_by))
-
-    def to_dict(self) -> Dict[str, List[Dict[str, Any]]]:
-        """Convert to dictionary format for JSON output"""
-        result = {}
-
-        # Only include non-empty relationship categories
-        if self.calls:
-            result["calls"] = [rel.to_dict() for rel in self.calls]
-        if self.called_by:
-            result["called_by"] = [rel.to_dict() for rel in self.called_by]
-        if self.inherits_from:
-            result["inherits_from"] = [rel.to_dict() for rel in self.inherits_from]
-        if self.inherited_by:
-            result["inherited_by"] = [rel.to_dict() for rel in self.inherited_by]
-        if self.implements:
-            result["implements"] = [rel.to_dict() for rel in self.implements]
-        if self.implemented_by:
-            result["implemented_by"] = [rel.to_dict() for rel in self.implemented_by]
-        if self.references:
-            result["references"] = [rel.to_dict() for rel in self.references]
-        if self.referenced_by:
-            result["referenced_by"] = [rel.to_dict() for rel in self.referenced_by]
-
-        return result
-
-
-@dataclass
-class RelationshipsSummary:
-    """Summary statistics for all relationships in a file"""
-
-    total_relationships: int
-    by_type: Dict[str, int]
-    cross_file_relationships: int
-
-    def to_dict(self) -> Dict[str, Any]:
-        """Convert to dictionary format for JSON output"""
-        return {
-            "total_relationships": self.total_relationships,
-            "by_type": self.by_type,
-            "cross_file_relationships": self.cross_file_relationships
-        }
-
-
-class SCIPRelationshipReader:
-    """Reads and parses relationships from SCIP index"""
-
-    def __init__(self):
-        """Initialize the relationship reader"""
-        pass
-
-    def extract_relationships_from_document(self, document) -> Dict[str, SymbolRelationships]:
-        """
-        Extract all relationships from a SCIP document
-
-        Args:
-            document: SCIP document containing symbols and relationships
-
-        Returns:
-            Dictionary mapping symbol_id -> SymbolRelationships
-        """
-        all_relationships = {}
-
-        # Process each symbol in the document
-        for symbol_info in document.symbols:
-            symbol_id = symbol_info.symbol
-            symbol_name = symbol_info.display_name
-
-            if not symbol_info.relationships:
-                continue
-
-            # Create relationships container for this symbol
-            symbol_rels = SymbolRelationships()
-
-            # Process each relationship
-            for scip_relationship in symbol_info.relationships:
-                rel_info = self._parse_scip_relationship(
-                    scip_relationship, symbol_name, symbol_id, document
-                )
-                if rel_info:
-                    symbol_rels.add_relationship(rel_info)
-
-            if symbol_rels.get_total_count() > 0:
-                all_relationships[symbol_id] = symbol_rels
-
-        # Build reverse relationships
-        self._build_reverse_relationships(all_relationships, document)
-
-        return all_relationships
-
-    def _parse_scip_relationship(self, scip_relationship, source_name: str,
-                                 source_symbol_id: str, document) -> Optional[RelationshipInfo]:
-        """
-        Parse a single SCIP relationship into RelationshipInfo
-
-        Args:
-            scip_relationship: SCIP Relationship object
-            source_name: Name of the source symbol
-            source_symbol_id: SCIP ID of the source symbol
-            document: SCIP document for context
-
-        Returns:
-            RelationshipInfo object or None if parsing fails
-        """
-        target_symbol_id = scip_relationship.symbol
-
-        # Extract target symbol name from symbol ID
-        target_name = self._extract_symbol_name(target_symbol_id)
-
-        # Determine relationship type from SCIP flags
-        rel_type = self._determine_relationship_type(scip_relationship, target_symbol_id)
-
-        # Find the location where this relationship occurs
-        line, column = self._find_relationship_location(
-            source_symbol_id, target_symbol_id, document
-        )
-
-        return RelationshipInfo(
-            target=target_name,
-            target_symbol_id=target_symbol_id,
-            line=line,
-            column=column,
-            relationship_type=rel_type
-        )
-
-    def _determine_relationship_type(self, scip_relationship, target_symbol_id: str) -> RelationshipType:
-        """Determine the relationship type from SCIP flags and symbol ID"""
-
-        # Check SCIP relationship flags
-        if scip_relationship.is_implementation:
-            return RelationshipType.INTERFACE_IMPLEMENTATION
-        elif scip_relationship.is_type_definition:
-            return RelationshipType.TYPE_REFERENCE
-        elif scip_relationship.is_definition:
-            return RelationshipType.DEFINITION
-        elif scip_relationship.is_reference:
-            # Need to determine if it's inheritance, call, or reference
-            if target_symbol_id.endswith("#"):
-                # Class symbol - could be inheritance or type reference
-                return RelationshipType.INHERITANCE  # Assume inheritance for now
-            elif target_symbol_id.endswith("()."):
-                # Function symbol - function call
-                return RelationshipType.FUNCTION_CALL
-            else:
-                # Generic reference
-                return RelationshipType.REFERENCE
-        else:
-            # Fallback
-            return RelationshipType.REFERENCE
-
-    def _extract_symbol_name(self, symbol_id: str) -> str:
-        """Extract the symbol name from SCIP symbol ID"""
-        try:
-            # SCIP symbol format: scip- /
-            if "/" in symbol_id:
-                symbol_part = symbol_id.split("/")[-1]
-                # Remove descriptor suffix (like #, ()., etc.)
-                if symbol_part.endswith("#"):
-                    return symbol_part[:-1]
-                elif symbol_part.endswith("()."):
-                    return symbol_part[:-3]
-                else:
-                    return symbol_part
-            return symbol_id
-        except:
-            return symbol_id
-
-    def _find_relationship_location(self, source_symbol_id: str, target_symbol_id: str,
-                                    document) -> tuple[int, int]:
-        """Find the line and column where the relationship occurs"""
-
-        # Look for occurrences that reference the target symbol
-        for occurrence in document.occurrences:
-            if occurrence.symbol == target_symbol_id:
-                if hasattr(occurrence, 'range') and occurrence.range:
-                    start = occurrence.range.start
-                    if len(start) >= 2:
-                        return start[0] + 1, start[1] + 1  # Convert to 1-based indexing
-
-        # Fallback: look for the source symbol definition
-        for occurrence in document.occurrences:
-            if occurrence.symbol == source_symbol_id:
-                if hasattr(occurrence, 'range') and occurrence.range:
-                    start = occurrence.range.start
-                    if len(start) >= 2:
-                        return start[0] + 1, start[1] + 1  # Convert to 1-based indexing
-
-        # Default fallback
-        return 0, 0
-
-    def _build_reverse_relationships(self, all_relationships: Dict[str, SymbolRelationships],
-                                     document):
-        """Build reverse relationships (called_by, inherited_by, etc.)"""
-
-        # Create a mapping of all symbols for reverse lookup
-        symbol_names = {}
-        for symbol_info in document.symbols:
-            symbol_names[symbol_info.symbol] = symbol_info.display_name
-
-        # Build reverse relationships (iterate over a copy to avoid modification during iteration)
-        for source_symbol_id, source_rels in list(all_relationships.items()):
-            source_name = symbol_names.get(source_symbol_id, "unknown")
-
-            # Process each forward relationship to create reverse relationships
-            for rel in source_rels.calls:
-                self._add_reverse_relationship(
-                    all_relationships, rel.target_symbol_id, rel, source_name, source_symbol_id
-                )
-
-            for rel in source_rels.inherits_from:
-                self._add_reverse_relationship(
-                    all_relationships, rel.target_symbol_id, rel, source_name, source_symbol_id
-                )
-
-            for rel in source_rels.implements:
-                self._add_reverse_relationship(
-                    all_relationships, rel.target_symbol_id, rel, source_name, source_symbol_id
                )
-
-            for rel in source_rels.references:
-                self._add_reverse_relationship(
-                    all_relationships, rel.target_symbol_id, rel, source_name, source_symbol_id
-                )
-
-    def _add_reverse_relationship(self, all_relationships: Dict[str, SymbolRelationships],
-                                  target_symbol_id: str, original_rel: RelationshipInfo,
-                                  source_name: str, source_symbol_id: str):
-        """Add a reverse relationship to the target symbol"""
-
-        if target_symbol_id not in all_relationships:
-            all_relationships[target_symbol_id] = SymbolRelationships()
-
-        # Create reverse relationship
-        reverse_rel = RelationshipInfo(
-            target=source_name,
-            target_symbol_id=source_symbol_id,
-            line=original_rel.line,
-            column=original_rel.column,
-            relationship_type=original_rel.relationship_type,
-            source=original_rel.target,
-            source_symbol_id=original_rel.target_symbol_id
-        )
-
-        # Add as reverse relationship
-        all_relationships[target_symbol_id].add_relationship(reverse_rel, is_reverse=True)
\ No newline at end of file
diff --git a/src/code_index_mcp/tools/scip/scip_index_tool.py b/src/code_index_mcp/tools/scip/scip_index_tool.py
deleted file mode 100644
index e3620f7..0000000
--- a/src/code_index_mcp/tools/scip/scip_index_tool.py
+++ /dev/null
@@ -1,230 +0,0 @@
-"""
-SCIP Index Tool - Pure technical component for SCIP index operations.
-
-This tool handles low-level SCIP index operations without any business logic.
-It provides technical capabilities that can be composed by business services.
-"""
-
-from typing import Optional, List
-from dataclasses import dataclass
-from pathlib import Path
-import logging
-from ...scip.proto.scip_pb2 import Index as SCIPIndex
-from ...indexing.scip_builder import SCIPIndexBuilder
-
-logger = logging.getLogger(__name__)
-
-# Import FileInfo from the central location to avoid duplication
-from ...indexing.index_provider import FileInfo
-
-
-class SCIPIndexTool:
-    """
-    Pure technical component for SCIP index operations.
-
-    This tool provides low-level SCIP index capabilities without any
-    business logic or decision making. It's designed to be composed
-    by business services to achieve business goals.
-    """
-
-    def __init__(self):
-        self._scip_index: Optional[SCIPIndex] = None
-        self._builder = SCIPIndexBuilder()
-        self._project_path: Optional[str] = None
-        self._settings = None  # Will be set when needed
-
-    def is_index_available(self) -> bool:
-        """
-        Check if SCIP index is available and ready for use.
-
-        Returns:
-            True if index is available, False otherwise
-        """
-        return self._scip_index is not None and len(self._scip_index.documents) > 0
-
-    def build_index(self, project_path: str) -> int:
-        """
-        Build SCIP index for the specified project path.
-
-        This is a pure technical operation that unconditionally rebuilds the index.
-        Business logic for deciding when to rebuild should be handled by the caller.
-
-        Args:
-            project_path: Absolute path to the project directory
-
-        Returns:
-            Number of files indexed
-
-        Raises:
-            ValueError: If project path is invalid
-            RuntimeError: If index building fails
-        """
-        if not Path(project_path).exists():
-            logger.error(f"SCIP INDEX: Project path does not exist: {project_path}")
-            raise ValueError(f"Project path does not exist: {project_path}")
-
-        # Build new index (pure technical operation)
-        try:
-            logger.info(f"Building index for {project_path}")
-            self._project_path = project_path
-
-            # Initialize settings for this project
-            from ...project_settings import ProjectSettings
-            self._settings = ProjectSettings(project_path, skip_load=False)
-
-            self._scip_index = self._builder.build_scip_index(project_path)
-            logger.info(f"Built index with {len(self._scip_index.documents)} files")
-
-            return len(self._scip_index.documents)
-        except Exception as e:
-            logger.error(f"Failed to build index: {e}")
-            raise RuntimeError(f"Failed to build SCIP index: {e}") from e
-
-    def save_index(self) -> bool:
-        """
-        Save the current SCIP index to disk.
-
-        This is a pure technical operation that saves the current in-memory index.
-
-        Returns:
-            True if saved successfully, False otherwise
-        """
-        try:
-            if self._settings is None:
-                logger.error("No settings available, cannot save index")
-                return False
-
-            if self._scip_index is None:
-                logger.error("No index available to save")
-                return False
-
-            self.save_current_index()
-            logger.info("Index saved successfully")
-            return True
-        except Exception as e:
-            logger.error(f"Failed to save index: {e}")
-            return False
-
-    def get_file_list(self) -> List[FileInfo]:
-        """
-        Get list of all indexed files.
-
-        Returns:
-            List of FileInfo objects for all indexed files
-
-        Raises:
-            RuntimeError: If index is not available
-        """
-        if not self.is_index_available():
-            raise RuntimeError("SCIP index is not available. Call build_index() first.")
-
-        files = []
-        for document in self._scip_index.documents:
-            file_info = FileInfo(
-                relative_path=document.relative_path,
-                language=document.language,
-                absolute_path=str(Path(self._project_path) / document.relative_path) if self._project_path else ""
-            )
-            files.append(file_info)
-
-        return files
-
-    def get_file_count(self) -> int:
-        """
-        Get the number of indexed files.
-
-        Returns:
-            Number of files in the index
-
-        Raises:
-            RuntimeError: If index is not available
-        """
-        if not self.is_index_available():
-            raise RuntimeError("SCIP index is not available")
-
-        return len(self._scip_index.documents)
-
-    def get_project_metadata(self) -> dict:
-        """
-        Get project metadata from SCIP index.
-
-        Returns:
-            Dictionary containing project metadata
-
-        Raises:
-            RuntimeError: If index is not available
-        """
-        if not self.is_index_available():
-            raise RuntimeError("SCIP index is not available")
-
-        return {
-            'project_root': self._scip_index.metadata.project_root,
-            'total_files': len(self._scip_index.documents),
-            'tool_version': self._scip_index.metadata.tool_info.version,
-            'languages': list(set(doc.language for doc in self._scip_index.documents))
-        }
-
-    def load_existing_index(self, project_path: str) -> bool:
-        """
-        Try to load existing SCIP index from disk.
-
-        Args:
-            project_path: Absolute path to the project directory
-
-        Returns:
-            True if loaded successfully, False if no index exists or load failed
-        """
-        try:
-            from ...project_settings import ProjectSettings
-
-            self._project_path = project_path
-            settings = ProjectSettings(project_path, skip_load=False)
-            self._settings = settings
-
-            # Try to load existing SCIP index
-            scip_index = settings.load_scip_index()
-            if scip_index is not None:
-                self._scip_index = scip_index
-                return True
-            else:
-                return False
-
-        except Exception as e:
-            return False
-
-    def save_current_index(self) -> bool:
-        """
-        Save the current SCIP index to disk.
-
-        Returns:
-            True if saved successfully, False otherwise
-        """
-        if self._scip_index is None:
-            return False
-
-        if self._settings is None:
-            return False
-
-        try:
-            self._settings.save_scip_index(self._scip_index)
-            return True
-        except Exception:
-            return False
-
-    def clear_index(self) -> None:
-        """Clear the current SCIP index."""
-        self._scip_index = None
-        self._project_path = None
-        # Keep settings for potential reload
-
-    def get_raw_index(self) -> Optional[SCIPIndex]:
-        """
-        Get the raw SCIP index for advanced operations.
-
-        Note: This should only be used by other technical tools,
-        not by business services.
-
-        Returns:
-            Raw SCIP index or None if not available
-        """
-        return self._scip_index
diff --git a/src/code_index_mcp/tools/scip/scip_symbol_analyzer.py b/src/code_index_mcp/tools/scip/scip_symbol_analyzer.py
deleted file mode 100644
index 5bd4e31..0000000
--- a/src/code_index_mcp/tools/scip/scip_symbol_analyzer.py
+++ /dev/null
@@ -1,1411 +0,0 @@
-"""
-SCIP Symbol Analyzer - Enhanced symbol analysis for accurate code intelligence
-
-This module provides the main SCIPSymbolAnalyzer class that replaces the legacy
-SCIPQueryTool with accurate symbol location detection, proper type classification,
-and comprehensive call relationship analysis.
-""" - -import os -import logging -from typing import Dict, List, Optional, Any, Set -from functools import lru_cache - -from .symbol_definitions import ( - SymbolDefinition, FileAnalysis, ImportGroup, LocationInfo, - SymbolLocationError, SymbolResolutionError -) -from .relationship_info import SCIPRelationshipReader -from ...scip.core.symbol_manager import SCIPSymbolManager - -logger = logging.getLogger(__name__) - -# Try to import SCIP protobuf definitions -try: - from ...scip.proto import scip_pb2 - SCIP_PROTO_AVAILABLE = True -except ImportError: - scip_pb2 = None - SCIP_PROTO_AVAILABLE = False - logger.warning("SCIP protobuf definitions not available") - - -class SCIPSymbolAnalyzer: - """ - Enhanced SCIP symbol analyzer with accurate position detection and call relationships. - - This class replaces the legacy SCIPQueryTool and provides: - - Accurate symbol location extraction from SCIP Range data - - Proper symbol type classification using SCIP SymbolKind enum - - Comprehensive call relationship analysis - - Cross-file symbol resolution - - LLM-optimized output formatting - """ - - def __init__(self): - """Initialize the symbol analyzer.""" - self._symbol_kind_cache: Dict[int, str] = {} - self._scip_symbol_cache: Dict[str, Dict[str, Any]] = {} - self._symbol_parser: Optional[SCIPSymbolManager] = None - self._relationship_reader = SCIPRelationshipReader() - - # Initialize SCIP symbol kind mapping - self._init_symbol_kind_mapping() - - def _init_symbol_kind_mapping(self): - """Initialize SCIP SymbolKind enum mapping.""" - if not SCIP_PROTO_AVAILABLE: - # Fallback numeric mapping when protobuf not available - self._symbol_kind_map = { - 3: 'class', # CLASS - 11: 'function', # FUNCTION - 14: 'method', # METHOD - 29: 'variable', # VARIABLE - 4: 'constant', # CONSTANT - 6: 'enum', # ENUM - 7: 'enum_member', # ENUM_MEMBER - 9: 'field', # FIELD - 23: 'property', # PROPERTY - 5: 'constructor', # CONSTRUCTOR - 15: 'module', # MODULE - 16: 'namespace', # NAMESPACE - 12: 
'interface', # INTERFACE - 25: 'struct', # STRUCT - 33: 'trait', # TRAIT - 35: 'macro', # MACRO - } - else: - # Use actual protobuf enum when available - self._symbol_kind_map = {} - # Will be populated dynamically using scip_pb2.SymbolKind.Name() - - def analyze_file(self, file_path: str, scip_index) -> FileAnalysis: - """ - Main entry point for file analysis. - - Args: - file_path: Relative path to the file to analyze - scip_index: SCIP index containing all project data - - Returns: - FileAnalysis object with complete symbol information - - Raises: - ValueError: If file not found or analysis fails - """ - try: - logger.debug(f"Starting analysis for file: {file_path}") - - # Initialize symbol parser from index metadata (for scip-* symbol parsing) - try: - project_root = getattr(getattr(scip_index, 'metadata', None), 'project_root', '') or '' - if project_root: - self._symbol_parser = SCIPSymbolManager(project_root) - except Exception: - self._symbol_parser = None - - # Step 1: Find the document in SCIP index - document = self._find_document(file_path, scip_index) - if not document: - logger.warning(f"Document not found in SCIP index: {file_path}") - return self._create_empty_analysis(file_path) - - logger.debug(f"Found document with {len(document.symbols)} symbols") - - # Step 2: Extract all symbols with accurate metadata - symbols = self._extract_all_symbols(document) - logger.debug(f"Extracted {len(symbols)} symbols") - - # Step 3: Extract call relationships - self._extract_call_relationships(document, symbols, scip_index) - logger.debug("Completed call relationship extraction") - - # Step 4: Organize results into final structure - result = self._organize_results(document, symbols, scip_index) - logger.debug(f"Analysis complete: {len(result.functions)} functions, {len(result.classes)} classes") - - return result - - except Exception as e: - logger.error(f"Failed to analyze file {file_path}: {e}") - # Return partial analysis rather than failing completely - 
return self._create_error_analysis(file_path, str(e)) - - def _find_document(self, file_path: str, scip_index) -> Optional[Any]: - """ - Find the SCIP document for the given file path. - - Args: - file_path: File path to search for - scip_index: SCIP index object - - Returns: - SCIP document or None if not found - """ - if not hasattr(scip_index, 'documents'): - logger.error("Invalid SCIP index: missing documents attribute") - return None - - # Normalize path for comparison - normalized_target = self._normalize_path(file_path) - - # Try exact match first - for document in scip_index.documents: - if self._normalize_path(document.relative_path) == normalized_target: - return document - - # Try case-insensitive match - normalized_lower = normalized_target.lower() - for document in scip_index.documents: - if self._normalize_path(document.relative_path).lower() == normalized_lower: - logger.debug(f"Found case-insensitive match for {file_path}") - return document - - return None - - def _normalize_path(self, path: str) -> str: - """Normalize file path for consistent comparison.""" - return path.replace('\\', '/').lstrip('./') - - def _extract_all_symbols(self, document) -> Dict[str, SymbolDefinition]: - """ - Extract all symbols from the document in a single pass. 
- - Args: - document: SCIP document object - - Returns: - Dictionary mapping SCIP symbols to SymbolDefinition objects - """ - symbols = {} - - for symbol_info in document.symbols: - try: - # Extract basic symbol information - scip_symbol = symbol_info.symbol - display_name = getattr(symbol_info, 'display_name', '') - symbol_kind = getattr(symbol_info, 'kind', 0) - - # Parse symbol name and classification - parsed_name, class_name = self._parse_symbol_identity(scip_symbol, display_name) - if not parsed_name: - continue - - # Get symbol type from SCIP kind - symbol_type = self._classify_symbol_type(symbol_kind, scip_symbol) - - # Extract precise location - # Extract location (never fails now) - location = self._extract_precise_location(scip_symbol, document) - - # Debug: Check location type - if not isinstance(location, LocationInfo): - logger.error(f"Location extraction returned wrong type: {type(location)} for symbol {scip_symbol}") - location = LocationInfo(line=1, column=1) # Fallback - - # Create symbol definition - symbol_def = SymbolDefinition( - name=parsed_name, - line=location.line, - column=location.column, - symbol_type=symbol_type, - class_name=class_name, - scip_symbol=scip_symbol - ) - - # Extract additional metadata - self._enrich_symbol_metadata(symbol_def, symbol_info, document) - - symbols[scip_symbol] = symbol_def - logger.debug(f"Processed symbol: {parsed_name} ({symbol_type}) at {location.line}:{location.column}") - - except Exception as e: - logger.warning(f"Failed to process symbol {getattr(symbol_info, 'symbol', 'unknown')}: {e}") - continue - - return symbols - - def _parse_symbol_identity(self, scip_symbol: str, display_name: str = '') -> tuple[str, Optional[str]]: - """ - Parse symbol name and class ownership from SCIP symbol string. 
- - Args: - scip_symbol: SCIP symbol identifier - display_name: Display name from symbol info - - Returns: - Tuple of (symbol_name, class_name) - """ - # Use display name if available and meaningful - if display_name and not display_name.startswith('__'): - name = display_name - else: - # Extract from SCIP symbol - name = self._extract_name_from_scip_symbol(scip_symbol) - - # Extract class name if this is a class member - class_name = self._extract_class_name(scip_symbol) - - return name, class_name - - @lru_cache(maxsize=500) - def _extract_name_from_scip_symbol(self, scip_symbol: str) -> str: - """Extract clean, human-readable symbol name from SCIP symbol identifier.""" - try: - if scip_symbol.startswith('local:'): - # local:src.module.Class#method_name(). - symbol_path = scip_symbol[6:] # Remove 'local:' prefix - - if '#' in symbol_path: - # Method or field: extract after '#' - method_part = symbol_path.split('#')[-1] - return self._clean_symbol_name(method_part) - else: - # Class or top-level function: extract last part - class_part = symbol_path.split('.')[-1] - return self._clean_symbol_name(class_part) - - elif scip_symbol.startswith('external:'): - # external:module.path/ClassName#method_name(). 
- if '/' in scip_symbol: - after_slash = scip_symbol.split('/')[-1] - if '#' in after_slash: - method_part = after_slash.split('#')[-1] - return self._clean_symbol_name(method_part) - else: - return self._clean_symbol_name(after_slash) - else: - # Just module reference - module_part = scip_symbol[9:] # Remove 'external:' - return self._clean_symbol_name(module_part.split('.')[-1]) - - # Fallback: clean up whatever we have - return self._clean_symbol_name(scip_symbol.split('/')[-1].split('#')[-1]) - - except Exception as e: - logger.debug(f"Error extracting name from {scip_symbol}: {e}") - return "unknown" - - def _clean_symbol_name(self, raw_name: str) -> str: - """Clean symbol name for human readability.""" - # Remove common suffixes and prefixes - cleaned = raw_name.rstrip('().#') - - # Remove module path prefixes if present - if '.' in cleaned: - cleaned = cleaned.split('.')[-1] - - # Handle special cases - if not cleaned or cleaned.isdigit(): - return "unknown" - - return cleaned - - @lru_cache(maxsize=500) - def _extract_class_name(self, scip_symbol: str) -> Optional[str]: - """Extract clean class name if this symbol belongs to a class. - - Supports: - - Legacy local/external formats with '#': local:...Class#method / external:.../Class#method - - Current scip-* local format where descriptors encode path as - //(). 
-        """
-        try:
-            # Newer scip-* local symbols: parse descriptors path
-            if scip_symbol.startswith('scip-'):
-                parts = scip_symbol.split(' ', 4)
-                descriptors = parts[4] if len(parts) == 5 else (parts[3] if len(parts) >= 4 else '')
-                if descriptors:
-                    components = [p for p in descriptors.split('/') if p]
-                    if len(components) >= 2:
-                        candidate = components[-2]
-                        return candidate if candidate and not candidate.isdigit() else None
-
-            if '#' not in scip_symbol:
-                return None
-
-            if scip_symbol.startswith('local:'):
-                # local:src.module.ClassName#method
-                symbol_path = scip_symbol[6:]  # Remove 'local:'
-                class_part = symbol_path.split('#')[0]
-
-                # Extract just the class name (last part of module path)
-                if '.' in class_part:
-                    class_name = class_part.split('.')[-1]
-                else:
-                    class_name = class_part
-
-                return class_name if class_name and not class_name.isdigit() else None
-
-            elif scip_symbol.startswith('external:'):
-                # external:module/ClassName#method
-                if '/' in scip_symbol:
-                    path_part = scip_symbol.split('/')[-1]
-                    if '#' in path_part:
-                        class_name = path_part.split('#')[0]
-                        return class_name if class_name and not class_name.isdigit() else None
-
-        except Exception as e:
-            logger.debug(f"Error extracting class name from {scip_symbol}: {e}")
-
-        return None
-
-    def _classify_symbol_type(self, scip_kind: int, scip_symbol: str) -> str:
-        """
-        Classify symbol type using SCIP SymbolKind enum.
-
-        Args:
-            scip_kind: SCIP SymbolKind enum value
-            scip_symbol: SCIP symbol string for additional context
-
-        Returns:
-            Standardized symbol type string
-        """
-        # Try to get cached result
-        if scip_kind in self._symbol_kind_cache:
-            base_type = self._symbol_kind_cache[scip_kind]
-        else:
-            base_type = self._get_scip_kind_name(scip_kind)
-            self._symbol_kind_cache[scip_kind] = base_type
-
-        # Refine classification based on index symbol structure
-        if base_type == 'function':
-            # Legacy/colon formats use '#'
-            if '#' in scip_symbol:
-                return 'method'
-            # Current scip-* local descriptors path: <file>/<Class>/<member>().
-            if scip_symbol.startswith('scip-'):
-                parts = scip_symbol.split(' ', 4)
-                descriptors = parts[4] if len(parts) == 5 else (parts[3] if len(parts) >= 4 else '')
-                if descriptors:
-                    components = [p for p in descriptors.split('/') if p]
-                    if len(components) >= 2:
-                        last_comp = components[-1]
-                        if last_comp.endswith('().') or last_comp.endswith('()'):
-                            return 'method'
-
-        return base_type
-
-    def _get_scip_kind_name(self, kind: int) -> str:
-        """Get symbol type name from SCIP SymbolKind."""
-        if SCIP_PROTO_AVAILABLE:
-            try:
-                # Use protobuf enum name
-                enum_name = scip_pb2.SymbolKind.Name(kind)
-                return self._normalize_kind_name(enum_name)
-            except (ValueError, AttributeError):
-                pass
-
-        # Fallback to numeric mapping
-        return self._symbol_kind_map.get(kind, 'unknown')
-
-    def _normalize_kind_name(self, enum_name: str) -> str:
-        """Normalize SCIP enum name to standard type."""
-        enum_name = enum_name.lower()
-
-        # Map SCIP names to our standard names
-        if enum_name == 'class':
-            return 'class'
-        elif enum_name in ['function', 'func']:
-            return 'function'
-        elif enum_name == 'method':
-            return 'method'
-        elif enum_name in ['variable', 'var']:
-            return 'variable'
-        elif enum_name in ['constant', 'const']:
-            return 'constant'
-        elif enum_name == 'field':
-            return 'field'
-        elif enum_name == 'property':
-            return 'property'
-        else:
-            return enum_name
-
-    def _extract_precise_location(self, scip_symbol: str, document) -> LocationInfo:
-        """
-        Never-fail location extraction with intelligent fallbacks using SCIPSymbolManager.
-
-        Args:
-            scip_symbol: SCIP symbol identifier
-            document: SCIP document containing occurrences
-
-        Returns:
-            LocationInfo with best available location and confidence level
-        """
-        # Layer 1: Standard SCIP occurrence location
-        location = self._find_definition_location(scip_symbol, document)
-        if location:
-            location.confidence = 'definition'
-            return location
-
-        location = self._find_any_location(scip_symbol, document)
-        if location:
-            location.confidence = 'occurrence'
-            return location
-
-        # Layer 2: SCIPSymbolManager-based symbol structure inference
-        if self._symbol_parser:
-            location = self._infer_location_from_symbol_structure(scip_symbol, document)
-            if location:
-                location.confidence = 'inferred'
-                return location
-
-        # Layer 3: Symbol type-based default location
-        location = self._get_default_location_by_symbol_type(scip_symbol)
-        location.confidence = 'default'
-        return location
-
-    def _find_definition_location(self, scip_symbol: str, document) -> Optional[LocationInfo]:
-        """Find the definition occurrence for a symbol."""
-        for occurrence in document.occurrences:
-            if occurrence.symbol == scip_symbol and self._is_definition(occurrence):
-                location = self._parse_occurrence_location(occurrence)
-                if location:
-                    return location
-        return None
-
-    def _find_any_location(self, scip_symbol: str, document) -> Optional[LocationInfo]:
-        """Find any occurrence with location data for a symbol."""
-        for occurrence in document.occurrences:
-            if occurrence.symbol == scip_symbol:
-                location = self._parse_occurrence_location(occurrence)
-                if location:
-                    return location
-        return None
-
-    def _is_definition(self, occurrence) -> bool:
-        """Check if an occurrence represents a definition."""
-        if not hasattr(occurrence, 'symbol_roles'):
-            return False
-
-        try:
-            if SCIP_PROTO_AVAILABLE:
-                return bool(occurrence.symbol_roles & scip_pb2.SymbolRole.Definition)
-            else:
-                # Fallback: Definition role = 1
-                return bool(occurrence.symbol_roles & 1)
-        except (AttributeError, TypeError):
-            return False
-
-    def _parse_occurrence_location(self, occurrence) -> Optional[LocationInfo]:
-        """Parse location information from SCIP occurrence."""
-        try:
-            if not hasattr(occurrence, 'range') or not occurrence.range:
-                return None
-
-            range_obj = occurrence.range
-            if not hasattr(range_obj, 'start') or not range_obj.start:
-                return None
-
-            start = range_obj.start
-            if len(start) >= 2:
-                # SCIP uses 0-based indexing, convert to 1-based
-                line = start[0] + 1
-                column = start[1] + 1
-                return LocationInfo(line=line, column=column)
-
-        except (AttributeError, IndexError, TypeError) as e:
-            logger.debug(f"Failed to parse occurrence location: {e}")
-
-        return None
-
-    def _enrich_symbol_metadata(self, symbol: SymbolDefinition, symbol_info, document):
-        """Enrich symbol with additional metadata from SCIP data."""
-        # Extract documentation if available
-        if hasattr(symbol_info, 'documentation') and symbol_info.documentation:
-            # Could extract docstrings here if needed
-            pass
-
-        # For functions/methods, extract parameter information
-        if symbol.is_callable():
-            symbol.parameters = self._extract_function_parameters(symbol.scip_symbol, symbol_info, document)
-            symbol.return_type = self._extract_return_type(symbol.scip_symbol, symbol_info)
-            symbol.is_async = self._is_async_function(symbol.scip_symbol, symbol_info)
-
-        # For classes, extract methods and attributes
-        elif symbol.symbol_type == 'class':
-            symbol.methods, symbol.attributes = self._extract_class_members(symbol.scip_symbol, document)
-            symbol.inherits_from = self._extract_inheritance(symbol.scip_symbol, symbol_info)
-
-        # For variables, extract type and scope information
-        elif symbol.symbol_type == 'variable':
-            symbol.type = self._extract_variable_type(symbol.scip_symbol, symbol_info)
-            symbol.is_global = self._is_global_variable(symbol.scip_symbol, document)
-
-        # For constants, extract value if available
-        elif symbol.symbol_type == 'constant':
-            symbol.value = self._extract_constant_value(symbol.scip_symbol, symbol_info)
-
-    def _extract_call_relationships(self, document, symbols: Dict[str, SymbolDefinition], scip_index):
-        """
-        Extract all relationships from SCIP document using the new relationship reader.
-
-        Args:
-            document: SCIP document containing symbols and relationships
-            symbols: Dictionary of extracted symbols
-            scip_index: Full SCIP index for cross-file resolution
-        """
-        logger.debug("Starting relationship extraction using SCIP relationship reader")
-
-        # Use the new relationship reader to extract all relationships
-        all_relationships = self._relationship_reader.extract_relationships_from_document(document)
-
-        # Assign relationships to symbols
-        for symbol_id, symbol_def in symbols.items():
-            if symbol_id in all_relationships:
-                symbol_def.relationships = all_relationships[symbol_id]
-                logger.debug(f"Assigned {symbol_def.relationships.get_total_count()} relationships to {symbol_def.name}")
-
-        logger.debug(f"Relationship extraction completed for {len(symbols)} symbols")
-
-    def _organize_results(self, document, symbols: Dict[str, SymbolDefinition], scip_index=None) -> FileAnalysis:
-        """
-        Organize extracted symbols into final FileAnalysis structure.
-
-        Args:
-            document: SCIP document
-            symbols: Extracted symbol definitions
-            scip_index: Full SCIP index for external symbol extraction
-
-        Returns:
-            FileAnalysis with organized results
-        """
-        # Create file analysis result
-        result = FileAnalysis(
-            file_path=document.relative_path,
-            language=document.language,
-            line_count=self._estimate_line_count(document),
-            size_bytes=0  # TODO: Could get from filesystem if needed
-        )
-
-        # Add symbols to appropriate collections
-        for symbol in symbols.values():
-            result.add_symbol(symbol)
-
-        # Extract import information from occurrences
-        self._extract_imports(document, result.imports)
-
-        # Also extract imports from external symbols (for strategies like Objective-C)
-        if scip_index:
-            self._extract_imports_from_external_symbols(scip_index, result.imports)
-
-        return result
-
-    def _estimate_line_count(self, document) -> int:
-        """Estimate line count from document data."""
-        # Try to get from document text if available
-        if hasattr(document, 'text') and document.text:
-            return len(document.text.splitlines())
-
-        # Fallback: estimate from occurrence ranges
-        max_line = 0
-        for occurrence in document.occurrences:
-            try:
-                if occurrence.range and occurrence.range.start:
-                    line = occurrence.range.start[0] + 1
-                    max_line = max(max_line, line)
-            except (AttributeError, IndexError):
-                continue
-
-        return max_line if max_line > 0 else 100  # Default estimate
-
-    def _is_function_call(self, occurrence) -> bool:
-        """
-        Check if an occurrence represents a function call.
-
-        Based on debug analysis, function calls have roles=0 in our SCIP data,
-        so we need to identify them by other characteristics.
-
-        Args:
-            occurrence: SCIP occurrence object
-
-        Returns:
-            True if this occurrence is a function call
-        """
-        try:
-            symbol = occurrence.symbol
-            roles = getattr(occurrence, 'symbol_roles', 0)
-
-            # Check if it's a definition (role = 1) - these are NOT calls
-            if roles & 1:
-                return False
-
-            # Check if it's an import (role = 2) - these are NOT calls
-            if roles & 2:
-                return False
-
-            # For roles = 0, check if it looks like a function call by symbol format
-            if roles == 0:
-                # Function calls typically have () in the symbol
-                if '()' in symbol:
-                    # But exclude definitions at line start positions
-                    if hasattr(occurrence, 'range') and occurrence.range:
-                        if hasattr(occurrence.range, 'start') and occurrence.range.start:
-                            line = occurrence.range.start[0] + 1
-                            col = occurrence.range.start[1] + 1
-                            # Function definitions usually start at column 1 or 5 (indented)
-                            # Function calls are usually at higher column positions
-                            return col > 5
-                    return True
-
-            # Traditional role-based detection as fallback
-            if SCIP_PROTO_AVAILABLE:
-                return bool(roles & (scip_pb2.SymbolRole.Read | scip_pb2.SymbolRole.Reference))
-            else:
-                # Fallback: Read=8, Reference=4
-                return bool(roles & (8 | 4))
-
-        except (AttributeError, TypeError):
-            return False
-
-    def _find_containing_function(self, occurrence, function_symbols: Dict[str, SymbolDefinition], document) -> Optional[SymbolDefinition]:
-        """
-        Find which function contains the given occurrence.
-
-        Args:
-            occurrence: SCIP occurrence object
-            function_symbols: Map of SCIP symbols to function definitions
-            document: SCIP document
-
-        Returns:
-            SymbolDefinition of the containing function, or None
-        """
-        try:
-            occurrence_line = self._get_occurrence_line(occurrence)
-            if occurrence_line <= 0:
-                return None
-
-            # Find the function that contains this line
-            best_match = None
-            best_distance = float('inf')
-
-            for scip_symbol, func_def in function_symbols.items():
-                # Function should start before or at the occurrence line
-                if func_def.line <= occurrence_line:
-                    distance = occurrence_line - func_def.line
-                    if distance < best_distance:
-                        best_distance = distance
-                        best_match = func_def
-
-            return best_match
-
-        except Exception as e:
-            logger.debug(f"Error finding containing function: {e}")
-            return None
-
-    def _get_occurrence_line(self, occurrence) -> int:
-        """Extract line number from SCIP occurrence."""
-        try:
-            if hasattr(occurrence, 'range') and occurrence.range:
-                if hasattr(occurrence.range, 'start') and occurrence.range.start:
-                    return occurrence.range.start[0] + 1  # Convert to 1-based
-        except (AttributeError, IndexError, TypeError):
-            pass
-        return 0
-
-    def _resolve_call_target(self, target_symbol: str, scip_index, current_document) -> Optional[Dict[str, Any]]:
-        """Use SCIPSymbolManager to resolve call target information.
-
-        Args:
-            target_symbol: SCIP symbol being called
-            scip_index: Full SCIP index for cross-file lookup
-            current_document: Current document for local symbol context
-
-        Returns:
-            Dictionary with call target information or None
-        """
-        if not self._symbol_parser:
-            return self._fallback_resolve_target(target_symbol, current_document)
-
-        try:
-            # Use SCIPSymbolManager to parse symbol
-            symbol_info = self._symbol_parser.parse_symbol(target_symbol)
-            if not symbol_info:
-                return None
-
-            # Extract clear symbol name from descriptors
-            target_name = self._extract_symbol_name_from_descriptors(symbol_info.descriptors)
-
-            # Handle based on manager type
-            if symbol_info.manager == 'local':
-                # Local call: use existing file path extraction
-                file_path = self._symbol_parser.get_file_path_from_symbol(target_symbol)
-                target_line = self._find_local_symbol_location(target_symbol, current_document)
-                return {
-                    'name': target_name,
-                    'scope': 'local',
-                    'file': file_path or current_document.relative_path,
-                    'line': target_line
-                }
-
-            elif symbol_info.manager in ['stdlib', 'pip', 'npm']:
-                # External call: get info from parsed results
-                return {
-                    'name': target_name,
-                    'scope': 'external',
-                    'package': symbol_info.package,
-                    'module': self._extract_module_from_descriptors(symbol_info.descriptors)
-                }
-
-            return None
-
-        except Exception as e:
-            logger.debug(f"Error resolving call target {target_symbol}: {e}")
-            return None
-
-
-    def _find_symbol_definition(self, target_symbol: str, scip_index) -> tuple[Optional[str], int]:
-        """
-        Find the definition location of a symbol in the SCIP index.
-
-        Args:
-            target_symbol: SCIP symbol to find
-            scip_index: Full SCIP index
-
-        Returns:
-            Tuple of (file_path, line_number) or (None, 0) if not found
-        """
-        try:
-            for document in scip_index.documents:
-                for occurrence in document.occurrences:
-                    if (occurrence.symbol == target_symbol and
-                            self._is_definition(occurrence)):
-                        line = self._get_occurrence_line(occurrence)
-                        return document.relative_path, line
-        except Exception as e:
-            logger.debug(f"Error finding symbol definition: {e}")
-
-        return None, 0
-
-    def _extract_symbol_name_from_descriptors(self, descriptors: str) -> str:
-        """Extract symbol name from SCIP descriptors."""
-        # utils.py/helper_function() -> helper_function
-        # MyClass/method() -> method
-        if '/' in descriptors:
-            symbol_part = descriptors.split('/')[-1]
-            return symbol_part.rstrip('().')
-        return descriptors.rstrip('().')
-
-    def _extract_module_from_descriptors(self, descriptors: str) -> Optional[str]:
-        """Extract module name from descriptors."""
-        # os/ -> os, pathlib/Path -> pathlib
-        if '/' in descriptors:
-            return descriptors.split('/')[0]
-        return descriptors.strip('/')
-
-    def _fallback_resolve_target(self, target_symbol: str, current_document) -> Optional[Dict[str, Any]]:
-        """Fallback resolution when SCIPSymbolManager is not available."""
-        try:
-            # Parse the target symbol using legacy method
-            target_name, target_class = self._parse_symbol_identity(target_symbol)
-            if not target_name:
-                return None
-
-            # Basic resolution for legacy formats
-            if target_symbol.startswith('local:'):
-                target_location = self._find_local_symbol_location(target_symbol, current_document)
-                return {
-                    'name': target_name,
-                    'scope': 'local',
-                    'file': current_document.relative_path,
-                    'line': target_location
-                }
-
-            return {
-                'name': target_name,
-                'scope': 'unknown',
-                'file': 'unknown',
-                'line': 0
-            }
-
-        except Exception as e:
-            logger.debug(f"Fallback resolution failed for {target_symbol}: {e}")
-            return None
-
-    def _find_local_symbol_location(self, target_symbol: str, document) -> int:
-        """Find the line number for a local symbol definition."""
-        try:
-            for occurrence in document.occurrences:
-                if (occurrence.symbol == target_symbol and
-                        self._is_definition(occurrence)):
-                    return self._get_occurrence_line(occurrence)
-        except Exception as e:
-            logger.debug(f"Error finding local symbol location: {e}")
-        return 0
-
-
-
-    def _extract_imports(self, document, imports: ImportGroup):
-        """Use SCIPSymbolManager to correctly parse imports."""
-        if not self._symbol_parser:
-            logger.debug("No symbol parser available, skipping import extraction")
-            return
-
-        try:
-            seen_modules = set()
-
-            # Method 1: Extract from occurrences with Import role (traditional approach)
-            for occurrence in document.occurrences:
-                # Only process Import role symbols
-                if not self._is_import_occurrence(occurrence):
-                    continue
-
-                symbol_info = self._symbol_parser.parse_symbol(occurrence.symbol)
-                if not symbol_info:
-                    continue
-
-                # Handle based on manager type
-                if symbol_info.manager == 'stdlib':
-                    module_name = self._extract_module_from_descriptors(symbol_info.descriptors)
-                    if module_name and module_name not in seen_modules:
-                        imports.add_import(module_name, 'standard_library')
-                        seen_modules.add(module_name)
-
-                elif symbol_info.manager == 'pip':
-                    # pip packages: package name is the module name
-                    package_name = symbol_info.package
-                    if package_name and package_name not in seen_modules:
-                        imports.add_import(package_name, 'third_party')
-                        seen_modules.add(package_name)
-
-                elif symbol_info.manager == 'local':
-                    # Local imports: extract module path from descriptors
-                    module_path = self._extract_local_module_path(symbol_info.descriptors)
-                    if module_path and module_path not in seen_modules:
-                        imports.add_import(module_path, 'local')
-                        seen_modules.add(module_path)
-
-            logger.debug(f"Extracted {len(seen_modules)} unique imports from SCIP occurrences")
-
-        except Exception as e:
-            logger.debug(f"Error extracting imports from occurrences: {e}")
-
-    def _extract_imports_from_external_symbols(self, scip_index, imports: ImportGroup):
-        """Extract imports from SCIP index external symbols (for strategies like Objective-C)."""
-        try:
-            if not hasattr(scip_index, 'external_symbols'):
-                logger.debug("No external_symbols in SCIP index")
-                return
-
-            seen_modules = set()
-
-            for symbol_info in scip_index.external_symbols:
-                if not symbol_info.symbol:
-                    continue
-
-                # Parse the external symbol
-                parsed_symbol = self._symbol_parser.parse_symbol(symbol_info.symbol) if self._symbol_parser else None
-                if not parsed_symbol:
-                    # Fallback: try to extract framework name from symbol string
-                    framework_name = self._extract_framework_from_symbol_string(symbol_info.symbol)
-                    if framework_name and framework_name not in seen_modules:
-                        # Classify based on symbol pattern
-                        import_type = self._classify_external_symbol(symbol_info.symbol)
-                        imports.add_import(framework_name, import_type)
-                        seen_modules.add(framework_name)
-                        logger.debug(f"Extracted external dependency: {framework_name} ({import_type})")
-                    continue
-
-                # Handle based on manager type
-                if parsed_symbol.manager in ['system', 'unknown']:
-                    # For Objective-C system frameworks
-                    package_name = parsed_symbol.package
-                    if package_name and package_name not in seen_modules:
-                        imports.add_import(package_name, 'standard_library')
-                        seen_modules.add(package_name)
-
-                elif parsed_symbol.manager in ['cocoapods', 'carthage']:
-                    # Third-party Objective-C dependencies
-                    package_name = parsed_symbol.package
-                    if package_name and package_name not in seen_modules:
-                        imports.add_import(package_name, 'third_party')
-                        seen_modules.add(package_name)
-
-            logger.debug(f"Extracted {len(seen_modules)} unique imports from external symbols")
-
-        except Exception as e:
-            logger.debug(f"Error extracting imports from external symbols: {e}")
-
-    def _extract_framework_from_symbol_string(self, symbol_string: str) -> Optional[str]:
-        """Extract framework name from SCIP symbol string."""
-        try:
-            # Handle symbols like "scip-unknown unknown Foundation Foundation *."
-            parts = symbol_string.split()
-            if len(parts) >= 4:
-                # The package name is typically the 3rd or 4th part
-                for part in parts[2:5]:  # Check parts 2, 3, 4
-                    if part and part != 'unknown' and not part.endswith('.'):
-                        return part
-            return None
-        except Exception:
-            return None
-
-    def _classify_external_symbol(self, symbol_string: str) -> str:
-        """Classify external symbol as standard_library, third_party, or local."""
-        try:
-            # Check for known system frameworks
-            system_frameworks = {
-                'Foundation', 'UIKit', 'CoreData', 'CoreGraphics', 'QuartzCore',
-                'AVFoundation', 'CoreLocation', 'MapKit', 'CoreAnimation',
-                'Security', 'SystemConfiguration', 'CFNetwork', 'CoreFoundation',
-                'AppKit', 'Cocoa', 'WebKit', 'JavaScriptCore'
-            }
-
-            for framework in system_frameworks:
-                if framework in symbol_string:
-                    return 'standard_library'
-
-            # Check for third-party indicators
-            if any(indicator in symbol_string.lower() for indicator in ['cocoapods', 'carthage', 'pods']):
-                return 'third_party'
-
-            return 'standard_library'  # Default for external symbols
-
-        except Exception:
-            return 'standard_library'
-
-    def _parse_external_module(self, external_symbol: str) -> Optional[Dict[str, str]]:
-        """Parse external SCIP symbol to extract module information."""
-        try:
-            if not external_symbol.startswith('external:'):
-                return None
-
-            # Remove 'external:' prefix and parse path
-            symbol_path = external_symbol[9:]
-
-            # Extract base module path (before '/' or '#')
-            if '/' in symbol_path:
-                module_path = symbol_path.split('/')[0]
-            elif '#' in symbol_path:
-                module_path = symbol_path.split('#')[0]
-            else:
-                module_path = symbol_path
-
-            # Clean up module path
-            module_path = module_path.rstrip('.')
-            if not module_path:
-                return None
-
-            # Categorize the import
-            category = self._categorize_import(module_path)
-
-            return {
-                'module': module_path,
-                'category': category
-            }
-
-        except Exception as e:
-            logger.debug(f"Error parsing external module {external_symbol}: {e}")
-            return None
-
-    def _categorize_import(self, module_path: str) -> str:
-        """Categorize import as standard_library, third_party, or local."""
-        # Standard library modules (common ones)
-        stdlib_modules = {
-            'os', 'sys', 'json', 'time', 'datetime', 'logging', 'pathlib',
-            'typing', 'dataclasses', 'functools', 'itertools', 'collections',
-            're', 'math', 'random', 'threading', 'subprocess', 'shutil',
-            'contextlib', 'traceback', 'warnings', 'weakref', 'copy',
-            'pickle', 'base64', 'hashlib', 'hmac', 'uuid', 'urllib',
-            'http', 'socketserver', 'email', 'mimetypes', 'csv', 'configparser',
-            'argparse', 'getopt', 'tempfile', 'glob', 'fnmatch', 'linecache',
-            'pprint', 'textwrap', 'string', 'struct', 'codecs', 'unicodedata',
-            'io', 'gzip', 'bz2', 'lzma', 'zipfile', 'tarfile'
-        }
-
-        # Local imports (relative imports or project-specific patterns)
-        if module_path.startswith('.'):
-            return 'local'
-
-        # Check for common project patterns
-        if any(pattern in module_path for pattern in ['src.', 'lib.', 'app.', 'project.']):
-            return 'local'
-
-        # Standard library check
-        base_module = module_path.split('.')[0]
-        if base_module in stdlib_modules:
-            return 'standard_library'
-
-        # Everything else is third_party
-        return 'third_party'
-
-    def _is_import_occurrence(self, occurrence) -> bool:
-        """Check if occurrence represents an import."""
-        # Import role = 2 (based on debug results)
-        return hasattr(occurrence, 'symbol_roles') and (occurrence.symbol_roles & 2)
-
-    def _extract_local_module_path(self, descriptors: str) -> Optional[str]:
-        """Extract module path from local descriptors."""
-        # utils.py/helper_function() -> utils
-        # services/user_service.py/UserService -> services.user_service
-        if '/' in descriptors:
-            file_part = descriptors.split('/')[0]
-            if file_part.endswith('.py'):
-                return file_part[:-3].replace('/', '.')
-            return file_part.replace('/', '.')
-        return None
-
-    def _extract_class_name_from_descriptors(self, descriptors: str) -> Optional[str]:
-        """Extract class name from descriptors."""
-        # test_empty_functions.py/TestClass# -> TestClass
-        # test_empty_functions.py/TestClass/method() -> TestClass (if this is class symbol)
-        parts = descriptors.split('/')
-        if len(parts) >= 2:
-            class_part = parts[1]
-            # Remove trailing # if present (class symbols end with #)
-            return class_part.rstrip('#')
-        return None
-
-    def _is_class_member(self, descriptors: str, class_name: str) -> bool:
-        """Check if descriptors belongs to specified class member."""
-        # test_empty_functions.py/TestClass/method_one() contains TestClass
-        return f"/{class_name}/" in descriptors
-
-    def _extract_member_name(self, descriptors: str, class_name: str) -> Optional[str]:
-        """Extract class member name."""
-        # test_empty_functions.py/TestClass/method_one() -> method_one
-        if f"/{class_name}/" in descriptors:
-            after_class = descriptors.split(f"/{class_name}/", 1)[1]
-            return after_class.rstrip('().')
-        return None
-
-    def _is_method_kind(self, kind: int) -> bool:
-        """Check if SCIP kind represents a method or function."""
-        method_kinds = {'function', 'method'}
-        kind_name = self._get_scip_kind_name(kind)
-        return kind_name in method_kinds
-
-    def _infer_location_from_symbol_structure(self, scip_symbol: str, document) -> Optional[LocationInfo]:
-        """Infer location based on symbol structure using SCIPSymbolManager."""
-        symbol_info = self._symbol_parser.parse_symbol(scip_symbol)
-        if not symbol_info:
-            return None
-
-        try:
-            # Strategy 1: If class member, estimate based on class location
-            if '/' in symbol_info.descriptors:
-                parts = symbol_info.descriptors.split('/')
-                if len(parts) >= 3:  # file.py/ClassName/member
-                    class_symbol = f"{symbol_info.scheme} {symbol_info.manager} {symbol_info.package} {'/'.join(parts[:2])}"
-                    class_location = self._find_symbol_location_in_document(class_symbol, document)
-                    if class_location:
-                        # Members usually 2-10 lines after class definition
-                        return LocationInfo(
-                            line=class_location.line + 3,
-                            column=class_location.column + 4
-                        )
-
-            # Strategy 2: Estimate based on file path (if symbol belongs to current file)
-            if symbol_info.manager == 'local':
-                file_path = self._symbol_parser.get_file_path_from_symbol(scip_symbol)
-                if file_path and file_path in document.relative_path:
-                    return self._estimate_position_in_file(symbol_info.descriptors, document)
-
-        except Exception as e:
-            logger.debug(f"Symbol location inference failed: {e}")
-
-        return None
-
-    def _find_symbol_location_in_document(self, target_symbol: str, document) -> Optional[LocationInfo]:
-        """Find location of target symbol in document."""
-        for occurrence in document.occurrences:
-            if occurrence.symbol == target_symbol:
-                location = self._parse_occurrence_location(occurrence)
-                if location:
-                    return location
-        return None
-
-    def _estimate_position_in_file(self, descriptors: str, document) -> Optional[LocationInfo]:
-        """Estimate position based on descriptors and document structure."""
-        # Simple heuristic: estimate line based on symbol type
-        if 'class' in descriptors.lower():
-            return LocationInfo(line=max(1, len(document.occurrences) // 4), column=1)
-        elif any(marker in descriptors.lower() for marker in ['function', 'method']):
-            return LocationInfo(line=max(5, len(document.occurrences) // 2), column=1)
-        else:
-            return LocationInfo(line=1, column=1)
-
-    def _get_default_location_by_symbol_type(self, scip_symbol: str) -> LocationInfo:
-        """Provide reasonable default location based on symbol type."""
-        symbol_lower = scip_symbol.lower()
-        if 'class' in symbol_lower:
-            return LocationInfo(line=1, column=1)  # Classes usually at file start
-        elif any(marker in symbol_lower for marker in ['function', 'method']):
-            return LocationInfo(line=5, column=1)  # Functions usually after imports
-        else:
-            return LocationInfo(line=1, column=1)  # Other symbols default position
-
-    def _create_empty_analysis(self, file_path: str) -> FileAnalysis:
-        """Create empty analysis result for missing files."""
-        return FileAnalysis(
-            file_path=file_path,
-            language='unknown',
-            line_count=0,
-            size_bytes=0
-        )
-
-    def _create_error_analysis(self, file_path: str, error_message: str) -> FileAnalysis:
-        """Create error analysis result."""
-        logger.error(f"Analysis error for {file_path}: {error_message}")
-        result = FileAnalysis(
-            file_path=file_path,
-            language='unknown',
-            line_count=0,
-            size_bytes=0
-        )
-        # Could add error information to metadata if needed
-        return result
-
-    def _extract_function_parameters(self, scip_symbol: str, symbol_info, document) -> List[str]:
-        """
-        Extract function parameter names from SCIP data.
-
-        Args:
-            scip_symbol: SCIP symbol identifier
-            symbol_info: SCIP symbol information
-            document: SCIP document containing occurrences
-
-        Returns:
-            List of parameter names
-        """
-        try:
-            # Try to extract from documentation (Python strategy stores params here)
-            if hasattr(symbol_info, 'documentation') and symbol_info.documentation:
-                for doc_line in symbol_info.documentation:
-                    if doc_line.startswith('Parameters: '):
-                        param_str = doc_line[12:]  # Remove 'Parameters: '
-                        return [p.strip() for p in param_str.split(',') if p.strip()]
-
-            # Try to extract from symbol information signature
-            if hasattr(symbol_info, 'signature') and symbol_info.signature:
-                return self._parse_signature_parameters(symbol_info.signature)
-
-            # Fallback: try to extract from symbol occurrences and surrounding context
-            return self._extract_parameters_from_occurrences(scip_symbol, document)
-
-        except Exception as e:
-            logger.debug(f"Failed to extract parameters for {scip_symbol}: {e}")
-            return []
-
-    def _parse_signature_parameters(self, signature: str) -> List[str]:
-        """Parse parameter names from function signature."""
-        try:
-            # Basic signature parsing - handle common patterns
-            if '(' in signature and ')' in signature:
-                param_section = signature.split('(')[1].split(')')[0]
-                if not param_section.strip():
-                    return []
-
-                params = []
-                for param in param_section.split(','):
-                    param = param.strip()
-                    if param:
-                        # Extract parameter name (before type annotation if present)
-                        param_name = param.split(':')[0].strip()
-                        if param_name and param_name != 'self':
-                            params.append(param_name)
-                        elif param_name == 'self':
-                            params.append('self')
-
-                return params
-
-        except Exception as e:
-            logger.debug(f"Error parsing signature parameters: {e}")
-
-        return []
-
-    def _extract_parameters_from_occurrences(self, scip_symbol: str, document) -> List[str]:
-        """Extract parameters by analyzing symbol occurrences in the document."""
-        # This is a simplified implementation
-        # A more sophisticated approach would analyze the AST or source code directly
-        return []
-
-    def _extract_return_type(self, scip_symbol: str, symbol_info) -> Optional[str]:
-        """Extract return type from SCIP data."""
-        try:
-            if hasattr(symbol_info, 'signature') and symbol_info.signature:
-                signature = symbol_info.signature
-                if '->' in signature:
-                    return_part = signature.split('->')[-1].strip()
-                    return return_part if return_part else None
-        except Exception as e:
-            logger.debug(f"Error extracting return type for {scip_symbol}: {e}")
-        return None
-
-    def _is_async_function(self, scip_symbol: str, symbol_info) -> bool:
-        """Check if function is async based on SCIP data."""
-        try:
-            # Check documentation for async marker (Python AST analyzer stores this)
-            if hasattr(symbol_info, 'documentation') and symbol_info.documentation:
-                for doc_line in symbol_info.documentation:
-                    if doc_line == 'Async function':
-                        return True
-
-            # Fallback: check signature
-            if hasattr(symbol_info, 'signature') and symbol_info.signature:
-                return 'async' in symbol_info.signature.lower()
-        except Exception as e:
-            logger.debug(f"Error checking async status for {scip_symbol}: {e}")
-        return False
-
-    def _extract_class_members(self, class_scip_symbol: str, document) -> tuple[List[str], List[str]]:
-        """Use SCIPSymbolManager to parse class members."""
-        methods = []
-        attributes = []
-
-        if not self._symbol_parser:
-            return methods, attributes
-
-        try:
-            # Parse class symbol to get descriptors
-            class_info = self._symbol_parser.parse_symbol(class_scip_symbol)
-            if not class_info:
-                return methods, attributes
-
-            # Extract class name from descriptors: file.py/ClassName -> ClassName
-            class_name = self._extract_class_name_from_descriptors(class_info.descriptors)
-            if not class_name:
-                return methods, attributes
-
-            # Find all class members by looking for matching descriptors
-            for symbol_info in document.symbols:
-                if not self._symbol_parser:
-                    continue
-
-                member_info = self._symbol_parser.parse_symbol(symbol_info.symbol)
-                if not member_info or member_info.manager != 'local':
-                    continue
-
-                # Check if this symbol belongs to the class
-                if self._is_class_member(member_info.descriptors, class_name):
-                    member_name = self._extract_member_name(member_info.descriptors, class_name)
-                    if member_name:
-                        # Classify based on SCIP kind
-                        if self._is_method_kind(symbol_info.kind):
-                            methods.append(member_name)
-                        else:
-                            attributes.append(member_name)
-
-        except Exception as e:
-            logger.debug(f"Error extracting class members for {class_scip_symbol}: {e}")
-
-        return methods, attributes
-
-    def _extract_inheritance(self, class_scip_symbol: str, symbol_info) -> List[str]:
-        """Extract class inheritance information from SCIP data."""
-        # This would require more sophisticated SCIP relationship analysis
-        # For now, return empty list
-        return []
-
-    def _extract_variable_type(self, scip_symbol: str, symbol_info) -> Optional[str]:
-        """Extract variable type from SCIP data."""
-        try:
-            if hasattr(symbol_info, 'signature') and symbol_info.signature:
-                # Try to extract type annotation
-                signature = symbol_info.signature
-                if ':' in signature:
-                    type_part = signature.split(':')[1].strip()
-                    return type_part if type_part else None
-        except Exception as e:
-            logger.debug(f"Error extracting variable type for {scip_symbol}: {e}")
-        return None
-
-    def _is_global_variable(self, scip_symbol: str, document) -> Optional[bool]:
-        """Check if variable is global based on SCIP symbol structure."""
-        try:
-            # Global variables typically don't have class context
-            if '#' not in scip_symbol:
-                return True
-            return False
-        except Exception as e:
-            logger.debug(f"Error checking global status for {scip_symbol}: {e}")
-            return None
-
-    def _extract_constant_value(self, scip_symbol: str, symbol_info) -> Optional[str]:
-        """Extract constant value from SCIP data."""
-        try:
-            if hasattr(symbol_info, 'signature') and symbol_info.signature:
-                signature = symbol_info.signature
-                if '=' in signature:
-                    value_part = signature.split('=')[1].strip()
-                    return value_part if value_part else None
-        except Exception as e:
-            logger.debug(f"Error extracting constant value for {scip_symbol}: {e}")
-        return None
-
-    def extract_scip_relationships(self, file_path: str, scip_index) -> Dict[str, List[tuple]]:
-        """
-        Extract SCIP relationships from a file using the enhanced analysis pipeline.
-
-        This method provides integration between the symbol analyzer and the new
-        SCIP relationship management system introduced in the implementation plan.
-
-        Args:
-            file_path: Relative path to the file to analyze
-            scip_index: SCIP index containing all project data
-
-        Returns:
-            Dictionary mapping source_symbol_id -> [(target_symbol_id, relationship_type), ...]
- Compatible with SCIPRelationshipManager input format - - Raises: - ValueError: If file analysis fails or file not found - """ - try: - # Perform complete file analysis - file_analysis = self.analyze_file(file_path, scip_index) - - # Extract all SCIP relationships using the enhanced data structures - relationships = file_analysis.to_scip_relationships(self._symbol_parser) - - logger.debug(f"Extracted SCIP relationships for {file_path}: " - f"{len(relationships)} symbols with relationships, " - f"{sum(len(rels) for rels in relationships.values())} total relationships") - - return relationships - - except Exception as e: - logger.error(f"Failed to extract SCIP relationships from {file_path}: {e}") - raise ValueError(f"SCIP relationship extraction failed: {e}") - - def batch_extract_relationships(self, file_paths: List[str], scip_index) -> Dict[str, Dict[str, List[tuple]]]: - """ - Extract SCIP relationships from multiple files efficiently. - - This method provides batch processing capabilities for the relationship - management system, optimizing performance for large codebases. 
- - Args: - file_paths: List of relative file paths to analyze - scip_index: SCIP index containing all project data - - Returns: - Dictionary mapping file_path -> {source_symbol_id -> [(target_symbol_id, relationship_type), ...]} - """ - results = {} - - for i, file_path in enumerate(file_paths, 1): - try: - relationships = self.extract_scip_relationships(file_path, scip_index) - results[file_path] = relationships - - if i % 10 == 0 or i == len(file_paths): - logger.debug(f"Batch relationship extraction progress: {i}/{len(file_paths)} files") - - except Exception as e: - logger.warning(f"Failed to extract relationships from {file_path}: {e}") - results[file_path] = {} # Empty result for failed files - continue - - total_files = len(results) - total_relationships = sum( - sum(len(rels) for rels in file_rels.values()) - for file_rels in results.values() - ) - - logger.info(f"Batch relationship extraction completed: {total_files} files, {total_relationships} total relationships") - - return results \ No newline at end of file diff --git a/src/code_index_mcp/tools/scip/symbol_definitions.py b/src/code_index_mcp/tools/scip/symbol_definitions.py deleted file mode 100644 index 2ef957b..0000000 --- a/src/code_index_mcp/tools/scip/symbol_definitions.py +++ /dev/null @@ -1,294 +0,0 @@ -""" -Symbol Definitions - Core data structures for enhanced symbol analysis - -This module defines the data structures used by SCIPSymbolAnalyzer to represent -accurate symbol information and call relationships in a format optimized for LLM consumption. 
-""" - -from typing import Dict, List, Optional, Any -from dataclasses import dataclass, field - -from .relationship_info import SymbolRelationships - - -class SymbolLocationError(Exception): - """Raised when symbol location cannot be determined from SCIP data.""" - pass - - -class SymbolResolutionError(Exception): - """Raised when symbol cannot be resolved or parsed.""" - pass - - -@dataclass -class LocationInfo: - """Precise location information for a symbol.""" - line: int - column: int - confidence: str = 'high' # 'high', 'fallback', 'estimated' - - def to_dict(self) -> Dict[str, int]: - """Convert to dictionary format for JSON output.""" - return {"line": self.line, "column": self.column} - - -# CallRelationships class removed - now using unified SymbolRelationships - - -@dataclass -class SymbolDefinition: - """Enhanced symbol definition with accurate metadata.""" - name: str - line: int - column: int - symbol_type: str # 'function', 'method', 'class', 'variable', 'constant' - - # Optional metadata - class_name: Optional[str] = None - parameters: List[str] = field(default_factory=list) - return_type: Optional[str] = None - is_async: bool = False - - # Unified relationships (for all symbol types) - relationships: SymbolRelationships = field(default_factory=lambda: SymbolRelationships()) - - # Additional class-specific fields - methods: List[str] = field(default_factory=list) # For classes - attributes: List[str] = field(default_factory=list) # For classes - inherits_from: List[str] = field(default_factory=list) # For classes - - # Variable/constant-specific fields - is_global: Optional[bool] = None # For variables - type: Optional[str] = None # For variables - value: Optional[str] = None # For constants - - # Internal tracking - scip_symbol: str = "" # Original SCIP symbol for debugging - - def is_callable(self) -> bool: - """Check if this symbol represents a callable (function/method).""" - return self.symbol_type in ['function', 'method'] - - def 
is_class_member(self) -> bool: - """Check if this symbol belongs to a class.""" - return self.class_name is not None - - def to_function_dict(self) -> Dict[str, Any]: - """Convert to function format for JSON output.""" - result = { - "name": self.name, - "line": self.line, - "column": self.column, - "class": self.class_name, - "parameters": self.parameters, - "return_type": self.return_type, - "is_async": self.is_async - } - - # Add relationships if they exist - relationships_dict = self.relationships.to_dict() - if relationships_dict: - result["relationships"] = relationships_dict - - return result - - def to_class_dict(self) -> Dict[str, Any]: - """Convert to class format for JSON output.""" - result = { - "name": self.name, - "line": self.line, - "column": self.column, - "methods": self.methods, - "attributes": self.attributes, - "inherits_from": self.inherits_from - } - - # Add relationships if they exist - relationships_dict = self.relationships.to_dict() - if relationships_dict: - result["relationships"] = relationships_dict - - return result - - def to_variable_dict(self) -> Dict[str, Any]: - """Convert to variable format for JSON output.""" - result = { - "name": self.name, - "line": self.line, - "column": self.column, - "is_global": self.is_global, - "type": self.type - } - - # Add relationships if they exist - relationships_dict = self.relationships.to_dict() - if relationships_dict: - result["relationships"] = relationships_dict - - return result - - def to_constant_dict(self) -> Dict[str, Any]: - """Convert to constant format for JSON output.""" - return { - "name": self.name, - "line": self.line, - "column": self.column, - "value": self.value - } - - def to_scip_relationships(self, symbol_manager=None, language="", file_path="") -> List[tuple]: - """Convert symbol relationships to SCIP format.""" - scip_relationships = [] - - # Convert all relationships to SCIP tuples - for rel in self.relationships.calls: - 
scip_relationships.append((rel.target_symbol_id, "calls")) - for rel in self.relationships.inherits_from: - scip_relationships.append((rel.target_symbol_id, "inherits")) - for rel in self.relationships.implements: - scip_relationships.append((rel.target_symbol_id, "implements")) - for rel in self.relationships.references: - scip_relationships.append((rel.target_symbol_id, "references")) - - return scip_relationships - - -@dataclass -class ImportGroup: - """Organized import information.""" - standard_library: List[str] = field(default_factory=list) - third_party: List[str] = field(default_factory=list) - local: List[str] = field(default_factory=list) - - def add_import(self, module_name: str, import_type: str = 'unknown'): - """Add an import to the appropriate group.""" - if import_type == 'standard_library': - if module_name not in self.standard_library: - self.standard_library.append(module_name) - elif import_type == 'third_party': - if module_name not in self.third_party: - self.third_party.append(module_name) - elif import_type == 'local': - if module_name not in self.local: - self.local.append(module_name) - - def to_dict(self) -> Dict[str, List[str]]: - """Convert to dictionary format for JSON output.""" - return { - "standard_library": self.standard_library, - "third_party": self.third_party, - "local": self.local - } - - -@dataclass -class FileAnalysis: - """Complete file analysis result matching the exact output specification.""" - file_path: str - language: str - line_count: int - size_bytes: int = 0 - - # Symbol collections organized by type - functions: List[SymbolDefinition] = field(default_factory=list) - classes: List[SymbolDefinition] = field(default_factory=list) - variables: List[SymbolDefinition] = field(default_factory=list) - constants: List[SymbolDefinition] = field(default_factory=list) - - # Dependency information - imports: ImportGroup = field(default_factory=lambda: ImportGroup()) - - - def add_symbol(self, symbol: SymbolDefinition): - 
"""Add a symbol to the appropriate collection based on its type.""" - if symbol.symbol_type == 'function' or symbol.symbol_type == 'method': - self.functions.append(symbol) - elif symbol.symbol_type == 'class': - self.classes.append(symbol) - elif symbol.symbol_type == 'variable': - self.variables.append(symbol) - elif symbol.symbol_type == 'constant': - self.constants.append(symbol) - - def get_function_by_name(self, name: str) -> Optional[SymbolDefinition]: - """Find a function by name.""" - for func in self.functions: - if func.name == name: - return func - return None - - def get_class_by_name(self, name: str) -> Optional[SymbolDefinition]: - """Find a class by name.""" - for cls in self.classes: - if cls.name == name: - return cls - return None - - - def to_dict(self) -> Dict[str, Any]: - """Convert to final JSON output format - EXACT specification.""" - return { - "file_path": self.file_path, - "language": self.language, - "basic_info": { - "line_count": self.line_count - }, - "symbols": { - "functions": [func.to_function_dict() for func in self.functions], - "classes": [cls.to_class_dict() for cls in self.classes], - "variables": [var.to_variable_dict() for var in self.variables], - "constants": [const.to_constant_dict() for const in self.constants] - }, - "dependencies": { - "imports": self.imports.to_dict() - }, - "status": "success" - } - - def to_scip_relationships(self, symbol_manager=None) -> Dict[str, List[tuple]]: - """ - Extract all SCIP relationships from this file analysis. - - This method provides a unified interface to get all symbol relationships - in SCIP format, compatible with the relationship management system. - - Args: - symbol_manager: Optional symbol manager for generating proper symbol IDs - - Returns: - Dictionary mapping source_symbol_id -> [(target_symbol_id, relationship_type), ...] 
- """ - all_relationships = {} - - # Process all symbol types - all_symbols = self.functions + self.classes + self.variables + self.constants - - for symbol in all_symbols: - # Create source symbol ID - if symbol_manager: - source_symbol_id = symbol_manager.create_local_symbol( - language=self.language, - file_path=self.file_path, - symbol_path=[symbol.name], - descriptor=self._get_symbol_descriptor(symbol) - ) - else: - source_symbol_id = f"local {symbol.name}{self._get_symbol_descriptor(symbol)}" - - # Get relationships for this symbol - symbol_relationships = symbol.to_scip_relationships(symbol_manager, self.language, self.file_path) - - if symbol_relationships: - all_relationships[source_symbol_id] = symbol_relationships - - return all_relationships - - def _get_symbol_descriptor(self, symbol: SymbolDefinition) -> str: - """Get SCIP descriptor suffix for a symbol.""" - if symbol.symbol_type in ['function', 'method']: - return "()." - elif symbol.symbol_type == 'class': - return "#" - else: - return "" \ No newline at end of file diff --git a/src/code_index_mcp/utils/__init__.py b/src/code_index_mcp/utils/__init__.py index 7e0d99b..cd3fb92 100644 --- a/src/code_index_mcp/utils/__init__.py +++ b/src/code_index_mcp/utils/__init__.py @@ -12,6 +12,7 @@ from .context_helper import ContextHelper from .validation import ValidationHelper from .response_formatter import ResponseFormatter +from .file_filter import FileFilter __all__ = [ 'handle_mcp_errors', @@ -19,5 +20,6 @@ 'handle_mcp_tool_errors', 'ContextHelper', 'ValidationHelper', - 'ResponseFormatter' + 'ResponseFormatter', + 'FileFilter' ] \ No newline at end of file diff --git a/src/code_index_mcp/utils/file_filter.py b/src/code_index_mcp/utils/file_filter.py new file mode 100644 index 0000000..5cd9938 --- /dev/null +++ b/src/code_index_mcp/utils/file_filter.py @@ -0,0 +1,177 @@ +""" +Centralized file filtering logic for the Code Index MCP server. 
+ +This module provides unified filtering capabilities used across all components +that need to determine which files and directories should be processed or excluded. +""" + +import fnmatch +from pathlib import Path +from typing import List, Optional, Set + +from ..constants import FILTER_CONFIG + + +class FileFilter: + """Centralized file filtering logic.""" + + def __init__(self, additional_excludes: Optional[List[str]] = None): + """ + Initialize the file filter. + + Args: + additional_excludes: Additional directory patterns to exclude + """ + self.exclude_dirs = set(FILTER_CONFIG["exclude_directories"]) + self.exclude_files = set(FILTER_CONFIG["exclude_files"]) + self.supported_extensions = set(FILTER_CONFIG["supported_extensions"]) + + # Add user-defined exclusions + if additional_excludes: + self.exclude_dirs.update(additional_excludes) + + def should_exclude_directory(self, dir_name: str) -> bool: + """ + Check if directory should be excluded from processing. + + Args: + dir_name: Directory name to check + + Returns: + True if directory should be excluded, False otherwise + """ + # Skip hidden directories except for specific allowed ones + if dir_name.startswith('.') and dir_name not in {'.env', '.gitignore'}: + return True + + # Check against exclude patterns + return dir_name in self.exclude_dirs + + def should_exclude_file(self, file_path: Path) -> bool: + """ + Check if file should be excluded from processing. 
+ + Args: + file_path: Path object for the file to check + + Returns: + True if file should be excluded, False otherwise + """ + # Extension check - only process supported file types + if file_path.suffix.lower() not in self.supported_extensions: + return True + + # Hidden files (except specific allowed ones) + if file_path.name.startswith('.') and file_path.name not in {'.gitignore', '.env'}: + return True + + # Filename pattern check using glob patterns + for pattern in self.exclude_files: + if fnmatch.fnmatch(file_path.name, pattern): + return True + + return False + + def should_process_path(self, path: Path, base_path: Path) -> bool: + """ + Unified path processing logic to determine if a file should be processed. + + Args: + path: File path to check + base_path: Project base path for relative path calculation + + Returns: + True if file should be processed, False otherwise + """ + try: + # Ensure we're working with absolute paths + if not path.is_absolute(): + path = base_path / path + + # Get relative path from base + relative_path = path.relative_to(base_path) + + # Check each path component for excluded directories + for part in relative_path.parts[:-1]: # Exclude filename + if self.should_exclude_directory(part): + return False + + # Check file itself + return not self.should_exclude_file(path) + + except (ValueError, OSError): + # Path not relative to base_path or other path errors + return False + + def is_supported_file_type(self, file_path: Path) -> bool: + """ + Check if file type is supported for indexing. + + Args: + file_path: Path to check + + Returns: + True if file type is supported, False otherwise + """ + return file_path.suffix.lower() in self.supported_extensions + + def is_temporary_file(self, file_path: Path) -> bool: + """ + Check if file appears to be a temporary file. 
+ + Args: + file_path: Path to check + + Returns: + True if file appears temporary, False otherwise + """ + name = file_path.name + + # Common temporary file patterns + temp_patterns = ['*.tmp', '*.temp', '*.swp', '*.swo', '*~'] + + for pattern in temp_patterns: + if fnmatch.fnmatch(name, pattern): + return True + + # Files ending in .bak or .orig + if name.endswith(('.bak', '.orig')): + return True + + return False + + def filter_file_list(self, files: List[str], base_path: str) -> List[str]: + """ + Filter a list of file paths, keeping only those that should be processed. + + Args: + files: List of file paths (absolute or relative) + base_path: Project base path + + Returns: + Filtered list of file paths that should be processed + """ + base = Path(base_path) + filtered = [] + + for file_path_str in files: + file_path = Path(file_path_str) + if self.should_process_path(file_path, base): + filtered.append(file_path_str) + + return filtered + + def get_exclude_summary(self) -> dict: + """ + Get summary of current exclusion configuration. 
+ + Returns: + Dictionary with exclusion configuration details + """ + return { + "exclude_directories_count": len(self.exclude_dirs), + "exclude_files_count": len(self.exclude_files), + "supported_extensions_count": len(self.supported_extensions), + "exclude_directories": sorted(self.exclude_dirs), + "exclude_files": sorted(self.exclude_files) + } \ No newline at end of file diff --git a/test/sample-projects/zig/code-index-example/src/main.zig b/test/sample-projects/zig/code-index-example/src/main.zig index 8a92646..792cfc1 100644 --- a/test/sample-projects/zig/code-index-example/src/main.zig +++ b/test/sample-projects/zig/code-index-example/src/main.zig @@ -1,10 +1,29 @@ const std = @import("std"); +const builtin = @import("builtin"); +const testing = @import("testing"); const code_index_example = @import("code_index_example"); +const utils = @import("./utils.zig"); +const math_utils = @import("./math.zig"); pub fn main() !void { // Prints to stderr, ignoring potential errors. std.debug.print("All your {s} are belong to us.\n", .{"codebase"}); try code_index_example.bufferedPrint(); + + // Test our custom utilities + const result = utils.processData("Hello, World!"); + std.debug.print("Processed result: {s}\n", .{result}); + + // Test math utilities + const sum = math_utils.calculateSum(10, 20); + std.debug.print("Sum: {}\n", .{sum}); + + // Platform-specific code + if (builtin.os.tag == .windows) { + std.debug.print("Running on Windows\n", .{}); + } else { + std.debug.print("Running on Unix-like system\n", .{}); + } } test "simple test" { diff --git a/test/sample-projects/zig/code-index-example/src/math.zig b/test/sample-projects/zig/code-index-example/src/math.zig new file mode 100644 index 0000000..dba7420 --- /dev/null +++ b/test/sample-projects/zig/code-index-example/src/math.zig @@ -0,0 +1,262 @@ +//! 
Mathematical utility functions and data structures +const std = @import("std"); +const math = @import("math"); +const testing = @import("testing"); + +// Mathematical constants +pub const PI: f64 = 3.14159265358979323846; +pub const E: f64 = 2.71828182845904523536; +pub const GOLDEN_RATIO: f64 = 1.61803398874989484820; + +// Complex number representation +pub const Complex = struct { + real: f64, + imag: f64, + + pub fn init(real: f64, imag: f64) Complex { + return Complex{ .real = real, .imag = imag }; + } + + pub fn add(self: Complex, other: Complex) Complex { + return Complex{ + .real = self.real + other.real, + .imag = self.imag + other.imag, + }; + } + + pub fn multiply(self: Complex, other: Complex) Complex { + return Complex{ + .real = self.real * other.real - self.imag * other.imag, + .imag = self.real * other.imag + self.imag * other.real, + }; + } + + pub fn magnitude(self: Complex) f64 { + return @sqrt(self.real * self.real + self.imag * self.imag); + } + + pub fn conjugate(self: Complex) Complex { + return Complex{ .real = self.real, .imag = -self.imag }; + } +}; + +// Point in 2D space +pub const Point2D = struct { + x: f64, + y: f64, + + pub fn init(x: f64, y: f64) Point2D { + return Point2D{ .x = x, .y = y }; + } + + pub fn distance(self: Point2D, other: Point2D) f64 { + const dx = self.x - other.x; + const dy = self.y - other.y; + return @sqrt(dx * dx + dy * dy); + } + + pub fn midpoint(self: Point2D, other: Point2D) Point2D { + return Point2D{ + .x = (self.x + other.x) / 2.0, + .y = (self.y + other.y) / 2.0, + }; + } +}; + +// Statistics utilities +pub const Statistics = struct { + pub fn mean(values: []const f64) f64 { + if (values.len == 0) return 0.0; + + var sum: f64 = 0.0; + for (values) |value| { + sum += value; + } + + return sum / @as(f64, @floatFromInt(values.len)); + } + + pub fn median(values: []const f64, buffer: []f64) f64 { + if (values.len == 0) return 0.0; + + // Copy to buffer and sort + for (values, 0..) 
|value, i| { + buffer[i] = value; + } + std.sort.insertionSort(f64, buffer[0..values.len], {}, std.sort.asc(f64)); + + const n = values.len; + if (n % 2 == 1) { + return buffer[n / 2]; + } else { + return (buffer[n / 2 - 1] + buffer[n / 2]) / 2.0; + } + } + + pub fn standardDeviation(values: []const f64) f64 { + if (values.len <= 1) return 0.0; + + const avg = mean(values); + var sum_sq_diff: f64 = 0.0; + + for (values) |value| { + const diff = value - avg; + sum_sq_diff += diff * diff; + } + + return @sqrt(sum_sq_diff / @as(f64, @floatFromInt(values.len - 1))); + } +}; + +// Basic math functions +pub fn factorial(n: u32) u64 { + if (n <= 1) return 1; + return @as(u64, n) * factorial(n - 1); +} + +pub fn fibonacci(n: u32) u64 { + if (n <= 1) return n; + return fibonacci(n - 1) + fibonacci(n - 2); +} + +pub fn gcd(a: u32, b: u32) u32 { + if (b == 0) return a; + return gcd(b, a % b); +} + +pub fn lcm(a: u32, b: u32) u32 { + return (a * b) / gcd(a, b); +} + +pub fn isPrime(n: u32) bool { + if (n < 2) return false; + if (n == 2) return true; + if (n % 2 == 0) return false; + + var i: u32 = 3; + while (i * i <= n) : (i += 2) { + if (n % i == 0) return false; + } + + return true; +} + +// Function used by main.zig +pub fn calculateSum(a: i32, b: i32) i32 { + return a + b; +} + +pub fn power(base: f64, exponent: i32) f64 { + if (exponent == 0) return 1.0; + if (exponent < 0) return 1.0 / power(base, -exponent); + + var result: f64 = 1.0; + var exp = exponent; + var b = base; + + while (exp > 0) { + if (exp % 2 == 1) { + result *= b; + } + b *= b; + exp /= 2; + } + + return result; +} + +// Matrix operations (2x2 for simplicity) +pub const Matrix2x2 = struct { + data: [2][2]f64, + + pub fn init(a: f64, b: f64, c: f64, d: f64) Matrix2x2 { + return Matrix2x2{ + .data = [_][2]f64{ + [_]f64{ a, b }, + [_]f64{ c, d }, + }, + }; + } + + pub fn multiply(self: Matrix2x2, other: Matrix2x2) Matrix2x2 { + return Matrix2x2{ + .data = [_][2]f64{ + [_]f64{ + self.data[0][0] * 
other.data[0][0] + self.data[0][1] * other.data[1][0], + self.data[0][0] * other.data[0][1] + self.data[0][1] * other.data[1][1], + }, + [_]f64{ + self.data[1][0] * other.data[0][0] + self.data[1][1] * other.data[1][0], + self.data[1][0] * other.data[0][1] + self.data[1][1] * other.data[1][1], + }, + }, + }; + } + + pub fn determinant(self: Matrix2x2) f64 { + return self.data[0][0] * self.data[1][1] - self.data[0][1] * self.data[1][0]; + } +}; + +// Tests +test "complex number operations" { + const z1 = Complex.init(3.0, 4.0); + const z2 = Complex.init(1.0, 2.0); + + const sum = z1.add(z2); + try std.testing.expectEqual(@as(f64, 4.0), sum.real); + try std.testing.expectEqual(@as(f64, 6.0), sum.imag); + + const magnitude = z1.magnitude(); + try std.testing.expectApproxEqAbs(@as(f64, 5.0), magnitude, 0.0001); +} + +test "point distance calculation" { + const p1 = Point2D.init(0.0, 0.0); + const p2 = Point2D.init(3.0, 4.0); + + const dist = p1.distance(p2); + try std.testing.expectApproxEqAbs(@as(f64, 5.0), dist, 0.0001); +} + +test "factorial calculation" { + try std.testing.expectEqual(@as(u64, 1), factorial(0)); + try std.testing.expectEqual(@as(u64, 1), factorial(1)); + try std.testing.expectEqual(@as(u64, 120), factorial(5)); +} + +test "fibonacci sequence" { + try std.testing.expectEqual(@as(u64, 0), fibonacci(0)); + try std.testing.expectEqual(@as(u64, 1), fibonacci(1)); + try std.testing.expectEqual(@as(u64, 13), fibonacci(7)); +} + +test "prime number detection" { + try std.testing.expect(isPrime(2)); + try std.testing.expect(isPrime(17)); + try std.testing.expect(!isPrime(4)); + try std.testing.expect(!isPrime(1)); +} + +test "statistics calculations" { + const values = [_]f64{ 1.0, 2.0, 3.0, 4.0, 5.0 }; + + const avg = Statistics.mean(&values); + try std.testing.expectEqual(@as(f64, 3.0), avg); + + var buffer: [10]f64 = undefined; + const med = Statistics.median(&values, &buffer); + try std.testing.expectEqual(@as(f64, 3.0), med); +} + +test "matrix 
operations" { + const m1 = Matrix2x2.init(1.0, 2.0, 3.0, 4.0); + const m2 = Matrix2x2.init(5.0, 6.0, 7.0, 8.0); + + const product = m1.multiply(m2); + try std.testing.expectEqual(@as(f64, 19.0), product.data[0][0]); + try std.testing.expectEqual(@as(f64, 22.0), product.data[0][1]); + + const det = m1.determinant(); + try std.testing.expectEqual(@as(f64, -2.0), det); +} \ No newline at end of file diff --git a/test/sample-projects/zig/code-index-example/src/root.zig b/test/sample-projects/zig/code-index-example/src/root.zig index 94c7cd0..1cc95e3 100644 --- a/test/sample-projects/zig/code-index-example/src/root.zig +++ b/test/sample-projects/zig/code-index-example/src/root.zig @@ -1,5 +1,48 @@ //! By convention, root.zig is the root source file when making a library. const std = @import("std"); +const fmt = @import("fmt"); +const mem = @import("mem"); +const json = @import("json"); + +// Define custom types and structures +pub const Config = struct { + name: []const u8, + version: u32, + debug: bool, + + pub fn init(name: []const u8, version: u32) Config { + return Config{ + .name = name, + .version = version, + .debug = false, + }; + } + + pub fn setDebug(self: *Config, debug: bool) void { + self.debug = debug; + } +}; + +pub const ErrorType = enum { + None, + InvalidInput, + OutOfMemory, + NetworkError, + + pub fn toString(self: ErrorType) []const u8 { + return switch (self) { + .None => "No error", + .InvalidInput => "Invalid input", + .OutOfMemory => "Out of memory", + .NetworkError => "Network error", + }; + } +}; + +// Global constants +pub const VERSION: u32 = 1; +pub const MAX_BUFFER_SIZE: usize = 4096; +var global_config: Config = undefined; pub fn bufferedPrint() !void { // Stdout is for the actual output of your application, for example if you @@ -18,6 +61,75 @@ pub fn add(a: i32, b: i32) i32 { return a + b; } +pub fn multiply(a: i32, b: i32) i32 { + return a * b; +} + +pub fn processConfig(config: *const Config) !void { + std.debug.print("Processing 
config: {s} v{}\n", .{ config.name, config.version });
+    if (config.debug) {
+        std.debug.print("Debug mode enabled\n", .{});
+    }
+}
+
+pub fn handleError(err: ErrorType) void {
+    std.debug.print("Error: {s}\n", .{err.toString()});
+}
+
+// Advanced function with error handling
+pub fn parseNumber(input: []const u8) !i32 {
+    if (input.len == 0) {
+        return error.InvalidInput;
+    }
+
+    return std.fmt.parseInt(i32, input, 10) catch |err| switch (err) {
+        error.InvalidCharacter => error.InvalidInput,
+        error.Overflow => error.OutOfMemory,
+        else => err,
+    };
+}
+
+// Generic function
+pub fn swap(comptime T: type, a: *T, b: *T) void {
+    const temp = a.*;
+    a.* = b.*;
+    b.* = temp;
+}
+
 test "basic add functionality" {
     try std.testing.expect(add(3, 7) == 10);
 }
+
+test "config initialization" {
+    var config = Config.init("test-app", 1);
+    try std.testing.expectEqualStrings("test-app", config.name);
+    try std.testing.expectEqual(@as(u32, 1), config.version);
+    try std.testing.expectEqual(false, config.debug);
+
+    config.setDebug(true);
+    try std.testing.expectEqual(true, config.debug);
+}
+
+test "error type handling" {
+    const err = ErrorType.InvalidInput;
+    try std.testing.expectEqualStrings("Invalid input", err.toString());
+}
+
+test "number parsing" {
+    const result = try parseNumber("42");
+    try std.testing.expectEqual(@as(i32, 42), result);
+
+    // Test error case
+    const invalid_result = parseNumber("");
+    try std.testing.expectError(error.InvalidInput, invalid_result);
+}
+
+test "generic swap function" {
+    var a: i32 = 10;
+    var b: i32 = 20;
+
+    swap(i32, &a, &b);
+
+    try std.testing.expectEqual(@as(i32, 20), a);
+    try std.testing.expectEqual(@as(i32, 10), b);
+}
diff --git a/test/sample-projects/zig/code-index-example/src/utils.zig b/test/sample-projects/zig/code-index-example/src/utils.zig
new file mode 100644
index 0000000..eab54ce
--- /dev/null
+++ b/test/sample-projects/zig/code-index-example/src/utils.zig
@@ -0,0 +1,169 @@
+//! Utility functions for string processing and data manipulation
+const std = @import("std");
+const mem = std.mem;
+const ascii = std.ascii;
+
+// Constants for utility functions
+pub const DEFAULT_BUFFER_SIZE: usize = 256;
+pub const MAX_STRING_LENGTH: usize = 1024;
+
+// Custom error types
+pub const UtilError = error{
+    BufferTooSmall,
+    InvalidString,
+    ProcessingFailed,
+};
+
+// String processing utilities
+pub const StringProcessor = struct {
+    buffer: []u8,
+    allocator: std.mem.Allocator,
+
+    pub fn init(allocator: std.mem.Allocator, buffer_size: usize) !StringProcessor {
+        const buffer = try allocator.alloc(u8, buffer_size);
+        return StringProcessor{
+            .buffer = buffer,
+            .allocator = allocator,
+        };
+    }
+
+    pub fn deinit(self: *StringProcessor) void {
+        self.allocator.free(self.buffer);
+    }
+
+    pub fn toUpperCase(self: *StringProcessor, input: []const u8) ![]const u8 {
+        if (input.len > self.buffer.len) {
+            return UtilError.BufferTooSmall;
+        }
+
+        for (input, 0..) |char, i| {
+            self.buffer[i] = std.ascii.toUpper(char);
+        }
+
+        return self.buffer[0..input.len];
+    }
+
+    pub fn reverse(self: *StringProcessor, input: []const u8) ![]const u8 {
+        if (input.len > self.buffer.len) {
+            return UtilError.BufferTooSmall;
+        }
+
+        for (input, 0..) |char, i| {
+            self.buffer[input.len - 1 - i] = char;
+        }
+
+        return self.buffer[0..input.len];
+    }
+};
+
+// Data validation functions
+pub fn validateEmail(email: []const u8) bool {
+    if (email.len == 0) return false;
+
+    var has_at = false;
+    var has_dot = false;
+
+    for (email) |char| {
+        if (char == '@') {
+            if (has_at) return false; // Multiple @ symbols
+            has_at = true;
+        } else if (char == '.') {
+            has_dot = true;
+        }
+    }
+
+    return has_at and has_dot;
+}
+
+pub fn isValidIdentifier(identifier: []const u8) bool {
+    if (identifier.len == 0) return false;
+
+    // First character must be letter or underscore
+    if (!std.ascii.isAlphabetic(identifier[0]) and identifier[0] != '_') {
+        return false;
+    }
+
+    // Rest must be alphanumeric or underscore
+    for (identifier[1..]) |char| {
+        if (!std.ascii.isAlphanumeric(char) and char != '_') {
+            return false;
+        }
+    }
+
+    return true;
+}
+
+// Simple string processing function used by main.zig
+pub fn processData(input: []const u8) []const u8 {
+    return if (input.len > 0) "Processed!" else "Empty input";
+}
+
+// Array utilities
+pub fn findMax(numbers: []const i32) ?i32 {
+    if (numbers.len == 0) return null;
+
+    var max = numbers[0];
+    for (numbers[1..]) |num| {
+        if (num > max) {
+            max = num;
+        }
+    }
+
+    return max;
+}
+
+pub fn bubbleSort(numbers: []i32) void {
+    const n = numbers.len;
+    if (n <= 1) return;
+
+    var i: usize = 0;
+    while (i < n - 1) : (i += 1) {
+        var j: usize = 0;
+        while (j < n - i - 1) : (j += 1) {
+            if (numbers[j] > numbers[j + 1]) {
+                const temp = numbers[j];
+                numbers[j] = numbers[j + 1];
+                numbers[j + 1] = temp;
+            }
+        }
+    }
+}
+
+// Tests
+test "string processor initialization" {
+    var processor = try StringProcessor.init(std.testing.allocator, 100);
+    defer processor.deinit();
+
+    const result = try processor.toUpperCase("hello");
+    try std.testing.expectEqualStrings("HELLO", result);
+}
+
+test "email validation" {
+    try std.testing.expect(validateEmail("test@example.com"));
+    try std.testing.expect(!validateEmail("invalid-email"));
+    try std.testing.expect(!validateEmail(""));
+}
+
+test "identifier validation" {
+    try std.testing.expect(isValidIdentifier("valid_id"));
+    try std.testing.expect(isValidIdentifier("_private"));
+    try std.testing.expect(!isValidIdentifier("123invalid"));
+    try std.testing.expect(!isValidIdentifier(""));
+}
+
+test "find maximum in array" {
+    const numbers = [_]i32{ 3, 1, 4, 1, 5, 9, 2, 6 };
+    const max = findMax(&numbers);
+    try std.testing.expectEqual(@as(?i32, 9), max);
+
+    const empty: []const i32 = &[_]i32{};
+    try std.testing.expectEqual(@as(?i32, null), findMax(empty));
+}
+
+test "bubble sort" {
+    var numbers = [_]i32{ 64, 34, 25, 12, 22, 11, 90 };
+    bubbleSort(&numbers);
+
+    const expected = [_]i32{ 11, 12, 22, 25, 34, 64, 90 };
+    try std.testing.expectEqualSlices(i32, &expected, &numbers);
+}
\ No newline at end of file
diff --git a/tests/search/test_search_filters.py b/tests/search/test_search_filters.py
new file mode 100644
index 0000000..787461d
--- /dev/null
+++ b/tests/search/test_search_filters.py
@@ -0,0 +1,52 @@
+"""Tests covering shared search filtering behaviour."""
+import os
+from types import SimpleNamespace
+from unittest.mock import patch
+from pathlib import Path as _TestPath
+import sys
+
+ROOT = _TestPath(__file__).resolve().parents[2]
+SRC_PATH = ROOT / 'src'
+if str(SRC_PATH) not in sys.path:
+    sys.path.insert(0, str(SRC_PATH))
+
+from code_index_mcp.search.basic import BasicSearchStrategy
+from code_index_mcp.search.ripgrep import RipgrepStrategy
+from code_index_mcp.utils.file_filter import FileFilter
+
+
+def test_basic_strategy_skips_excluded_directories(tmp_path):
+    base = tmp_path
+    src_dir = base / "src"
+    src_dir.mkdir()
+    (src_dir / 'app.js').write_text("const db = 'mongo';\n")
+
+    node_modules_dir = base / "node_modules" / "pkg"
+    node_modules_dir.mkdir(parents=True)
+    (node_modules_dir / 'index.js').write_text("// mongo dependency\n")
+
+    strategy = BasicSearchStrategy()
+    strategy.configure_excludes(FileFilter())
+
+    results = strategy.search("mongo", str(base), case_sensitive=False)
+
+    included_path = os.path.join("src", "app.js")
+    excluded_path = os.path.join("node_modules", "pkg", "index.js")
+
+    assert included_path in results
+    assert excluded_path not in results
+
+
+@patch("code_index_mcp.search.ripgrep.subprocess.run")
+def test_ripgrep_strategy_adds_exclude_globs(mock_run, tmp_path):
+    mock_run.return_value = SimpleNamespace(returncode=0, stdout="", stderr="")
+
+    strategy = RipgrepStrategy()
+    strategy.configure_excludes(FileFilter())
+
+    strategy.search("mongo", str(tmp_path))
+
+    cmd = mock_run.call_args[0][0]
+    glob_args = [cmd[i + 1] for i, arg in enumerate(cmd) if arg == '--glob' and i + 1 < len(cmd)]
+
+    assert any(value.startswith('!**/node_modules/') for value in glob_args)
diff --git a/uv.lock b/uv.lock
index a2c9dde..08294cf 100644
--- a/uv.lock
+++ b/uv.lock
@@ -49,13 +49,12 @@ wheels = [
 
 [[package]]
 name = "code-index-mcp"
-version = "2.1.2"
+version = "2.4.1"
 source = { editable = "." }
 dependencies = [
-    { name = "libclang" },
     { name = "mcp" },
+    { name = "msgpack" },
     { name = "pathspec" },
-    { name = "protobuf" },
     { name = "tree-sitter" },
     { name = "tree-sitter-java" },
     { name = "tree-sitter-javascript" },
@@ -66,10 +65,9 @@ dependencies = [
 
 [package.metadata]
 requires-dist = [
-    { name = "libclang", specifier = ">=16.0.0" },
     { name = "mcp", specifier = ">=0.3.0" },
+    { name = "msgpack", specifier = ">=1.0.0" },
     { name = "pathspec", specifier = ">=0.12.1" },
-    { name = "protobuf", specifier = ">=4.21.0" },
     { name = "tree-sitter", specifier = ">=0.20.0" },
     { name = "tree-sitter-java", specifier = ">=0.20.0" },
     { name = "tree-sitter-javascript", specifier = ">=0.20.0" },
@@ -151,23 +149,6 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/76/c6/c88e154df9c4e1a2a66ccf0005a88dfb2650c1dffb6f5ce603dfbd452ce3/idna-3.10-py3-none-any.whl", hash = "sha256:946d195a0d259cbba61165e88e65941f16e9b36ea6ddb97f00452bae8b1287d3", size = 70442 },
 ]
 
-[[package]]
-name = "libclang"
-version = "18.1.1"
-source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/6e/5c/ca35e19a4f142adffa27e3d652196b7362fa612243e2b916845d801454fc/libclang-18.1.1.tar.gz", hash = "sha256:a1214966d08d73d971287fc3ead8dfaf82eb07fb197680d8b3859dbbbbf78250", size = 39612 }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/4b/49/f5e3e7e1419872b69f6f5e82ba56e33955a74bd537d8a1f5f1eff2f3668a/libclang-18.1.1-1-py2.py3-none-macosx_11_0_arm64.whl", hash = "sha256:0b2e143f0fac830156feb56f9231ff8338c20aecfe72b4ffe96f19e5a1dbb69a", size = 25836045 },
-    { url = "https://files.pythonhosted.org/packages/e2/e5/fc61bbded91a8830ccce94c5294ecd6e88e496cc85f6704bf350c0634b70/libclang-18.1.1-py2.py3-none-macosx_10_9_x86_64.whl", hash = "sha256:6f14c3f194704e5d09769108f03185fce7acaf1d1ae4bbb2f30a72c2400cb7c5", size = 26502641 },
-    { url = "https://files.pythonhosted.org/packages/db/ed/1df62b44db2583375f6a8a5e2ca5432bbdc3edb477942b9b7c848c720055/libclang-18.1.1-py2.py3-none-macosx_11_0_arm64.whl", hash = "sha256:83ce5045d101b669ac38e6da8e58765f12da2d3aafb3b9b98d88b286a60964d8", size = 26420207 },
-    { url = "https://files.pythonhosted.org/packages/1d/fc/716c1e62e512ef1c160e7984a73a5fc7df45166f2ff3f254e71c58076f7c/libclang-18.1.1-py2.py3-none-manylinux2010_x86_64.whl", hash = "sha256:c533091d8a3bbf7460a00cb6c1a71da93bffe148f172c7d03b1c31fbf8aa2a0b", size = 24515943 },
-    { url = "https://files.pythonhosted.org/packages/3c/3d/f0ac1150280d8d20d059608cf2d5ff61b7c3b7f7bcf9c0f425ab92df769a/libclang-18.1.1-py2.py3-none-manylinux2014_aarch64.whl", hash = "sha256:54dda940a4a0491a9d1532bf071ea3ef26e6dbaf03b5000ed94dd7174e8f9592", size = 23784972 },
-    { url = "https://files.pythonhosted.org/packages/fe/2f/d920822c2b1ce9326a4c78c0c2b4aa3fde610c7ee9f631b600acb5376c26/libclang-18.1.1-py2.py3-none-manylinux2014_armv7l.whl", hash = "sha256:cf4a99b05376513717ab5d82a0db832c56ccea4fd61a69dbb7bccf2dfb207dbe", size = 20259606 },
-    { url = "https://files.pythonhosted.org/packages/2d/c2/de1db8c6d413597076a4259cea409b83459b2db997c003578affdd32bf66/libclang-18.1.1-py2.py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:69f8eb8f65c279e765ffd28aaa7e9e364c776c17618af8bff22a8df58677ff4f", size = 24921494 },
-    { url = "https://files.pythonhosted.org/packages/0b/2d/3f480b1e1d31eb3d6de5e3ef641954e5c67430d5ac93b7fa7e07589576c7/libclang-18.1.1-py2.py3-none-win_amd64.whl", hash = "sha256:4dd2d3b82fab35e2bf9ca717d7b63ac990a3519c7e312f19fa8e86dcc712f7fb", size = 26415083 },
-    { url = "https://files.pythonhosted.org/packages/71/cf/e01dc4cc79779cd82d77888a88ae2fa424d93b445ad4f6c02bfc18335b70/libclang-18.1.1-py2.py3-none-win_arm64.whl", hash = "sha256:3f0e1f49f04d3cd198985fea0511576b0aee16f9ff0e0f0cad7f9c57ec3c20e8", size = 22361112 },
-]
-
 [[package]]
 name = "mcp"
 version = "1.4.1"
@@ -188,26 +169,60 @@ wheels = [
 ]
 
 [[package]]
-name = "pathspec"
-version = "0.12.1"
+name = "msgpack"
+version = "1.1.1"
 source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/ca/bc/f35b8446f4531a7cb215605d100cd88b7ac6f44ab3fc94870c120ab3adbf/pathspec-0.12.1.tar.gz", hash = "sha256:a482d51503a1ab33b1c67a6c3813a26953dbdc71c31dacaef9a838c4e29f5712", size = 51043 }
+sdist = { url = "https://files.pythonhosted.org/packages/45/b1/ea4f68038a18c77c9467400d166d74c4ffa536f34761f7983a104357e614/msgpack-1.1.1.tar.gz", hash = "sha256:77b79ce34a2bdab2594f490c8e80dd62a02d650b91a75159a63ec413b8d104cd", size = 173555 }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/cc/20/ff623b09d963f88bfde16306a54e12ee5ea43e9b597108672ff3a408aad6/pathspec-0.12.1-py3-none-any.whl", hash = "sha256:a0d503e138a4c123b27490a4f7beda6a01c6f288df0e4a8b79c7eb0dc7b4cc08", size = 31191 },
+    { url = "https://files.pythonhosted.org/packages/33/52/f30da112c1dc92cf64f57d08a273ac771e7b29dea10b4b30369b2d7e8546/msgpack-1.1.1-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:353b6fc0c36fde68b661a12949d7d49f8f51ff5fa019c1e47c87c4ff34b080ed", size = 81799 },
+    { url = "https://files.pythonhosted.org/packages/e4/35/7bfc0def2f04ab4145f7f108e3563f9b4abae4ab0ed78a61f350518cc4d2/msgpack-1.1.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:79c408fcf76a958491b4e3b103d1c417044544b68e96d06432a189b43d1215c8", size = 78278 },
+    { url = "https://files.pythonhosted.org/packages/e8/c5/df5d6c1c39856bc55f800bf82778fd4c11370667f9b9e9d51b2f5da88f20/msgpack-1.1.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:78426096939c2c7482bf31ef15ca219a9e24460289c00dd0b94411040bb73ad2", size = 402805 },
+    { url = "https://files.pythonhosted.org/packages/20/8e/0bb8c977efecfe6ea7116e2ed73a78a8d32a947f94d272586cf02a9757db/msgpack-1.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8b17ba27727a36cb73aabacaa44b13090feb88a01d012c0f4be70c00f75048b4", size = 408642 },
+    { url = "https://files.pythonhosted.org/packages/59/a1/731d52c1aeec52006be6d1f8027c49fdc2cfc3ab7cbe7c28335b2910d7b6/msgpack-1.1.1-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:7a17ac1ea6ec3c7687d70201cfda3b1e8061466f28f686c24f627cae4ea8efd0", size = 395143 },
+    { url = "https://files.pythonhosted.org/packages/2b/92/b42911c52cda2ba67a6418ffa7d08969edf2e760b09015593c8a8a27a97d/msgpack-1.1.1-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:88d1e966c9235c1d4e2afac21ca83933ba59537e2e2727a999bf3f515ca2af26", size = 395986 },
+    { url = "https://files.pythonhosted.org/packages/61/dc/8ae165337e70118d4dab651b8b562dd5066dd1e6dd57b038f32ebc3e2f07/msgpack-1.1.1-cp310-cp310-musllinux_1_2_i686.whl", hash = "sha256:f6d58656842e1b2ddbe07f43f56b10a60f2ba5826164910968f5933e5178af75", size = 402682 },
+    { url = "https://files.pythonhosted.org/packages/58/27/555851cb98dcbd6ce041df1eacb25ac30646575e9cd125681aa2f4b1b6f1/msgpack-1.1.1-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:96decdfc4adcbc087f5ea7ebdcfd3dee9a13358cae6e81d54be962efc38f6338", size = 406368 },
+    { url = "https://files.pythonhosted.org/packages/d4/64/39a26add4ce16f24e99eabb9005e44c663db00e3fce17d4ae1ae9d61df99/msgpack-1.1.1-cp310-cp310-win32.whl", hash = "sha256:6640fd979ca9a212e4bcdf6eb74051ade2c690b862b679bfcb60ae46e6dc4bfd", size = 65004 },
+    { url = "https://files.pythonhosted.org/packages/7d/18/73dfa3e9d5d7450d39debde5b0d848139f7de23bd637a4506e36c9800fd6/msgpack-1.1.1-cp310-cp310-win_amd64.whl", hash = "sha256:8b65b53204fe1bd037c40c4148d00ef918eb2108d24c9aaa20bc31f9810ce0a8", size = 71548 },
+    { url = "https://files.pythonhosted.org/packages/7f/83/97f24bf9848af23fe2ba04380388216defc49a8af6da0c28cc636d722502/msgpack-1.1.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:71ef05c1726884e44f8b1d1773604ab5d4d17729d8491403a705e649116c9558", size = 82728 },
+    { url = "https://files.pythonhosted.org/packages/aa/7f/2eaa388267a78401f6e182662b08a588ef4f3de6f0eab1ec09736a7aaa2b/msgpack-1.1.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:36043272c6aede309d29d56851f8841ba907a1a3d04435e43e8a19928e243c1d", size = 79279 },
+    { url = "https://files.pythonhosted.org/packages/f8/46/31eb60f4452c96161e4dfd26dbca562b4ec68c72e4ad07d9566d7ea35e8a/msgpack-1.1.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:a32747b1b39c3ac27d0670122b57e6e57f28eefb725e0b625618d1b59bf9d1e0", size = 423859 },
+    { url = "https://files.pythonhosted.org/packages/45/16/a20fa8c32825cc7ae8457fab45670c7a8996d7746ce80ce41cc51e3b2bd7/msgpack-1.1.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8a8b10fdb84a43e50d38057b06901ec9da52baac6983d3f709d8507f3889d43f", size = 429975 },
+    { url = "https://files.pythonhosted.org/packages/86/ea/6c958e07692367feeb1a1594d35e22b62f7f476f3c568b002a5ea09d443d/msgpack-1.1.1-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:ba0c325c3f485dc54ec298d8b024e134acf07c10d494ffa24373bea729acf704", size = 413528 },
+    { url = "https://files.pythonhosted.org/packages/75/05/ac84063c5dae79722bda9f68b878dc31fc3059adb8633c79f1e82c2cd946/msgpack-1.1.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:88daaf7d146e48ec71212ce21109b66e06a98e5e44dca47d853cbfe171d6c8d2", size = 413338 },
+    { url = "https://files.pythonhosted.org/packages/69/e8/fe86b082c781d3e1c09ca0f4dacd457ede60a13119b6ce939efe2ea77b76/msgpack-1.1.1-cp311-cp311-musllinux_1_2_i686.whl", hash = "sha256:d8b55ea20dc59b181d3f47103f113e6f28a5e1c89fd5b67b9140edb442ab67f2", size = 422658 },
+    { url = "https://files.pythonhosted.org/packages/3b/2b/bafc9924df52d8f3bb7c00d24e57be477f4d0f967c0a31ef5e2225e035c7/msgpack-1.1.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:4a28e8072ae9779f20427af07f53bbb8b4aa81151054e882aee333b158da8752", size = 427124 },
+    { url = "https://files.pythonhosted.org/packages/a2/3b/1f717e17e53e0ed0b68fa59e9188f3f610c79d7151f0e52ff3cd8eb6b2dc/msgpack-1.1.1-cp311-cp311-win32.whl", hash = "sha256:7da8831f9a0fdb526621ba09a281fadc58ea12701bc709e7b8cbc362feabc295", size = 65016 },
+    { url = "https://files.pythonhosted.org/packages/48/45/9d1780768d3b249accecc5a38c725eb1e203d44a191f7b7ff1941f7df60c/msgpack-1.1.1-cp311-cp311-win_amd64.whl", hash = "sha256:5fd1b58e1431008a57247d6e7cc4faa41c3607e8e7d4aaf81f7c29ea013cb458", size = 72267 },
+    { url = "https://files.pythonhosted.org/packages/e3/26/389b9c593eda2b8551b2e7126ad3a06af6f9b44274eb3a4f054d48ff7e47/msgpack-1.1.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:ae497b11f4c21558d95de9f64fff7053544f4d1a17731c866143ed6bb4591238", size = 82359 },
+    { url = "https://files.pythonhosted.org/packages/ab/65/7d1de38c8a22cf8b1551469159d4b6cf49be2126adc2482de50976084d78/msgpack-1.1.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:33be9ab121df9b6b461ff91baac6f2731f83d9b27ed948c5b9d1978ae28bf157", size = 79172 },
+    { url = "https://files.pythonhosted.org/packages/0f/bd/cacf208b64d9577a62c74b677e1ada005caa9b69a05a599889d6fc2ab20a/msgpack-1.1.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:6f64ae8fe7ffba251fecb8408540c34ee9df1c26674c50c4544d72dbf792e5ce", size = 425013 },
+    { url = "https://files.pythonhosted.org/packages/4d/ec/fd869e2567cc9c01278a736cfd1697941ba0d4b81a43e0aa2e8d71dab208/msgpack-1.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a494554874691720ba5891c9b0b39474ba43ffb1aaf32a5dac874effb1619e1a", size = 426905 },
+    { url = "https://files.pythonhosted.org/packages/55/2a/35860f33229075bce803a5593d046d8b489d7ba2fc85701e714fc1aaf898/msgpack-1.1.1-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:cb643284ab0ed26f6957d969fe0dd8bb17beb567beb8998140b5e38a90974f6c", size = 407336 },
+    { url = "https://files.pythonhosted.org/packages/8c/16/69ed8f3ada150bf92745fb4921bd621fd2cdf5a42e25eb50bcc57a5328f0/msgpack-1.1.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:d275a9e3c81b1093c060c3837e580c37f47c51eca031f7b5fb76f7b8470f5f9b", size = 409485 },
+    { url = "https://files.pythonhosted.org/packages/c6/b6/0c398039e4c6d0b2e37c61d7e0e9d13439f91f780686deb8ee64ecf1ae71/msgpack-1.1.1-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:4fd6b577e4541676e0cc9ddc1709d25014d3ad9a66caa19962c4f5de30fc09ef", size = 412182 },
+    { url = "https://files.pythonhosted.org/packages/b8/d0/0cf4a6ecb9bc960d624c93effaeaae75cbf00b3bc4a54f35c8507273cda1/msgpack-1.1.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:bb29aaa613c0a1c40d1af111abf025f1732cab333f96f285d6a93b934738a68a", size = 419883 },
+    { url = "https://files.pythonhosted.org/packages/62/83/9697c211720fa71a2dfb632cad6196a8af3abea56eece220fde4674dc44b/msgpack-1.1.1-cp312-cp312-win32.whl", hash = "sha256:870b9a626280c86cff9c576ec0d9cbcc54a1e5ebda9cd26dab12baf41fee218c", size = 65406 },
+    { url = "https://files.pythonhosted.org/packages/c0/23/0abb886e80eab08f5e8c485d6f13924028602829f63b8f5fa25a06636628/msgpack-1.1.1-cp312-cp312-win_amd64.whl", hash = "sha256:5692095123007180dca3e788bb4c399cc26626da51629a31d40207cb262e67f4", size = 72558 },
+    { url = "https://files.pythonhosted.org/packages/a1/38/561f01cf3577430b59b340b51329803d3a5bf6a45864a55f4ef308ac11e3/msgpack-1.1.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:3765afa6bd4832fc11c3749be4ba4b69a0e8d7b728f78e68120a157a4c5d41f0", size = 81677 },
+    { url = "https://files.pythonhosted.org/packages/09/48/54a89579ea36b6ae0ee001cba8c61f776451fad3c9306cd80f5b5c55be87/msgpack-1.1.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:8ddb2bcfd1a8b9e431c8d6f4f7db0773084e107730ecf3472f1dfe9ad583f3d9", size = 78603 },
+    { url = "https://files.pythonhosted.org/packages/a0/60/daba2699b308e95ae792cdc2ef092a38eb5ee422f9d2fbd4101526d8a210/msgpack-1.1.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:196a736f0526a03653d829d7d4c5500a97eea3648aebfd4b6743875f28aa2af8", size = 420504 },
+    { url = "https://files.pythonhosted.org/packages/20/22/2ebae7ae43cd8f2debc35c631172ddf14e2a87ffcc04cf43ff9df9fff0d3/msgpack-1.1.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9d592d06e3cc2f537ceeeb23d38799c6ad83255289bb84c2e5792e5a8dea268a", size = 423749 },
+    { url = "https://files.pythonhosted.org/packages/40/1b/54c08dd5452427e1179a40b4b607e37e2664bca1c790c60c442c8e972e47/msgpack-1.1.1-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:4df2311b0ce24f06ba253fda361f938dfecd7b961576f9be3f3fbd60e87130ac", size = 404458 },
+    { url = "https://files.pythonhosted.org/packages/2e/60/6bb17e9ffb080616a51f09928fdd5cac1353c9becc6c4a8abd4e57269a16/msgpack-1.1.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:e4141c5a32b5e37905b5940aacbc59739f036930367d7acce7a64e4dec1f5e0b", size = 405976 },
+    { url = "https://files.pythonhosted.org/packages/ee/97/88983e266572e8707c1f4b99c8fd04f9eb97b43f2db40e3172d87d8642db/msgpack-1.1.1-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:b1ce7f41670c5a69e1389420436f41385b1aa2504c3b0c30620764b15dded2e7", size = 408607 },
+    { url = "https://files.pythonhosted.org/packages/bc/66/36c78af2efaffcc15a5a61ae0df53a1d025f2680122e2a9eb8442fed3ae4/msgpack-1.1.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:4147151acabb9caed4e474c3344181e91ff7a388b888f1e19ea04f7e73dc7ad5", size = 424172 },
+    { url = "https://files.pythonhosted.org/packages/8c/87/a75eb622b555708fe0427fab96056d39d4c9892b0c784b3a721088c7ee37/msgpack-1.1.1-cp313-cp313-win32.whl", hash = "sha256:500e85823a27d6d9bba1d057c871b4210c1dd6fb01fbb764e37e4e8847376323", size = 65347 },
+    { url = "https://files.pythonhosted.org/packages/ca/91/7dc28d5e2a11a5ad804cf2b7f7a5fcb1eb5a4966d66a5d2b41aee6376543/msgpack-1.1.1-cp313-cp313-win_amd64.whl", hash = "sha256:6d489fba546295983abd142812bda76b57e33d0b9f5d5b71c09a583285506f69", size = 72341 },
 ]
 
 [[package]]
-name = "protobuf"
-version = "6.31.1"
+name = "pathspec"
+version = "0.12.1"
 source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/52/f3/b9655a711b32c19720253f6f06326faf90580834e2e83f840472d752bc8b/protobuf-6.31.1.tar.gz", hash = "sha256:d8cac4c982f0b957a4dc73a80e2ea24fab08e679c0de9deb835f4a12d69aca9a", size = 441797 }
+sdist = { url = "https://files.pythonhosted.org/packages/ca/bc/f35b8446f4531a7cb215605d100cd88b7ac6f44ab3fc94870c120ab3adbf/pathspec-0.12.1.tar.gz", hash = "sha256:a482d51503a1ab33b1c67a6c3813a26953dbdc71c31dacaef9a838c4e29f5712", size = 51043 }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/f3/6f/6ab8e4bf962fd5570d3deaa2d5c38f0a363f57b4501047b5ebeb83ab1125/protobuf-6.31.1-cp310-abi3-win32.whl", hash = "sha256:7fa17d5a29c2e04b7d90e5e32388b8bfd0e7107cd8e616feef7ed3fa6bdab5c9", size = 423603 },
-    { url = "https://files.pythonhosted.org/packages/44/3a/b15c4347dd4bf3a1b0ee882f384623e2063bb5cf9fa9d57990a4f7df2fb6/protobuf-6.31.1-cp310-abi3-win_amd64.whl", hash = "sha256:426f59d2964864a1a366254fa703b8632dcec0790d8862d30034d8245e1cd447", size = 435283 },
-    { url = "https://files.pythonhosted.org/packages/6a/c9/b9689a2a250264a84e66c46d8862ba788ee7a641cdca39bccf64f59284b7/protobuf-6.31.1-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:6f1227473dc43d44ed644425268eb7c2e488ae245d51c6866d19fe158e207402", size = 425604 },
-    { url = "https://files.pythonhosted.org/packages/76/a1/7a5a94032c83375e4fe7e7f56e3976ea6ac90c5e85fac8576409e25c39c3/protobuf-6.31.1-cp39-abi3-manylinux2014_aarch64.whl", hash = "sha256:a40fc12b84c154884d7d4c4ebd675d5b3b5283e155f324049ae396b95ddebc39", size = 322115 },
-    { url = "https://files.pythonhosted.org/packages/fa/b1/b59d405d64d31999244643d88c45c8241c58f17cc887e73bcb90602327f8/protobuf-6.31.1-cp39-abi3-manylinux2014_x86_64.whl", hash = "sha256:4ee898bf66f7a8b0bd21bce523814e6fbd8c6add948045ce958b73af7e8878c6", size = 321070 },
-    { url = "https://files.pythonhosted.org/packages/f7/af/ab3c51ab7507a7325e98ffe691d9495ee3d3aa5f589afad65ec920d39821/protobuf-6.31.1-py3-none-any.whl", hash = "sha256:720a6c7e6b77288b85063569baae8536671b39f15cc22037ec7045658d80489e", size = 168724 },
+    { url = "https://files.pythonhosted.org/packages/cc/20/ff623b09d963f88bfde16306a54e12ee5ea43e9b597108672ff3a408aad6/pathspec-0.12.1-py3-none-any.whl", hash = "sha256:a0d503e138a4c123b27490a4f7beda6a01c6f288df0e4a8b79c7eb0dc7b4cc08", size = 31191 },
 ]
 
 [[package]]
@@ -512,3 +527,4 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/db/d9/c495884c6e548fce18a8f40568ff120bc3a4b7b99813081c8ac0c936fa64/watchdog-6.0.0-py3-none-win_amd64.whl", hash = "sha256:cbafb470cf848d93b5d013e2ecb245d4aa1c8fd0504e863ccefa32445359d680", size = 79070 },
     { url = "https://files.pythonhosted.org/packages/33/e8/e40370e6d74ddba47f002a32919d91310d6074130fe4e17dabcafc15cbf1/watchdog-6.0.0-py3-none-win_ia64.whl", hash = "sha256:a1914259fa9e1454315171103c6a30961236f508b9b623eae470268bbcc6a22f", size = 79067 },
 ]
+