Arbitrary File Loading Vulnerability in context_analysis.py #10551

Open

opened 2026-04-18 17:12:01 +00:00 by HAL9000 · 0 comments

Owner

Metadata

Commit: HEAD (current working tree)
Branch: main
Issue Type: Security Vulnerability
Severity: CRITICAL

Background and Context

The _load_files() method in src/cleveragents/agents/graphs/context_analysis.py (lines 179-200) uses LangChain's TextLoader to load files from user-provided file paths without any validation. This is a critical arbitrary file read vulnerability: anyone who controls the file_paths input can read any file that the agent process has permission to open.

Vulnerability Details

The current implementation:

1. Accepts user-provided file paths directly
2. Passes them to TextLoader(str(path)) without validation
3. Does not check if paths are within allowed directories
4. Does not prevent path traversal sequences like ../
5. Does not validate file extensions or restrict to safe file types

Code Evidence

Lines 179-200 in context_analysis.py:

def _load_files(self, file_paths: list[str]) -> list[Document]:
    """Load files using TextLoader without path validation"""
    documents = []
    for path in file_paths:
        # VULNERABLE: No validation of path
        loader = TextLoader(str(path))
        documents.extend(loader.load())
    return documents

Attack Scenario

An attacker could invoke the agent with malicious file paths:

from cleveragents.agents import ContextAnalysisAgent
agent = ContextAnalysisAgent(llm=mock_llm)
state = {
    "file_paths": ["/etc/passwd", "../../.env", "/root/.ssh/id_rsa"],
    "documents": [],
    "dependencies": {},
    "summary": "",
    "relevance_scores": {},
    "chunks": [],
    "error": None,
}
result = agent.invoke(state, config={"configurable": {"thread_id": "test"}})
# This would load /etc/passwd, ../../.env, and /root/.ssh/id_rsa
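
To make the escape concrete, the following minimal sketch shows how the paths from the scenario above resolve when interpreted against a project root on a POSIX system. The helper name `escapes_root` and the root `/srv/app/project` are hypothetical, chosen only for illustration; they are not part of the codebase.

```python
from pathlib import Path


def escapes_root(raw_path: str, project_root: Path) -> bool:
    """Return True if raw_path, taken relative to project_root, lands outside it."""
    root = project_root.resolve()
    # Joining an absolute path onto a base discards the base entirely, and
    # resolve() collapses ".." components and follows any existing symlinks.
    resolved = (root / raw_path).resolve()
    return not resolved.is_relative_to(root)


# Hypothetical project root, used only for illustration.
root = Path("/srv/app/project")
for raw in ["/etc/passwd", "../../.env", "docs/readme.md"]:
    print(f"{raw!r}: escapes root = {escapes_root(raw, root)}")
```

Both the absolute path and the `..` path end up outside the root, which is why simply prefixing a base directory to the input would not be a sufficient fix. `Path.is_relative_to()` requires Python 3.9 or later.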

Expected Behavior

The _load_files() method should:

1. Validate all file paths before loading
2. Reject paths that attempt to escape the project directory
3. Resolve symlinks and check the resolved path is within allowed boundaries
4. Only allow loading files with safe extensions (.py, .txt, .md, .json, .yaml, etc.)
5. Reject absolute paths and paths containing .. sequences
6. Provide clear error messages when a path is rejected
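
One possible shape for such a validator is sketched below. It is an illustration under stated assumptions rather than the project's implementation: the names `validate_path`, `UnsafePathError`, and `ALLOWED_EXTENSIONS` are hypothetical, and the extension set only mirrors the examples listed above.

```python
from pathlib import Path

# Hypothetical allowlist; the real set of extensions is a project decision.
ALLOWED_EXTENSIONS = frozenset({".py", ".txt", ".md", ".json", ".yaml"})


class UnsafePathError(ValueError):
    """Raised when a requested path fails validation (hypothetical name)."""


def validate_path(raw_path: str, project_root: Path) -> Path:
    """Return the resolved path if it is safe to load, else raise UnsafePathError."""
    candidate = Path(raw_path)
    if candidate.is_absolute():
        raise UnsafePathError(f"absolute paths are not allowed: {raw_path!r}")
    if ".." in candidate.parts:
        raise UnsafePathError(f"path traversal ('..') is not allowed: {raw_path!r}")

    root = project_root.resolve()
    # resolve() normalizes the path and follows symlinks, so a link that
    # points outside the root is caught by the containment check below.
    resolved = (root / candidate).resolve()
    if not resolved.is_relative_to(root):
        raise UnsafePathError(f"path escapes the project root: {raw_path!r}")
    # The suffix is checked on the resolved target, not on the link name.
    if resolved.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise UnsafePathError(f"file extension is not allowed: {raw_path!r}")
    return resolved
```

Checking containment after `resolve()` is what handles symlinks. The explicit absolute-path and `..` checks are kept separate so that each rejection can carry its own error message.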

Acceptance Criteria

- [ ] All file paths are validated before being passed to TextLoader
- [ ] Paths containing .. are rejected with a clear error message
- [ ] Absolute paths (for example, those starting with /) are rejected
- [ ] Symlinks are resolved and checked to be within project directory
- [ ] Only whitelisted file extensions are allowed
- [ ] pathlib.Path.resolve() is used to resolve symlinks and normalize paths
- [ ] Resolved path is verified to be under the project root directory
- [ ] Unit tests cover all validation scenarios (path traversal, absolute paths, symlinks, invalid extensions)
- [ ] Integration tests verify the agent rejects malicious file paths
- [ ] Security audit confirms no bypass vectors exist
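
The four validation scenarios named in the criteria above could be covered by tests along the following lines. This is a sketch only: `is_safe` is a compact, hypothetical stand-in for whatever validator the fix introduces, and the allowlist is assumed from the examples in this issue.

```python
import os
import tempfile
import unittest
from pathlib import Path

ALLOWED = {".py", ".txt", ".md", ".json", ".yaml"}  # hypothetical allowlist


def is_safe(raw: str, root: Path) -> bool:
    """Compact stand-in for the validator under test (hypothetical)."""
    p = Path(raw)
    if p.is_absolute() or ".." in p.parts:
        return False
    root = root.resolve()
    resolved = (root / p).resolve()
    return resolved.is_relative_to(root) and resolved.suffix.lower() in ALLOWED


class PathValidationTests(unittest.TestCase):
    def setUp(self) -> None:
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)
        self.root = Path(self._tmp.name) / "project"
        self.root.mkdir()
        (self.root / "notes.md").write_text("ok")
        # A file outside the project root that must never be reachable.
        self.secret = Path(self._tmp.name) / "secret.txt"
        self.secret.write_text("secret")

    def test_accepts_file_inside_root(self) -> None:
        self.assertTrue(is_safe("notes.md", self.root))

    def test_rejects_path_traversal(self) -> None:
        self.assertFalse(is_safe("../secret.txt", self.root))

    def test_rejects_absolute_path(self) -> None:
        self.assertFalse(is_safe(str(self.secret), self.root))

    def test_rejects_disallowed_extension(self) -> None:
        self.assertFalse(is_safe("payload.exe", self.root))

    def test_rejects_symlink_escaping_root(self) -> None:
        link = self.root / "link.txt"
        try:
            os.symlink(self.secret, link)
        except (OSError, NotImplementedError):
            self.skipTest("symlinks are not available on this platform")
        self.assertFalse(is_safe("link.txt", self.root))
```

The suite can be run with `python -m unittest`. The symlink case uses an allowed extension on purpose, so that only the boundary check can reject it.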

Subtasks

- [ ] Add path validation utility function to validate and sanitize file paths
- [ ] Implement whitelist of allowed file extensions
- [ ] Add path traversal detection (reject .. sequences)
- [ ] Add absolute path detection and rejection
- [ ] Implement symlink resolution using pathlib.Path.resolve()
- [ ] Add project root boundary check
- [ ] Update _load_files() to use validation before TextLoader
- [ ] Add comprehensive unit tests for path validation
- [ ] Add integration tests for agent with malicious paths
- [ ] Update documentation with security considerations
- [ ] Create security advisory if this affects released versions
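
For the _load_files() subtask, one way to put validation in front of the loader is sketched below. The function `load_validated_files` and its `validate` and `load_one` parameters are hypothetical. The loader is injected so that the sketch runs without LangChain; in the real method, `load_one` would presumably wrap the existing `TextLoader(str(path)).load()` call.

```python
from pathlib import Path
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def load_validated_files(
    file_paths: Iterable[str],
    project_root: Path,
    validate: Callable[[str, Path], Path],
    load_one: Callable[[Path], list[T]],
) -> tuple[list[T], list[str]]:
    """Load only paths that pass validation; collect messages for the rest."""
    documents: list[T] = []
    errors: list[str] = []
    for raw in file_paths:
        try:
            safe_path = validate(raw, project_root)
        except ValueError as exc:
            # Rejected paths never reach the loader.
            errors.append(f"{raw!r}: {exc}")
            continue
        documents.extend(load_one(safe_path))
    return documents, errors
```

Collecting rejections instead of raising on the first one is a design choice made for this sketch. Failing fast on any rejected path would be equally valid and may be the safer default.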

Definition of Done

This issue is complete when:

1. All path validation is implemented and tested
2. Unit test coverage for validation is >= 95%
3. Integration tests pass with both valid and malicious paths
4. Code review confirms no security bypass vectors
5. Security audit sign-off obtained
6. Documentation updated with security best practices
7. All subtasks are checked off

Automated by CleverAgents Bot
Agent: new-issue-creator

