Arbitrary File Loading Vulnerability in context_analysis.py #10551

Open
opened 2026-04-18 17:12:01 +00:00 by HAL9000 (Owner) · 0 comments

Metadata

Commit: HEAD (current working tree)
Branch: main
Issue Type: Security Vulnerability
Severity: CRITICAL

Background and Context

The _load_files() method in src/cleveragents/agents/graphs/context_analysis.py (lines 179-200) passes user-provided file paths straight to LangChain's TextLoader without any validation. This is a critical arbitrary file read vulnerability: anyone who controls the file_paths input can make the agent read any file the process has permission to read, including credentials and private keys.

Vulnerability Details

The current implementation:

  1. Accepts user-provided file paths directly
  2. Passes them to TextLoader(str(path)) without validation
  3. Does not check if paths are within allowed directories
  4. Does not prevent path traversal sequences like ../
  5. Does not validate file extensions or restrict to safe file types

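These gaps compound because of how paths compose. The minimal sketch below uses only the standard library and a hypothetical project root (/srv/project, chosen for illustration). It shows that joining an attacker-supplied path onto a root directory does not confine it: an absolute path replaces the root entirely, and .. components walk out of the root once the path is normalized.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical project root, used only for illustration.
PROJECT_ROOT = PurePosixPath("/srv/project")


def naive_join(user_path: str) -> str:
    """Join a user path onto the root and normalize it lexically (no filesystem access)."""
    joined = PROJECT_ROOT / user_path  # an absolute user_path discards PROJECT_ROOT
    return posixpath.normpath(str(joined))  # collapses ".." components


for p in ["src/app.py", "../../.env", "/etc/passwd"]:
    print(f"{p!r:16} -> {naive_join(p)}")
```

Only the first path stays under /srv/project. The second normalizes to /.env and the third is simply /etc/passwd, which is why the fix has to reject these forms or verify the final resolved location.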
Code Evidence

Lines 179-200 in context_analysis.py:

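```python
# Import paths assumed for context; the module may import these differently.
from langchain_community.document_loaders import TextLoader
from langchain_core.documents import Document


def _load_files(self, file_paths: list[str]) -> list[Document]:
    """Load files using TextLoader without path validation"""
    documents = []
    for path in file_paths:
        # VULNERABLE: No validation of path
        loader = TextLoader(str(path))
        documents.extend(loader.load())
    return documents
```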

Attack Scenario

An attacker could invoke the agent with malicious file paths:

```python
from cleveragents.agents import ContextAnalysisAgent

# mock_llm: any LLM object accepted by ContextAnalysisAgent (e.g. a test double)
agent = ContextAnalysisAgent(llm=mock_llm)
state = {
    "file_paths": ["/etc/passwd", "../../.env", "/root/.ssh/id_rsa"],
    "documents": [],
    "dependencies": {},
    "summary": "",
    "relevance_scores": {},
    "chunks": [],
    "error": None,
}
result = agent.invoke(state, config={"configurable": {"thread_id": "test"}})
# Each path reaches TextLoader unchecked: /etc/passwd, ../../.env (relative to the
# working directory) and /root/.ssh/id_rsa (if the process can read it) are all loaded.
```

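To reproduce the pattern without an LLM or LangChain installed, the sketch below mirrors the loop in _load_files() with a hypothetical naive_load() helper that uses plain open() in place of TextLoader (an assumption made for portability: TextLoader likewise opens whatever path it is given). A secret file placed outside a temporary "project" directory is read through a .. path.

```python
import tempfile
from pathlib import Path


def naive_load(file_paths: list[str]) -> list[str]:
    """Hypothetical stand-in for _load_files(): opens every path exactly as given."""
    contents = []
    for path in file_paths:
        # open() plays the role of TextLoader(str(path)).load() in this sketch.
        with open(path, encoding="utf-8") as fh:
            contents.append(fh.read())
    return contents


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    project = base / "project"
    project.mkdir()
    # A secret that lives outside the project directory.
    (base / ".env").write_text("API_KEY=secret\n", encoding="utf-8")
    # The ".." component walks out of the project before the file is opened.
    leaked = naive_load([str(project / ".." / ".env")])

print(leaked)
```

The operating system resolves the .. itself, so nothing in the loop has a chance to notice that the file lies outside the project.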
Expected Behavior

The _load_files() method should:

  1. Validate all file paths before loading
  2. Reject paths that attempt to escape the project directory
  3. Resolve symlinks and check the resolved path is within allowed boundaries
  4. Only allow loading files with safe extensions (.py, .txt, .md, .json, .yaml, etc.)
  5. Reject absolute paths and paths with a .. component
  6. Provide clear error messages when a path is rejected

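One way the checks above could fit together is sketched below. The helper validate_file_path(), the exception PathValidationError and the extension allowlist are all hypothetical (none exist in the codebase today). The sketch rejects absolute paths and .. components up front, applies the allowlist, then resolves the joined path with Path.resolve() (which follows symlinks) and requires the result to stay under the resolved project root using Path.is_relative_to() (Python 3.9+).

```python
from pathlib import Path, PurePath

# Illustrative allowlist; the final set is a project decision.
ALLOWED_EXTENSIONS = frozenset({".py", ".txt", ".md", ".json", ".yaml", ".yml", ".toml"})


class PathValidationError(ValueError):
    """Hypothetical error raised when a requested file path is rejected."""


def validate_file_path(raw_path: str, project_root: Path) -> Path:
    """Return the resolved path if it is safe to load; raise PathValidationError otherwise."""
    candidate = PurePath(raw_path)
    # Reject absolute paths instead of letting them replace the root on join.
    if candidate.is_absolute():
        raise PathValidationError(f"absolute paths are not allowed: {raw_path!r}")
    # Reject ".." components; a name such as "notes..txt" is not a traversal and passes.
    if ".." in candidate.parts:
        raise PathValidationError(f"path traversal ('..') is not allowed: {raw_path!r}")
    # Allowlist on the final suffix, compared case-insensitively.
    if candidate.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise PathValidationError(f"file extension is not allowed: {raw_path!r}")
    # Resolve symlinks and normalize, then require the result to stay under the root.
    root = project_root.resolve()
    resolved = (root / raw_path).resolve()
    if not resolved.is_relative_to(root):
        raise PathValidationError(f"path resolves outside the project root: {raw_path!r}")
    # Only existing regular files can be loaded.
    if not resolved.is_file():
        raise PathValidationError(f"not an existing regular file: {raw_path!r}")
    return resolved
```

In _load_files(), each entry would pass through a helper like this before TextLoader(str(resolved)) is constructed. Whether one rejected path aborts the whole call or is reported through the state's error field is a design choice the issue leaves open.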
Acceptance Criteria

  • All file paths are validated before being passed to TextLoader
  • Paths with a .. component are rejected with a clear error message
  • Absolute paths (for example, starting with /) are rejected
  • Symlinks are resolved, and their targets are checked to be within the project directory
  • Only whitelisted file extensions are allowed
  • pathlib.Path.resolve() is used to resolve symlinks and normalize paths
  • Resolved path is verified to be under the project root directory
  • Unit tests cover all validation scenarios (path traversal, absolute paths, symlinks, invalid extensions)
  • Integration tests verify the agent rejects malicious file paths
  • Security audit confirms no bypass vectors exist

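The resolve-then-compare criteria exist because simpler checks can be bypassed. The sketch below (standard library only; POSIX symlinks assumed; all function names are illustrative) builds a temporary project containing a symlink that points outside it, plus a sibling directory whose name shares the project's prefix. A lexical check and a string startswith() comparison both accept these escapes, while resolve() followed by is_relative_to() rejects them.

```python
import tempfile
from pathlib import Path


def lexical_ok(root: Path, rel: str) -> bool:
    """Naive check: no '..' component and the unresolved path string starts with the root."""
    return ".." not in Path(rel).parts and str(root / rel).startswith(str(root))


def prefix_ok(root: Path, target: Path) -> bool:
    """Naive check: string prefix on resolved paths, so '/x/project-secrets' passes for '/x/project'."""
    return str(target.resolve()).startswith(str(root.resolve()))


def resolved_ok(root: Path, target: Path) -> bool:
    """Component-wise containment check on fully resolved paths (follows symlinks)."""
    return target.resolve().is_relative_to(root.resolve())


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    root = base / "project"
    secrets = base / "project-secrets"
    root.mkdir()
    secrets.mkdir()
    (secrets / "key.txt").write_text("secret\n")
    # A symlink inside the project whose target lives outside it.
    (root / "notes.txt").symlink_to(secrets / "key.txt")

    results = {
        "symlink, lexical": lexical_ok(root, "notes.txt"),
        "symlink, resolved": resolved_ok(root, root / "notes.txt"),
        "sibling, prefix": prefix_ok(root, secrets / "key.txt"),
        "sibling, resolved": resolved_ok(root, secrets / "key.txt"),
    }

for name, accepted in results.items():
    print(f"{name:18} accepted={accepted}")
```

Both naive checks accept their escape, while the resolved, component-wise check rejects both. Scenarios like these are natural candidates for the symlink and boundary unit tests listed above.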
Subtasks

  • Add path validation utility function to validate and sanitize file paths
  • Implement whitelist of allowed file extensions
  • Add path traversal detection (reject .. components)
  • Add absolute path detection and rejection
  • Implement symlink resolution using pathlib.Path.resolve()
  • Add project root boundary check
  • Update _load_files() to use validation before TextLoader
  • Add comprehensive unit tests for path validation
  • Add integration tests for agent with malicious paths
  • Update documentation with security considerations
  • Create security advisory if this affects released versions

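For the extension allowlist subtask, a few pathlib behaviors are worth pinning down in tests. They are sketched here with an illustrative allowlist and a hypothetical has_allowed_extension() helper: Path.suffix returns only the last extension, it is empty for dotfiles such as .env, and it preserves case, so the comparison should be lowercased.

```python
from pathlib import PurePath

# Illustrative allowlist; the real set is a project decision.
ALLOWED_EXTENSIONS = frozenset({".py", ".txt", ".md", ".json", ".yaml", ".yml"})


def has_allowed_extension(raw_path: str) -> bool:
    """Accept a path only if its final suffix (case-insensitive) is on the allowlist."""
    return PurePath(raw_path).suffix.lower() in ALLOWED_EXTENSIONS


cases = {
    "README.md": True,
    "src/APP.PY": True,        # suffix ".PY" is lowercased before the lookup
    ".env": False,             # dotfiles have an empty suffix
    "config.yaml.bak": False,  # only the last suffix (".bak") counts
    "id_rsa": False,           # no suffix at all
    "notes.txt.sh": False,
}
for path, expected in cases.items():
    assert has_allowed_extension(path) is expected, path
print("all extension cases behave as expected")
```

An allowlist alone does not stop a path such as ../../secrets.json, so it complements the traversal and boundary checks rather than replacing them.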
Definition of Done

This issue is complete when:

  1. All path validation is implemented and tested
  2. Unit test coverage for validation is >= 95%
  3. Integration tests pass with both valid and malicious paths
  4. Code review confirms no security bypass vectors
  5. Security audit sign-off obtained
  6. Documentation updated with security best practices
  7. All subtasks are checked off

Automated by CleverAgents Bot
Agent: new-issue-creator

Reference
cleveragents/cleveragents-core#10551