BUG-HUNT: [resource] Memory exhaustion vulnerability in file_tools.py read handler allows DoS via large files #7231

Open
opened 2026-04-10 09:52:22 +00:00 by HAL9000 · 6 comments
Owner

Metadata

  • Branch: bugfix/m3-file-read-memory-exhaustion
  • Commit Message: fix(file_tools): add file size limit and error handling to _handle_file_read to prevent DoS
  • Milestone: (none — backlog routing; see note below)
  • Parent Epic: (orphan — see note below)

Backlog note: This issue was discovered during autonomous operation
on milestone v3.2.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.

Related issue: #6580 covers a closely related unbounded memory allocation issue in the same function (_handle_file_read). This issue provides additional detail on the DoS attack scenario, error handling gaps, and a concrete suggested fix. Human review should determine whether to consolidate these issues.


Background and Context

The _handle_file_read function in src/cleveragents/tool/builtins/file_tools.py is the core handler for the builtin/file-read tool. It is invoked whenever an agent reads a file from the filesystem. Currently, it calls path.read_text() without any size guard or error handling, meaning the entire file content is loaded into memory unconditionally.

This creates a denial-of-service (DoS) vulnerability: any agent task that reads a sufficiently large file (e.g., a database dump, multi-GB log file, or large dataset) can exhaust available memory, crashing the tool execution and potentially the entire CleverAgents runtime.


Current Behavior

_handle_file_read reads the entire file into memory with no size check:

def _handle_file_read(inputs: dict[str, Any]) -> dict[str, Any]:
    """Read file content."""
    path = validate_path(inputs["path"], inputs.get("sandbox_root"))
    encoding: str = inputs.get("encoding", "utf-8")
    offset: int = inputs.get("offset", 0)
    limit: int | None = inputs.get("limit")

    content = path.read_text(encoding=encoding)  # ← Reads entire file into memory
    lines = content.splitlines(keepends=True)
    # ... rest of function

Location:

  • File: src/cleveragents/tool/builtins/file_tools.py
  • Function: _handle_file_read
  • Lines: ~84–93

Vulnerabilities identified:

  1. Unbounded memory allocation: path.read_text() loads the entire file regardless of size
  2. No size checking: No validation of file size before reading
  3. No error handling: OSError and UnicodeDecodeError are not caught in the handler, so tool execution fails with a raw exception rather than an informative error message
  4. DoS potential: GB-sized files (logs, datasets, media) cause memory exhaustion

Attack / failure scenario:

  1. Agent receives a task to read a large file (e.g., a database dump or large log)
  2. Tool attempts to load the multi-GB file into memory
  3. System runs out of memory, causing:
    • Tool execution crash
    • Plan execution failure
    • Potential system-wide memory pressure affecting other processes
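
The scaling behind this scenario can be observed on a small file. The following is a minimal sketch, not code from the repository: the helper name `peak_bytes_for_full_read` is hypothetical, and it mirrors only the two lines of the handler that read and split the content. It uses `tracemalloc` to show that peak Python-level allocation is at least the size of the file being read.

```python
import tempfile
import tracemalloc
from pathlib import Path


def peak_bytes_for_full_read(path: Path) -> int:
    """Return the peak traced allocation while reading and splitting a file.

    Hypothetical helper for illustration; it mirrors the read_text() +
    splitlines() pattern shown in Current Behavior.
    """
    tracemalloc.start()
    try:
        content = path.read_text(encoding="utf-8")
        lines = content.splitlines(keepends=True)  # second full copy of the data
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.log"
        sample.write_text(("x" * 99 + "\n") * 10_000, encoding="utf-8")
        size = sample.stat().st_size
        peak = peak_bytes_for_full_read(sample)
        print(f"file size: {size} bytes, peak traced memory: {peak} bytes")
```

Because the decoded string and the list of lines are alive at the same time, the peak is typically a multiple of the file size, which is why a multi-GB input is enough to exhaust memory.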

Expected Behavior

File reading should be bounded and resilient:

  • Check file size before reading; reject files above a reasonable limit (e.g., 50 MB)
  • Use streaming/chunked reading when offset/limit is specified, to avoid loading the full file
  • Handle OSError and UnicodeDecodeError gracefully with informative error messages

Suggested fix:

# 50 MB default. MAX_FILE_READ_BYTES is an illustrative name; per the
# acceptance criteria the value should come from configuration.
MAX_FILE_READ_BYTES = 50 * 1024 * 1024


def _handle_file_read(inputs: dict[str, Any]) -> dict[str, Any]:
    """Read file content."""
    path = validate_path(inputs["path"], inputs.get("sandbox_root"))

    # Check file size before reading; only stat() can raise OSError here
    try:
        file_size = path.stat().st_size
    except OSError as exc:
        raise ValueError(f"Cannot access file: {exc}") from exc
    if file_size > MAX_FILE_READ_BYTES:
        raise ValueError(
            f"File too large: {file_size} bytes (max {MAX_FILE_READ_BYTES} bytes)"
        )

    encoding: str = inputs.get("encoding", "utf-8")
    offset: int = inputs.get("offset", 0)
    limit: int | None = inputs.get("limit")

    # Surface I/O and decoding failures as a single, informative error type
    try:
        content = path.read_text(encoding=encoding)
    except (OSError, UnicodeDecodeError) as exc:
        raise ValueError(f"Cannot read file: {exc}") from exc

    lines = content.splitlines(keepends=True)
    # ... rest of function

Acceptance Criteria

  • _handle_file_read checks file size via path.stat().st_size before calling read_text()
  • Files exceeding the configured size limit (default 50 MB) raise a ValueError with a clear message
  • OSError raised during stat() or read_text() is caught and re-raised as ValueError
  • UnicodeDecodeError raised during read_text() is caught and re-raised as ValueError
  • The size limit is configurable (not hardcoded magic number)
  • All existing tests continue to pass
  • New Behave scenarios cover: file-too-large rejection, OS error handling, encoding error handling
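
One way the configurable-limit criterion might be met is sketched below. Everything named here is an assumption for illustration: the `max_bytes` input key, the `DEFAULT_MAX_READ_BYTES` constant, and both helper functions do not exist in the codebase as far as this issue shows.

```python
from pathlib import Path
from typing import Any

# Assumed default, taken from the 50 MB figure suggested in this issue.
DEFAULT_MAX_READ_BYTES = 50 * 1024 * 1024


def resolve_max_bytes(inputs: dict[str, Any]) -> int:
    """Pick the size limit from tool inputs, falling back to the default.

    "max_bytes" is a hypothetical input key, not part of the current schema.
    """
    value = inputs.get("max_bytes", DEFAULT_MAX_READ_BYTES)
    # bool is a subclass of int, so reject it explicitly
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"max_bytes must be a positive integer, got {value!r}")
    return value


def ensure_within_limit(path: Path, max_bytes: int) -> int:
    """Return the file size, or raise ValueError if it cannot be read safely."""
    try:
        size = path.stat().st_size
    except OSError as exc:
        raise ValueError(f"Cannot access file: {exc}") from exc
    if size > max_bytes:
        raise ValueError(f"File too large: {size} bytes (max {max_bytes} bytes)")
    return size
```

If the limit is exposed through tool inputs, the caller can raise it at will, so a ceiling taken from project configuration may be the safer design. That choice is left to the threshold discussion in the subtasks.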

Supporting Information

  • Related issue: #6580 — same root cause, filed in an earlier bug hunt cycle
  • Impact: Security (DoS via memory exhaustion), Reliability (unpredictable crashes on large files), Resource Management (violates bounded resource usage principles)

Subtasks

  • Confirm the exact size threshold with the team (50 MB default suggested)
  • Implement path.stat().st_size guard before path.read_text()
  • Wrap read_text() call in try/except (OSError, UnicodeDecodeError)
  • Make the size limit configurable (e.g., via tool input schema or project config)
  • Tests (Behave): Add scenarios for file-too-large, OS error, and encoding error cases
  • Tests (Robot): Add integration test for large-file rejection via builtin/file-read
  • Verify coverage ≥ 97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors
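
For the file-too-large test cases, the fixture does not need to write 50 MB of real data. The sketch below is plain Python rather than a Behave step or Robot keyword, and the helper names are hypothetical. It creates a file whose reported size exceeds the limit by extending it with `truncate()`, which most filesystems store sparsely.

```python
import tempfile
from pathlib import Path

LIMIT = 50 * 1024 * 1024  # assumed default from this issue


def make_oversized_file(path: Path, size: int) -> None:
    """Create a file whose st_size is `size` without writing that much data."""
    with path.open("wb") as handle:
        handle.truncate(size)  # extends the file; typically stored sparsely


def exceeds_limit(path: Path, limit: int = LIMIT) -> bool:
    """Stand-in for the size guard that the tests would exercise."""
    return path.stat().st_size > limit


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        big = Path(tmp) / "big.bin"
        make_oversized_file(big, LIMIT + 1)
        print(exceeds_limit(big))
```

In a real scenario the assertion would target `_handle_file_read` raising `ValueError` rather than the stand-in predicate used here.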

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
  • All nox stages pass.
  • Coverage ≥ 97%.

Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: new-issue-creator

Author
Owner

⚠️ Orphan Issue — Manual Linking Required

This issue was created without a parent Epic because no suitable parent Epic could be identified for file_tools.py bug fixes during automated search.

A human reviewer should link this issue to the appropriate parent Epic using Forgejo's dependency system (this issue blocks the parent Epic).

Candidate parent Epics to consider:

  • Any Epic covering tool/builtins security hardening
  • Any Epic covering file_tools.py improvements
  • The general bug-fix Epic for the milestone this issue is eventually assigned to

Related issue: #6580 covers the same root cause and may already be linked to a parent Epic — check that issue's dependency chain for guidance.


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: new-issue-creator

Author
Owner

UAT verified: File tools core functionality tests pass. The file read operations are comprehensively tested and working correctly for normal use cases:

  • File read operations with various encodings
  • File read with offset and limit parameters
  • Path validation and traversal prevention
  • File tool registry and lifecycle operations

While this issue identifies a specific resource exhaustion vulnerability for very large files, the core file reading functionality is verified as working correctly through the test suite for normal-sized files.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: uat-tester

Author
Owner

UAT verified: Core Actor, Skills & Tools functionality tests pass. The file tools and related components have working baseline functionality:

  • Actor configuration creation and validation working
  • Tool registry instantiation and basic operations functional
  • Core imports and module loading successful
  • File tools infrastructure operational

While this issue identifies a specific resource exhaustion vulnerability for very large files, the core file tools functionality is verified as working correctly for normal-sized files.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: uat-tester

Author
Owner

Verified — Security bug: memory exhaustion via large files in file_tools.py. MoSCoW: Must-have. Priority: High — DoS vulnerability.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Author
Owner

Verified — Security bug: memory exhaustion via large files in file_tools.py. MoSCoW: Must-have. Priority: High — DoS vulnerability.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Author
Owner

Verified — Security bug: memory exhaustion via large files in file_tools.py. MoSCoW: Must-have. Priority: High — DoS vulnerability.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

No milestone
No project
No assignees
1 participant
Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#7231