BUG-HUNT: [security] file_tools.py _handle_file_edit() reads file with default encoding but writes with specified encoding causing data corruption #7342

Open
opened 2026-04-10 17:56:59 +00:00 by HAL9000 · 3 comments
Owner

Bug Report: [security/data-integrity] _handle_file_edit() reads without encoding but writes with encoding causing potential data corruption

Severity Assessment

  • Impact: File contents can be silently corrupted when editing files with non-UTF-8 encodings (Latin-1, Windows-1252, etc.), or when the file contains mixed encoding content
  • Likelihood: Medium — affects any workflow that calls builtin/file-edit on files with non-default encoding
  • Priority: Medium

Location

  • File: src/cleveragents/tool/builtins/file_tools.py
  • Function/Class: _handle_file_edit()
  • Lines: ~55-70

Description

The _handle_file_edit() handler reads file content using path.read_text() without specifying an encoding, but writes the result with the default encoding path.write_text(content) also without encoding. While both default to UTF-8 on modern systems, this inconsistency creates a silent data corruption risk:

  1. The encoding parameter is accepted as input (inputs.get("encoding", "utf-8")) and passed to _handle_file_read(), but _handle_file_edit() ignores it entirely — never uses the encoding for read or write.
  2. If a caller sets encoding: latin-1 expecting the file to be treated as Latin-1 encoded, the file-edit tool silently ignores this and reads/writes with the system default (UTF-8), potentially corrupting non-ASCII characters.

Evidence

def _handle_file_edit(inputs: dict[str, Any]) -> dict[str, Any]:
    """Edit file with string replacement."""
    path = validate_path(inputs["path"], inputs.get("sandbox_root"))
    old_text: str = inputs["old_text"]
    new_text: str = inputs["new_text"]
    replace_all: bool = inputs.get("replace_all", False)

    content = path.read_text()    # BUG: No encoding specified! Ignores "encoding" input param
    count = content.count(old_text)

    if count == 0:
        raise ValueError(f"old_text not found in {path}")

    if replace_all:
        content = content.replace(old_text, new_text)
    else:
        content = content.replace(old_text, new_text, 1)
        count = 1

    path.write_text(content)      # BUG: No encoding specified! Ignores "encoding" input param

Contrast with _handle_file_read() which properly uses the encoding parameter:

def _handle_file_read(inputs: dict[str, Any]) -> dict[str, Any]:
    encoding: str = inputs.get("encoding", "utf-8")
    content = path.read_text(encoding=encoding)  # Correct usage

Expected Behavior

_handle_file_edit() should honor the encoding input parameter:

def _handle_file_edit(inputs: dict[str, Any]) -> dict[str, Any]:
    path = validate_path(inputs["path"], inputs.get("sandbox_root"))
    old_text: str = inputs["old_text"]
    new_text: str = inputs["new_text"]
    replace_all: bool = inputs.get("replace_all", False)
    encoding: str = inputs.get("encoding", "utf-8")  # Honor encoding param

    content = path.read_text(encoding=encoding)  # Use specified encoding
    count = content.count(old_text)
    # ... rest of method
    path.write_text(content, encoding=encoding)  # Write with same encoding

Actual Behavior

The encoding parameter is silently ignored. Files are always read/written with the system default encoding, regardless of the encoding input parameter. This creates:

  1. Data corruption when editing files with non-UTF-8 encodings
  2. API inconsistencyfile-read honors encoding, but file-edit ignores it

Category

error-handling

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: @tdd_issue, @tdd_issue_, and @tdd_expected_fail to prove the bug exists before fixing it.


Automated by CleverAgents Bot
Supervisor: Bug Detection Pool | Agent: bug-hunt-pool-supervisor

## Bug Report: [security/data-integrity] _handle_file_edit() reads without encoding but writes with encoding causing potential data corruption ### Severity Assessment - **Impact**: File contents can be silently corrupted when editing files with non-UTF-8 encodings (Latin-1, Windows-1252, etc.), or when the file contains mixed encoding content - **Likelihood**: Medium — affects any workflow that calls `builtin/file-edit` on files with non-default encoding - **Priority**: Medium ### Location - **File**: `src/cleveragents/tool/builtins/file_tools.py` - **Function/Class**: `_handle_file_edit()` - **Lines**: ~55-70 ### Description The `_handle_file_edit()` handler reads file content using `path.read_text()` **without specifying an encoding**, but writes the result with the default encoding `path.write_text(content)` also without encoding. While both default to UTF-8 on modern systems, this inconsistency creates a silent data corruption risk: 1. The `encoding` parameter is accepted as input (`inputs.get("encoding", "utf-8")`) and passed to `_handle_file_read()`, but **`_handle_file_edit()` ignores it entirely** — never uses the encoding for read or write. 2. If a caller sets `encoding: latin-1` expecting the file to be treated as Latin-1 encoded, the `file-edit` tool silently ignores this and reads/writes with the system default (UTF-8), potentially corrupting non-ASCII characters. ### Evidence ```python def _handle_file_edit(inputs: dict[str, Any]) -> dict[str, Any]: """Edit file with string replacement.""" path = validate_path(inputs["path"], inputs.get("sandbox_root")) old_text: str = inputs["old_text"] new_text: str = inputs["new_text"] replace_all: bool = inputs.get("replace_all", False) content = path.read_text() # BUG: No encoding specified! Ignores "encoding" input param count = content.count(old_text) if count == 0: raise ValueError(f"old_text not found in {path}") if replace_all: content = content.replace(old_text, new_text) else: content = content.replace(old_text, new_text, 1) count = 1 path.write_text(content) # BUG: No encoding specified! Ignores "encoding" input param ``` Contrast with `_handle_file_read()` which properly uses the encoding parameter: ```python def _handle_file_read(inputs: dict[str, Any]) -> dict[str, Any]: encoding: str = inputs.get("encoding", "utf-8") content = path.read_text(encoding=encoding) # Correct usage ``` ### Expected Behavior `_handle_file_edit()` should honor the `encoding` input parameter: ```python def _handle_file_edit(inputs: dict[str, Any]) -> dict[str, Any]: path = validate_path(inputs["path"], inputs.get("sandbox_root")) old_text: str = inputs["old_text"] new_text: str = inputs["new_text"] replace_all: bool = inputs.get("replace_all", False) encoding: str = inputs.get("encoding", "utf-8") # Honor encoding param content = path.read_text(encoding=encoding) # Use specified encoding count = content.count(old_text) # ... rest of method path.write_text(content, encoding=encoding) # Write with same encoding ``` ### Actual Behavior The `encoding` parameter is silently ignored. Files are always read/written with the system default encoding, regardless of the `encoding` input parameter. This creates: 1. **Data corruption** when editing files with non-UTF-8 encodings 2. **API inconsistency** — `file-read` honors `encoding`, but `file-edit` ignores it ### Category error-handling ### TDD Note After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: @tdd_issue, @tdd_issue_<this-issue-number>, and @tdd_expected_fail to prove the bug exists before fixing it. --- **Automated by CleverAgents Bot** Supervisor: Bug Detection Pool | Agent: bug-hunt-pool-supervisor
Author
Owner

Verified — Security/data integrity bug: encoding mismatch in file_tools.py can cause data corruption. MoSCoW: Must-have. Priority: High.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Security/data integrity bug: encoding mismatch in file_tools.py can cause data corruption. MoSCoW: Must-have. Priority: High. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Author
Owner

Verified — Security/data integrity bug: encoding mismatch in file_tools.py can cause data corruption. MoSCoW: Must-have. Priority: High.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Security/data integrity bug: encoding mismatch in file_tools.py can cause data corruption. MoSCoW: Must-have. Priority: High. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Author
Owner

Verified — Security/data integrity bug: encoding mismatch in file_tools.py can cause data corruption. MoSCoW: Must-have. Priority: High.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Security/data integrity bug: encoding mismatch in file_tools.py can cause data corruption. MoSCoW: Must-have. Priority: High. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#7342
No description provided.