BUG-HUNT: [resource] ActorLoader.discover() reads each YAML file twice — unnecessary I/O and TOCTOU window #7577

Open
opened 2026-04-10 22:26:44 +00:00 by HAL9000 · 1 comment
Owner

Bug Report: [resource] — ActorLoader.discover() Reads YAML Files Twice

Severity Assessment

  • Impact: Each YAML file is read twice during discovery: once on line 115 (content = resolved.read_bytes()) for parsing, and again on line 201 (content_hash = _compute_hash(resolved_path.read_bytes())) for hashing. This doubles the I/O for every actor file. More critically, the file may change between the two reads (TOCTOU), so the stored hash no longer matches the parsed content. The cache then holds a config parsed from the old bytes paired with a hash of the new bytes, and the next discovery sees a matching hash and wrongly skips reloading the modified file.
  • Likelihood: Low for TOCTOU (rare file modification during discovery), High for unnecessary I/O.
  • Priority: Medium

Location

  • File: src/cleveragents/actor/loader.py
  • Function/Class: ActorLoader.discover
  • Lines: 115, 201

Description

In discover(), line 115 reads the file content:

content = resolved.read_bytes()   # First read
content_hash = _compute_hash(content)   # Hash of first read

But later, when building new_actors (lines 200-201):

for name, entries in pending.items():
    resolved_path, config = entries[0]
    content_hash = _compute_hash(resolved_path.read_bytes())  # Second read!

The file is read again to compute the hash. If the file changed between the first and second read:

  1. config is parsed from the old content
  2. content_hash reflects the new content
  3. The cache stores a config and a hash that do not belong to the same bytes
  4. The next discovery sees a matching hash, skips the reload, and keeps the stale config
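
The sequence above can be reproduced in isolation. The following is a minimal sketch, not the project's code: `racy_load`, `parse`, and the SHA-256 body of `_compute_hash` are assumptions made for illustration, and plain text decoding stands in for YAML parsing.

```python
import hashlib
import tempfile
from pathlib import Path


def _compute_hash(data: bytes) -> str:
    # Assumption: the real helper may use a different digest.
    return hashlib.sha256(data).hexdigest()


def parse(data: bytes) -> str:
    # Stand-in for YAML parsing; it only decodes the bytes.
    return data.decode("utf-8")


def racy_load(path: Path, between_reads=None):
    """Mimic the reported pattern: parse from read 1, hash from read 2."""
    content = path.read_bytes()                        # first read
    config = parse(content)
    if between_reads is not None:
        between_reads()                                # simulate a concurrent writer
    content_hash = _compute_hash(path.read_bytes())    # second read
    return config, content_hash


with tempfile.TemporaryDirectory() as tmp:
    actor_file = Path(tmp) / "actor.yaml"
    actor_file.write_bytes(b"name: old\n")

    config, cached_hash = racy_load(
        actor_file,
        between_reads=lambda: actor_file.write_bytes(b"name: new\n"),
    )

    on_disk_hash = _compute_hash(actor_file.read_bytes())
    # The config is stale, yet the cached hash agrees with the file on disk,
    # so a hash-based freshness check would not trigger a reload.
    stale_config = config == "name: old\n"
    hash_matches_disk = cached_hash == on_disk_hash
    hash_matches_config = cached_hash == _compute_hash(config.encode("utf-8"))
```

In this sketch `stale_config` and `hash_matches_disk` are both true while `hash_matches_config` is false, which is the mismatched cache entry described in steps 3 and 4.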

Evidence

# loader.py line 115
content = resolved.read_bytes()          # First read
content_hash = _compute_hash(content)    # Hash from first read
...
# loader.py line 201
content_hash = _compute_hash(resolved_path.read_bytes())  # Second read!

Expected Behavior

The hash should be computed from the same bytes used for parsing, using the already-read content variable.

Actual Behavior

File is read twice, wasting I/O and creating a TOCTOU window between parse and hash computation.

Suggested Fix

Carry the hash already computed from the first read (line 115) through pending alongside the config, and reuse it instead of re-reading the file:

# In pending, store (resolved_path, config, content_hash) tuples
# Eliminate the second read_bytes() call at line 201
for name, entries in pending.items():
    resolved_path, config, content_hash = entries[0]  # Use cached hash
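
A slightly fuller sketch of the same idea follows, under stated assumptions: `discover_single_read` is a hypothetical stand-in for the relevant part of `discover()`, the actor name is taken from the file stem, text decoding replaces YAML parsing, and `_compute_hash` is assumed to be a SHA-256 digest. The real loader will differ in all of these details.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def _compute_hash(data: bytes) -> str:
    # Assumption: stand-in digest; the real helper may differ.
    return hashlib.sha256(data).hexdigest()


def discover_single_read(paths):
    """Read each file once and carry the hash alongside the parsed config."""
    pending = defaultdict(list)
    for path in paths:
        content = path.read_bytes()            # the only read of this file
        content_hash = _compute_hash(content)  # hash of exactly those bytes
        config = content.decode("utf-8")       # stand-in for YAML parsing
        name = path.stem                       # hypothetical actor name
        pending[name].append((path, config, content_hash))

    new_actors = {}
    for name, entries in pending.items():
        # Reuse the hash computed above instead of calling read_bytes() again.
        resolved_path, config, content_hash = entries[0]
        new_actors[name] = (resolved_path, config, content_hash)
    return new_actors


with tempfile.TemporaryDirectory() as tmp:
    actor_file = Path(tmp) / "greeter.yaml"
    actor_file.write_bytes(b"name: greeter\n")
    actors = discover_single_read([actor_file])
    _, cfg, digest = actors["greeter"]
    consistent = digest == _compute_hash(cfg.encode("utf-8"))
```

Storing the hash rather than the raw bytes keeps `pending` small; either choice closes the TOCTOU window, since both the config and the hash derive from one read.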

Category

resource

TDD Note

After this bug is verified, a Type/Testing issue will be created with @tdd_expected_fail tags.
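
One possible shape for such a test is to count calls to `Path.read_bytes` per file. This is only a sketch: `count_reads`, `double_read_loader`, and `single_read_loader` are hypothetical helpers that imitate the reported and the fixed patterns, and the project's own tagging mechanism is not shown.

```python
import tempfile
from pathlib import Path
from unittest import mock


def count_reads(loader, path: Path) -> int:
    """Run loader(path) and return how many times Path.read_bytes was called."""
    original = Path.read_bytes
    with mock.patch.object(
        Path, "read_bytes", autospec=True, side_effect=original
    ) as spy:
        loader(path)
    return spy.call_count


def double_read_loader(path: Path):
    # Imitates the reported pattern: one read to parse, one to hash.
    parsed = path.read_bytes().decode("utf-8")
    digest_input = path.read_bytes()
    return parsed, digest_input


def single_read_loader(path: Path):
    # Imitates the suggested fix: one read feeds both parse and hash.
    content = path.read_bytes()
    return content.decode("utf-8"), content


with tempfile.TemporaryDirectory() as tmp:
    actor_file = Path(tmp) / "actor.yaml"
    actor_file.write_bytes(b"name: demo\n")
    reads_before_fix = count_reads(double_read_loader, actor_file)
    reads_after_fix = count_reads(single_read_loader, actor_file)
```

A regression test along these lines would assert one read per file, so it fails against the current behavior and passes once the second `read_bytes()` call is removed.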


Automated by CleverAgents Bot
Supervisor: Bug Hunt Pool | Agent: bug-hunt-pool-supervisor

HAL9000 added this to the v3.6.0 milestone 2026-04-10 23:07:14 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Backlog — Minor bug or optimization that does not block milestone delivery
  • Milestone: Assigned to appropriate milestone for future work
  • Story Points: 2 (S) — Small fix
  • MoSCoW: Could Have — Nice to fix but not blocking

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Reference
cleveragents/cleveragents-core#7577