BUG-HUNT: [resource] Memory leak in ActorLoader cache - deleted/renamed files never removed from cache #7148

Open
opened 2026-04-10 08:11:20 +00:00 by HAL9000 · 1 comment
Owner

Metadata

  • Branch: fix/resource-actor-loader-cache-memory-leak
  • Commit Message: fix(resource): remove stale entries from ActorLoader cache to prevent memory leak
  • Milestone: (none — backlog per Milestone Scope Guard)
  • Parent Epic: #7023

Background and Context

The ActorLoader class in src/cleveragents/actor/loader.py maintains two internal cache dictionaries (_actors and _path_to_name) that grow indefinitely over the lifetime of a long-running process. The discover() method adds new entries to these caches but never removes stale ones — entries for deleted, renamed, or validation-failed YAML files remain in memory forever.

This is a resource management issue that manifests as a slow memory leak in long-running processes that repeatedly call discover() (e.g., daemon processes, watch-mode operations, or server deployments).

Current Behavior

The ActorLoader caches accumulate entries without bound:

```python
class ActorLoader:
    def __init__(self, ...):  # parameters elided
        self._actors: dict[str, _CacheEntry] = {}        # Grows indefinitely
        self._path_to_name: dict[Path, str] = {}         # Grows indefinitely
        self._lock = threading.RLock()

    def discover(self) -> list[ActorConfigSchema]:
        # Updates caches but never removes stale entries
        for yaml_path in self._find_yaml_files():
            # Adds new entries but never cleans up:
            # - Deleted files (path in cache but file gone)
            # - Renamed files (old path remains in cache)
            # - Files that fail validation (in _path_to_name but not _actors)
            ...  # cache writes elided
```

Additionally, _CacheEntry.load_count is incremented but never used for any eviction or cleanup logic.

Memory Leak Scenarios

  1. Deleted Files: File /actors/old.yaml is discovered and cached, then deleted from the filesystem. Its entries remain in both _path_to_name and _actors forever.

  2. Renamed Files: File /actors/v1.yaml is discovered and cached, then renamed to /actors/v2.yaml. The old entry remains and a new entry is added, so the cache holds duplicate entries that are never cleaned up.

  3. Validation Failures: File /actors/broken.yaml is added to _path_to_name, but validation fails, so it is not added to _actors. The path entry remains in the cache forever, even if the file is later fixed or deleted.

  4. Unused load_count: _CacheEntry.load_count is incremented on each cache hit but is never used for LRU eviction, age-based cleanup, or any other purpose.
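Scenarios 1 to 3 can be reproduced with a small, self-contained toy model of the current behavior. LeakyLoader, the "actor name equals file stem" rule, and the "invalid" string check standing in for schema validation are all hypothetical; the only properties copied from this issue are that discover() records the path before validating and never deletes from either cache.

```python
from __future__ import annotations

import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class _CacheEntry:
    # Hypothetical, simplified stand-in for the real _CacheEntry.
    config: str
    load_count: int = 0


class LeakyLoader:
    """Toy model of the current discover(): caches are only ever added to."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self._actors: dict[str, _CacheEntry] = {}
        self._path_to_name: dict[Path, str] = {}

    def discover(self) -> list[str]:
        found: list[str] = []
        for yaml_path in sorted(self.root.glob("*.yaml")):
            name = yaml_path.stem  # assumption: actor name == file stem
            self._path_to_name[yaml_path] = name  # recorded before validation
            text = yaml_path.read_text()
            if "invalid" in text:  # hypothetical validation rule
                continue
            self._actors[name] = _CacheEntry(config=text)
            found.append(name)
        return found


def reproduce_leak() -> tuple[list[str], int]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        loader = LeakyLoader(root)
        (root / "old.yaml").write_text("kind: actor")
        (root / "v1.yaml").write_text("kind: actor")
        (root / "broken.yaml").write_text("invalid")
        loader.discover()

        (root / "old.yaml").unlink()                 # scenario 1: deleted file
        (root / "v1.yaml").rename(root / "v2.yaml")  # scenario 2: renamed file
        (root / "broken.yaml").unlink()              # scenario 3: failed file later deleted
        loader.discover()

        # Only v2.yaml is still on disk, yet the caches keep everything.
        return sorted(loader._actors), len(loader._path_to_name)


print(reproduce_leak())  # → (['old', 'v1', 'v2'], 4)
```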

Expected Behavior

The discover() method should reconcile the cache against the current filesystem state on each call:

  • Entries for files that no longer exist should be removed from both _path_to_name and _actors.
  • Entries for renamed files (old path no longer present) should be evicted.
  • Orphaned _path_to_name entries (where the corresponding _actors entry is absent) should be cleaned up.
  • Optionally: LRU or age-based eviction when cache size exceeds a configurable threshold.
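One way to implement this reconciliation is a set difference between the paths seen in the current scan and the paths already cached, followed by a sweep of actor names that no surviving path refers to. The sketch below is not the real ActorLoader code: ReconcilingLoader, the stem-based naming, the string-based validation check, and returning actor names instead of ActorConfigSchema objects are hypothetical simplifications. It keeps the RLock from the real class so concurrent callers never see a half-pruned cache.

```python
from __future__ import annotations

import tempfile
import threading
from dataclasses import dataclass
from pathlib import Path


@dataclass
class _CacheEntry:
    # Hypothetical, simplified stand-in for the real _CacheEntry.
    config: str
    load_count: int = 0


class ReconcilingLoader:
    """Sketch of a discover() that prunes both caches against the filesystem."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self._actors: dict[str, _CacheEntry] = {}
        self._path_to_name: dict[Path, str] = {}
        self._lock = threading.RLock()

    @staticmethod
    def _is_valid(text: str) -> bool:
        return "invalid" not in text  # hypothetical validation rule

    def discover(self) -> list[str]:
        with self._lock:
            seen: set[Path] = set()
            for yaml_path in sorted(self.root.glob("*.yaml")):
                seen.add(yaml_path)
                text = yaml_path.read_text()
                if not self._is_valid(text):
                    # Never keep a path entry for a file that failed validation.
                    self._path_to_name.pop(yaml_path, None)
                    continue
                name = yaml_path.stem  # assumption: actor name == file stem
                self._path_to_name[yaml_path] = name
                self._actors[name] = _CacheEntry(config=text)

            # Deleted files and the old side of a rename: cached but not seen.
            for stale_path in set(self._path_to_name) - seen:
                del self._path_to_name[stale_path]

            # Actors that no surviving path maps to.
            live_names = set(self._path_to_name.values())
            for stale_name in set(self._actors) - live_names:
                del self._actors[stale_name]

            return sorted(self._actors)


def demo() -> tuple[list[str], list[str], int]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        loader = ReconcilingLoader(root)
        (root / "old.yaml").write_text("kind: actor")
        (root / "v1.yaml").write_text("kind: actor")
        (root / "broken.yaml").write_text("invalid")
        first = loader.discover()

        (root / "old.yaml").unlink()                       # deleted
        (root / "v1.yaml").rename(root / "v2.yaml")        # renamed
        (root / "broken.yaml").write_text("kind: actor")   # later fixed
        second = loader.discover()
        return first, second, len(loader._path_to_name)


print(demo())  # → (['old', 'v1'], ['broken', 'v2'], 2)
```

After each call, every _path_to_name value has a matching _actors key, so orphaned path entries cannot accumulate. Dropping the entry when validation fails means a broken edit evicts the last good config; keeping the last good config instead is an equally valid policy that the fix would need to choose explicitly.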

Affected Location

  • File: src/cleveragents/actor/loader.py
  • Class: ActorLoader
  • Methods: discover(), __init__()
  • Supporting class: _CacheEntry (unused load_count field)
  • Approximate lines: ~70–90 (init/cache setup), ~150–200 (discover loop)

Backlog note: This issue was discovered during autonomous operation
on milestone v3.5.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.

Subtasks

  • Audit discover() to identify all cache write paths
  • Implement stale-entry removal: after scanning current YAML files, diff against cached paths and evict missing entries from both _path_to_name and _actors
  • Handle orphaned _path_to_name entries (path present but no corresponding _actors entry)
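  • Decide on load_count semantics: use it to drive eviction, or remove the field if unused. A hit counter alone supports least-frequently-used (LFU) ordering; LRU eviction needs recency information (access order or a last-access timestamp)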
  • Add configurable max_cache_size with LRU eviction when threshold is exceeded
  • Write BDD Behave scenarios covering: deleted-file eviction, renamed-file eviction, validation-failure cleanup, and LRU eviction
  • Write Robot Framework integration tests for long-running discover() cycles
  • Write ASV benchmark to measure memory growth across N discover() cycles
  • Update docstrings and inline documentation for ActorLoader, discover(), and _CacheEntry
  • Verify all nox stages pass
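For the max_cache_size subtask, a sketch of LRU bounding with collections.OrderedDict: move_to_end() on each hit records recency, and popitem(last=False) removes the least recently used entry in O(1). BoundedActorCache, get(), put() and the default size of 128 are hypothetical names and values for illustration only. Recency is carried by the dict order here, so load_count is left as a plain hit statistic.

```python
from __future__ import annotations

from collections import OrderedDict
from dataclasses import dataclass
from pathlib import Path


@dataclass
class _CacheEntry:
    # Hypothetical stand-in; load_count is only a hit statistic here.
    config: str
    load_count: int = 0


class BoundedActorCache:
    """LRU-bounded pair of caches (hypothetical helper, not the real API)."""

    def __init__(self, max_cache_size: int = 128) -> None:
        if max_cache_size < 1:
            raise ValueError("max_cache_size must be >= 1")
        self.max_cache_size = max_cache_size
        self._actors: OrderedDict[str, _CacheEntry] = OrderedDict()
        self._path_to_name: dict[Path, str] = {}

    def get(self, name: str) -> _CacheEntry | None:
        entry = self._actors.get(name)
        if entry is not None:
            entry.load_count += 1
            self._actors.move_to_end(name)  # mark as most recently used
        return entry

    def put(self, path: Path, name: str, config: str) -> None:
        self._path_to_name[path] = name
        self._actors[name] = _CacheEntry(config=config)
        self._actors.move_to_end(name)
        while len(self._actors) > self.max_cache_size:
            evicted, _ = self._actors.popitem(last=False)  # least recently used
            # Keep the reverse index consistent so no orphaned path entries remain.
            for p in [p for p, n in self._path_to_name.items() if n == evicted]:
                del self._path_to_name[p]
```

With max_cache_size=2, putting "a" and "b", reading "a", then putting "c" evicts "b", because "a" was used more recently. The reverse-index sweep is O(n) per eviction; if that matters, a name-to-paths index would make it proportional to the evicted actor's paths. An evicted actor whose file still exists would simply be reloaded on the next discover().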

Definition of Done

  • discover() removes stale cache entries for deleted and renamed YAML files on every call
  • Orphaned _path_to_name entries (no corresponding _actors entry) are cleaned up
  • load_count is either used for eviction logic or removed (no dead code)
  • Cache size is bounded (configurable threshold with LRU eviction)
  • BDD Behave scenarios cover the three leak scenarios (deleted files, renamed files, validation failures) and LRU eviction
  • Robot Framework integration tests pass for repeated discover() cycles
  • ASV benchmark shows stable (non-growing) memory across N discover() cycles
  • All nox stages pass (lint, typecheck, security, unit_tests, integration_tests, coverage)
  • Coverage ≥ 97%
  • No # type: ignore suppressions introduced
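For the ASV criterion, one option is a track_ benchmark that returns the number of cache entries left after N discover() cycles; a bounded cache should report the same value for every N. ASV finds benchmarks by naming convention (setup/teardown, params/param_names, the track_ prefix, a unit attribute), so the module needs no asv import. _ToyLoader is a hypothetical stand-in for a fixed ActorLoader; a real benchmark would construct ActorLoader against the temporary directory with whatever arguments its constructor actually takes.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


class _ToyLoader:
    """Hypothetical stand-in for ActorLoader with a reconciling discover()."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self._actors: dict[str, str] = {}
        self._path_to_name: dict[Path, str] = {}

    def discover(self) -> None:
        seen = set(self.root.glob("*.yaml"))
        for path in seen:
            self._path_to_name[path] = path.stem
            self._actors[path.stem] = path.read_text()
        for stale in set(self._path_to_name) - seen:
            del self._path_to_name[stale]
        live = set(self._path_to_name.values())
        for stale in set(self._actors) - live:
            del self._actors[stale]


class ActorLoaderCacheGrowth:
    """ASV-style benchmark: cache size should not depend on the cycle count."""

    params = [10, 100, 1000]
    param_names = ["cycles"]

    def setup(self, cycles: int) -> None:
        self.root = Path(tempfile.mkdtemp())
        self.loader = _ToyLoader(self.root)

    def teardown(self, cycles: int) -> None:
        shutil.rmtree(self.root, ignore_errors=True)

    def track_cache_entries(self, cycles: int) -> int:
        current = self.root / "actor_0.yaml"
        current.write_text("kind: actor")
        for i in range(1, cycles + 1):
            # Rename the single actor file every cycle, then rediscover.
            renamed = self.root / f"actor_{i}.yaml"
            current.rename(renamed)
            current = renamed
            self.loader.discover()
        return len(self.loader._actors) + len(self.loader._path_to_name)

    track_cache_entries.unit = "entries"
```

Against the current leaking loader, the same benchmark would report a value that grows with cycles, which makes the regression visible in ASV's per-parameter results.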

Automated by CleverAgents Bot
Supervisor: Acting on behalf of: Bug Hunt Cycle 2 Batch 2 Worker 12 | Agent: new-issue-creator

Author
Owner

Verified — Resource leak: ActorLoader cache never removes deleted/renamed files. MoSCoW: Should-have. Priority: Medium.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Reference
cleveragents/cleveragents-core#7148