BUG-HUNT: [resource] Memory growth in reference parser cache over long sessions #7080

Open
opened 2026-04-10 07:30:57 +00:00 by HAL9000 · 1 comment
Owner

Bug Report: Resource — Memory Growth in Reference Parser Cache Over Long Sessions

Severity Assessment

  • Impact: Memory leakage in long-running TUI sessions, particularly those that frequently change directories
  • Likelihood: Low to Medium in typical usage, High in automated or long-running sessions with frequent directory changes
  • Priority: Low

Location

  • File: src/cleveragents/tui/input/reference_parser.py
  • Function/Class: _catalog() function and global _catalog_cache
  • Lines: 15, 33-46

Description

The reference parser uses a global cache with TTL-based invalidation. On each refresh it overwrites the existing entries in place rather than clearing them first. The cache holds only one entry at a time (the catalog for the most recent working directory) under three fixed keys, so the dictionary itself does not gain keys; the concern is that there is no explicit cleanup, and the replaced catalog and the intermediate objects built during a refresh may not be garbage collected promptly.
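To make the overwrite behaviour concrete, here is a minimal sketch that mimics the three-key cache and checks whether replaced catalogs stay alive. The `Catalog` class and the `/tmp/dir_N` paths are hypothetical stand-ins, not code from `reference_parser.py`; a `dict` subclass is used only because plain dictionaries cannot be weakly referenced.

```python
import gc
import weakref


class Catalog(dict):
    """Hypothetical stand-in for the catalog; a dict subclass so it can be weakly referenced."""


cache: dict[str, object] = {"cwd": None, "created_at": 0.0, "catalog": None}

old_refs = []
for i in range(1000):
    catalog = Catalog(resource=[f"file_{i}_{n}" for n in range(100)])
    old_refs.append(weakref.ref(catalog))
    # Overwrite the same three keys, as the reported code does.
    cache["cwd"] = f"/tmp/dir_{i}"
    cache["created_at"] = float(i)
    cache["catalog"] = catalog

del catalog  # drop the loop variable so only the cache holds the last catalog
gc.collect()

# Count how many of the 1000 catalogs are still reachable.
alive = sum(1 for ref in old_refs if ref() is not None)
print(len(cache), alive)
```

In this simplified setting the dictionary keeps its three keys and only the most recently stored catalog survives. Whether the real `_catalog()` behaves the same way depends on whether callers or other modules keep references to returned catalogs, which this sketch does not model.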

Evidence

_catalog_cache: dict[str, object] = {"cwd": None, "created_at": 0.0, "catalog": None}

def _catalog() -> dict[str, list[str]]:
    # ... validation logic ...
    
    # Creates new catalog data structure every time
    catalog = {
        "resource": sorted(files),
        "project": [value for value in [cwd.name] if value],
        "plan": [],
        "actor": sorted(
            f"local/{file.stem}"
            for file in (cwd / "examples" / "actors").glob("*.y*ml")
        )
        if (cwd / "examples" / "actors").is_dir()
        else [],
        # ... more categories
    }
    
    # Replace cache entries, but intermediate objects may linger
    _catalog_cache["cwd"] = cwd
    _catalog_cache["created_at"] = now
    _catalog_cache["catalog"] = catalog
    return catalog

Expected Behavior

Cache memory usage should remain bounded over long sessions, with proper cleanup of old cache data and intermediate objects.

Actual Behavior

While the cache dictionary itself stays small, each cache refresh creates new data structures (sorted lists, file paths, etc.) that may not be immediately garbage collected. In long-running sessions with frequent directory changes or large file trees, this can lead to:

  • Gradual memory growth as intermediate objects accumulate
  • Higher memory pressure during filesystem scanning
  • Potential performance degradation over time
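One way to check whether the growth described above actually occurs is to measure traced allocations across many refreshes with the standard-library `tracemalloc` module. The sketch below is an assumption-laden approximation: `build_catalog` is a hypothetical, simplified stand-in for the real build step, and the temporary directory with 50 files is invented test data.

```python
import tempfile
import tracemalloc
from pathlib import Path


def build_catalog(cwd: Path) -> dict[str, list[str]]:
    """Hypothetical simplified stand-in for the catalog build step."""
    files = [p.name for p in cwd.iterdir() if p.is_file()]
    return {"resource": sorted(files), "project": [cwd.name], "plan": []}


def measure_growth(refreshes: int = 200) -> int:
    """Return the change in traced bytes between an early refresh and the last one."""
    cache: dict[str, object] = {"cwd": None, "created_at": 0.0, "catalog": None}
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for n in range(50):
            (root / f"file_{n}.txt").touch()

        tracemalloc.start()
        baseline = 0
        for i in range(refreshes):
            # Same overwrite pattern as the reported code.
            cache["cwd"] = root
            cache["created_at"] = float(i)
            cache["catalog"] = build_catalog(root)
            if i == 9:  # warm-up done; record the baseline
                baseline = tracemalloc.get_traced_memory()[0]
        current = tracemalloc.get_traced_memory()[0]
        tracemalloc.stop()
    return current - baseline


growth = measure_growth()
print(f"traced memory change after warm-up: {growth} bytes")
```

A value that stays close to zero as `refreshes` increases would suggest that refreshes do not accumulate memory in this simplified model; a value that rises steadily with the refresh count would support the report. The same measurement could be pointed at the real `_catalog()` function to verify the issue.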

Suggested Fix

Improve cache management with explicit cleanup:

def _catalog() -> dict[str, list[str]]:
    cwd = Path.cwd()
    now = time.time()
    cached_cwd = _catalog_cache.get("cwd")
    cached_time = _catalog_cache.get("created_at")
    cached_catalog = _catalog_cache.get("catalog")
    
    if (
        isinstance(cached_cwd, Path)
        and cached_cwd == cwd
        and isinstance(cached_time, float)
        and now - cached_time < _CATALOG_CACHE_TTL_SECONDS
        and isinstance(cached_catalog, dict)
    ):
        return cached_catalog

    # Drop the stale entry and the local references to it before rebuilding,
    # so the old catalog can be reclaimed while the new one is built
    # (provided no caller still holds it).
    _catalog_cache.clear()
    del cached_catalog, cached_cwd, cached_time

    # ... build new catalog ...
    
    # Set new cache
    _catalog_cache.update({
        "cwd": cwd,
        "created_at": now,
        "catalog": catalog
    })
    
    return catalog
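As an alternative shape for the same idea, the following self-contained sketch wraps the single-entry cache in a small class so that cleanup is a single assignment. It is not the project's implementation: `CatalogCache`, `CacheEntry`, `fake_build`, and the TTL value are all hypothetical, and it uses `time.monotonic()` in place of the `time.time()` call shown above so that wall-clock adjustments cannot distort the TTL check.

```python
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional

CATALOG_CACHE_TTL_SECONDS = 5.0  # assumed value; the real TTL is not shown in the report

Catalog = dict[str, list[str]]


@dataclass(frozen=True)
class CacheEntry:
    cwd: Path
    created_at: float
    catalog: Catalog


class CatalogCache:
    """Holds at most one entry; replacing it drops the cache's reference to the old one."""

    def __init__(
        self,
        build: Callable[[Path], Catalog],
        ttl: float = CATALOG_CACHE_TTL_SECONDS,
    ) -> None:
        self._build = build
        self._ttl = ttl
        self._entry: Optional[CacheEntry] = None

    def get(self, cwd: Path) -> Catalog:
        now = time.monotonic()
        entry = self._entry
        if entry is not None and entry.cwd == cwd and now - entry.created_at < self._ttl:
            return entry.catalog
        # Release the stale entry before building so both catalogs are not alive at once.
        self._entry = None
        del entry
        catalog = self._build(cwd)
        self._entry = CacheEntry(cwd=cwd, created_at=now, catalog=catalog)
        return catalog


# Usage with a hypothetical build function that records each call.
calls: list[Path] = []


def fake_build(cwd: Path) -> Catalog:
    calls.append(cwd)
    return {"project": [cwd.name]}


cache = CatalogCache(fake_build)
first = cache.get(Path("/tmp/a"))
second = cache.get(Path("/tmp/a"))  # same directory within the TTL: served from cache
third = cache.get(Path("/tmp/b"))   # directory changed: rebuilt, old entry dropped
```

Keeping the entry in one attribute means there is never more than one cached catalog, and the bound on memory is visible from the data structure itself rather than from cleanup calls scattered through the function.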

Category

resource

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: @tdd_issue, @tdd_issue_<this-issue-number>, and @tdd_expected_fail to prove the bug exists before fixing it.


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: bug-hunter

Author
Owner

Verified — Resource bug: memory growth in reference parser cache. MoSCoW: Should-have. Priority: Medium.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Reference
cleveragents/cleveragents-core#7080