BUG-HUNT: [resource] Memory growth in reference parser cache over long sessions #7080

2026-04-10T07:30:57Z

HAL9000 commented

2026-04-10 07:30:57 +00:00

Bug Report: Resource — Memory Growth in Reference Parser Cache Over Long Sessions

Severity Assessment

Impact: Memory leakage in long-running TUI sessions, particularly those that frequently change directories
Likelihood: Low to Medium in typical usage, High in automated or long-running sessions with frequent directory changes
Priority: Low

Location

File: src/cleveragents/tui/input/reference_parser.py
Function/Class: _catalog() function and global _catalog_cache
Lines: 15, 33-46

Description

The reference parser uses a global cache with TTL-based invalidation. On each refresh it overwrites the three cache keys in place rather than clearing the cache first. The dictionary itself holds one entry set at a time (a single directory) and does not gain keys, so any growth would have to come from the data built during a refresh: each refresh allocates a new catalog, and the superseded catalog and its intermediate objects are only freed once nothing else references them.
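
Whether a replaced entry lingers can be checked directly. The sketch below is not taken from the project: `Catalog` and `cache` are hypothetical stand-ins that mirror the shape of `_catalog_cache`. It replaces the cached value and uses a weak reference to observe whether the old one is released, which on CPython happens as soon as no other strong reference holds it.

```python
import gc
import weakref


class Catalog(dict):
    """Hypothetical stand-in for the catalog; a dict subclass so it can be weakly referenced."""


# Same three keys as the cache shown in this issue.
cache: dict[str, object] = {"cwd": None, "created_at": 0.0, "catalog": None}

old = Catalog(resource=["a.py"])
cache["catalog"] = old
old_ref = weakref.ref(old)
del old  # the cache now holds the only strong reference

cache["catalog"] = Catalog(resource=["b.py"])  # replace the value, do not clear the dict
gc.collect()  # not required on CPython (reference counting), useful on other runtimes

print(len(cache))         # the number of keys is unchanged by the replacement
print(old_ref() is None)  # True once the old catalog has been released
```

If the same check reports a live object against the real parser, something outside the cache (a caller, a closure, a traceback) is still holding the old catalog, and that holder is where the cleanup belongs.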

Evidence

_catalog_cache: dict[str, object] = {"cwd": None, "created_at": 0.0, "catalog": None}

def _catalog() -> dict[str, list[str]]:
    # ... validation logic ...
    
    # Creates new catalog data structure every time
    catalog = {
        "resource": sorted(files),
        "project": [value for value in [cwd.name] if value],
        "plan": [],
        "actor": sorted(
            f"local/{file.stem}"
            for file in (cwd / "examples" / "actors").glob("*.y*ml")
        )
        if (cwd / "examples" / "actors").is_dir()
        else [],
        # ... more categories
    }
    
    # Replace cache entries, but intermediate objects may linger
    _catalog_cache["cwd"] = cwd
    _catalog_cache["created_at"] = now
    _catalog_cache["catalog"] = catalog
    return catalog

Expected Behavior

Cache memory usage should remain bounded over long sessions, with proper cleanup of old cache data and intermediate objects.

Actual Behavior

While the cache dictionary itself stays small, each cache refresh creates new data structures (sorted lists, file paths, etc.) that may not be immediately garbage collected. In long-running sessions with frequent directory changes or large file trees, this can lead to:

Gradual memory growth as intermediate objects accumulate
Higher memory pressure during filesystem scanning
Potential performance degradation over time

Suggested Fix

Improve cache management with explicit cleanup:

def _catalog() -> dict[str, list[str]]:
    cwd = Path.cwd()
    now = time.time()
    cached_cwd = _catalog_cache.get("cwd")
    cached_time = _catalog_cache.get("created_at")
    cached_catalog = _catalog_cache.get("catalog")
    
    if (
        isinstance(cached_cwd, Path)
        and cached_cwd == cwd
        and isinstance(cached_time, float)
        and now - cached_time < _CATALOG_CACHE_TTL_SECONDS
        and isinstance(cached_catalog, dict)
    ):
        return cached_catalog

    # Drop every reference to the stale entry before rebuilding, so the old
    # catalog can be freed before the filesystem scan allocates the new one
    _catalog_cache.clear()
    del cached_catalog, cached_cwd, cached_time
    
    # ... build new catalog ...
    
    # Set new cache
    _catalog_cache.update({
        "cwd": cwd,
        "created_at": now,
        "catalog": catalog
    })
    
    return catalog

Category

resource

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: @tdd_issue, @tdd_issue_<this-issue-number>, and @tdd_expected_fail to prove the bug exists before fixing it.

Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: bug-hunter

HAL9000 referenced this issue

2026-04-10 07:41:53 +00:00

BUG-HUNT: [RESOURCE] Memory leak in PlanService memory services cache #7100

HAL9000 added the

labels 2026-04-10 18:58:30 +00:00

freemo added the Type/Bug label 2026-04-12 18:37:28 +00:00

HAL9000 added

and removed

labels 2026-04-14 06:49:14 +00:00

HAL9000 commented

2026-04-14 06:49:14 +00:00

✅ Verified — Resource bug: memory growth in reference parser cache. MoSCoW: Should-have. Priority: Medium.

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor
