[Bug Hunt][Cycle 2][Resource] Unbounded Memory Growth in Sandbox BoundaryCache #7109

Open
opened 2026-04-10 07:49:20 +00:00 by HAL9000 · 1 comment
Owner

Metadata

  • Branch: bugfix/sandbox-boundary-cache-unbounded-memory-growth
  • Commit Message: fix(infrastructure): implement LRU eviction and size limits in SandboxManager BoundaryCache
  • Milestone: backlog
  • Parent Epic: #7023

Bug Report: Resource Management — Unbounded Memory Growth in Sandbox BoundaryCache

Severity Assessment

  • Impact: Memory exhaustion in long-running processes with many resources
  • Likelihood: High in environments with large resource DAGs or frequent resource creation
  • Priority: High

Location

  • File: src/cleveragents/infrastructure/sandbox/manager.py
  • Class: SandboxManager
  • Lines: 40-45 (BoundaryCache initialization), 370-385 (resolve_sandbox_key), 410-415 (clear_boundary_cache)

Description

The SandboxManager uses a BoundaryCache to cache resource boundary lookups, but the cache has no size limit, eviction policy, or automatic cleanup. In long-running processes that handle many resources, the cache grows without bound and consumes steadily more memory.

Evidence

class SandboxManager:
    def __init__(self, factory: SandboxFactory, cleanup_on_exit: bool = True) -> None:
        # ...
        self._boundary_cache: BoundaryCache = BoundaryCache()  # No size limits

    def resolve_sandbox_key(self, resource: Resource, resource_registry: Mapping[str, Resource]) -> str:
        boundary = self._boundary_cache.get_boundary(resource, resource_registry)  # Caches indefinitely
        # No cache eviction logic

The cache is cleared only by an explicit call to clear_boundary_cache(), which happens at the start of plan execution. It is not cleared during cleanup, and nothing clears it in response to cache size or memory pressure.

Expected Behavior

The boundary cache should implement proper memory management:

  • Maximum cache size limits (LRU eviction)
  • Periodic cleanup of stale entries
  • Memory pressure-based eviction
  • Automatic cleanup when plans complete

Actual Behavior

  • Cache grows without bounds as more resources are processed
  • Memory consumption increases monotonically in long-running processes
  • No protection against memory exhaustion attacks via resource creation
  • Cache entries persist even after resources are deleted
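The growth pattern described above can be illustrated with a minimal sketch. `UnboundedCache` and `simulate` are hypothetical stand-ins written for this issue, not the real `BoundaryCache`; boundaries are modelled as plain strings because the actual resolution logic is not shown here.

```python
from typing import Dict, Mapping


class UnboundedCache:
    """Hypothetical stand-in for BoundaryCache: stores every lookup, never evicts."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def get_boundary(self, resource_id: str, registry: Mapping[str, str]) -> str:
        if resource_id not in self._entries:
            # Placeholder lookup; the real boundary resolution is not shown in this issue.
            self._entries[resource_id] = registry[resource_id]
        return self._entries[resource_id]

    def __len__(self) -> int:
        return len(self._entries)


def simulate(num_resources: int) -> int:
    """Create resources, look each one up, delete it, and return the cache size."""
    cache = UnboundedCache()
    registry: Dict[str, str] = {}
    for i in range(num_resources):
        rid = f"res-{i}"
        registry[rid] = f"boundary-{i}"
        cache.get_boundary(rid, registry)
        del registry[rid]  # The resource is gone, but its cache entry stays.
    return len(cache)
```

Under these assumptions the registry is empty at the end of the run, yet the cache holds one entry per resource ever seen, which is the behaviour a reproduction test would need to demonstrate against the real class.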

Suggested Fix

  1. Implement LRU cache with configurable size limit
  2. Add periodic cleanup based on entry age
  3. Clear cache entries when associated resources are deleted
  4. Add memory monitoring and pressure-based eviction
  5. Clear the cache entries belonging to a plan when that plan completes

Example implementation:

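from collections import OrderedDict
from typing import Optional

class BoundaryCache:
    def __init__(self, max_size: int = 1000) -> None:
        self._max_size = max_size
        # Per-instance LRU store; functools.lru_cache cannot read max_size at class-definition time
        self._cache: "OrderedDict[str, Resource]" = OrderedDict()

    def get_boundary_cached(self, resource_id: str) -> Optional["Resource"]:
        boundary = self._cache.get(resource_id)
        if boundary is not None:
            self._cache.move_to_end(resource_id)  # Mark as most recently used
        return boundary

    def put_boundary(self, resource_id: str, boundary: "Resource") -> None:
        # Hypothetical helper (not in the original code): store, then evict the oldest entries
        self._cache[resource_id] = boundary
        self._cache.move_to_end(resource_id)
        while len(self._cache) > self._max_size:
            self._cache.popitem(last=False)  # Evict least recently used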
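Items 2 and 3 of the suggested fix (age-based cleanup and invalidation when a resource is deleted) could look roughly like the sketch below. `TtlBoundaryStore`, its method names, the 300-second default, and the string value type are all assumptions made for illustration; none of them exist in the codebase. The clock is injectable so the expiry logic can be tested without sleeping.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlBoundaryStore:
    """Hypothetical age-aware store; names and the string value type are illustrative."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # resource_id -> (time stored, boundary)
        self._entries: Dict[str, Tuple[float, str]] = {}

    def put(self, resource_id: str, boundary: str) -> None:
        self._entries[resource_id] = (self._clock(), boundary)

    def get(self, resource_id: str) -> Optional[str]:
        entry = self._entries.get(resource_id)
        if entry is None:
            return None
        stored_at, boundary = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[resource_id]  # Expired: drop lazily on access
            return None
        return boundary

    def invalidate(self, resource_id: str) -> None:
        """Call when the resource is removed from the registry."""
        self._entries.pop(resource_id, None)

    def purge_expired(self) -> int:
        """Remove every entry older than the TTL and return how many were dropped."""
        now = self._clock()
        stale = [rid for rid, (ts, _) in self._entries.items() if now - ts > self._ttl]
        for rid in stale:
            del self._entries[rid]
        return len(stale)

    def __len__(self) -> int:
        return len(self._entries)
```

A periodic job could call `purge_expired()`, while whatever code deletes resources would call `invalidate()`. Whether these belong inside `BoundaryCache` itself or in a wrapper is a design decision this issue leaves open.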

Category

resource

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: @tdd_issue, @tdd_issue_<this-issue-number>, and @tdd_expected_fail to prove the bug exists before fixing it.

Subtasks

  • Reproduce unbounded memory growth in BoundaryCache with a large resource DAG
  • Implement LRU eviction policy with configurable max_size in BoundaryCache
  • Add periodic cleanup of stale entries based on entry age/TTL
  • Clear cache entries when associated resources are deleted from the registry
  • Add memory pressure-based eviction hook
  • Ensure clear_boundary_cache() is called on plan completion (not just plan start)
  • Write BDD scenarios covering cache size limits, eviction, and cleanup behaviour
  • Update integration tests to verify memory does not grow unboundedly under load
  • Update documentation / docstrings for BoundaryCache and SandboxManager
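For the subtask about clearing the cache on plan completion, one option is to wrap plan execution in a context manager so the clear also runs when the plan fails. The sketch below uses `ManagerSketch` and `plan_scope`, both hypothetical names; only `clear_boundary_cache()` comes from this issue, and how plan execution is actually structured in the project is not known here.

```python
from contextlib import contextmanager
from typing import Dict, Iterator


class ManagerSketch:
    """Hypothetical stand-in for SandboxManager, reduced to its boundary cache."""

    def __init__(self) -> None:
        self.boundary_cache: Dict[str, str] = {}

    def clear_boundary_cache(self) -> None:
        self.boundary_cache.clear()


@contextmanager
def plan_scope(manager: ManagerSketch) -> Iterator[ManagerSketch]:
    manager.clear_boundary_cache()  # Existing behaviour: clear at plan start
    try:
        yield manager
    finally:
        manager.clear_boundary_cache()  # Proposed: clear at completion, even on failure
```

Running a plan as `with plan_scope(manager): ...` would then leave the cache empty afterwards whether the body returns normally or raises.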

Definition of Done

  • BoundaryCache enforces a configurable maximum size with LRU eviction
  • Cache entries are automatically invalidated when resources are removed
  • clear_boundary_cache() is invoked at both plan start and plan completion
  • No unbounded memory growth observed under sustained resource-creation load
  • BDD scenarios for cache eviction and cleanup pass
  • All nox stages pass
  • Coverage >= 97%

Backlog note: This issue was discovered during autonomous operation
on milestone v3.2.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: new-issue-creator

Author
Owner

Verified — Resource bug: unbounded memory growth in Sandbox BoundaryCache. MoSCoW: Should-have. Priority: High.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Reference
cleveragents/cleveragents-core#7109