UAT: ContextTierService.store() does not enforce max_decisions_warm or max_decisions_cold — warm and cold tiers grow unbounded #4847

Open
opened 2026-04-08 20:07:47 +00:00 by HAL9000 · 0 comments
Owner

Bug Report

Feature Area: ACMS — Hot/Warm/Cold Tiered Storage, ContextTierService

Severity: Medium. TierBudget.max_decisions_warm and max_decisions_cold are defined but never enforced, so the warm and cold tiers grow without limit and memory use keeps rising in long-running sessions.

What Was Tested

Code-level analysis of src/cleveragents/application/services/context_tiers.py comparing the store() method against the TierBudget model and the spec (ADR-014).

Expected Behavior (from spec)

From docs/adr/ADR-014-context-management-acms.md:

Warm context: Recent decisions and fragments available for retrieval but not loaded by default. Retained for context.tiers.warm.retention-hours (default: 24 hours). Max decisions: context.warm.max-decisions (default: 100).

Cold context: Historical decisions and fragments archived for audit and correction. Retained for context.tiers.cold.retention-days (default: 90 days). Max decisions: context.cold.max-decisions (default: 500).

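The TierBudget model defines the corresponding limits, although its defaults (500 warm, 5000 cold) differ from the ADR defaults quoted above (100 warm, 500 cold):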

class TierBudget(BaseModel):
    max_tokens_hot: int = Field(default=8000, ...)
    max_decisions_warm: int = Field(default=500, ...)   # capacity limit
    max_decisions_cold: int = Field(default=5000, ...)  # capacity limit

Actual Behavior

ContextTierService.store() only enforces the hot tier budget:

def store(self, fragment: TieredFragment) -> None:
    self._remove_from_all(fragment.fragment_id)

    if fragment.tier == ContextTier.HOT:
        if fragment.token_count > self._budget.max_tokens_hot:
            # redirect to warm
            ...
        else:
            self._hot[fragment.fragment_id] = fragment
            self._enforce_hot_budget()   # ✅ hot tier enforced
    elif fragment.tier == ContextTier.WARM:
        self._warm[fragment.fragment_id] = fragment  # ❌ no limit check
    else:
        self._cold[fragment.fragment_id] = fragment  # ❌ no limit check

There is no enforcement of max_decisions_warm or max_decisions_cold. The warm and cold tiers can grow to an arbitrary size.
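The gap can be illustrated with a simplified stand-in for the service. The `Budget` and `WarmOnlyService` classes and the `warm_overflow` helper below are hypothetical reductions written for this report, not the real `TierBudget` or `ContextTierService`; only the warm branch of store() is mirrored, and the default of 500 is taken from the model quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class Budget:
    # Stand-in for TierBudget; default copied from the model quoted in this report.
    max_decisions_warm: int = 500


@dataclass
class WarmOnlyService:
    # Hypothetical reduction of ContextTierService that keeps only the warm tier.
    budget: Budget = field(default_factory=Budget)
    warm: dict = field(default_factory=dict)

    def store(self, fragment_id: str, payload: str) -> None:
        # Mirrors the reported warm branch: insert with no limit check.
        self.warm[fragment_id] = payload


def warm_overflow(service: WarmOnlyService, inserts: int) -> int:
    """Store `inserts` distinct fragments and return how far the tier exceeds its limit."""
    for i in range(inserts):
        service.store(f"frag-{i}", "x")
    return len(service.warm) - service.budget.max_decisions_warm


if __name__ == "__main__":
    # 2000 inserts against a limit of 500 leave the tier 1500 entries over budget.
    print(warm_overflow(WarmOnlyService(), 2000))
```

A regression test against the real service could follow the same shape: store more distinct warm fragments than the configured limit and assert that the tier size never exceeds it.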

Impact

  • In a long-running session, the warm tier will accumulate all fragments ever stored there, consuming unbounded memory
  • The cold tier will similarly grow without bound
  • TierBudget.max_decisions_warm and max_decisions_cold are effectively dead configuration — they are read from settings but never used to limit tier size
  • evict_lru() exists but is never called automatically for warm or cold tiers

Comparison with Hot Tier

The hot tier correctly enforces its budget via _enforce_hot_budget() (implemented in TierRuntimeMixin), which evicts LRU fragments when the token budget is exceeded. The warm and cold tiers have no equivalent enforcement.
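For contrast, the pattern described for the hot tier (evict least-recently-used fragments until the token total fits) can be sketched as follows. The body of `_enforce_hot_budget()` is not reproduced in this report, so the function name `enforce_token_budget` and the use of an `OrderedDict` keyed by fragment id are assumptions made purely for illustration.

```python
from collections import OrderedDict


def enforce_token_budget(tier: OrderedDict, max_tokens: int) -> list:
    """Evict oldest entries until the summed token counts fit within max_tokens.

    `tier` maps fragment id -> token count and is ordered least-recently-used
    first (an assumption of this sketch). Returns the evicted fragment ids.
    """
    evicted = []
    total = sum(tier.values())
    while tier and total > max_tokens:
        # popitem(last=False) removes the entry at the front, i.e. the LRU one.
        fragment_id, tokens = tier.popitem(last=False)
        total -= tokens
        evicted.append(fragment_id)
    return evicted
```

The warm and cold limits are expressed as entry counts rather than tokens, so an equivalent check for those tiers would compare `len(tier)` against the configured maximum instead of summing token counts.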

Code Locations

  • src/cleveragents/application/services/context_tiers.py: store() method (lines ~100-130)
  • src/cleveragents/domain/models/acms/tiers.py: TierBudget model (defines limits but they are not enforced)
  • src/cleveragents/application/services/tier_runtime.py: _enforce_hot_budget() (hot tier enforcement, no equivalent for warm/cold)

Fix Required

Add warm and cold tier enforcement in store():

elif fragment.tier == ContextTier.WARM:
    self._warm[fragment.fragment_id] = fragment
    self._enforce_warm_budget()   # Add: evict LRU when len > max_decisions_warm

else:  # COLD
    self._cold[fragment.fragment_id] = fragment
    self._enforce_cold_budget()   # Add: evict LRU when len > max_decisions_cold

Here, _enforce_warm_budget() and _enforce_cold_budget() would evict the least-recently-used fragments whenever a tier holds more than max_decisions_warm or max_decisions_cold entries, respectively.
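One possible shape for the count-based enforcement is a single helper shared by both tiers. This is a minimal sketch under stated assumptions: `enforce_count_budget` and `touch` are hypothetical names, and the tiers are assumed to be `OrderedDict` instances whose order tracks recency, which may not match how the real service or its existing evict_lru() records access.

```python
from collections import OrderedDict


def touch(tier: OrderedDict, key) -> None:
    """Mark `key` as most recently used, for example after a read."""
    tier.move_to_end(key)


def enforce_count_budget(tier: OrderedDict, max_entries: int) -> list:
    """Evict least-recently-used entries until len(tier) <= max_entries.

    Assumes the front of `tier` is the least-recently-used entry.
    Returns the evicted keys so the caller can log or archive them.
    """
    limit = max(max_entries, 0)  # treat a negative limit as zero
    evicted = []
    while len(tier) > limit:
        key, _ = tier.popitem(last=False)  # drop from the LRU end
        evicted.append(key)
    return evicted


if __name__ == "__main__":
    warm = OrderedDict((f"f{i}", i) for i in range(5))
    touch(warm, "f0")                      # f0 becomes most recently used
    print(enforce_count_budget(warm, 3))   # ['f1', 'f2']
    print(list(warm))                      # ['f3', 'f4', 'f0']
```

This sketch simply drops evicted entries. Whether fragments evicted from the warm tier should instead be demoted to cold is a design decision this report does not address.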


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: uat-tester

HAL9000 added this to the v3.5.0 milestone 2026-04-08 20:18:49 +00:00
Reference
cleveragents/cleveragents-core#4847