design(acms): budget eviction permanently deletes hot-tier fragments instead of demoting to warm #1152

Open
opened 2026-03-24 22:15:16 +00:00 by CoreRasurae · 2 comments
Member

Metadata

  • Type: Task
  • Priority: Medium
  • MoSCoW: (to be assigned by project owner)
  • Milestone: (to be assigned — likely v3.4.0 or later)

Background and Context

During code review of PR #1150 (issue #821 — context tier runtime promotion/demotion/eviction), finding B-MED-2 identified a design concern in _enforce_hot_budget(): when the hot-tier token budget is exceeded, evicted fragments are permanently deleted via del self._hot[oldest_id] rather than demoted to the warm tier.

The B-CRIT-1 fix in that PR correctly protects the just-promoted fragment (restoring it to warm if budget-evicted). However, other hot-tier residents evicted to make room are permanently lost.

Relevant code

src/cleveragents/application/services/context_tiers.py_enforce_hot_budget():

while total_tokens > budget_tokens and self._hot:
    oldest_id = min(self._hot.keys(), key=lambda fid: self._hot[fid].last_accessed)
    evicted_tokens = self._hot[oldest_id].token_count
    del self._hot[oldest_id]          # Permanent deletion — not demoted to warm
    total_tokens -= evicted_tokens
    self._emit_tier_event(EventType.TIER_EVICTED, oldest_id, ...)

The pre-existing evict_lru() public method also permanently deletes, so the new code is consistent with the established pattern.

Specification references

  • Line 44569 (§Three Storage Tiers with Temporal Alignment): the tier table states hot-tier retention is "Until resource removed".
  • Lines 25704–25713 (§Plan Lifecycle ACMS Actions): "Hot context archived to warm. Warm context ages to cold based on retention policy." This describes a downward lifecycle flow suggesting fragments should move through tiers rather than be destroyed.

Why this was NOT applied in PR #1150

  1. Issue terminology: Issue #821 acceptance criteria explicitly says "Implement budget-based eviction from hot tier". In caching literature, "eviction" means removal, not demotion. Changing the semantics would contradict the issue's intent.

  2. Pre-existing codebase pattern: The pre-existing evict_lru() public method (not introduced by PR #1150) already permanently deletes fragments. Changing _enforce_hot_budget() to demote instead of delete would create an inconsistency with evict_lru(), and changing both would be scope creep beyond #821.

  3. Cascading risk: Demoting to warm on budget overflow could cause unbounded warm-tier growth. The warm tier uses max_decisions_warm (a decision count, not tokens), and there is no corresponding warm-tier budget enforcement. This could shift the problem rather than solve it.

  4. Spec ambiguity: The specification's "Hot context archived to warm" describes plan-completion lifecycle flow, not capacity management. Budget eviction is a separate concern operating at a different abstraction level.

Expected Behavior

An architecture-level decision (potentially via an ADR) should determine whether budget-based eviction should:

  • Option A — Demote: Evicted hot-tier fragments are demoted to warm (preserving the spec's downward lifecycle flow). This requires warm-tier budget enforcement to prevent cascading growth.
  • Option B — Delete (current): Evicted fragments are permanently removed (traditional cache eviction semantics). This should be explicitly documented as a deliberate deviation from the downward lifecycle.
  • Option C — Hybrid: Demote up to warm-tier capacity, delete beyond that. Most complex but preserves data when possible.

Acceptance Criteria

  • An architecture decision is made and documented (ADR or specification update) on whether budget eviction should demote or delete
  • If the decision is to demote: _enforce_hot_budget() and evict_lru() are updated to demote instead of delete, with warm-tier overflow handling
  • If the decision is to keep deleting: a docstring note is added to _enforce_hot_budget() and evict_lru() explicitly documenting that eviction is permanent deletion, not demotion, and why
  • Tests updated to reflect the chosen behavior

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • The chosen approach is documented (ADR or specification update or code docstrings).
  • If code changes are made: a Git commit is created, pushed, submitted as a pull request to master, reviewed, and merged.

Subtasks

  • Discuss and decide on eviction semantics (demote vs delete vs hybrid)
  • Document the decision (ADR or specification clarification)
  • Implement the chosen approach (if code changes are needed)
  • Update tests to cover the chosen behavior
## Metadata - **Type**: Task - **Priority**: Medium - **MoSCoW**: *(to be assigned by project owner)* - **Milestone**: *(to be assigned — likely v3.4.0 or later)* ## Background and Context During code review of PR #1150 (issue #821 — context tier runtime promotion/demotion/eviction), finding **B-MED-2** identified a design concern in `_enforce_hot_budget()`: when the hot-tier token budget is exceeded, evicted fragments are permanently deleted via `del self._hot[oldest_id]` rather than demoted to the warm tier. The B-CRIT-1 fix in that PR correctly protects the *just-promoted* fragment (restoring it to warm if budget-evicted). However, **other** hot-tier residents evicted to make room are permanently lost. ### Relevant code `src/cleveragents/application/services/context_tiers.py` — `_enforce_hot_budget()`: ```python while total_tokens > budget_tokens and self._hot: oldest_id = min(self._hot.keys(), key=lambda fid: self._hot[fid].last_accessed) evicted_tokens = self._hot[oldest_id].token_count del self._hot[oldest_id] # Permanent deletion — not demoted to warm total_tokens -= evicted_tokens self._emit_tier_event(EventType.TIER_EVICTED, oldest_id, ...) ``` The pre-existing `evict_lru()` public method also permanently deletes, so the new code is consistent with the established pattern. ### Specification references - **Line 44569** (§Three Storage Tiers with Temporal Alignment): the tier table states hot-tier retention is *"Until resource removed"*. - **Lines 25704–25713** (§Plan Lifecycle ACMS Actions): *"Hot context archived to warm. Warm context ages to cold based on retention policy."* This describes a downward lifecycle flow suggesting fragments should move through tiers rather than be destroyed. ### Why this was NOT applied in PR #1150 1. **Issue terminology**: Issue #821 acceptance criteria explicitly says *"Implement budget-based eviction from hot tier"*. In caching literature, "eviction" means removal, not demotion. Changing the semantics would contradict the issue's intent. 2. **Pre-existing codebase pattern**: The pre-existing `evict_lru()` public method (not introduced by PR #1150) already permanently deletes fragments. Changing `_enforce_hot_budget()` to demote instead of delete would create an inconsistency with `evict_lru()`, and changing both would be scope creep beyond #821. 3. **Cascading risk**: Demoting to warm on budget overflow could cause unbounded warm-tier growth. The warm tier uses `max_decisions_warm` (a decision count, not tokens), and there is no corresponding warm-tier budget enforcement. This could shift the problem rather than solve it. 4. **Spec ambiguity**: The specification's *"Hot context archived to warm"* describes **plan-completion lifecycle flow**, not capacity management. Budget eviction is a separate concern operating at a different abstraction level. ## Expected Behavior An architecture-level decision (potentially via an ADR) should determine whether budget-based eviction should: - **Option A — Demote**: Evicted hot-tier fragments are demoted to warm (preserving the spec's downward lifecycle flow). This requires warm-tier budget enforcement to prevent cascading growth. - **Option B — Delete (current)**: Evicted fragments are permanently removed (traditional cache eviction semantics). This should be explicitly documented as a deliberate deviation from the downward lifecycle. - **Option C — Hybrid**: Demote up to warm-tier capacity, delete beyond that. Most complex but preserves data when possible. ## Acceptance Criteria - [ ] An architecture decision is made and documented (ADR or specification update) on whether budget eviction should demote or delete - [ ] If the decision is to demote: `_enforce_hot_budget()` and `evict_lru()` are updated to demote instead of delete, with warm-tier overflow handling - [ ] If the decision is to keep deleting: a docstring note is added to `_enforce_hot_budget()` and `evict_lru()` explicitly documenting that eviction is permanent deletion, not demotion, and why - [ ] Tests updated to reflect the chosen behavior ## Definition of Done This issue is complete when: - All subtasks above are completed and checked off. - The chosen approach is documented (ADR or specification update or code docstrings). - If code changes are made: a Git commit is created, pushed, submitted as a pull request to `master`, reviewed, and merged. ## Subtasks - [ ] Discuss and decide on eviction semantics (demote vs delete vs hybrid) - [ ] Document the decision (ADR or specification clarification) - [ ] Implement the chosen approach (if code changes are needed) - [ ] Update tests to cover the chosen behavior
freemo added this to the v3.4.0 milestone 2026-03-28 23:15:43 +00:00
Owner

Day 48 Planning — TDD Workflow Initiated

TDD workflow initiated for this bug:

  • Created TDD issue #1183 to write a tagged test that captures this bug.
  • TDD issue assigned to @hamza.khyari (ACMS infrastructure expert, different from @aditya who has the bug fix).
  • Once #1183 is merged to master, @aditya should create branch bugfix/m5-budget-eviction-delete from master and implement the fix.
  • The fix must remove the @tdd_expected_fail tag from the test so it runs normally.
  • The CI quality gate will verify the tag was removed before the PR can be merged.

Note: A Forgejo dependency link (#1152 depends on #1183) should be added through the Forgejo UI — the API endpoint is not functioning correctly for dependency creation.

**Day 48 Planning — TDD Workflow Initiated** TDD workflow initiated for this bug: - Created TDD issue #1183 to write a tagged test that captures this bug. - TDD issue assigned to @hamza.khyari (ACMS infrastructure expert, different from @aditya who has the bug fix). - Once #1183 is merged to `master`, @aditya should create branch `bugfix/m5-budget-eviction-delete` from `master` and implement the fix. - The fix must remove the `@tdd_expected_fail` tag from the test so it runs normally. - The CI quality gate will verify the tag was removed before the PR can be merged. **Note**: A Forgejo dependency link (#1152 depends on #1183) should be added through the Forgejo UI — the API endpoint is not functioning correctly for dependency creation.
freemo self-assigned this 2026-04-02 06:13:53 +00:00
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: High (confirmed) — This is an architecture-level design decision about ACMS budget eviction semantics that affects data preservation guarantees.
  • Milestone: v3.4.0 (confirmed) — ACMS v1 is the core deliverable for this milestone. The eviction semantics must be decided and documented.
  • MoSCoW: Must Have (confirmed) — The ACMS tier lifecycle is a core architectural concept. The current behavior (permanent deletion on budget eviction) may conflict with the spec's downward lifecycle flow. An explicit decision and documentation is required.
  • Parent Epic: Needs linking — likely belongs under Epic #935 (ACMS Context Assembly Pipeline) or #396 (ACMS Context Pipeline).

Valid design issue raised during code review. The three options (demote, delete, hybrid) are well-articulated. This requires an architecture decision — likely from @freemo as the CTO. The decision should be documented as an ADR or spec clarification.

Recommendation: Given the spec's language about "hot context archived to warm" and the downward lifecycle flow, Option A (Demote) aligns best with the specification. However, this requires warm-tier budget enforcement to prevent cascading growth, which adds scope. If time is constrained for v3.4.0, Option B (Delete with documentation) is acceptable as a pragmatic choice with a follow-up issue for demotion in a later milestone.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Issue triaged by project owner: - **State**: Verified - **Priority**: High (confirmed) — This is an architecture-level design decision about ACMS budget eviction semantics that affects data preservation guarantees. - **Milestone**: v3.4.0 (confirmed) — ACMS v1 is the core deliverable for this milestone. The eviction semantics must be decided and documented. - **MoSCoW**: Must Have (confirmed) — The ACMS tier lifecycle is a core architectural concept. The current behavior (permanent deletion on budget eviction) may conflict with the spec's downward lifecycle flow. An explicit decision and documentation is required. - **Parent Epic**: Needs linking — likely belongs under Epic #935 (ACMS Context Assembly Pipeline) or #396 (ACMS Context Pipeline). Valid design issue raised during code review. The three options (demote, delete, hybrid) are well-articulated. This requires an architecture decision — likely from @freemo as the CTO. The decision should be documented as an ADR or spec clarification. **Recommendation**: Given the spec's language about "hot context archived to warm" and the downward lifecycle flow, **Option A (Demote)** aligns best with the specification. However, this requires warm-tier budget enforcement to prevent cascading growth, which adds scope. If time is constrained for v3.4.0, **Option B (Delete with documentation)** is acceptable as a pragmatic choice with a follow-up issue for demotion in a later milestone. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: ca-project-owner
Sign in to join this conversation.
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#1152
No description provided.