bug(acms): context tier service has no runtime promotion/demotion/eviction logic #821

Closed
opened 2026-03-13 20:12:18 +00:00 by freemo (Owner) · 5 comments

Metadata

  • Commit Message: fix(acms): implement context tier runtime promotion/demotion/eviction
  • Branch: fix/context-tier-runtime
  • Type: Bug
  • Priority: Medium
  • MoSCoW: Should have
  • Points: 8
  • Milestone: v3.4.0

Background and Context

The specification defines a hot/warm/cold context tier system where fragments are promoted, demoted, and evicted based on access patterns, recency, and budget constraints. PR #530 (feat(context): add hot/warm/cold tiers and actor views) was merged and the work was marked complete.

However, a spec-vs-code audit found that only the data models for context tiers exist (domain/models/acms/tiers.py). There is no runtime logic for:

  • Promotion: Moving a fragment from cold → warm or warm → hot based on access frequency
  • Demotion: Moving a fragment from hot → warm or warm → cold based on staleness/inactivity
  • Eviction: Removing fragments from the hot tier when its budget is exceeded
  • Tier-aware queries: Retrieving fragments with tier-appropriate latency characteristics

The tier models define the data structures correctly, but the ContextTierService (or equivalent) that would perform tier transitions at runtime does not exist.
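The promotion and demotion rules above imply a fixed ordering, cold < warm < hot, where each step moves a fragment by one tier. Below is a minimal sketch of that ordering. It assumes tiers can be modelled as an ordered enum. Tier, tier_up and tier_down are hypothetical names, not the models in domain/models/acms/tiers.py.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    # Hypothetical ordering: a larger value means a hotter (assumed lower-latency) tier.
    COLD = 0
    WARM = 1
    HOT = 2


def tier_up(tier: Tier) -> Optional[Tier]:
    """Next hotter tier (one promotion step), or None if already HOT."""
    return Tier(tier + 1) if tier < Tier.HOT else None


def tier_down(tier: Tier) -> Optional[Tier]:
    """Next colder tier (one demotion step), or None if already COLD."""
    return Tier(tier - 1) if tier > Tier.COLD else None
```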

Acceptance Criteria

  • [x] Implement ContextTierService with promote(fragment_id, target_tier), demote(fragment_id, target_tier), and evict(fragment_id) methods
  • [x] Implement automatic promotion on access (configurable threshold: acms.tier.promotion_threshold)
  • [x] Implement time-based demotion (configurable TTL per tier: acms.tier.hot_ttl, acms.tier.warm_ttl)
  • [x] Implement budget-based eviction from the hot tier when acms.tier.hot_max_tokens is exceeded
  • [x] Wire the tier service into the ACMS pipeline so fragment retrieval respects tier placement
  • [x] Add tier transition event emission for observability
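The promote/demote/evict interface in the first criterion can be sketched as follows. This is a hypothetical in-memory version that only tracks placement as a dict of fragment ID to tier, and it rejects moves in the wrong direction. The real service's storage, return types and error handling are not specified in this issue. This sketch's evict() simply drops the fragment rather than moving it to a colder tier.

```python
from enum import IntEnum
from typing import Dict


class Tier(IntEnum):
    COLD = 0
    WARM = 1
    HOT = 2


class ContextTierServiceSketch:
    """Hypothetical in-memory sketch of the requested interface."""

    def __init__(self) -> None:
        self._placement: Dict[str, Tier] = {}  # fragment_id -> current tier

    def store(self, fragment_id: str, tier: Tier = Tier.COLD) -> None:
        self._placement[fragment_id] = tier

    def tier_of(self, fragment_id: str) -> Tier:
        return self._placement[fragment_id]  # KeyError if the ID is unknown

    def promote(self, fragment_id: str, target_tier: Tier) -> None:
        current = self._placement[fragment_id]
        if target_tier <= current:
            raise ValueError(f"cannot promote {current.name} to {target_tier.name}")
        self._placement[fragment_id] = target_tier

    def demote(self, fragment_id: str, target_tier: Tier) -> None:
        current = self._placement[fragment_id]
        if target_tier >= current:
            raise ValueError(f"cannot demote {current.name} to {target_tier.name}")
        self._placement[fragment_id] = target_tier

    def evict(self, fragment_id: str) -> None:
        # Assumption: eviction drops the fragment from tier placement entirely.
        del self._placement[fragment_id]
```

Because the criterion passes an explicit target_tier, a caller of this sketch can move a fragment more than one tier at a time (for example, from cold straight to hot).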

Definition of Done

This issue is complete when:

  • All subtasks below are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
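The commit-message rule above is mechanical enough to check in a script. The helper below is hypothetical and not part of the project's tooling. It encodes one reading of the rule: the subject must equal the Metadata Commit Message exactly, the second line must be blank, and at least one non-blank detail line must follow.

```python
def follows_dod_format(message: str, expected_subject: str) -> bool:
    """Hypothetical check of a commit message against the Definition of Done."""
    lines = message.splitlines()
    return (
        len(lines) >= 3
        and lines[0] == expected_subject  # first line matches Metadata exactly
        and not lines[1].strip()  # followed by a blank line
        and any(line.strip() for line in lines[2:])  # then at least one detail line
    )
```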

Subtasks

  • [x] Implement ContextTierService with promote/demote/evict methods
  • [x] Add automatic promotion on access with configurable threshold
  • [x] Add time-based demotion with configurable TTL per tier
  • [x] Add budget-based eviction from the hot tier
  • [x] Wire tier service into ACMS pipeline fragment retrieval
  • [x] Add tier transition event emission
  • [x] Tests (Behave): Add scenarios for tier promotion, demotion, and eviction
  • [x] Tests (Robot): Add integration test for tier lifecycle
  • [x] Verify coverage >= 97% via nox -s coverage_report
  • [x] Run nox (all default sessions), fix any errors
freemo added this to the v3.4.0 milestone 2026-03-13 20:19:56 +00:00
Comment by freemo (issue author, repository owner):

Dependency (TDD workflow): This bug fix is blocked by TDD issue #840. Per the CONTRIBUTING.md Bug Fix Workflow, the TDD test capturing the bug must first be written and merged with the @tdd_expected_fail tag. The fix PR then removes the tag.

Blocked by: #840

Comment by freemo (issue author, repository owner):

PM Status — Day 36

TDD not yet started. The TDD counterpart, #840, is assigned to @brent.edwards and marked State/Verified, but it has no implementation or PR yet.

TDD Pipeline Status:

| Stage | Item | Status |
|---|---|---|
| 1. TDD Issue | #840 | Open, State/Verified, no PR |
| 2. TDD PR | n/a | Not started |
| 3. Bug fix PR | n/a | Cannot start until TDD merges |

Action items:

  • @brent.edwards — After completing your current TDD work (#958 rebase, #929 review support), please start on #840. This is the TDD test for the context tier promotion/demotion/eviction logic bug. Branch: tdd/m5-context-tier-runtime.

Priority: Critical. The context tier service is a core ACMS component. However, this issue targets v3.4.0, so #958 and #929 (v3.3.0) take precedence.



Comment by freemo (issue author, repository owner):

Day 42 escalation: The TDD counterpart (#840) was completed and merged to master on 2026-03-19. However, no bugfix/ branch has been created yet. Per the TDD workflow (CONTRIBUTING.md > Bug Fix Workflow), @CoreRasurae should:

  1. Create the bugfix branch from master
  2. Implement the fix
  3. Remove the @tdd_expected_fail tag
  4. Open a PR to master

This is now 2 days overdue. Please start immediately.

Comment by freemo (issue author, repository owner):

ESCALATION: Stalled TDD Pipeline — Day 43

The TDD counterpart for this bug (#840) was completed and merged to master (commit 202c9bfe on 2026-03-19). The @tdd_expected_fail tagged test for @tdd_bug_821 now exists on master. However, 4 days have passed and no bugfix branch has been created.

@CoreRasurae: Per the TDD workflow, you must:

  1. Create branch bugfix/m5-context-tier-runtime from master
  2. Implement the runtime promotion/demotion/eviction logic for ContextTierService
  3. Remove the @tdd_expected_fail tag from the test (leaving @tdd_bug and @tdd_bug_821)
  4. Open a PR from bugfix/m5-context-tier-runtime to master

The TDD test on master currently passes CI via the expected-failure mechanism. Once you implement the fix and remove the tag, the test will run normally and must pass — confirming the bug is fixed.
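The expected-failure mechanism described above can be modelled as an outcome inversion, as in the sketch below. The sketch assumes strict semantics, where an unexpectedly passing @tdd_expected_fail test is also reported as a CI failure. This issue does not say how unexpected passes are handled. ci_outcome is a hypothetical name, not the project's Behave or Robot hook.

```python
from typing import Iterable

EXPECTED_FAIL_TAG = "tdd_expected_fail"  # tag name as used in this issue


def ci_outcome(tags: Iterable[str], test_passed: bool) -> bool:
    """Return True if CI should report the scenario as green.

    Hypothetical sketch: while the tag is present, the scenario is green only
    if the test fails (the bug still exists). Assumption: an unexpected pass
    counts as a failure, which forces the fix PR to remove the tag.
    """
    names = {tag.lstrip("@") for tag in tags}  # accept Behave "@tag" or Robot "tag"
    if EXPECTED_FAIL_TAG in names:
        return not test_passed
    return test_passed
```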

This is a Priority/Critical bug and has been stalled for 4 days. Please begin the fix immediately or flag if you have a capacity blocker.

Comment by a repository member:

Implementation Notes — Bug Fix for #821

Design Decisions

  1. Auto-promotion on access: Modified ContextTierService.get() to call _maybe_auto_promote() after touching a fragment. When access_count reaches the configurable promotion_threshold (default: 5), the fragment is promoted one tier up via the existing promote() method. This is a minimal change that reuses existing promotion logic.

  2. Staleness enforcement: Added an enforce_staleness() method that inspects last_accessed timestamps across all tiers. Hot fragments whose last access is older than hot_ttl (default: 24h) are demoted to warm. Warm fragments whose last access is older than warm_ttl (default: 24h) are demoted to cold. A snapshot of warm-tier IDs is taken before hot demotions are processed, so a fragment demoted from hot to warm is not also demoted to cold in the same pass.

  3. Budget enforcement on store: Modified store() to call _enforce_hot_budget() after inserting a fragment into the hot tier. This evicts least-recently-used (LRU) fragments until total_tokens <= max_tokens_hot. A while loop handles cases where more than one fragment must be evicted.

  4. Event emission: Added TIER_PROMOTED, TIER_DEMOTED, TIER_EVICTED event types to EventType enum. All tier transitions (promote, demote, evict, staleness enforcement, budget enforcement) emit DomainEvent instances through an optional EventBus. The event bus is injected via the DI container.

  5. Configuration: Added three new settings to Settings:

    • context_tier_promotion_threshold (default: 5)
    • context_tier_hot_ttl_hours (default: 24)
    • context_tier_warm_ttl_hours (default: 24)
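Design decisions 1 to 5 can be combined into one runtime sketch. Everything below is hypothetical:

- TierRuntimeSketch stands in for ContextTierService.
- Events are plain tuples passed to an optional callback, not DomainEvent objects on the project's EventBus.
- The constructor defaults mirror the new settings (threshold 5, 24h TTLs).
- Resetting access_count after a promotion is an assumption.
- Evicted fragments are simply dropped.

An OrderedDict kept in access order provides the LRU ordering.

```python
from collections import OrderedDict
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import IntEnum
from typing import Callable, Optional, Tuple


class Tier(IntEnum):
    COLD = 0
    WARM = 1
    HOT = 2


@dataclass
class Fragment:
    fragment_id: str
    tokens: int
    tier: Tier = Tier.COLD
    access_count: int = 0
    last_accessed: datetime = datetime.min


# (event name, fragment_id, from tier, to tier); to tier is None for evictions.
Event = Tuple[str, str, Tier, Optional[Tier]]


class TierRuntimeSketch:
    """Hypothetical stand-in for ContextTierService runtime behaviour."""

    def __init__(
        self,
        max_tokens_hot: int,
        promotion_threshold: int = 5,
        hot_ttl: timedelta = timedelta(hours=24),
        warm_ttl: timedelta = timedelta(hours=24),
        on_event: Optional[Callable[[Event], None]] = None,
    ) -> None:
        self.max_tokens_hot = max_tokens_hot
        self.promotion_threshold = promotion_threshold
        self.hot_ttl, self.warm_ttl = hot_ttl, warm_ttl
        self._on_event = on_event  # optional, like the injected EventBus
        # Kept in access order: the first entry is the least recently used.
        self._fragments: "OrderedDict[str, Fragment]" = OrderedDict()

    def _emit(self, name: str, frag: Fragment, old: Tier, new: Optional[Tier]) -> None:
        if self._on_event is not None:
            self._on_event((name, frag.fragment_id, old, new))

    def _move(self, frag: Fragment, new: Tier) -> None:
        old, frag.tier = frag.tier, new
        self._emit("TIER_PROMOTED" if new > old else "TIER_DEMOTED", frag, old, new)

    def store(self, frag: Fragment, now: datetime) -> None:
        frag.last_accessed = now
        self._fragments[frag.fragment_id] = frag
        self._fragments.move_to_end(frag.fragment_id)
        if frag.tier is Tier.HOT:
            self._enforce_hot_budget()

    def get(self, fragment_id: str, now: datetime) -> Fragment:
        frag = self._fragments[fragment_id]
        frag.access_count += 1  # "touch": record the access
        frag.last_accessed = now
        self._fragments.move_to_end(fragment_id)
        self._maybe_auto_promote(frag)
        return frag

    def _maybe_auto_promote(self, frag: Fragment) -> None:
        if frag.access_count >= self.promotion_threshold and frag.tier < Tier.HOT:
            frag.access_count = 0  # assumption: counting restarts in the new tier
            self._move(frag, Tier(frag.tier + 1))  # one tier up
            if frag.tier is Tier.HOT:
                self._enforce_hot_budget()

    def _hot_tokens(self) -> int:
        return sum(f.tokens for f in self._fragments.values() if f.tier is Tier.HOT)

    def _enforce_hot_budget(self) -> None:
        # Loop, because one insertion may require several evictions.
        while self._hot_tokens() > self.max_tokens_hot:
            lru = next(f for f in self._fragments.values() if f.tier is Tier.HOT)
            del self._fragments[lru.fragment_id]  # assumption: evicted = dropped
            self._emit("TIER_EVICTED", lru, Tier.HOT, None)

    def enforce_staleness(self, now: datetime) -> None:
        # Snapshot warm IDs before hot demotions, so a fragment demoted
        # hot -> warm in this pass is not also demoted warm -> cold.
        warm_ids = [k for k, f in self._fragments.items() if f.tier is Tier.WARM]
        for frag in list(self._fragments.values()):
            if frag.tier is Tier.HOT and now - frag.last_accessed > self.hot_ttl:
                self._move(frag, Tier.WARM)
        for fid in warm_ids:
            frag = self._fragments[fid]
            if now - frag.last_accessed > self.warm_ttl:
                self._move(frag, Tier.COLD)
```

Note one edge case. In this sketch, a hot fragment that alone exceeds max_tokens_hot eventually evicts itself. The notes above do not say how the real service handles that case.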

Key Code Locations

  • ContextTierService: src/cleveragents/application/services/context_tiers.py
  • EventType enum: src/cleveragents/infrastructure/events/types.py
  • Settings: src/cleveragents/config/settings.py
  • Container wiring: src/cleveragents/application/container.py (context_tier_service now receives event_bus)

TDD Tag Changes

  • Removed @tdd_expected_fail from features/tdd_context_tier_runtime.feature (Behave)
  • Removed tdd_expected_fail from robot/tdd_context_tier_runtime.robot (Robot Framework)
  • All 3 TDD scenarios now pass normally, confirming the bug is fixed.

Test Coverage

  • Behave BDD: 12 new scenarios in features/context_tier_runtime.feature covering auto-promotion (3), staleness enforcement (4), budget enforcement (2), event emission (3)
  • Robot Framework: 4 new integration tests in robot/context_tier_runtime.robot
  • ASV Benchmarks: 4 new benchmark suites in benchmarks/bench_context_tier_runtime.py
  • TDD tests: All 3 TDD scenarios pass (promotion, demotion, eviction)

Quality Gate Results

| Session | Result |
|---|---|
| lint | PASS |
| typecheck | PASS (0 errors) |
| unit_tests | PASS (12,307 scenarios, 0 failures) |
| integration_tests | PASS (1,706 tests, 0 failures) |
| coverage_report | PASS (98%, threshold >= 97%) |
Reference: cleveragents/cleveragents-core#821