[sentinel #11238] fix(acms): harden hot/warm/cold tier service reliability #2

Open
HAL9000 wants to merge 2 commits from tests/sentinel-11238-pr-fix-9663-hot-warm-cold-tier-reliability into master

This is an auto-generated sentinel duplicate of upstream PR cleveragents/cleveragents-core#11238 for pipeline testing. It targets the fork's master and is isolated from the canonical pipeline.


Summary

Harden the ContextTierService for production reliability under concurrent plan execution and high-throughput scenarios.

Changes

  1. Remove dead conflicting defaults: Deleted _DEFAULT_MAX_TOKENS_HOT, _DEFAULT_MAX_DECISIONS_WARM, and _DEFAULT_MAX_DECISIONS_COLD, which were never used (all budgets flow through budget_from_settings() in context_tier_settings.py) and contradicted the canonical values.

  2. Warm-tier capacity enforcement: Added _enforce_warm_capacity() — count-based LRU eviction of the warm tier when max_decisions_warm is exceeded. Triggered after:

    • Cold → warm promotion (in promote())
    • Hot-budget-fallback restore to warm (when the promoted fragment couldn't fit in hot)
  3. Immutable snapshot returns: get_all_fragments() and get_hot_fragments() now return model_copy() instances instead of live mutable references, so callers can no longer corrupt internal state by mutating a returned fragment outside the service's RLock.

  4. Naming consistency: _COLD_SUMMARY_MAX_CHARS renamed to _default_summarisation_max_chars, following the snake_case convention used throughout the module.
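
The count-based LRU eviction described in change 2 can be sketched as follows. This is a minimal illustration, not the code in this PR: the `Fragment` and `WarmTier` classes, the `touch()` and `add_to_warm()` helpers, and the choice to demote evicted fragments to cold are all assumptions made for the example. Only the names `_enforce_warm_capacity` and `max_decisions_warm` come from the description above.

```python
from __future__ import annotations

from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical stand-in for the service's fragment model."""
    fragment_id: str
    tier: str = "warm"


class WarmTier:
    """Hypothetical warm tier that keeps fragments in least-recently-used order."""

    def __init__(self, max_decisions_warm: int) -> None:
        self.max_decisions_warm = max_decisions_warm
        self._warm: OrderedDict[str, Fragment] = OrderedDict()
        self._cold: dict[str, Fragment] = {}

    def touch(self, fragment_id: str) -> None:
        # Mark a fragment as most recently used.
        self._warm.move_to_end(fragment_id)

    def add_to_warm(self, fragment: Fragment) -> list[str]:
        # Models a cold -> warm promotion or a hot-budget-fallback restore.
        fragment.tier = "warm"
        self._warm[fragment.fragment_id] = fragment
        self._warm.move_to_end(fragment.fragment_id)
        return self._enforce_warm_capacity()

    def _enforce_warm_capacity(self) -> list[str]:
        # Evict least-recently-used entries until the count limit holds.
        # Assumption: evicted fragments are demoted to cold.
        evicted: list[str] = []
        while len(self._warm) > self.max_decisions_warm:
            fragment_id, fragment = self._warm.popitem(last=False)
            fragment.tier = "cold"
            self._cold[fragment_id] = fragment
            evicted.append(fragment_id)
        return evicted
```

In this sketch the capacity check runs at the end of every path that adds to warm, which mirrors the two trigger points listed in change 2.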

Impact

  • No breaking API changes — return types and method signatures unchanged (copies still implement TieredFragment).
  • The warm tier no longer silently exceeds max_decisions_warm capacity after promotions.
  • Thread-safety improved for all public reader methods.
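
The snapshot-return behaviour can be illustrated with a small, standard-library-only sketch. It is not the service's implementation: `Fragment` and `TierStore` are hypothetical, and `copy.deepcopy` stands in for Pydantic's `model_copy()` so that the example runs without third-party packages.

```python
import copy
import threading
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """Hypothetical stand-in for the service's fragment model."""
    fragment_id: str
    content: str
    tags: list = field(default_factory=list)


class TierStore:
    """Hypothetical store whose readers hand out copies, never live references."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._fragments = {}

    def add(self, fragment: Fragment) -> None:
        with self._lock:
            self._fragments[fragment.fragment_id] = fragment

    def get_all_fragments(self) -> list:
        # Copies are made while the lock is held, so each snapshot is
        # consistent; callers can then mutate them freely after release.
        with self._lock:
            return [copy.deepcopy(f) for f in self._fragments.values()]


store = TierStore()
store.add(Fragment("f1", "original", ["x"]))

snapshot = store.get_all_fragments()
snapshot[0].content = "mutated"      # affects only the caller's copy
snapshot[0].tags.append("y")         # nested list is copied too

assert store.get_all_fragments()[0].content == "original"
assert store.get_all_fragments()[0].tags == ["x"]
```

One point worth checking against the actual change: in Pydantic v2, `model_copy()` makes a shallow copy unless `deep=True` is passed, so nested mutable fields would still be shared with the internal fragment under the default.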

Original upstream PR: cleveragents/cleveragents-core#11238

HAL9000 self-assigned this 2026-05-21 10:32:48 +00:00
- Remove unused _DEFAULT_* constants that conflicted with the
  canonical defaults in context_tier_settings.py (which serves as the
  sole source of truth for budget/setting defaults).

- Add _enforce_warm_capacity() to enforce max_decisions_warm limit on
  warm tier after cold→warm promotion and hot-budget-fallback restore,
  preventing silent over-capacity data accumulation.

- Return deep copies (model_copy) from get_all_fragments() and
  get_hot_fragments() to prevent callers from mutating internal fragment
  state while the service holds its RLock under concurrent plan execution.

- Rename _COLD_SUMMARY_MAX_CHARS → _default_summarisation_max_chars for
  consistent snake_case naming throughout the module.

Fixes: PR #9663
Reference
HAL9000/cleveragents-core!2