BUG-HUNT: [concurrency] ContextTierService documented as single-threaded but registered as DI Singleton #7547

Closed
opened 2026-04-10 21:40:24 +00:00 by HAL9000 (Owner) · 2 comments

Bug Report: [concurrency] — ContextTierService Single-Threaded but Used as Singleton

Severity Assessment

  • Impact: ContextTierService is documented as single-threaded but registered as providers.Singleton. Concurrent plan executions share the same instance, so iterating the hot/warm/cold tier dicts while another thread mutates them can raise RuntimeError: dictionary changed size during iteration.
  • Likelihood: High — triggered during parallel subplan execution.
  • Priority: High
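
The error named under Impact is the one CPython raises when a dict changes size while it is being iterated. The sketch below reproduces the message deterministically in a single thread; the dict and its keys are hypothetical. In the reported bug the mutation would come from a second thread sharing the singleton rather than from the loop body itself.

```python
def iterate_while_mutating() -> str:
    """Iterate a live dict while growing it; return the error message."""
    hot: dict[str, str] = {"f1": "a", "f2": "b"}  # hypothetical hot-tier contents
    try:
        for key in hot:               # iterating the live dict...
            hot[key + "-copy"] = "x"  # ...while its size changes
    except RuntimeError as exc:
        return str(exc)
    return "no error"


def iterate_snapshot() -> int:
    """Iterate over a snapshot of the keys instead; no error is raised."""
    hot: dict[str, str] = {"f1": "a", "f2": "b"}
    for key in list(hot):  # list(hot) copies the keys before the loop starts
        hot[key + "-copy"] = "x"
    return len(hot)
```

Iterating over a snapshot avoids the error in the single-threaded case, but it does not by itself make cross-thread access safe; that still needs one of the fixes listed under Expected Behavior.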

Location

  • File: src/cleveragents/application/services/context_tiers.py
  • Also: src/cleveragents/application/container.py
  • Lines: context_tiers.py:92, container.py:713-716

Description

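The DI container registers ContextTierService as a singleton (container.py:713-716), so every consumer receives the same instance.

But the class docstring explicitly warns that the service is single-threaded.

The hot, warm, and cold tier stores themselves are plain dicts with no locking.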
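
Besides the iteration error, the missing lock leaves compound operations unprotected. The hypothetical UnlockedTiers class below mimics a two-tier store with no lock: its promote() removes a fragment from warm and then inserts it into hot, and the injected between_steps callback stands in for another thread being scheduled in that gap. Neither the class nor the callback exists in the codebase; they only make the race window deterministic.

```python
from typing import Callable, Optional


class UnlockedTiers:
    """Hypothetical two-tier store with no locking, shaped like the reported one."""

    def __init__(self) -> None:
        self.hot: dict[str, str] = {}
        self.warm: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.hot:
            return self.hot[key]
        return self.warm.get(key)

    def promote(self, key: str, between_steps: Callable[[], None] = lambda: None) -> None:
        value = self.warm.pop(key)  # step 1: the fragment leaves warm
        between_steps()             # a concurrent reader could run here
        self.hot[key] = value       # step 2: only now does it appear in hot


def observe_mid_promotion() -> list[Optional[str]]:
    """Record what a 'concurrent' get() sees during and after promote()."""
    tiers = UnlockedTiers()
    tiers.warm["f1"] = "ctx"
    seen: list[Optional[str]] = []
    tiers.promote("f1", between_steps=lambda: seen.append(tiers.get("f1")))
    seen.append(tiers.get("f1"))
    return seen
```

Here observe_mid_promotion() returns [None, "ctx"]: mid-promotion the fragment is in neither tier. If promote() and get() both held the same lock for their whole bodies, a reader could only run before step 1 or after step 2.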

Expected Behavior

Either guard every operation with a threading.RLock(), or change the registration from providers.Singleton to providers.Factory so that each consumer gets its own instance.
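
The two options differ in what callers share. The sketch below uses toy provider classes to show the behaviour each registration implies; ToySingleton, ToyFactory, and TierState are hypothetical, written only for illustration, and are not the DI library's implementation.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ToySingleton(Generic[T]):
    """Toy singleton provider: builds once, then returns the same instance."""

    def __init__(self, build: Callable[[], T]) -> None:
        self._build = build
        self._instance: Optional[T] = None
        self._init_lock = threading.Lock()  # guards lazy creation only

    def __call__(self) -> T:
        with self._init_lock:
            if self._instance is None:
                self._instance = self._build()
            return self._instance


class ToyFactory(Generic[T]):
    """Toy factory provider: builds a new instance on every call."""

    def __init__(self, build: Callable[[], T]) -> None:
        self._build = build

    def __call__(self) -> T:
        return self._build()


class TierState:
    """Hypothetical stand-in for the service's mutable tier state."""

    def __init__(self) -> None:
        self.hot: dict[str, str] = {}
```

With the singleton, a fragment stored by one caller is visible to every other caller, which is exactly why the shared dicts need a lock. With the factory, each caller gets private state, removing the race but also any cross-plan sharing; if subplans are meant to see each other's context fragments, that makes the lock-based fix the safer choice.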

Actual Behavior

Shared, unsynchronized ContextTierService causes potential data corruption under concurrent plan execution.

Suggested Fix

Add self._lock = threading.RLock() and acquire it in store(), get(), promote(), demote(), evict(), and enforce_staleness().
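
A minimal sketch of the suggested fix, assuming a simplified store in which each fragment is a string keyed by id and lives in exactly one of three tiers. LockedTierStore, tier_of, _move, hot_fragments, counts, and fill_concurrently are hypothetical names, not the real ContextTierService API. Every public method holds one threading.RLock for its whole body; the lock must be re-entrant because get() and _move() call tier_of(), which acquires the same lock again, and a plain threading.Lock would deadlock there.

```python
import threading
from typing import Literal, Optional

Tier = Literal["cold", "warm", "hot"]
_ORDER: tuple[Tier, ...] = ("cold", "warm", "hot")  # coldest to hottest


class LockedTierStore:
    """Hypothetical, simplified stand-in for ContextTierService's tier dicts."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._tiers: dict[Tier, dict[str, str]] = {t: {} for t in _ORDER}

    def store(self, key: str, value: str, tier: Tier = "warm") -> None:
        with self._lock:
            for fragments in self._tiers.values():
                fragments.pop(key, None)  # a key lives in exactly one tier
            self._tiers[tier][key] = value

    def tier_of(self, key: str) -> Optional[Tier]:
        with self._lock:
            for name, fragments in self._tiers.items():
                if key in fragments:
                    return name
            return None

    def get(self, key: str) -> Optional[str]:
        with self._lock:
            tier = self.tier_of(key)  # re-acquires the RLock this thread holds
            return None if tier is None else self._tiers[tier][key]

    def _move(self, key: str, step: int) -> bool:
        with self._lock:  # the whole check-then-move is one critical section
            tier = self.tier_of(key)
            if tier is None:
                return False
            index = _ORDER.index(tier) + step
            if not 0 <= index < len(_ORDER):
                return False  # already hottest (promote) or coldest (demote)
            self._tiers[_ORDER[index]][key] = self._tiers[tier].pop(key)
            return True

    def promote(self, key: str) -> bool:
        return self._move(key, +1)

    def demote(self, key: str) -> bool:
        return self._move(key, -1)

    def hot_fragments(self) -> dict[str, str]:
        with self._lock:
            # Return a copy: handing out the live dict would let callers
            # iterate it after the lock is released.
            return dict(self._tiers["hot"])

    def counts(self) -> dict[Tier, int]:
        with self._lock:
            return {name: len(fragments) for name, fragments in self._tiers.items()}


def fill_concurrently(store: LockedTierStore, threads: int, per_thread: int) -> None:
    """Usage example: several threads call store() on one shared instance."""

    def writer(offset: int) -> None:
        for i in range(offset, offset + per_thread):
            store.store(f"k{i}", "v")

    workers = [threading.Thread(target=writer, args=(n * per_thread,)) for n in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

fill_concurrently mirrors the singleton situation in the report: many threads share one instance, and because every mutation happens under the lock, the final counts are consistent.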

Category

concurrency

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD with @tdd_expected_fail tags.


Automated by CleverAgents Bot
Supervisor: Bug Hunt Pool | Agent: bug-hunt-pool-supervisor

HAL9000 added this to the v3.5.0 milestone 2026-04-10 23:05:49 +00:00
HAL9000 (Author, Owner) commented:

Issue triaged by project owner:

  • State: Verified
  • Priority: High — Concurrency bug that can cause data corruption or incorrect behavior under concurrent access
  • Milestone: v3.5.0 (M6: Autonomy Hardening) — ContextTierService is part of ACMS context management
  • Story Points: 3 (M) — Thread safety fix with clear scope
  • MoSCoW: Must Have — Thread safety is required for correct concurrent operation

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

HAL9000 (Author, Owner) commented:

Implementation Attempt — Tier 1: haiku — Success [AUTO-IMP-ISSUE-7547]

What Was Done

Implemented the thread-safety fix for ContextTierService as described in the issue.

Changes Made

  1. src/cleveragents/application/services/context_tiers.py

    • Added import threading
    • Added self._lock = threading.RLock() in __init__
    • Wrapped the body of every public method in a with self._lock: block (store, get, promote, demote, evict_lru, get_metrics, get_all_fragments, get_hot_fragments, get_for_actor, get_scoped_view)
    • Updated docstring to document thread-safety guarantee
    • Extracted the settings helpers into a new module to stay under the 500-line limit
  2. src/cleveragents/application/services/tier_runtime.py

    • Added a type-only _lock: threading.RLock annotation to TierRuntimeMixin
    • Wrapped the body of enforce_staleness() in a with self._lock: block
  3. src/cleveragents/application/services/scoped_tiers.py

    • Added a type-only _lock: threading.RLock annotation to ScopedTierMixin
    • Wrapped the bodies of get_scoped_by_resource() and get_scoped_metrics() in with self._lock: blocks
  4. src/cleveragents/application/services/context_tier_settings.py (new)

    • Extracted settings helpers from context_tiers.py to keep it under 500 lines
  5. features/context_tier_thread_safety.feature (new)

    • 10 BDD scenarios covering concurrent store, get, promote, demote, evict_lru, enforce_staleness, get_metrics, and singleton access
  6. features/steps/context_tier_thread_safety_steps.py (new)

    • Step definitions for all thread-safety scenarios
  7. CHANGELOG.md — Added fix entry under [Unreleased] > Fixed
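
Items 2 and 3 rely on a common mixin pattern: the mixin only declares the attribute's type so the type checker accepts with self._lock:, and the host class creates the actual lock. A minimal sketch, assuming a hypothetical StalenessMixin/TierHost pair and a simplified enforce_staleness(max_age_s, now) signature that need not match the real TierRuntimeMixin:

```python
import threading
import time
from typing import Optional


class StalenessMixin:
    """Hypothetical mixin in the style of TierRuntimeMixin."""

    # Type-only declarations: no lock or dict is created here. The host
    # class must set both attributes in its __init__.
    _lock: threading.RLock
    _last_access: dict[str, float]

    def enforce_staleness(self, max_age_s: float, now: Optional[float] = None) -> list[str]:
        """Drop entries older than max_age_s; return the evicted keys."""
        with self._lock:
            now = time.monotonic() if now is None else now
            # Build the list first, because the loop below deletes from the dict.
            stale = [k for k, t in self._last_access.items() if now - t > max_age_s]
            for key in stale:
                del self._last_access[key]
            return stale


class TierHost(StalenessMixin):
    """Hypothetical host class that owns the concrete lock."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._last_access = {}

    def touch(self, key: str, at: float) -> None:
        with self._lock:
            self._last_access[key] = at
```

Because the mixin and the host use the same self._lock, a host method that already holds the lock can call enforce_staleness() without deadlocking, which is another reason an RLock fits here.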

Quality Gates

  • nox -e lint — passes
  • nox -e typecheck — passes (0 errors)
  • nox -e unit_tests — running in CI (MCP timeout prevented local execution)

PR

PR #8279: https://git.cleverthis.com/cleveragents/cleveragents-core/pulls/8279


Automated by CleverAgents Bot
Supervisor: Implementation Pool | Agent: implementation-pool-supervisor

Reference
cleveragents/cleveragents-core#7547