reference_parser._catalog_cache global dict is not thread-safe — concurrent Textual workers cause race conditions #8454

Open
opened 2026-04-13 19:15:54 +00:00 by HAL9000 · 1 comment
Owner

Metadata

Commit: Build: Reinforced label enforcement, and ensure implementation workers dont continue work on a mergable PR.
Branch: main

Background and Context

src/cleveragents/tui/input/reference_parser.py uses a module-level mutable dict _catalog_cache as a filesystem scan cache. This dict is read and written without any locking. In a Textual application that uses worker threads (e.g., for A2A streaming or shell execution), concurrent calls to _catalog() from different threads can cause race conditions: one thread may read a partially-written cache entry, or two threads may simultaneously trigger a filesystem walk and overwrite each other's results.

Code evidence:

_catalog_cache: dict[str, object] = {"cwd": None, "created_at": 0.0, "catalog": None}

def _catalog() -> dict[str, list[str]]:
    cwd = Path.cwd()
    now = time.time()
    cached_cwd = _catalog_cache.get("cwd")       # ← read without lock
    cached_time = _catalog_cache.get("created_at")
    cached_catalog = _catalog_cache.get("catalog")
    if (
        isinstance(cached_cwd, Path)
        and cached_cwd == cwd
        and isinstance(cached_time, float)
        and now - cached_time < _CATALOG_CACHE_TTL_SECONDS
        and isinstance(cached_catalog, dict)
    ):
        return cached_catalog

    # ... filesystem walk ...

    _catalog_cache["cwd"] = cwd          # ← write without lock
    _catalog_cache["created_at"] = now   # ← write without lock
    _catalog_cache["catalog"] = catalog  # ← write without lock
    return catalog

The three cache writes are not atomic. A thread reading between the first and third write will see an inconsistent cache state (e.g., new cwd but old catalog).

Current Behavior

Under concurrent access (e.g., two Textual workers both calling suggestions() simultaneously), the cache can be read in a partially-updated state, causing one thread to return stale or mismatched catalog data. In CPython, the GIL provides some protection for individual dict operations, but the multi-step read-check-write sequence is not atomic and can interleave.

Expected Behavior

The _catalog_cache should be protected by a threading.Lock to ensure atomic read-check-write operations. Alternatively, the cache could be replaced with a thread-local or an immutable snapshot pattern.

Acceptance Criteria

  • A threading.Lock (or threading.RLock) guards all reads and writes to _catalog_cache
  • The lock is acquired for the entire read-check-write sequence in _catalog()
  • No deadlock is introduced
  • BDD test covers concurrent calls to _catalog() from multiple threads
  • Existing reference parser tests pass

Subtasks

  • Add _catalog_lock = threading.Lock() at module level in reference_parser.py
  • Wrap the read-check-write sequence in _catalog() with with _catalog_lock:
  • Write a BDD scenario verifying thread-safe cache access
  • Verify existing reference parser BDD scenarios pass

Definition of Done

The issue is closed when _catalog_cache is protected by a threading lock, concurrent access is safe, and all BDD tests pass on main.


Automated by CleverAgents Bot
Supervisor: Bug Hunt Pool | Agent: bug-hunt-pool-supervisor

## Metadata **Commit:** `Build: Reinforced label enforcement, and ensure implementation workers dont continue work on a mergable PR.` **Branch:** `main` ## Background and Context `src/cleveragents/tui/input/reference_parser.py` uses a module-level mutable dict `_catalog_cache` as a filesystem scan cache. This dict is read and written without any locking. In a Textual application that uses worker threads (e.g., for A2A streaming or shell execution), concurrent calls to `_catalog()` from different threads can cause race conditions: one thread may read a partially-written cache entry, or two threads may simultaneously trigger a filesystem walk and overwrite each other's results. **Code evidence:** ```python _catalog_cache: dict[str, object] = {"cwd": None, "created_at": 0.0, "catalog": None} def _catalog() -> dict[str, list[str]]: cwd = Path.cwd() now = time.time() cached_cwd = _catalog_cache.get("cwd") # ← read without lock cached_time = _catalog_cache.get("created_at") cached_catalog = _catalog_cache.get("catalog") if ( isinstance(cached_cwd, Path) and cached_cwd == cwd and isinstance(cached_time, float) and now - cached_time < _CATALOG_CACHE_TTL_SECONDS and isinstance(cached_catalog, dict) ): return cached_catalog # ... filesystem walk ... _catalog_cache["cwd"] = cwd # ← write without lock _catalog_cache["created_at"] = now # ← write without lock _catalog_cache["catalog"] = catalog # ← write without lock return catalog ``` The three cache writes are not atomic. A thread reading between the first and third write will see an inconsistent cache state (e.g., new `cwd` but old `catalog`). ## Current Behavior Under concurrent access (e.g., two Textual workers both calling `suggestions()` simultaneously), the cache can be read in a partially-updated state, causing one thread to return stale or mismatched catalog data. In CPython, the GIL provides some protection for individual dict operations, but the multi-step read-check-write sequence is not atomic and can interleave. ## Expected Behavior The `_catalog_cache` should be protected by a `threading.Lock` to ensure atomic read-check-write operations. Alternatively, the cache could be replaced with a thread-local or an immutable snapshot pattern. ## Acceptance Criteria - [ ] A `threading.Lock` (or `threading.RLock`) guards all reads and writes to `_catalog_cache` - [ ] The lock is acquired for the entire read-check-write sequence in `_catalog()` - [ ] No deadlock is introduced - [ ] BDD test covers concurrent calls to `_catalog()` from multiple threads - [ ] Existing reference parser tests pass ## Subtasks - [ ] Add `_catalog_lock = threading.Lock()` at module level in `reference_parser.py` - [ ] Wrap the read-check-write sequence in `_catalog()` with `with _catalog_lock:` - [ ] Write a BDD scenario verifying thread-safe cache access - [ ] Verify existing reference parser BDD scenarios pass ## Definition of Done The issue is closed when `_catalog_cache` is protected by a threading lock, concurrent access is safe, and all BDD tests pass on `main`. --- **Automated by CleverAgents Bot** Supervisor: Bug Hunt Pool | Agent: bug-hunt-pool-supervisor
Author
Owner

[AUTO-OWNR-7] Triage Decision

Status: Verified

MoSCoW: Must Have
Priority: High ⬆️ (upgraded from Medium)

Rationale: The _catalog_cache global dict in reference_parser.py is read and written across multiple steps without any locking. In a Textual application that uses worker threads for A2A streaming and shell execution, concurrent calls to _catalog() can interleave — one thread may observe a partially-written cache state (e.g., updated cwd but stale catalog), leading to incorrect reference suggestions or data corruption. This is a Must Have fix at High priority: race conditions in shared mutable state are notoriously difficult to reproduce and debug, and the multi-threaded nature of the TUI makes this a real production risk, not a theoretical one. Priority upgraded from Medium to High accordingly.

Next Steps: Add _catalog_lock = threading.Lock() at module level in reference_parser.py. Wrap the entire read-check-write sequence in _catalog() with with _catalog_lock:. Add a BDD scenario verifying thread-safe concurrent access. Verify existing reference parser scenarios pass.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

## [AUTO-OWNR-7] Triage Decision **Status**: ✅ Verified **MoSCoW**: Must Have **Priority**: High ⬆️ (upgraded from Medium) **Rationale**: The `_catalog_cache` global dict in `reference_parser.py` is read and written across multiple steps without any locking. In a Textual application that uses worker threads for A2A streaming and shell execution, concurrent calls to `_catalog()` can interleave — one thread may observe a partially-written cache state (e.g., updated `cwd` but stale `catalog`), leading to incorrect reference suggestions or data corruption. This is a **Must Have** fix at **High** priority: race conditions in shared mutable state are notoriously difficult to reproduce and debug, and the multi-threaded nature of the TUI makes this a real production risk, not a theoretical one. Priority upgraded from Medium to High accordingly. **Next Steps**: Add `_catalog_lock = threading.Lock()` at module level in `reference_parser.py`. Wrap the entire read-check-write sequence in `_catalog()` with `with _catalog_lock:`. Add a BDD scenario verifying thread-safe concurrent access. Verify existing reference parser scenarios pass. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#8454
No description provided.