BUG-HUNT: [concurrency] Potential race condition in ActorRegistry #3213

Open
opened 2026-04-05 07:52:15 +00:00 by freemo · 1 comment
Owner

Background

The ActorRegistry class in src/cleveragents/actor/registry.py is stateful, and its methods are not thread-safe. In a multi-threaded environment, concurrent calls on the same ActorRegistry instance could lead to inconsistent state, data corruption, or unexpected errors. Several methods (add, upsert_actor, get, remove, etc.) call ensure_built_in_actors, which modifies the state of _actor_service. If multiple threads reach it at the same time, the built-in actors may be created more than once or the default actor may be set incorrectly.

Current Behavior

The ActorRegistry class holds shared state (_actor_service, _provider_registry, _settings) and its methods are not synchronized. The ensure_built_in_actors method contains a classic check-then-act race: it checks whether actors exist and then creates them without holding a lock, so another thread can change the state between the check and the act.

class ActorRegistry:
    def __init__(
        self,
        *,
        actor_service: ActorService,
        provider_registry: ProviderRegistry,
        settings: Settings,
    ) -> None:
        self._actor_service = actor_service
        self._provider_registry = provider_registry
        self._settings = settings

    def ensure_built_in_actors(self) -> list[Actor]:
        """Generate built-in actors from configured providers if missing."""

        configured: list[ProviderInfo] = (
            self._provider_registry.get_configured_providers()
        )
        if not configured:
            return []
        # ...
        # This part is not atomic and can cause race conditions
        # ...
        if not self._actor_service.get_default_actor() and actors:
            # ...
            self._actor_service.set_default_actor(preferred.name)

    def add(self, yaml_text: str, *, update: bool = False) -> Actor:
        self.ensure_built_in_actors()
        # ...

    def get(self, name: str) -> Actor:
        self.ensure_built_in_actors()
        # ...
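
The check-then-act pattern described above can be reproduced deterministically in isolation. The following is a minimal sketch, not the project's code: FakeActorService, has_actor, create_actor, and ensure_built_in are hypothetical names, and a threading.Barrier is used only to force the unlucky interleaving that would otherwise depend on timing.

```python
import threading


class FakeActorService:
    """Hypothetical stand-in for ActorService; not the real API."""

    def __init__(self) -> None:
        self.created: list[str] = []

    def has_actor(self, name: str) -> bool:
        return name in self.created

    def create_actor(self, name: str) -> None:
        self.created.append(name)


def ensure_built_in(service: FakeActorService, barrier: threading.Barrier) -> None:
    # Check: does the built-in actor exist yet?
    missing = not service.has_actor("builtin")
    # Both threads wait here, so both finish the check before either acts.
    barrier.wait(timeout=5)
    # Act: create the actor based on a check that may now be stale.
    if missing:
        service.create_actor("builtin")


def run_race() -> list[str]:
    service = FakeActorService()
    barrier = threading.Barrier(2)
    threads = [
        threading.Thread(target=ensure_built_in, args=(service, barrier))
        for _ in range(2)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return service.created


print(run_race())  # ['builtin', 'builtin']
```

Both threads observe the actor as missing, so it is created twice. In real code the window between check and act is much narrower, which is why the problem tends to appear only intermittently.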

Expected Behavior

The ActorRegistry class should be thread-safe. Concurrent calls to its methods from multiple threads must not result in inconsistent state, duplicate actor creation, or incorrect default actor assignment.

Acceptance Criteria

  • A threading.Lock (or equivalent) is added to ActorRegistry and acquired before accessing or modifying any shared state.
  • The ensure_built_in_actors method is synchronized to prevent the check-then-act race condition.
  • All existing Behave unit tests continue to pass.
  • New Behave unit tests cover concurrent access scenarios.
  • nox -e typecheck passes (no # type: ignore suppressions).
  • Coverage remains ≥ 97%.
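
One way the first two criteria might be met is sketched below. This is an illustration under stated assumptions, not the intended implementation: FakeActorService and LockedRegistry are hypothetical, the provider list is reduced to plain strings, and only the method names get_default_actor and set_default_actor are taken from the issue. The sketch uses threading.RLock as the "equivalent" of threading.Lock, because methods such as get call ensure_built_in_actors and would deadlock on a non-re-entrant lock if they also held it themselves.

```python
import threading
from typing import Optional


class FakeActorService:
    """Hypothetical in-memory stand-in; not the real ActorService."""

    def __init__(self) -> None:
        self.actors: list[str] = []
        self.default: Optional[str] = None

    def get_default_actor(self) -> Optional[str]:
        return self.default

    def set_default_actor(self, name: str) -> None:
        self.default = name


class LockedRegistry:
    """Sketch of a registry that serializes its check-then-act sections."""

    def __init__(self, actor_service: FakeActorService, providers: list[str]) -> None:
        self._actor_service = actor_service
        self._providers = providers
        # Re-entrant, so public methods can call each other while holding it.
        self._lock = threading.RLock()

    def ensure_built_in_actors(self) -> list[str]:
        with self._lock:
            created: list[str] = []
            for name in self._providers:
                if name not in self._actor_service.actors:  # check
                    self._actor_service.actors.append(name)  # act
                    created.append(name)
            if not self._actor_service.get_default_actor() and created:
                self._actor_service.set_default_actor(created[0])
            return created

    def get(self, name: str) -> str:
        with self._lock:
            self.ensure_built_in_actors()  # re-acquires the RLock safely
            if name not in self._actor_service.actors:
                raise KeyError(name)
            return name
```

A lock owned by the registry only protects callers that go through the same registry instance. If other code holds a reference to the same actor service and mutates it directly, that access would need its own synchronization.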

Supporting Information

  • File: src/cleveragents/actor/registry.py
  • Class: ActorRegistry
  • Lines: 20–383
  • Impact: Medium likelihood; High impact if triggered in a multi-threaded deployment.
  • Category: concurrency

Metadata

  • Branch: fix/actor-registry-concurrency-race-condition
  • Commit Message: fix(actor): add thread-safety to ActorRegistry to prevent race conditions
  • Milestone: (none — backlog)
  • Parent Epic: #362

Subtasks

  • Add threading.Lock to ActorRegistry.__init__
  • Wrap ensure_built_in_actors body in lock acquisition
  • Audit all other methods that call ensure_built_in_actors or mutate state (add, upsert_actor, get, remove, etc.) for additional lock requirements
  • Write Behave scenarios covering concurrent access to ActorRegistry
  • Verify nox -e typecheck passes with no suppressions
  • Verify nox -e coverage_report reports ≥ 97%
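
For the concurrent-access scenarios, a small plain-Python helper could drive the threads, leaving the Behave steps to call it and assert on the result. The sketch below is an assumption about how such a helper might look: run_concurrently and OnceInitializer are hypothetical names, and the Behave step wiring is deliberately omitted.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_concurrently(fn: Callable[[], object], workers: int = 8) -> list[object]:
    """Call fn from `workers` threads at roughly the same moment."""
    barrier = threading.Barrier(workers)

    def task() -> object:
        # Line all threads up so they enter fn close together.
        barrier.wait(timeout=5)
        return fn()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(task) for _ in range(workers)]
        # result() re-raises any exception raised inside a worker.
        return [future.result() for future in futures]


class OnceInitializer:
    """Toy target: performs its initialization at most once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._ready = False
        self.init_count = 0

    def ensure(self) -> int:
        with self._lock:
            if not self._ready:
                self.init_count += 1
                self._ready = True
            return self.init_count


target = OnceInitializer()
results = run_concurrently(target.ensure)
print(target.init_count, results)
```

A scenario step could pass the registry's ensure_built_in_actors method to run_concurrently and then check that each built-in actor exists exactly once and that the default actor is the expected one.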

Definition of Done

  • ActorRegistry acquires a lock before any check-then-act operation on shared state
  • No # type: ignore or type-checking suppressions introduced
  • All Behave unit tests pass (nox -e unit_tests)
  • New concurrent-access scenarios added under features/
  • All nox stages pass
  • Coverage ≥ 97%

Backlog note: This issue was discovered during autonomous operation
on milestone v3.3.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: ca-new-issue-creator

freemo added this to the v3.8.0 milestone 2026-04-05 08:06:21 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Backlog — potential race condition in ActorRegistry; low likelihood in current single-threaded usage
  • Milestone: v3.8.0
  • MoSCoW: Could Have — concurrency hardening is desirable but the current usage pattern doesn't trigger this race condition in practice.

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Blocks
#362 Epic: Security & Safety Hardening
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#3213