UAT: SkillRegistry is not thread-safe — missing RLock on register/unregister/get operations #4811

Open
opened 2026-04-08 19:33:25 +00:00 by HAL9000 · 1 comment
Owner

Bug Report

Feature Area: Skill Registry — thread safety
Severity: Medium — race condition under concurrent skill registration
Found by: UAT tester instance uat-worker-tool-skill-system
Spec reference: docs/specification.md §Skill Registry


What Was Tested

Code-level analysis of src/cleveragents/skills/registry.pySkillRegistry class.

Expected Behavior (from spec)

The Skill Registry should be safe to use from multiple threads, consistent with the Tool Registry which explicitly documents thread safety via threading.RLock.

Actual Behavior

SkillRegistry has no locking mechanism on any of its mutating operations:

# src/cleveragents/skills/registry.py
class SkillRegistry:
    def __init__(self, tool_registry=None):
        self._skills: dict[str, SkillDefinition] = {}
        self._tool_registry = tool_registry
        # ← No threading.RLock() here

    def register(self, skill: SkillDefinition) -> None:
        name = skill.skill.name
        if name in self._skills:   # ← Not protected by lock
            raise SkillExecutionError(...)
        self._skills[name] = skill  # ← Not protected by lock

    def unregister(self, name: str) -> None:
        if name not in self._skills:  # ← Not protected by lock
            raise SkillExecutionError(...)
        del self._skills[name]        # ← Not protected by lock

    def get(self, name: str) -> SkillDefinition:
        if name not in self._skills:  # ← Not protected by lock
            raise SkillExecutionError(...)
        return self._skills[name]     # ← Not protected by lock

Compare with ToolRegistry which explicitly uses threading.RLock():

# src/cleveragents/tool/registry.py
class ToolRegistry:
    def __init__(self):
        self._tools: dict[str, ToolSpec] = {}
        self._lock = threading.RLock()  # ← Has lock

    def register(self, tool_spec: ToolSpec) -> None:
        with self._lock:  # ← Protected
            ...

Impact

In a multi-threaded server environment (e.g., when multiple requests trigger skill registration or when MCP notifications/tools/list_changed events trigger refresh_all()), concurrent access to SkillRegistry can cause:

  • dict corruption (Python dicts are not thread-safe for concurrent writes)
  • Race conditions in register() where two threads both pass the if name in self._skills check
  • Inconsistent state during refresh_all() iteration

Code Location

src/cleveragents/skills/registry.pySkillRegistry class, all mutating methods.

Fix

Add threading.RLock() to SkillRegistry.__init__() and wrap all mutating operations with with self._lock:, mirroring the pattern in ToolRegistry.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: uat-tester

## Bug Report **Feature Area:** Skill Registry — thread safety **Severity:** Medium — race condition under concurrent skill registration **Found by:** UAT tester instance `uat-worker-tool-skill-system` **Spec reference:** `docs/specification.md` §Skill Registry --- ### What Was Tested Code-level analysis of `src/cleveragents/skills/registry.py` — `SkillRegistry` class. ### Expected Behavior (from spec) The Skill Registry should be safe to use from multiple threads, consistent with the Tool Registry which explicitly documents thread safety via `threading.RLock`. ### Actual Behavior `SkillRegistry` has **no locking mechanism** on any of its mutating operations: ```python # src/cleveragents/skills/registry.py class SkillRegistry: def __init__(self, tool_registry=None): self._skills: dict[str, SkillDefinition] = {} self._tool_registry = tool_registry # ← No threading.RLock() here def register(self, skill: SkillDefinition) -> None: name = skill.skill.name if name in self._skills: # ← Not protected by lock raise SkillExecutionError(...) self._skills[name] = skill # ← Not protected by lock def unregister(self, name: str) -> None: if name not in self._skills: # ← Not protected by lock raise SkillExecutionError(...) del self._skills[name] # ← Not protected by lock def get(self, name: str) -> SkillDefinition: if name not in self._skills: # ← Not protected by lock raise SkillExecutionError(...) return self._skills[name] # ← Not protected by lock ``` Compare with `ToolRegistry` which explicitly uses `threading.RLock()`: ```python # src/cleveragents/tool/registry.py class ToolRegistry: def __init__(self): self._tools: dict[str, ToolSpec] = {} self._lock = threading.RLock() # ← Has lock def register(self, tool_spec: ToolSpec) -> None: with self._lock: # ← Protected ... ``` ### Impact In a multi-threaded server environment (e.g., when multiple requests trigger skill registration or when MCP `notifications/tools/list_changed` events trigger `refresh_all()`), concurrent access to `SkillRegistry` can cause: - `dict` corruption (Python dicts are not thread-safe for concurrent writes) - Race conditions in `register()` where two threads both pass the `if name in self._skills` check - Inconsistent state during `refresh_all()` iteration ### Code Location `src/cleveragents/skills/registry.py` — `SkillRegistry` class, all mutating methods. ### Fix Add `threading.RLock()` to `SkillRegistry.__init__()` and wrap all mutating operations with `with self._lock:`, mirroring the pattern in `ToolRegistry`. --- **Automated by CleverAgents Bot** Supervisor: UAT Testing | Agent: uat-tester
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Medium — spec compliance bug identified by UAT testing
  • Story Points: 3 (M) — targeted fix to align implementation with spec
  • MoSCoW: Must Have — spec compliance is required for correct system behavior

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner

Issue triaged by project owner: - **State**: Verified - **Priority**: Medium — spec compliance bug identified by UAT testing - **Story Points**: 3 (M) — targeted fix to align implementation with spec - **MoSCoW**: Must Have — spec compliance is required for correct system behavior --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner
HAL9000 added this to the v3.5.0 milestone 2026-04-09 03:03:36 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#4811
No description provided.