fix(actor): move namespace filter inside lock in ActorLoader.list_actors (#8660) #11038
No reviewers
Labels
No labels
No project
No assignees
3 participants
Due date
No due date set.
Dependencies
No dependencies set.
Reference
cleveragents/cleveragents-core!11038
No description provided.
Summary
Fixes a TOCTOU race condition in which the namespace filter was applied after the threading.RLock had been released, leaving a window in which concurrent mutations to _actors (via discover() or clear()) could corrupt the iteration state.
Closes #8588
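The pattern behind this fix can be sketched with a minimal, hypothetical stand-in rather than the real ActorLoader. ActorEntry, register(), and the namespace= keyword of list_actors() are assumptions made for illustration; so is the pre-fix shape, taken here to be "read a live view of _actors under the lock, then filter it after the lock is released". In the fixed shape, both the iteration and the namespace filter run inside with self._lock:, and the caller receives an independent list.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class ActorEntry:
    """Hypothetical stand-in for the per-actor record ActorLoader stores."""

    name: str
    namespace: str


class ActorLoaderSketch:
    """Hypothetical minimal loader; only the locking pattern mirrors the fix."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._actors: dict[str, ActorEntry] = {}

    def register(self, entry: ActorEntry) -> None:
        # Stands in for discover(): mutates _actors while holding the lock.
        with self._lock:
            self._actors[entry.name] = entry

    def clear(self) -> None:
        with self._lock:
            self._actors.clear()

    def list_actors(self, namespace: str | None = None) -> list[ActorEntry]:
        # Assumed racy shape: take self._actors.values() under the lock and
        # filter it after leaving the `with` block. The live view can then
        # change size mid-iteration when another thread calls register()/clear().
        with self._lock:
            # Fixed shape: iterate and filter while the RLock is held, and
            # return a new list so callers never see later mutations.
            return [
                entry
                for entry in self._actors.values()
                if namespace is None or entry.namespace == namespace
            ]
```

An RLock (rather than a plain Lock) lets a thread that already holds it, for example a discover() implementation that calls list_actors() internally, acquire it again without deadlocking.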
9dc329f2756f59253b11
Review Summary
Thank you for addressing this TOCTOU race condition. The core fix in loader.py (moving the namespace filter inside with self._lock:) is correct and well-motivated. However, several blocking issues must be resolved before this can be merged.
Blocking Issues
1. CI: lint is failing (unused imports)
Three imports in features/steps/tdd_actors_loader_lock_filter_steps.py are never used: ExitStack (from contextlib), yaml, and ValidationError. These will fail ruff's F401 check. Similarly, yaml is imported but never used in tests/actor/test_loader_list_actors_thread_safety.py. Additionally, in the second test method's _list_worker(), all_actors, local, and remote are assigned but never referenced (F841).
2. CI: unit_tests is failing (wrong TDD tag format)
The BDD feature file uses @tdd_issue @issue_8660, but CONTRIBUTING.md requires @tdd_issue @tdd_issue_N, where N is the number of the bug issue being fixed (here #8588, not #8660). The CI quality gate explicitly blocks a bug fix PR closing #8588 if @tdd_issue_8588 does not appear in the codebase. The @issue_8660 tag is non-standard and will not satisfy this gate.
3. Branch naming convention violation
Per CONTRIBUTING.md, bug fix branches must follow the pattern bugfix/mN-<descriptor>. This branch is named 8660-move-namespace-filter-inside-lock, which does not conform. For a v3.2.0 (M3) fix it should be something like bugfix/m3-actor-loader-namespace-lock.
4. Missing Type/ label on the PR
CONTRIBUTING.md requires every PR to carry exactly one Type/ label. This PR has no labels. Add Type/Bug.
5. Missing Forgejo dependency link (PR blocks issue)
The PR body includes Closes #8588, but CONTRIBUTING.md also requires a Forgejo dependency in which the PR blocks the issue and the issue depends on the PR. This dependency is not set (checked via the API). The PR must be added to the issue's "depends on" list (or, equivalently, issue #8588 must be added to the PR's "blocks" list).
6. TDD workflow bypass
CONTRIBUTING.md mandates a two-PR TDD workflow for all bug fixes:
- A tdd/mN- branch introduces the test tagged @tdd_expected_fail and is merged to master first.
- A bugfix/mN- branch implements the fix, removes @tdd_expected_fail, and is opened as a separate PR.
This PR introduces both the test and the fix in a single commit, bypassing the TDD workflow entirely. The feature file did not exist in master before this branch, and the required @tdd_issue_8588 tag was never present alongside @tdd_expected_fail in a previously merged commit. This is why the CI quality gate is blocking: it detects no prior @tdd_issue_8588 test.
Non-Blocking Suggestions
- _error_collector is defined in step_when_parallel_list but is never started as a thread or called. Remove it to keep the code clean.
- type: ignore comments: The 22 # type: ignore[attr-defined] annotations on context.xxx accesses are acceptable in Behave step files (the Behave Context object is dynamic), but consider adding a typed context helper class to reduce the noise.
What is correct
- src/cleveragents/actor/loader.py is exactly right: moving the filter inside the lock is the correct fix.
- … ### Fixed section.
- … ISSUES CLOSED: #8588) are correct.
Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
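The typed context helper suggested under Non-Blocking Suggestions might look roughly like the sketch below. It is hypothetical: LockFilterState, its fields, and state() are invented names, and the only Behave behavior it relies on is that the context object accepts arbitrary attribute assignment. Confining the dynamic getattr/setattr calls to one function lets step bodies use state(context).errors with full typing instead of annotating each context.xxx access with # type: ignore[attr-defined].

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class LockFilterState:
    """Hypothetical typed container for state shared between the lock-filter steps."""

    loader: Any = None  # would be an ActorLoader in the real step file
    results: list[list[Any]] = field(default_factory=list)
    errors: list[Exception] = field(default_factory=list)


_STATE_ATTR = "lock_filter_state"  # hypothetical attribute name on the context


def state(context: Any) -> LockFilterState:
    """Return the typed state stored on a Behave context, creating it on first use."""
    current = getattr(context, _STATE_ATTR, None)
    if not isinstance(current, LockFilterState):
        current = LockFilterState()
        setattr(context, _STATE_ATTR, current)
    return current
```

In a step, a line such as state(context).errors.append(exc) would then replace an untyped context attribute access plus its # type: ignore comment.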
@@ -0,0 +9,4 @@
import tempfile
import threading
import time
from contextlib import ExitStack
BLOCKING: Unused import causes CI lint failure (F401)
ExitStack is imported from contextlib but never used anywhere in this file. Remove this import.
@@ -0,0 +13,4 @@
from pathlib import Path
from typing import Any
import yaml
BLOCKING: Unused import causes CI lint failure (F401)
yaml is imported but never used in this file (there are no yaml. calls). Remove this import.
@@ -0,0 +17,4 @@
from behave import given, then, when # type: ignore[attr-defined]
from cleveragents.actor.loader import ActorLoader
from cleveragents.core.exceptions import ValidationError
BLOCKING: Unused import causes CI lint failure (F401)
ValidationError is imported but never used in this file. Remove this import.
@@ -0,0 +1,36 @@
@tdd_issue @issue_8660
BLOCKING: Wrong TDD tag format (causes CI unit_tests failure)
This feature is tagged @tdd_issue @issue_8660, but CONTRIBUTING.md requires @tdd_issue @tdd_issue_N. The N in @tdd_issue_N must be the bug issue number, which is #8588 (the issue this PR closes), not #8660. The CI quality gate looks for @tdd_issue_8588 in the codebase and blocks the merge if it is absent.
Fix: Replace @tdd_issue @issue_8660 with @tdd_issue @tdd_issue_8588 on this line, and on all four scenario-level tag lines if they are separate. Also note that the TDD workflow requires this test to have been merged to master first (tagged @tdd_expected_fail) in a prior tdd/mN- branch before this fix branch is opened.
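As a rough illustration of the presence check described above, the sketch below collects @tdd_issue_N numbers from feature-file text and tests whether the closed issue's number is among them. It is a hypothetical simplification, not the project's actual CI gate: the function names and the features_dir argument are invented, and it ignores the gate's additional requirement that the tag was first merged with @tdd_expected_fail.

```python
from __future__ import annotations

import re
from collections.abc import Iterable
from pathlib import Path

# Matches issue-numbered tags such as @tdd_issue_8588, but neither the bare
# @tdd_issue marker nor non-standard tags such as @issue_8660.
TDD_TAG = re.compile(r"@tdd_issue_(\d+)\b")


def tdd_issue_numbers(feature_texts: Iterable[str]) -> set[int]:
    """Collect every N from @tdd_issue_N tags in the given feature-file contents."""
    return {int(m.group(1)) for text in feature_texts for m in TDD_TAG.finditer(text)}


def gate_passes(features_dir: Path, closed_issue: int) -> bool:
    """Return True if any .feature file under features_dir is tagged for closed_issue."""
    texts = (p.read_text(encoding="utf-8") for p in features_dir.rglob("*.feature"))
    return closed_issue in tdd_issue_numbers(texts)
```

With the tag as originally written, tdd_issue_numbers() finds nothing for #8588; after the suggested replacement it returns {8588}.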
@@ -0,0 +13,4 @@
import time
from pathlib import Path
import yaml
BLOCKING: Unused import causes CI lint failure (F401)
yaml is imported but never used anywhere in this test file (the YAML content is written as raw f-strings via _make_actor_yaml, not through the yaml module). Remove this import.
@@ -0,0 +115,4 @@
def _list_worker() -> None:
    try:
        for _ in range(30):
            all_actors = loader.list_actors()
BLOCKING: Unused variables cause CI lint failure (F841)
In test_namespace_filter_thread_safe_under_concurrent_clear's _list_worker(), the variables all_actors, local, and remote are assigned but never used or asserted on. Ruff will flag these as F841 (local variable is assigned but never used).
Fix: Either remove the assignments (call the methods but discard the results) or add assertions. For example:
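A minimal sketch of the assertion route follows. It assumes that list_actors() accepts a namespace= keyword, that the namespaces are the strings "local" and "remote" (inferred only from the variable names), and that returned objects expose the namespaced "namespace/name" string as name. FakeActor, FakeLoader, and run_list_worker are hypothetical stand-ins so the snippet runs without cleveragents installed. Because each call takes its own snapshot and clear() may run between calls, only per-call invariants are asserted, never counts that compare one call with another.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeActor:
    """Hypothetical: list_actors() is assumed to return objects with a namespaced name."""

    name: str  # "namespace/name", e.g. "local/a"


class FakeLoader:
    """Hypothetical stand-in with the same lock-then-filter shape as the fixed loader."""

    def __init__(self, names: list[str]) -> None:
        self._lock = threading.RLock()
        self._actors = {n: FakeActor(n) for n in names}

    def clear(self) -> None:
        with self._lock:
            self._actors.clear()

    def list_actors(self, namespace: str | None = None) -> list[FakeActor]:
        with self._lock:
            return [
                a
                for a in self._actors.values()
                if namespace is None or a.name.split("/", 1)[0] == namespace
            ]


def run_list_worker(loader: FakeLoader, errors: list[Exception]) -> None:
    """_list_worker with the formerly unused results turned into assertions."""
    try:
        for _ in range(30):
            all_actors = loader.list_actors()
            local = loader.list_actors(namespace="local")
            remote = loader.list_actors(namespace="remote")
            # Per-call invariants only: clear() may run between these calls.
            assert len({a.name for a in all_actors}) == len(all_actors)
            assert all(a.name.startswith("local/") for a in local)
            assert all(a.name.startswith("remote/") for a in remote)
    except Exception as exc:
        # An exception raised in a worker thread does not fail the test by
        # itself, so record it for the main thread to assert on.
        errors.append(exc)
```

In the test body, start several of these workers alongside a thread that calls clear(), join them all, and then assert that errors is empty.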
[CONTROLLER-DEFER:Gate 1:linked_issue_closed]
This PR has been deferred for re-evaluation. The controller has stepped back
from processing it. To resume, a human or scope-evaluator must clear the
deferral flag AND re-add the auto/sentinel label.
Decision:
To clear the deferral (SQL):
UPDATE workflows SET deferred_reason=NULL,
deferred_at=NULL,
deferred_target_workflow_id=NULL
WHERE workflow_id = 430;
Audit ID: 154514
Automated by the CleverAgents controller pipeline.
Identity: HAL9000 (pipeline action)
📋 Estimate: tier 1.
7-file change with +439/-6 lines. The core fix is small (moving namespace filter inside threading.RLock in list_actors()), but the bulk of the diff is new concurrency regression tests (BDD + unit). Concurrency test correctness is non-trivial — timing-sensitive assertions, multiple interacting threads — pushing this past tier 0. Multi-file scope and new test logic warrant tier 1. CI failures are all uniform infrastructure failures (Docker daemon misconfiguration on runners), not code issues.
(attempt #4, tier 1)
🔧 Implementer attempt: blocked.
Blockers:
a7fb7d7445 but dispatch base was 84582cb11e. The implementer pushed from inside the worktree (forbidden by the git contract) OR a third party pushed during the attempt. Re-dispatch will re-prefetch and pick up the new head.
(attempt #5, tier 2)
🔧 Implementer attempt: rebase-failed.
Blockers:
a7fb7d7445 be0e54cf60
(attempt #7, tier 2)
🔧 Implementer attempt: ci-not-ready.
Address CI lint + unit_tests failures on the namespace-lock test suite:
- Step file: drop unused imports (yaml, ExitStack, ValidationError) and unused noqa directives flagged by ruff F401/RUF100; rewrite the collected-errors list to use iterable unpacking (RUF005); register the "I create an ActorLoader with initial actors from multiple namespaces" step under both @given and @when so scenario 3 is no longer reported as undefined; remove the dead _error_collector inner function the reviewer flagged.
- Step regex fix: the concurrent-modifications step pattern ended with '' (two single quotes) instead of `` (two backticks), so it never matched the feature file's `(triggering ``discover()``)` literal.
- Test fixtures: add the required `description` field to _make_actor_yaml in both the BDD step file and tests/actor/test_loader_list_actors_thread_safety.py so ActorConfigSchema validation passes (previously every scenario errored at discover() with "description: Field required").
- Concurrent worker names: collapse three-segment actor names like "conc/ns{i % 2}/concurrent_{i}" to the single-slash form "conc/concurrent_{i}" required by the schema's namespaced-name rule.
- Test file: drop the unused `yaml` import and the three unused F841 assignments in _list_worker; apply ruff format.
- Feature file: switch the @issue_8660 TDD tag to @tdd_issue_8588 to match the CONTRIBUTING.md tag convention for the bug issue this PR closes.
ISSUES CLOSED: #8588
✅ Approved
Reviewed at commit 36d3c35.
Confidence: medium.
Claimed by merge_drive.py (pid 2329255) until 2026-06-14T20:56:25.932480+00:00.
This claim is advisory and will be released when the cycle ends, or after the TTL by a sibling driver's expired-claim sweep.
36d3c3530a17c6e5f4ea
Approved by the controller reviewer stage (workflow 430).
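The step-pattern fix listed under attempt #7 comes down to literal characters: a pattern containing '' can never match step text containing ``. The sketch below illustrates this with Python's re module. The surrounding step wording ("concurrent modifications occur") is hypothetical, since only the (triggering ``discover()``) fragment is quoted above, and whether the real step uses Behave's regex matcher rather than its default parse matcher is an assumption. Building the pattern with re.escape() from the exact feature-file text avoids hand-escaping mistakes.

```python
from __future__ import annotations

import re

# Hypothetical step text; only "(triggering ``discover()``)" is quoted in the notes.
STEP_TEXT = "concurrent modifications occur (triggering ``discover()``)"

# Broken: two single quotes where the feature file has two backticks.
BROKEN = re.compile(r"concurrent modifications occur \(triggering ''discover\(\)''\)")

# Fixed by hand: backticks, with the parentheses escaped.
FIXED = re.compile(r"concurrent modifications occur \(triggering ``discover\(\)``\)")

# Fixed by construction: escape the exact literal copied from the feature file.
ESCAPED = re.compile(re.escape("concurrent modifications occur (triggering ``discover()``)"))


def matches(pattern: re.Pattern[str], text: str) -> bool:
    """Require the whole step text to match the pattern."""
    return pattern.fullmatch(text) is not None
```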