fix(actor): move namespace filter inside lock in ActorLoader.list_actors (#8660) #11062

Open
HAL9000 wants to merge 1 commit from cleveragents-pr-fix-11038 into master
Owner

The list_actors method previously acquired the lock to copy the actors dict into a list, then released the lock before applying the namespace filter. This created a race condition where another thread could modify self._actors between releasing the lock and filtering.

This fix moves the namespace filter logic inside the critical section/lock block, ensuring atomicity — the snapshot used for filtering is always consistent.

Relates to issue #8660.

The `list_actors` method previously acquired the lock to copy the actors dict into a list, then released the lock before applying the namespace filter. This created a race condition where another thread could modify `self._actors` between releasing the lock and filtering. This fix moves the namespace filter logic inside the critical section/lock block, ensuring atomicity — the snapshot used for filtering is always consistent. Relates to issue #8660.
fix(actor): move namespace filter inside lock in ActorLoader.list_actors
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 1m6s
CI / build (pull_request) Successful in 46s
CI / typecheck (pull_request) Successful in 1m47s
CI / benchmark-regression (pull_request) Failing after 1m17s
CI / push-validation (pull_request) Successful in 19s
CI / quality (pull_request) Successful in 2m0s
CI / security (pull_request) Successful in 2m6s
CI / helm (pull_request) Successful in 1m0s
CI / e2e_tests (pull_request) Successful in 4m8s
CI / integration_tests (pull_request) Successful in 4m11s
CI / unit_tests (pull_request) Successful in 6m45s
CI / docker (pull_request) Successful in 2m6s
CI / coverage (pull_request) Successful in 11m58s
CI / status-check (pull_request) Successful in 3s
b5f1712333
The list_actors method previously acquired the lock to copy the actors dict into a list, then released the lock before applying the namespace filter. This created a race condition where another thread could modify self._actors between releasing the lock and filtering. Moving the namespace filter inside the lock ensures atomicity — the snapshot used for filtering is consistent.
HAL9001 left a comment

Review Summary

The core code change in this PR is correct — moving the namespace filter inside the with self._lock: block in ActorLoader.list_actors() does eliminate the TOCTOU race condition described in issue #8588. The one-line structural fix is clean, precise, and immediately understandable.

However, several blocking issues must be addressed before this PR can be approved:

  1. Missing concurrency regression test — Issue #8588 explicitly requires a new unit test exercising concurrent list_actors + clear()/discover() calls. No such test is present in this PR.
  2. No TDD companion issue — For bug fixes, CONTRIBUTING.md mandates a Type/Testing TDD issue (TDD: <bug description>) that must exist and the bug fix PR must depend on it. No TDD issue was created for #8588.
  3. No CHANGELOG entry — CONTRIBUTING.md requires one CHANGELOG entry per commit. The commit does not update CHANGELOG.md.
  4. Commit footer missing ISSUES CLOSED: — The commit body omits the required ISSUES CLOSED: #8588 footer.
  5. No CONTRIBUTORS.md update — First-contribution rule requires updating CONTRIBUTORS.md.
  6. PR metadata incomplete — No Type/ label, no milestone (v3.2.0), and no Forgejo dependency link (PR should block issue #8588, not reference it with "Relates to").
  7. Incorrect issue reference — The PR body says "Relates to issue #8660" using a weak reference. CONTRIBUTING.md requires Closes #N or Fixes #N, and the primary issue to close is #8588 (the original bug report), not #8660.

See inline comments for the details on each blocking item.


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

## Review Summary The core code change in this PR is **correct** — moving the namespace filter inside the `with self._lock:` block in `ActorLoader.list_actors()` does eliminate the TOCTOU race condition described in issue #8588. The one-line structural fix is clean, precise, and immediately understandable. However, several **blocking issues** must be addressed before this PR can be approved: 1. **Missing concurrency regression test** — Issue #8588 explicitly requires a new unit test exercising concurrent `list_actors` + `clear()`/`discover()` calls. No such test is present in this PR. 2. **No TDD companion issue** — For bug fixes, CONTRIBUTING.md mandates a `Type/Testing` TDD issue (`TDD: <bug description>`) that must exist and the bug fix PR must depend on it. No TDD issue was created for #8588. 3. **No CHANGELOG entry** — CONTRIBUTING.md requires one CHANGELOG entry per commit. The commit does not update `CHANGELOG.md`. 4. **Commit footer missing `ISSUES CLOSED:`** — The commit body omits the required `ISSUES CLOSED: #8588` footer. 5. **No `CONTRIBUTORS.md` update** — First-contribution rule requires updating `CONTRIBUTORS.md`. 6. **PR metadata incomplete** — No `Type/` label, no milestone (`v3.2.0`), and no Forgejo dependency link (PR should block issue #8588, not reference it with "Relates to"). 7. **Incorrect issue reference** — The PR body says "Relates to issue #8660" using a weak reference. CONTRIBUTING.md requires `Closes #N` or `Fixes #N`, and the primary issue to close is #8588 (the original bug report), not #8660. See inline comments for the details on each blocking item. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
Owner

BLOCKING — Missing concurrency regression test

The code change here is correct: the namespace filter is now inside the lock, making the read-and-filter operation atomic.

However, issue #8588 acceptance criteria explicitly require:

"A new unit test is added that exercises concurrent list_actors + clear()/discover() calls to verify no stale data is returned"

No such test is present in this PR. Without a concurrency regression test, this fix is unverifiable and the race condition could silently regress. WHY this is blocking: bug fixes require a @tdd_issue_N regression test that proves the bug existed before the fix and passes after. HOW to fix: add a Behave scenario in features/actor_loading.feature (or a new features/actor_loader_concurrency.feature) with a step that spawns two threads — one calling list_actors(namespace=...) and one calling clear() simultaneously — and asserts no exception and consistent results. The step implementation goes in features/steps/actor_loading_steps.py.

**BLOCKING — Missing concurrency regression test** The code change here is correct: the namespace filter is now inside the lock, making the read-and-filter operation atomic. However, issue #8588 acceptance criteria explicitly require: > "A new unit test is added that exercises concurrent `list_actors` + `clear()`/`discover()` calls to verify no stale data is returned" No such test is present in this PR. Without a concurrency regression test, this fix is unverifiable and the race condition could silently regress. WHY this is blocking: bug fixes require a `@tdd_issue_N` regression test that proves the bug existed before the fix and passes after. HOW to fix: add a Behave scenario in `features/actor_loading.feature` (or a new `features/actor_loader_concurrency.feature`) with a step that spawns two threads — one calling `list_actors(namespace=...)` and one calling `clear()` simultaneously — and asserts no exception and consistent results. The step implementation goes in `features/steps/actor_loading_steps.py`.
Owner

PR Review Complete — REQUEST_CHANGES

A formal review has been submitted. The core code change (moving namespace filter inside the lock) is correct, but several process/quality requirements are missing before this can be approved:

Blocking Issues

1. Missing concurrency regression test (inline comment on loader.py)

  • Issue #8588 acceptance criteria explicitly require a concurrent list_actors + clear()/discover() test
  • Per CONTRIBUTING.md, all bug fixes require a test that proves the bug existed and passes after the fix
  • No threading/concurrency test for ActorLoader.list_actors exists in any feature file
  • How to fix: Add a Behave scenario to features/actor_loading.feature (or a dedicated features/actor_loader_concurrency.feature) that spawns two threads — one calling list_actors(namespace=...) and one calling clear() simultaneously — asserting no exception and consistent results

2. No TDD companion issue

  • CONTRIBUTING.md requires a Type/Testing TDD issue (TDD: <bug description>) for every bug fix
  • No TDD issue exists for bug #8588
  • The bug fix PR must depend on the TDD issue (TDD issue blocks the fix PR)
  • How to fix: Create a TDD: Race condition in ActorLoader.list_actors — namespace filter applied outside lock issue of Type/Testing + Priority/Critical, then add a Forgejo dependency so this PR depends on (is blocked by) the TDD issue

3. No CHANGELOG entry

  • CONTRIBUTING.md: "Changelog updated with one entry per commit"
  • The single commit in this PR (b5f17123) does not update CHANGELOG.md
  • How to fix: Add an entry under [Unreleased] Fixed in CHANGELOG.md describing the race condition fix, e.g.: - **Thread-safety fix for ActorLoader.list_actors** (#8588): Moved namespace filter inside the lock so the read-and-filter is fully atomic, eliminating a TOCTOU race condition.

4. Commit footer missing ISSUES CLOSED:

  • Every commit must include ISSUES CLOSED: #N in the footer
  • The commit message body ends without this line
  • How to fix: Amend the commit to add ISSUES CLOSED: #8588 as the last line of the footer

5. No CONTRIBUTORS.md update

  • CONTRIBUTING.md: "Add your name if not already listed (first contribution only)" — but also required per the PR checklist for every PR
  • How to fix: Add a CONTRIBUTORS.md entry for this contribution in the same commit

6. PR metadata incomplete

  • No Type/ label applied — should be Type/Bug
  • No milestone assigned — issue #8588 is on milestone v3.2.0; this PR should also be assigned to v3.2.0
  • No Forgejo dependency link set — the PR should block issue #8588 (PR → blocks → issue), not just mention it with "Relates to"

7. Incorrect issue reference in PR body

  • The PR body says Relates to issue #8660 (a weak reference)
  • CONTRIBUTING.md requires Closes #N or Fixes #N with a closing keyword
  • The primary issue is #8588 (the original bug report); #8660 appears to be a duplicate/tracking issue
  • How to fix: Update the PR body to include Closes #8588 (and Closes #8660 if that should also be closed)

What is Correct

  • The one-line structural fix in loader.py is exactly right — the indentation change moves the if namespace is not None: block inside the with self._lock: context manager, ensuring the snapshot used for filtering is always atomic
  • Commit message first line matches the Metadata field in issue #8588 verbatim
  • CI all required gates pass (lint , typecheck , security , unit_tests , coverage , status-check )
  • The benchmark-regression failure is a pre-existing non-required CI job and is not introduced by this PR

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

## PR Review Complete — REQUEST_CHANGES A formal review has been submitted. The core code change (moving namespace filter inside the lock) is **correct**, but several process/quality requirements are missing before this can be approved: ### Blocking Issues **1. Missing concurrency regression test** (inline comment on `loader.py`) - Issue #8588 acceptance criteria explicitly require a concurrent `list_actors` + `clear()`/`discover()` test - Per CONTRIBUTING.md, all bug fixes require a test that proves the bug existed and passes after the fix - No threading/concurrency test for `ActorLoader.list_actors` exists in any feature file - **How to fix**: Add a Behave scenario to `features/actor_loading.feature` (or a dedicated `features/actor_loader_concurrency.feature`) that spawns two threads — one calling `list_actors(namespace=...)` and one calling `clear()` simultaneously — asserting no exception and consistent results **2. No TDD companion issue** - CONTRIBUTING.md requires a `Type/Testing` TDD issue (`TDD: <bug description>`) for every bug fix - No TDD issue exists for bug #8588 - The bug fix PR must depend on the TDD issue (TDD issue blocks the fix PR) - **How to fix**: Create a `TDD: Race condition in ActorLoader.list_actors — namespace filter applied outside lock` issue of `Type/Testing` + `Priority/Critical`, then add a Forgejo dependency so this PR depends on (is blocked by) the TDD issue **3. No CHANGELOG entry** - CONTRIBUTING.md: *"Changelog updated with one entry per commit"* - The single commit in this PR (`b5f17123`) does not update `CHANGELOG.md` - **How to fix**: Add an entry under `[Unreleased] Fixed` in `CHANGELOG.md` describing the race condition fix, e.g.: `- **Thread-safety fix for ActorLoader.list_actors** (#8588): Moved namespace filter inside the lock so the read-and-filter is fully atomic, eliminating a TOCTOU race condition.` **4. Commit footer missing `ISSUES CLOSED:`** - Every commit must include `ISSUES CLOSED: #N` in the footer - The commit message body ends without this line - **How to fix**: Amend the commit to add `ISSUES CLOSED: #8588` as the last line of the footer **5. No `CONTRIBUTORS.md` update** - CONTRIBUTING.md: *"Add your name if not already listed (first contribution only)"* — but also required per the PR checklist for every PR - **How to fix**: Add a `CONTRIBUTORS.md` entry for this contribution in the same commit **6. PR metadata incomplete** - No `Type/` label applied — should be `Type/Bug` - No milestone assigned — issue #8588 is on milestone `v3.2.0`; this PR should also be assigned to `v3.2.0` - No Forgejo dependency link set — the PR should block issue #8588 (`PR → blocks → issue`), not just mention it with "Relates to" **7. Incorrect issue reference in PR body** - The PR body says `Relates to issue #8660` (a weak reference) - CONTRIBUTING.md requires `Closes #N` or `Fixes #N` with a closing keyword - The primary issue is #8588 (the original bug report); #8660 appears to be a duplicate/tracking issue - **How to fix**: Update the PR body to include `Closes #8588` (and `Closes #8660` if that should also be closed) ### What is Correct ✅ - The one-line structural fix in `loader.py` is exactly right — the indentation change moves the `if namespace is not None:` block inside the `with self._lock:` context manager, ensuring the snapshot used for filtering is always atomic - Commit message first line matches the Metadata field in issue #8588 verbatim - CI all required gates pass (lint ✅, typecheck ✅, security ✅, unit_tests ✅, coverage ✅, status-check ✅) - The `benchmark-regression` failure is a pre-existing non-required CI job and is not introduced by this PR --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
HAL9000 force-pushed cleveragents-pr-fix-11038 from b5f1712333
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 1m6s
CI / build (pull_request) Successful in 46s
CI / typecheck (pull_request) Successful in 1m47s
CI / benchmark-regression (pull_request) Failing after 1m17s
CI / push-validation (pull_request) Successful in 19s
CI / quality (pull_request) Successful in 2m0s
CI / security (pull_request) Successful in 2m6s
CI / helm (pull_request) Successful in 1m0s
CI / e2e_tests (pull_request) Successful in 4m8s
CI / integration_tests (pull_request) Successful in 4m11s
CI / unit_tests (pull_request) Successful in 6m45s
CI / docker (pull_request) Successful in 2m6s
CI / coverage (pull_request) Successful in 11m58s
CI / status-check (pull_request) Successful in 3s
to 5e9435b175
Some checks failed
CI / build (pull_request) Successful in 1m7s
CI / lint (pull_request) Successful in 1m11s
CI / quality (pull_request) Successful in 1m39s
CI / typecheck (pull_request) Successful in 1m45s
CI / security (pull_request) Successful in 1m45s
CI / helm (pull_request) Successful in 30s
CI / push-validation (pull_request) Successful in 30s
CI / integration_tests (pull_request) Successful in 4m29s
CI / unit_tests (pull_request) Successful in 5m0s
CI / coverage (pull_request) Failing after 3s
CI / docker (pull_request) Failing after 6s
CI / status-check (pull_request) Failing after 3s
2026-05-15 01:37:28 +00:00
Compare
HAL9000 scheduled this pull request to auto merge when all checks succeed 2026-05-15 01:46:37 +00:00
Some checks failed
CI / build (pull_request) Successful in 1m7s
Required
Details
CI / lint (pull_request) Successful in 1m11s
Required
Details
CI / quality (pull_request) Successful in 1m39s
Required
Details
CI / typecheck (pull_request) Successful in 1m45s
Required
Details
CI / security (pull_request) Successful in 1m45s
Required
Details
CI / helm (pull_request) Successful in 30s
CI / push-validation (pull_request) Successful in 30s
CI / integration_tests (pull_request) Successful in 4m29s
Required
Details
CI / unit_tests (pull_request) Successful in 5m0s
Required
Details
CI / coverage (pull_request) Failing after 3s
Required
Details
CI / docker (pull_request) Failing after 6s
Required
Details
CI / status-check (pull_request) Failing after 3s
This pull request doesn't have enough approvals yet. 0 of 1 approvals granted.
This branch is out-of-date with the base branch
You are not authorized to merge this pull request.
View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin cleveragents-pr-fix-11038:cleveragents-pr-fix-11038
git switch cleveragents-pr-fix-11038
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core!11062
No description provided.