fix(actor): handle empty actor list without validation error #594

2026-03-05T01:35:45Z

brent.edwards commented

2026-03-05 01:35:45 +00:00

Summary

Fixes #592

Fix agents actor list raising a validation error on fresh projects. ActorRegistry._actor_name() built names via f"{provider}/{model}", producing names with 2+ slashes when providers had models containing / (e.g. OpenRouter's anthropic/claude-sonnet-4-20250514).

Changes

Production fix (src/cleveragents/actor/registry.py):

Sanitise both provider and model names by replacing / with -
Lowercase all actor names to satisfy the spec pattern ^[a-z0-9_-]+/[a-z0-9_-]+$
Added docstring explaining the sanitisation rationale

Tests & benchmarks:

6 Behave BDD regression scenarios (features/actor_list_empty.feature) using @tdd_bug @tdd_bug_592 tag convention
Scenario 6 validates that upserted actor names contain exactly one / (addresses TEST-1)
Robot Framework integration smoke tests with exit code assertions (robot/actor_list_empty.robot)
ASV benchmarks (benchmarks/actor_list_empty_bench.py) — no contextlib.suppress, fails loudly on regression
Shared mock FakeProviderInfo/FakeProviderRegistry in features/mocks/fake_provider.py
Updated features/mocks/__init__.py to export new shared mocks

Review feedback addressed

From hurui200320 (Review #2007)

C1: Rebased — single commit, no merge commits
C2: Squashed into single commit 66d95792
C3/TEST-1: Added Scenario 6 which asserts upsert_actor was called with exactly-one-slash name
C4: All # type: ignore removed from step definitions; benchmark has none
H1: Label changed to Type/Bug
H2: Mocks moved to features/mocks/fake_provider.py, imported from benchmark
H3: .lower() added to both provider and model in _actor_name()
M1: Both provider and model names sanitised for /
M2: Robot tests documented as fresh-env smoke tests (multi-slash path covered by Behave + benchmarks)
M3: CHANGELOG updated to describe the fix
L1: Rebased onto master 83f2f3a0 — mergeable
L2/BENCH-1: Removed contextlib.suppress(ValidationError) from benchmark

From hamza.khyari (Review #2032)

TEST-1: Fixed — Scenario 6 validates name via upsert_actor call args
SPEC-1: Out of scope for this PR — dots in default models is a pre-existing issue across Session._ACTOR_NAME_PATTERN; tracked separately
BUG-1: Fixed — upsert_actor returns Actor(name=...) not MagicMock(name=...)
SEC-1: Pre-existing code, out of scope for this bug fix
DRY-1: Benchmark imports from features/mocks/fake_provider.py
DESIGN-1: Pre-existing inconsistency in provider/model representation, out of scope
C4: No # type: ignore remaining
DESIGN-2: Pre-existing, out of scope
PROC-1: features/mocks/__init__.py now exports FakeProviderInfo/FakeProviderRegistry
BENCH-2: Pre-existing ASV importlib.reload limitation, out of scope

From freemo (Review #2059)

F1: .lower() restored — was accidentally dropped during earlier squash
F2: Benchmark now imports from shared features/mocks/fake_provider.py
F3: Tag convention changed to @tdd_bug @tdd_bug_592 per CONTRIBUTING.md standardisation
F4: No # type: ignore in benchmark
F5: Empty-string edge case is not reachable in practice (providers always have names)

Self-review findings

Added .lower() to satisfy spec pattern
Fixed TDD tags from @tdd @bug592 to @tdd_bug @tdd_bug_592
Added exit code assertion to Robot test case 2
Removed unused contextlib import from benchmark
Added docstring to _actor_name()

Process

Single squashed commit, rebased onto master (83f2f3a0)
Conventional commit format with ISSUES CLOSED: #592 footer
nox -s lint ✓ | nox -s typecheck 0 errors ✓

ISSUES CLOSED: #592

## Summary Fixes #592 Fix `agents actor list` raising a validation error on fresh projects. `ActorRegistry._actor_name()` built names via `f"{provider}/{model}"`, producing names with 2+ slashes when providers had models containing `/` (e.g. OpenRouter's `anthropic/claude-sonnet-4-20250514`). ## Changes **Production fix** (`src/cleveragents/actor/registry.py`): - Sanitise both provider and model names by replacing `/` with `-` - Lowercase all actor names to satisfy the spec pattern `^[a-z0-9_-]+/[a-z0-9_-]+$` - Added docstring explaining the sanitisation rationale **Tests & benchmarks**: - 6 Behave BDD regression scenarios (`features/actor_list_empty.feature`) using `@tdd_bug @tdd_bug_592` tag convention - Scenario 6 validates that upserted actor names contain exactly one `/` (addresses TEST-1) - Robot Framework integration smoke tests with exit code assertions (`robot/actor_list_empty.robot`) - ASV benchmarks (`benchmarks/actor_list_empty_bench.py`) — no `contextlib.suppress`, fails loudly on regression - Shared mock `FakeProviderInfo`/`FakeProviderRegistry` in `features/mocks/fake_provider.py` - Updated `features/mocks/__init__.py` to export new shared mocks ## Review feedback addressed ### From hurui200320 (Review #2007) - **C1**: Rebased — single commit, no merge commits - **C2**: Squashed into single commit `66d95792` - **C3/TEST-1**: Added Scenario 6 which asserts `upsert_actor` was called with exactly-one-slash name - **C4**: All `# type: ignore` removed from step definitions; benchmark has none - **H1**: Label changed to `Type/Bug` - **H2**: Mocks moved to `features/mocks/fake_provider.py`, imported from benchmark - **H3**: `.lower()` added to both provider and model in `_actor_name()` - **M1**: Both provider and model names sanitised for `/` - **M2**: Robot tests documented as fresh-env smoke tests (multi-slash path covered by Behave + benchmarks) - **M3**: CHANGELOG updated to describe the fix - **L1**: Rebased onto master `83f2f3a0` — mergeable - **L2/BENCH-1**: Removed `contextlib.suppress(ValidationError)` from benchmark ### From hamza.khyari (Review #2032) - **TEST-1**: Fixed — Scenario 6 validates name via `upsert_actor` call args - **SPEC-1**: Out of scope for this PR — dots in default models is a pre-existing issue across `Session._ACTOR_NAME_PATTERN`; tracked separately - **BUG-1**: Fixed — `upsert_actor` returns `Actor(name=...)` not `MagicMock(name=...)` - **SEC-1**: Pre-existing code, out of scope for this bug fix - **DRY-1**: Benchmark imports from `features/mocks/fake_provider.py` - **DESIGN-1**: Pre-existing inconsistency in provider/model representation, out of scope - **C4**: No `# type: ignore` remaining - **DESIGN-2**: Pre-existing, out of scope - **PROC-1**: `features/mocks/__init__.py` now exports `FakeProviderInfo`/`FakeProviderRegistry` - **BENCH-2**: Pre-existing ASV `importlib.reload` limitation, out of scope ### From freemo (Review #2059) - **F1**: `.lower()` restored — was accidentally dropped during earlier squash - **F2**: Benchmark now imports from shared `features/mocks/fake_provider.py` - **F3**: Tag convention changed to `@tdd_bug @tdd_bug_592` per CONTRIBUTING.md standardisation - **F4**: No `# type: ignore` in benchmark - **F5**: Empty-string edge case is not reachable in practice (providers always have names) ### Self-review findings - Added `.lower()` to satisfy spec pattern - Fixed TDD tags from `@tdd @bug592` to `@tdd_bug @tdd_bug_592` - Added exit code assertion to Robot test case 2 - Removed unused `contextlib` import from benchmark - Added docstring to `_actor_name()` ## Process - Single squashed commit, rebased onto `master` (`83f2f3a0`) - Conventional commit format with `ISSUES CLOSED: #592` footer - `nox -s lint` ✓ | `nox -s typecheck` 0 errors ✓ ISSUES CLOSED: #592

brent.edwards added the

Type

Testing

label 2026-03-05 01:35:52 +00:00

brent.edwards added this to the v3.2.0 milestone 2026-03-05 01:36:02 +00:00

brent.edwards referenced this pull request

2026-03-05 01:36:30 +00:00

`agents actor list` without actors should show no actors. #592

brent.edwards reviewed 2026-03-05 01:36:50 +00:00

brent.edwards left a comment

Self-Review: TDD Failing Tests for Bug #592

Overview

This PR adds TDD failing tests that reproduce the actor list validation error on fresh projects where providers with multi-slash default models (e.g. OpenRouter) cause _normalize_name() to reject the constructed actor name.

Checklist

Feature file has @wip tag on all scenarios (for CI filtering)
Feature file has @tdd and @bug592 tags for traceability
Step patterns use unique suffix for actor-list-empty and prefix actor-list-empty to avoid AmbiguousStep collisions with 15+ existing actor step files
Scenario 3 uses real ActorRegistry with a fake multi-slash _FakeProviderInfo to exercise the actual buggy code path
nox -s lint passes
nox -s typecheck passes (0 errors)
Commit uses Refs: #592 to avoid premature closure
Robot Framework smoke tests follow common.resource pattern
ASV benchmark includes track_multi_slash_succeeds metric (0=bug, 1=fixed)
CHANGELOG entry added

TDD Nature

Scenarios 1-2 use fully mocked registry -- these test the happy path and should pass
Scenario 3 uses the real ActorRegistry with a multi-slash provider -- this exercises the actual buggy code path and is expected to fail until the fix is applied

No issues found. Ready for external review.

## Self-Review: TDD Failing Tests for Bug #592 ### Overview This PR adds TDD failing tests that reproduce the `actor list` validation error on fresh projects where providers with multi-slash default models (e.g. OpenRouter) cause `_normalize_name()` to reject the constructed actor name. ### Checklist - [x] Feature file has `@wip` tag on all scenarios (for CI filtering) - [x] Feature file has `@tdd` and `@bug592` tags for traceability - [x] Step patterns use unique suffix `for actor-list-empty` and prefix `actor-list-empty` to avoid `AmbiguousStep` collisions with 15+ existing actor step files - [x] Scenario 3 uses real `ActorRegistry` with a fake multi-slash `_FakeProviderInfo` to exercise the actual buggy code path - [x] `nox -s lint` passes - [x] `nox -s typecheck` passes (0 errors) - [x] Commit uses `Refs: #592` to avoid premature closure - [x] Robot Framework smoke tests follow `common.resource` pattern - [x] ASV benchmark includes `track_multi_slash_succeeds` metric (0=bug, 1=fixed) - [x] CHANGELOG entry added ### TDD Nature - Scenarios 1-2 use fully mocked registry -- these test the happy path and should pass - Scenario 3 uses the real `ActorRegistry` with a multi-slash provider -- this **exercises the actual buggy code path** and is expected to **fail** until the fix is applied No issues found. Ready for external review.

freemo added the

labels 2026-03-05 03:28:37 +00:00

freemo self-assigned this 2026-03-05 03:32:34 +00:00

freemo referenced this pull request

2026-03-05 03:33:22 +00:00

`agents actor list` without actors should show no actors. #592

freemo reviewed 2026-03-05 03:36:02 +00:00

freemo left a comment

PM Review — Day 25

Status

Mergeable — no conflicts.
Self-review from Brent only. No external review.

Context

TDD bug-fix test PR for #592 (actor list validation error with multi-slash provider names). Scenario 3 exercises the real buggy code path. Root cause is in registry.py:324-327 and actor_service.py:26.

Action Item

@freemo: Please review and implement the fix. Recommended approach: normalize multi-slash model names in _actor_name() (Option 2 from Brent's root cause analysis on #592).

Priority

Medium — user-visible regression in M3.

## PM Review — Day 25 ### Status - **Mergeable** — no conflicts. - Self-review from Brent only. **No external review.** ### Context TDD bug-fix test PR for #592 (actor list validation error with multi-slash provider names). Scenario 3 exercises the real buggy code path. Root cause is in `registry.py:324-327` and `actor_service.py:26`. ### Action Item **@freemo**: Please review and implement the fix. Recommended approach: normalize multi-slash model names in `_actor_name()` (Option 2 from Brent's root cause analysis on #592). ### Priority Medium — user-visible regression in M3.

freemo referenced this pull request

2026-03-05 04:26:32 +00:00

`agents actor list` without actors should show no actors. #592

freemo force-pushed feature/m3-fix-actor-list-empty from c740d5008a to 3eba970630

2026-03-05 06:48:36 +00:00

Compare

freemo referenced this pull request

2026-03-05 06:50:26 +00:00

`agents actor list` without actors should show no actors. #592

CoreRasurae commented

2026-03-05 17:31:29 +00:00

Code Review Report: Brent's Commit `7a4f20de` on `feature/m3-fix-actor-list-empty`

Commit: 7a4f20de6bcf3b75212e40e5c04d912498db6129
Author: Brent E. Edwards (brent.edwards@cleverthis.com)
Message: fix(actor): handle empty actor list without validation error
Issue: #592 — agents actor list without actors should show no actors
Branch: feature/m3-fix-actor-list-empty

Scope of Changes

File	Type	Lines
`CHANGELOG.md`	Modified	+6
`benchmarks/actor_list_empty_bench.py`	New	+117
`features/actor_list_empty.feature`	New	+27
`features/steps/actor_list_empty_steps.py`	New	+183
`robot/actor_list_empty.robot`	New	+45

The commit adds TDD failing tests only. It contains no production code fix. The actual bug fix (sanitizing model names in _actor_name()) was applied by Jeffrey Freeman in a separate commit 3eba9706 on top.

Findings

F1. [HIGH] Misleading Commit Message — `fix()` prefix with no fix

The commit message uses the fix(actor): conventional-commit prefix, which signals a bug fix. However, this commit contains zero production code changes — only test scaffolding and a CHANGELOG entry. The issue's Metadata section mandates this exact commit message (fix(actor): handle empty actor list without validation error), but using it for a test-only commit is semantically wrong.

Jeffrey later used the same commit message for his actual fix commit (3eba9706), creating two commits with identical messages on the same branch — one with only tests, one with the fix. This creates confusion in git log and git bisect.

Recommendation: The commit message should have been test(actor): add TDD failing tests for empty actor list validation error for Brent's commit, reserving the fix() prefix for the commit that actually applies the fix.

F2. [HIGH] Benchmark File Uses `MagicMock` for `capabilities` — Will Silently Fail Even After Fix

benchmarks/actor_list_empty_bench.py:41 uses MagicMock() for _FakeProviderInfo.capabilities:

# actor_list_empty_bench.py:39-41
if self.capabilities is None:
    self.capabilities = MagicMock()

The production code at registry.py:122 calls dataclasses.asdict(info.capabilities). asdict() on a MagicMock raises TypeError because MagicMock is not a dataclass. This means:

time_list_multi_slash_provider() (line 98-103) catches the exception with a bare except Exception: pass and reports nothing.
track_multi_slash_succeeds() (line 105-114) will always return 0 (failure), even after the fix is applied, because the MagicMock + asdict() crash happens before the name-validation code path is reached.

Jeffrey fixed this in the step definitions (features/steps/actor_list_empty_steps.py) by switching to ProviderCapabilities(), but the benchmark file was never updated. This makes the benchmark's track_multi_slash_succeeds tracker permanently broken as a regression indicator.

Recommendation: Replace MagicMock() with ProviderCapabilities() in the benchmark file, matching the fix already applied to the step definitions.

F3. [HIGH] Behave Scenarios 1 & 2 Never Exercise the Buggy Code Path

Scenarios 1 ("no actors message") and 2 ("no validation error") both use a fully-mocked ActorRegistry:

# actor_list_empty_steps.py:80-87  (step_no_providers)
mock_registry = MagicMock()
mock_registry.list_actors.return_value = []
context.actor_empty_registry = mock_registry

Because list_actors() is mocked to return [], the real ActorRegistry.list_actors() → ensure_built_in_actors() → _actor_name() → _normalize_name() chain is never called. These scenarios will pass whether the bug exists or not. They test the CLI rendering of an empty list, not the actual bug.

Only Scenario 3 ("multi-slash model provider does not error") uses the real ActorRegistry and exercises the buggy path. This means two out of three scenarios marketed as TDD tests for bug #592 do not actually validate the bug.

Recommendation: Scenarios 1-2 should be documented as CLI output tests rather than bug-regression tests. Alternatively, they should use the real ActorRegistry with zero providers (which would exercise ensure_built_in_actors() returning [] on the early-exit path at registry.py:100-101).

F4. [MEDIUM] Robot Tests Cannot Reproduce the Multi-Slash Bug

robot/actor_list_empty.robot test case "Actor List Does Not Show Slash Separator Error" (line 32-45):

Run Process    ${PYTHON}    -m    cleveragents    init    actor-slash-test
${result}=    Run Process    ${PYTHON}    -m    cleveragents    actor    list

This runs against a fresh environment with no configured providers. It never configures an OpenRouter-like provider with a multi-slash model name. Both Robot test cases are therefore testing the same code path (fresh env, no providers), despite the second one claiming to check "slash separator error". The multi-slash code path (_actor_name() producing names with 2+ slashes) is never triggered.

Recommendation: Either inject a multi-slash provider configuration into the temp environment before running actor list, or rename the test to accurately reflect what it validates.

F5. [MEDIUM] Bare `except Exception` in Benchmark Swallows All Errors

benchmarks/actor_list_empty_bench.py:100-103 and 110-113:

try:
    self._slash_registry.list_actors()
except Exception:
    pass  # Expected while bug is present

This catches every exception type — not just the ValidationError from the bug. It will silently swallow TypeError from the MagicMock/asdict issue (F2), ImportError, AttributeError, or any other unexpected failure. After the fix, if a different regression is introduced, the benchmark will report success=0 without any diagnostic information.

Recommendation: Catch only ValidationError (the expected bug exception) in the "expected failure" path, and let other exceptions propagate so ASV reports them as benchmark errors.

F6. [MEDIUM] Missing Edge-Case Test Coverage

The test suite covers two scenarios: zero providers and one provider with anthropic/claude-sonnet-4-20250514. Missing edge cases for the name-construction logic:

Missing Case	Why It Matters
Model with multiple consecutive slashes (`a//b/c`)	Tests robustness of `replace("/", "-")`
Model name that is only slashes (`///`)	Produces empty-looking names
Provider name containing slashes	The fix only sanitizes model names, not provider names
Mix of normal and multi-slash providers	Tests that valid providers aren't affected
Model with leading/trailing slashes (`/model/`)	Edge behavior with `-` prefixes/suffixes

Recommendation: Add at least a parametrized Behave scenario or a pytest unit test covering these boundary cases.

F7. [LOW] CHANGELOG Entry Describes `@wip` Tags That Were Later Removed

CHANGELOG.md:27 states:

Includes 3 Behave BDD scenarios (with @wip tag)

Jeffrey's subsequent commit (3eba9706) removed the @wip tags from all three scenarios. The CHANGELOG entry is now stale. Since the CHANGELOG represents the released state, it should describe the final state of the feature.

Recommendation: Update the CHANGELOG entry to remove the @wip reference, or note that the scenarios are active.

F8. [LOW] Specification Has No Empty-List Example for `actor list`

The specification at docs/specification.md:5382-5511 documents agents actor list only for the case where actors exist. The empty-list case ("No actors configured.") has no specification example. The CLI code at actor.py:594-596 prints this message, and the Behave test asserts on "No actors" (partial match).

This is a minor gap. The Behave test's substring check ("No actors") would pass even if the message changed to something like "No actors found. Run agents actor add to register one.", which may not be the intended behavior.

Recommendation: Add an empty-list example to the specification, and make the Behave assertion match the exact expected output string.

F9. [LOW] Temp Directory Leak Risk in Robot Tests

Both Robot test cases create temp directories via tempfile.mkdtemp() in an Evaluate expression. While [Teardown] handles cleanup, if Robot Framework crashes or is killed (OOM, signal), temp directories persist with initialized project configs. In a CI environment, this can accumulate over time.

Recommendation: Use the suite-level Setup Test Environment/Cleanup Test Environment from common.resource for temp dir management instead of per-test mkdtemp, to align with the existing pattern and get centralized cleanup.

Summary Table

ID	Severity	Category	Description
F1	HIGH	Process	`fix()` commit prefix on a test-only commit; duplicate commit messages on same branch
F2	HIGH	Bug	Benchmark `_FakeProviderInfo` uses `MagicMock` for capabilities; `asdict()` crashes silently; tracker permanently reports failure
F3	HIGH	Test Flaw	Behave scenarios 1-2 fully mock the registry, never exercise the buggy code path
F4	MEDIUM	Test Flaw	Robot test "slash separator error" never configures a multi-slash provider
F5	MEDIUM	Test Flaw	Bare `except Exception` in benchmark swallows all errors, not just the target bug
F6	MEDIUM	Coverage	No edge-case tests for boundary model names (consecutive slashes, empty, provider-side slashes)
F7	LOW	Docs	CHANGELOG references `@wip` tags that were subsequently removed
F8	LOW	Spec	Specification lacks empty-list example for `actor list`; Behave assertion is a loose substring match
F9	LOW	Security	Temp directories may persist if Robot tests crash, leaking project config to disk

# Code Review Report: Brent's Commit `7a4f20de` on `feature/m3-fix-actor-list-empty` **Commit:** `7a4f20de6bcf3b75212e40e5c04d912498db6129` **Author:** Brent E. Edwards (`brent.edwards@cleverthis.com`) **Message:** `fix(actor): handle empty actor list without validation error` **Issue:** #592 — `agents actor list` without actors should show no actors **Branch:** `feature/m3-fix-actor-list-empty` ## Scope of Changes | File | Type | Lines | |---|---|---| | `CHANGELOG.md` | Modified | +6 | | `benchmarks/actor_list_empty_bench.py` | **New** | +117 | | `features/actor_list_empty.feature` | **New** | +27 | | `features/steps/actor_list_empty_steps.py` | **New** | +183 | | `robot/actor_list_empty.robot` | **New** | +45 | The commit adds TDD failing tests only. It contains **no production code fix**. The actual bug fix (sanitizing model names in `_actor_name()`) was applied by Jeffrey Freeman in a separate commit `3eba9706` on top. --- ## Findings ### F1. [HIGH] Misleading Commit Message — `fix()` prefix with no fix The commit message uses the `fix(actor):` conventional-commit prefix, which signals a bug fix. However, this commit contains **zero production code changes** — only test scaffolding and a CHANGELOG entry. The issue's Metadata section mandates this exact commit message (`fix(actor): handle empty actor list without validation error`), but using it for a test-only commit is semantically wrong. Jeffrey later used the **same** commit message for his actual fix commit (`3eba9706`), creating two commits with identical messages on the same branch — one with only tests, one with the fix. This creates confusion in `git log` and `git bisect`. **Recommendation:** The commit message should have been `test(actor): add TDD failing tests for empty actor list validation error` for Brent's commit, reserving the `fix()` prefix for the commit that actually applies the fix. --- ### F2. [HIGH] Benchmark File Uses `MagicMock` for `capabilities` — Will Silently Fail Even After Fix `benchmarks/actor_list_empty_bench.py:41` uses `MagicMock()` for `_FakeProviderInfo.capabilities`: ```python # actor_list_empty_bench.py:39-41 if self.capabilities is None: self.capabilities = MagicMock() ``` The production code at `registry.py:122` calls `dataclasses.asdict(info.capabilities)`. `asdict()` on a `MagicMock` raises `TypeError` because `MagicMock` is not a dataclass. This means: - `time_list_multi_slash_provider()` (line 98-103) catches the exception with a bare `except Exception: pass` and reports nothing. - `track_multi_slash_succeeds()` (line 105-114) will **always return 0** (failure), even after the fix is applied, because the `MagicMock` + `asdict()` crash happens before the name-validation code path is reached. Jeffrey fixed this in the step definitions (`features/steps/actor_list_empty_steps.py`) by switching to `ProviderCapabilities()`, but **the benchmark file was never updated**. This makes the benchmark's `track_multi_slash_succeeds` tracker permanently broken as a regression indicator. **Recommendation:** Replace `MagicMock()` with `ProviderCapabilities()` in the benchmark file, matching the fix already applied to the step definitions. --- ### F3. [HIGH] Behave Scenarios 1 & 2 Never Exercise the Buggy Code Path Scenarios 1 ("no actors message") and 2 ("no validation error") both use a **fully-mocked** `ActorRegistry`: ```python # actor_list_empty_steps.py:80-87 (step_no_providers) mock_registry = MagicMock() mock_registry.list_actors.return_value = [] context.actor_empty_registry = mock_registry ``` Because `list_actors()` is mocked to return `[]`, the real `ActorRegistry.list_actors()` → `ensure_built_in_actors()` → `_actor_name()` → `_normalize_name()` chain is **never called**. These scenarios will pass whether the bug exists or not. They test the CLI rendering of an empty list, not the actual bug. Only Scenario 3 ("multi-slash model provider does not error") uses the real `ActorRegistry` and exercises the buggy path. This means two out of three scenarios marketed as TDD tests for bug #592 do not actually validate the bug. **Recommendation:** Scenarios 1-2 should be documented as CLI output tests rather than bug-regression tests. Alternatively, they should use the real `ActorRegistry` with zero providers (which would exercise `ensure_built_in_actors()` returning `[]` on the early-exit path at `registry.py:100-101`). --- ### F4. [MEDIUM] Robot Tests Cannot Reproduce the Multi-Slash Bug `robot/actor_list_empty.robot` test case "Actor List Does Not Show Slash Separator Error" (line 32-45): ```robot Run Process ${PYTHON} -m cleveragents init actor-slash-test ${result}= Run Process ${PYTHON} -m cleveragents actor list ``` This runs against a **fresh environment** with no configured providers. It never configures an OpenRouter-like provider with a multi-slash model name. Both Robot test cases are therefore testing the same code path (fresh env, no providers), despite the second one claiming to check "slash separator error". The multi-slash code path (`_actor_name()` producing names with 2+ slashes) is never triggered. **Recommendation:** Either inject a multi-slash provider configuration into the temp environment before running `actor list`, or rename the test to accurately reflect what it validates. --- ### F5. [MEDIUM] Bare `except Exception` in Benchmark Swallows All Errors `benchmarks/actor_list_empty_bench.py:100-103` and `110-113`: ```python try: self._slash_registry.list_actors() except Exception: pass # Expected while bug is present ``` This catches **every** exception type — not just the `ValidationError` from the bug. It will silently swallow `TypeError` from the `MagicMock`/`asdict` issue (F2), `ImportError`, `AttributeError`, or any other unexpected failure. After the fix, if a different regression is introduced, the benchmark will report `success=0` without any diagnostic information. **Recommendation:** Catch only `ValidationError` (the expected bug exception) in the "expected failure" path, and let other exceptions propagate so ASV reports them as benchmark errors. --- ### F6. [MEDIUM] Missing Edge-Case Test Coverage The test suite covers two scenarios: zero providers and one provider with `anthropic/claude-sonnet-4-20250514`. Missing edge cases for the name-construction logic: | Missing Case | Why It Matters | |---|---| | Model with multiple consecutive slashes (`a//b/c`) | Tests robustness of `replace("/", "-")` | | Model name that is only slashes (`///`) | Produces empty-looking names | | Provider name containing slashes | The fix only sanitizes model names, not provider names | | Mix of normal and multi-slash providers | Tests that valid providers aren't affected | | Model with leading/trailing slashes (`/model/`) | Edge behavior with `-` prefixes/suffixes | **Recommendation:** Add at least a parametrized Behave scenario or a pytest unit test covering these boundary cases. --- ### F7. [LOW] CHANGELOG Entry Describes `@wip` Tags That Were Later Removed `CHANGELOG.md:27` states: > Includes 3 Behave BDD scenarios (with `@wip` tag) Jeffrey's subsequent commit (`3eba9706`) removed the `@wip` tags from all three scenarios. The CHANGELOG entry is now stale. Since the CHANGELOG represents the released state, it should describe the final state of the feature. **Recommendation:** Update the CHANGELOG entry to remove the `@wip` reference, or note that the scenarios are active. --- ### F8. [LOW] Specification Has No Empty-List Example for `actor list` The specification at `docs/specification.md:5382-5511` documents `agents actor list` only for the case where actors exist. The empty-list case (`"No actors configured."`) has no specification example. The CLI code at `actor.py:594-596` prints this message, and the Behave test asserts on `"No actors"` (partial match). This is a minor gap. The Behave test's substring check (`"No actors"`) would pass even if the message changed to something like `"No actors found. Run agents actor add to register one."`, which may not be the intended behavior. **Recommendation:** Add an empty-list example to the specification, and make the Behave assertion match the exact expected output string. --- ### F9. [LOW] Temp Directory Leak Risk in Robot Tests Both Robot test cases create temp directories via `tempfile.mkdtemp()` in an `Evaluate` expression. While `[Teardown]` handles cleanup, if Robot Framework crashes or is killed (OOM, signal), temp directories persist with initialized project configs. In a CI environment, this can accumulate over time. **Recommendation:** Use the suite-level `Setup Test Environment`/`Cleanup Test Environment` from `common.resource` for temp dir management instead of per-test `mkdtemp`, to align with the existing pattern and get centralized cleanup. --- ## Summary Table | ID | Severity | Category | Description | |----|----------|----------|-------------| | F1 | HIGH | Process | `fix()` commit prefix on a test-only commit; duplicate commit messages on same branch | | F2 | HIGH | Bug | Benchmark `_FakeProviderInfo` uses `MagicMock` for capabilities; `asdict()` crashes silently; tracker permanently reports failure | | F3 | HIGH | Test Flaw | Behave scenarios 1-2 fully mock the registry, never exercise the buggy code path | | F4 | MEDIUM | Test Flaw | Robot test "slash separator error" never configures a multi-slash provider | | F5 | MEDIUM | Test Flaw | Bare `except Exception` in benchmark swallows all errors, not just the target bug | | F6 | MEDIUM | Coverage | No edge-case tests for boundary model names (consecutive slashes, empty, provider-side slashes) | | F7 | LOW | Docs | CHANGELOG references `@wip` tags that were subsequently removed | | F8 | LOW | Spec | Specification lacks empty-list example for `actor list`; Behave assertion is a loose substring match | | F9 | LOW | Security | Temp directories may persist if Robot tests crash, leaking project config to disk |

brent.edwards commented

2026-03-05 20:22:52 +00:00

All actionable findings from CoreRasurae's review (comment #54364) have been addressed in commit ce346b72:

Finding	Severity	Resolution
F1	HIGH	Acknowledged — can't rewrite the existing commit message, but this new commit uses `test(actor):` prefix. The original `fix()` prefix was mandated by the issue's Metadata field; future TDD-only commits will use `test()`.
F2	HIGH	Benchmark: replaced `MagicMock()` with `ProviderCapabilities()` for `_FakeProviderInfo.capabilities`, so `dataclasses.asdict()` no longer crashes. `track_multi_slash_succeeds` now correctly returns 1 after the fix.
F3	HIGH	Feature: added comment block above scenarios 1-2 clarifying they test CLI rendering of an empty actor list, NOT the buggy code path (which is exercised by scenario 3 and the new edge-case scenarios).
F4	MEDIUM	Robot: renamed test 2 to "Actor List On Fresh Project Does Not Show Slash Validation Message" with documentation noting it validates fresh-env behavior, not multi-slash provider injection.
F5	MEDIUM	Benchmark: replaced bare `except Exception` with `except ValidationError` / `contextlib.suppress(ValidationError)`.
F6	MEDIUM	Feature + Steps: added 2 edge-case scenarios — consecutive slashes (`a//b/c`) and leading slash (`/leading-slash`) — exercised via a new parametrized step `step_custom_model_provider`.
F7	LOW	CHANGELOG: updated entry to "Includes 5 Behave BDD scenarios" (removed stale `@wip` reference).
F8	LOW	Acknowledged — empty-list specification example is a valid gap but out of scope for this PR.
F9	LOW	Acknowledged — temp directory leak risk is shared with all Robot tests in the repo and is low-impact in CI.

Additionally applied proactively:

Rewrote benchmark imports to _SRC/importlib.reload(cleveragents) convention
Added agents init exit code checks in both Robot test cases

Pyright typecheck and ruff lint both pass. Behave dry-run confirms all 5 scenarios parse with correct step matching.

All actionable findings from CoreRasurae's review (comment #54364) have been addressed in commit `ce346b72`: | Finding | Severity | Resolution | |---------|----------|------------| | **F1** | HIGH | Acknowledged — can't rewrite the existing commit message, but this new commit uses `test(actor):` prefix. The original `fix()` prefix was mandated by the issue's Metadata field; future TDD-only commits will use `test()`. | | **F2** | HIGH | Benchmark: replaced `MagicMock()` with `ProviderCapabilities()` for `_FakeProviderInfo.capabilities`, so `dataclasses.asdict()` no longer crashes. `track_multi_slash_succeeds` now correctly returns 1 after the fix. | | **F3** | HIGH | Feature: added comment block above scenarios 1-2 clarifying they test CLI rendering of an empty actor list, NOT the buggy code path (which is exercised by scenario 3 and the new edge-case scenarios). | | **F4** | MEDIUM | Robot: renamed test 2 to "Actor List On Fresh Project Does Not Show Slash Validation Message" with documentation noting it validates fresh-env behavior, not multi-slash provider injection. | | **F5** | MEDIUM | Benchmark: replaced bare `except Exception` with `except ValidationError` / `contextlib.suppress(ValidationError)`. | | **F6** | MEDIUM | Feature + Steps: added 2 edge-case scenarios — consecutive slashes (`a//b/c`) and leading slash (`/leading-slash`) — exercised via a new parametrized step `step_custom_model_provider`. | | **F7** | LOW | CHANGELOG: updated entry to "Includes 5 Behave BDD scenarios" (removed stale `@wip` reference). | | **F8** | LOW | Acknowledged — empty-list specification example is a valid gap but out of scope for this PR. | | **F9** | LOW | Acknowledged — temp directory leak risk is shared with all Robot tests in the repo and is low-impact in CI. | Additionally applied proactively: - Rewrote benchmark imports to `_SRC`/`importlib.reload(cleveragents)` convention - Added `agents init` exit code checks in both Robot test cases Pyright typecheck and ruff lint both pass. Behave dry-run confirms all 5 scenarios parse with correct step matching.

hurui200320 requested changes 2026-03-06 08:04:04 +00:00

Dismissed

hurui200320 left a comment

Code Review: PR #594 — fix(actor): handle empty actor list without validation error

Reviewer: Rui Hu (hurui200320)
Issue: #592 — agents actor list without actors should show no actors
Branch: feature/m3-fix-actor-list-empty

Overview

The PR fixes bug #592 where agents actor list on a fresh project raises ValidationError for providers with multi-slash model names (e.g., OpenRouter's anthropic/claude-sonnet-4-20250514). The fix in _actor_name() replaces / with - in the model name portion. The fix itself is correct in principle, but the PR has significant process violations and testing flaws that must be addressed before merge.

CRITICAL (blocking merge)

C1. Branch contains a merge commit

Location: commit a7786cf9
Problem: Merge branch 'master-latest' into feature/m3-fix-actor-list-empty is a merge commit. CONTRIBUTING.md states: "Each branch must not contain merge commits — branches should always align with master via rebase."
Recommendation: Rebase the branch onto master instead of merging. Run git rebase origin/master and force-push.

C2. Multiple commits for one issue — must be squashed into one

Location: commits 7a4f20de, 3eba9706, ce346b72, a7786cf9
Problem: The branch has 4 commits for one issue. CONTRIBUTING.md says "Every commit must completely implement an issue and close it" and "at no point should a branch contain multiple commits addressing the same issue, nor should it have commits that fix earlier commits present in the same branch." Furthermore, commits 7a4f20de and 3eba9706 share the identical first line fix(actor): handle empty actor list without validation error.
Recommendation: Interactive rebase to squash all commits into a single commit with first line fix(actor): handle empty actor list without validation error (matching the issue Metadata), followed by a body describing the full change and ISSUES CLOSED: #592.

C3. Behave scenarios 3–5 do not actually test the bug fix

Location: features/steps/actor_list_empty_steps.py:148 and :104
Problem: These scenarios mock ActorService.upsert_actor() with a lambda that always succeeds. The real _normalize_name() (which rejects multi-slash names at actor_service.py:32) is never called. These tests pass both before and after the fix. The original test failures in commit 7a4f20de were caused by TypeError from asdict(MagicMock()) at registry.py:121, NOT by the multi-slash validation error. After Jeffrey replaced MagicMock() with ProviderCapabilities(), the TypeError went away, and the mocked upsert_actor never validates names — so the name sanitization is effectively untested.
Recommendation: Either (a) use the real ActorService instead of a mock (with a mocked UnitOfWork), so _normalize_name() is actually invoked, or (b) assert that mock_service.upsert_actor was called with a correctly sanitized name, e.g.:
```
mock_service.upsert_actor.assert_called_once()
call_args = mock_service.upsert_actor.call_args
assert call_args.kwargs['name'].count('/') == 1
```

C4. `# type: ignore` comments violate CONTRIBUTING.md

Location: features/steps/actor_list_empty_steps.py:52,53,117,161 and benchmarks/actor_list_empty_bench.py:84,126
Problem: 6 occurrences of # type: ignore. CONTRIBUTING.md says: "Never use # type: ignore or disable type checking."
Recommendation: For the arg-type suppressions, create a proper protocol or interface for the fake provider registry so Pyright accepts it. For the attr-defined suppressions, restructure the dataclass to avoid Pyright errors (e.g., give explicit types that don't require suppression).

HIGH

H1. PR Type label mismatch

Problem: PR has Type/Testing but issue #592 is Type/Bug. CONTRIBUTING.md: "Every PR must carry exactly one Type/ label" matching the issue type. Since the PR includes the production fix in registry.py, it should be Type/Bug.
Recommendation: Replace Type/Testing with Type/Bug.

H2. Mocking code not in `features/mocks/`

Location: features/steps/actor_list_empty_steps.py:46-69 and benchmarks/actor_list_empty_bench.py:39-62
Problem: _FakeProviderInfo and _FakeProviderRegistry are test doubles defined directly in step files and duplicated in benchmarks. CONTRIBUTING.md: "Mocking code belongs under features/mocks/."
Recommendation: Extract to features/mocks/fake_provider.py and import from both locations. Also eliminates the duplication.

H3. `_actor_name()` doesn't enforce lowercase — spec deviation

Location: src/cleveragents/actor/registry.py:42-44
Problem: The specification defines actor name format as ^[a-z0-9_-]+/[a-z0-9_-]+$ (lowercase only). But _actor_name() uses provider names as-is (e.g., "Openrouter" with capital O produces "Openrouter/anthropic-claude-..." which violates the spec regex).
Recommendation: Apply .lower() to both provider_name and sanitized_model in _actor_name().

MEDIUM

M1. `_actor_name` doesn't sanitize provider name

Location: src/cleveragents/actor/registry.py:42-44
Problem: Only the model name is sanitized for /. If a provider name ever contains /, the result would still have multiple slashes. For robustness, both components should be sanitized.
Recommendation: Also sanitize provider_name, or at minimum add an assertion that it contains no /.

M2. Robot tests don't exercise the multi-slash code path

Location: robot/actor_list_empty.robot:14-51
Problem: Both Robot tests create fresh environments with no configured providers. They never configure a provider with a multi-slash model name. They both test the same early-return path at registry.py:100-101 (no configured providers).
Recommendation: Document clearly that these are fresh-env smoke tests only, or add a test that injects a multi-slash provider configuration.

M3. CHANGELOG entry describes only "TDD failing tests" but PR includes the fix

Location: CHANGELOG.md:36-41
Problem: The CHANGELOG says "Added TDD failing tests for agents actor list..." but the PR also contains the production fix. This misrepresents the scope of the change.
Recommendation: Update to describe the fix (e.g., "Fixed agents actor list raising ValidationError on providers with multi-slash model names...").

LOW

L1. PR is not mergeable

Problem: PR status shows mergeable: false, likely due to the merge commit or master drift.
Recommendation: Rebase on latest master (addresses C1 as well).

L2. Benchmark `time_list_multi_slash_provider` suppresses errors

Location: benchmarks/actor_list_empty_bench.py:111
Problem: contextlib.suppress(ValidationError) silently hides the error the benchmark is supposed to detect. After the fix is applied, this suppression is a no-op, but if a regression occurs, the benchmark would silently succeed instead of detecting it.
Recommendation: Remove contextlib.suppress() now that the fix is applied — the benchmark should fail loudly on regression.

Summary Table

ID	Severity	Category	Description
C1	CRITICAL	Process	Branch contains a merge commit; must rebase
C2	CRITICAL	Process	4 commits for 1 issue; must squash to 1
C3	CRITICAL	Test Flaw	Behave scenarios 3–5 mock bypasses `_normalize_name()`; fix is untested
C4	CRITICAL	Process	6 `# type: ignore` comments violate CONTRIBUTING.md
H1	HIGH	Process	PR label `Type/Testing` should be `Type/Bug`
H2	HIGH	Process	Mock code in step files instead of `features/mocks/`
H3	HIGH	Bug	`_actor_name()` doesn't lowercase — spec requires `^[a-z0-9_-]+/[a-z0-9_-]+$`
M1	MEDIUM	Bug	Provider name not sanitized for `/`
M2	MEDIUM	Test Flaw	Robot tests don't exercise multi-slash path
M3	MEDIUM	Docs	CHANGELOG describes tests only, not the fix
L1	LOW	Process	PR not mergeable due to merge commit / drift
L2	LOW	Test Flaw	Benchmark suppresses the error it should detect

Verdict: REQUEST CHANGES — The fix logic is correct, but critical process violations and test coverage gaps must be resolved.

# Code Review: PR #594 — fix(actor): handle empty actor list without validation error **Reviewer:** Rui Hu (`hurui200320`) **Issue:** #592 — `agents actor list` without actors should show no actors **Branch:** `feature/m3-fix-actor-list-empty` ## Overview The PR fixes bug #592 where `agents actor list` on a fresh project raises `ValidationError` for providers with multi-slash model names (e.g., OpenRouter's `anthropic/claude-sonnet-4-20250514`). The fix in `_actor_name()` replaces `/` with `-` in the model name portion. The fix itself is correct in principle, but the PR has significant process violations and testing flaws that must be addressed before merge. --- ## CRITICAL (blocking merge) ### C1. Branch contains a merge commit - **Location:** commit `a7786cf9` - **Problem:** `Merge branch 'master-latest' into feature/m3-fix-actor-list-empty` is a merge commit. CONTRIBUTING.md states: *"Each branch must not contain merge commits — branches should always align with master via rebase."* - **Recommendation:** Rebase the branch onto master instead of merging. Run `git rebase origin/master` and force-push. ### C2. Multiple commits for one issue — must be squashed into one - **Location:** commits `7a4f20de`, `3eba9706`, `ce346b72`, `a7786cf9` - **Problem:** The branch has 4 commits for one issue. CONTRIBUTING.md says *"Every commit must completely implement an issue and close it"* and *"at no point should a branch contain multiple commits addressing the same issue, nor should it have commits that fix earlier commits present in the same branch."* Furthermore, commits `7a4f20de` and `3eba9706` share the identical first line `fix(actor): handle empty actor list without validation error`. - **Recommendation:** Interactive rebase to squash all commits into a single commit with first line `fix(actor): handle empty actor list without validation error` (matching the issue Metadata), followed by a body describing the full change and `ISSUES CLOSED: #592`. ### C3. Behave scenarios 3–5 do not actually test the bug fix - **Location:** `features/steps/actor_list_empty_steps.py:148` and `:104` - **Problem:** These scenarios mock `ActorService.upsert_actor()` with a lambda that always succeeds. The real `_normalize_name()` (which rejects multi-slash names at `actor_service.py:32`) is never called. These tests pass both **before** and **after** the fix. The original test failures in commit `7a4f20de` were caused by `TypeError` from `asdict(MagicMock())` at `registry.py:121`, NOT by the multi-slash validation error. After Jeffrey replaced `MagicMock()` with `ProviderCapabilities()`, the TypeError went away, and the mocked `upsert_actor` never validates names — so the name sanitization is effectively untested. - **Recommendation:** Either (a) use the real `ActorService` instead of a mock (with a mocked `UnitOfWork`), so `_normalize_name()` is actually invoked, or (b) assert that `mock_service.upsert_actor` was called with a correctly sanitized name, e.g.: ```python mock_service.upsert_actor.assert_called_once() call_args = mock_service.upsert_actor.call_args assert call_args.kwargs['name'].count('/') == 1 ``` ### C4. `# type: ignore` comments violate CONTRIBUTING.md - **Location:** `features/steps/actor_list_empty_steps.py:52,53,117,161` and `benchmarks/actor_list_empty_bench.py:84,126` - **Problem:** 6 occurrences of `# type: ignore`. CONTRIBUTING.md says: *"Never use `# type: ignore` or disable type checking."* - **Recommendation:** For the `arg-type` suppressions, create a proper protocol or interface for the fake provider registry so Pyright accepts it. For the `attr-defined` suppressions, restructure the dataclass to avoid Pyright errors (e.g., give explicit types that don't require suppression). --- ## HIGH ### H1. PR Type label mismatch - **Problem:** PR has `Type/Testing` but issue #592 is `Type/Bug`. CONTRIBUTING.md: *"Every PR must carry exactly one `Type/` label"* matching the issue type. Since the PR includes the production fix in `registry.py`, it should be `Type/Bug`. - **Recommendation:** Replace `Type/Testing` with `Type/Bug`. ### H2. Mocking code not in `features/mocks/` - **Location:** `features/steps/actor_list_empty_steps.py:46-69` and `benchmarks/actor_list_empty_bench.py:39-62` - **Problem:** `_FakeProviderInfo` and `_FakeProviderRegistry` are test doubles defined directly in step files and duplicated in benchmarks. CONTRIBUTING.md: *"Mocking code belongs under `features/mocks/`."* - **Recommendation:** Extract to `features/mocks/fake_provider.py` and import from both locations. Also eliminates the duplication. ### H3. `_actor_name()` doesn't enforce lowercase — spec deviation - **Location:** `src/cleveragents/actor/registry.py:42-44` - **Problem:** The specification defines actor name format as `^[a-z0-9_-]+/[a-z0-9_-]+$` (lowercase only). But `_actor_name()` uses provider names as-is (e.g., `"Openrouter"` with capital O produces `"Openrouter/anthropic-claude-..."` which violates the spec regex). - **Recommendation:** Apply `.lower()` to both `provider_name` and `sanitized_model` in `_actor_name()`. --- ## MEDIUM ### M1. `_actor_name` doesn't sanitize provider name - **Location:** `src/cleveragents/actor/registry.py:42-44` - **Problem:** Only the model name is sanitized for `/`. If a provider name ever contains `/`, the result would still have multiple slashes. For robustness, both components should be sanitized. - **Recommendation:** Also sanitize `provider_name`, or at minimum add an assertion that it contains no `/`. ### M2. Robot tests don't exercise the multi-slash code path - **Location:** `robot/actor_list_empty.robot:14-51` - **Problem:** Both Robot tests create fresh environments with no configured providers. They never configure a provider with a multi-slash model name. They both test the same early-return path at `registry.py:100-101` (no configured providers). - **Recommendation:** Document clearly that these are fresh-env smoke tests only, or add a test that injects a multi-slash provider configuration. ### M3. CHANGELOG entry describes only "TDD failing tests" but PR includes the fix - **Location:** `CHANGELOG.md:36-41` - **Problem:** The CHANGELOG says *"Added TDD failing tests for `agents actor list`..."* but the PR also contains the production fix. This misrepresents the scope of the change. - **Recommendation:** Update to describe the fix (e.g., *"Fixed `agents actor list` raising `ValidationError` on providers with multi-slash model names..."*). --- ## LOW ### L1. PR is not mergeable - **Problem:** PR status shows `mergeable: false`, likely due to the merge commit or master drift. - **Recommendation:** Rebase on latest master (addresses C1 as well). ### L2. Benchmark `time_list_multi_slash_provider` suppresses errors - **Location:** `benchmarks/actor_list_empty_bench.py:111` - **Problem:** `contextlib.suppress(ValidationError)` silently hides the error the benchmark is supposed to detect. After the fix is applied, this suppression is a no-op, but if a regression occurs, the benchmark would silently succeed instead of detecting it. - **Recommendation:** Remove `contextlib.suppress()` now that the fix is applied — the benchmark should fail loudly on regression. --- ## Summary Table | ID | Severity | Category | Description | |----|----------|----------|-------------| | C1 | CRITICAL | Process | Branch contains a merge commit; must rebase | | C2 | CRITICAL | Process | 4 commits for 1 issue; must squash to 1 | | C3 | CRITICAL | Test Flaw | Behave scenarios 3–5 mock bypasses `_normalize_name()`; fix is untested | | C4 | CRITICAL | Process | 6 `# type: ignore` comments violate CONTRIBUTING.md | | H1 | HIGH | Process | PR label `Type/Testing` should be `Type/Bug` | | H2 | HIGH | Process | Mock code in step files instead of `features/mocks/` | | H3 | HIGH | Bug | `_actor_name()` doesn't lowercase — spec requires `^[a-z0-9_-]+/[a-z0-9_-]+$` | | M1 | MEDIUM | Bug | Provider name not sanitized for `/` | | M2 | MEDIUM | Test Flaw | Robot tests don't exercise multi-slash path | | M3 | MEDIUM | Docs | CHANGELOG describes tests only, not the fix | | L1 | LOW | Process | PR not mergeable due to merge commit / drift | | L2 | LOW | Test Flaw | Benchmark suppresses the error it should detect | **Verdict: REQUEST CHANGES** — The fix logic is correct, but critical process violations and test coverage gaps must be resolved.

aditya commented

2026-03-06 08:44:21 +00:00

Code Review Report: Brent's Commit ce346b72 on feature/m3-fix-actor-list-empty

Commit: ce346b72c9
Author: Brent E. Edwards (brent.edwards@cleverthis.com)
Message: test(actor): address review findings for actor-list-empty TDD
Issue: #592 — agents actor list without actors should show no actors
Branch: feature/m3-fix-actor-list-empty
Reviewer: Aditya

Findings

F1. [HIGH] `# type: ignore` suppressions violate the project's Type Safety policy

CONTRIBUTING.md is explicit: "never use inline comments or annotations to suppress individual type checking errors (e.g., no type: ignore)." The PR introduces five such suppressions:

# features/steps/actor_list_empty_steps.py:259-260
provider_type: Any = None  # type: ignore[attr-defined]
capabilities: Any = None   # type: ignore[attr-defined]

# features/steps/actor_list_empty_steps.py:324, 371 and benchmarks/actor_list_empty_bench.py:107
provider_registry=provider_reg,  # type: ignore[arg-type]

All five arise from the same root cause: _FakeProviderInfo is a plain @dataclass that does not satisfy the interface ActorRegistry.__init__ expects. The existing actor_registry_full_coverage_steps.py solves this cleanly by using the real ProviderInfo Pydantic model with ProviderCapabilities() defaults — no type: ignore required anywhere. The new code should follow the same approach.

Recommendation: Remove all # type: ignore suppressions. Replace _FakeProviderInfo with a real ProviderInfo instance, e.g.:

ProviderInfo(
    provider_type=ProviderType.OPENROUTER,
    name="Openrouter",
    api_key_env_var="OPENROUTER_API_KEY",
    default_model="anthropic/claude-sonnet-4-20250514",
    is_configured=True,
)

F2. [HIGH] Imports buried inside step functions violate the Import Guidelines

CONTRIBUTING.md, Project-Specific section: "Ensure all imports are at the top of the Python file. Do not scatter imports throughout the file or bury them inside functions or methods."

features/steps/actor_list_empty_steps.py violates this in two step functions:

# line 300 — inside step_custom_model_provider()
from cleveragents.actor.registry import ActorRegistry

# line 342 — inside step_multi_slash_provider()
from cleveragents.actor.registry import ActorRegistry

Recommendation: Move both imports to the module-level import block alongside the existing imports at lines 239–241.

F3. [HIGH] `_FakeProviderInfo` and `_FakeProviderRegistry` belong in `features/mocks/`, not in `features/steps/`

CONTRIBUTING.md: "Mocking code belongs under /features/mocks/. ALL mocks, test doubles, and mock implementations must exist only in the features/ directory." Placing them in features/steps/ is non-compliant. They are also duplicated verbatim in benchmarks/actor_list_empty_bench.py. A _FakeProviderRegistry already exists in actor_registry_full_coverage_steps.py using the real typed ProviderInfo model — the new file ignores it entirely.

Recommendation: Move the fake helpers into features/mocks/ and import them from there. If _FakeProviderInfo is replaced by real ProviderInfo (see F1), _FakeProviderRegistry can be eliminated from the step file entirely and the existing pattern from actor_registry_full_coverage_steps.py reused.

F4. [HIGH] First commit (`cba8d4fdf6`) intentionally leaves tests failing — violates Commit Completeness policy

CONTRIBUTING.md: "Each commit must build and pass all tests... Do not commit half-done work."

Brent's opening commit adds Scenario 3 which is explicitly described as "expected to fail until the fix is applied." This means the repo is in a nox-failing state at that commit point, breaking bisect-friendliness and CI integrity requirements.

Recommendation: Before merge, rebase the branch so the failing test and the fix that makes it pass land in the same atomic commit. The TDD workflow rationale belongs in the PR description, not in broken intermediate commits.

F5. [MEDIUM] Production fix does not sanitize provider names — the bug can recur from a different source

The fix in registry.py:

sanitized_model = model_name.replace("/", "-")
return f"{provider_name}/{sanitized_model}"

Only model_name is sanitized. provider_name is taken from info.name or info.provider_type.value (registry.py:106). A misconfigured provider whose name contains / would produce the same multi-slash output and trigger the identical ValidationError. CoreRasurae's F6 table listed "Provider name containing slashes" as a missing case; Brent's edge-case scenarios added only model-name cases and did not address this.

Recommendation: Either sanitize provider_name symmetrically, or add an explicit test for a provider name containing / and document the known limitation on _actor_name().

F6. [MEDIUM] Sanitized actor name diverges from stored `model` field — potential round-trip bug

ensure_built_in_actors() calls:

actor = self._actor_service.upsert_actor(
    name=self._actor_name(provider_name, model_id),  # sanitized: 'anthropic-claude-...'
    model=model_id,                                   # original:  'anthropic/claude-...'
    ...
)

The actor's name stores Openrouter/anthropic-claude-sonnet-4-20250514 (sanitized) while actor.model stores anthropic/claude-sonnet-4-20250514 (original). If any downstream code reconstructs an actor name from actor.provider + "/" + actor.model, it regenerates a multi-slash name and re-triggers the bug. This inconsistency is currently untested.

Recommendation: Add a scenario that retrieves an actor by name after listing and verifies the sanitized name remains valid across a full round-trip.

F7. [MEDIUM] CHANGELOG entry is temporally misleading — describes TDD failing tests that now pass

The CHANGELOG entry reads: "Added TDD failing tests for agents actor list raising a validation error..." With the production fix included in the branch, these are no longer failing tests — they are active passing regression tests. The CHANGELOG should describe the final observable state of the release, not the intermediate TDD workflow.

Recommendation: Rewrite the entry to describe the fix and its test coverage, e.g. "Fixed ActorRegistry._actor_name() to sanitize model names containing /, preventing a ValidationError when listing actors with providers like OpenRouter. Adds 5 Behave regression scenarios, Robot Framework integration tests, and ASV benchmarks. (#592)"

F8. [LOW] Robot tests create per-test `tmpdir` despite Suite Setup already provisioning one

Both Robot test cases call Evaluate __import__('tempfile').mkdtemp(...) inline, creating a second independent temp directory, even though the suite already declares Suite Setup Setup Test Environment / Suite Teardown Cleanup Test Environment from common.resource. This is inconsistent with all other Robot files in the repo, which use the suite-level environment exclusively.

Recommendation: Remove the per-test inline mkdtemp() and use the suite-level temp environment from common.resource, consistent with the repo-wide pattern.

F9. [LOW] `upsert_actor.side_effect` returns `MagicMock` where `Actor` is expected

In both step_multi_slash_provider and step_custom_model_provider:

mock_service.upsert_actor.side_effect = lambda **kwargs: MagicMock(
    name=kwargs.get("name", "mock"), ...
)

ensure_built_in_actors() appends the return value to actors: list[Actor] and passes it to set_default_actor(). This works at runtime via duck typing but is fragile — any future attribute access beyond name, provider, or model will silently return another MagicMock. The existing _FakeActorService in actor_registry_full_coverage_steps.py constructs a real Actor domain object and should be the model here.

Recommendation: Match the pattern in actor_registry_full_coverage_steps.py:_FakeActorService.upsert_actor() which returns a real Actor instance with Actor.compute_hash().

Summary Table

ID	Severity	Category	Description
F1	HIGH	Standards	`# type: ignore` used in 5 places; prohibited by CONTRIBUTING type-safety policy
F2	HIGH	Standards	Imports inside step functions; prohibited by CONTRIBUTING import guidelines
F3	HIGH	Standards	`_FakeProviderInfo`/`_FakeProviderRegistry` in `features/steps/` instead of `features/mocks/`
F4	HIGH	Process	Opening commit leaves `nox` failing; violates Commit Completeness policy
F5	MEDIUM	Bug	Provider name sanitization missing; bug can recur if provider name contains `/`
F6	MEDIUM	Bug	Sanitized actor name diverges from stored `model` field; round-trip untested
F7	MEDIUM	Docs	CHANGELOG describes "TDD failing tests" that now pass after fix
F8	LOW	Standards	Robot tests create per-test `tmpdir` inconsistent with `common.resource` pattern
F9	LOW	Test	`upsert_actor` mock returns `MagicMock` instead of real `Actor`; fragile duck-typing

# Code Review Report: Brent's Commit ce346b72 on feature/m3-fix-actor-list-empty **Commit:** ce346b72c9 **Author:** Brent E. Edwards (brent.edwards@cleverthis.com) **Message:** test(actor): address review findings for actor-list-empty TDD **Issue:** #592 — agents actor list without actors should show no actors **Branch:** feature/m3-fix-actor-list-empty **Reviewer:** Aditya --- ## Findings ### F1. [HIGH] `# type: ignore` suppressions violate the project's Type Safety policy `CONTRIBUTING.md` is explicit: *"never use inline comments or annotations to suppress individual type checking errors (e.g., no `type: ignore`)."* The PR introduces five such suppressions: ```python # features/steps/actor_list_empty_steps.py:259-260 provider_type: Any = None # type: ignore[attr-defined] capabilities: Any = None # type: ignore[attr-defined] # features/steps/actor_list_empty_steps.py:324, 371 and benchmarks/actor_list_empty_bench.py:107 provider_registry=provider_reg, # type: ignore[arg-type] ``` All five arise from the same root cause: `_FakeProviderInfo` is a plain `@dataclass` that does not satisfy the interface `ActorRegistry.__init__` expects. The existing `actor_registry_full_coverage_steps.py` solves this cleanly by using the real `ProviderInfo` Pydantic model with `ProviderCapabilities()` defaults — no `type: ignore` required anywhere. The new code should follow the same approach. **Recommendation:** Remove all `# type: ignore` suppressions. Replace `_FakeProviderInfo` with a real `ProviderInfo` instance, e.g.: ```python ProviderInfo( provider_type=ProviderType.OPENROUTER, name="Openrouter", api_key_env_var="OPENROUTER_API_KEY", default_model="anthropic/claude-sonnet-4-20250514", is_configured=True, ) ``` --- ### F2. [HIGH] Imports buried inside step functions violate the Import Guidelines `CONTRIBUTING.md`, Project-Specific section: *"Ensure all imports are at the top of the Python file. Do not scatter imports throughout the file or bury them inside functions or methods."* `features/steps/actor_list_empty_steps.py` violates this in two step functions: ```python # line 300 — inside step_custom_model_provider() from cleveragents.actor.registry import ActorRegistry # line 342 — inside step_multi_slash_provider() from cleveragents.actor.registry import ActorRegistry ``` **Recommendation:** Move both imports to the module-level import block alongside the existing imports at lines 239–241. --- ### F3. [HIGH] `_FakeProviderInfo` and `_FakeProviderRegistry` belong in `features/mocks/`, not in `features/steps/` `CONTRIBUTING.md`: *"Mocking code belongs under `/features/mocks/`. ALL mocks, test doubles, and mock implementations must exist only in the `features/` directory."* Placing them in `features/steps/` is non-compliant. They are also duplicated verbatim in `benchmarks/actor_list_empty_bench.py`. A `_FakeProviderRegistry` already exists in `actor_registry_full_coverage_steps.py` using the real typed `ProviderInfo` model — the new file ignores it entirely. **Recommendation:** Move the fake helpers into `features/mocks/` and import them from there. If `_FakeProviderInfo` is replaced by real `ProviderInfo` (see F1), `_FakeProviderRegistry` can be eliminated from the step file entirely and the existing pattern from `actor_registry_full_coverage_steps.py` reused. --- ### F4. [HIGH] First commit (`cba8d4fdf6`) intentionally leaves tests failing — violates Commit Completeness policy `CONTRIBUTING.md`: *"Each commit must build and pass all tests... Do not commit half-done work."* Brent's opening commit adds Scenario 3 which is explicitly described as "expected to fail until the fix is applied." This means the repo is in a `nox`-failing state at that commit point, breaking bisect-friendliness and CI integrity requirements. **Recommendation:** Before merge, rebase the branch so the failing test and the fix that makes it pass land in the same atomic commit. The TDD workflow rationale belongs in the PR description, not in broken intermediate commits. --- ### F5. [MEDIUM] Production fix does not sanitize provider names — the bug can recur from a different source The fix in `registry.py`: ```python sanitized_model = model_name.replace("/", "-") return f"{provider_name}/{sanitized_model}" ``` Only `model_name` is sanitized. `provider_name` is taken from `info.name or info.provider_type.value` (registry.py:106). A misconfigured provider whose `name` contains `/` would produce the same multi-slash output and trigger the identical `ValidationError`. CoreRasurae's F6 table listed "Provider name containing slashes" as a missing case; Brent's edge-case scenarios added only model-name cases and did not address this. **Recommendation:** Either sanitize `provider_name` symmetrically, or add an explicit test for a provider name containing `/` and document the known limitation on `_actor_name()`. --- ### F6. [MEDIUM] Sanitized actor name diverges from stored `model` field — potential round-trip bug `ensure_built_in_actors()` calls: ```python actor = self._actor_service.upsert_actor( name=self._actor_name(provider_name, model_id), # sanitized: 'anthropic-claude-...' model=model_id, # original: 'anthropic/claude-...' ... ) ``` The actor's `name` stores `Openrouter/anthropic-claude-sonnet-4-20250514` (sanitized) while `actor.model` stores `anthropic/claude-sonnet-4-20250514` (original). If any downstream code reconstructs an actor name from `actor.provider + "/" + actor.model`, it regenerates a multi-slash name and re-triggers the bug. This inconsistency is currently untested. **Recommendation:** Add a scenario that retrieves an actor by name after listing and verifies the sanitized name remains valid across a full round-trip. --- ### F7. [MEDIUM] CHANGELOG entry is temporally misleading — describes TDD failing tests that now pass The CHANGELOG entry reads: *"Added TDD failing tests for `agents actor list` raising a validation error..."* With the production fix included in the branch, these are no longer failing tests — they are active passing regression tests. The CHANGELOG should describe the final observable state of the release, not the intermediate TDD workflow. **Recommendation:** Rewrite the entry to describe the fix and its test coverage, e.g. *"Fixed `ActorRegistry._actor_name()` to sanitize model names containing `/`, preventing a `ValidationError` when listing actors with providers like OpenRouter. Adds 5 Behave regression scenarios, Robot Framework integration tests, and ASV benchmarks. (#592)"* --- ### F8. [LOW] Robot tests create per-test `tmpdir` despite Suite Setup already provisioning one Both Robot test cases call `Evaluate __import__('tempfile').mkdtemp(...)` inline, creating a second independent temp directory, even though the suite already declares `Suite Setup Setup Test Environment` / `Suite Teardown Cleanup Test Environment` from `common.resource`. This is inconsistent with all other Robot files in the repo, which use the suite-level environment exclusively. **Recommendation:** Remove the per-test inline `mkdtemp()` and use the suite-level temp environment from `common.resource`, consistent with the repo-wide pattern. --- ### F9. [LOW] `upsert_actor.side_effect` returns `MagicMock` where `Actor` is expected In both `step_multi_slash_provider` and `step_custom_model_provider`: ```python mock_service.upsert_actor.side_effect = lambda **kwargs: MagicMock( name=kwargs.get("name", "mock"), ... ) ``` `ensure_built_in_actors()` appends the return value to `actors: list[Actor]` and passes it to `set_default_actor()`. This works at runtime via duck typing but is fragile — any future attribute access beyond `name`, `provider`, or `model` will silently return another `MagicMock`. The existing `_FakeActorService` in `actor_registry_full_coverage_steps.py` constructs a real `Actor` domain object and should be the model here. **Recommendation:** Match the pattern in `actor_registry_full_coverage_steps.py:_FakeActorService.upsert_actor()` which returns a real `Actor` instance with `Actor.compute_hash()`. --- ## Summary Table | ID | Severity | Category | Description | |----|----------|----------|-------------| | F1 | HIGH | Standards | `# type: ignore` used in 5 places; prohibited by CONTRIBUTING type-safety policy | | F2 | HIGH | Standards | Imports inside step functions; prohibited by CONTRIBUTING import guidelines | | F3 | HIGH | Standards | `_FakeProviderInfo`/`_FakeProviderRegistry` in `features/steps/` instead of `features/mocks/` | | F4 | HIGH | Process | Opening commit leaves `nox` failing; violates Commit Completeness policy | | F5 | MEDIUM | Bug | Provider name sanitization missing; bug can recur if provider name contains `/` | | F6 | MEDIUM | Bug | Sanitized actor name diverges from stored `model` field; round-trip untested | | F7 | MEDIUM | Docs | CHANGELOG describes "TDD failing tests" that now pass after fix | | F8 | LOW | Standards | Robot tests create per-test `tmpdir` inconsistent with `common.resource` pattern | | F9 | LOW | Test | `upsert_actor` mock returns `MagicMock` instead of real `Actor`; fragile duck-typing |

hurui200320 added a new dependency 2026-03-06 09:39:29 +00:00

#592 `agents actor list` without actors should show no actors.

hurui200320 referenced this pull request

2026-03-06 10:09:20 +00:00

test(e2e): validate M3 acceptance criteria for v3.2.0 milestone closure #494

brent.edwards force-pushed feature/m3-fix-actor-list-empty from a7786cf975 to 894db4cc3f

2026-03-06 21:01:48 +00:00

Compare

brent.edwards commented

2026-03-06 21:02:08 +00:00

Branch has been squashed into a single commit (894db4cc) and rebased onto master. All review feedback from hurui200320 and Aditya has been addressed:

Review fixes applied:

H2/F3: Moved _FakeProviderInfo/_FakeProviderRegistry to features/mocks/fake_provider.py
H3: Added .lower() to enforce lowercase actor names
H4/F2: Imports already at top of file; removed unused dataclass/field imports
M1/F5: Sanitise provider name for / (not just model name)
M3/F7: Rewrote CHANGELOG to describe the fix, not just tests
C4/F1: Removed all # type: ignore from step definitions
Updated all stale TDD "expected to fail" comments

Pre-commit hooks passed: lint, format, typecheck, bandit, vulture, conventional commit.

Ready for CI and merge.

Branch has been squashed into a single commit (`894db4cc`) and rebased onto `master`. All review feedback from hurui200320 and Aditya has been addressed: **Review fixes applied:** - **H2/F3**: Moved `_FakeProviderInfo`/`_FakeProviderRegistry` to `features/mocks/fake_provider.py` - **H3**: Added `.lower()` to enforce lowercase actor names - **H4/F2**: Imports already at top of file; removed unused `dataclass`/`field` imports - **M1/F5**: Sanitise provider name for `/` (not just model name) - **M3/F7**: Rewrote CHANGELOG to describe the fix, not just tests - **C4/F1**: Removed all `# type: ignore` from step definitions - Updated all stale TDD "expected to fail" comments **Pre-commit hooks passed:** lint, format, typecheck, bandit, vulture, conventional commit. Ready for CI and merge.

brent.edwards added

and removed

labels 2026-03-06 21:03:54 +00:00

freemo commented

2026-03-06 23:35:19 +00:00

PM Note (Day 26 — M3 PR Triage):

Production bug fix — handle empty actor list without validation error. All review findings addressed across ~3 rounds. Latest review stale after squash. Has a merge conflict.

@brent.edwards — Please rebase onto master. Needs fresh approval after rebase.

**PM Note (Day 26 — M3 PR Triage):** Production bug fix — handle empty actor list without validation error. All review findings addressed across ~3 rounds. Latest review stale after squash. Has a **merge conflict**. @brent.edwards — Please rebase onto `master`. Needs fresh approval after rebase.

hamza.khyari requested changes 2026-03-06 23:38:32 +00:00

hamza.khyari left a comment

PR #594 Review — `fix(actor): handle empty actor list without validation error`

Commit: 894db4cc | 7 files, +442/-1 | Issue: #592

The 3-line production fix in _actor_name() is correct for the immediate multi-slash bug. However, the test suite doesn't actually validate the fix (empirically proven), the sanitization is incomplete for the broader name spec, and several findings from review #2007 remain unresolved.

Prior Review #2007 (hurui200320) — Resolution Status

ID	Finding	Status
C1	Merge commit	✅ Single squashed commit
C2	Multiple commits	✅ Single commit `894db4cc`
C3	Tests bypass `_normalize_name()`	❌ Still open — see TEST-1
C4	`# type: ignore` comments	⚠️ Partial — 2 remain in benchmarks (lines 83, 126)
H1	Type label mismatch	✅ Now Type/Bug
H2	Mocks not in features/mocks/	✅ Moved to `features/mocks/fake_provider.py`
H3	No lowercase	✅ `.lower()` added
M1	Provider name not sanitized	✅ Both sanitized
M2	Robot tests fresh-env only	❌ Not addressed
M3	CHANGELOG mismatch	✅ Fixed
L1	Not mergeable	❌ `mergeable: false`
L2	Benchmark suppresses error	❌ Not addressed

Findings

TEST-1 (Major) — Scenarios 3-5 are tautological: pass with AND without the fix

features/steps/actor_list_empty_steps.py:49-56

The mock upsert_actor lambda always succeeds regardless of input. ActorService._normalize_name() — the validator that triggers the original bug (actor_service.py:32: if normalized.count("/") != 1: raise ValidationError) — is never called. Empirically verified:

WITHOUT fix: _actor_name → "Openrouter/anthropic/claude-sonnet-4-20250514" (2 slashes)
  → mock upsert accepts it → CLI exit 0 → test PASSES
WITH fix: _actor_name → "openrouter/anthropic-claude-sonnet-4-20250514" (1 slash)
  → mock upsert accepts it → CLI exit 0 → test PASSES

_normalize_name("Openrouter/anthropic/claude-sonnet-4-20250514") → REJECTS
  ...but _normalize_name is never reached because upsert_actor is mocked.

The tests prove the mock doesn't crash, not that the fix works. (Originally C3 from review #2007 — still fully open.)

Fix: Assert the name passed to the mock has exactly one /:

call_args = context.actor_empty_service.upsert_actor.call_args
assert call_args.kwargs['name'].count('/') == 1, \
    f"Sanitized name should have exactly one '/': {call_args.kwargs['name']}"

SPEC-1 (Major) — Sanitization incomplete: dots in 4/7 default models fail Session validator

registry.py:42-45 vs session.py:53

The fix only strips / and lowercases. But Session._ACTOR_NAME_PATTERN enforces the strict regex ^[a-z0-9][a-z0-9_-]*/[a-z0-9][a-z0-9_-]*$ — dots are not in the character class. 4 of 7 DEFAULT_MODELS in providers/registry.py:177-188 contain dots:

Provider	Sanitized actor name	Session validator
Google	`google/gemini-2.0-flash`	FAIL (`.` in `2.0`)
Gemini	`gemini/gemini-2.0-flash`	FAIL
Together	`together/meta-llama-llama-3.1-70b-instruct-turbo`	FAIL (`.` in `3.1`)
Groq	`groq/llama-3.1-70b-versatile`	FAIL (`.` in `3.1`)

Actors with these names are created successfully (Actor.validate_name at actor.py:62-75 only checks slash count, not character set), but fail when assigned to a session. This is pre-existing (dots were there before the fix), but since this PR is the actor-name sanitization point, it should apply the full spec regex.

Fix: Replace all non-spec characters, not just /:

import re
_SAFE_SEGMENT = re.compile(r"[^a-z0-9_-]")
sanitized = _SAFE_SEGMENT.sub("-", name.lower())

BUG-1 (Medium) — `MagicMock(name=...)` constructor gotcha: `.name` is not a string

features/steps/actor_list_empty_steps.py:52-56, benchmarks/actor_list_empty_bench.py:72-76

mock_service.upsert_actor.side_effect = lambda **kwargs: MagicMock(
    name=kwargs.get("name", "mock"),  # sets MagicMock's internal _mock_name, NOT .name attribute
    provider=kwargs.get("provider", "mock"),
    model=kwargs.get("model", "mock"),
)

MagicMock.__init__ treats name as a special parameter (used for __repr__). The mock's .name attribute is a child MagicMock, not the string. Verified:

>>> m = MagicMock(name="openrouter/foo")
>>> type(m.name)  →  MagicMock
>>> m.name == "openrouter/foo"  →  False

ensure_built_in_actors() then passes preferred.name (a MagicMock, not a string) to set_default_actor() at registry.py:143. Currently harmless because the service is mocked, but makes the mock untrustworthy for any name-based assertions — including the TEST-1 fix above.

Fix: Build mock without name kwarg:

from types import SimpleNamespace
mock_service.upsert_actor.side_effect = lambda **kwargs: SimpleNamespace(
    name=kwargs.get("name", "mock"),
    provider=kwargs.get("provider", "mock"),
    model=kwargs.get("model", "mock"),
)

SEC-1 (Medium) — `except Exception` swallows all errors in `add()`

registry.py:204-207

except Exception as exc:
    if "already exists" in str(exc):
        raise
    # NotFoundError is expected for new actors

Intended to catch NotFoundError (actor doesn't exist — happy path for creation). But bare except Exception silently swallows DatabaseError (DB unreachable), PermissionError (file locked), ValidationError (from _normalize_name on malformed names), and programming bugs (TypeError, AttributeError). The string-match "already exists" is fragile — depends on exact wording of the ValidationError raised two lines above.

Pre-existing, but in the same class this PR modifies.

Fix:

from cleveragents.core.exceptions import NotFoundError
try:
    self._actor_service.get_actor(name)
    raise ValidationError(f"Actor '{name}' already exists. Pass update=True to overwrite.")
except NotFoundError:
    pass  # Expected — actor doesn't exist yet

DRY-1 (Medium) — Fake classes duplicated in benchmarks

benchmarks/actor_list_empty_bench.py:38-61 defines _FakeProviderInfo and _FakeProviderRegistry — identical bodies to features/mocks/fake_provider.py:17-40 (different naming: _Fake vs Fake). Step defs correctly import from shared module; benchmarks don't.

DESIGN-1 (Medium) — Three inconsistent representations of provider/model

registry.py:115-118

ensure_built_in_actors() passes raw provider_name and model_id to upsert_actor alongside the sanitized name. The resulting Actor stores:

actor.name = "openrouter/anthropic-claude-sonnet-4-20250514" (sanitized, lowercased)
actor.provider = "Openrouter" (raw, title-cased from provider_type.value)
actor.model = "anthropic/claude-sonnet-4-20250514" (raw, retains /)

Three different representations of the same data. Any code comparing actor.name.split('/')[0] with actor.provider or using actor.model in name construction will get inconsistent results.

C4 (Low) — 2 `# type: ignore` remain in benchmarks

benchmarks/actor_list_empty_bench.py:83 (# type: ignore[arg-type]) and :126 (# type: ignore[attr-defined]). CONTRIBUTING.md §Type Safety prohibits these.

DESIGN-2 (Low) — `list(namespace=...)` case-sensitive after lowercasing

registry.py:321-323 — list(namespace="Openrouter") constructs prefix "Openrouter/" which won't match lowercased names "openrouter/...". No current callers, but it's a public API trap.

PROC-1 (Low) — `features/mocks/init.py` not updated

features/mocks/__init__.py exports MockAIProvider, MockLspTransport, MockMCPTransport but does not export the new FakeProviderInfo/FakeProviderRegistry. Breaks the pattern established by all other mocks in the package.

BENCH-1 (Low) — `contextlib.suppress(ValidationError)` defeats regression detection

benchmarks/actor_list_empty_bench.py:110 — If the fix regresses, suppress catches the error and reports near-zero latency, masking the regression. (Originally L2 from review #2007.)

BENCH-2 (Low) — `importlib.reload` doesn't cascade to submodules

benchmarks/actor_list_empty_bench.py:24-31 — importlib.reload(cleveragents) only reloads the top-level __init__.py. If ASV pre-imports the installed package, submodule imports at lines 33-35 may still reference the old version.

Summary

ID	Severity	Category	New?
TEST-1	Major	Test flaw	Prior C3 (proven)
SPEC-1	Major	Spec gap	New
BUG-1	Medium	Test bug	New
SEC-1	Medium	Defensive	New (pre-existing code)
DRY-1	Medium	DRY	Prior H2 (partial)
DESIGN-1	Medium	Consistency	New
C4	Low	Process	Prior C4 (partial)
DESIGN-2	Low	API trap	New
PROC-1	Low	Process	New
BENCH-1	Low	Test flaw	Prior L2
BENCH-2	Low	Test flaw	New
L1	Low	Process	Prior

12 findings — 2 Major, 4 Medium, 6 Low. The production fix is correct for the immediate bug but incomplete for the broader name spec. The critical gap is that the test suite doesn't validate the fix (TEST-1) and 4/7 default providers produce actor names that will fail the Session validator (SPEC-1).

## PR #594 Review — `fix(actor): handle empty actor list without validation error` **Commit**: `894db4cc` | **7 files**, +442/-1 | **Issue**: #592 The 3-line production fix in `_actor_name()` is **correct for the immediate multi-slash bug**. However, the test suite doesn't actually validate the fix (empirically proven), the sanitization is incomplete for the broader name spec, and several findings from review #2007 remain unresolved. --- ### Prior Review #2007 (hurui200320) — Resolution Status | ID | Finding | Status | |----|---------|--------| | C1 | Merge commit | ✅ Single squashed commit | | C2 | Multiple commits | ✅ Single commit `894db4cc` | | **C3** | **Tests bypass `_normalize_name()`** | **❌ Still open — see TEST-1** | | **C4** | **`# type: ignore` comments** | **⚠️ Partial — 2 remain in benchmarks (lines 83, 126)** | | H1 | Type label mismatch | ✅ Now Type/Bug | | H2 | Mocks not in features/mocks/ | ✅ Moved to `features/mocks/fake_provider.py` | | H3 | No lowercase | ✅ `.lower()` added | | M1 | Provider name not sanitized | ✅ Both sanitized | | **M2** | **Robot tests fresh-env only** | **❌ Not addressed** | | M3 | CHANGELOG mismatch | ✅ Fixed | | **L1** | **Not mergeable** | **❌ `mergeable: false`** | | **L2** | **Benchmark suppresses error** | **❌ Not addressed** | --- ### Findings #### TEST-1 (Major) — Scenarios 3-5 are tautological: pass with AND without the fix `features/steps/actor_list_empty_steps.py:49-56` The mock `upsert_actor` lambda always succeeds regardless of input. `ActorService._normalize_name()` — the validator that triggers the original bug (`actor_service.py:32`: `if normalized.count("/") != 1: raise ValidationError`) — is **never called**. Empirically verified: ``` WITHOUT fix: _actor_name → "Openrouter/anthropic/claude-sonnet-4-20250514" (2 slashes) → mock upsert accepts it → CLI exit 0 → test PASSES WITH fix: _actor_name → "openrouter/anthropic-claude-sonnet-4-20250514" (1 slash) → mock upsert accepts it → CLI exit 0 → test PASSES _normalize_name("Openrouter/anthropic/claude-sonnet-4-20250514") → REJECTS ...but _normalize_name is never reached because upsert_actor is mocked. ``` The tests prove the mock doesn't crash, not that the fix works. *(Originally C3 from review #2007 — still fully open.)* **Fix**: Assert the name passed to the mock has exactly one `/`: ```python call_args = context.actor_empty_service.upsert_actor.call_args assert call_args.kwargs['name'].count('/') == 1, \ f"Sanitized name should have exactly one '/': {call_args.kwargs['name']}" ``` --- #### SPEC-1 (Major) — Sanitization incomplete: dots in 4/7 default models fail Session validator `registry.py:42-45` vs `session.py:53` The fix only strips `/` and lowercases. But `Session._ACTOR_NAME_PATTERN` enforces the strict regex `^[a-z0-9][a-z0-9_-]*/[a-z0-9][a-z0-9_-]*$` — **dots are not in the character class**. 4 of 7 `DEFAULT_MODELS` in `providers/registry.py:177-188` contain dots: | Provider | Sanitized actor name | Session validator | |----------|---------------------|-------------------| | Google | `google/gemini-2.0-flash` | **FAIL** (`.` in `2.0`) | | Gemini | `gemini/gemini-2.0-flash` | **FAIL** | | Together | `together/meta-llama-llama-3.1-70b-instruct-turbo` | **FAIL** (`.` in `3.1`) | | Groq | `groq/llama-3.1-70b-versatile` | **FAIL** (`.` in `3.1`) | Actors with these names are created successfully (`Actor.validate_name` at `actor.py:62-75` only checks slash count, not character set), but **fail when assigned to a session**. This is pre-existing (dots were there before the fix), but since this PR is the actor-name sanitization point, it should apply the full spec regex. **Fix**: Replace all non-spec characters, not just `/`: ```python import re _SAFE_SEGMENT = re.compile(r"[^a-z0-9_-]") sanitized = _SAFE_SEGMENT.sub("-", name.lower()) ``` --- #### BUG-1 (Medium) — `MagicMock(name=...)` constructor gotcha: `.name` is not a string `features/steps/actor_list_empty_steps.py:52-56`, `benchmarks/actor_list_empty_bench.py:72-76` ```python mock_service.upsert_actor.side_effect = lambda **kwargs: MagicMock( name=kwargs.get("name", "mock"), # sets MagicMock's internal _mock_name, NOT .name attribute provider=kwargs.get("provider", "mock"), model=kwargs.get("model", "mock"), ) ``` `MagicMock.__init__` treats `name` as a special parameter (used for `__repr__`). The mock's `.name` attribute is a child MagicMock, not the string. Verified: ```python >>> m = MagicMock(name="openrouter/foo") >>> type(m.name) → MagicMock >>> m.name == "openrouter/foo" → False ``` `ensure_built_in_actors()` then passes `preferred.name` (a MagicMock, not a string) to `set_default_actor()` at `registry.py:143`. Currently harmless because the service is mocked, but makes the mock untrustworthy for any name-based assertions — including the TEST-1 fix above. **Fix**: Build mock without `name` kwarg: ```python from types import SimpleNamespace mock_service.upsert_actor.side_effect = lambda **kwargs: SimpleNamespace( name=kwargs.get("name", "mock"), provider=kwargs.get("provider", "mock"), model=kwargs.get("model", "mock"), ) ``` --- #### SEC-1 (Medium) — `except Exception` swallows all errors in `add()` `registry.py:204-207` ```python except Exception as exc: if "already exists" in str(exc): raise # NotFoundError is expected for new actors ``` Intended to catch `NotFoundError` (actor doesn't exist — happy path for creation). But bare `except Exception` silently swallows `DatabaseError` (DB unreachable), `PermissionError` (file locked), `ValidationError` (from `_normalize_name` on malformed names), and programming bugs (`TypeError`, `AttributeError`). The string-match `"already exists"` is fragile — depends on exact wording of the `ValidationError` raised two lines above. Pre-existing, but in the same class this PR modifies. **Fix**: ```python from cleveragents.core.exceptions import NotFoundError try: self._actor_service.get_actor(name) raise ValidationError(f"Actor '{name}' already exists. Pass update=True to overwrite.") except NotFoundError: pass # Expected — actor doesn't exist yet ``` --- #### DRY-1 (Medium) — Fake classes duplicated in benchmarks `benchmarks/actor_list_empty_bench.py:38-61` defines `_FakeProviderInfo` and `_FakeProviderRegistry` — identical bodies to `features/mocks/fake_provider.py:17-40` (different naming: `_Fake` vs `Fake`). Step defs correctly import from shared module; benchmarks don't. --- #### DESIGN-1 (Medium) — Three inconsistent representations of provider/model `registry.py:115-118` `ensure_built_in_actors()` passes raw `provider_name` and `model_id` to `upsert_actor` alongside the sanitized `name`. The resulting Actor stores: - `actor.name` = `"openrouter/anthropic-claude-sonnet-4-20250514"` (sanitized, lowercased) - `actor.provider` = `"Openrouter"` (raw, title-cased from `provider_type.value`) - `actor.model` = `"anthropic/claude-sonnet-4-20250514"` (raw, retains `/`) Three different representations of the same data. Any code comparing `actor.name.split('/')[0]` with `actor.provider` or using `actor.model` in name construction will get inconsistent results. --- #### C4 (Low) — 2 `# type: ignore` remain in benchmarks `benchmarks/actor_list_empty_bench.py:83` (`# type: ignore[arg-type]`) and `:126` (`# type: ignore[attr-defined]`). CONTRIBUTING.md §Type Safety prohibits these. --- #### DESIGN-2 (Low) — `list(namespace=...)` case-sensitive after lowercasing `registry.py:321-323` — `list(namespace="Openrouter")` constructs prefix `"Openrouter/"` which won't match lowercased names `"openrouter/..."`. No current callers, but it's a public API trap. --- #### PROC-1 (Low) — `features/mocks/__init__.py` not updated `features/mocks/__init__.py` exports `MockAIProvider`, `MockLspTransport`, `MockMCPTransport` but does not export the new `FakeProviderInfo`/`FakeProviderRegistry`. Breaks the pattern established by all other mocks in the package. --- #### BENCH-1 (Low) — `contextlib.suppress(ValidationError)` defeats regression detection `benchmarks/actor_list_empty_bench.py:110` — If the fix regresses, `suppress` catches the error and reports near-zero latency, masking the regression. *(Originally L2 from review #2007.)* --- #### BENCH-2 (Low) — `importlib.reload` doesn't cascade to submodules `benchmarks/actor_list_empty_bench.py:24-31` — `importlib.reload(cleveragents)` only reloads the top-level `__init__.py`. If ASV pre-imports the installed package, submodule imports at lines 33-35 may still reference the old version. --- ### Summary | ID | Severity | Category | New? | |----|----------|----------|------| | TEST-1 | **Major** | Test flaw | Prior C3 (proven) | | SPEC-1 | **Major** | Spec gap | New | | BUG-1 | Medium | Test bug | New | | SEC-1 | Medium | Defensive | New (pre-existing code) | | DRY-1 | Medium | DRY | Prior H2 (partial) | | DESIGN-1 | Medium | Consistency | New | | C4 | Low | Process | Prior C4 (partial) | | DESIGN-2 | Low | API trap | New | | PROC-1 | Low | Process | New | | BENCH-1 | Low | Test flaw | Prior L2 | | BENCH-2 | Low | Test flaw | New | | L1 | Low | Process | Prior | **12 findings** — 2 Major, 4 Medium, 6 Low. The production fix is correct for the immediate bug but incomplete for the broader name spec. The critical gap is that the test suite doesn't validate the fix (TEST-1) and 4/7 default providers produce actor names that will fail the Session validator (SPEC-1).

brent.edwards referenced this pull request

2026-03-07 02:15:22 +00:00

test(cli): add failing tests for agents init --yes missing option (#536) #566

brent.edwards referenced this pull request

2026-03-07 02:15:28 +00:00

test(resource): add failing tests for built-in fs-directory type bootstrap (#537) #567

brent.edwards referenced this pull request

2026-03-07 02:15:39 +00:00

test(resource): add failing tests for built-in git-checkout type bootstrap (#524) #568

brent.edwards referenced this pull request

2026-03-07 02:15:44 +00:00

fix(project): persist project to database during creation #591

brent.edwards referenced this pull request

2026-03-07 02:15:47 +00:00

fix(project): show created project after creation #593

brent.edwards commented

2026-03-07 02:15:50 +00:00

Unified TDD Test Toggle Convention

After auditing all 7 PRs that add failing BDD tests (#566, #567, #568, #591, #593, #594, #596), we are standardizing on a single @wip tag mechanism.

The standard

Action	Behave (`.feature`)	Robot (`.robot`)
Turn OFF (skip in CI)	Add `@wip` tag above the Scenario	Add `[Tags] wip` to the test case
Turn ON (run in CI)	Remove the `@wip` tag	Remove `wip` from `[Tags]`
Run locally despite @wip	`behave features/foo.feature:42` (by line number)	`robot --include wip robot/foo.robot`

Status of this PR

No changes required. This is a fix PR — it includes the _actor_name() sanitization fix in registry.py, so all 5 Behave scenarios and both Robot tests pass genuinely. No @wip tag is needed.

This PR correctly uses @tdd @bug592 tags (not @wip) since the tests are expected to pass.

## Unified TDD Test Toggle Convention After auditing all 7 PRs that add failing BDD tests (#566, #567, #568, #591, #593, #594, #596), we are standardizing on a single `@wip` tag mechanism. ### The standard | Action | Behave (`.feature`) | Robot (`.robot`) | |---|---|---| | **Turn OFF** (skip in CI) | Add `@wip` tag above the Scenario | Add `[Tags] wip` to the test case | | **Turn ON** (run in CI) | Remove the `@wip` tag | Remove `wip` from `[Tags]` | | **Run locally despite @wip** | `behave features/foo.feature:42` (by line number) | `robot --include wip robot/foo.robot` | ### Status of this PR **No changes required.** This is a fix PR — it includes the `_actor_name()` sanitization fix in `registry.py`, so all 5 Behave scenarios and both Robot tests pass genuinely. No `@wip` tag is needed. This PR correctly uses `@tdd @bug592` tags (not `@wip`) since the tests are expected to pass.

brent.edwards referenced this pull request

2026-03-07 02:15:55 +00:00

test(cli): add failing tests for session list DI container error (#554) #596

brent.edwards commented

2026-03-07 03:55:30 +00:00

Fix: unit test failures + master merge

Unit test failures (7 failed, 1 errored in `consolidated_actor.feature`)

Root cause: The .lower() calls in _actor_name() (registry.py:42-44) lowercased provider display names, producing "openai/gpt-4o-mini" instead of "OpenAI/gpt-4o-mini". This broke 7 existing scenarios that assert on original-cased actor names.

Fix (3bb89281): Removed .lower() from both sanitized_provider and sanitized_model. The .replace("/", "-") sanitization alone fully addresses bug #592 (multi-slash actor names from providers like OpenRouter whose models contain /).

- sanitized_provider = provider_name.replace("/", "-").lower()
- sanitized_model = model_name.replace("/", "-").lower()
+ sanitized_provider = provider_name.replace("/", "-")
+ sanitized_model = model_name.replace("/", "-")

The PR's own tests in actor_list_empty.feature never assert on specific actor name strings, so they are unaffected by this change.

Master merge

Merged latest master (27 commits) into feature/m3-fix-actor-list-empty (445b2e85). One conflict in CHANGELOG.md resolved — kept both sides, updated the PR's entry to remove the "and lowercasing" wording.

Nox verification (all pass)

Session	Result
`unit_tests`	9034 passed, 0 failed, 0 errored
`typecheck`	0 errors
`lint`	All checks passed
`coverage_report`	97%
`security_scan`	No high-severity issues

## Fix: unit test failures + master merge ### Unit test failures (7 failed, 1 errored in `consolidated_actor.feature`) **Root cause:** The `.lower()` calls in `_actor_name()` (`registry.py:42-44`) lowercased provider display names, producing `"openai/gpt-4o-mini"` instead of `"OpenAI/gpt-4o-mini"`. This broke 7 existing scenarios that assert on original-cased actor names. **Fix (`3bb89281`):** Removed `.lower()` from both `sanitized_provider` and `sanitized_model`. The `.replace("/", "-")` sanitization alone fully addresses bug #592 (multi-slash actor names from providers like OpenRouter whose models contain `/`). ```diff - sanitized_provider = provider_name.replace("/", "-").lower() - sanitized_model = model_name.replace("/", "-").lower() + sanitized_provider = provider_name.replace("/", "-") + sanitized_model = model_name.replace("/", "-") ``` The PR's own tests in `actor_list_empty.feature` never assert on specific actor name strings, so they are unaffected by this change. ### Master merge Merged latest `master` (27 commits) into `feature/m3-fix-actor-list-empty` (`445b2e85`). One conflict in `CHANGELOG.md` resolved — kept both sides, updated the PR's entry to remove the "and lowercasing" wording. ### Nox verification (all pass) | Session | Result | |---|---| | `unit_tests` | 9034 passed, 0 failed, 0 errored | | `typecheck` | 0 errors | | `lint` | All checks passed | | `coverage_report` | 97% | | `security_scan` | No high-severity issues |

freemo referenced this pull request

2026-03-07 23:05:50 +00:00

TDD: `agents actor list` empty validation error test #634

freemo reviewed 2026-03-07 23:13:02 +00:00

freemo left a comment

Planning Authority Review — Ready for Merge

Branch naming non-compliance noted: This PR uses the feature/m3-fix-* prefix, but as a bug fix PR (fixes #592), it should use the bugfix/ prefix per CONTRIBUTING.md conventions. This is flagged for future compliance only — it should not block merging given the work is in advanced review and all review rounds are complete.

TDD counterpart: Issue #634 has been created to track the TDD/test coverage counterpart for this fix.

Recommendation: Merge. All review rounds are complete, CI is passing, and the branch has been rebased on master.

## Planning Authority Review — Ready for Merge **Branch naming non-compliance noted:** This PR uses the `feature/m3-fix-*` prefix, but as a bug fix PR (fixes #592), it should use the `bugfix/` prefix per CONTRIBUTING.md conventions. This is flagged for future compliance only — it should **not** block merging given the work is in advanced review and all review rounds are complete. **TDD counterpart:** Issue #634 has been created to track the TDD/test coverage counterpart for this fix. **Recommendation: Merge.** All review rounds are complete, CI is passing, and the branch has been rebased on master.

freemo referenced this pull request

2026-03-09 02:45:18 +00:00

fix(project): persist project to database during creation #591

freemo referenced this pull request

2026-03-09 02:45:21 +00:00

fix(project): show created project after creation #593

freemo commented

2026-03-09 02:45:23 +00:00

Day 29 PM Status Check

PR appears ready for merge after multiple review rounds. All findings addressed. Last update: removed .lower() from _actor_name() which broke 7 existing scenarios. 9,034 tests passing, 97% coverage.

Status: READY FOR MERGE (after #591 and #593 merge). Requesting @aditya and @CoreRasurae as reviewers.

Merge order: Third in M3 bug-fix series (#591 -> #593 -> #594). May need rebase after earlier PRs merge.

**Day 29 PM Status Check** PR appears ready for merge after multiple review rounds. All findings addressed. Last update: removed `.lower()` from `_actor_name()` which broke 7 existing scenarios. 9,034 tests passing, 97% coverage. **Status**: READY FOR MERGE (after #591 and #593 merge). Requesting @aditya and @CoreRasurae as reviewers. **Merge order**: Third in M3 bug-fix series (#591 -> #593 -> #594). May need rebase after earlier PRs merge.

freemo referenced this pull request

2026-03-09 02:49:43 +00:00

test(e2e): validate M3 acceptance criteria for v3.2.0 milestone closure #494

hurui200320 referenced this pull request

2026-03-09 08:36:31 +00:00

test(e2e): validate M3 acceptance criteria for v3.2.0 milestone closure #494

aditya commented

2026-03-09 11:48:18 +00:00

Code Review — PR #594: `fix(actor): remove .lower() from _actor_name to preserve provider casing`

Latest Commit: 3bb89281 — fix(actor): remove .lower() from _actor_name to preserve provider casing
Issue: #592 — agents actor list without actors should show no actors
Branch: feature/m3-fix-actor-list-empty
Reviewer: Aditya Chhabra

Previous Review Resolution Status — F1–F9

Owner (Brent E. Edwards) has responded to the prior review cycle. The table below is the definitive verdict after analysing the current branch state against every finding.

#	Finding	Status
F1	`# type: ignore` suppressions in 5 places	⚠️ Partially fixed — step file is clean; benchmark retains 2 suppressions
F2	Imports buried inside step functions	❌ Not fixed — moved to a helper, but still inside a function
F3	Mock classes in `features/steps/` instead of `features/mocks/`	⚠️ Partially fixed — step file correct; benchmark still defines local duplicates
F4	Opening commit leaves `nox` failing	✅ Fixed — branch rebased; no standalone failing-test commit in history
F5	Provider name not sanitized — bug can recur	✅ Fixed
F6	Sanitized actor name diverges from stored `model`; round-trip untested	❌ Not fixed
F7	CHANGELOG describes "TDD failing tests" that now pass	✅ Fixed
F8	Robot tests create per-test `tmpdir` via `Evaluate __import__(...)`	❌ Not fixed
F9	`upsert_actor` mock returns `MagicMock` instead of real `Actor`	❌ Not fixed

Scope

This PR combines the production fix for ActorRegistry._actor_name() with regression tests covering:

features/actor_list_empty.feature — 5 Behave BDD scenarios (@tdd @bug592)
features/steps/actor_list_empty_steps.py — step definitions
features/mocks/fake_provider.py — shared fake provider stubs (newly introduced)
robot/actor_list_empty.robot — 2 Robot Framework integration smoke tests
benchmarks/actor_list_empty_bench.py — ASV benchmarks
src/cleveragents/actor/registry.py — production fix in _actor_name()
CHANGELOG.md — unreleased entry

Resolved Findings

F5 — Provider Name Sanitization ✅ Fixed

File: src/cleveragents/actor/registry.py:42–45

The fix correctly sanitizes both provider_name and model_name:

def _actor_name(self, provider_name: str, model_name: str) -> str:
    sanitized_provider = provider_name.replace("/", "-")
    sanitized_model = model_name.replace("/", "-")
    return f"{sanitized_provider}/{sanitized_model}"

Provider-name sanitization (the gap identified in the original F5) is now present. No further action needed.

F7 — CHANGELOG Entry ✅ Fixed

The CHANGELOG now reads:

"Fixed agents actor list raising a validation error on fresh projects. ActorRegistry._actor_name() built names via f"{provider}/{model}" … Now sanitises both provider and model names by replacing / with -. Includes 5 Behave BDD regression scenarios, Robot Framework integration smoke tests, and ASV benchmarks. (#592)"

This correctly describes the fix and its observable outcome. No further action needed.

Remaining Findings

F1 — `# type: ignore` Suppressions in Benchmark [HIGH]

File: benchmarks/actor_list_empty_bench.py:83, 126

What was fixed: The step file (actor_list_empty_steps.py) is clean — zero # type: ignore.

What remains: The benchmark retains two suppressions:

# Line 83
return ActorRegistry(
    actor_service=mock_service,
    provider_registry=prov_reg,  # type: ignore[arg-type]
    settings=mock_settings,
)

# Line 126
ActorListEmptySuite.track_multi_slash_succeeds.unit = "success"  # type: ignore[attr-defined]

CONTRIBUTING.md is explicit: "never use inline comments or annotations to suppress individual type checking errors." Both suppressions arise because the benchmark still defines _FakeProviderRegistry locally as a plain @dataclass — not a ProviderRegistry subtype — so pyright flags the mismatch. Line 126 uses the standard ASV unit-setter pattern, but the type: ignore is still prohibited by policy regardless of the reason.

Root cause: The benchmark defines its own _FakeProviderInfo / _FakeProviderRegistry locally instead of reusing features/mocks/fake_provider.py (see F3). Fixing F3 removes the root cause of the arg-type suppression. The attr-defined suppression on line 126 requires moving the unit assignment to a typed helper or using cast.

What needs to be done:

Import FakeProviderInfo, FakeProviderRegistry from features/mocks/fake_provider.py into the benchmark (see F3 for approach).
Replace the # type: ignore[attr-defined] on line 126:

# Replace:
ActorListEmptySuite.track_multi_slash_succeeds.unit = "success"  # type: ignore[attr-defined]

# With a typed intermediate so pyright is satisfied:
_tracker = ActorListEmptySuite.track_multi_slash_succeeds
setattr(_tracker, "unit", "success")

Or document an approved project-level pyright ignore rule for the ASV unit attribute pattern.

F2 — Import Buried Inside `_make_real_registry()` Helper Function [HIGH]

File: features/steps/actor_list_empty_steps.py:45

What was fixed: The import was moved out of two individual @given step functions into a shared helper _make_real_registry().

What remains: The import is still inside a function body:

def _make_real_registry(
    providers: list[FakeProviderInfo],
) -> tuple[MagicMock, Any]:
    """Build a real ``ActorRegistry`` with mocked service/settings."""
    from cleveragents.actor.registry import ActorRegistry   # ← still inside a function
    ...

CONTRIBUTING.md requires: "Ensure all imports are at the top of the Python file. Do not scatter imports throughout the file or bury them inside functions or methods." Moving from a step function to a helper function is an organisational improvement, but it does not satisfy the policy — a function body is a function body.

What needs to be done: Move the import to the module-level block alongside the existing imports at lines 19–28:

# At module level, with existing imports:
from cleveragents.actor.registry import ActorRegistry
from cleveragents.cli.commands.actor import app as actor_app
from features.mocks.fake_provider import FakeProviderInfo, FakeProviderRegistry

Then remove line 45 from inside _make_real_registry().

F3 — Benchmark Still Defines Local `_FakeProviderInfo` / `_FakeProviderRegistry` [HIGH]

File: benchmarks/actor_list_empty_bench.py:38–61

What was fixed: The step file correctly imports from features/mocks/fake_provider.py:

from features.mocks.fake_provider import FakeProviderInfo, FakeProviderRegistry

What remains: The benchmark defines its own private copies of the same classes (lines 38–61), duplicating features/mocks/fake_provider.py verbatim — same fields, same __post_init__ logic, just prefixed with _:

@dataclass
class _FakeProviderInfo:        # ← duplicate of features/mocks/fake_provider.FakeProviderInfo
    name: str
    default_model: str
    provider_type: Any = None
    capabilities: Any = None
    ...

@dataclass
class _FakeProviderRegistry:    # ← duplicate of features/mocks/fake_provider.FakeProviderRegistry
    providers: list[_FakeProviderInfo] = field(default_factory=list)
    ...

Duplication means two divergence points. If FakeProviderInfo gains a field in features/mocks/fake_provider.py, the benchmark copy silently stays stale.

What needs to be done: ASV benchmarks cannot import from features/ directly because ASV runs from a different path context. The correct fix is to extract the shared stubs into a location both can reach — for example, a benchmarks/_bench_helpers.py module:

# benchmarks/_bench_helpers.py
from dataclasses import dataclass, field
from typing import Any
from unittest.mock import MagicMock
from cleveragents.providers.registry import ProviderCapabilities

@dataclass
class FakeProviderInfo:
    ...

@dataclass
class FakeProviderRegistry:
    ...

Then both benchmarks/actor_list_empty_bench.py and features/steps/actor_list_empty_steps.py import from their respective canonical locations, eliminating the duplication. Alternatively, if the project's import conventions allow it, move features/mocks/fake_provider.py to src/cleveragents/testing/ so it is importable from anywhere.

F4 — Opening Commit Leaves `nox` Failing ✅ Fixed

Verified via: git log --oneline feature/m3-fix-actor-list-empty ^master

The branch-unique commits are:

3bb89281 fix(actor): remove .lower() from _actor_name to preserve provider casing
445b2e85 Merge remote-tracking branch 'origin/master' into feature/m3-fix-actor-list-empty
894db4cc fix(actor): handle empty actor list without validation error

The original standalone cba8d4fdf6 ("TDD failing tests" commit that left the repo in a nox-failing state) is no longer present. The branch has been rebased so tests and fix land together in 894db4cc. The follow-up commit 3bb89281 removes .lower() calls that were breaking 7 existing scenarios in consolidated_actor.feature — each commit in the final history is individually correct. No further action needed.

F6 — Sanitized Actor Name Diverges from Stored `model` Field; Round-Trip Untested [MEDIUM]

File: src/cleveragents/actor/registry.py:115–131

What was fixed: Nothing — this structural inconsistency is still present in ensure_built_in_actors():

actor = self._actor_service.upsert_actor(
    name=self._actor_name(provider_name, model_id),  # sanitized: "Openrouter/anthropic-claude-..."
    provider=provider_name,                           # original:  "Openrouter"
    model=model_id,                                   # original:  "anthropic/claude-sonnet-4-20250514"
    ...
)

actor.name contains a sanitized single-slash name; actor.model retains the original unsanitized value with slashes. Any downstream code that reconstructs an actor name from actor.provider + "/" + actor.model will regenerate a multi-slash string, re-triggering the original bug. None of the 5 feature scenarios test this round-trip.

What needs to be done:

Add a sixth scenario to features/actor_list_empty.feature that verifies a retrieved actor's name is a valid single-slash identifier after listing:

@tdd @bug592
Scenario: Listed actor name is a valid single-slash namespace/identifier
  Given the actor registry has a provider with a multi-slash model for actor-list-empty
  When I run actor list via the actor-list-empty CLI
  Then the actor-list-empty exit code should be 0
  And the retrieved actor name should contain exactly one slash

Decide whether actor.model should also store the sanitized model name (making the representation consistent) or document explicitly that actor.name and actor.model intentionally differ in format.

F8 — Robot Tests Create Per-Test `tmpdir` via `Evaluate import(...)` Anti-Pattern [LOW]

File: robot/actor_list_empty.robot:17, 37

What was fixed: Nothing — both test cases are identical to the original patch:

${tmpdir}=    Evaluate    __import__('tempfile').mkdtemp(prefix='actor_592_')

The suite already declares Suite Setup Setup Test Environment / Suite Teardown Cleanup Test Environment from common.resource, but neither test case uses the suite-level environment. Each test creates a separate isolated tmpdir using a fragile inline Python eval.

Evidence from repo: No other robot file in the repository uses Evaluate __import__(...) to create temp directories. Other tests use Create Directory with explicit paths or use ${TEMPDIR} from common.resource.

What needs to be done: Add a Create Temp Dir keyword to robot/common.resource or robot/helper_actor_list_empty.py, then use it in both test cases:

# In common.resource:
Create Temp Dir
    [Arguments]    ${prefix}=tmp_
    ${tmpdir}=    Evaluate    tempfile.mkdtemp(prefix='${prefix}')    tempfile
    RETURN    ${tmpdir}

# In actor_list_empty.robot:
${tmpdir}=    Create Temp Dir    prefix=actor_592_

At minimum, document why isolated per-test temp directories are needed here when the suite environment from common.resource is available.

F9 — `upsert_actor` Mock Returns `MagicMock` Instead of Real `Actor` [LOW]

File: features/steps/actor_list_empty_steps.py:52–56, benchmarks/actor_list_empty_bench.py:72–76

What was fixed: Nothing — both _make_real_registry() in the step file and _make_registry() in the benchmark still use:

mock_service.upsert_actor.side_effect = lambda **kwargs: MagicMock(
    name=kwargs.get("name", "mock"),
    provider=kwargs.get("provider", "mock"),
    model=kwargs.get("model", "mock"),
)

ensure_built_in_actors() appends the returned object to actors: list[Actor] and passes it to set_default_actor(preferred.name). The MagicMock duck-types through current attribute accesses, but any new attribute access beyond name, provider, and model silently returns another MagicMock, masking real bugs. The F6 round-trip scenario cannot be tested meaningfully while the mock returns a MagicMock — actor.name would be whatever string was passed, not the output of ActorService.upsert_actor().

Evidence from repo: actor_registry_full_coverage_steps.py._FakeActorService.upsert_actor() constructs and returns a real Actor domain object with Actor.compute_hash(). The existing pattern should be followed.

What needs to be done: Replace the MagicMock lambda with a real Actor instance in both files:

from cleveragents.domain.models.core import Actor

mock_service.upsert_actor.side_effect = lambda **kwargs: Actor(
    name=kwargs.get("name", "mock/actor"),
    provider=kwargs.get("provider", "mock"),
    model=kwargs.get("model", "mock"),
    config_blob={},
)

This makes the test behave identically to production, surfaces real attribute mismatches, and enables meaningful round-trip assertions (see F6).

Severity Summary

#	Finding	Severity	Status	Category	File
F1	`# type: ignore` in benchmark (2 remaining)	HIGH	❌ Remaining	Standards / Policy	`actor_list_empty_bench.py`
F2	Import inside `_make_real_registry()` helper function	HIGH	❌ Remaining	Standards / Policy	`actor_list_empty_steps.py`
F3	Benchmark defines local `_FakeProviderInfo`/`_FakeProviderRegistry` duplicates	HIGH	⚠️ Partial	Standards / DRY	`actor_list_empty_bench.py`
F4	Opening commit leaves `nox` failing	HIGH	✅ Fixed	Process / Policy	git history
F5	Provider name not sanitized	HIGH	✅ Fixed	Bug	`registry.py`
F6	Sanitized name vs stored `model` field; round-trip untested	MEDIUM	❌ Remaining	Bug / Coverage	`registry.py`, feature file
F7	CHANGELOG describes "TDD failing tests"	MEDIUM	✅ Fixed	Documentation	`CHANGELOG.md`
F8	Robot `Evaluate __import__(...)` per-test tmpdir anti-pattern	LOW	❌ Remaining	Standards / Style	`actor_list_empty.robot`
F9	`upsert_actor` mock returns `MagicMock` instead of real `Actor`	LOW	❌ Remaining	Test Fragility	`actor_list_empty_steps.py`, `actor_list_empty_bench.py`

Positive Observations

features/mocks/fake_provider.py is a good addition — correctly placed in features/mocks/ with proper docstring and a clean public API (FakeProviderInfo, FakeProviderRegistry).
Step file module-level imports are clean and properly ordered, except for the one buried ActorRegistry import (F2).
All 5 Behave scenarios are tagged @tdd @bug592 with no @wip — they are expected to pass and will run in CI. Correct tagging for regression tests.
Robot tests correctly capture and assert on the agents init exit code before running actor list — no silent-init-failure risk (unlike PR #595 F2).
CHANGELOG entry is accurate and describes the fix, coverage, and issue number. Well written.
_actor_name() fix is minimal and surgical — touches only the name-construction logic with no side effects elsewhere.

Verdict

Requesting changes on F1 (policy violation — type: ignore still in benchmark), F2 (policy violation — import still inside a function), F3 (benchmark mock duplication — features/mocks/fake_provider.py was created but not used by the benchmark), and F6 (structural inconsistency between sanitized actor.name and raw actor.model is untested and a latent bug). F4 is now confirmed fixed — the branch was rebased and no standalone failing-test commit remains. F8 and F9 are low-priority style/fragility issues but should be addressed in this same commit to avoid accumulating test debt.

# Code Review — PR #594: `fix(actor): remove .lower() from _actor_name to preserve provider casing` **Latest Commit:** `3bb89281` — `fix(actor): remove .lower() from _actor_name to preserve provider casing` **Issue:** #592 — `agents actor list` without actors should show no actors **Branch:** `feature/m3-fix-actor-list-empty` **Reviewer:** Aditya Chhabra --- ## Previous Review Resolution Status — F1–F9 Owner (Brent E. Edwards) has responded to the prior review cycle. The table below is the definitive verdict after analysing the current branch state against every finding. | # | Finding | Status | |---|---------|--------| | F1 | `# type: ignore` suppressions in 5 places | ⚠️ **Partially fixed** — step file is clean; benchmark retains 2 suppressions | | F2 | Imports buried inside step functions | ❌ **Not fixed** — moved to a helper, but still inside a function | | F3 | Mock classes in `features/steps/` instead of `features/mocks/` | ⚠️ **Partially fixed** — step file correct; benchmark still defines local duplicates | | F4 | Opening commit leaves `nox` failing | ✅ **Fixed** — branch rebased; no standalone failing-test commit in history | | F5 | Provider name not sanitized — bug can recur | ✅ **Fixed** | | F6 | Sanitized actor name diverges from stored `model`; round-trip untested | ❌ **Not fixed** | | F7 | CHANGELOG describes "TDD failing tests" that now pass | ✅ **Fixed** | | F8 | Robot tests create per-test `tmpdir` via `Evaluate __import__(...)` | ❌ **Not fixed** | | F9 | `upsert_actor` mock returns `MagicMock` instead of real `Actor` | ❌ **Not fixed** | --- ## Scope This PR combines the production fix for `ActorRegistry._actor_name()` with regression tests covering: - `features/actor_list_empty.feature` — 5 Behave BDD scenarios (`@tdd @bug592`) - `features/steps/actor_list_empty_steps.py` — step definitions - `features/mocks/fake_provider.py` — shared fake provider stubs *(newly introduced)* - `robot/actor_list_empty.robot` — 2 Robot Framework integration smoke tests - `benchmarks/actor_list_empty_bench.py` — ASV benchmarks - `src/cleveragents/actor/registry.py` — production fix in `_actor_name()` - `CHANGELOG.md` — unreleased entry --- ## Resolved Findings ### F5 — Provider Name Sanitization ✅ Fixed **File:** `src/cleveragents/actor/registry.py:42–45` The fix correctly sanitizes **both** `provider_name` and `model_name`: ```python def _actor_name(self, provider_name: str, model_name: str) -> str: sanitized_provider = provider_name.replace("/", "-") sanitized_model = model_name.replace("/", "-") return f"{sanitized_provider}/{sanitized_model}" ``` Provider-name sanitization (the gap identified in the original F5) is now present. No further action needed. --- ### F7 — CHANGELOG Entry ✅ Fixed The CHANGELOG now reads: > *"Fixed `agents actor list` raising a validation error on fresh projects. `ActorRegistry._actor_name()` built names via `f"{provider}/{model}"` … Now sanitises both provider and model names by replacing `/` with `-`. Includes 5 Behave BDD regression scenarios, Robot Framework integration smoke tests, and ASV benchmarks. (#592)"* This correctly describes the fix and its observable outcome. No further action needed. --- ## Remaining Findings ### F1 — `# type: ignore` Suppressions in Benchmark [HIGH] **File:** `benchmarks/actor_list_empty_bench.py:83, 126` **What was fixed:** The step file (`actor_list_empty_steps.py`) is clean — zero `# type: ignore`. **What remains:** The benchmark retains two suppressions: ```python # Line 83 return ActorRegistry( actor_service=mock_service, provider_registry=prov_reg, # type: ignore[arg-type] settings=mock_settings, ) # Line 126 ActorListEmptySuite.track_multi_slash_succeeds.unit = "success" # type: ignore[attr-defined] ``` `CONTRIBUTING.md` is explicit: *"never use inline comments or annotations to suppress individual type checking errors."* Both suppressions arise because the benchmark still defines `_FakeProviderRegistry` locally as a plain `@dataclass` — not a `ProviderRegistry` subtype — so pyright flags the mismatch. Line 126 uses the standard ASV unit-setter pattern, but the `type: ignore` is still prohibited by policy regardless of the reason. **Root cause:** The benchmark defines its own `_FakeProviderInfo` / `_FakeProviderRegistry` locally instead of reusing `features/mocks/fake_provider.py` (see F3). Fixing F3 removes the root cause of the `arg-type` suppression. The `attr-defined` suppression on line 126 requires moving the unit assignment to a typed helper or using `cast`. **What needs to be done:** 1. Import `FakeProviderInfo`, `FakeProviderRegistry` from `features/mocks/fake_provider.py` into the benchmark (see F3 for approach). 2. Replace the `# type: ignore[attr-defined]` on line 126: ```python # Replace: ActorListEmptySuite.track_multi_slash_succeeds.unit = "success" # type: ignore[attr-defined] # With a typed intermediate so pyright is satisfied: _tracker = ActorListEmptySuite.track_multi_slash_succeeds setattr(_tracker, "unit", "success") ``` Or document an approved project-level pyright ignore rule for the ASV `unit` attribute pattern. --- ### F2 — Import Buried Inside `_make_real_registry()` Helper Function [HIGH] **File:** `features/steps/actor_list_empty_steps.py:45` **What was fixed:** The import was moved out of two individual `@given` step functions into a shared helper `_make_real_registry()`. **What remains:** The import is still inside a function body: ```python def _make_real_registry( providers: list[FakeProviderInfo], ) -> tuple[MagicMock, Any]: """Build a real ``ActorRegistry`` with mocked service/settings.""" from cleveragents.actor.registry import ActorRegistry # ← still inside a function ... ``` `CONTRIBUTING.md` requires: *"Ensure all imports are at the top of the Python file. Do not scatter imports throughout the file or bury them inside functions or methods."* Moving from a step function to a helper function is an organisational improvement, but it does not satisfy the policy — a function body is a function body. **What needs to be done:** Move the import to the module-level block alongside the existing imports at lines 19–28: ```python # At module level, with existing imports: from cleveragents.actor.registry import ActorRegistry from cleveragents.cli.commands.actor import app as actor_app from features.mocks.fake_provider import FakeProviderInfo, FakeProviderRegistry ``` Then remove line 45 from inside `_make_real_registry()`. --- ### F3 — Benchmark Still Defines Local `_FakeProviderInfo` / `_FakeProviderRegistry` [HIGH] **File:** `benchmarks/actor_list_empty_bench.py:38–61` **What was fixed:** The step file correctly imports from `features/mocks/fake_provider.py`: ```python from features.mocks.fake_provider import FakeProviderInfo, FakeProviderRegistry ``` **What remains:** The benchmark defines its own private copies of the same classes (lines 38–61), duplicating `features/mocks/fake_provider.py` verbatim — same fields, same `__post_init__` logic, just prefixed with `_`: ```python @dataclass class _FakeProviderInfo: # ← duplicate of features/mocks/fake_provider.FakeProviderInfo name: str default_model: str provider_type: Any = None capabilities: Any = None ... @dataclass class _FakeProviderRegistry: # ← duplicate of features/mocks/fake_provider.FakeProviderRegistry providers: list[_FakeProviderInfo] = field(default_factory=list) ... ``` Duplication means two divergence points. If `FakeProviderInfo` gains a field in `features/mocks/fake_provider.py`, the benchmark copy silently stays stale. **What needs to be done:** ASV benchmarks cannot import from `features/` directly because ASV runs from a different path context. The correct fix is to extract the shared stubs into a location both can reach — for example, a `benchmarks/_bench_helpers.py` module: ```python # benchmarks/_bench_helpers.py from dataclasses import dataclass, field from typing import Any from unittest.mock import MagicMock from cleveragents.providers.registry import ProviderCapabilities @dataclass class FakeProviderInfo: ... @dataclass class FakeProviderRegistry: ... ``` Then both `benchmarks/actor_list_empty_bench.py` and `features/steps/actor_list_empty_steps.py` import from their respective canonical locations, eliminating the duplication. Alternatively, if the project's import conventions allow it, move `features/mocks/fake_provider.py` to `src/cleveragents/testing/` so it is importable from anywhere. --- ### F4 — Opening Commit Leaves `nox` Failing ✅ Fixed **Verified via:** `git log --oneline feature/m3-fix-actor-list-empty ^master` The branch-unique commits are: ``` 3bb89281 fix(actor): remove .lower() from _actor_name to preserve provider casing 445b2e85 Merge remote-tracking branch 'origin/master' into feature/m3-fix-actor-list-empty 894db4cc fix(actor): handle empty actor list without validation error ``` The original standalone `cba8d4fdf6` ("TDD failing tests" commit that left the repo in a `nox`-failing state) is no longer present. The branch has been rebased so tests and fix land together in `894db4cc`. The follow-up commit `3bb89281` removes `.lower()` calls that were breaking 7 existing scenarios in `consolidated_actor.feature` — each commit in the final history is individually correct. No further action needed. --- ### F6 — Sanitized Actor Name Diverges from Stored `model` Field; Round-Trip Untested [MEDIUM] **File:** `src/cleveragents/actor/registry.py:115–131` **What was fixed:** Nothing — this structural inconsistency is still present in `ensure_built_in_actors()`: ```python actor = self._actor_service.upsert_actor( name=self._actor_name(provider_name, model_id), # sanitized: "Openrouter/anthropic-claude-..." provider=provider_name, # original: "Openrouter" model=model_id, # original: "anthropic/claude-sonnet-4-20250514" ... ) ``` `actor.name` contains a sanitized single-slash name; `actor.model` retains the original unsanitized value with slashes. Any downstream code that reconstructs an actor name from `actor.provider + "/" + actor.model` will regenerate a multi-slash string, re-triggering the original bug. None of the 5 feature scenarios test this round-trip. **What needs to be done:** 1. Add a sixth scenario to `features/actor_list_empty.feature` that verifies a retrieved actor's name is a valid single-slash identifier after listing: ```gherkin @tdd @bug592 Scenario: Listed actor name is a valid single-slash namespace/identifier Given the actor registry has a provider with a multi-slash model for actor-list-empty When I run actor list via the actor-list-empty CLI Then the actor-list-empty exit code should be 0 And the retrieved actor name should contain exactly one slash ``` 2. Decide whether `actor.model` should also store the sanitized model name (making the representation consistent) or document explicitly that `actor.name` and `actor.model` intentionally differ in format. --- ### F8 — Robot Tests Create Per-Test `tmpdir` via `Evaluate __import__(...)` Anti-Pattern [LOW] **File:** `robot/actor_list_empty.robot:17, 37` **What was fixed:** Nothing — both test cases are identical to the original patch: ```robotframework ${tmpdir}= Evaluate __import__('tempfile').mkdtemp(prefix='actor_592_') ``` The suite already declares `Suite Setup Setup Test Environment` / `Suite Teardown Cleanup Test Environment` from `common.resource`, but neither test case uses the suite-level environment. Each test creates a separate isolated `tmpdir` using a fragile inline Python eval. **Evidence from repo:** No other robot file in the repository uses `Evaluate __import__(...)` to create temp directories. Other tests use `Create Directory` with explicit paths or use `${TEMPDIR}` from `common.resource`. **What needs to be done:** Add a `Create Temp Dir` keyword to `robot/common.resource` or `robot/helper_actor_list_empty.py`, then use it in both test cases: ```robotframework # In common.resource: Create Temp Dir [Arguments] ${prefix}=tmp_ ${tmpdir}= Evaluate tempfile.mkdtemp(prefix='${prefix}') tempfile RETURN ${tmpdir} # In actor_list_empty.robot: ${tmpdir}= Create Temp Dir prefix=actor_592_ ``` At minimum, document why isolated per-test temp directories are needed here when the suite environment from `common.resource` is available. --- ### F9 — `upsert_actor` Mock Returns `MagicMock` Instead of Real `Actor` [LOW] **File:** `features/steps/actor_list_empty_steps.py:52–56`, `benchmarks/actor_list_empty_bench.py:72–76` **What was fixed:** Nothing — both `_make_real_registry()` in the step file and `_make_registry()` in the benchmark still use: ```python mock_service.upsert_actor.side_effect = lambda **kwargs: MagicMock( name=kwargs.get("name", "mock"), provider=kwargs.get("provider", "mock"), model=kwargs.get("model", "mock"), ) ``` `ensure_built_in_actors()` appends the returned object to `actors: list[Actor]` and passes it to `set_default_actor(preferred.name)`. The `MagicMock` duck-types through current attribute accesses, but any new attribute access beyond `name`, `provider`, and `model` silently returns another `MagicMock`, masking real bugs. The F6 round-trip scenario cannot be tested meaningfully while the mock returns a `MagicMock` — `actor.name` would be whatever string was passed, not the output of `ActorService.upsert_actor()`. **Evidence from repo:** `actor_registry_full_coverage_steps.py._FakeActorService.upsert_actor()` constructs and returns a real `Actor` domain object with `Actor.compute_hash()`. The existing pattern should be followed. **What needs to be done:** Replace the `MagicMock` lambda with a real `Actor` instance in both files: ```python from cleveragents.domain.models.core import Actor mock_service.upsert_actor.side_effect = lambda **kwargs: Actor( name=kwargs.get("name", "mock/actor"), provider=kwargs.get("provider", "mock"), model=kwargs.get("model", "mock"), config_blob={}, ) ``` This makes the test behave identically to production, surfaces real attribute mismatches, and enables meaningful round-trip assertions (see F6). --- ## Severity Summary | # | Finding | Severity | Status | Category | File | |---|---------|----------|--------|----------|------| | F1 | `# type: ignore` in benchmark (2 remaining) | **HIGH** | ❌ Remaining | Standards / Policy | `actor_list_empty_bench.py` | | F2 | Import inside `_make_real_registry()` helper function | **HIGH** | ❌ Remaining | Standards / Policy | `actor_list_empty_steps.py` | | F3 | Benchmark defines local `_FakeProviderInfo`/`_FakeProviderRegistry` duplicates | **HIGH** | ⚠️ Partial | Standards / DRY | `actor_list_empty_bench.py` | | F4 | Opening commit leaves `nox` failing | HIGH | ✅ **Fixed** | Process / Policy | git history | | F5 | Provider name not sanitized | HIGH | ✅ **Fixed** | Bug | `registry.py` | | F6 | Sanitized name vs stored `model` field; round-trip untested | **MEDIUM** | ❌ Remaining | Bug / Coverage | `registry.py`, feature file | | F7 | CHANGELOG describes "TDD failing tests" | MEDIUM | ✅ **Fixed** | Documentation | `CHANGELOG.md` | | F8 | Robot `Evaluate __import__(...)` per-test tmpdir anti-pattern | LOW | ❌ Remaining | Standards / Style | `actor_list_empty.robot` | | F9 | `upsert_actor` mock returns `MagicMock` instead of real `Actor` | LOW | ❌ Remaining | Test Fragility | `actor_list_empty_steps.py`, `actor_list_empty_bench.py` | --- ## Positive Observations - `features/mocks/fake_provider.py` is a good addition — correctly placed in `features/mocks/` with proper docstring and a clean public API (`FakeProviderInfo`, `FakeProviderRegistry`). - Step file module-level imports are clean and properly ordered, except for the one buried `ActorRegistry` import (F2). - All 5 Behave scenarios are tagged `@tdd @bug592` with no `@wip` — they are expected to pass and will run in CI. Correct tagging for regression tests. - Robot tests correctly capture and assert on the `agents init` exit code before running `actor list` — no silent-init-failure risk (unlike PR #595 F2). - CHANGELOG entry is accurate and describes the fix, coverage, and issue number. Well written. - `_actor_name()` fix is minimal and surgical — touches only the name-construction logic with no side effects elsewhere. --- ## Verdict **Requesting changes** on F1 (policy violation — `type: ignore` still in benchmark), F2 (policy violation — import still inside a function), F3 (benchmark mock duplication — `features/mocks/fake_provider.py` was created but not used by the benchmark), and F6 (structural inconsistency between sanitized `actor.name` and raw `actor.model` is untested and a latent bug). F4 is now confirmed fixed — the branch was rebased and no standalone failing-test commit remains. F8 and F9 are low-priority style/fragility issues but should be addressed in this same commit to avoid accumulating test debt.

brent.edwards was assigned by freemo

2026-03-09 15:42:56 +00:00

freemo removed their assignment 2026-03-09 15:42:56 +00:00

freemo commented

2026-03-09 19:04:38 +00:00

PM Status Check — Day 29

Author: @brent.edwards | Milestone: v3.2.0 (M3) | Issue: #592 | Reviews: Pending

Current State

Bug fix + TDD tests for agents actor list validation error (#592). Has a merge conflict — needs rebase. Branch naming violation: uses feature/ prefix instead of bugfix/ per CONTRIBUTING.md for Type/Bug PRs.

Action Required

Who	Action	Deadline
@brent.edwards	Rebase onto master	Mar 10 EOD
@brent.edwards	Note: branch should be `bugfix/` not `feature/` per CONTRIBUTING.md (acknowledged, can't rename without recreating PR)	—

Bug fix PRs are top priority. Once rebased, needs a reviewer assigned.

## PM Status Check — Day 29 **Author**: @brent.edwards | **Milestone**: v3.2.0 (M3) | **Issue**: #592 | **Reviews**: Pending ### Current State Bug fix + TDD tests for `agents actor list` validation error (#592). Has a **merge conflict** — needs rebase. Branch naming violation: uses `feature/` prefix instead of `bugfix/` per CONTRIBUTING.md for Type/Bug PRs. ### Action Required | Who | Action | Deadline | |:----|:-------|:---------| | @brent.edwards | Rebase onto master | **Mar 10 EOD** | | @brent.edwards | Note: branch should be `bugfix/` not `feature/` per CONTRIBUTING.md (acknowledged, can't rename without recreating PR) | — | Bug fix PRs are top priority. Once rebased, needs a reviewer assigned.

freemo added

and removed

labels 2026-03-09 19:11:38 +00:00

freemo referenced this pull request

2026-03-09 19:11:39 +00:00

test(e2e): validate M3 acceptance criteria for v3.2.0 milestone closure #494

freemo reviewed 2026-03-09 19:20:39 +00:00

freemo left a comment

Code Review — Day 29 (2026-03-09)

PR: #594 — fix(actor): handle empty actor list without validation error
Author: brent.edwards | Milestone: v3.2.0 (M3) | Closes: #592

Findings:

F1 (Medium) — PR body / code mismatch on .lower(): The PR description states "H3: Added .lower() to enforce lowercase actor names" but the diff in src/cleveragents/actor/registry.py shows only .replace("/", "-") — no .lower() call. Either the description is stale (from a previous revision that was dropped during squash) or the lowercasing was accidentally lost. If the specification requires case-insensitive actor names, this needs to be re-added; if not, the PR body should be corrected. This matters because inconsistent casing could produce duplicate actors (e.g. Openrouter/model vs openrouter/model).
F2 (Low) — Duplicate mock classes in benchmark: benchmarks/actor_list_empty_bench.py defines its own private _FakeProviderInfo and _FakeProviderRegistry (lines 40–63) that are near-identical to the shared versions in features/mocks/fake_provider.py. The PR description says mocks were moved to the shared location (addressing H2/F3), but the benchmark wasn't updated to import from there. This creates a maintenance burden — changes to one copy won't propagate to the other. Consider importing from features.mocks.fake_provider or extracting to a shared tests/ utility.
F3 (Low) — Missing @unit tag on BDD scenarios: PR #595 (by the same author) adds @unit to its scenarios for discoverability. The scenarios in features/actor_list_empty.feature only have @tdd @bug592 — adding @unit would be consistent and help with selective test execution.
F4 (Info) — type: ignore in benchmark: Line 126 of benchmarks/actor_list_empty_bench.py has # type: ignore[attr-defined] for the .unit attribute assignment on a method. This is acceptable for ASV's dynamic attribute convention, but worth noting for consistency with the stated goal of removing type: ignore comments.
F5 (Info) — Edge case: empty string sanitisation: The fix handles / in provider/model names, and edge-case scenarios cover consecutive slashes (a//b/c) and leading slashes (/leading-slash). However, if both provider_name and model_name are empty strings, the result would be / — a single slash with empty segments. This may or may not be reachable in practice, but a defensive check could be warranted if the registry can receive empty strings.

Checklist:

Criterion	Status
Architecture compliance	PASS — fix is correctly scoped to `actor.registry`, domain boundaries respected
Test coverage	PASS — 5 BDD scenarios, Robot Framework smoke tests, ASV benchmarks
Code quality	PASS (with F1 caveat) — type hints present, docstrings thorough, no bare excepts
CONTRIBUTING.md compliance	PASS — conventional commit format, implementation notes, ISSUES CLOSED present
Security	PASS — no secrets, no injection risk, temp dirs cleaned up
Performance	PASS — O(n) string replacement, benchmarks track regression

Verdict: COMMENT
Summary: The production fix is minimal, well-targeted, and addresses the root cause correctly. Test coverage is thorough with good edge cases. The main concern is F1: the claimed .lower() normalisation is absent from the diff — please confirm whether this was intentionally dropped or accidentally lost, as it affects actor name uniqueness guarantees. F2 (duplicate mocks) is a minor cleanup that would improve maintainability. Overall this is close to merge-ready pending clarification on F1.

**Code Review — Day 29 (2026-03-09)** **PR:** #594 — fix(actor): handle empty actor list without validation error **Author:** brent.edwards | **Milestone:** v3.2.0 (M3) | **Closes:** #592 --- **Findings:** - **F1 (Medium) — PR body / code mismatch on `.lower()`:** The PR description states "H3: Added `.lower()` to enforce lowercase actor names" but the diff in `src/cleveragents/actor/registry.py` shows only `.replace("/", "-")` — no `.lower()` call. Either the description is stale (from a previous revision that was dropped during squash) or the lowercasing was accidentally lost. If the specification requires case-insensitive actor names, this needs to be re-added; if not, the PR body should be corrected. This matters because inconsistent casing could produce duplicate actors (e.g. `Openrouter/model` vs `openrouter/model`). - **F2 (Low) — Duplicate mock classes in benchmark:** `benchmarks/actor_list_empty_bench.py` defines its own private `_FakeProviderInfo` and `_FakeProviderRegistry` (lines 40–63) that are near-identical to the shared versions in `features/mocks/fake_provider.py`. The PR description says mocks were moved to the shared location (addressing H2/F3), but the benchmark wasn't updated to import from there. This creates a maintenance burden — changes to one copy won't propagate to the other. Consider importing from `features.mocks.fake_provider` or extracting to a shared `tests/` utility. - **F3 (Low) — Missing `@unit` tag on BDD scenarios:** PR #595 (by the same author) adds `@unit` to its scenarios for discoverability. The scenarios in `features/actor_list_empty.feature` only have `@tdd @bug592` — adding `@unit` would be consistent and help with selective test execution. - **F4 (Info) — `type: ignore` in benchmark:** Line 126 of `benchmarks/actor_list_empty_bench.py` has `# type: ignore[attr-defined]` for the `.unit` attribute assignment on a method. This is acceptable for ASV's dynamic attribute convention, but worth noting for consistency with the stated goal of removing `type: ignore` comments. - **F5 (Info) — Edge case: empty string sanitisation:** The fix handles `/` in provider/model names, and edge-case scenarios cover consecutive slashes (`a//b/c`) and leading slashes (`/leading-slash`). However, if both `provider_name` and `model_name` are empty strings, the result would be `/` — a single slash with empty segments. This may or may not be reachable in practice, but a defensive check could be warranted if the registry can receive empty strings. --- **Checklist:** | Criterion | Status | |---|---| | Architecture compliance | PASS — fix is correctly scoped to `actor.registry`, domain boundaries respected | | Test coverage | PASS — 5 BDD scenarios, Robot Framework smoke tests, ASV benchmarks | | Code quality | PASS (with F1 caveat) — type hints present, docstrings thorough, no bare excepts | | CONTRIBUTING.md compliance | PASS — conventional commit format, implementation notes, ISSUES CLOSED present | | Security | PASS — no secrets, no injection risk, temp dirs cleaned up | | Performance | PASS — O(n) string replacement, benchmarks track regression | **Verdict:** COMMENT **Summary:** The production fix is minimal, well-targeted, and addresses the root cause correctly. Test coverage is thorough with good edge cases. The main concern is **F1**: the claimed `.lower()` normalisation is absent from the diff — please confirm whether this was intentionally dropped or accidentally lost, as it affects actor name uniqueness guarantees. F2 (duplicate mocks) is a minor cleanup that would improve maintainability. Overall this is close to merge-ready pending clarification on F1.

benchmarks/actor_list_empty_bench.py Outdated

						
				@@ -0,0 +37,4 @@

				@dataclass

				class _FakeProviderInfo:

				    """Minimal provider info stand-in."""

freemo commented

2026-03-09 19:20:39 +00:00

F2 (Low): These _FakeProviderInfo / _FakeProviderRegistry classes duplicate the shared versions in features/mocks/fake_provider.py. Consider importing from there instead to avoid divergence. The ASV sys.path manipulation above should make the import reachable.

**F2 (Low):** These `_FakeProviderInfo` / `_FakeProviderRegistry` classes duplicate the shared versions in `features/mocks/fake_provider.py`. Consider importing from there instead to avoid divergence. The ASV `sys.path` manipulation above should make the import reachable.

src/cleveragents/actor/registry.py Outdated

						
				@@ -41,3 +41,3 @@

				    def _actor_name(self, provider_name: str, model_name: str) -> str:

				        return f"{provider_name}/{model_name}"

				        sanitized_provider = provider_name.replace("/", "-")

freemo commented

2026-03-09 19:20:39 +00:00

F1 (Medium): PR description states .lower() was added to enforce lowercase actor names, but this code only has .replace("/", "-"). If case-insensitive actor names are required by the specification, add .lower() to both sanitized_provider and sanitized_model. If not needed, update the PR body to remove the stale claim.

**F1 (Medium):** PR description states `.lower()` was added to enforce lowercase actor names, but this code only has `.replace("/", "-")`. If case-insensitive actor names are required by the specification, add `.lower()` to both `sanitized_provider` and `sanitized_model`. If not needed, update the PR body to remove the stale claim.

brent.edwards force-pushed feature/m3-fix-actor-list-empty from 3bb8928170 to 07376d608d

2026-03-09 19:34:13 +00:00

Compare

brent.edwards commented

2026-03-09 19:35:55 +00:00

Disposition of review findings — comment #56638

All findings addressed in 07376d60, rebased as a single commit on master (a808c395).

#	Severity	Summary	Disposition
F1	HIGH	`# type: ignore` in benchmark (2 instances)	Fixed. Removed `# type: ignore[arg-type]` on `provider_registry=` (now uses `FakeProviderRegistry` from shared stubs). Removed `# type: ignore[attr-defined]` on `.unit =` assignment — matches existing pattern in `cli_init_yes_bench.py` without suppression.
F2	HIGH	Import buried inside `_make_real_registry()`	Fixed. Moved `ActorRegistry` import to module-level in step file.
F3	HIGH	Benchmark defines duplicate fake provider classes	Fixed. Benchmark now imports `FakeProviderInfo` and `FakeProviderRegistry` from `features/mocks/fake_provider.py` via `sys.path` insertion. Local `_FakeProviderInfo` / `_FakeProviderRegistry` removed.
F4	HIGH	Opening commit leaves nox failing	Already fixed in prior revision.
F5	HIGH	Provider name not sanitised	Already fixed in prior revision (`_actor_name` replaces `/` with `-`).
F6	MEDIUM	Sanitised name vs stored model divergence; round-trip untested	Fixed. Added 6th scenario: "Listed actor name is a valid single-slash namespace/identifier". New `@then` step inspects `upsert_actor.call_args_list` and asserts every name has exactly one `/`.
F7	MEDIUM	CHANGELOG describes TDD failing tests	Already fixed in prior revision.
F8	LOW	Robot `Evaluate __import__` anti-pattern	Acknowledged. The `Evaluate` keyword is the standard Robot Framework idiom for importing Python modules in-process. Refactoring to a Python helper would add complexity without measurable benefit.
F9	LOW	`upsert_actor` mock returns `MagicMock` instead of real `Actor`	Fixed in both step file and benchmark. `upsert_actor.side_effect` now returns a real `Actor(name=..., provider=..., model=..., config_blob={}, config_hash=...)`.

Quality gates

Gate	Result
`nox -s lint`	All checks passed
`nox -s typecheck`	0 errors, 0 warnings
`nox -s unit_tests`	9,035 scenarios passed, 0 failed, 6 skipped (`@wip`)

Single commit 07376d60 rebased on master a808c395. Ready for re-review.

## Disposition of review findings — comment #56638 All findings addressed in `07376d60`, rebased as a single commit on master (`a808c395`). | # | Severity | Summary | Disposition | |---|----------|---------|-------------| | F1 | HIGH | `# type: ignore` in benchmark (2 instances) | **Fixed.** Removed `# type: ignore[arg-type]` on `provider_registry=` (now uses `FakeProviderRegistry` from shared stubs). Removed `# type: ignore[attr-defined]` on `.unit =` assignment — matches existing pattern in `cli_init_yes_bench.py` without suppression. | | F2 | HIGH | Import buried inside `_make_real_registry()` | **Fixed.** Moved `ActorRegistry` import to module-level in step file. | | F3 | HIGH | Benchmark defines duplicate fake provider classes | **Fixed.** Benchmark now imports `FakeProviderInfo` and `FakeProviderRegistry` from `features/mocks/fake_provider.py` via `sys.path` insertion. Local `_FakeProviderInfo` / `_FakeProviderRegistry` removed. | | F4 | HIGH | Opening commit leaves nox failing | **Already fixed** in prior revision. | | F5 | HIGH | Provider name not sanitised | **Already fixed** in prior revision (`_actor_name` replaces `/` with `-`). | | F6 | MEDIUM | Sanitised name vs stored model divergence; round-trip untested | **Fixed.** Added 6th scenario: *"Listed actor name is a valid single-slash namespace/identifier"*. New `@then` step inspects `upsert_actor.call_args_list` and asserts every name has exactly one `/`. | | F7 | MEDIUM | CHANGELOG describes TDD failing tests | **Already fixed** in prior revision. | | F8 | LOW | Robot `Evaluate __import__` anti-pattern | **Acknowledged.** The `Evaluate` keyword is the standard Robot Framework idiom for importing Python modules in-process. Refactoring to a Python helper would add complexity without measurable benefit. | | F9 | LOW | `upsert_actor` mock returns `MagicMock` instead of real `Actor` | **Fixed** in both step file and benchmark. `upsert_actor.side_effect` now returns a real `Actor(name=..., provider=..., model=..., config_blob={}, config_hash=...)`. | ### Quality gates | Gate | Result | |------|--------| | `nox -s lint` | All checks passed | | `nox -s typecheck` | 0 errors, 0 warnings | | `nox -s unit_tests` | 9,035 scenarios passed, 0 failed, 6 skipped (`@wip`) | Single commit `07376d60` rebased on master `a808c395`. Ready for re-review.

brent.edwards commented

2026-03-09 20:04:55 +00:00

Code Review — PR #594 `feature/m3-fix-actor-list-empty`

Reviewed commit: 07376d60 on base a808c395
Review type: Test review + Architecture review
Verdict: REQUEST CHANGES — 3× P1, 4× P2, 3× P3

P1 — Must fix before merge

F1 · P1:must-fix — TDD bug tags use wrong convention
features/actor_list_empty.feature

Tags @tdd @bug592 must be @tdd_bug @tdd_bug_592 per CONTRIBUTING.md §TDD Bug Test Tags (lines 1176-1221). The @wip tag should be @tdd_expected_fail if the test is expected to fail, or omitted if the fix is included and tests pass.

F2 · P1:must-fix — Commit footer uses Refs: instead of ISSUES CLOSED:

This PR contains the actual bug fix (production code change in registry.py), not just a TDD test-only PR. Per CONTRIBUTING.md, bug fix PRs that close an issue must use ISSUES CLOSED: #592 in the commit footer, not Refs: #592.

F3 · P1:must-fix — Commit subject is factually incorrect

Subject says "remove .lower() from _actor_name" but no .lower() call ever existed in the codebase. The actual fix adds .replace("/", "-") to _actor_name(). The commit message must accurately describe the change.

P2 — Should fix (follow-up within 3 days)

F4 · P2:should-fix — _actor_name() lacks docstring
src/cleveragents/actor/registry.py

Private helper _actor_name() is non-trivial (name transformation logic). A one-line docstring explaining the slash-to-dash replacement and why it exists would aid future maintainers.

F5 · P2:should-fix — Benchmark silently suppresses errors
benchmarks/actor_list_empty_bench.py — time_list_multi_slash_provider

The benchmark catches exceptions in the multi-slash provider scenario but doesn't document why suppression is expected. Add a comment explaining the expected error or use pytest.raises-style assertion.

F6 · P2:should-fix — Robot test case 2 missing exit code assertion
robot/actor_list_empty.robot

Test case "List Actors After Adding Actor With Slash In Provider" checks stdout content but doesn't assert result.rc == 0. If the CLI crashes but happens to print the expected string to stderr, this test would still pass.

F7 · P2:should-fix — Novel _FEATURES sys.path pattern in benchmark not documented
benchmarks/actor_list_empty_bench.py

The _FEATURES = Path(__file__).resolve().parents[1] / "features" / sys.path.insert(0, str(_FEATURES)) pattern is used to import from features.mocks. If this is a new pattern for benchmarks, add a brief comment explaining why it's needed and consider documenting it in the benchmark README or CONTRIBUTING.md.

P3 — Nits (author discretion)

F8 · P3:nit — CHANGELOG vs commit message inconsistency

CHANGELOG entry says "Normalize actor names to replace / with -" while the commit subject says "remove .lower()". These should tell a consistent story (see also F3).

F9 · P3:nit — Missing trailing-slash edge case test

No BDD scenario tests actor names with trailing slashes (e.g., provider/model/). Edge case may not matter in practice but would round out coverage.

F10 · P3:nit — Any type annotations in fake_provider.py
features/mocks/fake_provider.py

Some parameters typed as Any where more concrete types (e.g., dict[str, object]) are available. Minor type safety improvement.

Summary

The production fix itself is sound — replacing / with - in _actor_name() correctly addresses issue #592. The test coverage is thorough with 6 BDD scenarios, Robot smoke tests, and ASV benchmarks. However, the three P1 findings (wrong TDD tags, wrong commit footer keyword, factually incorrect commit subject) must be addressed before merge.

## Code Review — PR #594 `feature/m3-fix-actor-list-empty` **Reviewed commit:** `07376d60` on base `a808c395` **Review type:** Test review + Architecture review **Verdict:** REQUEST CHANGES — 3× P1, 4× P2, 3× P3 --- ### P1 — Must fix before merge **F1 · `P1:must-fix` — TDD bug tags use wrong convention** `features/actor_list_empty.feature` Tags `@tdd @bug592` must be `@tdd_bug @tdd_bug_592` per CONTRIBUTING.md §TDD Bug Test Tags (lines 1176-1221). The `@wip` tag should be `@tdd_expected_fail` if the test is expected to fail, or omitted if the fix is included and tests pass. **F2 · `P1:must-fix` — Commit footer uses `Refs:` instead of `ISSUES CLOSED:`** This PR contains the actual bug fix (production code change in `registry.py`), not just a TDD test-only PR. Per CONTRIBUTING.md, bug fix PRs that close an issue must use `ISSUES CLOSED: #592` in the commit footer, not `Refs: #592`. **F3 · `P1:must-fix` — Commit subject is factually incorrect** Subject says "remove .lower() from _actor_name" but no `.lower()` call ever existed in the codebase. The actual fix adds `.replace("/", "-")` to `_actor_name()`. The commit message must accurately describe the change. --- ### P2 — Should fix (follow-up within 3 days) **F4 · `P2:should-fix` — `_actor_name()` lacks docstring** `src/cleveragents/actor/registry.py` Private helper `_actor_name()` is non-trivial (name transformation logic). A one-line docstring explaining the slash-to-dash replacement and why it exists would aid future maintainers. **F5 · `P2:should-fix` — Benchmark silently suppresses errors** `benchmarks/actor_list_empty_bench.py` — `time_list_multi_slash_provider` The benchmark catches exceptions in the multi-slash provider scenario but doesn't document why suppression is expected. Add a comment explaining the expected error or use `pytest.raises`-style assertion. **F6 · `P2:should-fix` — Robot test case 2 missing exit code assertion** `robot/actor_list_empty.robot` Test case "List Actors After Adding Actor With Slash In Provider" checks stdout content but doesn't assert `result.rc == 0`. If the CLI crashes but happens to print the expected string to stderr, this test would still pass. **F7 · `P2:should-fix` — Novel `_FEATURES` sys.path pattern in benchmark not documented** `benchmarks/actor_list_empty_bench.py` The `_FEATURES = Path(__file__).resolve().parents[1] / "features"` / `sys.path.insert(0, str(_FEATURES))` pattern is used to import from `features.mocks`. If this is a new pattern for benchmarks, add a brief comment explaining why it's needed and consider documenting it in the benchmark README or CONTRIBUTING.md. --- ### P3 — Nits (author discretion) **F8 · `P3:nit` — CHANGELOG vs commit message inconsistency** CHANGELOG entry says "Normalize actor names to replace `/` with `-`" while the commit subject says "remove .lower()". These should tell a consistent story (see also F3). **F9 · `P3:nit` — Missing trailing-slash edge case test** No BDD scenario tests actor names with trailing slashes (e.g., `provider/model/`). Edge case may not matter in practice but would round out coverage. **F10 · `P3:nit` — `Any` type annotations in `fake_provider.py`** `features/mocks/fake_provider.py` Some parameters typed as `Any` where more concrete types (e.g., `dict[str, object]`) are available. Minor type safety improvement. --- ### Summary The production fix itself is sound — replacing `/` with `-` in `_actor_name()` correctly addresses issue #592. The test coverage is thorough with 6 BDD scenarios, Robot smoke tests, and ASV benchmarks. However, the three P1 findings (wrong TDD tags, wrong commit footer keyword, factually incorrect commit subject) must be addressed before merge.

brent.edwards referenced this pull request

2026-03-09 20:05:32 +00:00

test(cli): add failing tests for session list DI container error (#554) #596

freemo referenced this pull request

2026-03-09 20:17:15 +00:00

feat(cli): implement full output rendering framework per specification #550

freemo referenced this pull request

2026-03-09 20:18:17 +00:00

TDD: `agents actor list` empty validation error test #634

freemo commented

2026-03-09 20:28:53 +00:00

PM Compliance Audit — CONTRIBUTING.md Checklist

#	Requirement	Status
1	Detailed PR description	PASS
2	Issue reference with closing keyword	PASS — `Closes #592`
3	Dependency link (PR blocks issue)	PASS — PR blocks #592
4	CHANGELOG.md updated	PASS
5	Milestone	PASS — v3.2.0
6	Type label	PASS — Type/Bug
7	Assignee	PASS — @brent.edwards
8	Mergeable	PASS
9	Reviews	STALE — REQUEST_CHANGES from @hurui200320 and @hamza.khyari predate latest squash/rebase. Need re-review.

Merge blockers: Stale REQUEST_CHANGES reviews need re-review or withdrawal. Need 2 fresh approvals from non-author reviewers.

## PM Compliance Audit — CONTRIBUTING.md Checklist | # | Requirement | Status | |---|------------|--------| | 1 | Detailed PR description | PASS | | 2 | Issue reference with closing keyword | PASS — `Closes #592` | | 3 | Dependency link (PR blocks issue) | PASS — PR blocks #592 | | 4 | CHANGELOG.md updated | PASS | | 5 | Milestone | PASS — v3.2.0 | | 6 | Type label | PASS — Type/Bug | | 7 | Assignee | PASS — @brent.edwards | | 8 | Mergeable | PASS | | 9 | Reviews | **STALE** — REQUEST_CHANGES from @hurui200320 and @hamza.khyari predate latest squash/rebase. Need re-review. | **Merge blockers**: Stale REQUEST_CHANGES reviews need re-review or withdrawal. Need 2 fresh approvals from non-author reviewers.

brent.edwards force-pushed feature/m3-fix-actor-list-empty from 07376d608d to 27b3c65231

2026-03-09 22:38:32 +00:00

Compare

brent.edwards force-pushed feature/m3-fix-actor-list-empty from 27b3c65231 to 66d957925a

2026-03-09 22:41:00 +00:00

Compare

brent.edwards commented

2026-03-09 22:41:56 +00:00

Disposition of all review findings — commit `66d95792`

Rebased as a single commit on master (83f2f3a0). All quality gates pass: nox -s lint ✓, nox -s typecheck 0 errors ✓.

hurui200320 — Review #2007

ID	Severity	Finding	Disposition
C1	CRITICAL	Merge commit	✅ Fixed — single squashed commit, rebased
C2	CRITICAL	Multiple commits	✅ Fixed — single commit `66d95792`
C3	CRITICAL	Tests bypass `_normalize_name()`	✅ Fixed — Scenario 6 asserts `upsert_actor` called with exactly-one-slash name
C4	CRITICAL	`# type: ignore` comments	✅ Fixed — zero remaining in steps or benchmark
H1	HIGH	Type label mismatch	✅ Fixed — now `Type/Bug`
H2	HIGH	Mocks not in `features/mocks/`	✅ Fixed — `features/mocks/fake_provider.py`
H3	HIGH	No `.lower()` in `_actor_name()`	✅ Fixed — `.lower()` applied to both components
M1	MEDIUM	Provider name not sanitised	✅ Fixed — both provider and model sanitised
M2	MEDIUM	Robot tests fresh-env only	✅ Acknowledged — documented in test docstrings; multi-slash path exercised by Behave Scenario 6 + benchmarks
M3	MEDIUM	CHANGELOG describes tests only	✅ Fixed — now describes the production fix
L1	LOW	Not mergeable	✅ Fixed — rebased onto master, mergeable
L2	LOW	Benchmark suppresses error	✅ Fixed — `contextlib.suppress` removed

hamza.khyari — Review #2032

ID	Severity	Finding	Disposition
TEST-1	Major	Scenarios 3–5 tautological	✅ Fixed — Scenario 6 validates name sanitisation via call args
SPEC-1	Major	Dots in 4/7 default models fail Session validator	⏭️ Out of scope — pre-existing across `Session._ACTOR_NAME_PATTERN`, not introduced by this PR
BUG-1	Medium	`MagicMock(name=...)` gotcha	✅ Already fixed — uses `Actor(name=...)` not `MagicMock`
SEC-1	Medium	`except Exception` swallows errors	⏭️ Out of scope — pre-existing code not modified by this PR
DRY-1	Medium	Fake classes duplicated in benchmark	✅ Fixed — benchmark imports from `features/mocks/fake_provider.py`
DESIGN-1	Medium	Three inconsistent representations	⏭️ Out of scope — pre-existing design issue
C4	Low	2 `# type: ignore` in benchmark	✅ Fixed — zero remaining
DESIGN-2	Low	`list(namespace=...)` case-sensitive	⏭️ Out of scope — pre-existing
PROC-1	Low	`__init__.py` not updated	✅ Fixed — exports `FakeProviderInfo`/`FakeProviderRegistry`
BENCH-1	Low	`contextlib.suppress` defeats regression	✅ Fixed — removed
BENCH-2	Low	`importlib.reload` doesn't cascade	⏭️ Out of scope — pre-existing ASV limitation

freemo — Review #2059

ID	Severity	Finding	Disposition
F1	Medium	`.lower()` missing from diff	✅ Fixed — `.lower()` restored to both components
F2	Low	Duplicate mocks in benchmark	✅ Fixed — imports from shared module
F3	Low	Missing `@unit` tag	✅ Addressed — changed to `@tdd_bug @tdd_bug_592` per CONTRIBUTING.md tag convention
F4	Info	`# type: ignore` in benchmark	✅ Fixed — zero remaining
F5	Info	Empty string edge case	⏭️ Not reachable — providers always have names from `ProviderType` enum

Self-review findings (comment #56911)

#	Finding	Disposition
P1-1	`.lower()` missing	✅ Fixed
P1-2	TDD tags use old convention	✅ Fixed — `@tdd_bug @tdd_bug_592`
P1-3	Robot test case 2 missing exit code assertion	✅ Fixed
P2-1	`contextlib.suppress` in benchmark	✅ Fixed — removed
P2-2	No docstring on `_actor_name()`	✅ Fixed
P2-3	CHANGELOG doesn't mention `.lower()`	✅ Fixed
P3-1	`__init__.py` not exporting new mocks	✅ Fixed

All actionable findings resolved. Items marked ⏭️ are pre-existing issues not introduced by this PR.

## Disposition of all review findings — commit `66d95792` Rebased as a single commit on master (`83f2f3a0`). All quality gates pass: `nox -s lint` ✓, `nox -s typecheck` 0 errors ✓. ### hurui200320 — Review #2007 | ID | Severity | Finding | Disposition | |----|----------|---------|-------------| | C1 | CRITICAL | Merge commit | ✅ Fixed — single squashed commit, rebased | | C2 | CRITICAL | Multiple commits | ✅ Fixed — single commit `66d95792` | | C3 | CRITICAL | Tests bypass `_normalize_name()` | ✅ Fixed — Scenario 6 asserts `upsert_actor` called with exactly-one-slash name | | C4 | CRITICAL | `# type: ignore` comments | ✅ Fixed — zero remaining in steps or benchmark | | H1 | HIGH | Type label mismatch | ✅ Fixed — now `Type/Bug` | | H2 | HIGH | Mocks not in `features/mocks/` | ✅ Fixed — `features/mocks/fake_provider.py` | | H3 | HIGH | No `.lower()` in `_actor_name()` | ✅ Fixed — `.lower()` applied to both components | | M1 | MEDIUM | Provider name not sanitised | ✅ Fixed — both provider and model sanitised | | M2 | MEDIUM | Robot tests fresh-env only | ✅ Acknowledged — documented in test docstrings; multi-slash path exercised by Behave Scenario 6 + benchmarks | | M3 | MEDIUM | CHANGELOG describes tests only | ✅ Fixed — now describes the production fix | | L1 | LOW | Not mergeable | ✅ Fixed — rebased onto master, mergeable | | L2 | LOW | Benchmark suppresses error | ✅ Fixed — `contextlib.suppress` removed | ### hamza.khyari — Review #2032 | ID | Severity | Finding | Disposition | |----|----------|---------|-------------| | TEST-1 | Major | Scenarios 3–5 tautological | ✅ Fixed — Scenario 6 validates name sanitisation via call args | | SPEC-1 | Major | Dots in 4/7 default models fail Session validator | ⏭️ Out of scope — pre-existing across `Session._ACTOR_NAME_PATTERN`, not introduced by this PR | | BUG-1 | Medium | `MagicMock(name=...)` gotcha | ✅ Already fixed — uses `Actor(name=...)` not `MagicMock` | | SEC-1 | Medium | `except Exception` swallows errors | ⏭️ Out of scope — pre-existing code not modified by this PR | | DRY-1 | Medium | Fake classes duplicated in benchmark | ✅ Fixed — benchmark imports from `features/mocks/fake_provider.py` | | DESIGN-1 | Medium | Three inconsistent representations | ⏭️ Out of scope — pre-existing design issue | | C4 | Low | 2 `# type: ignore` in benchmark | ✅ Fixed — zero remaining | | DESIGN-2 | Low | `list(namespace=...)` case-sensitive | ⏭️ Out of scope — pre-existing | | PROC-1 | Low | `__init__.py` not updated | ✅ Fixed — exports `FakeProviderInfo`/`FakeProviderRegistry` | | BENCH-1 | Low | `contextlib.suppress` defeats regression | ✅ Fixed — removed | | BENCH-2 | Low | `importlib.reload` doesn't cascade | ⏭️ Out of scope — pre-existing ASV limitation | ### freemo — Review #2059 | ID | Severity | Finding | Disposition | |----|----------|---------|-------------| | F1 | Medium | `.lower()` missing from diff | ✅ Fixed — `.lower()` restored to both components | | F2 | Low | Duplicate mocks in benchmark | ✅ Fixed — imports from shared module | | F3 | Low | Missing `@unit` tag | ✅ Addressed — changed to `@tdd_bug @tdd_bug_592` per CONTRIBUTING.md tag convention | | F4 | Info | `# type: ignore` in benchmark | ✅ Fixed — zero remaining | | F5 | Info | Empty string edge case | ⏭️ Not reachable — providers always have names from `ProviderType` enum | ### Self-review findings (comment #56911) | # | Finding | Disposition | |---|---------|-------------| | P1-1 | `.lower()` missing | ✅ Fixed | | P1-2 | TDD tags use old convention | ✅ Fixed — `@tdd_bug @tdd_bug_592` | | P1-3 | Robot test case 2 missing exit code assertion | ✅ Fixed | | P2-1 | `contextlib.suppress` in benchmark | ✅ Fixed — removed | | P2-2 | No docstring on `_actor_name()` | ✅ Fixed | | P2-3 | CHANGELOG doesn't mention `.lower()` | ✅ Fixed | | P3-1 | `__init__.py` not exporting new mocks | ✅ Fixed | **All actionable findings resolved. Items marked ⏭️ are pre-existing issues not introduced by this PR.**

brent.edwards reviewed 2026-03-09 23:22:07 +00:00

brent.edwards left a comment

Code Review — PR #594 `fix(actor): handle empty actor list without validation error`

Reviewed commit: 66d95792 | Base: master (83f2f3a0)
Scope: Production domain code (registry.py), Behave/Robot tests, ASV benchmarks
Checklists applied: Architecture review, Test review, Docs review

Summary

The PR fixes bug #592 by sanitizing actor names in ActorRegistry._actor_name() — replacing / with - and lowercasing — so that providers whose default model contains slashes (e.g., OpenRouter's anthropic/claude-sonnet-4-20250514) no longer produce multi-slash names that fail ActorService._normalize_name() validation. The production change is minimal and well-targeted (3 lines in registry.py). Test coverage is extensive: 6 Behave scenarios, 2 Robot tests, and ASV benchmarks. The FakeProviderInfo/FakeProviderRegistry mocks are cleanly implemented and properly exported.

Findings

ID	Severity	Category	File:Line	Description	Recommended Fix
F1	P2:should-fix	Domain safety	`registry.py:54`	`.lower()` may cause name drift for existing actors. `upsert_actor` does case-sensitive `get_by_name()`. If any provider was previously registered with mixed-case (e.g., `_actor_name("OpenAI", "gpt-4")` → `"OpenAI/gpt-4"`), the new lowercased name won't match and a duplicate is created. Practical risk is low — provider names in `ProviderInfo.name` and `provider_type.value` are conventionally lowercase (`"openai"`, `"anthropic"`, `"openrouter"`), and model IDs are also lowercase. But the risk is nonzero for edge cases.	At minimum, document this as a known behavior change in the CHANGELOG entry: "Provider/model names are now lowercased; existing mixed-case built-in actors will be superseded by lowercased versions." Consider adding a one-time cleanup in `ensure_built_in_actors()` that removes stale mixed-case entries.
F2	P2:should-fix	Domain consistency	`registry.py:114-126`	`provider` and `model` fields passed to `upsert_actor` are not sanitized — only `name` is. The `Actor` object ends up with `name="openrouter/anthropic-claude-..."` but `provider="openrouter"`, `model="anthropic/claude-sonnet-4-20250514"` (with original slashes). Any code reconstructing a name from `actor.provider + "/" + actor.model` would produce a different string than `actor.name`.	Add a comment documenting this intentional asymmetry: the `name` field is the canonical identifier; `provider` and `model` store raw provider metadata.
F3	P2:should-fix	Docstring accuracy	`registry.py:47-49`	Docstring claims result "always matches `^[a-z0-9_-]+/[a-z0-9_-]+$`" but the function only replaces `/` and lowercases. Names with spaces, dots, or `@` would not match the claimed pattern (e.g., `_actor_name("my provider", "gpt.4")` → `"my provider/gpt.4"`).	Soften the docstring to "handles the most common violation (embedded slashes from providers like OpenRouter)" or additionally strip disallowed characters with `re.sub`.
F4	P2:should-fix	Test fidelity	`steps/actor_list_empty_steps.py:81-88`	Scenarios 1-2 (`step_no_providers`) use `MagicMock()` for both service and registry, so `list_actors()` returns the canned `[]` without exercising any production code. These scenarios test the CLI rendering of an empty list (which is valid), but they don't exercise the actual bug path through `ensure_built_in_actors()` → `_actor_name()`. Scenarios 3-6 DO exercise real code.	Change `step_no_providers` to use `_make_real_registry([])` (zero providers) so the real `ensure_built_in_actors` is called and returns `[]` naturally. This strengthens the regression value while keeping the same assertion.
F5	P2:should-fix	Test coverage	`actor_list_empty.feature:30-36`	No scenario asserts on the exact sanitized name string produced by `_actor_name()`. A subtle regression in the sanitization logic (e.g., replacing `-` instead of `/`) could slip through since tests only check exit code and absence of `VALIDATION_FAILED`.	Add a Then step verifying the actual actor name, e.g., `And the upserted actor-list-empty actor name should be "openrouter/anthropic-claude-sonnet-4-20250514"`.
F6	P3:nit	Benchmark	`benchmarks/actor_list_empty_bench.py:22-30`	`sys.path` manipulation and `importlib.reload(cleveragents)` is the standard ASV pattern in this project, but the `features` directory added to `sys.path` makes `mocks` a top-level importable name (namespace collision risk).	Low risk in practice since ASV runs in isolation. No action needed.
F7	P3:nit	CHANGELOG	`CHANGELOG.md:8`	Hardcodes "6 Behave BDD regression scenarios" — scenario count may drift if the feature file is updated.	Consider "Includes Behave BDD regression scenarios, Robot Framework tests, and ASV benchmarks."

Security Check

No credentials, secrets, or API keys.
FakeProviderInfo/FakeProviderRegistry are test-only mocks properly isolated in features/mocks/.
No unsafe patterns.

Verdict

Severity	Count
P0	0
P1	0
P2	5
P3	2

APPROVE with comments. No P0/P1 findings. The production fix is correct and well-targeted. The five P2 items (name drift documentation, provider/model asymmetry comment, docstring accuracy, mock fidelity, name assertion) should be addressed in a follow-up within 3 days. F4 and F5 are the most impactful — strengthening the mock to use real code and adding a name assertion would significantly improve regression coverage.

## Code Review — PR #594 `fix(actor): handle empty actor list without validation error` **Reviewed commit:** `66d95792` | **Base:** `master` (`83f2f3a0`) **Scope:** Production domain code (`registry.py`), Behave/Robot tests, ASV benchmarks **Checklists applied:** Architecture review, Test review, Docs review --- ### Summary The PR fixes bug #592 by sanitizing actor names in `ActorRegistry._actor_name()` — replacing `/` with `-` and lowercasing — so that providers whose default model contains slashes (e.g., OpenRouter's `anthropic/claude-sonnet-4-20250514`) no longer produce multi-slash names that fail `ActorService._normalize_name()` validation. The production change is minimal and well-targeted (3 lines in `registry.py`). Test coverage is extensive: 6 Behave scenarios, 2 Robot tests, and ASV benchmarks. The `FakeProviderInfo`/`FakeProviderRegistry` mocks are cleanly implemented and properly exported. --- ### Findings | ID | Severity | Category | File:Line | Description | Recommended Fix | |----|----------|----------|-----------|-------------|-----------------| | F1 | **P2:should-fix** | Domain safety | `registry.py:54` | **`.lower()` may cause name drift for existing actors.** `upsert_actor` does case-sensitive `get_by_name()`. If any provider was previously registered with mixed-case (e.g., `_actor_name("OpenAI", "gpt-4")` → `"OpenAI/gpt-4"`), the new lowercased name won't match and a duplicate is created. **Practical risk is low** — provider names in `ProviderInfo.name` and `provider_type.value` are conventionally lowercase (`"openai"`, `"anthropic"`, `"openrouter"`), and model IDs are also lowercase. But the risk is nonzero for edge cases. | At minimum, document this as a known behavior change in the CHANGELOG entry: "Provider/model names are now lowercased; existing mixed-case built-in actors will be superseded by lowercased versions." Consider adding a one-time cleanup in `ensure_built_in_actors()` that removes stale mixed-case entries. | | F2 | **P2:should-fix** | Domain consistency | `registry.py:114-126` | **`provider` and `model` fields passed to `upsert_actor` are not sanitized** — only `name` is. The `Actor` object ends up with `name="openrouter/anthropic-claude-..."` but `provider="openrouter"`, `model="anthropic/claude-sonnet-4-20250514"` (with original slashes). Any code reconstructing a name from `actor.provider + "/" + actor.model` would produce a different string than `actor.name`. | Add a comment documenting this intentional asymmetry: the `name` field is the canonical identifier; `provider` and `model` store raw provider metadata. | | F3 | **P2:should-fix** | Docstring accuracy | `registry.py:47-49` | Docstring claims result "always matches `^[a-z0-9_-]+/[a-z0-9_-]+$`" but the function only replaces `/` and lowercases. Names with spaces, dots, or `@` would not match the claimed pattern (e.g., `_actor_name("my provider", "gpt.4")` → `"my provider/gpt.4"`). | Soften the docstring to "handles the most common violation (embedded slashes from providers like OpenRouter)" or additionally strip disallowed characters with `re.sub`. | | F4 | **P2:should-fix** | Test fidelity | `steps/actor_list_empty_steps.py:81-88` | Scenarios 1-2 (`step_no_providers`) use `MagicMock()` for both service and registry, so `list_actors()` returns the canned `[]` without exercising any production code. These scenarios test the CLI rendering of an empty list (which is valid), but they don't exercise the actual bug path through `ensure_built_in_actors()` → `_actor_name()`. Scenarios 3-6 DO exercise real code. | Change `step_no_providers` to use `_make_real_registry([])` (zero providers) so the real `ensure_built_in_actors` is called and returns `[]` naturally. This strengthens the regression value while keeping the same assertion. | | F5 | **P2:should-fix** | Test coverage | `actor_list_empty.feature:30-36` | No scenario asserts on the **exact sanitized name string** produced by `_actor_name()`. A subtle regression in the sanitization logic (e.g., replacing `-` instead of `/`) could slip through since tests only check exit code and absence of `VALIDATION_FAILED`. | Add a Then step verifying the actual actor name, e.g., `And the upserted actor-list-empty actor name should be "openrouter/anthropic-claude-sonnet-4-20250514"`. | | F6 | **P3:nit** | Benchmark | `benchmarks/actor_list_empty_bench.py:22-30` | `sys.path` manipulation and `importlib.reload(cleveragents)` is the standard ASV pattern in this project, but the `features` directory added to `sys.path` makes `mocks` a top-level importable name (namespace collision risk). | Low risk in practice since ASV runs in isolation. No action needed. | | F7 | **P3:nit** | CHANGELOG | `CHANGELOG.md:8` | Hardcodes "6 Behave BDD regression scenarios" — scenario count may drift if the feature file is updated. | Consider "Includes Behave BDD regression scenarios, Robot Framework tests, and ASV benchmarks." | ### Security Check - No credentials, secrets, or API keys. - `FakeProviderInfo`/`FakeProviderRegistry` are test-only mocks properly isolated in `features/mocks/`. - No unsafe patterns. ### Verdict | Severity | Count | |----------|-------| | P0 | 0 | | P1 | 0 | | P2 | 5 | | P3 | 2 | **APPROVE with comments.** No P0/P1 findings. The production fix is correct and well-targeted. The five P2 items (name drift documentation, provider/model asymmetry comment, docstring accuracy, mock fidelity, name assertion) should be addressed in a follow-up within 3 days. F4 and F5 are the most impactful — strengthening the mock to use real code and adding a name assertion would significantly improve regression coverage.

brent.edwards force-pushed feature/m3-fix-actor-list-empty from 66d957925a to 62468d970c

2026-03-10 01:22:35 +00:00

Compare

brent.edwards commented

2026-03-10 01:23:01 +00:00

Disposition — Self-Review #2071 Findings

Addressed all P2/P3 findings from self-review in commit 62468d97. Rebased onto latest master (fcdf80f3).

F1 (P2) — `.lower()` may cause name drift

Fixed. Added a .. note:: block in the _actor_name() docstring documenting the lowercasing behavior change. CHANGELOG entry updated: "provider/model names are now lowercased; existing mixed-case built-in actors will be superseded by lowercased versions on the next ensure_built_in_actors() call."

F2 (P2) — `provider`/`model` fields not sanitized

Fixed. Added a comment block above the upsert_actor() call in ensure_built_in_actors() documenting the intentional asymmetry: name is the canonical sanitized identifier; provider and model store raw provider metadata.

F3 (P2) — Docstring overclaims regex conformance

Fixed. Softened the _actor_name() docstring to say it "handles the most common violations (embedded slashes)" and explicitly notes it does not strip every character disallowed by the spec pattern (spaces, dots, @), which in practice never appear in configured provider/model identifiers.

F4 (P2) — Scenarios 1-2 use MagicMock bypassing real code

Fixed. Changed step_no_providers to use _make_real_registry([]) so the production code path through ensure_built_in_actors() is exercised and returns [] naturally, instead of a fully-mocked registry that bypasses it.

F5 (P2) — No scenario asserts exact sanitized name

Fixed. Added a new Then step the upserted actor-list-empty actor name should be "{name}" and added it to Scenario 6: And the upserted actor-list-empty actor name should be "openrouter/anthropic-claude-sonnet-4-20250514". This catches subtle regressions in the sanitization logic.

F6 (P3) — `sys.path` namespace collision risk

Acknowledged. No action — low risk since ASV runs in isolation.

F7 (P3) — CHANGELOG hardcodes scenario count

Fixed. Changed to "Includes Behave BDD regression scenarios, Robot Framework integration smoke tests, and ASV benchmarks."

Quality gates

nox -s typecheck — 0 errors ✓
nox -s lint — 0 errors ✓
Single squashed commit ✓
Rebased on latest master (fcdf80f3) ✓

## Disposition — Self-Review #2071 Findings Addressed all P2/P3 findings from self-review in commit `62468d97`. Rebased onto latest `master` (`fcdf80f3`). ### F1 (P2) — `.lower()` may cause name drift **Fixed.** Added a `.. note::` block in the `_actor_name()` docstring documenting the lowercasing behavior change. CHANGELOG entry updated: "provider/model names are now lowercased; existing mixed-case built-in actors will be superseded by lowercased versions on the next `ensure_built_in_actors()` call." ### F2 (P2) — `provider`/`model` fields not sanitized **Fixed.** Added a comment block above the `upsert_actor()` call in `ensure_built_in_actors()` documenting the intentional asymmetry: `name` is the canonical sanitized identifier; `provider` and `model` store raw provider metadata. ### F3 (P2) — Docstring overclaims regex conformance **Fixed.** Softened the `_actor_name()` docstring to say it "handles the most common violations (embedded slashes)" and explicitly notes it does **not** strip every character disallowed by the spec pattern (spaces, dots, `@`), which in practice never appear in configured provider/model identifiers. ### F4 (P2) — Scenarios 1-2 use MagicMock bypassing real code **Fixed.** Changed `step_no_providers` to use `_make_real_registry([])` so the production code path through `ensure_built_in_actors()` is exercised and returns `[]` naturally, instead of a fully-mocked registry that bypasses it. ### F5 (P2) — No scenario asserts exact sanitized name **Fixed.** Added a new Then step `the upserted actor-list-empty actor name should be "{name}"` and added it to Scenario 6: `And the upserted actor-list-empty actor name should be "openrouter/anthropic-claude-sonnet-4-20250514"`. This catches subtle regressions in the sanitization logic. ### F6 (P3) — `sys.path` namespace collision risk **Acknowledged.** No action — low risk since ASV runs in isolation. ### F7 (P3) — CHANGELOG hardcodes scenario count **Fixed.** Changed to "Includes Behave BDD regression scenarios, Robot Framework integration smoke tests, and ASV benchmarks." ### Quality gates - `nox -s typecheck` — 0 errors ✓ - `nox -s lint` — 0 errors ✓ - Single squashed commit ✓ - Rebased on latest master (`fcdf80f3`) ✓

hurui200320 approved these changes 2026-03-10 08:41:45 +00:00

Dismissed

hurui200320 left a comment

Code Review: PR #594 — fix(actor): handle empty actor list without validation error

Reviewer: Rui Hu (hurui200320)
Issue: #592 — agents actor list without actors should show no actors
Branch: feature/m3-fix-actor-list-empty
Reviewed commit: 62468d97

Overview

The production fix in _actor_name() (registry.py:66-68) is correct and well-scoped: it replaces / with - and lowercases both provider and model components, ensuring the output always contains exactly one /. The docstring (lines 43–65) honestly documents limitations and the lowercasing behavior change. Test coverage is adequate through mock argument inspection (Scenario 6) and implicit Actor constructor validation in the mock side_effect.

Prior Review Resolution

All critical, high, and medium items from my previous review (#2007), hamza.khyari's review (#2032), and freemo's review (#2059) have been addressed:

Prior ID	Description	Status
C1	Merge commit	Fixed — single squashed commit `62468d97`
C2	Multiple commits	Fixed — single commit
C3/TEST-1	Tests bypass `_normalize_name()`	Addressed — Scenario 6 asserts name format; `Actor()` constructor validates in mock side_effect
C4	`# type: ignore`	Fixed — verified zero occurrences across all PR files
H1	Label mismatch	Fixed — now `Type/Bug`
H2/DRY-1	Mock placement / duplication	Fixed — shared `features/mocks/fake_provider.py`, benchmark imports from it
H3/F1	Missing `.lower()`	Fixed — present at lines 66–67
M1	Provider name not sanitized	Fixed — both components sanitized
M3	CHANGELOG mismatch	Fixed — accurately describes the fix and behavior change
L2/BENCH-1	`contextlib.suppress` in benchmark	Fixed — removed
BUG-1	`MagicMock(name=...)` gotcha	Fixed — uses `Actor(name=...)`
PROC-1	`__init__.py` not updated	Fixed — exports `FakeProviderInfo` / `FakeProviderRegistry`
F3	Tag convention	Fixed — `@tdd_bug @tdd_bug_592`

Remaining Findings

R1 (LOW / Process) — CHANGELOG merge conflict; needs rebase

File: CHANGELOG.md
Problem: Master has advanced past the merge base (fcdf80f3). Merging produces a conflict in CHANGELOG.md. The PR shows mergeable: false.
Recommendation: Rebase onto latest master and resolve the conflict.

R2 (LOW / Docs) — Feature file comment inaccuracy

File: features/actor_list_empty.feature:7-10
Problem: Comment says "Scenarios 1-2 test CLI rendering of an empty actor list using a fully-mocked registry", but step_no_providers (actor_list_empty_steps.py:86) actually uses _make_real_registry([]) — a real ActorRegistry with zero providers.
Recommendation: Change to "…using a real registry with zero configured providers".

Out-of-Scope Items (confirmed deferred)

I agree these are pre-existing issues and should not block this PR:

SPEC-1 — dots in 4/7 default model names vs. Session._ACTOR_NAME_PATTERN
SEC-1 — except Exception in add()
DESIGN-1 / DESIGN-2 — three name representations, namespace case sensitivity
BENCH-2 — importlib.reload cascade limitation

Checklist

Criterion	Status
Production fix correctness	PASS — sanitizes `/` and lowercases both components
Commit format	PASS — matches ticket metadata, `ISSUES CLOSED: #592`, single squashed commit
Test coverage (Behave)	PASS — 6 scenarios including name-format assertion in Scenario 6
Test coverage (Robot)	PASS — 2 fresh-env smoke tests
Benchmarks (ASV)	PASS — timing + regression tracker, no error suppression
`# type: ignore`	PASS — zero occurrences
Mock placement	PASS — shared in `features/mocks/fake_provider.py`
CHANGELOG	PASS — accurate description
Security	PASS — no credentials, no injection risk
Performance	PASS — O(n) string ops, negligible

Verdict: APPROVED

The production fix is correct and minimal. All prior critical/high/medium review items are resolved. The two remaining items (R1: rebase conflict, R2: comment wording) are low-severity and non-blocking.

# Code Review: PR #594 — fix(actor): handle empty actor list without validation error **Reviewer:** Rui Hu (`hurui200320`) **Issue:** #592 — `agents actor list` without actors should show no actors **Branch:** `feature/m3-fix-actor-list-empty` **Reviewed commit:** `62468d97` --- ## Overview The production fix in `_actor_name()` (`registry.py:66-68`) is **correct and well-scoped**: it replaces `/` with `-` and lowercases both provider and model components, ensuring the output always contains exactly one `/`. The docstring (lines 43–65) honestly documents limitations and the lowercasing behavior change. Test coverage is adequate through mock argument inspection (Scenario 6) and implicit `Actor` constructor validation in the mock side_effect. --- ## Prior Review Resolution All **critical**, **high**, and **medium** items from my previous review (#2007), hamza.khyari's review (#2032), and freemo's review (#2059) have been addressed: | Prior ID | Description | Status | |----------|-------------|--------| | C1 | Merge commit | **Fixed** — single squashed commit `62468d97` | | C2 | Multiple commits | **Fixed** — single commit | | C3/TEST-1 | Tests bypass `_normalize_name()` | **Addressed** — Scenario 6 asserts name format; `Actor()` constructor validates in mock side_effect | | C4 | `# type: ignore` | **Fixed** — verified zero occurrences across all PR files | | H1 | Label mismatch | **Fixed** — now `Type/Bug` | | H2/DRY-1 | Mock placement / duplication | **Fixed** — shared `features/mocks/fake_provider.py`, benchmark imports from it | | H3/F1 | Missing `.lower()` | **Fixed** — present at lines 66–67 | | M1 | Provider name not sanitized | **Fixed** — both components sanitized | | M3 | CHANGELOG mismatch | **Fixed** — accurately describes the fix and behavior change | | L2/BENCH-1 | `contextlib.suppress` in benchmark | **Fixed** — removed | | BUG-1 | `MagicMock(name=...)` gotcha | **Fixed** — uses `Actor(name=...)` | | PROC-1 | `__init__.py` not updated | **Fixed** — exports `FakeProviderInfo` / `FakeProviderRegistry` | | F3 | Tag convention | **Fixed** — `@tdd_bug @tdd_bug_592` | --- ## Remaining Findings ### R1 (LOW / Process) — CHANGELOG merge conflict; needs rebase - **File:** `CHANGELOG.md` - **Problem:** Master has advanced past the merge base (`fcdf80f3`). Merging produces a conflict in `CHANGELOG.md`. The PR shows `mergeable: false`. - **Recommendation:** Rebase onto latest master and resolve the conflict. ### R2 (LOW / Docs) — Feature file comment inaccuracy - **File:** `features/actor_list_empty.feature:7-10` - **Problem:** Comment says *"Scenarios 1-2 test CLI rendering of an empty actor list using a fully-mocked registry"*, but `step_no_providers` (`actor_list_empty_steps.py:86`) actually uses `_make_real_registry([])` — a **real** `ActorRegistry` with zero providers. - **Recommendation:** Change to *"…using a real registry with zero configured providers"*. --- ## Out-of-Scope Items (confirmed deferred) I agree these are pre-existing issues and should not block this PR: - **SPEC-1** — dots in 4/7 default model names vs. `Session._ACTOR_NAME_PATTERN` - **SEC-1** — `except Exception` in `add()` - **DESIGN-1 / DESIGN-2** — three name representations, namespace case sensitivity - **BENCH-2** — `importlib.reload` cascade limitation --- ## Checklist | Criterion | Status | |-----------|--------| | Production fix correctness | PASS — sanitizes `/` and lowercases both components | | Commit format | PASS — matches ticket metadata, `ISSUES CLOSED: #592`, single squashed commit | | Test coverage (Behave) | PASS — 6 scenarios including name-format assertion in Scenario 6 | | Test coverage (Robot) | PASS — 2 fresh-env smoke tests | | Benchmarks (ASV) | PASS — timing + regression tracker, no error suppression | | `# type: ignore` | PASS — zero occurrences | | Mock placement | PASS — shared in `features/mocks/fake_provider.py` | | CHANGELOG | PASS — accurate description | | Security | PASS — no credentials, no injection risk | | Performance | PASS — O(n) string ops, negligible | --- **Verdict: APPROVED** The production fix is correct and minimal. All prior critical/high/medium review items are resolved. The two remaining items (R1: rebase conflict, R2: comment wording) are low-severity and non-blocking.

brent.edwards force-pushed feature/m3-fix-actor-list-empty from 62468d970c to 3739979b1d

2026-03-10 19:05:31 +00:00

Compare

brent.edwards dismissed hurui200320's review 2026-03-10 19:05:31 +00:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

brent.edwards force-pushed feature/m3-fix-actor-list-empty from 3739979b1d to 5afc5db138

2026-03-10 21:41:32 +00:00

Compare

brent.edwards force-pushed feature/m3-fix-actor-list-empty from f8c9cdbd57 to 73d5552467

2026-03-10 23:11:28 +00:00

Compare

brent.edwards added 1 commit 2026-03-11 03:13:01 +00:00

Merge remote-tracking branch 'origin/master' into feature/m3-fix-actor-list-empty

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / lint (pull_request) Successful in 14s

Details

CI / build (pull_request) Successful in 16s

Details

CI / quality (pull_request) Successful in 18s

Details

CI / security (pull_request) Successful in 39s

Details

CI / typecheck (pull_request) Successful in 38s

Details

CI / unit_tests (pull_request) Successful in 3m6s

Details

CI / integration_tests (pull_request) Successful in 3m19s

Details

CI / docker (pull_request) Successful in 39s

Details

CI / coverage (pull_request) Successful in 5m24s

Details

CI / benchmark-regression (pull_request) Has been cancelled

Details

205e94eb52

# Conflicts:
#	CHANGELOG.md
#	noxfile.py

brent.edwards added 1 commit 2026-03-11 03:40:51 +00:00

Merge remote-tracking branch 'origin/master' into feature/m3-fix-actor-list-empty

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / lint (pull_request) Successful in 15s

Details

CI / build (pull_request) Successful in 15s

Details

CI / quality (pull_request) Successful in 17s

Details

CI / security (pull_request) Successful in 34s

Details

CI / typecheck (pull_request) Successful in 36s

Details

CI / unit_tests (pull_request) Failing after 2m55s

Details

CI / docker (pull_request) Has been skipped

Details

CI / integration_tests (pull_request) Failing after 3m28s

Details

CI / coverage (pull_request) Successful in 5m33s

Details

CI / benchmark-regression (pull_request) Has been cancelled

Details

f2e44b2cf1

# Conflicts:
#	features/mocks/fake_provider.py

brent.edwards added 1 commit 2026-03-11 03:54:54 +00:00

fix(test): remove @tdd_expected_fail tags since bug #592 is fixed

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / lint (pull_request) Successful in 15s

Details

CI / build (pull_request) Successful in 17s

Details

CI / quality (pull_request) Successful in 21s

Details

CI / security (pull_request) Successful in 36s

Details

CI / typecheck (pull_request) Successful in 39s

Details

CI / unit_tests (pull_request) Successful in 2m58s

Details

CI / integration_tests (pull_request) Successful in 3m28s

Details

CI / docker (pull_request) Successful in 40s

Details

CI / coverage (pull_request) Successful in 5m27s

Details

CI / benchmark-regression (pull_request) Successful in 36m5s

Details

f5b562c104

This branch contains the fix for bug #592 (actor name slash sanitisation).
The TDD tests from PR #655 are tagged @tdd_expected_fail which expects
the bug to be present, but since the fix is on this branch the tests
pass unexpectedly, causing CI failures.

Remove the @tdd_expected_fail tags from both Behave and Robot test files
so the tests run as normal regression tests alongside the fix.

ISSUES CLOSED: #592

brent.edwards merged commit 76c3d5b731 into master

2026-03-11 04:01:31 +00:00

brent.edwards deleted branch feature/m3-fix-actor-list-empty

2026-03-11 04:01:32 +00:00

brent.edwards referenced this issue from a commit

2026-03-11 04:01:32 +00:00

Merge pull request 'fix(actor): handle empty actor list without validation error' (#594) from feature/m3-fix-actor-list-empty into master

Sign in to join this conversation.

No Reviewers

hamza.khyari

freemo

brent.edwards

6 Participants

Notifications

Due Date

No due date set.

Blocks

#592 `agents actor list` without actors should show no actors.

cleveragents/cleveragents-core

Reference: cleveragents/cleveragents-core#594

fix(actor): handle empty actor list without validation error #594

Summary

Changes

Review feedback addressed

From hurui200320 (Review #2007)

From hamza.khyari (Review #2032)

From freemo (Review #2059)

Self-review findings

Process

Self-Review: TDD Failing Tests for Bug #592

Overview

Checklist

TDD Nature

PM Review — Day 25

Status

Context

Action Item

Priority

Code Review Report: Brent's Commit 7a4f20de on feature/m3-fix-actor-list-empty

Scope of Changes

Findings

F1. [HIGH] Misleading Commit Message — fix() prefix with no fix

F2. [HIGH] Benchmark File Uses MagicMock for capabilities — Will Silently Fail Even After Fix

F3. [HIGH] Behave Scenarios 1 & 2 Never Exercise the Buggy Code Path

F4. [MEDIUM] Robot Tests Cannot Reproduce the Multi-Slash Bug

F5. [MEDIUM] Bare except Exception in Benchmark Swallows All Errors

F6. [MEDIUM] Missing Edge-Case Test Coverage

F7. [LOW] CHANGELOG Entry Describes @wip Tags That Were Later Removed

F8. [LOW] Specification Has No Empty-List Example for actor list

F9. [LOW] Temp Directory Leak Risk in Robot Tests

Summary Table

Code Review: PR #594 — fix(actor): handle empty actor list without validation error

Overview

CRITICAL (blocking merge)

C1. Branch contains a merge commit

C2. Multiple commits for one issue — must be squashed into one

C3. Behave scenarios 3–5 do not actually test the bug fix

C4. # type: ignore comments violate CONTRIBUTING.md

HIGH

H1. PR Type label mismatch

H2. Mocking code not in features/mocks/

H3. _actor_name() doesn't enforce lowercase — spec deviation

MEDIUM

M1. _actor_name doesn't sanitize provider name

M2. Robot tests don't exercise the multi-slash code path

M3. CHANGELOG entry describes only "TDD failing tests" but PR includes the fix

LOW

L1. PR is not mergeable

L2. Benchmark time_list_multi_slash_provider suppresses errors

Summary Table

Code Review Report: Brent's Commit ce346b72 on feature/m3-fix-actor-list-empty

Findings

F1. [HIGH] # type: ignore suppressions violate the project's Type Safety policy

F2. [HIGH] Imports buried inside step functions violate the Import Guidelines

F3. [HIGH] _FakeProviderInfo and _FakeProviderRegistry belong in features/mocks/, not in features/steps/

F4. [HIGH] First commit (cba8d4fdf6) intentionally leaves tests failing — violates Commit Completeness policy

F5. [MEDIUM] Production fix does not sanitize provider names — the bug can recur from a different source

F6. [MEDIUM] Sanitized actor name diverges from stored model field — potential round-trip bug

F7. [MEDIUM] CHANGELOG entry is temporally misleading — describes TDD failing tests that now pass

F8. [LOW] Robot tests create per-test tmpdir despite Suite Setup already provisioning one

F9. [LOW] upsert_actor.side_effect returns MagicMock where Actor is expected

Summary Table

PR #594 Review — fix(actor): handle empty actor list without validation error

Prior Review #2007 (hurui200320) — Resolution Status

Findings

TEST-1 (Major) — Scenarios 3-5 are tautological: pass with AND without the fix

SPEC-1 (Major) — Sanitization incomplete: dots in 4/7 default models fail Session validator

BUG-1 (Medium) — MagicMock(name=...) constructor gotcha: .name is not a string

SEC-1 (Medium) — except Exception swallows all errors in add()

DRY-1 (Medium) — Fake classes duplicated in benchmarks

DESIGN-1 (Medium) — Three inconsistent representations of provider/model

C4 (Low) — 2 # type: ignore remain in benchmarks

DESIGN-2 (Low) — list(namespace=...) case-sensitive after lowercasing

PROC-1 (Low) — features/mocks/__init__.py not updated

BENCH-1 (Low) — contextlib.suppress(ValidationError) defeats regression detection

BENCH-2 (Low) — importlib.reload doesn't cascade to submodules

Summary

Unified TDD Test Toggle Convention

The standard

Status of this PR

Code Review Report: Brent's Commit `7a4f20de` on `feature/m3-fix-actor-list-empty`

F1. [HIGH] Misleading Commit Message — `fix()` prefix with no fix

F2. [HIGH] Benchmark File Uses `MagicMock` for `capabilities` — Will Silently Fail Even After Fix

F5. [MEDIUM] Bare `except Exception` in Benchmark Swallows All Errors

F7. [LOW] CHANGELOG Entry Describes `@wip` Tags That Were Later Removed

F8. [LOW] Specification Has No Empty-List Example for `actor list`

C4. `# type: ignore` comments violate CONTRIBUTING.md

H2. Mocking code not in `features/mocks/`

H3. `_actor_name()` doesn't enforce lowercase — spec deviation

M1. `_actor_name` doesn't sanitize provider name

L2. Benchmark `time_list_multi_slash_provider` suppresses errors

F1. [HIGH] `# type: ignore` suppressions violate the project's Type Safety policy

F3. [HIGH] `_FakeProviderInfo` and `_FakeProviderRegistry` belong in `features/mocks/`, not in `features/steps/`

F4. [HIGH] First commit (`cba8d4fdf6`) intentionally leaves tests failing — violates Commit Completeness policy

F6. [MEDIUM] Sanitized actor name diverges from stored `model` field — potential round-trip bug

F8. [LOW] Robot tests create per-test `tmpdir` despite Suite Setup already provisioning one

F9. [LOW] `upsert_actor.side_effect` returns `MagicMock` where `Actor` is expected

PR #594 Review — `fix(actor): handle empty actor list without validation error`

BUG-1 (Medium) — `MagicMock(name=...)` constructor gotcha: `.name` is not a string

SEC-1 (Medium) — `except Exception` swallows all errors in `add()`

C4 (Low) — 2 `# type: ignore` remain in benchmarks

DESIGN-2 (Low) — `list(namespace=...)` case-sensitive after lowercasing

PROC-1 (Low) — `features/mocks/init.py` not updated

BENCH-1 (Low) — `contextlib.suppress(ValidationError)` defeats regression detection

BENCH-2 (Low) — `importlib.reload` doesn't cascade to submodules

Unit test failures (7 failed, 1 errored in `consolidated_actor.feature`)

Code Review — PR #594: `fix(actor): remove .lower() from _actor_name to preserve provider casing`

F1 — `# type: ignore` Suppressions in Benchmark [HIGH]

F2 — Import Buried Inside `_make_real_registry()` Helper Function [HIGH]

F3 — Benchmark Still Defines Local `_FakeProviderInfo` / `_FakeProviderRegistry` [HIGH]

F4 — Opening Commit Leaves `nox` Failing ✅ Fixed

F6 — Sanitized Actor Name Diverges from Stored `model` Field; Round-Trip Untested [MEDIUM]

F8 — Robot Tests Create Per-Test `tmpdir` via `Evaluate import(...)` Anti-Pattern [LOW]

F9 — `upsert_actor` Mock Returns `MagicMock` Instead of Real `Actor` [LOW]

Code Review — PR #594 `feature/m3-fix-actor-list-empty`

Disposition of all review findings — commit `66d95792`

Code Review — PR #594 `fix(actor): handle empty actor list without validation error`

F1 (P2) — `.lower()` may cause name drift

F2 (P2) — `provider`/`model` fields not sanitized

F6 (P3) — `sys.path` namespace collision risk