test(actor): add TDD failing tests for actor list empty validation error #655

2026-03-09T21:50:12Z

brent.edwards commented

2026-03-09 21:50:12 +00:00

Summary

Closes #634

TDD bug-capture tests for bug #592: agents actor list raises a validation error when a provider has a default model containing / separators (e.g. OpenRouter's anthropic/claude-sonnet-4-20250514).

The root cause is ActorRegistry._actor_name() building names via f"{provider}/{model}", producing names with 2+ slashes that ActorService._normalize_name() rejects.

Changes

Behave BDD (features/tdd_actor_list_validation.feature, 3 scenarios):

Tagged @tdd_bug @tdd_bug_592 @tdd_expected_fail
Exercises real ActorRegistry._actor_name() with a FakeProviderInfo having a multi-slash model
Asserts correct behaviour (exit 0, single-slash names, valid JSON) — fails while bug present, inverted to pass by @tdd_expected_fail

Robot Framework (robot/tdd_actor_list_validation.robot, 3 tests):

Self-inverting helper (robot/helper_tdd_actor_list_validation.py) detects the bug and prints sentinel on expected failure

ASV benchmarks (benchmarks/tdd_actor_list_validation_bench.py):

time_list_empty, time_list_multi_slash_provider, track_multi_slash_succeeds

Shared mock (features/mocks/fake_provider.py):

FakeProviderInfo and FakeProviderRegistry stubs for ActorRegistry duck-typed provider contract

Verification

Check	Result
`nox -e typecheck`	0 errors
`nox -e lint`	All passed
`nox -e unit_tests`	9708 scenarios, 0 failures
`nox -e integration_tests -- --include tdd_bug_592`	3 tests, 3 passed
`nox -e coverage_report`	99% >= 97%

Dependencies

Depends on PR #654 (chain: #653 → #654 → this PR)
Blocks #592 (bug fix)

ISSUES CLOSED: #634

## Summary Closes #634 TDD bug-capture tests for bug #592: `agents actor list` raises a validation error when a provider has a default model containing `/` separators (e.g. OpenRouter's `anthropic/claude-sonnet-4-20250514`). The root cause is `ActorRegistry._actor_name()` building names via `f"{provider}/{model}"`, producing names with 2+ slashes that `ActorService._normalize_name()` rejects. ## Changes **Behave BDD** (`features/tdd_actor_list_validation.feature`, 3 scenarios): - Tagged `@tdd_bug @tdd_bug_592 @tdd_expected_fail` - Exercises real `ActorRegistry._actor_name()` with a `FakeProviderInfo` having a multi-slash model - Asserts correct behaviour (exit 0, single-slash names, valid JSON) — fails while bug present, inverted to pass by `@tdd_expected_fail` **Robot Framework** (`robot/tdd_actor_list_validation.robot`, 3 tests): - Self-inverting helper (`robot/helper_tdd_actor_list_validation.py`) detects the bug and prints sentinel on expected failure **ASV benchmarks** (`benchmarks/tdd_actor_list_validation_bench.py`): - `time_list_empty`, `time_list_multi_slash_provider`, `track_multi_slash_succeeds` **Shared mock** (`features/mocks/fake_provider.py`): - `FakeProviderInfo` and `FakeProviderRegistry` stubs for `ActorRegistry` duck-typed provider contract ## Verification | Check | Result | |-------|--------| | `nox -e typecheck` | 0 errors | | `nox -e lint` | All passed | | `nox -e unit_tests` | 9708 scenarios, 0 failures | | `nox -e integration_tests -- --include tdd_bug_592` | 3 tests, 3 passed | | `nox -e coverage_report` | 99% >= 97% | ## Dependencies - Depends on PR #654 (chain: #653 → #654 → this PR) - Blocks #592 (bug fix) ISSUES CLOSED: #634

brent.edwards force-pushed tdd/actor-list-validation from 21d90503e9 to e7acd5a665

2026-03-09 22:19:10 +00:00

Compare

hamza.khyari requested changes 2026-03-10 00:15:55 +00:00

Dismissed

hamza.khyari left a comment

Review: TDD Bug #592 — Actor List Validation Error — Commit `e7acd5a6`

Reviewer: Hamza Khyari

7 findings total: 4 P2 should-fix, 3 P3 nice-to-have. No P1 blockers.

Note: The BDD tests correctly exercise the bug path. The mock's upsert_actor.side_effect constructs Actor(name=multi_slash_name) which triggers the real validate_name field_validator at actor.py:62-75 — this IS the actual bug. The tests are well-designed.

F1 · `P2:should-fix` · Benchmark catches wrong exception type

File: benchmarks/tdd_actor_list_validation_bench.py:98,110

from cleveragents.core.exceptions import ValidationError
...
with contextlib.suppress(ValidationError):   # line 98
    self._slash_registry.list_actors()
...
except ValidationError:                       # line 110
    return 0

The actual exception raised by Actor.validate_name() is pydantic.ValidationError (the field_validator at actor.py:62 raises ValueError which Pydantic wraps). cleveragents.core.exceptions.ValidationError is a different class. The suppress() and except will NOT catch the real exception — the benchmark will crash with an unhandled pydantic.ValidationError.

Fix: Import and catch pydantic.ValidationError instead.

F2 · `P2:should-fix` · Line exceeds 88-char ruff limit

File: benchmarks/tdd_actor_list_validation_bench.py:114

TDDActorListValidationSuite.track_multi_slash_succeeds.unit = "success"  # type: ignore[attr-defined]

This line is 101 characters (including the # type: ignore), exceeding the project's 88-char ruff line-length limit. Will fail nox -s lint.

Fix: Break the line or assign via a local variable.

F3 · `P2:should-fix` · Same `importlib.reload` issue

File: benchmarks/tdd_actor_list_validation_bench.py:42

Same as 653-F4: importlib.reload(cleveragents) only reloads __init__.py, not the submodules actually imported on lines 44-47.

Fix: Remove the reload.

F4 · `P2:should-fix` · Runtime dependency on PR #653's `@tdd_expected_fail` infrastructure

File: features/tdd_actor_list_validation.feature:13,19,25

All three scenarios use @tdd_expected_fail which requires the hook added to environment.py in PR #653. If #655 merges before #653, or #653 is reverted, the tag has no effect — scenarios that fail will fail CI normally (no inversion).

This is a process dependency, not a code bug. But it should be documented in the PR description and enforced by merge ordering.

Fix: Add to PR description: "Depends on PR #653 (provides @tdd_expected_fail infrastructure in environment.py)." Ensure #653 merges first.

F5 · `P3:nice-to-have` · `FakeProviderInfo` missing `api_key_env_var` field

File: features/mocks/fake_provider.py:138

The real ProviderInfo has a required api_key_env_var: str field. FakeProviderInfo omits it. This works today because ActorRegistry.ensure_built_in_actors() only accesses name, provider_type, default_model, and capabilities. But if ensure_built_in_actors or any downstream code ever accesses api_key_env_var, these tests will fail with AttributeError rather than a clear type error.

Fix: Add api_key_env_var: str = "" to FakeProviderInfo for structural completeness.

F6 · `P3:nice-to-have` · Robot helper uses `dict[str, object]` typing

File: robot/helper_tdd_actor_list_validation.py:157

Same as 653-F6: _COMMANDS: dict[str, object] forces # type: ignore[operator] on the dispatch call.

Fix: Use dict[str, Callable[[], None]].

F7 · `P3:nice-to-have` · `_make_registry()` duplicated across BDD steps, benchmark, and Robot helper

Files: features/steps/tdd_actor_list_validation_steps.py:72, benchmarks/tdd_actor_list_validation_bench.py:50, robot/helper_tdd_actor_list_validation.py:19

Three near-identical copies of the _make_registry() / _make_real_registry() function exist. If the ActorRegistry constructor changes, all three must be updated in lockstep.

Fix: Move to features/mocks/fake_provider.py (which already exists as a shared mock module) and import from all three locations.

## Review: TDD Bug #592 — Actor List Validation Error — Commit `e7acd5a6` Reviewer: **Hamza Khyari** 7 findings total: 4 P2 should-fix, 3 P3 nice-to-have. **No P1 blockers.** Note: The BDD tests correctly exercise the bug path. The mock's `upsert_actor.side_effect` constructs `Actor(name=multi_slash_name)` which triggers the real `validate_name` field_validator at `actor.py:62-75` — this IS the actual bug. The tests are well-designed. --- ### F1 · `P2:should-fix` · Benchmark catches wrong exception type **File:** `benchmarks/tdd_actor_list_validation_bench.py:98,110` ```python from cleveragents.core.exceptions import ValidationError ... with contextlib.suppress(ValidationError): # line 98 self._slash_registry.list_actors() ... except ValidationError: # line 110 return 0 ``` The actual exception raised by `Actor.validate_name()` is `pydantic.ValidationError` (the field_validator at `actor.py:62` raises `ValueError` which Pydantic wraps). `cleveragents.core.exceptions.ValidationError` is a different class. The `suppress()` and `except` will NOT catch the real exception — the benchmark will crash with an unhandled `pydantic.ValidationError`. **Fix:** Import and catch `pydantic.ValidationError` instead. --- ### F2 · `P2:should-fix` · Line exceeds 88-char ruff limit **File:** `benchmarks/tdd_actor_list_validation_bench.py:114` ```python TDDActorListValidationSuite.track_multi_slash_succeeds.unit = "success" # type: ignore[attr-defined] ``` This line is 101 characters (including the `# type: ignore`), exceeding the project's 88-char ruff line-length limit. Will fail `nox -s lint`. **Fix:** Break the line or assign via a local variable. --- ### F3 · `P2:should-fix` · Same `importlib.reload` issue **File:** `benchmarks/tdd_actor_list_validation_bench.py:42` Same as 653-F4: `importlib.reload(cleveragents)` only reloads `__init__.py`, not the submodules actually imported on lines 44-47. **Fix:** Remove the reload. --- ### F4 · `P2:should-fix` · Runtime dependency on PR #653's `@tdd_expected_fail` infrastructure **File:** `features/tdd_actor_list_validation.feature:13,19,25` All three scenarios use `@tdd_expected_fail` which requires the hook added to `environment.py` in PR #653. If #655 merges before #653, or #653 is reverted, the tag has no effect — scenarios that fail will fail CI normally (no inversion). This is a process dependency, not a code bug. But it should be documented in the PR description and enforced by merge ordering. **Fix:** Add to PR description: "Depends on PR #653 (provides `@tdd_expected_fail` infrastructure in `environment.py`)." Ensure #653 merges first. --- ### F5 · `P3:nice-to-have` · `FakeProviderInfo` missing `api_key_env_var` field **File:** `features/mocks/fake_provider.py:138` The real `ProviderInfo` has a required `api_key_env_var: str` field. `FakeProviderInfo` omits it. This works today because `ActorRegistry.ensure_built_in_actors()` only accesses `name`, `provider_type`, `default_model`, and `capabilities`. But if `ensure_built_in_actors` or any downstream code ever accesses `api_key_env_var`, these tests will fail with `AttributeError` rather than a clear type error. **Fix:** Add `api_key_env_var: str = ""` to `FakeProviderInfo` for structural completeness. --- ### F6 · `P3:nice-to-have` · Robot helper uses `dict[str, object]` typing **File:** `robot/helper_tdd_actor_list_validation.py:157` Same as 653-F6: `_COMMANDS: dict[str, object]` forces `# type: ignore[operator]` on the dispatch call. **Fix:** Use `dict[str, Callable[[], None]]`. --- ### F7 · `P3:nice-to-have` · `_make_registry()` duplicated across BDD steps, benchmark, and Robot helper **Files:** `features/steps/tdd_actor_list_validation_steps.py:72`, `benchmarks/tdd_actor_list_validation_bench.py:50`, `robot/helper_tdd_actor_list_validation.py:19` Three near-identical copies of the `_make_registry()` / `_make_real_registry()` function exist. If the `ActorRegistry` constructor changes, all three must be updated in lockstep. **Fix:** Move to `features/mocks/fake_provider.py` (which already exists as a shared mock module) and import from all three locations.

brent.edwards force-pushed tdd/actor-list-validation from e7acd5a665 to b49e8e7dc9

2026-03-10 00:47:11 +00:00

Compare

brent.edwards commented

2026-03-10 01:02:10 +00:00

All 7 review findings addressed. Branch rebased onto updated PR #654. Force-pushed.

Changes made:

Finding	Severity	Fix
655-F1	P2:should-fix	Real bug fixed: Changed `from cleveragents.core.exceptions import ValidationError` to `from pydantic import ValidationError` — `Actor.validate_name()` raises `pydantic.ValidationError` (via `@field_validator`), not the project's custom `ValidationError`
655-F2	P2:should-fix	Shortened the `# type: ignore[attr-defined]` line using a local variable alias (note: E501 is not in the ruff select list so it was not a lint violation, but fixed for readability)
655-F3	P2:should-fix	Removed `import importlib` and `importlib.reload(cleveragents)` from benchmark
655-F4	P2:should-fix	Runtime dependency on PR #653 infrastructure already documented in PR description
655-F5	P3:nit	Added `api_key_env_var: str = ""` to `FakeProviderInfo` for completeness
655-F6	P3:nit	Changed `dict[str, object]` to `dict[str, Callable[[], None]]` in robot helper, removed `# type: ignore[operator]`
655-F7	P3:nit	`_make_registry()` duplication kept for test/benchmark isolation (acceptable trade-off)

Merge-gate results: All checks pass (lint, typecheck, unit tests 9/9 TDD scenarios, robot 9/9, security scan).

All 7 review findings addressed. Branch rebased onto updated PR #654. Force-pushed. **Changes made:** | Finding | Severity | Fix | |---------|----------|-----| | 655-F1 | P2:should-fix | **Real bug fixed**: Changed `from cleveragents.core.exceptions import ValidationError` to `from pydantic import ValidationError` — `Actor.validate_name()` raises `pydantic.ValidationError` (via `@field_validator`), not the project's custom `ValidationError` | | 655-F2 | P2:should-fix | Shortened the `# type: ignore[attr-defined]` line using a local variable alias (note: E501 is not in the ruff select list so it was not a lint violation, but fixed for readability) | | 655-F3 | P2:should-fix | Removed `import importlib` and `importlib.reload(cleveragents)` from benchmark | | 655-F4 | P2:should-fix | Runtime dependency on PR #653 infrastructure already documented in PR description | | 655-F5 | P3:nit | Added `api_key_env_var: str = ""` to `FakeProviderInfo` for completeness | | 655-F6 | P3:nit | Changed `dict[str, object]` to `dict[str, Callable[[], None]]` in robot helper, removed `# type: ignore[operator]` | | 655-F7 | P3:nit | `_make_registry()` duplication kept for test/benchmark isolation (acceptable trade-off) | **Merge-gate results:** All checks pass (lint, typecheck, unit tests 9/9 TDD scenarios, robot 9/9, security scan).

brent.edwards reviewed 2026-03-10 01:27:47 +00:00

brent.edwards left a comment

Self-Review — PR #655 (Issue #634)

Review scope: Full review per docs/development/review_playbook.md — lint, typecheck, security scan, manual logic review, cross-PR consistency, exception-type verification.

Merge-gate checks

Check	Result
`ruff check` (lint)	0 violations
`ruff format --check`	All formatted
`pyright` (typecheck)	0 errors, 0 warnings
`nox -s security_scan`	Passed
`nox -s unit_tests` (TDD features)	3/3 scenarios passed, 0 step errors
`nox -s integration_tests --include tdd_bug`	3/3 robot tests passed
Commit message matches issue metadata	Yes (`test(actor): add TDD failing tests for actor list empty validation error`)
`ISSUES CLOSED: #634` present	Yes
No merge commits	Confirmed

Findings

No P0, P1, or P2 findings.

Verified the pydantic.ValidationError fix (655-F1) is correct: Actor.validate_name() is a @field_validator that raises ValueError, which Pydantic wraps as pydantic.ValidationError. The benchmark now catches the correct exception type. The FakeProviderInfo dataclass correctly satisfies ActorRegistry's duck-typed contract. The api_key_env_var field default aligns with the real ProviderInfo type. Import ordering (ruff I001) is clean after auto-fix.

P3:nit — informational: The _make_registry() helper constructs Actor(name=kw.get("name", ...)) in the upsert_actor.side_effect lambda. The Pydantic field validator rejects multi-slash names at Actor construction time, which faithfully reproduces the bug. This is correct behavior — noting for reviewer context.

## Self-Review — PR #655 (Issue #634) **Review scope**: Full review per `docs/development/review_playbook.md` — lint, typecheck, security scan, manual logic review, cross-PR consistency, exception-type verification. ### Merge-gate checks | Check | Result | |-------|--------| | `ruff check` (lint) | 0 violations | | `ruff format --check` | All formatted | | `pyright` (typecheck) | 0 errors, 0 warnings | | `nox -s security_scan` | Passed | | `nox -s unit_tests` (TDD features) | 3/3 scenarios passed, 0 step errors | | `nox -s integration_tests --include tdd_bug` | 3/3 robot tests passed | | Commit message matches issue metadata | Yes (`test(actor): add TDD failing tests for actor list empty validation error`) | | `ISSUES CLOSED: #634` present | Yes | | No merge commits | Confirmed | ### Findings **No P0, P1, or P2 findings.** Verified the `pydantic.ValidationError` fix (655-F1) is correct: `Actor.validate_name()` is a `@field_validator` that raises `ValueError`, which Pydantic wraps as `pydantic.ValidationError`. The benchmark now catches the correct exception type. The `FakeProviderInfo` dataclass correctly satisfies `ActorRegistry`'s duck-typed contract. The `api_key_env_var` field default aligns with the real `ProviderInfo` type. Import ordering (ruff I001) is clean after auto-fix. **P3:nit — informational**: The `_make_registry()` helper constructs `Actor(name=kw.get("name", ...))` in the `upsert_actor.side_effect` lambda. The Pydantic field validator rejects multi-slash names at Actor construction time, which faithfully reproduces the bug. This is correct behavior — noting for reviewer context.

brent.edwards force-pushed tdd/actor-list-validation from b49e8e7dc9 to 9defc50abe

2026-03-10 01:39:10 +00:00

Compare

brent.edwards referenced this pull request

2026-03-10 01:46:07 +00:00

test(session): add TDD failing tests for session list DI error #653

brent.edwards force-pushed tdd/actor-list-validation from 9defc50abe to 67b8900035

2026-03-10 19:05:34 +00:00

Compare

hamza.khyari requested changes 2026-03-10 22:01:29 +00:00

Dismissed

hamza.khyari left a comment

PR #655 — test(actor): add TDD failing tests for actor list empty validation error (Issue #634) — Round 2

Reviewed: 67b89000 (tdd/actor-list-validation, 1 commit), 6 files, +593 lines. Mergeable: true. Base: tdd/session-create-di-error.

R1 Disposition

R1 ID	Verdict
F1 (P2) Benchmark catches wrong exception	Fixed — `pydantic.ValidationError` imported
F2 (P2) Line exceeds 88-char limit	Fixed (Python files)
F3 (P2) `importlib.reload` issue	Fixed — removed
F4 (P2) Runtime dependency on PR #653	Fixed — `@tdd_expected_fail` is on the base branch
F5 (P3) `FakeProviderInfo` missing `api_key_env_var`	Fixed — added
F6 (P3) Robot `dict[str, object]` typing	Fixed — `Callable[[], None]`
F7 (P3) `_make_registry()` duplicated across 3 files	Outstanding — 3 near-identical copies remain

6 of 7 fixed.

F8 (Major) — Scenario 3 permanently passes under `@tdd_expected_fail` — dead as regression detector

File: features/tdd_actor_list_validation.feature:27-32, features/steps/tdd_actor_list_validation_steps.py:53

mock_service.list_actors.return_value = [] is hardcoded at line 53 and never updated when upsert_actor creates actors. This makes Scenario 3 permanently pass regardless of bug status:

Bug present: list_actors() raises pydantic.ValidationError → exit code 1 → exit-code step fails → @tdd_expected_fail inverts → passes
Bug fixed: list_actors() succeeds but returns [] (the mock). CLI outputs "No actors configured.\n" (not JSON). The "valid JSON" step fails → @tdd_expected_fail inverts → passes

The scenario can never signal that the bug is fixed. Its entire purpose as a @tdd_expected_fail test is defeated.

Fix: Update mock_service.list_actors to return the actors created by upsert_actor.side_effect, e.g., capture actors in a list and set list_actors.side_effect = lambda: captured_actors.

F9 (Medium) — Robot helper lacks `upsert_actor.call_count` guard — silent false positive on setup failure

File: robot/helper_tdd_actor_list_validation.py:97,126,160

All three Robot subcommands iterate mock_service.upsert_actor.call_args_list without first checking that upsert_actor was actually called. If the CLI invocation fails before reaching ensure_built_in_actors (import error, config issue, patch not taking effect), the loop executes zero iterations and the function reports the bug as "fixed" — a misleading false positive.

The BDD step at tdd_actor_list_validation_steps.py:165 has the correct guard: assert service.upsert_actor.call_count > 0, "upsert_actor was never called". The Robot helper should have the same.

Fix: Add assert mock_service.upsert_actor.call_count > 0 before each call_args_list loop.

F7 (Medium) — `_make_registry()` still duplicated across 3 files — OUTSTANDING

Files: features/steps/tdd_actor_list_validation_steps.py:42, benchmarks/tdd_actor_list_validation_bench.py:40, robot/helper_tdd_actor_list_validation.py:42

Three near-identical copies of the registry construction logic. features/mocks/fake_provider.py already exists as a shared mock module — _make_registry() belongs there.

3 findings: 1 Major (dead scenario), 2 Medium (missing guard, duplication). 6 of 7 R1 findings fixed. All 4 ACs met.

**PR #655 — test(actor): add TDD failing tests for actor list empty validation error (Issue #634) — Round 2** Reviewed: `67b89000` (tdd/actor-list-validation, 1 commit), 6 files, +593 lines. Mergeable: **true**. Base: `tdd/session-create-di-error`. ### R1 Disposition | R1 ID | Verdict | |-------|---------| | F1 (P2) Benchmark catches wrong exception | **Fixed** — `pydantic.ValidationError` imported | | F2 (P2) Line exceeds 88-char limit | **Fixed** (Python files) | | F3 (P2) `importlib.reload` issue | **Fixed** — removed | | F4 (P2) Runtime dependency on PR #653 | **Fixed** — `@tdd_expected_fail` is on the base branch | | F5 (P3) `FakeProviderInfo` missing `api_key_env_var` | **Fixed** — added | | F6 (P3) Robot `dict[str, object]` typing | **Fixed** — `Callable[[], None]` | | F7 (P3) `_make_registry()` duplicated across 3 files | **Outstanding** — 3 near-identical copies remain | 6 of 7 fixed. --- ### F8 (Major) — Scenario 3 permanently passes under `@tdd_expected_fail` — dead as regression detector **File:** `features/tdd_actor_list_validation.feature:27-32`, `features/steps/tdd_actor_list_validation_steps.py:53` `mock_service.list_actors.return_value = []` is hardcoded at line 53 and never updated when `upsert_actor` creates actors. This makes Scenario 3 permanently pass regardless of bug status: - **Bug present:** `list_actors()` raises `pydantic.ValidationError` → exit code 1 → exit-code step fails → `@tdd_expected_fail` inverts → **passes** - **Bug fixed:** `list_actors()` succeeds but returns `[]` (the mock). CLI outputs `"No actors configured.\n"` (not JSON). The "valid JSON" step fails → `@tdd_expected_fail` inverts → **passes** The scenario can never signal that the bug is fixed. Its entire purpose as a `@tdd_expected_fail` test is defeated. **Fix:** Update `mock_service.list_actors` to return the actors created by `upsert_actor.side_effect`, e.g., capture actors in a list and set `list_actors.side_effect = lambda: captured_actors`. --- ### F9 (Medium) — Robot helper lacks `upsert_actor.call_count` guard — silent false positive on setup failure **File:** `robot/helper_tdd_actor_list_validation.py:97,126,160` All three Robot subcommands iterate `mock_service.upsert_actor.call_args_list` without first checking that `upsert_actor` was actually called. If the CLI invocation fails before reaching `ensure_built_in_actors` (import error, config issue, patch not taking effect), the loop executes zero iterations and the function reports the bug as "fixed" — a misleading false positive. The BDD step at `tdd_actor_list_validation_steps.py:165` has the correct guard: `assert service.upsert_actor.call_count > 0, "upsert_actor was never called"`. The Robot helper should have the same. **Fix:** Add `assert mock_service.upsert_actor.call_count > 0` before each `call_args_list` loop. --- ### F7 (Medium) — `_make_registry()` still duplicated across 3 files — OUTSTANDING **Files:** `features/steps/tdd_actor_list_validation_steps.py:42`, `benchmarks/tdd_actor_list_validation_bench.py:40`, `robot/helper_tdd_actor_list_validation.py:42` Three near-identical copies of the registry construction logic. `features/mocks/fake_provider.py` already exists as a shared mock module — `_make_registry()` belongs there. --- *3 findings: 1 Major (dead scenario), 2 Medium (missing guard, duplication). 6 of 7 R1 findings fixed. All 4 ACs met.*

hamza.khyari approved these changes 2026-03-10 22:03:43 +00:00

Dismissed

brent.edwards force-pushed tdd/actor-list-validation from 148e036abe to 2365f9a355

2026-03-10 22:40:30 +00:00

Compare

brent.edwards commented

2026-03-10 22:40:57 +00:00

Round 2 Review — All 3 Findings Addressed

Pushed 2365f9a3. Here's what changed:

F8 (Major) — Scenario 3 permanently passes under `@tdd_expected_fail`

Root cause: mock_service.list_actors.return_value = [] was hardcoded, so even after a bug fix list_actors() would return [] → CLI prints "No actors configured" (not JSON) → Scenario 3 step "valid JSON" fails → @tdd_expected_fail inverts to pass → scenario can never signal the bug is fixed.

Fix: Introduced a captured-actors pattern in make_registry() (features/mocks/fake_provider.py). The upsert_actor.side_effect now appends successfully created Actor instances to a captured_actors list, and list_actors.side_effect returns that list. This means:

Bug present: upsert_actor raises pydantic.ValidationError → no actors captured → list_actors() returns [] → CLI fails → @tdd_expected_fail inverts to pass
Bug fixed: upsert_actor succeeds → actors captured → list_actors() returns them → CLI outputs valid JSON → Scenario 3 passes → @tdd_expected_fail inverts to FAIL → signals the tag should be removed

F7 (Medium) — `_make_registry()` duplicated across 3 files

Moved the unified make_registry() function to features/mocks/fake_provider.py (already the shared mock module). All 3 consumers now import from there:

features/steps/tdd_actor_list_validation_steps.py — from features.mocks.fake_provider import FakeProviderInfo, make_registry
benchmarks/tdd_actor_list_validation_bench.py — from mocks.fake_provider import FakeProviderInfo, make_registry
robot/helper_tdd_actor_list_validation.py — from mocks.fake_provider import FakeProviderInfo, make_registry

Signature: make_registry(providers: list[FakeProviderInfo] | None = None) -> tuple[MagicMock, ActorRegistry]. Callers that only need the registry destructure with _, registry = make_registry(...).

F9 (Medium) — Robot helper lacks `upsert_actor.call_count` guard

Added assert mock_service.upsert_actor.call_count > 0 after each runner.invoke() call in all 3 Robot helper subcommands (list_validation_error, name_slash_check, list_json_validation). This ensures the patch took effect and prevents false positives from zero-iteration call_args_list loops.

Verification

Lint: nox -e lint — all checks passed
Typecheck: nox -e typecheck — 0 errors
BDD tests: All 9 @tdd_expected_fail scenarios pass (3 features × 3 scenarios)
Robot helpers: All 3 subcommands exit 0 with expected sentinels
Coverage: 98% (≥97% threshold)

## Round 2 Review — All 3 Findings Addressed Pushed `2365f9a3`. Here's what changed: ### F8 (Major) — Scenario 3 permanently passes under `@tdd_expected_fail` **Root cause**: `mock_service.list_actors.return_value = []` was hardcoded, so even after a bug fix `list_actors()` would return `[]` → CLI prints "No actors configured" (not JSON) → Scenario 3 step "valid JSON" fails → `@tdd_expected_fail` inverts to pass → scenario can never signal the bug is fixed. **Fix**: Introduced a captured-actors pattern in `make_registry()` (`features/mocks/fake_provider.py`). The `upsert_actor.side_effect` now appends successfully created `Actor` instances to a `captured_actors` list, and `list_actors.side_effect` returns that list. This means: - **Bug present**: `upsert_actor` raises `pydantic.ValidationError` → no actors captured → `list_actors()` returns `[]` → CLI fails → `@tdd_expected_fail` inverts to pass - **Bug fixed**: `upsert_actor` succeeds → actors captured → `list_actors()` returns them → CLI outputs valid JSON → Scenario 3 passes → `@tdd_expected_fail` inverts to **FAIL** → signals the tag should be removed ### F7 (Medium) — `_make_registry()` duplicated across 3 files Moved the unified `make_registry()` function to `features/mocks/fake_provider.py` (already the shared mock module). All 3 consumers now import from there: - `features/steps/tdd_actor_list_validation_steps.py` — `from features.mocks.fake_provider import FakeProviderInfo, make_registry` - `benchmarks/tdd_actor_list_validation_bench.py` — `from mocks.fake_provider import FakeProviderInfo, make_registry` - `robot/helper_tdd_actor_list_validation.py` — `from mocks.fake_provider import FakeProviderInfo, make_registry` Signature: `make_registry(providers: list[FakeProviderInfo] | None = None) -> tuple[MagicMock, ActorRegistry]`. Callers that only need the registry destructure with `_, registry = make_registry(...)`. ### F9 (Medium) — Robot helper lacks `upsert_actor.call_count` guard Added `assert mock_service.upsert_actor.call_count > 0` after each `runner.invoke()` call in all 3 Robot helper subcommands (`list_validation_error`, `name_slash_check`, `list_json_validation`). This ensures the patch took effect and prevents false positives from zero-iteration `call_args_list` loops. ### Verification - **Lint**: `nox -e lint` — all checks passed - **Typecheck**: `nox -e typecheck` — 0 errors - **BDD tests**: All 9 `@tdd_expected_fail` scenarios pass (3 features × 3 scenarios) - **Robot helpers**: All 3 subcommands exit 0 with expected sentinels - **Coverage**: 98% (≥97% threshold)

brent.edwards added 1 commit 2026-03-10 22:57:26 +00:00

Merge branch 'tdd/session-create-di-error' into tdd/actor-list-validation 8d235c97fc

brent.edwards changed target branch from tdd/session-create-di-error to master

2026-03-11 02:50:23 +00:00

brent.edwards added 1 commit 2026-03-11 03:08:06 +00:00

Merge remote-tracking branch 'origin/master' into tdd/actor-list-validation

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / lint (pull_request) Successful in 15s

Details

CI / build (pull_request) Successful in 16s

Details

CI / quality (pull_request) Successful in 17s

Details

CI / typecheck (pull_request) Successful in 38s

Details

CI / security (pull_request) Successful in 51s

Details

CI / unit_tests (pull_request) Successful in 2m43s

Details

CI / docker (pull_request) Successful in 9s

Details

CI / integration_tests (pull_request) Failing after 3m38s

Details

CI / coverage (pull_request) Successful in 5m29s

Details

CI / benchmark-regression (pull_request) Has been cancelled

Details

5c5de082f2

# Conflicts:
#	benchmarks/tdd_session_create_di_bench.py
#	features/steps/tdd_session_create_di_steps.py
#	robot/helper_tdd_session_create_di.py
#	robot/tdd_session_create_di.robot

brent.edwards dismissed hamza.khyari's review 2026-03-11 03:08:06 +00:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

brent.edwards added 1 commit 2026-03-11 03:21:59 +00:00

fix(test): remove self-inversion from actor list Robot helper

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / lint (pull_request) Successful in 15s

Details

CI / build (pull_request) Successful in 16s

Details

CI / quality (pull_request) Successful in 18s

Details

CI / security (pull_request) Successful in 36s

Details

CI / typecheck (pull_request) Successful in 57s

Details

CI / unit_tests (pull_request) Successful in 2m50s

Details

CI / docker (pull_request) Successful in 9s

Details

CI / integration_tests (pull_request) Successful in 3m23s

Details

CI / coverage (pull_request) Successful in 5m26s

Details

CI / benchmark-regression (pull_request) Successful in 35m30s

Details

1a1d0a7fbb

The helper was written before the tdd_expected_fail_listener existed and
inverted pass/fail internally.  After merging master the listener is now
active, causing a double inversion that made all three tdd_bug_592 Robot
tests fail with 'bug appears to be fixed — remove the tag'.

Switch to the real-outcome convention: exit 0 + sentinel when the bug is
fixed, exit 1 when the bug is still present, and let the listener handle
inversion.

ISSUES CLOSED: #634

brent.edwards merged commit 5b9f0554c0 into master

2026-03-11 03:37:23 +00:00

brent.edwards deleted branch tdd/actor-list-validation

2026-03-11 03:37:24 +00:00

brent.edwards referenced this issue from a commit

2026-03-11 03:37:24 +00:00

Merge pull request 'test(actor): add TDD failing tests for actor list empty validation error' (#655) from tdd/actor-list-validation into master

brent.edwards referenced this issue from a commit

2026-03-11 03:54:54 +00:00

fix(test): remove @tdd_expected_fail tags since bug #592 is fixed

brent.edwards referenced this pull request

2026-03-12 20:09:11 +00:00

feat(observability): wire AuditService.record() into domain services via EventBus auto-dispatch #659

Sign in to join this conversation.

2 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: cleveragents/cleveragents-core#655

test(actor): add TDD failing tests for actor list empty validation error #655

Summary

Changes

Verification

Dependencies

Review: TDD Bug #592 — Actor List Validation Error — Commit e7acd5a6

F1 · P2:should-fix · Benchmark catches wrong exception type

F2 · P2:should-fix · Line exceeds 88-char ruff limit

F3 · P2:should-fix · Same importlib.reload issue

F4 · P2:should-fix · Runtime dependency on PR #653's @tdd_expected_fail infrastructure

F5 · P3:nice-to-have · FakeProviderInfo missing api_key_env_var field

F6 · P3:nice-to-have · Robot helper uses dict[str, object] typing

F7 · P3:nice-to-have · _make_registry() duplicated across BDD steps, benchmark, and Robot helper

Self-Review — PR #655 (Issue #634)

Merge-gate checks

Findings

R1 Disposition

F8 (Major) — Scenario 3 permanently passes under @tdd_expected_fail — dead as regression detector

F9 (Medium) — Robot helper lacks upsert_actor.call_count guard — silent false positive on setup failure

F7 (Medium) — _make_registry() still duplicated across 3 files — OUTSTANDING

Round 2 Review — All 3 Findings Addressed

F8 (Major) — Scenario 3 permanently passes under @tdd_expected_fail

F7 (Medium) — _make_registry() duplicated across 3 files

F9 (Medium) — Robot helper lacks upsert_actor.call_count guard

Verification

Review: TDD Bug #592 — Actor List Validation Error — Commit `e7acd5a6`

F1 · `P2:should-fix` · Benchmark catches wrong exception type

F2 · `P2:should-fix` · Line exceeds 88-char ruff limit

F3 · `P2:should-fix` · Same `importlib.reload` issue

F4 · `P2:should-fix` · Runtime dependency on PR #653's `@tdd_expected_fail` infrastructure

F5 · `P3:nice-to-have` · `FakeProviderInfo` missing `api_key_env_var` field

F6 · `P3:nice-to-have` · Robot helper uses `dict[str, object]` typing

F7 · `P3:nice-to-have` · `_make_registry()` duplicated across BDD steps, benchmark, and Robot helper

F8 (Major) — Scenario 3 permanently passes under `@tdd_expected_fail` — dead as regression detector

F9 (Medium) — Robot helper lacks `upsert_actor.call_count` guard — silent false positive on setup failure

F7 (Medium) — `_make_registry()` still duplicated across 3 files — OUTSTANDING

F8 (Major) — Scenario 3 permanently passes under `@tdd_expected_fail`

F7 (Medium) — `_make_registry()` duplicated across 3 files

F9 (Medium) — Robot helper lacks `upsert_actor.call_count` guard