test(actor): add TDD failing tests for actor list empty validation error #655
No reviewers
Labels
No labels
auto/needs-reevaluation
controller-managed
auto/blocked-by-deps
auto/ci-timeout
auto/claimed-implementer
auto/claimed-merge
auto/claimed-reviewer
auto/driver-down
auto/invariant-violation
auto/last-attempt-tier-0
auto/last-attempt-tier-1
auto/last-attempt-tier-2
auto/last-attempt-tier-min
Automation Tracking
auto/needs-conflict-resolution
auto/needs-implementer
auto/postmortem
auto/ready-to-merge
auto/restart-throttled
auto/revert
auto/sentinel
auto/stale-inactivity
auto/unstable
Blocked
Bounty
$100
Bounty
$1000
Bounty
$10000
Bounty
$20
Bounty
$2000
Bounty
$250
Bounty
$50
Bounty
$500
Bounty
$5000
Bounty
$750
MoSCoW
Could have
MoSCoW
Must have
MoSCoW
Should have
Needs Feedback
Points
1
Points
13
Points
2
Points
21
Points
3
Points
34
Points
5
Points
55
Points
8
Points
88
Priority
Backlog
Priority
CI Blocker
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Signed-off: Owner
Signed-off: Scrum Master
Signed-off: Tech Lead
Spike
State
Completed
State
Duplicate
State
In Progress
State
In Review
State
Paused
State
Unverified
State
Verified
State
Wont Do
Type
Automation
Type
Bug
Type
Discussion
Type
Documentation
Type
Epic
Type
Feature
Type
Legendary
Type
Refactor
Type
Support
Type
Task
Type
Testing
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
cleveragents/cleveragents-core!655
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "tdd/actor-list-validation"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Closes #634
TDD bug-capture tests for bug #592:
agents actor listraises a validation error when a provider has a default model containing/separators (e.g. OpenRouter'santhropic/claude-sonnet-4-20250514).The root cause is
ActorRegistry._actor_name()building names viaf"{provider}/{model}", producing names with 2+ slashes thatActorService._normalize_name()rejects.Changes
Behave BDD (
features/tdd_actor_list_validation.feature, 3 scenarios):@tdd_bug @tdd_bug_592 @tdd_expected_failActorRegistry._actor_name()with aFakeProviderInfohaving a multi-slash model@tdd_expected_failRobot Framework (
robot/tdd_actor_list_validation.robot, 3 tests):robot/helper_tdd_actor_list_validation.py) detects the bug and prints sentinel on expected failureASV benchmarks (
benchmarks/tdd_actor_list_validation_bench.py):time_list_empty,time_list_multi_slash_provider,track_multi_slash_succeedsShared mock (
features/mocks/fake_provider.py):FakeProviderInfoandFakeProviderRegistrystubs forActorRegistryduck-typed provider contractVerification
nox -e typechecknox -e lintnox -e unit_testsnox -e integration_tests -- --include tdd_bug_592nox -e coverage_reportDependencies
ISSUES CLOSED: #634
21d90503e9toe7acd5a665Review: TDD Bug #592 — Actor List Validation Error — Commit
e7acd5a6Reviewer: Hamza Khyari
7 findings total: 4 P2 should-fix, 3 P3 nice-to-have. No P1 blockers.
Note: The BDD tests correctly exercise the bug path. The mock's
upsert_actor.side_effectconstructsActor(name=multi_slash_name)which triggers the realvalidate_namefield_validator atactor.py:62-75— this IS the actual bug. The tests are well-designed.F1 ·
P2:should-fix· Benchmark catches wrong exception typeFile:
benchmarks/tdd_actor_list_validation_bench.py:98,110The actual exception raised by
Actor.validate_name()ispydantic.ValidationError(the field_validator atactor.py:62raisesValueErrorwhich Pydantic wraps).cleveragents.core.exceptions.ValidationErroris a different class. Thesuppress()andexceptwill NOT catch the real exception — the benchmark will crash with an unhandledpydantic.ValidationError.Fix: Import and catch
pydantic.ValidationErrorinstead.F2 ·
P2:should-fix· Line exceeds 88-char ruff limitFile:
benchmarks/tdd_actor_list_validation_bench.py:114This line is 101 characters (including the
# type: ignore), exceeding the project's 88-char ruff line-length limit. Will failnox -s lint.Fix: Break the line or assign via a local variable.
F3 ·
P2:should-fix· Sameimportlib.reloadissueFile:
benchmarks/tdd_actor_list_validation_bench.py:42Same as 653-F4:
importlib.reload(cleveragents)only reloads__init__.py, not the submodules actually imported on lines 44-47.Fix: Remove the reload.
F4 ·
P2:should-fix· Runtime dependency on PR #653's@tdd_expected_failinfrastructureFile:
features/tdd_actor_list_validation.feature:13,19,25All three scenarios use
@tdd_expected_failwhich requires the hook added toenvironment.pyin PR #653. If #655 merges before #653, or #653 is reverted, the tag has no effect — scenarios that fail will fail CI normally (no inversion).This is a process dependency, not a code bug. But it should be documented in the PR description and enforced by merge ordering.
Fix: Add to PR description: "Depends on PR #653 (provides
@tdd_expected_failinfrastructure inenvironment.py)." Ensure #653 merges first.F5 ·
P3:nice-to-have·FakeProviderInfomissingapi_key_env_varfieldFile:
features/mocks/fake_provider.py:138The real
ProviderInfohas a requiredapi_key_env_var: strfield.FakeProviderInfoomits it. This works today becauseActorRegistry.ensure_built_in_actors()only accessesname,provider_type,default_model, andcapabilities. But ifensure_built_in_actorsor any downstream code ever accessesapi_key_env_var, these tests will fail withAttributeErrorrather than a clear type error.Fix: Add
api_key_env_var: str = ""toFakeProviderInfofor structural completeness.F6 ·
P3:nice-to-have· Robot helper usesdict[str, object]typingFile:
robot/helper_tdd_actor_list_validation.py:157Same as 653-F6:
_COMMANDS: dict[str, object]forces# type: ignore[operator]on the dispatch call.Fix: Use
dict[str, Callable[[], None]].F7 ·
P3:nice-to-have·_make_registry()duplicated across BDD steps, benchmark, and Robot helperFiles:
features/steps/tdd_actor_list_validation_steps.py:72,benchmarks/tdd_actor_list_validation_bench.py:50,robot/helper_tdd_actor_list_validation.py:19Three near-identical copies of the
_make_registry()/_make_real_registry()function exist. If theActorRegistryconstructor changes, all three must be updated in lockstep.Fix: Move to
features/mocks/fake_provider.py(which already exists as a shared mock module) and import from all three locations.e7acd5a665tob49e8e7dc9All 7 review findings addressed. Branch rebased onto updated PR #654. Force-pushed.
Changes made:
from cleveragents.core.exceptions import ValidationErrortofrom pydantic import ValidationError—Actor.validate_name()raisespydantic.ValidationError(via@field_validator), not the project's customValidationError# type: ignore[attr-defined]line using a local variable alias (note: E501 is not in the ruff select list so it was not a lint violation, but fixed for readability)import importlibandimportlib.reload(cleveragents)from benchmarkapi_key_env_var: str = ""toFakeProviderInfofor completenessdict[str, object]todict[str, Callable[[], None]]in robot helper, removed# type: ignore[operator]_make_registry()duplication kept for test/benchmark isolation (acceptable trade-off)Merge-gate results: All checks pass (lint, typecheck, unit tests 9/9 TDD scenarios, robot 9/9, security scan).
Self-Review — PR #655 (Issue #634)
Review scope: Full review per
docs/development/review_playbook.md— lint, typecheck, security scan, manual logic review, cross-PR consistency, exception-type verification.Merge-gate checks
ruff check(lint)ruff format --checkpyright(typecheck)nox -s security_scannox -s unit_tests(TDD features)nox -s integration_tests --include tdd_bugtest(actor): add TDD failing tests for actor list empty validation error)ISSUES CLOSED: #634presentFindings
No P0, P1, or P2 findings.
Verified the
pydantic.ValidationErrorfix (655-F1) is correct:Actor.validate_name()is a@field_validatorthat raisesValueError, which Pydantic wraps aspydantic.ValidationError. The benchmark now catches the correct exception type. TheFakeProviderInfodataclass correctly satisfiesActorRegistry's duck-typed contract. Theapi_key_env_varfield default aligns with the realProviderInfotype. Import ordering (ruff I001) is clean after auto-fix.P3:nit — informational: The
_make_registry()helper constructsActor(name=kw.get("name", ...))in theupsert_actor.side_effectlambda. The Pydantic field validator rejects multi-slash names at Actor construction time, which faithfully reproduces the bug. This is correct behavior — noting for reviewer context.b49e8e7dc9to9defc50abe9defc50abeto67b8900035PR #655 — test(actor): add TDD failing tests for actor list empty validation error (Issue #634) — Round 2
Reviewed:
67b89000(tdd/actor-list-validation, 1 commit), 6 files, +593 lines. Mergeable: true. Base:tdd/session-create-di-error.R1 Disposition
pydantic.ValidationErrorimportedimportlib.reloadissue@tdd_expected_failis on the base branchFakeProviderInfomissingapi_key_env_vardict[str, object]typingCallable[[], None]_make_registry()duplicated across 3 files6 of 7 fixed.
F8 (Major) — Scenario 3 permanently passes under
@tdd_expected_fail— dead as regression detectorFile:
features/tdd_actor_list_validation.feature:27-32,features/steps/tdd_actor_list_validation_steps.py:53mock_service.list_actors.return_value = []is hardcoded at line 53 and never updated whenupsert_actorcreates actors. This makes Scenario 3 permanently pass regardless of bug status:list_actors()raisespydantic.ValidationError→ exit code 1 → exit-code step fails →@tdd_expected_failinverts → passeslist_actors()succeeds but returns[](the mock). CLI outputs"No actors configured.\n"(not JSON). The "valid JSON" step fails →@tdd_expected_failinverts → passesThe scenario can never signal that the bug is fixed. Its entire purpose as a
@tdd_expected_failtest is defeated.Fix: Update
mock_service.list_actorsto return the actors created byupsert_actor.side_effect, e.g., capture actors in a list and setlist_actors.side_effect = lambda: captured_actors.F9 (Medium) — Robot helper lacks
upsert_actor.call_countguard — silent false positive on setup failureFile:
robot/helper_tdd_actor_list_validation.py:97,126,160All three Robot subcommands iterate
mock_service.upsert_actor.call_args_listwithout first checking thatupsert_actorwas actually called. If the CLI invocation fails before reachingensure_built_in_actors(import error, config issue, patch not taking effect), the loop executes zero iterations and the function reports the bug as "fixed" — a misleading false positive.The BDD step at
tdd_actor_list_validation_steps.py:165has the correct guard:assert service.upsert_actor.call_count > 0, "upsert_actor was never called". The Robot helper should have the same.Fix: Add
assert mock_service.upsert_actor.call_count > 0before eachcall_args_listloop.F7 (Medium) —
_make_registry()still duplicated across 3 files — OUTSTANDINGFiles:
features/steps/tdd_actor_list_validation_steps.py:42,benchmarks/tdd_actor_list_validation_bench.py:40,robot/helper_tdd_actor_list_validation.py:42Three near-identical copies of the registry construction logic.
features/mocks/fake_provider.pyalready exists as a shared mock module —_make_registry()belongs there.3 findings: 1 Major (dead scenario), 2 Medium (missing guard, duplication). 6 of 7 R1 findings fixed. All 4 ACs met.
148e036abeto2365f9a355Round 2 Review — All 3 Findings Addressed
Pushed
2365f9a3. Here's what changed:F8 (Major) — Scenario 3 permanently passes under
@tdd_expected_failRoot cause:
mock_service.list_actors.return_value = []was hardcoded, so even after a bug fixlist_actors()would return[]→ CLI prints "No actors configured" (not JSON) → Scenario 3 step "valid JSON" fails →@tdd_expected_failinverts to pass → scenario can never signal the bug is fixed.Fix: Introduced a captured-actors pattern in
make_registry()(features/mocks/fake_provider.py). Theupsert_actor.side_effectnow appends successfully createdActorinstances to acaptured_actorslist, andlist_actors.side_effectreturns that list. This means:upsert_actorraisespydantic.ValidationError→ no actors captured →list_actors()returns[]→ CLI fails →@tdd_expected_failinverts to passupsert_actorsucceeds → actors captured →list_actors()returns them → CLI outputs valid JSON → Scenario 3 passes →@tdd_expected_failinverts to FAIL → signals the tag should be removedF7 (Medium) —
_make_registry()duplicated across 3 filesMoved the unified
make_registry()function tofeatures/mocks/fake_provider.py(already the shared mock module). All 3 consumers now import from there:features/steps/tdd_actor_list_validation_steps.py—from features.mocks.fake_provider import FakeProviderInfo, make_registrybenchmarks/tdd_actor_list_validation_bench.py—from mocks.fake_provider import FakeProviderInfo, make_registryrobot/helper_tdd_actor_list_validation.py—from mocks.fake_provider import FakeProviderInfo, make_registrySignature:
make_registry(providers: list[FakeProviderInfo] | None = None) -> tuple[MagicMock, ActorRegistry]. Callers that only need the registry destructure with_, registry = make_registry(...).F9 (Medium) — Robot helper lacks
upsert_actor.call_countguardAdded
assert mock_service.upsert_actor.call_count > 0after eachrunner.invoke()call in all 3 Robot helper subcommands (list_validation_error,name_slash_check,list_json_validation). This ensures the patch took effect and prevents false positives from zero-iterationcall_args_listloops.Verification
nox -e lint— all checks passednox -e typecheck— 0 errors@tdd_expected_failscenarios pass (3 features × 3 scenarios)New commits pushed, approval review dismissed automatically according to repository settings