test(plan): TDD failing tests for subplan spawn orchestration (bug #823) #930
No reviewers
Labels
No labels
auto/needs-reevaluation
controller-managed
auto/blocked-by-deps
auto/ci-timeout
auto/claimed-implementer
auto/claimed-merge
auto/claimed-reviewer
auto/driver-down
auto/invariant-violation
auto/last-attempt-tier-0
auto/last-attempt-tier-1
auto/last-attempt-tier-2
auto/last-attempt-tier-min
Automation Tracking
auto/needs-conflict-resolution
auto/needs-implementer
auto/postmortem
auto/ready-to-merge
auto/restart-throttled
auto/revert
auto/sentinel
auto/stale-inactivity
auto/unstable
Blocked
Bounty
$100
Bounty
$1000
Bounty
$10000
Bounty
$20
Bounty
$2000
Bounty
$250
Bounty
$50
Bounty
$500
Bounty
$5000
Bounty
$750
MoSCoW
Could have
MoSCoW
Must have
MoSCoW
Should have
Needs Feedback
Points
1
Points
13
Points
2
Points
21
Points
3
Points
34
Points
5
Points
55
Points
8
Points
88
Priority
Backlog
Priority
CI Blocker
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Signed-off: Owner
Signed-off: Scrum Master
Signed-off: Tech Lead
Spike
State
Completed
State
Duplicate
State
In Progress
State
In Review
State
Paused
State
Unverified
State
Verified
State
Wont Do
Type
Automation
Type
Bug
Type
Discussion
Type
Documentation
Type
Epic
Type
Feature
Type
Legendary
Type
Refactor
Type
Support
Type
Task
Type
Testing
No project
No assignees
3 participants
Notifications
Due date
No due date set.
Blocks
#838 TDD: subplan spawn creates metadata but does not orchestrate real child plan execution (bug #823)
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core!930
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "tdd/m6-subplan-spawn-orchestration"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
TDD expected-fail tests proving bug #823 exists:
SubplanService.spawn()only creates metadata (SubplanStatusrecords andSpawnMetadata) without creating real childPlandomain objects, triggering lifecycle progression, or attaching status to the parent plan.Tests Added
Behave scenarios (
features/tdd_subplan_spawn_orchestration.feature):Tags:
@tdd_expected_fail @tdd_bug @tdd_bug_823 @mock_onlyRobot Framework tests (
robot/tdd_subplan_spawn_orchestration.robot):spawn-child-plans— verifies SpawnResult contains real Plan objectsspawn-lifecycle— verifies child plan lifecycle progressionspawn-parent-tracking— verifies parent tracks child statusQuality Gates
nox -s lintnox -s typechecknox -s unit_testsnox -s coverage_reportCloses #838
fdcb5ff51c851f1bd18cPM Status Update — Day 34
This is the TDD counterpart PR for Critical bug #823 (subplan spawn no orchestration). Bug #823 is Priority/Critical and blocks M6.
The PR looks well-scoped: 3 Behave scenarios + 3 Robot tests, all tagged
@tdd_expected_fail @tdd_bug_823. CI reports passing per Brent's PR description.Action items:
@tdd_bug,@tdd_bug_823,@tdd_expected_fail), whether the tests actually capture the bug behavior described in #823, and whether the tests will pass normally once the bug is fixed.Priority: Critical — this TDD PR unblocks a Critical-priority bug fix. Fast-track review requested.
Self-Review — PR #930
Reviewer: @brent.edwards (author self-review)
Review method: 4 parallel threads + 2 fresh-eyes passes
Findings
No issues found. All 4 files are clean:
@tdd_expected_fail @tdd_bug @tdd_bug_823 @mock_only),@mock_onlyappropriate (all in-memory operations with MagicMock)# type: ignore, no bareexcept, no leaked business logic_fail() -> NoReturncorrect,_COMMANDSdict correct, all typedCorrectness Verification
Tests correctly prove bug #823:
SubplanService.spawn()createsSubplanStatusmetadata but never creates childPlanobjects, never setschild_plansonSpawnResult(field doesn't exist), and never attaches statuses to parent plan. Tests usegetattr(result, "child_plans", None)to safely test for the missing attribute. Tests would correctly fail if someone added achild_plansfield and actual Plan creation.Verdict
Clean — no findings. Ready for external review.
PM Review — Day 34: TDD PR for Bug #823 (Subplan Spawn Orchestration)
@brent.edwards — Good work on this TDD PR. Quick assessment:
Checklist
@tdd_expected_fail @tdd_bug @tdd_bug_823MoSCoW/Must have)Points/2or appropriate estimate)Blocking
This TDD PR is on the critical path — once merged, bug #823 fix work can begin (assigned to @freemo).
Recommendation
Needs peer review from a second developer. Requesting @CoreRasurae or @hurui200320.
PM review — Day 34
Code Review Report — PR #930: TDD failing tests for subplan spawn orchestration (bug #823)
Reviewer: Automated code review (test coverage, test flaws, performance, bug detection, security)
Scope: All code changes in branch
tdd/m6-subplan-spawn-orchestrationplus close connections to surrounding code (SubplanService,Plan,Decision,SubplanConfig,SpawnResult, Robotcommon.resource, TDD expected-fail infrastructure)Methodology: 3 full review cycles across all categories (bugs, test flaws, performance, security, code quality) until convergence
Summary
The PR adds 4 new files (574 lines) implementing TDD expected-fail tests for bug #823 across both Behave (feature + steps) and Robot Framework (robot + helper). The tests correctly prove that
SubplanService.spawn()only creates metadata (SubplanStatus+SpawnMetadata) without creating real childPlandomain objects. The TDD tagging convention (@tdd_expected_fail+@tdd_bug+@tdd_bug_823) is correctly applied and follows the project's established pattern.8 findings identified across 3 review cycles (0 critical, 2 high, 3 medium, 3 low). No performance or security issues.
HIGH Severity
H1 — Parent plan created in STRATEGIZE phase; specification requires EXECUTE for spawn
Files:
features/steps/tdd_subplan_spawn_orchestration_steps.py:60,robot/helper_tdd_subplan_spawn_orchestration.py:84Both
_make_parent_plan()functions create the parent plan withphase=PlanPhase.STRATEGIZE. Per the specification (§Plan Hierarchy and Parallelism):The current
SubplanService.spawn()does not validate the parent plan's phase, so the test executes without error today. However, when the bug fix is implemented, it is reasonable (and spec-aligned) to add phase validation tospawn(). If that happens, these TDD tests would fail for the wrong reason (phase mismatch instead of missingchild_plans), producing a false negative that obscures the original bug signal.Recommendation: Change both
_make_parent_plan()functions to usephase=PlanPhase.EXECUTE, processing_state=ProcessingState.PROCESSINGto accurately represent the lifecycle stage at which spawning occurs.H2 — PR body documents incorrect helper subcommand names
Location: PR #930 body ("Changes" section)
The PR body states the Robot helper has subcommands
spawn-returns-plans,child-enters-strategize, andparent-tracks-status. The actual code uses different names:spawn-returns-plansspawn-child-planschild-enters-strategizespawn-lifecycleparent-tracks-statusspawn-parent-trackingReferences:
helper_tdd_subplan_spawn_orchestration.py:233-236,tdd_subplan_spawn_orchestration.robot:20,29,38Recommendation: Update the PR body to reflect the actual subcommand names.
MEDIUM Severity
M1 — Only 3 of 8 acceptance criteria from #823 have corresponding TDD tests
Location: All 4 files (overall scope)
Issue #823 defines 8 acceptance criteria. The TDD tests cover 3:
plan subplan listshows real status progressionWhile TDD tests focus on proving the bug exists, the uncovered criteria (particularly parallel execution and failure handling) represent significant specification requirements that would benefit from TDD coverage to guide the fix implementation.
Recommendation: Consider adding at least one scenario for parallel execution (
ExecutionMode.PARALLEL) and one for the failure-handling contract, even if minimal.M2 — DRY violation: identical setup code duplicated between Behave steps and Robot helper
Files:
tdd_subplan_spawn_orchestration_steps.py:39-98vshelper_tdd_subplan_spawn_orchestration.py:52-118Approximately 54 lines of code are duplicated verbatim across both files:
_PARENT_PLAN_ID,_ROOT_PLAN_ID,_DECISION_ID_1/_DEC_ID1,_DECISION_ID_2/_DEC_ID2)_mock_decision_service()function_make_parent_plan()function_make_spawn_entries()function (with identicalDecisionandSpawnEntryconstruction)Any change to the test fixtures (e.g., adding fields when the bug is fixed) must be synchronized in both files. The ULID constant names even differ slightly (
_DECISION_ID_1vs_DEC_ID1), increasing the maintenance risk.Recommendation: Extract the shared fixtures into a common module (e.g.,
robot/fixtures_tdd_subplan_spawn.pyor a shared test utilities file) and import from both the Behave steps and the Robot helper.M3 — Missing assertion on
child.identity.root_plan_idFiles:
tdd_subplan_spawn_orchestration_steps.py:143-157,helper_tdd_subplan_spawn_orchestration.py:153-170The tests verify
child.identity.parent_plan_id == _PARENT_PLAN_IDbut do not verifychild.identity.root_plan_id. PerPlanIdentityinplan.py:274and the specification, every child plan must haveroot_plan_idset to the top-most plan in the hierarchy. This is a required field for hierarchy traversal and is used bySubplanService.spawn()when buildingSpawnMetadata(line 224:root_id = parent_plan.identity.root_plan_id or parent_id).Recommendation: Add an assertion verifying
child.identity.root_plan_id == _ROOT_PLAN_IDalongside the existingparent_plan_idcheck.LOW Severity
L1 — Inconsistent type annotation for
child_plansvariable across Behave stepsFile:
tdd_subplan_spawn_orchestration_steps.pystep_result_has_child_planschild_plans: objectstep_child_plans_have_parent_idchild_plans: list[Plan] | Nonestep_child_plans_strategizechild_plans: list[Plan] | Nonestep_child_plans_queuedchild_plans: list[Plan] | NoneThe first step uses
objectwhile the remaining three uselist[Plan] | None. Sincegetattr()returnsAny, both are technically valid at runtime, but consistency improves readability.Recommendation: Use
list[Plan] | Nonein all four steps for consistency.L2 — Redundant SubplanConfig instances on Plan and Context
File:
tdd_subplan_spawn_orchestration_steps.py:60-68, 94-96The parent plan is constructed with
subplan_config=SubplanConfig(execution_mode=ExecutionMode.SEQUENTIAL)(line 66), and a separate identicalSubplanConfigis stored oncontext.subplan_config(line 94). Both carry the same values.SubplanService.spawn()takesconfigas a separate parameter, so the plan'ssubplan_configis not used by the test. Having two identical configs could confuse future maintainers about which one is authoritative.Recommendation: Either remove
subplan_configfrom thePlanconstructor (if the test doesn't need it on the plan) or derivecontext.subplan_configfromcontext.parent_plan.subplan_configto avoid divergence.L3 — Robot tests log stderr but don't assert it is empty
File:
tdd_subplan_spawn_orchestration.robot:22-23, 31-32, 40-41All three test cases log
${result.stderr}for debugging but only assert onrc == 0andstdoutcontent. If the helper script emits unexpected warnings to stderr while still exiting successfully, they would be silently ignored.Recommendation: Add
Should Be Empty ${result.stderr}after the rc assertion (or at minimumLog ${result.stderr} WARNat warn level to surface unexpected output).No Issues Found In
@tdd_expected_fail @tdd_bug @tdd_bug_823follows the three-tag convention validated byenvironment.py'svalidate_tdd_tags().ULID_PATTERN.Plan,Decision,SubplanConfig, andContextSnapshotinstantiations provide required fields and match current model schemas.@ -0,0 +57,4 @@root_plan_id=_ROOT_PLAN_ID,),namespaced_name=NamespacedName(namespace="local", name="parent-plan"),description="Parent plan for subplan spawn orchestration test",H1: Parent plan phase should be
PlanPhase.EXECUTE(notSTRATEGIZE). Per spec, spawn decisions are recorded during Strategize, but actual spawning occurs during Execute. If the bug fix adds phase validation tospawn(), this test would fail for the wrong reason.@ -0,0 +133,4 @@Bug #823: SpawnResult only contains SubplanStatus metadata, notreal Plan domain objects. This assertion will fail until the bugis fixed.L1: Type annotation inconsistency — this step uses
objectwhilestep_child_plans_have_parent_id,step_child_plans_strategize, andstep_child_plans_queueduselist[Plan] | None. Consider usinglist[Plan] | Nonehere too for consistency.@ -0,0 +152,4 @@f"Expected {len(context.spawn_entries)} child plans, got {len(child_plans)}")M3: Missing assertion on
child.identity.root_plan_id == _ROOT_PLAN_ID. PerPlanIdentityand the spec, every child plan must haveroot_plan_idset to the top-most plan in the hierarchy.@ -0,0 +81,4 @@def _make_spawn_entries() -> list[SpawnEntry]:"""Create spawn entries for testing."""H1: Same phase issue as the Behave side — should be
phase=PlanPhase.EXECUTEto match the spec-defined lifecycle stage where spawning occurs.@ -0,0 +165,4 @@svc: SubplanService = SubplanService(decision_service=_mock_decision_service(),)parent: Plan = _make_parent_plan()M3: Same as Behave side — add a
root_plan_idcheck:@ -0,0 +20,4 @@${result}= Run Process ${PYTHON} ${HELPER} spawn-child-plans cwd=${WORKSPACE} timeout=30sLog ${result.stdout}Log ${result.stderr}Should Be Equal As Integers ${result.rc} 0L3: Consider adding
Should Be Empty ${result.stderr}to catch unexpected warnings that might indicate issues even when rc==0.de8edfd7ddb258410439Review Fixes Applied — Addressing @CoreRasurae REQUEST_CHANGES and @freemo feedback
Commit:
b2584104→ force-pushedFixes applied:
_make_parent_plan()functions to usephase=PlanPhase.EXECUTE(spawn occurs during Execute per spec)spawn-child-plans,spawn-lifecycle,spawn-parent-trackingroot_plan_idassertionroot_plan_id == _ROOT_PLAN_IDassertion in both Behave steps and Robot helperchild_plans: object→ `child_plans: list[Plan]MoSCoW/Must have,Points/2, assigned tobrent.edwardsNot addressed (tracked as follow-up):
subplan_configis a domain requirement; test config is thespawn()parameter. Both are needed.All nox gates pass (lint, typecheck).
Code Review Report — PR #930 (
tdd/m6-subplan-spawn-orchestration)Reviewer: Automated deep review (test coverage, test flaws, performance, bug detection, security)
Scope: 4 new files (574 lines) on branch
tdd/m6-subplan-spawn-orchestrationMethod: 3 iterative global review cycles covering all categories, cross-referenced against
SubplanService.spawn()implementation (subplan_service.py:176-268),SpawnResultdataclass (subplan_service.py:100-118),Plandomain model (plan.py:518-1004), specification document (docs/specification.md), and established TDD test patterns across the codebase.Overall Assessment
This is a solid TDD expected-fail PR that correctly identifies and proves the existence of bug #823. The code follows established patterns, is well-documented, and meets all acceptance criteria from issue #838. No blocking issues were found. The findings below are opportunities for improvement, organized by severity and category.
Findings Summary
MEDIUM Severity
M1. Robot TDD Listener Blind Inversion Risk
Category: Test Design / Test Infrastructure
Files:
robot/tdd_subplan_spawn_orchestration.robot(all 3 test cases)Relates to:
robot/tdd_expected_fail_listener.py(pre-existing)The Robot
tdd_expected_fail_listener.pyinverts any test failure to PASS without checking the failure type. Unlike the Behave implementation (infeatures/environment.py) which guards against non-AssertionErrorexceptions, infrastructure errors, and dry-run mode, the Robot listener performs a simpleFAIL→PASS/PASS→FAILflip.This means if the Robot helper script crashes due to an unrelated problem (e.g., import error from a refactor,
ValueErrorfromspawn()validation changing, or a domain model constructor change), the test would exit withrc=1, Robot would mark it FAIL, and the listener would silently invert it to PASS. The tests would appear green while actually broken.This is a pre-existing architectural weakness affecting all 4 existing TDD Robot files — not a regression introduced by this PR. However, it directly impacts these tests' reliability.
Suggestion: Consider adding a guard to the Robot helper dispatch — e.g., wrapping each subcommand call in a
try/exceptthat catches unexpected exceptions and exits with a distinct code (e.g.,rc=2) that the listener does NOT invert. Alternatively, enhance the listener to only invert when stderr contains a known sentinel pattern (e.g.,"(bug #823)").M2. No Parallel Spawn Mode Test Coverage
Category: Coverage Gap
Files:
features/tdd_subplan_spawn_orchestration.feature,robot/helper_tdd_subplan_spawn_orchestration.pyAll 3 Behave scenarios and all 3 Robot tests exercise only
DecisionType.SUBPLAN_SPAWNwithExecutionMode.SEQUENTIAL. The specification (docs/specification.mdlines 18294-18296, 18403-18407) definesSUBPLAN_PARALLEL_SPAWNas a first-class spawn mode with distinct semantics (concurrent execution, independent failure handling).While the core bug (no child
Planobjects are created) is mode-independent, adding a single scenario withExecutionMode.PARALLELandDecisionType.SUBPLAN_PARALLEL_SPAWNwould:Suggestion: Add a fourth Behave scenario (e.g., "Parallel spawn result contains child Plan domain objects") and corresponding Robot test case using
PARALLELmode.LOW Severity
L1. Redundant
SubplanConfigConstructionCategory: Test Design / DRY
File:
features/steps/tdd_subplan_spawn_orchestration_steps.py:103The
step_parent_planstep createscontext.subplan_config = SubplanConfig(execution_mode=ExecutionMode.SEQUENTIAL)as a separate object, but_make_parent_plan()already sets an identicalsubplan_configon the parent plan (line 64-66). Thespawn()API takesconfigas a separate parameter, so both exist for a reason — but they're always identical in these tests.Suggestion: Use
context.subplan_config = context.parent_plan.subplan_configto make the relationship explicit and avoid divergence.L2. Redundant Mock Configuration
Category: Test Design
Files:
features/steps/tdd_subplan_spawn_orchestration_steps.py:48,robot/helper_tdd_subplan_spawn_orchestration.py:61_mock_decision_service()explicitly setssvc.list_by_type = MagicMock(return_value=[]), but:MagicMock()auto-stubs all attribute access, so this explicit setup is unnecessary.spawn()does not calllist_by_type— that method is only used inget_spawn_decisions()(line 359), a separate code path.The explicit stub creates a misleading impression that
spawn()depends onlist_by_type.Suggestion: Simplify to
return MagicMock()with a comment noting thatspawn()only needs a non-Nonedecision_servicefor constructor validation.L3. Inconsistent ULID Constant Naming
Category: Style / Maintainability
Files:
features/steps/tdd_subplan_spawn_orchestration_steps.py:41-42vsrobot/helper_tdd_subplan_spawn_orchestration.py:48-49_DECISION_ID_1_DEC_ID101HGZ6FE0AQDYTR4BXVQZ6DA00_DECISION_ID_2_DEC_ID201HGZ6FE0AQDYTR4BXVQZ6DB00Same values, different naming conventions. Reduces cross-referencing readability when comparing Behave and Robot test logic.
Suggestion: Align on one convention.
_DECISION_ID_1/_DECISION_ID_2is more descriptive.L4. Type Annotation Mismatch on
getattr()ReturnCategory: Correctness / Type Safety
File:
features/steps/tdd_subplan_spawn_orchestration_steps.py:142getattr()returnsobject. Thelist[Plan] | Noneannotation is technically incorrect and would cause a type checker warning. The Robot helper correctly useschild_plans: object = getattr(...)(line 136).This inconsistency appears in 5 step functions (lines 142, 163, 186, 204, 243).
Suggestion: Use
child_plans: object = getattr(...)consistently in both files, or cast explicitly after theisinstancecheck.L5. Feature File Could Use
BackgroundSectionCategory: Style / DRY
File:
features/tdd_subplan_spawn_orchestration.featureAll 3 scenarios repeat identical
Given/Andsteps:A
Backgroundsection would reduce this duplication:Each scenario would then start directly with
When I spawn subplans via SubplanService.L6.
@mock_onlyTag Not in Acceptance CriteriaCategory: Spec Compliance
File:
features/tdd_subplan_spawn_orchestration.feature:1The feature-level tags include
@mock_onlywhich is not listed in issue #838's acceptance criteria (@tdd_expected_fail @tdd_bug @tdd_bug_823). The Robot tests do not have an equivalent tag. While@mock_onlyis a reasonable addition (these tests use mocks, not real infrastructure), the discrepancy should be acknowledged.Verification Notes
The following aspects were verified and found to be correct:
PlanPhase.EXECUTEis correct (spec: spawning happens during Execute phase). Child plans expected atPlanPhase.STRATEGIZEandProcessingState.QUEUEDis correct (spec: plans enter Strategize after creation).ULID_PATTERNregex (^[0-9A-HJKMNP-TV-Z]{26}$).@tdd_expected_fail @tdd_bug @tdd_bug_823present on all scenarios/test cases. Tags reference the underlying bug (#823), not the TDD issue (#838), which is correct per convention.tdd_*.robotfiles.spawn()call path: Fixtures pass all validation inspawn()andvalidate_spawn()— the test correctly reaches the bug assertion point without infrastructure errors.f16c5e0567b67dc63edaBranch cleaned —
b67dc63eRebased on latest master, removing 3 merge commits. No code changes needed — all review findings were already addressed or deferred as follow-up.
Branch: 1 commit, 0 merge commits. Lint + typecheck pass.