test(plan): TDD failing tests for subplan spawn orchestration (bug #823) #930

2026-03-14T00:40:18Z

brent.edwards commented

2026-03-14 00:40:18 +00:00

Summary

TDD expected-fail tests proving bug #823 exists: SubplanService.spawn() only creates metadata (SubplanStatus records and SpawnMetadata) without creating real child Plan domain objects, triggering lifecycle progression, or attaching status to the parent plan.

Tests Added

Behave scenarios (features/tdd_subplan_spawn_orchestration.feature):

Spawn result should contain child Plan domain objects (not just metadata)
Child plans should enter strategize phase after spawn
Parent plan should track child plan lifecycle status

Tags: @tdd_expected_fail @tdd_bug @tdd_bug_823 @mock_only

Robot Framework tests (robot/tdd_subplan_spawn_orchestration.robot):

spawn-child-plans — verifies SpawnResult contains real Plan objects
spawn-lifecycle — verifies child plan lifecycle progression
spawn-parent-tracking — verifies parent tracks child status

Quality Gates

Session	Result
`nox -s lint`	PASS
`nox -s typecheck`	PASS (0 errors)
`nox -s unit_tests`	PASS
`nox -s coverage_report`	98% (>= 97%)

Closes #838

## Summary TDD expected-fail tests proving bug #823 exists: `SubplanService.spawn()` only creates metadata (`SubplanStatus` records and `SpawnMetadata`) without creating real child `Plan` domain objects, triggering lifecycle progression, or attaching status to the parent plan. ### Tests Added **Behave scenarios** (`features/tdd_subplan_spawn_orchestration.feature`): - Spawn result should contain child Plan domain objects (not just metadata) - Child plans should enter strategize phase after spawn - Parent plan should track child plan lifecycle status Tags: `@tdd_expected_fail @tdd_bug @tdd_bug_823 @mock_only` **Robot Framework tests** (`robot/tdd_subplan_spawn_orchestration.robot`): - `spawn-child-plans` — verifies SpawnResult contains real Plan objects - `spawn-lifecycle` — verifies child plan lifecycle progression - `spawn-parent-tracking` — verifies parent tracks child status ### Quality Gates | Session | Result | |---|---| | `nox -s lint` | PASS | | `nox -s typecheck` | PASS (0 errors) | | `nox -s unit_tests` | PASS | | `nox -s coverage_report` | 98% (>= 97%) | Closes #838

brent.edwards added the

Type

Testing

label 2026-03-14 00:40:26 +00:00

brent.edwards added this to the v3.5.0 milestone 2026-03-14 00:40:29 +00:00

brent.edwards force-pushed tdd/m6-subplan-spawn-orchestration from fdcb5ff51c to 851f1bd18c

2026-03-14 03:46:39 +00:00

Compare

brent.edwards added a new dependency 2026-03-14 03:50:05 +00:00

#838 TDD: subplan spawn creates metadata but does not orchestrate real child plan execution (bug #823)

brent.edwards referenced this pull request

2026-03-14 03:50:54 +00:00

TDD: subplan spawn creates metadata but does not orchestrate real child plan execution (bug #823) #838

freemo commented

2026-03-14 04:18:31 +00:00

PM Status Update — Day 34

This is the TDD counterpart PR for Critical bug #823 (subplan spawn no orchestration). Bug #823 is Priority/Critical and blocks M6.

The PR looks well-scoped: 3 Behave scenarios + 3 Robot tests, all tagged @tdd_expected_fail @tdd_bug_823. CI reports passing per Brent's PR description.

Action items:

@CoreRasurae — Please perform the first peer review of this TDD PR. Focus on: correct tag usage (@tdd_bug, @tdd_bug_823, @tdd_expected_fail), whether the tests actually capture the bug behavior described in #823, and whether the tests will pass normally once the bug is fixed.
After approval → merge, which unblocks the bug fix PR for #823.

Priority: Critical — this TDD PR unblocks a Critical-priority bug fix. Fast-track review requested.

**PM Status Update — Day 34** This is the TDD counterpart PR for Critical bug #823 (subplan spawn no orchestration). Bug #823 is Priority/Critical and blocks M6. The PR looks well-scoped: 3 Behave scenarios + 3 Robot tests, all tagged `@tdd_expected_fail @tdd_bug_823`. CI reports passing per Brent's PR description. **Action items:** 1. **@CoreRasurae** — Please perform the first peer review of this TDD PR. Focus on: correct tag usage (`@tdd_bug`, `@tdd_bug_823`, `@tdd_expected_fail`), whether the tests actually capture the bug behavior described in #823, and whether the tests will pass normally once the bug is fixed. 2. After approval → merge, which unblocks the bug fix PR for #823. **Priority:** Critical — this TDD PR unblocks a Critical-priority bug fix. Fast-track review requested.

freemo added the

labels 2026-03-14 04:18:53 +00:00

freemo referenced this pull request

2026-03-14 04:20:30 +00:00

TDD: subplan spawn creates metadata but does not orchestrate real child plan execution (bug #823) #838

brent.edwards reviewed 2026-03-14 19:50:14 +00:00

brent.edwards left a comment

Self-Review — PR #930

Reviewer: @brent.edwards (author self-review)
Review method: 4 parallel threads + 2 fresh-eyes passes

Findings

No issues found. All 4 files are clean:

Feature file: Tags correct (@tdd_expected_fail @tdd_bug @tdd_bug_823 @mock_only), @mock_only appropriate (all in-memory operations with MagicMock)
Step definitions: All imports valid, all types annotated, no # type: ignore, no bare except, no leaked business logic
Robot file: Standard pattern, tags and timeouts correct
Helper: _fail() -> NoReturn correct, _COMMANDS dict correct, all typed

Correctness Verification

Tests correctly prove bug #823: SubplanService.spawn() creates SubplanStatus metadata but never creates child Plan objects, never sets child_plans on SpawnResult (field doesn't exist), and never attaches statuses to parent plan. Tests use getattr(result, "child_plans", None) to safely test for the missing attribute. Tests would correctly fail if someone added a child_plans field and actual Plan creation.

Verdict

Clean — no findings. Ready for external review.

# Self-Review — PR #930 **Reviewer:** @brent.edwards (author self-review) **Review method:** 4 parallel threads + 2 fresh-eyes passes ## Findings **No issues found.** All 4 files are clean: - Feature file: Tags correct (`@tdd_expected_fail @tdd_bug @tdd_bug_823 @mock_only`), `@mock_only` appropriate (all in-memory operations with MagicMock) - Step definitions: All imports valid, all types annotated, no `# type: ignore`, no bare `except`, no leaked business logic - Robot file: Standard pattern, tags and timeouts correct - Helper: `_fail() -> NoReturn` correct, `_COMMANDS` dict correct, all typed ## Correctness Verification Tests **correctly prove bug #823**: `SubplanService.spawn()` creates `SubplanStatus` metadata but never creates child `Plan` objects, never sets `child_plans` on `SpawnResult` (field doesn't exist), and never attaches statuses to parent plan. Tests use `getattr(result, "child_plans", None)` to safely test for the missing attribute. Tests **would correctly fail** if someone added a `child_plans` field and actual Plan creation. ## Verdict **Clean — no findings.** Ready for external review.

freemo reviewed 2026-03-14 21:51:26 +00:00

freemo left a comment

PM Review — Day 34: TDD PR for Bug #823 (Subplan Spawn Orchestration)

@brent.edwards — Good work on this TDD PR. Quick assessment:

Checklist

Follows CONTRIBUTING.md TDD Bug Fix Workflow
Tests tagged @tdd_expected_fail @tdd_bug @tdd_bug_823
Behave BDD scenarios present (3 scenarios)
Robot Framework integration tests present (3 tests)
Quality gates pass (lint, typecheck, unit_tests, integration_tests, coverage 98%)
Closes TDD issue TDD: subplan spawn creates metadata but does not orchestrate real child plan execution (bug #823) (#838)
Mergeable (no conflicts)
Missing: MoSCoW label (needs MoSCoW/Must have)
Missing: Points label (needs Points/2 or appropriate estimate)
Missing: Assignee (should be @brent.edwards)
Note: Milestone is v3.5.0 (M6) but bug #823 is in M6. Milestone alignment OK.

Blocking

This TDD PR is on the critical path — once merged, bug #823 fix work can begin (assigned to @freemo).

Recommendation

Needs peer review from a second developer. Requesting @CoreRasurae or @hurui200320.

PM review — Day 34

## PM Review — Day 34: TDD PR for Bug #823 (Subplan Spawn Orchestration) @brent.edwards — Good work on this TDD PR. Quick assessment: ### Checklist - [x] Follows CONTRIBUTING.md TDD Bug Fix Workflow - [x] Tests tagged `@tdd_expected_fail @tdd_bug @tdd_bug_823` - [x] Behave BDD scenarios present (3 scenarios) - [x] Robot Framework integration tests present (3 tests) - [x] Quality gates pass (lint, typecheck, unit_tests, integration_tests, coverage 98%) - [x] Closes TDD issue #838 - [x] Mergeable (no conflicts) - [ ] **Missing**: MoSCoW label (needs `MoSCoW/Must have`) - [ ] **Missing**: Points label (needs `Points/2` or appropriate estimate) - [ ] **Missing**: Assignee (should be @brent.edwards) - [ ] **Note**: Milestone is v3.5.0 (M6) but bug #823 is in M6. Milestone alignment OK. ### Blocking This TDD PR is on the critical path — once merged, bug #823 fix work can begin (assigned to @freemo). ### Recommendation Needs peer review from a second developer. Requesting @CoreRasurae or @hurui200320. --- *PM review — Day 34*

freemo added the

labels 2026-03-14 21:51:48 +00:00

CoreRasurae requested changes 2026-03-15 14:13:51 +00:00

CoreRasurae left a comment

Code Review Report — PR #930: TDD failing tests for subplan spawn orchestration (bug #823)

Reviewer: Automated code review (test coverage, test flaws, performance, bug detection, security)
Scope: All code changes in branch tdd/m6-subplan-spawn-orchestration plus close connections to surrounding code (SubplanService, Plan, Decision, SubplanConfig, SpawnResult, Robot common.resource, TDD expected-fail infrastructure)
Methodology: 3 full review cycles across all categories (bugs, test flaws, performance, security, code quality) until convergence

Summary

The PR adds 4 new files (574 lines) implementing TDD expected-fail tests for bug #823 across both Behave (feature + steps) and Robot Framework (robot + helper). The tests correctly prove that SubplanService.spawn() only creates metadata (SubplanStatus + SpawnMetadata) without creating real child Plan domain objects. The TDD tagging convention (@tdd_expected_fail + @tdd_bug + @tdd_bug_823) is correctly applied and follows the project's established pattern.

8 findings identified across 3 review cycles (0 critical, 2 high, 3 medium, 3 low). No performance or security issues.

HIGH Severity

H1 — Parent plan created in STRATEGIZE phase; specification requires EXECUTE for spawn

Files: features/steps/tdd_subplan_spawn_orchestration_steps.py:60, robot/helper_tdd_subplan_spawn_orchestration.py:84

Both _make_parent_plan() functions create the parent plan with phase=PlanPhase.STRATEGIZE. Per the specification (§Plan Hierarchy and Parallelism):

"Decisions about child plans are made during Strategize (as subplan_spawn decision types). Child plans are actually spawned during Execute (based on those decisions)."

The current SubplanService.spawn() does not validate the parent plan's phase, so the test executes without error today. However, when the bug fix is implemented, it is reasonable (and spec-aligned) to add phase validation to spawn(). If that happens, these TDD tests would fail for the wrong reason (phase mismatch instead of missing child_plans), producing a false negative that obscures the original bug signal.

Recommendation: Change both _make_parent_plan() functions to use phase=PlanPhase.EXECUTE, processing_state=ProcessingState.PROCESSING to accurately represent the lifecycle stage at which spawning occurs.

H2 — PR body documents incorrect helper subcommand names

Location: PR #930 body ("Changes" section)

The PR body states the Robot helper has subcommands spawn-returns-plans, child-enters-strategize, and parent-tracks-status. The actual code uses different names:

PR Body (incorrect)	Actual Code
`spawn-returns-plans`	`spawn-child-plans`
`child-enters-strategize`	`spawn-lifecycle`
`parent-tracks-status`	`spawn-parent-tracking`

References: helper_tdd_subplan_spawn_orchestration.py:233-236, tdd_subplan_spawn_orchestration.robot:20,29,38

Recommendation: Update the PR body to reflect the actual subcommand names.

MEDIUM Severity

M1 — Only 3 of 8 acceptance criteria from #823 have corresponding TDD tests

Location: All 4 files (overall scope)

Issue #823 defines 8 acceptance criteria. The TDD tests cover 3:

AC	Covered	Test
spawn() creates real child Plan	Yes	Scenario 1
Child plan enters strategize phase	Yes	Scenario 2
Child plan full lifecycle (strategize -> execute -> validate -> complete)	No	—
Parent tracks child plan status	Yes	Scenario 3
Child results merge into parent decision tree	No	—
Child failure triggers configurable behavior	No	—
Parallel subplan execution works	No	—
`plan subplan list` shows real status progression	No	—

While TDD tests focus on proving the bug exists, the uncovered criteria (particularly parallel execution and failure handling) represent significant specification requirements that would benefit from TDD coverage to guide the fix implementation.

Recommendation: Consider adding at least one scenario for parallel execution (ExecutionMode.PARALLEL) and one for the failure-handling contract, even if minimal.

M2 — DRY violation: identical setup code duplicated between Behave steps and Robot helper

Files: tdd_subplan_spawn_orchestration_steps.py:39-98 vs helper_tdd_subplan_spawn_orchestration.py:52-118

Approximately 54 lines of code are duplicated verbatim across both files:

ULID constants (_PARENT_PLAN_ID, _ROOT_PLAN_ID, _DECISION_ID_1/_DEC_ID1, _DECISION_ID_2/_DEC_ID2)
_mock_decision_service() function
_make_parent_plan() function
_make_spawn_entries() function (with identical Decision and SpawnEntry construction)

Any change to the test fixtures (e.g., adding fields when the bug is fixed) must be synchronized in both files. The ULID constant names even differ slightly (_DECISION_ID_1 vs _DEC_ID1), increasing the maintenance risk.

Recommendation: Extract the shared fixtures into a common module (e.g., robot/fixtures_tdd_subplan_spawn.py or a shared test utilities file) and import from both the Behave steps and the Robot helper.

M3 — Missing assertion on `child.identity.root_plan_id`

Files: tdd_subplan_spawn_orchestration_steps.py:143-157, helper_tdd_subplan_spawn_orchestration.py:153-170

The tests verify child.identity.parent_plan_id == _PARENT_PLAN_ID but do not verify child.identity.root_plan_id. Per PlanIdentity in plan.py:274 and the specification, every child plan must have root_plan_id set to the top-most plan in the hierarchy. This is a required field for hierarchy traversal and is used by SubplanService.spawn() when building SpawnMetadata (line 224: root_id = parent_plan.identity.root_plan_id or parent_id).

Recommendation: Add an assertion verifying child.identity.root_plan_id == _ROOT_PLAN_ID alongside the existing parent_plan_id check.

LOW Severity

L1 — Inconsistent type annotation for `child_plans` variable across Behave steps

File: tdd_subplan_spawn_orchestration_steps.py

Step function	Line	Type annotation
`step_result_has_child_plans`	136	`child_plans: object`
`step_child_plans_have_parent_id`	144	`child_plans: list[Plan] \| None`
`step_child_plans_strategize`	162	`child_plans: list[Plan] \| None`
`step_child_plans_queued`	178	`child_plans: list[Plan] \| None`

The first step uses object while the remaining three use list[Plan] | None. Since getattr() returns Any, both are technically valid at runtime, but consistency improves readability.

Recommendation: Use list[Plan] | None in all four steps for consistency.

L2 — Redundant SubplanConfig instances on Plan and Context

File: tdd_subplan_spawn_orchestration_steps.py:60-68, 94-96

The parent plan is constructed with subplan_config=SubplanConfig(execution_mode=ExecutionMode.SEQUENTIAL) (line 66), and a separate identical SubplanConfig is stored on context.subplan_config (line 94). Both carry the same values. SubplanService.spawn() takes config as a separate parameter, so the plan's subplan_config is not used by the test. Having two identical configs could confuse future maintainers about which one is authoritative.

Recommendation: Either remove subplan_config from the Plan constructor (if the test doesn't need it on the plan) or derive context.subplan_config from context.parent_plan.subplan_config to avoid divergence.

L3 — Robot tests log stderr but don't assert it is empty

File: tdd_subplan_spawn_orchestration.robot:22-23, 31-32, 40-41

All three test cases log ${result.stderr} for debugging but only assert on rc == 0 and stdout content. If the helper script emits unexpected warnings to stderr while still exiting successfully, they would be silently ignored.

Recommendation: Add Should Be Empty ${result.stderr} after the rc assertion (or at minimum Log ${result.stderr} WARN at warn level to surface unexpected output).

No Issues Found In

Performance: All tests are in-memory with mock services; 30s timeouts are appropriate.
Security: No file system writes, no network access, no user input handling in test code.
Tag compliance: @tdd_expected_fail @tdd_bug @tdd_bug_823 follows the three-tag convention validated by environment.py's validate_tdd_tags().
Robot-Behave consistency: Sentinel strings, assertions, and test logic are aligned between the two frameworks.
ULID validity: All 4 hardcoded ULIDs are valid 26-character Crockford Base32 strings matching ULID_PATTERN.
Pydantic model construction: All Plan, Decision, SubplanConfig, and ContextSnapshot instantiations provide required fields and match current model schemas.

## Code Review Report — PR #930: TDD failing tests for subplan spawn orchestration (bug #823) **Reviewer**: Automated code review (test coverage, test flaws, performance, bug detection, security) **Scope**: All code changes in branch `tdd/m6-subplan-spawn-orchestration` plus close connections to surrounding code (`SubplanService`, `Plan`, `Decision`, `SubplanConfig`, `SpawnResult`, Robot `common.resource`, TDD expected-fail infrastructure) **Methodology**: 3 full review cycles across all categories (bugs, test flaws, performance, security, code quality) until convergence --- ### Summary The PR adds 4 new files (574 lines) implementing TDD expected-fail tests for bug #823 across both Behave (feature + steps) and Robot Framework (robot + helper). The tests correctly prove that `SubplanService.spawn()` only creates metadata (`SubplanStatus` + `SpawnMetadata`) without creating real child `Plan` domain objects. The TDD tagging convention (`@tdd_expected_fail` + `@tdd_bug` + `@tdd_bug_823`) is correctly applied and follows the project's established pattern. **8 findings** identified across 3 review cycles (0 critical, 2 high, 3 medium, 3 low). No performance or security issues. --- ### HIGH Severity #### H1 — Parent plan created in STRATEGIZE phase; specification requires EXECUTE for spawn **Files**: `features/steps/tdd_subplan_spawn_orchestration_steps.py:60`, `robot/helper_tdd_subplan_spawn_orchestration.py:84` Both `_make_parent_plan()` functions create the parent plan with `phase=PlanPhase.STRATEGIZE`. Per the specification (§Plan Hierarchy and Parallelism): > *"Decisions about child plans are made during **Strategize** (as `subplan_spawn` decision types). Child plans are actually spawned during **Execute** (based on those decisions)."* The current `SubplanService.spawn()` does not validate the parent plan's phase, so the test executes without error today. However, when the bug fix is implemented, it is reasonable (and spec-aligned) to add phase validation to `spawn()`. If that happens, these TDD tests would fail for the **wrong reason** (phase mismatch instead of missing `child_plans`), producing a false negative that obscures the original bug signal. **Recommendation**: Change both `_make_parent_plan()` functions to use `phase=PlanPhase.EXECUTE, processing_state=ProcessingState.PROCESSING` to accurately represent the lifecycle stage at which spawning occurs. --- #### H2 — PR body documents incorrect helper subcommand names **Location**: PR #930 body ("Changes" section) The PR body states the Robot helper has subcommands `spawn-returns-plans`, `child-enters-strategize`, and `parent-tracks-status`. The actual code uses different names: | PR Body (incorrect) | Actual Code | |---|---| | `spawn-returns-plans` | `spawn-child-plans` | | `child-enters-strategize` | `spawn-lifecycle` | | `parent-tracks-status` | `spawn-parent-tracking` | References: `helper_tdd_subplan_spawn_orchestration.py:233-236`, `tdd_subplan_spawn_orchestration.robot:20,29,38` **Recommendation**: Update the PR body to reflect the actual subcommand names. --- ### MEDIUM Severity #### M1 — Only 3 of 8 acceptance criteria from #823 have corresponding TDD tests **Location**: All 4 files (overall scope) Issue #823 defines 8 acceptance criteria. The TDD tests cover 3: | AC | Covered | Test | |---|---|---| | spawn() creates real child Plan | Yes | Scenario 1 | | Child plan enters strategize phase | Yes | Scenario 2 | | Child plan full lifecycle (strategize -> execute -> validate -> complete) | **No** | — | | Parent tracks child plan status | Yes | Scenario 3 | | Child results merge into parent decision tree | **No** | — | | Child failure triggers configurable behavior | **No** | — | | Parallel subplan execution works | **No** | — | | `plan subplan list` shows real status progression | **No** | — | While TDD tests focus on *proving the bug exists*, the uncovered criteria (particularly parallel execution and failure handling) represent significant specification requirements that would benefit from TDD coverage to guide the fix implementation. **Recommendation**: Consider adding at least one scenario for parallel execution (`ExecutionMode.PARALLEL`) and one for the failure-handling contract, even if minimal. --- #### M2 — DRY violation: identical setup code duplicated between Behave steps and Robot helper **Files**: `tdd_subplan_spawn_orchestration_steps.py:39-98` vs `helper_tdd_subplan_spawn_orchestration.py:52-118` Approximately 54 lines of code are duplicated verbatim across both files: - ULID constants (`_PARENT_PLAN_ID`, `_ROOT_PLAN_ID`, `_DECISION_ID_1`/`_DEC_ID1`, `_DECISION_ID_2`/`_DEC_ID2`) - `_mock_decision_service()` function - `_make_parent_plan()` function - `_make_spawn_entries()` function (with identical `Decision` and `SpawnEntry` construction) Any change to the test fixtures (e.g., adding fields when the bug is fixed) must be synchronized in both files. The ULID constant names even differ slightly (`_DECISION_ID_1` vs `_DEC_ID1`), increasing the maintenance risk. **Recommendation**: Extract the shared fixtures into a common module (e.g., `robot/fixtures_tdd_subplan_spawn.py` or a shared test utilities file) and import from both the Behave steps and the Robot helper. --- #### M3 — Missing assertion on `child.identity.root_plan_id` **Files**: `tdd_subplan_spawn_orchestration_steps.py:143-157`, `helper_tdd_subplan_spawn_orchestration.py:153-170` The tests verify `child.identity.parent_plan_id == _PARENT_PLAN_ID` but do not verify `child.identity.root_plan_id`. Per `PlanIdentity` in `plan.py:274` and the specification, every child plan must have `root_plan_id` set to the top-most plan in the hierarchy. This is a required field for hierarchy traversal and is used by `SubplanService.spawn()` when building `SpawnMetadata` (line 224: `root_id = parent_plan.identity.root_plan_id or parent_id`). **Recommendation**: Add an assertion verifying `child.identity.root_plan_id == _ROOT_PLAN_ID` alongside the existing `parent_plan_id` check. --- ### LOW Severity #### L1 — Inconsistent type annotation for `child_plans` variable across Behave steps **File**: `tdd_subplan_spawn_orchestration_steps.py` | Step function | Line | Type annotation | |---|---|---| | `step_result_has_child_plans` | 136 | `child_plans: object` | | `step_child_plans_have_parent_id` | 144 | `child_plans: list[Plan] \| None` | | `step_child_plans_strategize` | 162 | `child_plans: list[Plan] \| None` | | `step_child_plans_queued` | 178 | `child_plans: list[Plan] \| None` | The first step uses `object` while the remaining three use `list[Plan] | None`. Since `getattr()` returns `Any`, both are technically valid at runtime, but consistency improves readability. **Recommendation**: Use `list[Plan] | None` in all four steps for consistency. --- #### L2 — Redundant SubplanConfig instances on Plan and Context **File**: `tdd_subplan_spawn_orchestration_steps.py:60-68, 94-96` The parent plan is constructed with `subplan_config=SubplanConfig(execution_mode=ExecutionMode.SEQUENTIAL)` (line 66), and a separate identical `SubplanConfig` is stored on `context.subplan_config` (line 94). Both carry the same values. `SubplanService.spawn()` takes `config` as a separate parameter, so the plan's `subplan_config` is not used by the test. Having two identical configs could confuse future maintainers about which one is authoritative. **Recommendation**: Either remove `subplan_config` from the `Plan` constructor (if the test doesn't need it on the plan) or derive `context.subplan_config` from `context.parent_plan.subplan_config` to avoid divergence. --- #### L3 — Robot tests log stderr but don't assert it is empty **File**: `tdd_subplan_spawn_orchestration.robot:22-23, 31-32, 40-41` All three test cases log `${result.stderr}` for debugging but only assert on `rc == 0` and `stdout` content. If the helper script emits unexpected warnings to stderr while still exiting successfully, they would be silently ignored. **Recommendation**: Add `Should Be Empty ${result.stderr}` after the rc assertion (or at minimum `Log ${result.stderr} WARN` at warn level to surface unexpected output). --- ### No Issues Found In - **Performance**: All tests are in-memory with mock services; 30s timeouts are appropriate. - **Security**: No file system writes, no network access, no user input handling in test code. - **Tag compliance**: `@tdd_expected_fail @tdd_bug @tdd_bug_823` follows the three-tag convention validated by `environment.py`'s `validate_tdd_tags()`. - **Robot-Behave consistency**: Sentinel strings, assertions, and test logic are aligned between the two frameworks. - **ULID validity**: All 4 hardcoded ULIDs are valid 26-character Crockford Base32 strings matching `ULID_PATTERN`. - **Pydantic model construction**: All `Plan`, `Decision`, `SubplanConfig`, and `ContextSnapshot` instantiations provide required fields and match current model schemas.

features/steps/tdd_subplan_spawn_orchestration_steps.py Outdated

						
				@@ -0,0 +57,4 @@

				            root_plan_id=_ROOT_PLAN_ID,

				        ),

				        namespaced_name=NamespacedName(namespace="local", name="parent-plan"),

				        description="Parent plan for subplan spawn orchestration test",

CoreRasurae commented

2026-03-15 14:13:51 +00:00

H1: Parent plan phase should be PlanPhase.EXECUTE (not STRATEGIZE). Per spec, spawn decisions are recorded during Strategize, but actual spawning occurs during Execute. If the bug fix adds phase validation to spawn(), this test would fail for the wrong reason.

phase=PlanPhase.EXECUTE,
processing_state=ProcessingState.PROCESSING,

**H1**: Parent plan phase should be `PlanPhase.EXECUTE` (not `STRATEGIZE`). Per spec, spawn decisions are *recorded* during Strategize, but actual spawning occurs during Execute. If the bug fix adds phase validation to `spawn()`, this test would fail for the wrong reason. ```python phase=PlanPhase.EXECUTE, processing_state=ProcessingState.PROCESSING, ```

features/steps/tdd_subplan_spawn_orchestration_steps.py Outdated

						
				@@ -0,0 +133,4 @@

				    Bug #823: SpawnResult only contains SubplanStatus metadata, not

				    real Plan domain objects.  This assertion will fail until the bug

				    is fixed.

CoreRasurae commented

2026-03-15 14:13:51 +00:00

L1: Type annotation inconsistency — this step uses object while step_child_plans_have_parent_id, step_child_plans_strategize, and step_child_plans_queued use list[Plan] | None. Consider using list[Plan] | None here too for consistency.

**L1**: Type annotation inconsistency — this step uses `object` while `step_child_plans_have_parent_id`, `step_child_plans_strategize`, and `step_child_plans_queued` use `list[Plan] | None`. Consider using `list[Plan] | None` here too for consistency.

features/steps/tdd_subplan_spawn_orchestration_steps.py Outdated

						
				@@ -0,0 +152,4 @@

				        f"Expected {len(context.spawn_entries)} child plans, got {len(child_plans)}"

				    )

CoreRasurae commented

2026-03-15 14:13:51 +00:00

M3: Missing assertion on child.identity.root_plan_id == _ROOT_PLAN_ID. Per PlanIdentity and the spec, every child plan must have root_plan_id set to the top-most plan in the hierarchy.

**M3**: Missing assertion on `child.identity.root_plan_id == _ROOT_PLAN_ID`. Per `PlanIdentity` and the spec, every child plan must have `root_plan_id` set to the top-most plan in the hierarchy.

robot/helper_tdd_subplan_spawn_orchestration.py Outdated

						
				@@ -0,0 +81,4 @@

				def _make_spawn_entries() -> list[SpawnEntry]:

				    """Create spawn entries for testing."""

CoreRasurae commented

2026-03-15 14:13:51 +00:00

H1: Same phase issue as the Behave side — should be phase=PlanPhase.EXECUTE to match the spec-defined lifecycle stage where spawning occurs.

**H1**: Same phase issue as the Behave side — should be `phase=PlanPhase.EXECUTE` to match the spec-defined lifecycle stage where spawning occurs.

robot/helper_tdd_subplan_spawn_orchestration.py Outdated

						
				@@ -0,0 +165,4 @@

				    svc: SubplanService = SubplanService(

				        decision_service=_mock_decision_service(),

				    )

				    parent: Plan = _make_parent_plan()

CoreRasurae commented

2026-03-15 14:13:51 +00:00

M3: Same as Behave side — add a root_plan_id check:

if child.identity.root_plan_id != _ROOT_PLAN_ID:
    _fail(f"Child plan root_plan_id={child.identity.root_plan_id!r}, expected {_ROOT_PLAN_ID!r}")

**M3**: Same as Behave side — add a `root_plan_id` check: ```python if child.identity.root_plan_id != _ROOT_PLAN_ID: _fail(f"Child plan root_plan_id={child.identity.root_plan_id!r}, expected {_ROOT_PLAN_ID!r}") ```

robot/tdd_subplan_spawn_orchestration.robot Outdated

						
				@@ -0,0 +20,4 @@

				    ${result}=    Run Process    ${PYTHON}    ${HELPER}    spawn-child-plans    cwd=${WORKSPACE}    timeout=30s

				    Log    ${result.stdout}

				    Log    ${result.stderr}

				    Should Be Equal As Integers    ${result.rc}    0

CoreRasurae commented

2026-03-15 14:13:51 +00:00

L3: Consider adding Should Be Empty ${result.stderr} to catch unexpected warnings that might indicate issues even when rc==0.

**L3**: Consider adding `Should Be Empty ${result.stderr}` to catch unexpected warnings that might indicate issues even when rc==0.

brent.edwards force-pushed tdd/m6-subplan-spawn-orchestration from de8edfd7dd to b258410439

2026-03-15 19:17:01 +00:00

Compare

brent.edwards self-assigned this 2026-03-15 19:22:53 +00:00

brent.edwards commented

2026-03-15 19:23:26 +00:00

Review Fixes Applied — Addressing @CoreRasurae REQUEST_CHANGES and @freemo feedback

Commit: b2584104 → force-pushed

Fixes applied:

Finding	Fix
H1 (HIGH) — Parent plan in STRATEGIZE phase	Changed both `_make_parent_plan()` functions to use `phase=PlanPhase.EXECUTE` (spawn occurs during Execute per spec)
H2 (HIGH) — PR body incorrect subcommand names	Updated PR description with correct names: `spawn-child-plans`, `spawn-lifecycle`, `spawn-parent-tracking`
M3 (MEDIUM) — Missing `root_plan_id` assertion	Added `root_plan_id == _ROOT_PLAN_ID` assertion in both Behave steps and Robot helper
L1 (LOW) — Inconsistent type annotation	Changed `child_plans: object` → `child_plans: list[Plan]
@freemo — Missing labels/assignee	Added `MoSCoW/Must have`, `Points/2`, assigned to `brent.edwards`

Not addressed (tracked as follow-up):

Finding	Rationale
M1 — Only 3 of 8 ACs covered	TDD tests focus on proving the core bug. Parallel execution and failure handling are separate specification behaviors best tested when the fix is implemented.
M2 — DRY violation	Consistent with existing pattern (each file defines its own helpers). Will consider shared fixture module as more TDD subplan tests are added.
L2 — Redundant SubplanConfig	Plan's `subplan_config` is a domain requirement; test config is the `spawn()` parameter. Both are needed.
L3 — Robot tests don't assert stderr empty	Stderr may contain debug logs from services; asserting empty would be brittle.

All nox gates pass (lint, typecheck).

## Review Fixes Applied — Addressing @CoreRasurae REQUEST_CHANGES and @freemo feedback **Commit:** `b2584104` → force-pushed ### Fixes applied: | Finding | Fix | |---|---| | **H1** (HIGH) — Parent plan in STRATEGIZE phase | Changed both `_make_parent_plan()` functions to use `phase=PlanPhase.EXECUTE` (spawn occurs during Execute per spec) | | **H2** (HIGH) — PR body incorrect subcommand names | Updated PR description with correct names: `spawn-child-plans`, `spawn-lifecycle`, `spawn-parent-tracking` | | **M3** (MEDIUM) — Missing `root_plan_id` assertion | Added `root_plan_id == _ROOT_PLAN_ID` assertion in both Behave steps and Robot helper | | **L1** (LOW) — Inconsistent type annotation | Changed `child_plans: object` → `child_plans: list[Plan] | None` for consistency | | @freemo — Missing labels/assignee | Added `MoSCoW/Must have`, `Points/2`, assigned to `brent.edwards` | ### Not addressed (tracked as follow-up): | Finding | Rationale | |---|---| | **M1** — Only 3 of 8 ACs covered | TDD tests focus on proving the core bug. Parallel execution and failure handling are separate specification behaviors best tested when the fix is implemented. | | **M2** — DRY violation | Consistent with existing pattern (each file defines its own helpers). Will consider shared fixture module as more TDD subplan tests are added. | | **L2** — Redundant SubplanConfig | Plan's `subplan_config` is a domain requirement; test config is the `spawn()` parameter. Both are needed. | | **L3** — Robot tests don't assert stderr empty | Stderr may contain debug logs from services; asserting empty would be brittle. | All nox gates pass (lint, typecheck).

CoreRasurae reviewed 2026-03-15 22:16:32 +00:00

CoreRasurae left a comment

Code Review Report — PR #930 (`tdd/m6-subplan-spawn-orchestration`)

Reviewer: Automated deep review (test coverage, test flaws, performance, bug detection, security)
Scope: 4 new files (574 lines) on branch tdd/m6-subplan-spawn-orchestration
Method: 3 iterative global review cycles covering all categories, cross-referenced against SubplanService.spawn() implementation (subplan_service.py:176-268), SpawnResult dataclass (subplan_service.py:100-118), Plan domain model (plan.py:518-1004), specification document (docs/specification.md), and established TDD test patterns across the codebase.

Overall Assessment

This is a solid TDD expected-fail PR that correctly identifies and proves the existence of bug #823. The code follows established patterns, is well-documented, and meets all acceptance criteria from issue #838. No blocking issues were found. The findings below are opportunities for improvement, organized by severity and category.

Findings Summary

Severity	Count	Categories
Critical	0	—
Medium	2	Test Design, Coverage
Low	6	Test Design, Correctness, Style, Spec Compliance

MEDIUM Severity

M1. Robot TDD Listener Blind Inversion Risk

Category: Test Design / Test Infrastructure
Files: robot/tdd_subplan_spawn_orchestration.robot (all 3 test cases)
Relates to: robot/tdd_expected_fail_listener.py (pre-existing)

The Robot tdd_expected_fail_listener.py inverts any test failure to PASS without checking the failure type. Unlike the Behave implementation (in features/environment.py) which guards against non-AssertionError exceptions, infrastructure errors, and dry-run mode, the Robot listener performs a simple FAIL→PASS / PASS→FAIL flip.

This means if the Robot helper script crashes due to an unrelated problem (e.g., import error from a refactor, ValueError from spawn() validation changing, or a domain model constructor change), the test would exit with rc=1, Robot would mark it FAIL, and the listener would silently invert it to PASS. The tests would appear green while actually broken.

This is a pre-existing architectural weakness affecting all 4 existing TDD Robot files — not a regression introduced by this PR. However, it directly impacts these tests' reliability.

Suggestion: Consider adding a guard to the Robot helper dispatch — e.g., wrapping each subcommand call in a try/except that catches unexpected exceptions and exits with a distinct code (e.g., rc=2) that the listener does NOT invert. Alternatively, enhance the listener to only invert when stderr contains a known sentinel pattern (e.g., "(bug #823)").

M2. No Parallel Spawn Mode Test Coverage

Category: Coverage Gap
Files: features/tdd_subplan_spawn_orchestration.feature, robot/helper_tdd_subplan_spawn_orchestration.py

All 3 Behave scenarios and all 3 Robot tests exercise only DecisionType.SUBPLAN_SPAWN with ExecutionMode.SEQUENTIAL. The specification (docs/specification.md lines 18294-18296, 18403-18407) defines SUBPLAN_PARALLEL_SPAWN as a first-class spawn mode with distinct semantics (concurrent execution, independent failure handling).

While the core bug (no child Plan objects are created) is mode-independent, adding a single scenario with ExecutionMode.PARALLEL and DecisionType.SUBPLAN_PARALLEL_SPAWN would:

Verify the bug exists for both modes
Guard against a future fix that only addresses sequential spawning
Better align with the spec's equal treatment of both modes

Suggestion: Add a fourth Behave scenario (e.g., "Parallel spawn result contains child Plan domain objects") and corresponding Robot test case using PARALLEL mode.

LOW Severity

L1. Redundant `SubplanConfig` Construction

Category: Test Design / DRY
File: features/steps/tdd_subplan_spawn_orchestration_steps.py:103

The step_parent_plan step creates context.subplan_config = SubplanConfig(execution_mode=ExecutionMode.SEQUENTIAL) as a separate object, but _make_parent_plan() already sets an identical subplan_config on the parent plan (line 64-66). The spawn() API takes config as a separate parameter, so both exist for a reason — but they're always identical in these tests.

Suggestion: Use context.subplan_config = context.parent_plan.subplan_config to make the relationship explicit and avoid divergence.

L2. Redundant Mock Configuration

Category: Test Design
Files: features/steps/tdd_subplan_spawn_orchestration_steps.py:48, robot/helper_tdd_subplan_spawn_orchestration.py:61

_mock_decision_service() explicitly sets svc.list_by_type = MagicMock(return_value=[]), but:

MagicMock() auto-stubs all attribute access, so this explicit setup is unnecessary.
spawn() does not call list_by_type — that method is only used in get_spawn_decisions() (line 359), a separate code path.

The explicit stub creates a misleading impression that spawn() depends on list_by_type.

Suggestion: Simplify to return MagicMock() with a comment noting that spawn() only needs a non-None decision_service for constructor validation.

L3. Inconsistent ULID Constant Naming

Category: Style / Maintainability
Files: features/steps/tdd_subplan_spawn_orchestration_steps.py:41-42 vs robot/helper_tdd_subplan_spawn_orchestration.py:48-49

Behave	Robot	Value
`_DECISION_ID_1`	`_DEC_ID1`	`01HGZ6FE0AQDYTR4BXVQZ6DA00`
`_DECISION_ID_2`	`_DEC_ID2`	`01HGZ6FE0AQDYTR4BXVQZ6DB00`

Same values, different naming conventions. Reduces cross-referencing readability when comparing Behave and Robot test logic.

Suggestion: Align on one convention. _DECISION_ID_1 / _DECISION_ID_2 is more descriptive.

L4. Type Annotation Mismatch on `getattr()` Return

Category: Correctness / Type Safety
File: features/steps/tdd_subplan_spawn_orchestration_steps.py:142

child_plans: list[Plan] | None = getattr(result, "child_plans", None)

getattr() returns object. The list[Plan] | None annotation is technically incorrect and would cause a type checker warning. The Robot helper correctly uses child_plans: object = getattr(...) (line 136).

This inconsistency appears in 5 step functions (lines 142, 163, 186, 204, 243).

Suggestion: Use child_plans: object = getattr(...) consistently in both files, or cast explicitly after the isinstance check.

L5. Feature File Could Use `Background` Section

Category: Style / DRY
File: features/tdd_subplan_spawn_orchestration.feature

All 3 scenarios repeat identical Given/And steps:

Given a parent plan configured for subplan spawning
And valid spawn entries for the parent plan

A Background section would reduce this duplication:

Background:
  Given a parent plan configured for subplan spawning
  And valid spawn entries for the parent plan

Each scenario would then start directly with When I spawn subplans via SubplanService.

L6. `@mock_only` Tag Not in Acceptance Criteria

Category: Spec Compliance
File: features/tdd_subplan_spawn_orchestration.feature:1

The feature-level tags include @mock_only which is not listed in issue #838's acceptance criteria (@tdd_expected_fail @tdd_bug @tdd_bug_823). The Robot tests do not have an equivalent tag. While @mock_only is a reasonable addition (these tests use mocks, not real infrastructure), the discrepancy should be acknowledged.

Verification Notes

The following aspects were verified and found to be correct:

Spec alignment: Parent plan in PlanPhase.EXECUTE is correct (spec: spawning happens during Execute phase). Child plans expected at PlanPhase.STRATEGIZE and ProcessingState.QUEUED is correct (spec: plans enter Strategize after creation).
ULID validity: All 4 ULID constants pass the ULID_PATTERN regex (^[0-9A-HJKMNP-TV-Z]{26}$).
Acceptance criteria: All 6 checkboxes from issue #838 are satisfied.
Tag structure: @tdd_expected_fail @tdd_bug @tdd_bug_823 present on all scenarios/test cases. Tags reference the underlying bug (#823), not the TDD issue (#838), which is correct per convention.
Test pattern compliance: Robot test structure matches the canonical TDD template used across all 7 existing tdd_*.robot files.
spawn() call path: Fixtures pass all validation in spawn() and validate_spawn() — the test correctly reaches the bug assertion point without infrastructure errors.
Security: No concerns (test-only code, no user input handling, no new dependencies).
Performance: No concerns (test-only code, no hot paths).

## Code Review Report — PR #930 (`tdd/m6-subplan-spawn-orchestration`) **Reviewer:** Automated deep review (test coverage, test flaws, performance, bug detection, security) **Scope:** 4 new files (574 lines) on branch `tdd/m6-subplan-spawn-orchestration` **Method:** 3 iterative global review cycles covering all categories, cross-referenced against `SubplanService.spawn()` implementation (`subplan_service.py:176-268`), `SpawnResult` dataclass (`subplan_service.py:100-118`), `Plan` domain model (`plan.py:518-1004`), specification document (`docs/specification.md`), and established TDD test patterns across the codebase. --- ### Overall Assessment This is a **solid TDD expected-fail PR** that correctly identifies and proves the existence of bug #823. The code follows established patterns, is well-documented, and meets all acceptance criteria from issue #838. No blocking issues were found. The findings below are opportunities for improvement, organized by severity and category. --- ### Findings Summary | Severity | Count | Categories | |----------|-------|------------| | Critical | 0 | — | | Medium | 2 | Test Design, Coverage | | Low | 6 | Test Design, Correctness, Style, Spec Compliance | --- ### MEDIUM Severity #### M1. Robot TDD Listener Blind Inversion Risk **Category:** Test Design / Test Infrastructure **Files:** `robot/tdd_subplan_spawn_orchestration.robot` (all 3 test cases) **Relates to:** `robot/tdd_expected_fail_listener.py` (pre-existing) The Robot `tdd_expected_fail_listener.py` inverts **any** test failure to PASS without checking the failure type. Unlike the Behave implementation (in `features/environment.py`) which guards against non-`AssertionError` exceptions, infrastructure errors, and dry-run mode, the Robot listener performs a simple `FAIL→PASS` / `PASS→FAIL` flip. This means if the Robot helper script crashes due to an unrelated problem (e.g., import error from a refactor, `ValueError` from `spawn()` validation changing, or a domain model constructor change), the test would exit with `rc=1`, Robot would mark it FAIL, and the listener would silently invert it to PASS. **The tests would appear green while actually broken.** This is a **pre-existing architectural weakness** affecting all 4 existing TDD Robot files — not a regression introduced by this PR. However, it directly impacts these tests' reliability. **Suggestion:** Consider adding a guard to the Robot helper dispatch — e.g., wrapping each subcommand call in a `try/except` that catches unexpected exceptions and exits with a distinct code (e.g., `rc=2`) that the listener does NOT invert. Alternatively, enhance the listener to only invert when stderr contains a known sentinel pattern (e.g., `"(bug #823)"`). --- #### M2. No Parallel Spawn Mode Test Coverage **Category:** Coverage Gap **Files:** `features/tdd_subplan_spawn_orchestration.feature`, `robot/helper_tdd_subplan_spawn_orchestration.py` All 3 Behave scenarios and all 3 Robot tests exercise only `DecisionType.SUBPLAN_SPAWN` with `ExecutionMode.SEQUENTIAL`. The specification (`docs/specification.md` lines 18294-18296, 18403-18407) defines `SUBPLAN_PARALLEL_SPAWN` as a first-class spawn mode with distinct semantics (concurrent execution, independent failure handling). While the core bug (no child `Plan` objects are created) is mode-independent, adding a single scenario with `ExecutionMode.PARALLEL` and `DecisionType.SUBPLAN_PARALLEL_SPAWN` would: - Verify the bug exists for both modes - Guard against a future fix that only addresses sequential spawning - Better align with the spec's equal treatment of both modes **Suggestion:** Add a fourth Behave scenario (e.g., "Parallel spawn result contains child Plan domain objects") and corresponding Robot test case using `PARALLEL` mode. --- ### LOW Severity #### L1. Redundant `SubplanConfig` Construction **Category:** Test Design / DRY **File:** `features/steps/tdd_subplan_spawn_orchestration_steps.py:103` The `step_parent_plan` step creates `context.subplan_config = SubplanConfig(execution_mode=ExecutionMode.SEQUENTIAL)` as a separate object, but `_make_parent_plan()` already sets an identical `subplan_config` on the parent plan (line 64-66). The `spawn()` API takes `config` as a separate parameter, so both exist for a reason — but they're always identical in these tests. **Suggestion:** Use `context.subplan_config = context.parent_plan.subplan_config` to make the relationship explicit and avoid divergence. --- #### L2. Redundant Mock Configuration **Category:** Test Design **Files:** `features/steps/tdd_subplan_spawn_orchestration_steps.py:48`, `robot/helper_tdd_subplan_spawn_orchestration.py:61` `_mock_decision_service()` explicitly sets `svc.list_by_type = MagicMock(return_value=[])`, but: 1. `MagicMock()` auto-stubs all attribute access, so this explicit setup is unnecessary. 2. `spawn()` does not call `list_by_type` — that method is only used in `get_spawn_decisions()` (line 359), a separate code path. The explicit stub creates a misleading impression that `spawn()` depends on `list_by_type`. **Suggestion:** Simplify to `return MagicMock()` with a comment noting that `spawn()` only needs a non-None `decision_service` for constructor validation. --- #### L3. Inconsistent ULID Constant Naming **Category:** Style / Maintainability **Files:** `features/steps/tdd_subplan_spawn_orchestration_steps.py:41-42` vs `robot/helper_tdd_subplan_spawn_orchestration.py:48-49` | Behave | Robot | Value | |--------|-------|-------| | `_DECISION_ID_1` | `_DEC_ID1` | `01HGZ6FE0AQDYTR4BXVQZ6DA00` | | `_DECISION_ID_2` | `_DEC_ID2` | `01HGZ6FE0AQDYTR4BXVQZ6DB00` | Same values, different naming conventions. Reduces cross-referencing readability when comparing Behave and Robot test logic. **Suggestion:** Align on one convention. `_DECISION_ID_1` / `_DECISION_ID_2` is more descriptive. --- #### L4. Type Annotation Mismatch on `getattr()` Return **Category:** Correctness / Type Safety **File:** `features/steps/tdd_subplan_spawn_orchestration_steps.py:142` ```python child_plans: list[Plan] | None = getattr(result, "child_plans", None) ``` `getattr()` returns `object`. The `list[Plan] | None` annotation is technically incorrect and would cause a type checker warning. The Robot helper correctly uses `child_plans: object = getattr(...)` (line 136). This inconsistency appears in 5 step functions (lines 142, 163, 186, 204, 243). **Suggestion:** Use `child_plans: object = getattr(...)` consistently in both files, or cast explicitly after the `isinstance` check. --- #### L5. Feature File Could Use `Background` Section **Category:** Style / DRY **File:** `features/tdd_subplan_spawn_orchestration.feature` All 3 scenarios repeat identical `Given`/`And` steps: ```gherkin Given a parent plan configured for subplan spawning And valid spawn entries for the parent plan ``` A `Background` section would reduce this duplication: ```gherkin Background: Given a parent plan configured for subplan spawning And valid spawn entries for the parent plan ``` Each scenario would then start directly with `When I spawn subplans via SubplanService`. --- #### L6. `@mock_only` Tag Not in Acceptance Criteria **Category:** Spec Compliance **File:** `features/tdd_subplan_spawn_orchestration.feature:1` The feature-level tags include `@mock_only` which is not listed in issue #838's acceptance criteria (`@tdd_expected_fail @tdd_bug @tdd_bug_823`). The Robot tests do not have an equivalent tag. While `@mock_only` is a reasonable addition (these tests use mocks, not real infrastructure), the discrepancy should be acknowledged. --- ### Verification Notes The following aspects were verified and found to be **correct**: - **Spec alignment:** Parent plan in `PlanPhase.EXECUTE` is correct (spec: spawning happens during Execute phase). Child plans expected at `PlanPhase.STRATEGIZE` and `ProcessingState.QUEUED` is correct (spec: plans enter Strategize after creation). - **ULID validity:** All 4 ULID constants pass the `ULID_PATTERN` regex (`^[0-9A-HJKMNP-TV-Z]{26}$`). - **Acceptance criteria:** All 6 checkboxes from issue #838 are satisfied. - **Tag structure:** `@tdd_expected_fail @tdd_bug @tdd_bug_823` present on all scenarios/test cases. Tags reference the underlying bug (#823), not the TDD issue (#838), which is correct per convention. - **Test pattern compliance:** Robot test structure matches the canonical TDD template used across all 7 existing `tdd_*.robot` files. - **`spawn()` call path:** Fixtures pass all validation in `spawn()` and `validate_spawn()` — the test correctly reaches the bug assertion point without infrastructure errors. - **Security:** No concerns (test-only code, no user input handling, no new dependencies). - **Performance:** No concerns (test-only code, no hot paths).

brent.edwards force-pushed tdd/m6-subplan-spawn-orchestration from f16c5e0567 to b67dc63eda

2026-03-16 01:12:57 +00:00

Compare

brent.edwards commented

2026-03-16 01:13:57 +00:00

Branch cleaned — `b67dc63e`

Rebased on latest master, removing 3 merge commits. No code changes needed — all review findings were already addressed or deferred as follow-up.

Branch: 1 commit, 0 merge commits. Lint + typecheck pass.

## Branch cleaned — `b67dc63e` Rebased on latest master, removing 3 merge commits. No code changes needed — all review findings were already addressed or deferred as follow-up. Branch: 1 commit, 0 merge commits. Lint + typecheck pass.

brent.edwards merged commit 2d4b12df6a into master

2026-03-16 01:34:55 +00:00

brent.edwards deleted branch tdd/m6-subplan-spawn-orchestration

2026-03-16 01:34:55 +00:00

brent.edwards referenced this issue from a commit

2026-03-16 01:34:56 +00:00

Merge pull request 'test(plan): TDD failing tests for subplan spawn orchestration (bug #823)' (#930) from tdd/m6-subplan-spawn-orchestration into master

freemo referenced this pull request

2026-03-17 18:16:43 +00:00

bug(plan): subplan spawn creates metadata but does not orchestrate real child plan execution #823

Sign in to join this conversation.

No Reviewers

brent.edwards

freemo

CoreRasurae

3 Participants

Notifications

Due Date

No due date set.

Blocks

#838 TDD: subplan spawn creates metadata but does not orchestrate real child plan execution (bug #823)

cleveragents/cleveragents-core

Reference: cleveragents/cleveragents-core#930

test(plan): TDD failing tests for subplan spawn orchestration (bug #823) #930

Summary

Tests Added

Quality Gates

Self-Review — PR #930

Findings

Correctness Verification

Verdict

PM Review — Day 34: TDD PR for Bug #823 (Subplan Spawn Orchestration)

Checklist

Blocking

Recommendation

Code Review Report — PR #930: TDD failing tests for subplan spawn orchestration (bug #823)

Summary

HIGH Severity

H1 — Parent plan created in STRATEGIZE phase; specification requires EXECUTE for spawn

H2 — PR body documents incorrect helper subcommand names

MEDIUM Severity

M1 — Only 3 of 8 acceptance criteria from #823 have corresponding TDD tests

M2 — DRY violation: identical setup code duplicated between Behave steps and Robot helper

M3 — Missing assertion on child.identity.root_plan_id

LOW Severity

L1 — Inconsistent type annotation for child_plans variable across Behave steps

L2 — Redundant SubplanConfig instances on Plan and Context

L3 — Robot tests log stderr but don't assert it is empty

No Issues Found In

Review Fixes Applied — Addressing @CoreRasurae REQUEST_CHANGES and @freemo feedback

Fixes applied:

Not addressed (tracked as follow-up):

Code Review Report — PR #930 (tdd/m6-subplan-spawn-orchestration)

Overall Assessment

Findings Summary

MEDIUM Severity

M1. Robot TDD Listener Blind Inversion Risk

M2. No Parallel Spawn Mode Test Coverage

LOW Severity

L1. Redundant SubplanConfig Construction

L2. Redundant Mock Configuration

L3. Inconsistent ULID Constant Naming

L4. Type Annotation Mismatch on getattr() Return

L5. Feature File Could Use Background Section

L6. @mock_only Tag Not in Acceptance Criteria

Verification Notes

Branch cleaned — b67dc63e

M3 — Missing assertion on `child.identity.root_plan_id`

L1 — Inconsistent type annotation for `child_plans` variable across Behave steps

Code Review Report — PR #930 (`tdd/m6-subplan-spawn-orchestration`)

L1. Redundant `SubplanConfig` Construction

L4. Type Annotation Mismatch on `getattr()` Return

L5. Feature File Could Use `Background` Section

L6. `@mock_only` Tag Not in Acceptance Criteria

Branch cleaned — `b67dc63e`