test(plan): TDD failing tests for subplan spawn orchestration (bug #823) #930

Merged
brent.edwards merged 1 commit from tdd/m6-subplan-spawn-orchestration into master 2026-03-16 01:34:55 +00:00
Member

Summary

TDD expected-fail tests proving bug #823 exists: SubplanService.spawn() only creates metadata (SubplanStatus records and SpawnMetadata) without creating real child Plan domain objects, triggering lifecycle progression, or attaching status to the parent plan.

Tests Added

Behave scenarios (features/tdd_subplan_spawn_orchestration.feature):

  • Spawn result should contain child Plan domain objects (not just metadata)
  • Child plans should enter strategize phase after spawn
  • Parent plan should track child plan lifecycle status

Tags: @tdd_expected_fail @tdd_bug @tdd_bug_823 @mock_only

Robot Framework tests (robot/tdd_subplan_spawn_orchestration.robot):

  • spawn-child-plans — verifies SpawnResult contains real Plan objects
  • spawn-lifecycle — verifies child plan lifecycle progression
  • spawn-parent-tracking — verifies parent tracks child status

Quality Gates

Session Result
nox -s lint PASS
nox -s typecheck PASS (0 errors)
nox -s unit_tests PASS
nox -s coverage_report 98% (>= 97%)

Closes #838

## Summary TDD expected-fail tests proving bug #823 exists: `SubplanService.spawn()` only creates metadata (`SubplanStatus` records and `SpawnMetadata`) without creating real child `Plan` domain objects, triggering lifecycle progression, or attaching status to the parent plan. ### Tests Added **Behave scenarios** (`features/tdd_subplan_spawn_orchestration.feature`): - Spawn result should contain child Plan domain objects (not just metadata) - Child plans should enter strategize phase after spawn - Parent plan should track child plan lifecycle status Tags: `@tdd_expected_fail @tdd_bug @tdd_bug_823 @mock_only` **Robot Framework tests** (`robot/tdd_subplan_spawn_orchestration.robot`): - `spawn-child-plans` — verifies SpawnResult contains real Plan objects - `spawn-lifecycle` — verifies child plan lifecycle progression - `spawn-parent-tracking` — verifies parent tracks child status ### Quality Gates | Session | Result | |---|---| | `nox -s lint` | PASS | | `nox -s typecheck` | PASS (0 errors) | | `nox -s unit_tests` | PASS | | `nox -s coverage_report` | 98% (>= 97%) | Closes #838
brent.edwards added this to the v3.5.0 milestone 2026-03-14 00:40:29 +00:00
brent.edwards force-pushed tdd/m6-subplan-spawn-orchestration from fdcb5ff51c
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 17s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 19s
CI / e2e_tests (pull_request) Successful in 24s
CI / security (pull_request) Successful in 35s
CI / typecheck (pull_request) Successful in 39s
CI / unit_tests (pull_request) Successful in 2m4s
CI / docker (pull_request) Successful in 50s
CI / integration_tests (pull_request) Successful in 3m28s
CI / coverage (pull_request) Successful in 5m0s
CI / benchmark-regression (pull_request) Successful in 35m27s
to 851f1bd18c
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 17s
CI / lint (pull_request) Successful in 19s
CI / quality (pull_request) Successful in 23s
CI / e2e_tests (pull_request) Successful in 23s
CI / security (pull_request) Successful in 36s
CI / typecheck (pull_request) Successful in 39s
CI / unit_tests (pull_request) Successful in 2m16s
CI / docker (pull_request) Successful in 9s
CI / integration_tests (pull_request) Successful in 2m37s
CI / coverage (pull_request) Successful in 6m39s
CI / benchmark-regression (pull_request) Successful in 34m31s
2026-03-14 03:46:39 +00:00
Compare
Owner

PM Status Update — Day 34

This is the TDD counterpart PR for Critical bug #823 (subplan spawn no orchestration). Bug #823 is Priority/Critical and blocks M6.

The PR looks well-scoped: 3 Behave scenarios + 3 Robot tests, all tagged @tdd_expected_fail @tdd_bug_823. CI reports passing per Brent's PR description.

Action items:

  1. @CoreRasurae — Please perform the first peer review of this TDD PR. Focus on: correct tag usage (@tdd_bug, @tdd_bug_823, @tdd_expected_fail), whether the tests actually capture the bug behavior described in #823, and whether the tests will pass normally once the bug is fixed.
  2. After approval → merge, which unblocks the bug fix PR for #823.

Priority: Critical — this TDD PR unblocks a Critical-priority bug fix. Fast-track review requested.

**PM Status Update — Day 34** This is the TDD counterpart PR for Critical bug #823 (subplan spawn no orchestration). Bug #823 is Priority/Critical and blocks M6. The PR looks well-scoped: 3 Behave scenarios + 3 Robot tests, all tagged `@tdd_expected_fail @tdd_bug_823`. CI reports passing per Brent's PR description. **Action items:** 1. **@CoreRasurae** — Please perform the first peer review of this TDD PR. Focus on: correct tag usage (`@tdd_bug`, `@tdd_bug_823`, `@tdd_expected_fail`), whether the tests actually capture the bug behavior described in #823, and whether the tests will pass normally once the bug is fixed. 2. After approval → merge, which unblocks the bug fix PR for #823. **Priority:** Critical — this TDD PR unblocks a Critical-priority bug fix. Fast-track review requested.
brent.edwards left a comment

Self-Review — PR #930

Reviewer: @brent.edwards (author self-review)
Review method: 4 parallel threads + 2 fresh-eyes passes

Findings

No issues found. All 4 files are clean:

  • Feature file: Tags correct (@tdd_expected_fail @tdd_bug @tdd_bug_823 @mock_only), @mock_only appropriate (all in-memory operations with MagicMock)
  • Step definitions: All imports valid, all types annotated, no # type: ignore, no bare except, no leaked business logic
  • Robot file: Standard pattern, tags and timeouts correct
  • Helper: _fail() -> NoReturn correct, _COMMANDS dict correct, all typed

Correctness Verification

Tests correctly prove bug #823: SubplanService.spawn() creates SubplanStatus metadata but never creates child Plan objects, never sets child_plans on SpawnResult (field doesn't exist), and never attaches statuses to parent plan. Tests use getattr(result, "child_plans", None) to safely test for the missing attribute. Tests would correctly fail if someone added a child_plans field and actual Plan creation.

Verdict

Clean — no findings. Ready for external review.

# Self-Review — PR #930 **Reviewer:** @brent.edwards (author self-review) **Review method:** 4 parallel threads + 2 fresh-eyes passes ## Findings **No issues found.** All 4 files are clean: - Feature file: Tags correct (`@tdd_expected_fail @tdd_bug @tdd_bug_823 @mock_only`), `@mock_only` appropriate (all in-memory operations with MagicMock) - Step definitions: All imports valid, all types annotated, no `# type: ignore`, no bare `except`, no leaked business logic - Robot file: Standard pattern, tags and timeouts correct - Helper: `_fail() -> NoReturn` correct, `_COMMANDS` dict correct, all typed ## Correctness Verification Tests **correctly prove bug #823**: `SubplanService.spawn()` creates `SubplanStatus` metadata but never creates child `Plan` objects, never sets `child_plans` on `SpawnResult` (field doesn't exist), and never attaches statuses to parent plan. Tests use `getattr(result, "child_plans", None)` to safely test for the missing attribute. Tests **would correctly fail** if someone added a `child_plans` field and actual Plan creation. ## Verdict **Clean — no findings.** Ready for external review.
freemo left a comment

PM Review — Day 34: TDD PR for Bug #823 (Subplan Spawn Orchestration)

@brent.edwards — Good work on this TDD PR. Quick assessment:

Checklist

  • Follows CONTRIBUTING.md TDD Bug Fix Workflow
  • Tests tagged @tdd_expected_fail @tdd_bug @tdd_bug_823
  • Behave BDD scenarios present (3 scenarios)
  • Robot Framework integration tests present (3 tests)
  • Quality gates pass (lint, typecheck, unit_tests, integration_tests, coverage 98%)
  • Closes TDD issue #838
  • Mergeable (no conflicts)
  • Missing: MoSCoW label (needs MoSCoW/Must have)
  • Missing: Points label (needs Points/2 or appropriate estimate)
  • Missing: Assignee (should be @brent.edwards)
  • Note: Milestone is v3.5.0 (M6) but bug #823 is in M6. Milestone alignment OK.

Blocking

This TDD PR is on the critical path — once merged, bug #823 fix work can begin (assigned to @freemo).

Recommendation

Needs peer review from a second developer. Requesting @CoreRasurae or @hurui200320.


PM review — Day 34

## PM Review — Day 34: TDD PR for Bug #823 (Subplan Spawn Orchestration) @brent.edwards — Good work on this TDD PR. Quick assessment: ### Checklist - [x] Follows CONTRIBUTING.md TDD Bug Fix Workflow - [x] Tests tagged `@tdd_expected_fail @tdd_bug @tdd_bug_823` - [x] Behave BDD scenarios present (3 scenarios) - [x] Robot Framework integration tests present (3 tests) - [x] Quality gates pass (lint, typecheck, unit_tests, integration_tests, coverage 98%) - [x] Closes TDD issue #838 - [x] Mergeable (no conflicts) - [ ] **Missing**: MoSCoW label (needs `MoSCoW/Must have`) - [ ] **Missing**: Points label (needs `Points/2` or appropriate estimate) - [ ] **Missing**: Assignee (should be @brent.edwards) - [ ] **Note**: Milestone is v3.5.0 (M6) but bug #823 is in M6. Milestone alignment OK. ### Blocking This TDD PR is on the critical path — once merged, bug #823 fix work can begin (assigned to @freemo). ### Recommendation Needs peer review from a second developer. Requesting @CoreRasurae or @hurui200320. --- *PM review — Day 34*
CoreRasurae left a comment

Code Review Report — PR #930: TDD failing tests for subplan spawn orchestration (bug #823)

Reviewer: Automated code review (test coverage, test flaws, performance, bug detection, security)
Scope: All code changes in branch tdd/m6-subplan-spawn-orchestration plus close connections to surrounding code (SubplanService, Plan, Decision, SubplanConfig, SpawnResult, Robot common.resource, TDD expected-fail infrastructure)
Methodology: 3 full review cycles across all categories (bugs, test flaws, performance, security, code quality) until convergence


Summary

The PR adds 4 new files (574 lines) implementing TDD expected-fail tests for bug #823 across both Behave (feature + steps) and Robot Framework (robot + helper). The tests correctly prove that SubplanService.spawn() only creates metadata (SubplanStatus + SpawnMetadata) without creating real child Plan domain objects. The TDD tagging convention (@tdd_expected_fail + @tdd_bug + @tdd_bug_823) is correctly applied and follows the project's established pattern.

8 findings identified across 3 review cycles (0 critical, 2 high, 3 medium, 3 low). No performance or security issues.


HIGH Severity

H1 — Parent plan created in STRATEGIZE phase; specification requires EXECUTE for spawn

Files: features/steps/tdd_subplan_spawn_orchestration_steps.py:60, robot/helper_tdd_subplan_spawn_orchestration.py:84

Both _make_parent_plan() functions create the parent plan with phase=PlanPhase.STRATEGIZE. Per the specification (§Plan Hierarchy and Parallelism):

"Decisions about child plans are made during Strategize (as subplan_spawn decision types). Child plans are actually spawned during Execute (based on those decisions)."

The current SubplanService.spawn() does not validate the parent plan's phase, so the test executes without error today. However, when the bug fix is implemented, it is reasonable (and spec-aligned) to add phase validation to spawn(). If that happens, these TDD tests would fail for the wrong reason (phase mismatch instead of missing child_plans), producing a false negative that obscures the original bug signal.

Recommendation: Change both _make_parent_plan() functions to use phase=PlanPhase.EXECUTE, processing_state=ProcessingState.PROCESSING to accurately represent the lifecycle stage at which spawning occurs.


H2 — PR body documents incorrect helper subcommand names

Location: PR #930 body ("Changes" section)

The PR body states the Robot helper has subcommands spawn-returns-plans, child-enters-strategize, and parent-tracks-status. The actual code uses different names:

PR Body (incorrect) Actual Code
spawn-returns-plans spawn-child-plans
child-enters-strategize spawn-lifecycle
parent-tracks-status spawn-parent-tracking

References: helper_tdd_subplan_spawn_orchestration.py:233-236, tdd_subplan_spawn_orchestration.robot:20,29,38

Recommendation: Update the PR body to reflect the actual subcommand names.


MEDIUM Severity

M1 — Only 3 of 8 acceptance criteria from #823 have corresponding TDD tests

Location: All 4 files (overall scope)

Issue #823 defines 8 acceptance criteria. The TDD tests cover 3:

AC Covered Test
spawn() creates real child Plan Yes Scenario 1
Child plan enters strategize phase Yes Scenario 2
Child plan full lifecycle (strategize -> execute -> validate -> complete) No
Parent tracks child plan status Yes Scenario 3
Child results merge into parent decision tree No
Child failure triggers configurable behavior No
Parallel subplan execution works No
plan subplan list shows real status progression No

While TDD tests focus on proving the bug exists, the uncovered criteria (particularly parallel execution and failure handling) represent significant specification requirements that would benefit from TDD coverage to guide the fix implementation.

Recommendation: Consider adding at least one scenario for parallel execution (ExecutionMode.PARALLEL) and one for the failure-handling contract, even if minimal.


M2 — DRY violation: identical setup code duplicated between Behave steps and Robot helper

Files: tdd_subplan_spawn_orchestration_steps.py:39-98 vs helper_tdd_subplan_spawn_orchestration.py:52-118

Approximately 54 lines of code are duplicated verbatim across both files:

  • ULID constants (_PARENT_PLAN_ID, _ROOT_PLAN_ID, _DECISION_ID_1/_DEC_ID1, _DECISION_ID_2/_DEC_ID2)
  • _mock_decision_service() function
  • _make_parent_plan() function
  • _make_spawn_entries() function (with identical Decision and SpawnEntry construction)

Any change to the test fixtures (e.g., adding fields when the bug is fixed) must be synchronized in both files. The ULID constant names even differ slightly (_DECISION_ID_1 vs _DEC_ID1), increasing the maintenance risk.

Recommendation: Extract the shared fixtures into a common module (e.g., robot/fixtures_tdd_subplan_spawn.py or a shared test utilities file) and import from both the Behave steps and the Robot helper.


M3 — Missing assertion on child.identity.root_plan_id

Files: tdd_subplan_spawn_orchestration_steps.py:143-157, helper_tdd_subplan_spawn_orchestration.py:153-170

The tests verify child.identity.parent_plan_id == _PARENT_PLAN_ID but do not verify child.identity.root_plan_id. Per PlanIdentity in plan.py:274 and the specification, every child plan must have root_plan_id set to the top-most plan in the hierarchy. This is a required field for hierarchy traversal and is used by SubplanService.spawn() when building SpawnMetadata (line 224: root_id = parent_plan.identity.root_plan_id or parent_id).

Recommendation: Add an assertion verifying child.identity.root_plan_id == _ROOT_PLAN_ID alongside the existing parent_plan_id check.


LOW Severity

L1 — Inconsistent type annotation for child_plans variable across Behave steps

File: tdd_subplan_spawn_orchestration_steps.py

Step function Line Type annotation
step_result_has_child_plans 136 child_plans: object
step_child_plans_have_parent_id 144 child_plans: list[Plan] | None
step_child_plans_strategize 162 child_plans: list[Plan] | None
step_child_plans_queued 178 child_plans: list[Plan] | None

The first step uses object while the remaining three use list[Plan] | None. Since getattr() returns Any, both are technically valid at runtime, but consistency improves readability.

Recommendation: Use list[Plan] | None in all four steps for consistency.


L2 — Redundant SubplanConfig instances on Plan and Context

File: tdd_subplan_spawn_orchestration_steps.py:60-68, 94-96

The parent plan is constructed with subplan_config=SubplanConfig(execution_mode=ExecutionMode.SEQUENTIAL) (line 66), and a separate identical SubplanConfig is stored on context.subplan_config (line 94). Both carry the same values. SubplanService.spawn() takes config as a separate parameter, so the plan's subplan_config is not used by the test. Having two identical configs could confuse future maintainers about which one is authoritative.

Recommendation: Either remove subplan_config from the Plan constructor (if the test doesn't need it on the plan) or derive context.subplan_config from context.parent_plan.subplan_config to avoid divergence.


L3 — Robot tests log stderr but don't assert it is empty

File: tdd_subplan_spawn_orchestration.robot:22-23, 31-32, 40-41

All three test cases log ${result.stderr} for debugging but only assert on rc == 0 and stdout content. If the helper script emits unexpected warnings to stderr while still exiting successfully, they would be silently ignored.

Recommendation: Add Should Be Empty ${result.stderr} after the rc assertion (or at minimum Log ${result.stderr} WARN at warn level to surface unexpected output).


No Issues Found In

  • Performance: All tests are in-memory with mock services; 30s timeouts are appropriate.
  • Security: No file system writes, no network access, no user input handling in test code.
  • Tag compliance: @tdd_expected_fail @tdd_bug @tdd_bug_823 follows the three-tag convention validated by environment.py's validate_tdd_tags().
  • Robot-Behave consistency: Sentinel strings, assertions, and test logic are aligned between the two frameworks.
  • ULID validity: All 4 hardcoded ULIDs are valid 26-character Crockford Base32 strings matching ULID_PATTERN.
  • Pydantic model construction: All Plan, Decision, SubplanConfig, and ContextSnapshot instantiations provide required fields and match current model schemas.
## Code Review Report — PR #930: TDD failing tests for subplan spawn orchestration (bug #823) **Reviewer**: Automated code review (test coverage, test flaws, performance, bug detection, security) **Scope**: All code changes in branch `tdd/m6-subplan-spawn-orchestration` plus close connections to surrounding code (`SubplanService`, `Plan`, `Decision`, `SubplanConfig`, `SpawnResult`, Robot `common.resource`, TDD expected-fail infrastructure) **Methodology**: 3 full review cycles across all categories (bugs, test flaws, performance, security, code quality) until convergence --- ### Summary The PR adds 4 new files (574 lines) implementing TDD expected-fail tests for bug #823 across both Behave (feature + steps) and Robot Framework (robot + helper). The tests correctly prove that `SubplanService.spawn()` only creates metadata (`SubplanStatus` + `SpawnMetadata`) without creating real child `Plan` domain objects. The TDD tagging convention (`@tdd_expected_fail` + `@tdd_bug` + `@tdd_bug_823`) is correctly applied and follows the project's established pattern. **8 findings** identified across 3 review cycles (0 critical, 2 high, 3 medium, 3 low). No performance or security issues. --- ### HIGH Severity #### H1 — Parent plan created in STRATEGIZE phase; specification requires EXECUTE for spawn **Files**: `features/steps/tdd_subplan_spawn_orchestration_steps.py:60`, `robot/helper_tdd_subplan_spawn_orchestration.py:84` Both `_make_parent_plan()` functions create the parent plan with `phase=PlanPhase.STRATEGIZE`. Per the specification (§Plan Hierarchy and Parallelism): > *"Decisions about child plans are made during **Strategize** (as `subplan_spawn` decision types). Child plans are actually spawned during **Execute** (based on those decisions)."* The current `SubplanService.spawn()` does not validate the parent plan's phase, so the test executes without error today. However, when the bug fix is implemented, it is reasonable (and spec-aligned) to add phase validation to `spawn()`. If that happens, these TDD tests would fail for the **wrong reason** (phase mismatch instead of missing `child_plans`), producing a false negative that obscures the original bug signal. **Recommendation**: Change both `_make_parent_plan()` functions to use `phase=PlanPhase.EXECUTE, processing_state=ProcessingState.PROCESSING` to accurately represent the lifecycle stage at which spawning occurs. --- #### H2 — PR body documents incorrect helper subcommand names **Location**: PR #930 body ("Changes" section) The PR body states the Robot helper has subcommands `spawn-returns-plans`, `child-enters-strategize`, and `parent-tracks-status`. The actual code uses different names: | PR Body (incorrect) | Actual Code | |---|---| | `spawn-returns-plans` | `spawn-child-plans` | | `child-enters-strategize` | `spawn-lifecycle` | | `parent-tracks-status` | `spawn-parent-tracking` | References: `helper_tdd_subplan_spawn_orchestration.py:233-236`, `tdd_subplan_spawn_orchestration.robot:20,29,38` **Recommendation**: Update the PR body to reflect the actual subcommand names. --- ### MEDIUM Severity #### M1 — Only 3 of 8 acceptance criteria from #823 have corresponding TDD tests **Location**: All 4 files (overall scope) Issue #823 defines 8 acceptance criteria. The TDD tests cover 3: | AC | Covered | Test | |---|---|---| | spawn() creates real child Plan | Yes | Scenario 1 | | Child plan enters strategize phase | Yes | Scenario 2 | | Child plan full lifecycle (strategize -> execute -> validate -> complete) | **No** | — | | Parent tracks child plan status | Yes | Scenario 3 | | Child results merge into parent decision tree | **No** | — | | Child failure triggers configurable behavior | **No** | — | | Parallel subplan execution works | **No** | — | | `plan subplan list` shows real status progression | **No** | — | While TDD tests focus on *proving the bug exists*, the uncovered criteria (particularly parallel execution and failure handling) represent significant specification requirements that would benefit from TDD coverage to guide the fix implementation. **Recommendation**: Consider adding at least one scenario for parallel execution (`ExecutionMode.PARALLEL`) and one for the failure-handling contract, even if minimal. --- #### M2 — DRY violation: identical setup code duplicated between Behave steps and Robot helper **Files**: `tdd_subplan_spawn_orchestration_steps.py:39-98` vs `helper_tdd_subplan_spawn_orchestration.py:52-118` Approximately 54 lines of code are duplicated verbatim across both files: - ULID constants (`_PARENT_PLAN_ID`, `_ROOT_PLAN_ID`, `_DECISION_ID_1`/`_DEC_ID1`, `_DECISION_ID_2`/`_DEC_ID2`) - `_mock_decision_service()` function - `_make_parent_plan()` function - `_make_spawn_entries()` function (with identical `Decision` and `SpawnEntry` construction) Any change to the test fixtures (e.g., adding fields when the bug is fixed) must be synchronized in both files. The ULID constant names even differ slightly (`_DECISION_ID_1` vs `_DEC_ID1`), increasing the maintenance risk. **Recommendation**: Extract the shared fixtures into a common module (e.g., `robot/fixtures_tdd_subplan_spawn.py` or a shared test utilities file) and import from both the Behave steps and the Robot helper. --- #### M3 — Missing assertion on `child.identity.root_plan_id` **Files**: `tdd_subplan_spawn_orchestration_steps.py:143-157`, `helper_tdd_subplan_spawn_orchestration.py:153-170` The tests verify `child.identity.parent_plan_id == _PARENT_PLAN_ID` but do not verify `child.identity.root_plan_id`. Per `PlanIdentity` in `plan.py:274` and the specification, every child plan must have `root_plan_id` set to the top-most plan in the hierarchy. This is a required field for hierarchy traversal and is used by `SubplanService.spawn()` when building `SpawnMetadata` (line 224: `root_id = parent_plan.identity.root_plan_id or parent_id`). **Recommendation**: Add an assertion verifying `child.identity.root_plan_id == _ROOT_PLAN_ID` alongside the existing `parent_plan_id` check. --- ### LOW Severity #### L1 — Inconsistent type annotation for `child_plans` variable across Behave steps **File**: `tdd_subplan_spawn_orchestration_steps.py` | Step function | Line | Type annotation | |---|---|---| | `step_result_has_child_plans` | 136 | `child_plans: object` | | `step_child_plans_have_parent_id` | 144 | `child_plans: list[Plan] \| None` | | `step_child_plans_strategize` | 162 | `child_plans: list[Plan] \| None` | | `step_child_plans_queued` | 178 | `child_plans: list[Plan] \| None` | The first step uses `object` while the remaining three use `list[Plan] | None`. Since `getattr()` returns `Any`, both are technically valid at runtime, but consistency improves readability. **Recommendation**: Use `list[Plan] | None` in all four steps for consistency. --- #### L2 — Redundant SubplanConfig instances on Plan and Context **File**: `tdd_subplan_spawn_orchestration_steps.py:60-68, 94-96` The parent plan is constructed with `subplan_config=SubplanConfig(execution_mode=ExecutionMode.SEQUENTIAL)` (line 66), and a separate identical `SubplanConfig` is stored on `context.subplan_config` (line 94). Both carry the same values. `SubplanService.spawn()` takes `config` as a separate parameter, so the plan's `subplan_config` is not used by the test. Having two identical configs could confuse future maintainers about which one is authoritative. **Recommendation**: Either remove `subplan_config` from the `Plan` constructor (if the test doesn't need it on the plan) or derive `context.subplan_config` from `context.parent_plan.subplan_config` to avoid divergence. --- #### L3 — Robot tests log stderr but don't assert it is empty **File**: `tdd_subplan_spawn_orchestration.robot:22-23, 31-32, 40-41` All three test cases log `${result.stderr}` for debugging but only assert on `rc == 0` and `stdout` content. If the helper script emits unexpected warnings to stderr while still exiting successfully, they would be silently ignored. **Recommendation**: Add `Should Be Empty ${result.stderr}` after the rc assertion (or at minimum `Log ${result.stderr} WARN` at warn level to surface unexpected output). --- ### No Issues Found In - **Performance**: All tests are in-memory with mock services; 30s timeouts are appropriate. - **Security**: No file system writes, no network access, no user input handling in test code. - **Tag compliance**: `@tdd_expected_fail @tdd_bug @tdd_bug_823` follows the three-tag convention validated by `environment.py`'s `validate_tdd_tags()`. - **Robot-Behave consistency**: Sentinel strings, assertions, and test logic are aligned between the two frameworks. - **ULID validity**: All 4 hardcoded ULIDs are valid 26-character Crockford Base32 strings matching `ULID_PATTERN`. - **Pydantic model construction**: All `Plan`, `Decision`, `SubplanConfig`, and `ContextSnapshot` instantiations provide required fields and match current model schemas.
@ -0,0 +57,4 @@
root_plan_id=_ROOT_PLAN_ID,
),
namespaced_name=NamespacedName(namespace="local", name="parent-plan"),
description="Parent plan for subplan spawn orchestration test",
Member

H1: Parent plan phase should be PlanPhase.EXECUTE (not STRATEGIZE). Per spec, spawn decisions are recorded during Strategize, but actual spawning occurs during Execute. If the bug fix adds phase validation to spawn(), this test would fail for the wrong reason.

phase=PlanPhase.EXECUTE,
processing_state=ProcessingState.PROCESSING,
**H1**: Parent plan phase should be `PlanPhase.EXECUTE` (not `STRATEGIZE`). Per spec, spawn decisions are *recorded* during Strategize, but actual spawning occurs during Execute. If the bug fix adds phase validation to `spawn()`, this test would fail for the wrong reason. ```python phase=PlanPhase.EXECUTE, processing_state=ProcessingState.PROCESSING, ```
@ -0,0 +133,4 @@
Bug #823: SpawnResult only contains SubplanStatus metadata, not
real Plan domain objects. This assertion will fail until the bug
is fixed.
Member

L1: Type annotation inconsistency — this step uses object while step_child_plans_have_parent_id, step_child_plans_strategize, and step_child_plans_queued use list[Plan] | None. Consider using list[Plan] | None here too for consistency.

**L1**: Type annotation inconsistency — this step uses `object` while `step_child_plans_have_parent_id`, `step_child_plans_strategize`, and `step_child_plans_queued` use `list[Plan] | None`. Consider using `list[Plan] | None` here too for consistency.
@ -0,0 +152,4 @@
f"Expected {len(context.spawn_entries)} child plans, got {len(child_plans)}"
)
Member

M3: Missing assertion on child.identity.root_plan_id == _ROOT_PLAN_ID. Per PlanIdentity and the spec, every child plan must have root_plan_id set to the top-most plan in the hierarchy.

**M3**: Missing assertion on `child.identity.root_plan_id == _ROOT_PLAN_ID`. Per `PlanIdentity` and the spec, every child plan must have `root_plan_id` set to the top-most plan in the hierarchy.
@ -0,0 +81,4 @@
def _make_spawn_entries() -> list[SpawnEntry]:
"""Create spawn entries for testing."""
Member

H1: Same phase issue as the Behave side — should be phase=PlanPhase.EXECUTE to match the spec-defined lifecycle stage where spawning occurs.

**H1**: Same phase issue as the Behave side — should be `phase=PlanPhase.EXECUTE` to match the spec-defined lifecycle stage where spawning occurs.
@ -0,0 +165,4 @@
svc: SubplanService = SubplanService(
decision_service=_mock_decision_service(),
)
parent: Plan = _make_parent_plan()
Member

M3: Same as Behave side — add a root_plan_id check:

if child.identity.root_plan_id != _ROOT_PLAN_ID:
    _fail(f"Child plan root_plan_id={child.identity.root_plan_id!r}, expected {_ROOT_PLAN_ID!r}")
**M3**: Same as Behave side — add a `root_plan_id` check: ```python if child.identity.root_plan_id != _ROOT_PLAN_ID: _fail(f"Child plan root_plan_id={child.identity.root_plan_id!r}, expected {_ROOT_PLAN_ID!r}") ```
@ -0,0 +20,4 @@
${result}= Run Process ${PYTHON} ${HELPER} spawn-child-plans cwd=${WORKSPACE} timeout=30s
Log ${result.stdout}
Log ${result.stderr}
Should Be Equal As Integers ${result.rc} 0
Member

L3: Consider adding Should Be Empty ${result.stderr} to catch unexpected warnings that might indicate issues even when rc==0.

**L3**: Consider adding `Should Be Empty ${result.stderr}` to catch unexpected warnings that might indicate issues even when rc==0.
brent.edwards force-pushed tdd/m6-subplan-spawn-orchestration from de8edfd7dd
Some checks failed
CI / lint (pull_request) Successful in 24s
CI / typecheck (pull_request) Successful in 49s
CI / quality (pull_request) Successful in 26s
CI / security (pull_request) Successful in 53s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 29s
CI / e2e_tests (pull_request) Successful in 50s
CI / unit_tests (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
to b258410439
All checks were successful
CI / lint (pull_request) Successful in 35s
CI / quality (pull_request) Successful in 47s
CI / typecheck (pull_request) Successful in 1m25s
CI / benchmark-publish (pull_request) Has been skipped
CI / security (pull_request) Successful in 1m23s
CI / build (pull_request) Successful in 29s
CI / e2e_tests (pull_request) Successful in 48s
CI / unit_tests (pull_request) Successful in 3m39s
CI / integration_tests (pull_request) Successful in 5m7s
CI / docker (pull_request) Successful in 8s
CI / coverage (pull_request) Successful in 7m11s
CI / benchmark-regression (pull_request) Successful in 43m27s
2026-03-15 19:17:01 +00:00
Compare
Author
Member

Review Fixes Applied — Addressing @CoreRasurae REQUEST_CHANGES and @freemo feedback

Commit: b2584104 → force-pushed

Fixes applied:

Finding Fix
H1 (HIGH) — Parent plan in STRATEGIZE phase Changed both _make_parent_plan() functions to use phase=PlanPhase.EXECUTE (spawn occurs during Execute per spec)
H2 (HIGH) — PR body incorrect subcommand names Updated PR description with correct names: spawn-child-plans, spawn-lifecycle, spawn-parent-tracking
M3 (MEDIUM) — Missing root_plan_id assertion Added root_plan_id == _ROOT_PLAN_ID assertion in both Behave steps and Robot helper
L1 (LOW) — Inconsistent type annotation Changed child_plans: object → `child_plans: list[Plan]
@freemo — Missing labels/assignee Added MoSCoW/Must have, Points/2, assigned to brent.edwards

Not addressed (tracked as follow-up):

Finding Rationale
M1 — Only 3 of 8 ACs covered TDD tests focus on proving the core bug. Parallel execution and failure handling are separate specification behaviors best tested when the fix is implemented.
M2 — DRY violation Consistent with existing pattern (each file defines its own helpers). Will consider shared fixture module as more TDD subplan tests are added.
L2 — Redundant SubplanConfig Plan's subplan_config is a domain requirement; test config is the spawn() parameter. Both are needed.
L3 — Robot tests don't assert stderr empty Stderr may contain debug logs from services; asserting empty would be brittle.

All nox gates pass (lint, typecheck).

## Review Fixes Applied — Addressing @CoreRasurae REQUEST_CHANGES and @freemo feedback **Commit:** `b2584104` → force-pushed ### Fixes applied: | Finding | Fix | |---|---| | **H1** (HIGH) — Parent plan in STRATEGIZE phase | Changed both `_make_parent_plan()` functions to use `phase=PlanPhase.EXECUTE` (spawn occurs during Execute per spec) | | **H2** (HIGH) — PR body incorrect subcommand names | Updated PR description with correct names: `spawn-child-plans`, `spawn-lifecycle`, `spawn-parent-tracking` | | **M3** (MEDIUM) — Missing `root_plan_id` assertion | Added `root_plan_id == _ROOT_PLAN_ID` assertion in both Behave steps and Robot helper | | **L1** (LOW) — Inconsistent type annotation | Changed `child_plans: object` → `child_plans: list[Plan] | None` for consistency | | @freemo — Missing labels/assignee | Added `MoSCoW/Must have`, `Points/2`, assigned to `brent.edwards` | ### Not addressed (tracked as follow-up): | Finding | Rationale | |---|---| | **M1** — Only 3 of 8 ACs covered | TDD tests focus on proving the core bug. Parallel execution and failure handling are separate specification behaviors best tested when the fix is implemented. | | **M2** — DRY violation | Consistent with existing pattern (each file defines its own helpers). Will consider shared fixture module as more TDD subplan tests are added. | | **L2** — Redundant SubplanConfig | Plan's `subplan_config` is a domain requirement; test config is the `spawn()` parameter. Both are needed. | | **L3** — Robot tests don't assert stderr empty | Stderr may contain debug logs from services; asserting empty would be brittle. | All nox gates pass (lint, typecheck).
CoreRasurae left a comment

Code Review Report — PR #930 (tdd/m6-subplan-spawn-orchestration)

Reviewer: Automated deep review (test coverage, test flaws, performance, bug detection, security)
Scope: 4 new files (574 lines) on branch tdd/m6-subplan-spawn-orchestration
Method: 3 iterative global review cycles covering all categories, cross-referenced against SubplanService.spawn() implementation (subplan_service.py:176-268), SpawnResult dataclass (subplan_service.py:100-118), Plan domain model (plan.py:518-1004), specification document (docs/specification.md), and established TDD test patterns across the codebase.


Overall Assessment

This is a solid TDD expected-fail PR that correctly identifies and proves the existence of bug #823. The code follows established patterns, is well-documented, and meets all acceptance criteria from issue #838. No blocking issues were found. The findings below are opportunities for improvement, organized by severity and category.


Findings Summary

Severity Count Categories
Critical 0
Medium 2 Test Design, Coverage
Low 6 Test Design, Correctness, Style, Spec Compliance

MEDIUM Severity

M1. Robot TDD Listener Blind Inversion Risk

Category: Test Design / Test Infrastructure
Files: robot/tdd_subplan_spawn_orchestration.robot (all 3 test cases)
Relates to: robot/tdd_expected_fail_listener.py (pre-existing)

The Robot tdd_expected_fail_listener.py inverts any test failure to PASS without checking the failure type. Unlike the Behave implementation (in features/environment.py) which guards against non-AssertionError exceptions, infrastructure errors, and dry-run mode, the Robot listener performs a simple FAIL→PASS / PASS→FAIL flip.

This means if the Robot helper script crashes due to an unrelated problem (e.g., import error from a refactor, ValueError from spawn() validation changing, or a domain model constructor change), the test would exit with rc=1, Robot would mark it FAIL, and the listener would silently invert it to PASS. The tests would appear green while actually broken.

This is a pre-existing architectural weakness affecting all 4 existing TDD Robot files — not a regression introduced by this PR. However, it directly impacts these tests' reliability.

Suggestion: Consider adding a guard to the Robot helper dispatch — e.g., wrapping each subcommand call in a try/except that catches unexpected exceptions and exits with a distinct code (e.g., rc=2) that the listener does NOT invert. Alternatively, enhance the listener to only invert when stderr contains a known sentinel pattern (e.g., "(bug #823)").


M2. No Parallel Spawn Mode Test Coverage

Category: Coverage Gap
Files: features/tdd_subplan_spawn_orchestration.feature, robot/helper_tdd_subplan_spawn_orchestration.py

All 3 Behave scenarios and all 3 Robot tests exercise only DecisionType.SUBPLAN_SPAWN with ExecutionMode.SEQUENTIAL. The specification (docs/specification.md lines 18294-18296, 18403-18407) defines SUBPLAN_PARALLEL_SPAWN as a first-class spawn mode with distinct semantics (concurrent execution, independent failure handling).

While the core bug (no child Plan objects are created) is mode-independent, adding a single scenario with ExecutionMode.PARALLEL and DecisionType.SUBPLAN_PARALLEL_SPAWN would:

  • Verify the bug exists for both modes
  • Guard against a future fix that only addresses sequential spawning
  • Better align with the spec's equal treatment of both modes

Suggestion: Add a fourth Behave scenario (e.g., "Parallel spawn result contains child Plan domain objects") and corresponding Robot test case using PARALLEL mode.


LOW Severity

L1. Redundant SubplanConfig Construction

Category: Test Design / DRY
File: features/steps/tdd_subplan_spawn_orchestration_steps.py:103

The step_parent_plan step creates context.subplan_config = SubplanConfig(execution_mode=ExecutionMode.SEQUENTIAL) as a separate object, but _make_parent_plan() already sets an identical subplan_config on the parent plan (line 64-66). The spawn() API takes config as a separate parameter, so both exist for a reason — but they're always identical in these tests.

Suggestion: Use context.subplan_config = context.parent_plan.subplan_config to make the relationship explicit and avoid divergence.


L2. Redundant Mock Configuration

Category: Test Design
Files: features/steps/tdd_subplan_spawn_orchestration_steps.py:48, robot/helper_tdd_subplan_spawn_orchestration.py:61

_mock_decision_service() explicitly sets svc.list_by_type = MagicMock(return_value=[]), but:

  1. MagicMock() auto-stubs all attribute access, so this explicit setup is unnecessary.
  2. spawn() does not call list_by_type — that method is only used in get_spawn_decisions() (line 359), a separate code path.

The explicit stub creates a misleading impression that spawn() depends on list_by_type.

Suggestion: Simplify to return MagicMock() with a comment noting that spawn() only needs a non-None decision_service for constructor validation.


L3. Inconsistent ULID Constant Naming

Category: Style / Maintainability
Files: features/steps/tdd_subplan_spawn_orchestration_steps.py:41-42 vs robot/helper_tdd_subplan_spawn_orchestration.py:48-49

Behave Robot Value
_DECISION_ID_1 _DEC_ID1 01HGZ6FE0AQDYTR4BXVQZ6DA00
_DECISION_ID_2 _DEC_ID2 01HGZ6FE0AQDYTR4BXVQZ6DB00

Same values, different naming conventions. Reduces cross-referencing readability when comparing Behave and Robot test logic.

Suggestion: Align on one convention. _DECISION_ID_1 / _DECISION_ID_2 is more descriptive.


L4. Type Annotation Mismatch on getattr() Return

Category: Correctness / Type Safety
File: features/steps/tdd_subplan_spawn_orchestration_steps.py:142

child_plans: list[Plan] | None = getattr(result, "child_plans", None)

getattr() returns object. The list[Plan] | None annotation is technically incorrect and would cause a type checker warning. The Robot helper correctly uses child_plans: object = getattr(...) (line 136).

This inconsistency appears in 5 step functions (lines 142, 163, 186, 204, 243).

Suggestion: Use child_plans: object = getattr(...) consistently in both files, or cast explicitly after the isinstance check.


L5. Feature File Could Use Background Section

Category: Style / DRY
File: features/tdd_subplan_spawn_orchestration.feature

All 3 scenarios repeat identical Given/And steps:

Given a parent plan configured for subplan spawning
And valid spawn entries for the parent plan

A Background section would reduce this duplication:

Background:
  Given a parent plan configured for subplan spawning
  And valid spawn entries for the parent plan

Each scenario would then start directly with When I spawn subplans via SubplanService.


L6. @mock_only Tag Not in Acceptance Criteria

Category: Spec Compliance
File: features/tdd_subplan_spawn_orchestration.feature:1

The feature-level tags include @mock_only which is not listed in issue #838's acceptance criteria (@tdd_expected_fail @tdd_bug @tdd_bug_823). The Robot tests do not have an equivalent tag. While @mock_only is a reasonable addition (these tests use mocks, not real infrastructure), the discrepancy should be acknowledged.


Verification Notes

The following aspects were verified and found to be correct:

  • Spec alignment: Parent plan in PlanPhase.EXECUTE is correct (spec: spawning happens during Execute phase). Child plans expected at PlanPhase.STRATEGIZE and ProcessingState.QUEUED is correct (spec: plans enter Strategize after creation).
  • ULID validity: All 4 ULID constants pass the ULID_PATTERN regex (^[0-9A-HJKMNP-TV-Z]{26}$).
  • Acceptance criteria: All 6 checkboxes from issue #838 are satisfied.
  • Tag structure: @tdd_expected_fail @tdd_bug @tdd_bug_823 present on all scenarios/test cases. Tags reference the underlying bug (#823), not the TDD issue (#838), which is correct per convention.
  • Test pattern compliance: Robot test structure matches the canonical TDD template used across all 7 existing tdd_*.robot files.
  • spawn() call path: Fixtures pass all validation in spawn() and validate_spawn() — the test correctly reaches the bug assertion point without infrastructure errors.
  • Security: No concerns (test-only code, no user input handling, no new dependencies).
  • Performance: No concerns (test-only code, no hot paths).
## Code Review Report — PR #930 (`tdd/m6-subplan-spawn-orchestration`) **Reviewer:** Automated deep review (test coverage, test flaws, performance, bug detection, security) **Scope:** 4 new files (574 lines) on branch `tdd/m6-subplan-spawn-orchestration` **Method:** 3 iterative global review cycles covering all categories, cross-referenced against `SubplanService.spawn()` implementation (`subplan_service.py:176-268`), `SpawnResult` dataclass (`subplan_service.py:100-118`), `Plan` domain model (`plan.py:518-1004`), specification document (`docs/specification.md`), and established TDD test patterns across the codebase. --- ### Overall Assessment This is a **solid TDD expected-fail PR** that correctly identifies and proves the existence of bug #823. The code follows established patterns, is well-documented, and meets all acceptance criteria from issue #838. No blocking issues were found. The findings below are opportunities for improvement, organized by severity and category. --- ### Findings Summary | Severity | Count | Categories | |----------|-------|------------| | Critical | 0 | — | | Medium | 2 | Test Design, Coverage | | Low | 6 | Test Design, Correctness, Style, Spec Compliance | --- ### MEDIUM Severity #### M1. Robot TDD Listener Blind Inversion Risk **Category:** Test Design / Test Infrastructure **Files:** `robot/tdd_subplan_spawn_orchestration.robot` (all 3 test cases) **Relates to:** `robot/tdd_expected_fail_listener.py` (pre-existing) The Robot `tdd_expected_fail_listener.py` inverts **any** test failure to PASS without checking the failure type. Unlike the Behave implementation (in `features/environment.py`) which guards against non-`AssertionError` exceptions, infrastructure errors, and dry-run mode, the Robot listener performs a simple `FAIL→PASS` / `PASS→FAIL` flip. This means if the Robot helper script crashes due to an unrelated problem (e.g., import error from a refactor, `ValueError` from `spawn()` validation changing, or a domain model constructor change), the test would exit with `rc=1`, Robot would mark it FAIL, and the listener would silently invert it to PASS. **The tests would appear green while actually broken.** This is a **pre-existing architectural weakness** affecting all 4 existing TDD Robot files — not a regression introduced by this PR. However, it directly impacts these tests' reliability. **Suggestion:** Consider adding a guard to the Robot helper dispatch — e.g., wrapping each subcommand call in a `try/except` that catches unexpected exceptions and exits with a distinct code (e.g., `rc=2`) that the listener does NOT invert. Alternatively, enhance the listener to only invert when stderr contains a known sentinel pattern (e.g., `"(bug #823)"`). --- #### M2. No Parallel Spawn Mode Test Coverage **Category:** Coverage Gap **Files:** `features/tdd_subplan_spawn_orchestration.feature`, `robot/helper_tdd_subplan_spawn_orchestration.py` All 3 Behave scenarios and all 3 Robot tests exercise only `DecisionType.SUBPLAN_SPAWN` with `ExecutionMode.SEQUENTIAL`. The specification (`docs/specification.md` lines 18294-18296, 18403-18407) defines `SUBPLAN_PARALLEL_SPAWN` as a first-class spawn mode with distinct semantics (concurrent execution, independent failure handling). While the core bug (no child `Plan` objects are created) is mode-independent, adding a single scenario with `ExecutionMode.PARALLEL` and `DecisionType.SUBPLAN_PARALLEL_SPAWN` would: - Verify the bug exists for both modes - Guard against a future fix that only addresses sequential spawning - Better align with the spec's equal treatment of both modes **Suggestion:** Add a fourth Behave scenario (e.g., "Parallel spawn result contains child Plan domain objects") and corresponding Robot test case using `PARALLEL` mode. --- ### LOW Severity #### L1. Redundant `SubplanConfig` Construction **Category:** Test Design / DRY **File:** `features/steps/tdd_subplan_spawn_orchestration_steps.py:103` The `step_parent_plan` step creates `context.subplan_config = SubplanConfig(execution_mode=ExecutionMode.SEQUENTIAL)` as a separate object, but `_make_parent_plan()` already sets an identical `subplan_config` on the parent plan (line 64-66). The `spawn()` API takes `config` as a separate parameter, so both exist for a reason — but they're always identical in these tests. **Suggestion:** Use `context.subplan_config = context.parent_plan.subplan_config` to make the relationship explicit and avoid divergence. --- #### L2. Redundant Mock Configuration **Category:** Test Design **Files:** `features/steps/tdd_subplan_spawn_orchestration_steps.py:48`, `robot/helper_tdd_subplan_spawn_orchestration.py:61` `_mock_decision_service()` explicitly sets `svc.list_by_type = MagicMock(return_value=[])`, but: 1. `MagicMock()` auto-stubs all attribute access, so this explicit setup is unnecessary. 2. `spawn()` does not call `list_by_type` — that method is only used in `get_spawn_decisions()` (line 359), a separate code path. The explicit stub creates a misleading impression that `spawn()` depends on `list_by_type`. **Suggestion:** Simplify to `return MagicMock()` with a comment noting that `spawn()` only needs a non-None `decision_service` for constructor validation. --- #### L3. Inconsistent ULID Constant Naming **Category:** Style / Maintainability **Files:** `features/steps/tdd_subplan_spawn_orchestration_steps.py:41-42` vs `robot/helper_tdd_subplan_spawn_orchestration.py:48-49` | Behave | Robot | Value | |--------|-------|-------| | `_DECISION_ID_1` | `_DEC_ID1` | `01HGZ6FE0AQDYTR4BXVQZ6DA00` | | `_DECISION_ID_2` | `_DEC_ID2` | `01HGZ6FE0AQDYTR4BXVQZ6DB00` | Same values, different naming conventions. Reduces cross-referencing readability when comparing Behave and Robot test logic. **Suggestion:** Align on one convention. `_DECISION_ID_1` / `_DECISION_ID_2` is more descriptive. --- #### L4. Type Annotation Mismatch on `getattr()` Return **Category:** Correctness / Type Safety **File:** `features/steps/tdd_subplan_spawn_orchestration_steps.py:142` ```python child_plans: list[Plan] | None = getattr(result, "child_plans", None) ``` `getattr()` returns `object`. The `list[Plan] | None` annotation is technically incorrect and would cause a type checker warning. The Robot helper correctly uses `child_plans: object = getattr(...)` (line 136). This inconsistency appears in 5 step functions (lines 142, 163, 186, 204, 243). **Suggestion:** Use `child_plans: object = getattr(...)` consistently in both files, or cast explicitly after the `isinstance` check. --- #### L5. Feature File Could Use `Background` Section **Category:** Style / DRY **File:** `features/tdd_subplan_spawn_orchestration.feature` All 3 scenarios repeat identical `Given`/`And` steps: ```gherkin Given a parent plan configured for subplan spawning And valid spawn entries for the parent plan ``` A `Background` section would reduce this duplication: ```gherkin Background: Given a parent plan configured for subplan spawning And valid spawn entries for the parent plan ``` Each scenario would then start directly with `When I spawn subplans via SubplanService`. --- #### L6. `@mock_only` Tag Not in Acceptance Criteria **Category:** Spec Compliance **File:** `features/tdd_subplan_spawn_orchestration.feature:1` The feature-level tags include `@mock_only` which is not listed in issue #838's acceptance criteria (`@tdd_expected_fail @tdd_bug @tdd_bug_823`). The Robot tests do not have an equivalent tag. While `@mock_only` is a reasonable addition (these tests use mocks, not real infrastructure), the discrepancy should be acknowledged. --- ### Verification Notes The following aspects were verified and found to be **correct**: - **Spec alignment:** Parent plan in `PlanPhase.EXECUTE` is correct (spec: spawning happens during Execute phase). Child plans expected at `PlanPhase.STRATEGIZE` and `ProcessingState.QUEUED` is correct (spec: plans enter Strategize after creation). - **ULID validity:** All 4 ULID constants pass the `ULID_PATTERN` regex (`^[0-9A-HJKMNP-TV-Z]{26}$`). - **Acceptance criteria:** All 6 checkboxes from issue #838 are satisfied. - **Tag structure:** `@tdd_expected_fail @tdd_bug @tdd_bug_823` present on all scenarios/test cases. Tags reference the underlying bug (#823), not the TDD issue (#838), which is correct per convention. - **Test pattern compliance:** Robot test structure matches the canonical TDD template used across all 7 existing `tdd_*.robot` files. - **`spawn()` call path:** Fixtures pass all validation in `spawn()` and `validate_spawn()` — the test correctly reaches the bug assertion point without infrastructure errors. - **Security:** No concerns (test-only code, no user input handling, no new dependencies). - **Performance:** No concerns (test-only code, no hot paths).
brent.edwards force-pushed tdd/m6-subplan-spawn-orchestration from f16c5e0567
Some checks failed
CI / lint (pull_request) Successful in 20s
CI / quality (pull_request) Successful in 36s
CI / benchmark-publish (pull_request) Has been skipped
CI / typecheck (pull_request) Successful in 42s
CI / security (pull_request) Successful in 45s
CI / build (pull_request) Successful in 18s
CI / e2e_tests (pull_request) Successful in 2m26s
CI / integration_tests (pull_request) Successful in 4m14s
CI / unit_tests (pull_request) Successful in 5m33s
CI / docker (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
to b67dc63eda
All checks were successful
CI / lint (pull_request) Successful in 1m2s
CI / quality (pull_request) Successful in 39s
CI / typecheck (pull_request) Successful in 2m21s
CI / benchmark-publish (pull_request) Has been skipped
CI / security (pull_request) Successful in 2m36s
CI / build (pull_request) Successful in 21s
CI / e2e_tests (pull_request) Successful in 2m6s
CI / integration_tests (pull_request) Successful in 3m57s
CI / unit_tests (pull_request) Successful in 4m5s
CI / docker (pull_request) Successful in 58s
CI / coverage (pull_request) Successful in 6m37s
CI / benchmark-regression (pull_request) Successful in 37m33s
2026-03-16 01:12:57 +00:00
Compare
Author
Member

Branch cleaned — b67dc63e

Rebased on latest master, removing 3 merge commits. No code changes needed — all review findings were already addressed or deferred as follow-up.

Branch: 1 commit, 0 merge commits. Lint + typecheck pass.

## Branch cleaned — `b67dc63e` Rebased on latest master, removing 3 merge commits. No code changes needed — all review findings were already addressed or deferred as follow-up. Branch: 1 commit, 0 merge commits. Lint + typecheck pass.
brent.edwards deleted branch tdd/m6-subplan-spawn-orchestration 2026-03-16 01:34:55 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveragents-core!930
No description provided.