Add integration test for full hierarchical plan 4-phase lifecycle execution #10270

Closed
opened 2026-04-17 18:27:33 +00:00 by CoreRasurae · 1 comment
Member

Integration Test: Hierarchical Plan 4-Phase Lifecycle Execution

Metadata

  • Commit Message: test(integration): add BDD scenarios for full hierarchical plan 4-phase lifecycle execution
  • Branch: test/hierarchical-plan-4phase-lifecycle

Background and Context

The v3.5.0 acceptance criteria require that child plans execute through a full 4-phase lifecycle (Strategize → Decompose → Execute → Validate), hierarchically decomposing large tasks into smaller, manageable subplans. Currently:

  1. No explicit integration test verifies that child plans undergo all 4 phases with real (non-mocked) execution
  2. Existing SubplanExecutionService tests use mock executors that only set counters and perform no real execution
  3. No test verifies the 4-phase contract, under which each child plan goes through its own independent:
    • Strategize phase: Create a Strategy Actor and invoke it for each child plan
    • Decompose phase: Recursively decompose child plans using DecompositionService
    • Execute phase: Execute actions from the child plan's decomposed tree
    • Validate phase: Verify the child plan succeeded and produced expected artifacts
  4. Actor hierarchy tests (features/steps/actor_hierarchy_steps.py) test actor graph definitions, not hierarchical plan execution

This means the design for hierarchical execution exists but is not validated against real execution paths.
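
The 4-phase contract described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `Phase`, `Plan`, `run_lifecycle`, and `all_plans` are hypothetical names, and running each child's lifecycle inside the parent's Execute phase is an assumption about ordering that the issue does not state.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    STRATEGIZE = "strategize"
    DECOMPOSE = "decompose"
    EXECUTE = "execute"
    VALIDATE = "validate"


@dataclass
class Plan:
    # Hypothetical stand-in for a plan node; not the project's real model.
    name: str
    children: list["Plan"] = field(default_factory=list)
    phases_run: list[Phase] = field(default_factory=list)


def run_lifecycle(plan: Plan) -> None:
    """Run all four phases on `plan` in definition order.

    Assumption: children run their own full lifecycle while the parent
    is in its Execute phase.
    """
    for phase in Phase:
        plan.phases_run.append(phase)
        if phase is Phase.EXECUTE:
            for child in plan.children:
                run_lifecycle(child)


def all_plans(plan: Plan):
    """Yield the plan and every descendant, depth first."""
    yield plan
    for child in plan.children:
        yield from all_plans(child)


parent = Plan("parent", [Plan("child-a"), Plan("child-b")])
run_lifecycle(parent)

# The property the integration test should establish against real services:
# every plan in the hierarchy went through all four phases, in order.
for plan in all_plans(parent):
    assert plan.phases_run == list(Phase), plan.name
```

The real test would replace the in-memory bookkeeping with observations taken from the actual services, but the assertion at the end has the same shape.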


Current Behavior

  • Only unit-level tests for SubplanExecutionService exist (using mocks)
  • Only spawning/orchestration logic is tested, not the full 4-phase execution of children
  • No test exercises: "Create a plan with 3 files → decompose into 2 child plans → each child goes through 4 phases → all complete → parent aggregates results"

Expected Behavior

A complete BDD feature file with scenarios covering:

  1. Happy path: Parent plan with 2-level hierarchy executes all 4 phases for both parent and children
  2. Checkpoint triggers: on_subplan_spawn checkpoint is created before first child plan execution
  3. Max-depth enforcement: When a child plan tries to decompose beyond plan.max-child-depth, it stops and creates a leaf node
  4. Child success aggregation: Parent plan collects and aggregates results from all executed children
  5. Child failure handling: When a child plan fails in the Execute phase, the parent plan handles it gracefully
  6. Nested execution: A grandchild plan (3-level hierarchy) executes and cascades results up through the parent
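
Max-depth enforcement and failure aggregation (items 3 to 6) can be illustrated with a small in-memory model. Everything here is hypothetical: `Node`, `decompose`, `execute`, and `deepest_level` are illustration names, splitting files in half is a placeholder for whatever DecompositionService actually does, and the depth convention (root at depth 0, so a limit of 2 allows grandchildren but nothing deeper) is an assumption the issue does not confirm.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical plan node holding the files it is responsible for.
    files: list[str]
    depth: int
    children: list["Node"] = field(default_factory=list)
    status: str = "pending"


def decompose(node: Node, max_child_depth: int) -> None:
    """Split a node's files in half until the depth limit is reached.

    A node at the limit (or with a single file) stays a leaf.
    """
    if node.depth >= max_child_depth or len(node.files) <= 1:
        return
    mid = len(node.files) // 2
    for part in (node.files[:mid], node.files[mid:]):
        child = Node(files=part, depth=node.depth + 1)
        decompose(child, max_child_depth)
        node.children.append(child)


def execute(node: Node, failing_files: frozenset[str] = frozenset()) -> str:
    """Execute leaves, then aggregate: any failed child fails the parent."""
    if not node.children:
        failed = bool(failing_files & set(node.files))
    else:
        statuses = [execute(child, failing_files) for child in node.children]
        failed = "failed" in statuses
    node.status = "failed" if failed else "succeeded"
    return node.status


def deepest_level(node: Node) -> int:
    """Return the greatest depth reached anywhere below `node`."""
    return max([node.depth] + [deepest_level(c) for c in node.children])


root = Node(files=[f"file_{i}.py" for i in range(10)], depth=0)
decompose(root, max_child_depth=2)

assert [len(c.files) for c in root.children] == [5, 5]
assert deepest_level(root) == 2  # grandchildren exist, nothing deeper

assert execute(root) == "succeeded"
assert execute(root, frozenset({"file_7.py"})) == "failed"
```

In the failure run only the second child (which owns `file_7.py`) fails, while the first still succeeds, so the parent can report a failed overall status without losing the successful child's result.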

Acceptance Criteria

  1. New BDD feature file created: features/plan_execution_hierarchical_4phase.feature
  2. Scenarios cover at least: happy path (2-level hierarchy), max-depth limit, child failure, nested execution
  3. Step definitions implement real execution (not mocks) using the actual service implementations
  4. Each scenario verifies:
    • All 4 phases (Strategize, Decompose, Execute, Validate) are invoked for parent and each child
    • Checkpoint triggers are created correctly (on_subplan_spawn before child execution)
    • plan.max-child-depth is enforced during child decomposition
    • Child plan results are aggregated into the parent's final status
  5. All existing Behave and Robot tests continue to pass
  6. Coverage remains >=97%
  7. Full nox passes without errors
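
The checkpoint-ordering check in criterion 4 comes down to comparing positions in an event log. The sketch below assumes such a log can be obtained. `EventLog`, `spawn_and_run_children`, and the event-kind strings are hypothetical and do not reflect the real CheckpointService or PlanLifecycleService APIs.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    # Hypothetical ordered record of (kind, plan_id) events.
    events: list[tuple[str, str]] = field(default_factory=list)

    def record(self, kind: str, plan_id: str) -> None:
        self.events.append((kind, plan_id))

    def first_index(self, prefix: str) -> int:
        """Index of the first event whose kind starts with `prefix`."""
        for index, (kind, _) in enumerate(self.events):
            if kind.startswith(prefix):
                return index
        raise LookupError(f"no event with prefix {prefix!r}")


def spawn_and_run_children(log: EventLog, parent_id: str, child_ids: list[str]) -> None:
    """Record the spawn checkpoint once, then each child's four phases."""
    log.record("checkpoint:on_subplan_spawn", parent_id)
    for child_id in child_ids:
        for phase in ("strategize", "decompose", "execute", "validate"):
            log.record(f"phase:{phase}", child_id)


log = EventLog()
spawn_and_run_children(log, "parent", ["child-a", "child-b"])

# The checkpoint must precede the first phase of any child plan.
assert log.first_index("checkpoint:on_subplan_spawn") < log.first_index("phase:")
```

A step definition could make the same comparison using timestamps or sequence numbers, whichever the checkpoint records actually expose.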

Subtasks

  • Create features/plan_execution_hierarchical_4phase.feature with Gherkin scenarios
  • Write scenario: "Parent plan with 2-level hierarchy executes all 4 phases successfully"
    • Parent has 10 files → decomposes into 2 child plans (5 files each)
    • Each child goes through Strategize→Decompose→Execute→Validate
    • Parent aggregates results and completes
  • Write scenario: "on_subplan_spawn checkpoint is created before first child execution"
    • Verify checkpoint service records on_subplan_spawn trigger
  • Write scenario: "Decomposition respects max-child-depth during child plan execution"
    • Set plan.max-child-depth=2
    • Parent decomposes into children; each child respects the limit
  • Write scenario: "Child plan failure is handled gracefully by parent"
    • Force a child plan to fail in Execute phase
    • Verify parent plan captures the failure and completes with error status
  • Write scenario: "3-level hierarchy (grandchild) executes with proper cascading"
    • Parent → 2 children (level 2) → at least 1 grandchild (level 3)
    • Verify all results cascade correctly
  • Create step definitions in features/steps/plan_execution_hierarchical_4phase_steps.py
    • Steps use real service implementations, not mocks
    • Steps verify checkpoint creation via CheckpointService
    • Steps verify each phase's completion via PlanLifecycleService
  • Run nox -e unit_tests to verify all Behave tests pass
  • Run nox -e integration_tests (Robot Framework) to verify no regressions
  • Verify coverage >=97% via nox -s coverage_report
  • Run full nox to verify all quality gates pass

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant implementation details.
  • The commit is pushed to the remote on the test/hierarchical-plan-4phase-lifecycle branch.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
  • All quality gates pass: nox -e lint, nox -e typecheck, nox -e unit_tests, nox -e integration_tests, and coverage >=97%.
  • The v3.5.0 acceptance criterion "Full hierarchical plan execution with 4-phase lifecycle per child plan" is verifiably tested and passing.
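
The commit-message rule above (exact first line, blank line, then details) is easy to get subtly wrong, so a small check may help before pushing. This is only a sketch: `commit_message_ok` is a hypothetical helper, not part of the repository's tooling.

```python
EXPECTED_SUBJECT = (
    "test(integration): add BDD scenarios for full hierarchical plan "
    "4-phase lifecycle execution"
)


def commit_message_ok(message: str) -> bool:
    """Check subject line, blank separator, and a non-empty body."""
    lines = message.splitlines()
    return (
        len(lines) >= 3
        and lines[0] == EXPECTED_SUBJECT
        and lines[1] == ""
        and any(line.strip() for line in lines[2:])
    )


good = EXPECTED_SUBJECT + "\n\nAdd feature file and step definitions.\n"
assert commit_message_ok(good)
assert not commit_message_ok(EXPECTED_SUBJECT)  # no body
assert not commit_message_ok("Test(integration): wrong case\n\nbody")
```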
HAL9000 added this to the v3.5.0 milestone 2026-04-18 07:55:29 +00:00
Owner

[GROOMED] Quality Analysis Complete

Triage Assessment

Validity: This is a real, actionable issue with clear acceptance criteria and well-defined subtasks.

Label Verification

State Label: State/Unverified (present)
Type Label: Type/Testing (present)
Priority Label: Priority/High (present)

All required labels are present and correct.

Milestone Assignment

Milestone: Assigned to v3.5.0 (M6: Autonomy Hardening)

  • This issue directly tests the v3.5.0 acceptance criterion "Full hierarchical plan execution with 4-phase lifecycle per child plan", which must be verifiably tested and passing
  • The issue body explicitly references v3.5.0 requirements

Epic/Parent Issue Check

Epic Relationship: This issue is part of the v3.5.0 milestone epic, which encompasses hierarchical plan execution features (issues #10268, #10269, and #10270 form a cohesive feature set)

Issue Quality

Well-Documented: Comprehensive background, context, and acceptance criteria
Actionable: Clear subtasks with specific deliverables
Testable: Acceptance criteria are measurable and verifiable
Not Duplicate: Unique focus on integration testing (not unit testing)

Recommendation

Status: VERIFIED ✓

This issue is ready to move from State/Unverified to State/Verified. All required labels are present, milestone is assigned, and the issue is well-scoped for implementation.


Automated by CleverAgents Bot
Supervisor: Grooming | Agent: grooming-pool-supervisor

Reference
cleveragents/cleveragents-core#10270