test: add TDD bug-capture test for #967 — plan execute phase processing #1050

Merged
hurui200320 merged 2 commits from tdd/m3-plan-execute-phase-processing into master 2026-03-19 06:41:27 +00:00
Member

Summary

This PR adds TDD bug-capture tests for bug #967plan execute only transitions state without running strategize or execute phase processing.

Motivation

Bug #967 describes that the plan execute CLI command originally only called service.execute_plan(plan_id), which is a state transition only (Strategize/COMPLETE → Execute/QUEUED). When a plan was in Strategize/QUEUED state (immediately after plan use), the command failed because execute_plan() requires Strategize/COMPLETE. The CLI should detect the plan's current phase and run PlanExecutor.run_strategize() before transitioning.

Per the project's TDD Bug Fix Workflow (CONTRIBUTING.md), the first step in fixing any bug is to write a test that captures the buggy behavior. Since the fix for #967 is already present in the codebase (the CLI handler already orchestrates properly), the @tdd_expected_fail tags have been removed and these tests serve as permanent regression guards ensuring the fix is never reverted.

Design Approach

All @tdd_expected_fail scenarios were rewritten to exercise the CLI orchestration layer — the actual code path affected by bug #967. This satisfies AC4: "The test is specific enough that it will pass normally (without the tag) only when the bug is genuinely fixed."

  • Scenarios 1, 2, 4 (previously @tdd_expected_fail): Use Typer's CliRunner with mocked services to invoke the plan execute CLI command handler directly. This tests the orchestration logic in plan.py — the exact code that was buggy.
  • Scenario 3 (positive control): Uses real PlanLifecycleService (in-memory) and PlanExecutor (stub actors) to demonstrate that proper service-level orchestration works.
  • Robot tests: Replicate the CLI orchestration logic using real services to verify at the integration level.

Changes

Behave Unit Tests

  • features/tdd_plan_execute_phase_processing.feature — 4 scenarios
  • features/steps/tdd_plan_execute_phase_processing_steps.py — Step definitions using CliRunner and mocked services

Scenarios:

  1. CLI execute command handles plan in Strategize/QUEUED state — Invokes plan execute via CliRunner on a QUEUED plan. Verifies the CLI succeeds and the plan reaches Execute phase.
  2. CLI execute command orchestrates full lifecycle for QUEUED plan — Verifies run_strategize() and run_execute() are both called by the CLI handler.
  3. Positive control — proper orchestration transitions QUEUED plan to Execute — Demonstrates that run_strategize()execute_plan() works correctly at the service level. Always passes.
  4. CLI auto-discovery finds plans in Strategize/QUEUED state — Invokes plan execute with no plan_id. Verifies the auto-discovery filter includes QUEUED plans.

Robot Integration Tests

  • robot/tdd_plan_execute_phase_processing.robot — 4 test cases matching the Behave scenarios
  • robot/helper_tdd_plan_execute_phase_processing.py — Helper script replicating CLI orchestration logic

CHANGELOG

  • CHANGELOG.md — Added entry under "## Unreleased" describing the new tests.

Quality Gates

  • nox -e lint: passed
  • nox -e typecheck: passed (0 errors)
  • nox -e unit_tests: passed (391 features, 11177 scenarios, 0 failures)
  • nox -e integration_tests: passed (1572 tests, 0 failures)
  • nox -e e2e_tests: passed (16 tests, 0 failures)
  • nox -e coverage_report: 97% coverage

Review Cycle 2 Fixes

  • Critical #1: Rewrote @tdd_expected_fail scenarios to exercise the CLI orchestration layer via CliRunner instead of testing service/executor APIs that are correct by design. Removed @tdd_expected_fail tags since the bug fix is already in the codebase.
  • Major #2: Added CHANGELOG entry.
  • Major #3: Rebased onto current master.
  • Minor #4: Added ProcessingState.QUEUED assertion to positive control scenario.
  • Minor #5: Narrowed exception handler from bare except Exception to except (PlanError, PlanNotReadyError).
  • Minor #7: Changed Robot suite to use Setup Test Environment With Database Isolation.
  • Minor #8: Added on_timeout=kill to all Robot Run Process calls.
  • Minor #12: Added docstring to _fail() helper.
  • Minor #13: Removed redundant Settings() instantiation.

Known Limitations

  • The @tdd_expected_fail tags were removed because the bug fix for #967 is already in the codebase. If this PR is merged before the #967 fix PR, the tags would need to be re-added. However, the CHANGELOG and existing CLI code confirm the fix is already on master.

Closes #977

## Summary This PR adds TDD bug-capture tests for bug #967 — `plan execute` only transitions state without running strategize or execute phase processing. ### Motivation Bug #967 describes that the `plan execute` CLI command originally only called `service.execute_plan(plan_id)`, which is a state transition only (Strategize/COMPLETE → Execute/QUEUED). When a plan was in Strategize/QUEUED state (immediately after `plan use`), the command failed because `execute_plan()` requires Strategize/COMPLETE. The CLI should detect the plan's current phase and run `PlanExecutor.run_strategize()` before transitioning. Per the project's TDD Bug Fix Workflow (`CONTRIBUTING.md`), the first step in fixing any bug is to write a test that captures the buggy behavior. Since the fix for #967 is already present in the codebase (the CLI handler already orchestrates properly), the `@tdd_expected_fail` tags have been removed and these tests serve as **permanent regression guards** ensuring the fix is never reverted. ### Design Approach **All `@tdd_expected_fail` scenarios were rewritten to exercise the CLI orchestration layer** — the actual code path affected by bug #967. This satisfies AC4: "The test is specific enough that it will pass normally (without the tag) only when the bug is genuinely fixed." - **Scenarios 1, 2, 4** (previously `@tdd_expected_fail`): Use Typer's `CliRunner` with mocked services to invoke the `plan execute` CLI command handler directly. This tests the orchestration logic in `plan.py` — the exact code that was buggy. - **Scenario 3** (positive control): Uses real `PlanLifecycleService` (in-memory) and `PlanExecutor` (stub actors) to demonstrate that proper service-level orchestration works. - **Robot tests**: Replicate the CLI orchestration logic using real services to verify at the integration level. ### Changes #### Behave Unit Tests - `features/tdd_plan_execute_phase_processing.feature` — 4 scenarios - `features/steps/tdd_plan_execute_phase_processing_steps.py` — Step definitions using CliRunner and mocked services **Scenarios:** 1. **CLI execute command handles plan in Strategize/QUEUED state** — Invokes `plan execute` via CliRunner on a QUEUED plan. Verifies the CLI succeeds and the plan reaches Execute phase. 2. **CLI execute command orchestrates full lifecycle for QUEUED plan** — Verifies `run_strategize()` and `run_execute()` are both called by the CLI handler. 3. **Positive control — proper orchestration transitions QUEUED plan to Execute** — Demonstrates that `run_strategize()` → `execute_plan()` works correctly at the service level. Always passes. 4. **CLI auto-discovery finds plans in Strategize/QUEUED state** — Invokes `plan execute` with no plan_id. Verifies the auto-discovery filter includes QUEUED plans. #### Robot Integration Tests - `robot/tdd_plan_execute_phase_processing.robot` — 4 test cases matching the Behave scenarios - `robot/helper_tdd_plan_execute_phase_processing.py` — Helper script replicating CLI orchestration logic #### CHANGELOG - `CHANGELOG.md` — Added entry under "## Unreleased" describing the new tests. ### Quality Gates - `nox -e lint`: ✅ passed - `nox -e typecheck`: ✅ passed (0 errors) - `nox -e unit_tests`: ✅ passed (391 features, 11177 scenarios, 0 failures) - `nox -e integration_tests`: ✅ passed (1572 tests, 0 failures) - `nox -e e2e_tests`: ✅ passed (16 tests, 0 failures) - `nox -e coverage_report`: ✅ 97% coverage ### Review Cycle 2 Fixes - **Critical #1**: Rewrote `@tdd_expected_fail` scenarios to exercise the CLI orchestration layer via CliRunner instead of testing service/executor APIs that are correct by design. Removed `@tdd_expected_fail` tags since the bug fix is already in the codebase. - **Major #2**: Added CHANGELOG entry. - **Major #3**: Rebased onto current `master`. - **Minor #4**: Added `ProcessingState.QUEUED` assertion to positive control scenario. - **Minor #5**: Narrowed exception handler from bare `except Exception` to `except (PlanError, PlanNotReadyError)`. - **Minor #7**: Changed Robot suite to use `Setup Test Environment With Database Isolation`. - **Minor #8**: Added `on_timeout=kill` to all Robot `Run Process` calls. - **Minor #12**: Added docstring to `_fail()` helper. - **Minor #13**: Removed redundant `Settings()` instantiation. ### Known Limitations - The `@tdd_expected_fail` tags were removed because the bug fix for #967 is already in the codebase. If this PR is merged before the #967 fix PR, the tags would need to be re-added. However, the CHANGELOG and existing CLI code confirm the fix is already on `master`. Closes #977
hurui200320 added this to the v3.2.0 milestone 2026-03-18 08:37:21 +00:00
hurui200320 force-pushed tdd/m3-plan-execute-phase-processing from 49d403a004
All checks were successful
CI / lint (pull_request) Successful in 16s
CI / typecheck (pull_request) Successful in 43s
CI / security (pull_request) Successful in 48s
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 31s
CI / build (pull_request) Successful in 21s
CI / unit_tests (pull_request) Successful in 3m19s
CI / integration_tests (pull_request) Successful in 3m40s
CI / e2e_tests (pull_request) Successful in 3m47s
CI / coverage (pull_request) Successful in 7m52s
CI / docker (pull_request) Successful in 56s
CI / benchmark-regression (pull_request) Successful in 38m51s
to 99d7004480
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / build (pull_request) Successful in 29s
CI / quality (pull_request) Successful in 31s
CI / typecheck (pull_request) Successful in 43s
CI / security (pull_request) Successful in 56s
CI / unit_tests (pull_request) Successful in 2m59s
CI / integration_tests (pull_request) Successful in 3m37s
CI / docker (pull_request) Successful in 54s
CI / e2e_tests (pull_request) Successful in 4m40s
CI / coverage (pull_request) Successful in 6m51s
CI / benchmark-regression (pull_request) Successful in 38m42s
2026-03-18 11:48:53 +00:00
Compare
hurui200320 force-pushed tdd/m3-plan-execute-phase-processing from 99d7004480
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / build (pull_request) Successful in 29s
CI / quality (pull_request) Successful in 31s
CI / typecheck (pull_request) Successful in 43s
CI / security (pull_request) Successful in 56s
CI / unit_tests (pull_request) Successful in 2m59s
CI / integration_tests (pull_request) Successful in 3m37s
CI / docker (pull_request) Successful in 54s
CI / e2e_tests (pull_request) Successful in 4m40s
CI / coverage (pull_request) Successful in 6m51s
CI / benchmark-regression (pull_request) Successful in 38m42s
to a641dae5bf
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 17s
CI / build (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 30s
CI / security (pull_request) Successful in 1m1s
CI / typecheck (pull_request) Successful in 1m2s
CI / unit_tests (pull_request) Successful in 2m56s
CI / integration_tests (pull_request) Successful in 3m41s
CI / docker (pull_request) Successful in 56s
CI / e2e_tests (pull_request) Successful in 4m19s
CI / coverage (pull_request) Successful in 6m33s
CI / benchmark-regression (pull_request) Successful in 37m27s
2026-03-18 13:43:33 +00:00
Compare
freemo approved these changes 2026-03-19 04:55:08 +00:00
freemo left a comment

Code Review — PR #1050 test: TDD bug-capture test for #967 — plan execute phase processing

Comprehensive 4-scenario test suite. The correct removal of @tdd_expected_fail is well-justified since the bug fix for #967 is already on master — these serve as permanent regression guards. The step file is well-structured with helper functions for constructing Plan domain objects. Good documentation of 13 fixes from 2 review cycles.

Approved. No issues found.

## Code Review — PR #1050 `test: TDD bug-capture test for #967 — plan execute phase processing` Comprehensive 4-scenario test suite. The correct removal of `@tdd_expected_fail` is well-justified since the bug fix for #967 is already on master — these serve as permanent regression guards. The step file is well-structured with helper functions for constructing `Plan` domain objects. Good documentation of 13 fixes from 2 review cycles. **Approved.** No issues found.
Merge branch 'master' into tdd/m3-plan-execute-phase-processing
All checks were successful
CI / lint (pull_request) Successful in 22s
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 32s
CI / build (pull_request) Successful in 25s
CI / security (pull_request) Successful in 53s
CI / typecheck (pull_request) Successful in 1m4s
CI / unit_tests (pull_request) Successful in 3m42s
CI / docker (pull_request) Successful in 9s
CI / e2e_tests (pull_request) Successful in 3m53s
CI / integration_tests (pull_request) Successful in 5m10s
CI / coverage (pull_request) Successful in 7m12s
CI / benchmark-regression (pull_request) Successful in 40m6s
468c84ec16
hurui200320 scheduled this pull request to auto merge when all checks succeed 2026-03-19 06:33:35 +00:00
hurui200320 deleted branch tdd/m3-plan-execute-phase-processing 2026-03-19 06:41:27 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveragents-core!1050
No description provided.