TDD: Write failing test for #969 — plan correct expects decision_id but test passes plan_id #979

Closed
opened 2026-03-16 16:00:05 +00:00 by freemo · 2 comments
Owner

Metadata

  • Commit Message: test: add TDD bug-capture test for #969 — plan correct plan_id handling
  • Branch: tdd/m3-plan-correct-plan-id

Background and Context

This is the TDD counterpart to bug #969. Per the project's Test-Driven Development workflow for bugs (see CONTRIBUTING.md > Bug Fix Workflow), the first step in fixing any bug is to write a test that captures the buggy behavior. The test is tagged with @tdd_bug, @tdd_bug_969, and @tdd_expected_fail so that it passes CI while the bug is still unfixed. Once the fix is implemented in #969, the @tdd_expected_fail tag will be removed and the test will run normally.

See #969 for full bug details.

Summary of the bug: The plan correct CLI command declares its first positional argument as decision_id. When the M3 acceptance test passes a plan_id, the plan_id is used as target_decision_id, which fails because the plan ID is not a valid decision ID. The correction service raises an error because the targeted decision cannot be found.

Expected Behavior

A new test exists that:

  1. Captures the exact failure described in #969.
  2. Is tagged with @tdd_bug, @tdd_bug_969, and @tdd_expected_fail.
  3. Passes CI via the expected-failure mechanism (the underlying assertion fails, confirming the bug exists, but the tag inversion causes the test to pass).
  4. Would fail CI if the bug were fixed without removing the @tdd_expected_fail tag.

Acceptance Criteria

  • A test is written that captures the bug behavior described in #969.
  • The test is tagged with @tdd_bug, @tdd_bug_969, and @tdd_expected_fail.
  • The @tdd_expected_fail tag causes the test to pass CI (the underlying assertion fails as expected, proving the bug exists).
  • The test is specific enough that it will pass normally (without the tag) only when the bug is genuinely fixed.
  • Tag validation rules pass: @tdd_bug_969 has corresponding @tdd_bug, and @tdd_expected_fail has both.
  • A pull request is opened from tdd/m3-plan-correct-plan-id to master, CI passes, and the PR is merged through the normal merge process.

Definition of Done

This issue is complete when:

  • All subtasks below are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the test and what bug behavior it captures.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, CI passes, and the PR is merged before this issue is marked done.

Subtasks

  • Code: Analyze bug #969 to identify the exact failure condition — plan correct called with a plan_id fails because correct_decision treats the argument as a decision_id and request_correction() cannot find a decision with that ID in the plan's decision tree.
  • Code: Determine the appropriate test type (Behave unit test) and file location (features/tdd_plan_correct_plan_id.feature).
  • Tests (Behave): Write a Behave scenario in features/ that captures the bug. Tag the scenario with @tdd_bug, @tdd_bug_969, and @tdd_expected_fail. The scenario must: (1) create a plan with decisions, (2) call plan correct <plan_id> --mode revert --guidance "...", (3) assert that the command succeeds (rc=0) and applies the correction (which currently fails). Name the scenario descriptively to indicate it is a bug regression test.
  • Tests (Robot): Add a Robot test in robot/ with equivalent tags that captures the failure at the integration level, since the M3 acceptance test exercises this path.
  • Docs: Add a comment in the test file explaining that this test was written to capture bug #969 and uses the @tdd_expected_fail tag until the fix is merged.
  • Quality: Verify CI passes with the tagged test. Confirm the underlying assertion fails for the correct reason (the bug itself, not a test setup error).
  • Quality: Verify tag validation rules pass (@tdd_bug_969 has @tdd_bug, and @tdd_expected_fail has both).
  • Quality: Verify coverage >=97% via nox -s coverage_report. If coverage is <97% then review the current unit test coverage report at build/coverage.xml and use it to write new Behave based unit tests to improve code coverage. Specifically, write Behave style unit tests that are descriptively named and specifically improve coverage on whichever file has the most uncovered lines by writing tests that will target the uncovered lines in the report. Once that is done rerun nox -s coverage_report to verify all tests pass and coverage is above >=97%. Only mark this as complete once coverage is >=97%, if not repeat this task as many times as is needed until coverage reaches >=97%.
  • Quality: Run nox (all default sessions, including benchmark), fix any errors if needed ensuring nox passes across entire code base, do not ignore any failure even if it seems unrelated to this commit, fix it.
## Metadata - **Commit Message**: `test: add TDD bug-capture test for #969 — plan correct plan_id handling` - **Branch**: `tdd/m3-plan-correct-plan-id` ## Background and Context This is the TDD counterpart to bug #969. Per the project's Test-Driven Development workflow for bugs (see `CONTRIBUTING.md` > Bug Fix Workflow), the first step in fixing any bug is to write a test that captures the buggy behavior. The test is tagged with `@tdd_bug`, `@tdd_bug_969`, and `@tdd_expected_fail` so that it passes CI while the bug is still unfixed. Once the fix is implemented in #969, the `@tdd_expected_fail` tag will be removed and the test will run normally. See #969 for full bug details. **Summary of the bug:** The `plan correct` CLI command declares its first positional argument as `decision_id`. When the M3 acceptance test passes a plan_id, the plan_id is used as `target_decision_id`, which fails because the plan ID is not a valid decision ID. The correction service raises an error because the targeted decision cannot be found. ## Expected Behavior A new test exists that: 1. Captures the exact failure described in #969. 2. Is tagged with `@tdd_bug`, `@tdd_bug_969`, and `@tdd_expected_fail`. 3. Passes CI via the expected-failure mechanism (the underlying assertion fails, confirming the bug exists, but the tag inversion causes the test to pass). 4. Would fail CI if the bug were fixed without removing the `@tdd_expected_fail` tag. ## Acceptance Criteria - [x] A test is written that captures the bug behavior described in #969. - [x] The test is tagged with `@tdd_bug`, `@tdd_bug_969`, and `@tdd_expected_fail`. - [x] The `@tdd_expected_fail` tag causes the test to pass CI (the underlying assertion fails as expected, proving the bug exists). - [x] The test is specific enough that it will pass normally (without the tag) only when the bug is genuinely fixed. - [x] Tag validation rules pass: `@tdd_bug_969` has corresponding `@tdd_bug`, and `@tdd_expected_fail` has both. - [ ] A pull request is opened from `tdd/m3-plan-correct-plan-id` to `master`, CI passes, and the PR is merged through the normal merge process. ## Definition of Done This issue is complete when: - All subtasks below are completed and checked off. - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the test and what bug behavior it captures. - The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly. - The commit is submitted as a **pull request** to `master`, reviewed, CI passes, and the PR is **merged** before this issue is marked done. ## Subtasks - [x] Code: Analyze bug #969 to identify the exact failure condition — `plan correct` called with a plan_id fails because `correct_decision` treats the argument as a decision_id and `request_correction()` cannot find a decision with that ID in the plan's decision tree. - [x] Code: Determine the appropriate test type (Behave unit test) and file location (`features/tdd_plan_correct_plan_id.feature`). - [x] Tests (Behave): Write a Behave scenario in `features/` that captures the bug. Tag the scenario with `@tdd_bug`, `@tdd_bug_969`, and `@tdd_expected_fail`. The scenario must: (1) create a plan with decisions, (2) call `plan correct <plan_id> --mode revert --guidance "..."`, (3) assert that the command succeeds (rc=0) and applies the correction (which currently fails). Name the scenario descriptively to indicate it is a bug regression test. - [x] Tests (Robot): Add a Robot test in `robot/` with equivalent tags that captures the failure at the integration level, since the M3 acceptance test exercises this path. - [x] Docs: Add a comment in the test file explaining that this test was written to capture bug #969 and uses the `@tdd_expected_fail` tag until the fix is merged. - [x] Quality: Verify CI passes with the tagged test. Confirm the underlying assertion fails for the correct reason (the bug itself, not a test setup error). - [x] Quality: Verify tag validation rules pass (`@tdd_bug_969` has `@tdd_bug`, and `@tdd_expected_fail` has both). - [x] Quality: Verify coverage >=97% via `nox -s coverage_report`. If coverage is <97% then review the current unit test coverage report at `build/coverage.xml` and use it to write new Behave based unit tests to improve code coverage. Specifically, write Behave style unit tests that are descriptively named and specifically improve coverage on whichever file has the most uncovered lines by writing tests that will target the uncovered lines in the report. Once that is done rerun `nox -s coverage_report` to verify all tests pass and coverage is above >=97%. Only mark this as complete once coverage is >=97%, if not repeat this task as many times as is needed until coverage reaches >=97%. - [x] Quality: Run `nox` (all default sessions, including benchmark), fix any errors if needed ensuring nox passes across **entire** code base, do not ignore any failure even if it seems unrelated to this commit, fix it.
freemo added this to the v3.2.0 milestone 2026-03-16 16:00:15 +00:00
Member

Implementation Notes

Bug Analysis

Analyzed the correct_decision function in cleveragents.cli.commands.plan (the @app.command("correct") handler). The bug manifests as follows:

  1. The function declares decision_id as its first positional argument via typer.Argument.
  2. When the M3 acceptance test calls plan correct <plan_id> --mode revert --guidance "...", the plan_id value is captured by the decision_id parameter.
  3. The code resolves plan_id separately via plan_id or _resolve_active_plan_id(), but uses the positional decision_id (which is actually the plan_id) directly as target_decision_id in svc.request_correction(plan_id=resolved_plan_id, target_decision_id=decision_id, ...).
  4. Since the plan_id is not a valid decision_id, the CorrectionService cannot find the targeted decision in the plan's decision tree.

Test Design

Behave test (features/tdd_plan_correct_plan_id.feature):

  • One scenario: "plan correct resolves root decision when given a plan_id as positional"
  • Tags: @tdd_expected_fail @tdd_bug @tdd_bug_969
  • Uses tpcpid step prefix to avoid collisions
  • Mocks DI container (DecisionService returning decisions), CorrectionService, and _resolve_active_plan_id
  • Core assertion: request_correction was called with target_decision_id equal to the root decision ID, not the plan_id
  • Currently the assertion FAILS (because the buggy code passes plan_id as target_decision_id), and the @tdd_expected_fail tag inverts this to a CI pass

Robot test (robot/tdd_plan_correct_plan_id.robot):

  • One test case: "TDD Plan Correct Accepts Plan ID As Positional Argument"
  • Tags: tdd_expected_fail tdd_bug tdd_bug_969
  • Helper script: robot/helper_tdd_plan_correct_plan_id.py
  • Same mock setup as Behave test, exercises the same code path
  • Exits 1 with descriptive error when the bug is present; the tdd_expected_fail_listener handles inversion

Key Design Decisions

  1. Single scenario instead of two: Initially created two scenarios (one for target resolution, one for plan_id passthrough), but the second scenario's assertion passed even with the bug because _resolve_active_plan_id was mocked to return the same plan_id. Removed the second scenario to keep the test focused on the actual bug.

  2. Mock setup: Rather than simulating a ResourceNotFoundError from CorrectionService (which would require complex side_effect logic), the test asserts on the arguments passed to request_correction. This directly captures the bug: the code passes plan_id where it should pass root_decision_id.

  3. Step prefix: Used tpcpid (TDD Plan Correct Plan ID) following the project convention of prefixed steps to avoid collisions.

Files Changed

  • features/tdd_plan_correct_plan_id.feature — Behave feature with tagged TDD scenario
  • features/steps/tdd_plan_correct_plan_id_steps.py — Step definitions with mock setup and assertions
  • robot/tdd_plan_correct_plan_id.robot — Robot integration test
  • robot/helper_tdd_plan_correct_plan_id.py — Robot test helper script

Quality Gate Results

  • nox -s lint: All checks passed
  • nox -s typecheck: 0 errors
  • nox -s unit_tests: 387 features passed, 0 failed, 11120 scenarios passed
  • nox -s integration_tests: 1560 tests passed, 0 failed
  • nox -s coverage_report: 97% (meets threshold)
  • nox -s benchmark: Successful
  • nox -s docs: Successful
  • nox -s build: Successful
## Implementation Notes ### Bug Analysis Analyzed the `correct_decision` function in `cleveragents.cli.commands.plan` (the `@app.command("correct")` handler). The bug manifests as follows: 1. The function declares `decision_id` as its first positional argument via `typer.Argument`. 2. When the M3 acceptance test calls `plan correct <plan_id> --mode revert --guidance "..."`, the plan_id value is captured by the `decision_id` parameter. 3. The code resolves `plan_id` separately via `plan_id or _resolve_active_plan_id()`, but uses the positional `decision_id` (which is actually the plan_id) directly as `target_decision_id` in `svc.request_correction(plan_id=resolved_plan_id, target_decision_id=decision_id, ...)`. 4. Since the plan_id is not a valid decision_id, the CorrectionService cannot find the targeted decision in the plan's decision tree. ### Test Design **Behave test** (`features/tdd_plan_correct_plan_id.feature`): - One scenario: "plan correct resolves root decision when given a plan_id as positional" - Tags: `@tdd_expected_fail @tdd_bug @tdd_bug_969` - Uses `tpcpid` step prefix to avoid collisions - Mocks DI container (DecisionService returning decisions), CorrectionService, and `_resolve_active_plan_id` - Core assertion: `request_correction` was called with `target_decision_id` equal to the root decision ID, not the plan_id - Currently the assertion FAILS (because the buggy code passes plan_id as target_decision_id), and the `@tdd_expected_fail` tag inverts this to a CI pass **Robot test** (`robot/tdd_plan_correct_plan_id.robot`): - One test case: "TDD Plan Correct Accepts Plan ID As Positional Argument" - Tags: `tdd_expected_fail tdd_bug tdd_bug_969` - Helper script: `robot/helper_tdd_plan_correct_plan_id.py` - Same mock setup as Behave test, exercises the same code path - Exits 1 with descriptive error when the bug is present; the `tdd_expected_fail_listener` handles inversion ### Key Design Decisions 1. **Single scenario instead of two**: Initially created two scenarios (one for target resolution, one for plan_id passthrough), but the second scenario's assertion passed even with the bug because `_resolve_active_plan_id` was mocked to return the same plan_id. Removed the second scenario to keep the test focused on the actual bug. 2. **Mock setup**: Rather than simulating a ResourceNotFoundError from CorrectionService (which would require complex side_effect logic), the test asserts on the *arguments* passed to `request_correction`. This directly captures the bug: the code passes `plan_id` where it should pass `root_decision_id`. 3. **Step prefix**: Used `tpcpid` (TDD Plan Correct Plan ID) following the project convention of prefixed steps to avoid collisions. ### Files Changed - `features/tdd_plan_correct_plan_id.feature` — Behave feature with tagged TDD scenario - `features/steps/tdd_plan_correct_plan_id_steps.py` — Step definitions with mock setup and assertions - `robot/tdd_plan_correct_plan_id.robot` — Robot integration test - `robot/helper_tdd_plan_correct_plan_id.py` — Robot test helper script ### Quality Gate Results - `nox -s lint`: ✅ All checks passed - `nox -s typecheck`: ✅ 0 errors - `nox -s unit_tests`: ✅ 387 features passed, 0 failed, 11120 scenarios passed - `nox -s integration_tests`: ✅ 1560 tests passed, 0 failed - `nox -s coverage_report`: ✅ 97% (meets threshold) - `nox -s benchmark`: ✅ Successful - `nox -s docs`: ✅ Successful - `nox -s build`: ✅ Successful
Member

Self-QA Implementation Notes (Cycles 1–2)

Cycle 1

Review findings: 1 major, 3 minor, 3 nits

  • Major: Missing CHANGELOG.md update (required by CONTRIBUTING.md for all PRs including test-only)
  • Minor 1: Exit-code assertion may mask infrastructure errors under @tdd_expected_fail — implicit dependency on assertion ordering
  • Minor 2: ~100+ lines of code duplication between Behave steps and Robot helper (constants, mock builders, patch targets, kwargs extraction)
  • Minor 3: Only --mode revert tested; --mode append exercises the same bug path at plan.py:2662
  • Nit 1: Redundant dead-code fallback in call_args kwargs extraction
  • Nit 2: Misleading variable name call_kwargs (holds full (args, kwargs) tuple, not just kwargs)
  • Nit 3: _CHILD_DECISION_ID constant defined but never asserted — reader confusion risk

Fixes applied:

  1. Added CHANGELOG.md entry under ## Unreleased describing TDD bug-capture tests for #979
  2. Added NOTE: docstring on step_tpcpid_exit_ok clarifying exit-code assertion passes with bug present; @tdd_expected_fail inversion only triggers on second assertion
  3. Created shared fixtures module features/mocks/tdd_plan_correct_plan_id_fixtures.py — centralizes constants, mock builders (make_decision_ns, make_mock_container, make_correction_svc), and build_cli_args. Both Behave steps and Robot helper now import from it (~100+ lines of duplication eliminated)
  4. Added append-mode scenario to both Behave feature (second scenario) and Robot test suite (second test case). New step definitions for append setup/invocation. Robot helper refactored to unified _run_plan_correct(mode, sentinel) function
  5. Simplified kwargs extraction to kw = call_record.kwargs — removed dead-code fallback
  6. Renamed call_kwargscall_record in both Behave steps and Robot helper
  7. Added inline comment on CHILD_DECISION_ID: "Used to build a multi-node decision tree for realism; not directly asserted."

Cycle 2

Review findings: 0 critical, 0 major — Approved

  • 3 minor observations noted (all non-blocking): feature-level @tdd_expected_fail narrow false-positive window (already documented), make_default_container() unused by Behave steps (stylistic), no negative-path scenario (out of scope for bug-capture PR)
  • 2 nits noted (non-blocking): kw.get() silent None on missing key, nearly identical patch blocks in revert/append WHEN steps
  • All 7 Cycle 1 fixes verified as properly implemented with no regressions

Remaining Issues

None — all blocking issues resolved. Non-blocking observations documented for future consideration.

## Self-QA Implementation Notes (Cycles 1–2) ### Cycle 1 **Review findings:** 1 major, 3 minor, 3 nits - **Major:** Missing `CHANGELOG.md` update (required by CONTRIBUTING.md for all PRs including test-only) - **Minor 1:** Exit-code assertion may mask infrastructure errors under `@tdd_expected_fail` — implicit dependency on assertion ordering - **Minor 2:** ~100+ lines of code duplication between Behave steps and Robot helper (constants, mock builders, patch targets, kwargs extraction) - **Minor 3:** Only `--mode revert` tested; `--mode append` exercises the same bug path at `plan.py:2662` - **Nit 1:** Redundant dead-code fallback in `call_args` kwargs extraction - **Nit 2:** Misleading variable name `call_kwargs` (holds full `(args, kwargs)` tuple, not just kwargs) - **Nit 3:** `_CHILD_DECISION_ID` constant defined but never asserted — reader confusion risk **Fixes applied:** 1. Added `CHANGELOG.md` entry under `## Unreleased` describing TDD bug-capture tests for #979 2. Added `NOTE:` docstring on `step_tpcpid_exit_ok` clarifying exit-code assertion passes with bug present; `@tdd_expected_fail` inversion only triggers on second assertion 3. Created shared fixtures module `features/mocks/tdd_plan_correct_plan_id_fixtures.py` — centralizes constants, mock builders (`make_decision_ns`, `make_mock_container`, `make_correction_svc`), and `build_cli_args`. Both Behave steps and Robot helper now import from it (~100+ lines of duplication eliminated) 4. Added append-mode scenario to both Behave feature (second scenario) and Robot test suite (second test case). New step definitions for append setup/invocation. Robot helper refactored to unified `_run_plan_correct(mode, sentinel)` function 5. Simplified kwargs extraction to `kw = call_record.kwargs` — removed dead-code fallback 6. Renamed `call_kwargs` → `call_record` in both Behave steps and Robot helper 7. Added inline comment on `CHILD_DECISION_ID`: "Used to build a multi-node decision tree for realism; not directly asserted." ### Cycle 2 **Review findings:** 0 critical, 0 major — **Approved** ✅ - 3 minor observations noted (all non-blocking): feature-level `@tdd_expected_fail` narrow false-positive window (already documented), `make_default_container()` unused by Behave steps (stylistic), no negative-path scenario (out of scope for bug-capture PR) - 2 nits noted (non-blocking): `kw.get()` silent None on missing key, nearly identical patch blocks in revert/append WHEN steps - All 7 Cycle 1 fixes verified as properly implemented with no regressions ### Remaining Issues None — all blocking issues resolved. Non-blocking observations documented for future consideration.
Sign in to join this conversation.
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveragents-core#979
No description provided.