feat(cli): add plan explain and decision tree outputs #464

2026-02-27T11:47:35Z

hamza.khyari commented

2026-02-27 11:47:35 +00:00

Summary

- Add plan explain <decision_id> CLI command with --format (json/yaml/table/rich), --show-context, --show-reasoning, and --show-alternatives flags for inspecting individual decisions
- Add plan tree <plan_id> CLI command with --format, --show-superseded, and --depth flags for rendering the full decision tree
- BFS tree building uses collections.deque, whose popleft() is O(1), instead of list.pop(0), which is O(n)
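A minimal sketch of a deque-based BFS tree builder, in Python. It is an illustration, not the code in plan.py: the Decision and Node dataclasses, the build_tree name, and the max_depth semantics (levels below the roots, so 0 means roots only) are assumptions. It also mirrors the children_map/by_id split and the `if child_id not in by_id: continue` orphan guard that the review refers to.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Decision:
    # Hypothetical minimal record; the real model's field names may differ.
    decision_id: str
    parent_decision_id: Optional[str] = None
    superseded: bool = False


@dataclass
class Node:
    decision: Decision
    children: list[Node] = field(default_factory=list)


def build_tree(
    decisions: list[Decision],
    show_superseded: bool = False,
    max_depth: Optional[int] = None,
) -> list[Node]:
    """Build the decision forest breadth-first and return the root nodes."""
    # children_map is built from every decision, so it can reference ids
    # that the superseded filter later removes from by_id.
    children_map: dict[str, list[str]] = {}
    for d in decisions:
        if d.parent_decision_id is not None:
            children_map.setdefault(d.parent_decision_id, []).append(d.decision_id)

    by_id = {
        d.decision_id: d for d in decisions if show_superseded or not d.superseded
    }
    roots = [Node(d) for d in by_id.values() if d.parent_decision_id is None]

    # Queue of (node, depth); roots sit at depth 0.
    queue: deque[tuple[Node, int]] = deque((root, 0) for root in roots)
    while queue:
        node, depth = queue.popleft()  # O(1), unlike list.pop(0)
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand below the depth limit
        for child_id in children_map.get(node.decision.decision_id, []):
            if child_id not in by_id:  # orphan or filtered-out reference
                continue
            child = Node(by_id[child_id])
            node.children.append(child)
            queue.append((child, depth + 1))
    return roots
```

Popping from the left of a deque is O(1), while list.pop(0) shifts every remaining element, so the traversal stays linear in the number of decisions.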

Testing

- 14 Behave BDD scenarios / 54 steps (all pass)
- Robot Framework smoke tests (robot/plan_explain.robot)
- ASV benchmarks (benchmarks/plan_explain_bench.py)
- Lint clean (ruff)

Files Changed

- src/cleveragents/cli/commands/plan.py — Added plan explain and plan tree commands (+317 lines)
- features/plan_explain.feature + features/steps/plan_explain_steps.py — BDD tests
- robot/plan_explain.robot + robot/helper_plan_explain.py — Integration tests
- benchmarks/plan_explain_bench.py — ASV benchmarks
- docs/reference/plan_cli.md — Updated with explain/tree docs
- CHANGELOG.md — Updated

Closes #174


hamza.khyari added this to the v3.2.0 milestone 2026-02-27 11:47:35 +00:00

hamza.khyari added the Type/Feature label 2026-02-27 11:47:35 +00:00

hamza.khyari self-assigned this 2026-02-27 11:48:53 +00:00

hamza.khyari force-pushed feature/m4-decision-cli from 4e5df4cc63 to f50bd80686

2026-02-27 14:41:28 +00:00


CoreRasurae requested changes 2026-02-27 18:05:13 +00:00

CoreRasurae left a comment

Code Review — Commit `f50bd80`: `test(cli): add 32 behave scenarios to close plan.py diff-coverage gap`

Review scope: test coverage, test flaws, performance, bug detection, security, and spec compliance against issue #174 and docs/specification.md.

All 32 scenarios pass (126 steps, 0.4s). No security or performance issues found. However, the following findings should be addressed before merge.

1. HIGH — Sham Test: Orphan Scenario Does Not Exercise the Claimed Code Path

File: features/steps/plan_explain_cli_coverage_steps.py lines 240–249

Scenario 16, "Tree builder skips orphan child references gracefully", does not actually trigger the orphan branch at plan.py:2724-2726:

for child_id in children_map.get(did, []):
    if child_id not in by_id:   # <-- NEVER REACHED
        continue

The step definition contains an explicit comment admitting this:

"We test this by providing only a root decision (no children), which exercises the tree-building logic without triggering the orphan branch."

Since only a root decision with no children is provided, children_map.get(did, []) returns an empty list and the defensive guard is never evaluated. The scenario name and assertion give the illusion of coverage without actually hitting the branch.

Fix: Construct a decision list in which children_map references a decision_id that is absent from by_id. For example, give one child a parent_decision_id that points at the root, then add a second child of the root that is removed from the filtered set (e.g., a superseded child with show_superseded=False), so the root still references it in children_map.
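One way to keep the fixture honest is to assert the orphan precondition inside the Given step itself. The sketch below assumes, as the review does, that children_map is built from every decision while by_id only holds decisions that survive the superseded filter. Decision, orphan_fixture, and orphan_references are hypothetical names, and readable ids stand in for ULIDs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    # Hypothetical stand-in for the real decision model.
    decision_id: str
    parent_decision_id: Optional[str] = None
    superseded: bool = False


def orphan_fixture() -> list:
    """Root with one live child and one superseded child."""
    return [
        Decision("dec-root"),
        Decision("dec-live", parent_decision_id="dec-root"),
        Decision("dec-old", parent_decision_id="dec-root", superseded=True),
    ]


def orphan_references(decisions, show_superseded=False):
    """Child ids that children_map references but by_id would not contain."""
    # Assumes children_map comes from all decisions and by_id from the
    # filtered set, as described in the review.
    children_map = {}
    for d in decisions:
        if d.parent_decision_id is not None:
            children_map.setdefault(d.parent_decision_id, []).append(d.decision_id)
    visible = {d.decision_id for d in decisions if show_superseded or not d.superseded}
    return [cid for ids in children_map.values() for cid in ids if cid not in visible]


# Guard for the Given step: fail loudly if the fixture stops producing an
# orphan reference, instead of silently degrading into sham coverage.
assert orphan_references(orphan_fixture()) == ["dec-old"]
```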

2. MEDIUM — Dead Code: Two Module-Level Constants Are Defined but Never Used

File: features/steps/plan_explain_cli_coverage_steps.py lines 40–44

_PATCH_CORRECTION_SVC = "cleveragents.cli.commands.plan.correct_decision"
_PATCH_RESUME_SVC_MOD = (
    "cleveragents.application.services.plan_resume_service.PlanResumeService"
)

Neither constant is referenced anywhere in the file. The actual patch targets are hardcoded inline in _invoke_correct and the resume when steps. A reader would reasonably assume these named constants are the canonical patch targets, but the code never uses them.

Fix: Either use the constants in the when steps, or remove them.

3. MEDIUM — Spec Deviation: `--mode` and `--guidance` Are Required in Spec but Optional in Implementation

Spec ref: docs/specification.md line 321

The specification defines (parentheses = required, brackets = optional):

agents plan correct --mode (revert|append) (--guidance|-g) <GUIDANCE> [--dry-run] [--yes|-y] <DECISION_ID>

The implementation gives both flags defaults (mode="revert", guidance=""), making them optional at the Typer level. The function body catches empty guidance with a manual check, but --mode silently defaults to "revert".

No test validates the behavior when --mode or --guidance is omitted. Tests always supply both.

4. MEDIUM — Spec Deviation: `--show-alternatives` Flag Absent from Specification

Spec ref: docs/specification.md line 320

The specification defines:

agents plan explain [--show-context] [--show-reasoning] <DECISION_ID>

The implementation and tests add --show-alternatives, which is documented in docs/reference/plan_cli.md but not present in the canonical spec. Either update the spec or mark this as an intentional extension.

5. MEDIUM — Unnecessary Mock: `_resolve_active_plan_id` Patched but Never Called in Correction Tests

File: features/steps/plan_explain_cli_coverage_steps.py lines 615–628

In _invoke_correct, _resolve_active_plan_id is patched via patch.multiple, but the test always explicitly passes --plan context.pec_plan_id. Since plan_id is always non-None in the production code path (resolved_plan_id = plan_id or _resolve_active_plan_id()), the mock is never triggered. This is dead mock setup that creates a false sense of thoroughness.
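The short-circuit is easy to demonstrate in isolation. resolve_plan_id below is a hypothetical helper that reproduces the quoted expression; Mock comes from the standard library's unittest.mock.

```python
from typing import Callable, Optional
from unittest.mock import Mock


def resolve_plan_id(plan_id: Optional[str], resolve_active: Callable[[], str]) -> str:
    # Mirrors `plan_id or _resolve_active_plan_id()`: `or` short-circuits,
    # so the resolver only runs when plan_id is None or an empty string.
    return plan_id or resolve_active()


# With --plan supplied, a patched resolver is never touched...
resolver = Mock(return_value="active-plan")
assert resolve_plan_id("explicit-plan", resolver) == "explicit-plan"
resolver.assert_not_called()

# ...so only a scenario that omits --plan actually exercises the mock.
assert resolve_plan_id(None, resolver) == "active-plan"
resolver.assert_called_once_with()
```

Either drop the patch from _invoke_correct, or add a scenario that omits --plan and asserts that the patched resolver was called.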

6. LOW — Weak Assertions: Two Scenarios Lack Output Content Verification

Scenario	Gap
Tree CLI rich format with depth limit (feature line 87)	Checks only exit code 0 — no output content assertion
Tree CLI with show-superseded flag (feature line 91)	Checks only exit code 0 — does not verify superseded decisions appear in output

For the superseded scenario, the --show-superseded flag should cause superseded decisions to appear. Without asserting their presence, the test passes even if the flag is silently ignored.
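A hypothetical helper for the missing content checks. The sample outputs are illustrative strings, not real plan tree output; in the real steps the ids would be 26-character ULIDs, so plain substring checks are unlikely to collide.

```python
def tree_output_problems(output, must_show=(), must_hide=()):
    """Return a list of problems; an empty list means the output matches."""
    problems = [f"missing {i}" for i in must_show if i not in output]
    problems += [f"unexpected {i}" for i in must_hide if i in output]
    return problems


# --show-superseded: the superseded id must appear with the flag and be
# absent without it, so a silently ignored flag fails one of the two checks.
with_flag = "dec-root\n  dec-live\n  dec-old (superseded)"
without_flag = "dec-root\n  dec-live"
assert tree_output_problems(with_flag, must_show=["dec-old"]) == []
assert tree_output_problems(without_flag, must_hide=["dec-old"]) == []

# --depth 1: a grandchild id must not be rendered.
assert tree_output_problems("dec-root\n  dec-live", must_hide=["dec-grandchild"]) == []
```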

7. LOW — `typer.Abort()` Conflates Distinct Error Types to Same Exit Behavior

The correct_decision command converts ResourceNotFoundError, ValidationError, and CleverAgentsError alike into typer.Abort(). The tests only assert that the exit code is nonzero (no specific code) and check for different output strings. If someone reorders or removes a handler, the broader CleverAgentsError handler would absorb the specific exceptions, and the tests could still partially pass.

Suggestion: Add negative assertions (e.g., when ResourceNotFoundError is raised, assert "Validation Error" is NOT in the output).
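A sketch of why handler order matters and how negative assertions pin it down. The exception classes and messages are hypothetical stand-ins; the key assumption is that ResourceNotFoundError and ValidationError subclass CleverAgentsError, which is what lets a misplaced base-class handler swallow them.

```python
class CleverAgentsError(Exception):
    """Hypothetical base class, assumed to be the parent of the others."""


class ResourceNotFoundError(CleverAgentsError):
    pass


class ValidationError(CleverAgentsError):
    pass


def render_error(exc):
    # Specific handlers must precede the base-class handler; the first
    # matching except clause wins.
    try:
        raise exc
    except ResourceNotFoundError as e:
        return f"Not Found: {e}"
    except ValidationError as e:
        return f"Validation Error: {e}"
    except CleverAgentsError as e:
        return f"Plan error: {e}"


def assert_error_output(output, expected, forbidden):
    """Positive check plus negative checks for every other error label."""
    assert expected in output, f"missing {expected!r} in {output!r}"
    for label in forbidden:
        assert label not in output, f"unexpected {label!r} in {output!r}"
```

If the CleverAgentsError branch were moved to the top, a ResourceNotFoundError would render as "Plan error: ..." and the assertions for that case would fail.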

8. LOW — Global `_PLAN_ID` Creates Hidden Coupling Between Steps

File: features/steps/plan_explain_cli_coverage_steps.py line 38

_PLAN_ID = str(ULID()) is generated once at import time and implicitly shared between the given step (step_pec_lifecycle_active) and the then step (step_pec_resolved_matches). This hidden coupling makes refactoring fragile. Prefer storing the expected ID on context in the given step and reading it back in the then step.
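A sketch of the suggested refactor. Plain functions stand in for the decorated Behave steps (@given/@then), SimpleNamespace stands in for Behave's context, uuid4 replaces str(ULID()) so the example needs no third-party package, and get_active_plan_id and step_pec_resolve are hypothetical.

```python
import uuid
from types import SimpleNamespace
from unittest.mock import MagicMock


def step_pec_lifecycle_active(context):
    # Fresh ID per scenario, stored on context instead of a module global.
    context.pec_expected_plan_id = uuid.uuid4().hex  # real steps: str(ULID())
    context.pec_lifecycle = MagicMock()
    context.pec_lifecycle.get_active_plan_id.return_value = context.pec_expected_plan_id


def step_pec_resolve(context):
    context.pec_resolved_plan_id = context.pec_lifecycle.get_active_plan_id()


def step_pec_resolved_matches(context):
    # The expectation travels with the scenario's context, not with the module.
    assert context.pec_resolved_plan_id == context.pec_expected_plan_id


ctx = SimpleNamespace()
step_pec_lifecycle_active(ctx)
step_pec_resolve(ctx)
step_pec_resolved_matches(ctx)
```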

9. INFO — Coverage Claim May Be Off by 1 Line

The commit message states "Raises plan.py diff-coverage from 69% to 99.7% (1 line remaining)". Given finding #1 (the orphan guard at plan.py:2726 is never reached), the actual uncovered lines may be 2 rather than 1.

Summary

#	Category	Severity	Issue
1	Test Flaw	High	Orphan scenario is sham coverage — never exercises the orphan branch
2	Code Quality	Medium	Two unused constants (`_PATCH_CORRECTION_SVC`, `_PATCH_RESUME_SVC_MOD`)
3	Spec Compliance	Medium	`--mode` / `--guidance` optional in code, required in spec
4	Spec Compliance	Medium	`--show-alternatives` flag not in specification
5	Test Flaw	Medium	`_resolve_active_plan_id` patched but never triggered
6	Test Quality	Low	Weak/missing output assertions on 2 scenarios
7	Test Quality	Low	`typer.Abort()` conflates distinct error types to same exit behavior
8	Maintainability	Low	Global `_PLAN_ID` creates hidden coupling between steps
9	Info	Info	Coverage claim may be off by 1 line due to orphan gap

Recommendation: Address at minimum findings #1 (sham coverage) and #2 (dead constants) before merge. Findings #3 and #4 (spec deviations) should be reconciled — either update the spec or adjust the implementation.


features/steps/plan_explain_cli_coverage_steps.py Outdated

@@ -0,0 +37,4 @@
_PATCH_CONTAINER = "cleveragents.application.container.get_container"
_PATCH_LIFECYCLE = "cleveragents.cli.commands.plan._get_lifecycle_service"
_PATCH_RESOLVE = "cleveragents.cli.commands.plan._resolve_active_plan_id"
_PATCH_CORRECTION_SVC = "cleveragents.cli.commands.plan.correct_decision"

CoreRasurae commented

2026-02-27 18:05:13 +00:00

MEDIUM — Dead code. _PATCH_CORRECTION_SVC and _PATCH_RESUME_SVC_MOD are defined here but never referenced anywhere in this file. The actual patch targets are hardcoded inline in _invoke_correct (line 620) and the resume when steps (lines 665, 677, 691). Either use these constants or remove them.


features/steps/plan_explain_cli_coverage_steps.py Outdated

@@ -0,0 +237,4 @@
    context.pec_plan_id = _PLAN_ID
    root_id = str(ULID())
    old_id = str(ULID())
    new_id = str(ULID())

CoreRasurae commented

2026-02-27 18:05:13 +00:00

HIGH — Sham test. This step provides only a root decision with no children, so children_map.get(did, []) returns [] and the orphan guard at plan.py:2726 (if child_id not in by_id: continue) is never reached. The comment in the code acknowledges this explicitly.

To actually cover the orphan branch, construct a scenario where children_map contains a child_id that is absent from by_id. For example, include a superseded child decision (so it gets filtered out of by_id when show_superseded=False) while the parent still references it in children_map.


features/steps/plan_explain_cli_coverage_steps.py Outdated

@@ -0,0 +617,4 @@
        "correct",
        context.pec_decision_id,
        "--mode",
        "revert",

CoreRasurae commented

2026-02-27 18:05:13 +00:00

MEDIUM — Unnecessary mock. Since --plan is always passed in _invoke_correct (line 630), the production code path resolved_plan_id = plan_id or _resolve_active_plan_id() never calls _resolve_active_plan_id(). This patch.multiple is dead mock setup.


freemo added labels 2026-03-01 04:36:09 +00:00

hamza.khyari force-pushed feature/m4-decision-cli from f50bd80686 to f0e924c1a1

2026-03-02 15:39:30 +00:00


hamza.khyari requested review from CoreRasurae 2026-03-02 16:39:48 +00:00

hamza.khyari force-pushed feature/m4-decision-cli from c4c6e71e20 to dc8dd9b57d

2026-03-02 16:45:36 +00:00


hamza.khyari force-pushed feature/m4-decision-cli from dc8dd9b57d to 21e7aecbe7

2026-03-02 17:31:45 +00:00


hamza.khyari requested review from brent.edwards 2026-03-02 23:06:14 +00:00

hamza.khyari requested review from aditya 2026-03-02 23:06:24 +00:00

aditya approved these changes 2026-03-03 08:41:46 +00:00

Dismissed

hurui200320 referenced this pull request

2026-03-03 08:43:52 +00:00

test(e2e): validate M3 acceptance criteria for v3.2.0 milestone closure #494

hurui200320 added a new dependency 2026-03-03 09:56:37 +00:00

#174 feat(cli): add plan explain and decision tree outputs

hamza.khyari force-pushed feature/m4-decision-cli from 21e7aecbe7 to f66cb8d68e

2026-03-03 12:11:36 +00:00


hamza.khyari dismissed aditya's review 2026-03-03 12:11:36 +00:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

hamza.khyari scheduled this pull request to auto merge when all checks succeed 2026-03-03 12:12:49 +00:00

hamza.khyari merged commit f434f96a76 into master

2026-03-03 12:16:04 +00:00

hamza.khyari deleted branch feature/m4-decision-cli

2026-03-03 12:16:31 +00:00

hamza.khyari referenced this issue from a commit

2026-03-03 12:20:26 +00:00

Merge pull request 'feat(cli): add plan explain and decision tree outputs' (#464) from feature/m4-decision-cli into master

freemo added and removed labels 2026-03-04 00:41:57 +00:00


3 Participants


Due Date

No due date set.

Blocks

#174 feat(cli): add plan explain and decision tree outputs

cleveragents/cleveragents-core

Reference: cleveragents/cleveragents-core#464
