feat(cli): add plan explain and decision tree outputs #464
No reviewers
Labels
No labels
auto/needs-reevaluation
controller-managed
auto/blocked-by-deps
auto/ci-timeout
auto/claimed-implementer
auto/claimed-merge
auto/claimed-reviewer
auto/driver-down
auto/invariant-violation
auto/last-attempt-tier-0
auto/last-attempt-tier-1
auto/last-attempt-tier-2
auto/last-attempt-tier-min
Automation Tracking
auto/needs-conflict-resolution
auto/needs-implementer
auto/postmortem
auto/ready-to-merge
auto/restart-throttled
auto/revert
auto/sentinel
auto/stale-inactivity
auto/unstable
Blocked
Bounty
$100
Bounty
$1000
Bounty
$10000
Bounty
$20
Bounty
$2000
Bounty
$250
Bounty
$50
Bounty
$500
Bounty
$5000
Bounty
$750
MoSCoW
Could have
MoSCoW
Must have
MoSCoW
Should have
Needs Feedback
Points
1
Points
13
Points
2
Points
21
Points
3
Points
34
Points
5
Points
55
Points
8
Points
88
Priority
Backlog
Priority
CI Blocker
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Signed-off: Owner
Signed-off: Scrum Master
Signed-off: Tech Lead
Spike
State
Completed
State
Duplicate
State
In Progress
State
In Review
State
Paused
State
Unverified
State
Verified
State
Wont Do
Type
Automation
Type
Bug
Type
Discussion
Type
Documentation
Type
Epic
Type
Feature
Type
Legendary
Type
Refactor
Type
Support
Type
Task
Type
Testing
No project
No assignees
3 participants
Notifications
Due date
No due date set.
Blocks
#174 feat(cli): add plan explain and decision tree outputs
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core!464
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "feature/m4-decision-cli"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
plan explain <decision_id>CLI command with--format(json/yaml/table/rich),--show-context,--show-reasoning,--show-alternativesflags for inspecting individual decisionsplan tree <plan_id>CLI command with--format,--show-superseded,--depthflags for rendering the full decision treecollections.deque(nolist.pop(0))Testing
robot/plan_explain.robot)benchmarks/plan_explain_bench.py)Files Changed
src/cleveragents/cli/commands/plan.py— Addedplan explainandplan treecommands (+317 lines)features/plan_explain.feature+features/steps/plan_explain_steps.py— BDD testsrobot/plan_explain.robot+robot/helper_plan_explain.py— Integration testsbenchmarks/plan_explain_bench.py— ASV benchmarksdocs/reference/plan_cli.md— Updated with explain/tree docsCHANGELOG.md— UpdatedCloses #174
4e5df4cc63f50bd80686Code Review — Commit
f50bd80:test(cli): add 32 behave scenarios to close plan.py diff-coverage gapReview scope: test coverage, test flaws, performance, bug detection, security, and spec compliance against issue #174 and
docs/specification.md.All 32 scenarios pass (126 steps, 0.4s). No security or performance issues found. However, the following findings should be addressed before merge.
1. HIGH — Sham Test: Orphan Scenario Does Not Exercise the Claimed Code Path
File:
features/steps/plan_explain_cli_coverage_steps.pylines 240–249Scenario 16, "Tree builder skips orphan child references gracefully", does not actually trigger the orphan branch at
plan.py:2724-2726:The step definition contains an explicit comment admitting this:
Since only a root decision with no children is provided,
children_map.get(did, [])returns an empty list and the defensive guard is never evaluated. The scenario name and assertion give the illusion of coverage without actually hitting the branch.Fix: Construct a decision list where
children_mapcontains a reference to adecision_idthat is absent fromby_id. For example, manually craft aparent_decision_idin the child that points at the root, then include a second child whosedecision_iddoes not match any entry in the filtered set (e.g., add a superseded child withshow_superseded=False).2. MEDIUM — Dead Code: Two Module-Level Constants Are Defined but Never Used
File:
features/steps/plan_explain_cli_coverage_steps.pylines 40–44Neither constant is referenced anywhere in the file. The actual patch targets are hardcoded inline in
_invoke_correctand the resumewhensteps. A reader assumes these named constants are the canonical patch targets, but they are not used.Fix: Either use the constants in the
whensteps, or remove them.3. MEDIUM — Spec Deviation:
--modeand--guidanceAre Required in Spec but Optional in ImplementationSpec ref:
docs/specification.mdline 321The specification defines (parentheses = required, brackets = optional):
The implementation gives both flags defaults (
mode="revert",guidance=""), making them optional at the Typer level. The function body catches empty guidance with a manual check, but--modesilently defaults to"revert".No test validates the behavior when
--modeor--guidanceis omitted. Tests always supply both.4. MEDIUM — Spec Deviation:
--show-alternativesFlag Absent from SpecificationSpec ref:
docs/specification.mdline 320The specification defines:
The implementation and tests add
--show-alternatives, which is documented indocs/reference/plan_cli.mdbut not present in the canonical spec. Either update the spec or mark this as an intentional extension.5. MEDIUM — Unnecessary Mock:
_resolve_active_plan_idPatched but Never Called in Correction TestsFile:
features/steps/plan_explain_cli_coverage_steps.pylines 615–628In
_invoke_correct,_resolve_active_plan_idis patched viapatch.multiple, but the test always explicitly passes--plan context.pec_plan_id. Sinceplan_idis always non-None in the production code path (resolved_plan_id = plan_id or _resolve_active_plan_id()), the mock is never triggered. This is dead mock setup that creates a false sense of thoroughness.6. LOW — Weak Assertions: Two Scenarios Lack Output Content Verification
For the superseded scenario, the
--show-supersededflag should cause superseded decisions to appear. Without asserting their presence, the test passes even if the flag is silently ignored.7. LOW —
typer.Abort()Conflates Distinct Error Types to Same Exit BehaviorThe
correct_decisioncommand wrapsResourceNotFoundError,ValidationError, andCleverAgentsErrorall intyper.Abort(). Tests checkexit code should be nonzero(not specific codes) plus different output strings. If someone reorders or removes a handler, the broaderCleverAgentsErrorcatch would absorb the specific ones, and the tests would still partially pass.Suggestion: Add negative assertions (e.g., when
ResourceNotFoundErroris raised, assert "Validation Error" is NOT in the output).8. LOW — Global
_PLAN_IDCreates Hidden Coupling Between StepsFile:
features/steps/plan_explain_cli_coverage_steps.pyline 38_PLAN_ID = str(ULID())is generated once at import time and implicitly shared between thegivenstep (step_pec_lifecycle_active) and thethenstep (step_pec_resolved_matches). This hidden coupling makes refactoring fragile. Prefer storing the expected ID oncontextin thegivenstep and reading it back in thethenstep.9. INFO — Coverage Claim May Be Off by 1 Line
The commit message states "Raises plan.py diff-coverage from 69% to 99.7% (1 line remaining)". Given finding #1 (the orphan guard at
plan.py:2726is never reached), the actual uncovered lines may be 2 rather than 1.Summary
_PATCH_CORRECTION_SVC,_PATCH_RESUME_SVC_MOD)--mode/--guidanceoptional in code, required in spec--show-alternativesflag not in specification_resolve_active_plan_idpatched but never triggeredtyper.Abort()conflates distinct error types to same exit behavior_PLAN_IDcreates hidden coupling between stepsRecommendation: Address at minimum findings #1 (sham coverage) and #2 (dead constants) before merge. Findings #3 and #4 (spec deviations) should be reconciled — either update the spec or adjust the implementation.
@ -0,0 +37,4 @@_PATCH_CONTAINER = "cleveragents.application.container.get_container"_PATCH_LIFECYCLE = "cleveragents.cli.commands.plan._get_lifecycle_service"_PATCH_RESOLVE = "cleveragents.cli.commands.plan._resolve_active_plan_id"_PATCH_CORRECTION_SVC = "cleveragents.cli.commands.plan.correct_decision"MEDIUM — Dead code.
_PATCH_CORRECTION_SVCand_PATCH_RESUME_SVC_MODare defined here but never referenced anywhere in this file. The actual patch targets are hardcoded inline in_invoke_correct(line 620) and the resumewhensteps (lines 665, 677, 691). Either use these constants or remove them.@ -0,0 +237,4 @@context.pec_plan_id = _PLAN_IDroot_id = str(ULID())old_id = str(ULID())new_id = str(ULID())HIGH — Sham test. This step provides only a root decision with no children, so
children_map.get(did, [])returns[]and the orphan guard atplan.py:2726(if child_id not in by_id: continue) is never reached. The comment in the code acknowledges this explicitly.To actually cover the orphan branch, construct a scenario where
children_mapcontains achild_idthat is absent fromby_id. For example, include a superseded child decision (so it gets filtered out ofby_idwhenshow_superseded=False) while the parent still references it inchildren_map.@ -0,0 +617,4 @@"correct",context.pec_decision_id,"--mode","revert",MEDIUM — Unnecessary mock. Since
--planis always passed in_invoke_correct(line 630), the production code pathresolved_plan_id = plan_id or _resolve_active_plan_id()never calls_resolve_active_plan_id(). Thispatch.multipleis dead mock setup.f50bd80686f0e924c1a1c4c6e71e20dc8dd9b57ddc8dd9b57d21e7aecbe721e7aecbe7f66cb8d68eNew commits pushed, approval review dismissed automatically according to repository settings