feat(plan): add error recovery patterns and CLI hints #186

Closed
opened 2026-02-22 23:39:56 +00:00 by freemo · 1 comment
Owner

Metadata

  • Commit Message: feat(plan): add error recovery patterns and CLI hints
  • Branch: feature/m4-error-recovery

Background

error_recovery decisions are recorded in the decision tree during Execute phase failures, capturing error category, recovery action, and retry count. Structured recovery hints appear in CLI error output (e.g., "use agents plan prompt" or "use agents plan revert"). The retry/self-repair loop runs up to the configured limit from the automation profile.
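A minimal sketch of how the retry loop and decision recording described above might fit together. This is not the project's implementation: `ErrorRecoveryDecision`, `DecisionTree`, and `execute_with_retries` are hypothetical names, and it assumes `max_retries` counts retries after the first attempt rather than total attempts.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ErrorRecoveryDecision:
    """Hypothetical record shape; the three fields follow the issue text."""
    error_category: str
    recovery_action: str
    retry_count: int


@dataclass
class DecisionTree:
    """Stand-in for the real decision tree: an append-only list."""
    decisions: List[ErrorRecoveryDecision] = field(default_factory=list)

    def record(self, decision: ErrorRecoveryDecision) -> None:
        self.decisions.append(decision)


def execute_with_retries(step: Callable[[], str], tree: DecisionTree,
                         max_retries: int) -> str:
    """Run `step`; on failure, record a decision and retry up to max_retries times."""
    attempt = 0
    while True:
        try:
            return step()
        except Exception as exc:
            # Assumption: the limit counts retries, so total attempts = max_retries + 1.
            exhausted = attempt >= max_retries
            tree.record(ErrorRecoveryDecision(
                error_category=type(exc).__name__,
                recovery_action="abort" if exhausted else "retry",
                retry_count=attempt,
            ))
            if exhausted:
                raise  # surface the last failure once the limit is reached
            attempt += 1
```

With `max_retries=2`, a step that fails twice and then succeeds would leave two `retry` decisions in the tree; a step that always fails would leave two `retry` decisions followed by one `abort`.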

Acceptance Criteria

  • Record error_recovery decision type in the decision tree during Execute phase failures, capturing error category, recovery action taken, and retry count.
  • Add structured recovery hints to CLI error output (e.g., "use agents plan prompt <plan_id> to resume", "use agents plan revert <plan_id> --to-phase strategize to restart strategy").
  • Wire retry/self-repair loop into Execute phase up to the configured retry limit from the automation profile (max_retries field).
  • Add plan errors <plan_id> CLI command that shows all error decisions with recovery hints and retry history.
  • Capture structured error metadata (phase, actor, tool_call, stack_summary) into error_details JSON field on plan.
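As an illustration of the last two criteria, the sketch below serializes the four metadata fields into an `error_details` JSON string and appends the two recovery hints quoted above to an error message. The `ErrorDetails` class, the helper names, and the output layout are assumptions made for illustration; only the field names and the hint wording come from this issue.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class ErrorDetails:
    """Hypothetical container for the fields named in the criteria."""
    phase: str
    actor: str
    tool_call: Optional[str]
    stack_summary: str


def to_error_details_json(details: ErrorDetails) -> str:
    """Serialize the metadata for storage in the plan's error_details field."""
    return json.dumps(asdict(details), sort_keys=True)


def recovery_hints(plan_id: str) -> List[str]:
    """Hint wording taken from the issue text; which hints apply when is assumed."""
    return [
        f"use agents plan prompt {plan_id} to resume",
        f"use agents plan revert {plan_id} --to-phase strategize to restart strategy",
    ]


def format_cli_error(message: str, details: ErrorDetails, plan_id: str) -> str:
    """Build a multi-line error message; the layout is illustrative only."""
    lines = [f"Error: {message}", f"  phase: {details.phase}"]
    lines += [f"  hint: {hint}" for hint in recovery_hints(plan_id)]
    return "\n".join(lines)
```

Keeping the hints in a single function would let the proposed `plan errors <plan_id>` command and the inline error output share the same wording.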

Definition of Done

This issue is complete when:

  • All subtasks below are completed and checked off.
  • A Git commit is created where the first line of the commit message matches
    the Commit Message in Metadata exactly, followed by a blank line, then
    additional lines providing relevant details about the implementation. The
    commit body should be appropriate in size for a commit message and relatively
    complete in describing what was done.
  • The commit is pushed to the remote on the branch matching the Branch in
    Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and
    merged before this issue is marked done.

Subtasks

  • Record error_recovery decision type in the decision tree during Execute phase failures, capturing error category, recovery action taken, and retry count.
  • Add structured recovery hints to CLI error output (e.g., "use agents plan prompt <plan_id> to resume", "use agents plan revert <plan_id> --to-phase strategize to restart strategy").
  • Wire retry/self-repair loop into Execute phase up to the configured retry limit from the automation profile (max_retries field).
  • Add plan errors <plan_id> CLI command that shows all error decisions with recovery hints and retry history.
  • Capture structured error metadata (phase, actor, tool_call, stack_summary) into error_details JSON field on plan.
  • Add docs/reference/error_recovery.md documenting error categories, recovery patterns per phase, and retry behavior.
  • Tests (Behave): Add features/error_recovery.feature with scenarios for retry exhaustion, recovery hint output, and error decision recording.
  • Tests (Robot): Add robot/error_recovery.robot for end-to-end error and recovery flow.
  • Tests (ASV): Add benchmarks/error_recovery_bench.py for error handling overhead.
  • Verify coverage >=97% via nox -s coverage_report. If coverage is below 97%, review the current unit test coverage report at build/coverage.xml and use it to write new Behave-based unit tests. Specifically, write descriptively named Behave-style unit tests that target the uncovered lines in whichever file has the most uncovered lines. Then rerun nox -s coverage_report to verify that all tests pass and coverage is >=97%. Mark this complete only once coverage is >=97%; otherwise repeat this task as many times as needed.
  • Run nox (all default sessions, including benchmark) and fix any errors so that nox passes across the entire code base. Do not ignore any failure, even if it seems unrelated to this commit; fix it.

Section: #### M4: Corrections + Subplans + Checkpoints (Day 22)
Status: Open

freemo added this to the v3.3.0 milestone 2026-02-22 23:39:56 +00:00
Author
Owner

Expected completion updated (Day 15 rebaseline): Day 33 / 2026-03-13 (previously Day 26 / 2026-03-06)

freemo added the due date 2026-03-02 2026-02-23 18:41:38 +00:00
freemo self-assigned this 2026-02-24 21:53:08 +00:00
CoreRasurae added reference develop-luis-2 2026-02-25 07:40:30 +00:00
Reference
cleveragents/cleveragents-core#186