UAT: PlanExecutor._run_execute_with_stub() overwrites plan.error_details on failure — erases structured error recovery metadata written by ErrorRecoveryService #4036

Open
opened 2026-04-06 08:57:00 +00:00 by freemo · 0 comments
Owner

Metadata

  • Branch: fix/plan-executor-error-details-merge
  • Commit Message: fix(plan_executor): merge error_details on failure instead of overwriting to preserve ErrorRecoveryService metadata
  • Milestone: None (backlog)
  • Parent Epic: #368

Bug Description

In src/cleveragents/application/services/plan_executor.py, the _run_execute_with_stub() method overwrites plan.error_details with a raw exception dict when execution fails (lines 818–822). This erases any structured error recovery metadata that ErrorRecoveryService._persist_error_metadata() had previously written to plan.error_details via PlanLifecycleService.update_error_details().

What Was Tested

  • src/cleveragents/application/services/plan_executor.py:818-822 — On failure after all retries, plan.error_details is replaced (not merged) with:
    plan.error_details = {
        "exception_type": type(last_exc).__name__,
        "traceback": traceback.format_exc(),
        "mode": "stub",
    }
    
  • src/cleveragents/application/services/error_recovery_service.py:307-331 — _persist_error_metadata() writes structured fields (error_category, error_phase, retry_count, max_retries, is_retriable, recovery_action, recovery_hint, recovery_command) to plan.error_details via update_error_details(), which merges into the existing dict.
  • src/cleveragents/application/services/plan_lifecycle_service.py:683-688 — update_error_details() correctly merges into the existing dict.
  • plan_executor.py:818 uses direct assignment plan.error_details = {...} which replaces the entire dict, erasing all structured error recovery metadata.

Expected Behavior

When execution fails after all retries, the error details should be merged into the existing plan.error_details dict (not replaced), preserving the structured error recovery metadata written by ErrorRecoveryService. The agents plan errors <PLAN_ID> command should show both the structured error recovery data AND the raw exception info.
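The merge described above can be sketched as follows. This is a minimal illustration, assuming plan.error_details is either None or a plain dict; merge_error_details is a hypothetical helper name, and the sample values are invented (only the field names come from this issue).

```python
from typing import Any, Optional


def merge_error_details(
    existing: Optional[dict[str, Any]],
    update: dict[str, Any],
) -> dict[str, Any]:
    """Return a new dict with `update` layered over `existing`.

    Hypothetical helper, not part of the codebase. Keys in `update`
    win on collision; keys found only in `existing` are preserved.
    """
    return {**(existing or {}), **update}


# Structured metadata as ErrorRecoveryService might have written it
# (field names from this issue, values invented for illustration).
recovery_metadata = {
    "error_category": "transient",
    "retry_count": 3,
    "max_retries": 3,
    "is_retriable": False,
}

# Raw exception info in the shape used on the failure path.
raw_exception_info = {
    "exception_type": "TimeoutError",
    "traceback": "Traceback (most recent call last): ...",
    "mode": "stub",
}

merged = merge_error_details(recovery_metadata, raw_exception_info)
```

Because the update is unpacked last, the raw exception fields would take precedence if a key ever collided with the structured metadata. Whether that precedence is the desired one is a design decision for the fix.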

Actual Behavior

When execution fails, plan.error_details is replaced with a raw exception dict, erasing:

  • error_category (transient/validation/provider/etc.)
  • error_phase
  • retry_count / max_retries
  • is_retriable
  • recovery_action / recovery_hint / recovery_command

As a result, agents plan errors <PLAN_ID> shows only the raw exception type and traceback, with no structured error recovery data.

Code Locations

| File | Lines | Issue |
|------|-------|-------|
| src/cleveragents/application/services/plan_executor.py | 818–822 | Direct assignment plan.error_details = {...} on failure path (should be merge) |
| src/cleveragents/application/services/plan_executor.py | 764–768 | Direct assignment plan.error_details = {...} on success path (same pattern, should be merge) |
| src/cleveragents/application/services/error_recovery_service.py | 307–331 | Writes structured metadata (would be erased by the above) |
| src/cleveragents/application/services/plan_lifecycle_service.py | 683–688 | update_error_details() correctly merges — this pattern should be used |

Steps to Reproduce

  1. Wire ErrorRecoveryService into PlanExecutor (or test directly).
  2. Run a plan that fails during the execute phase after all retries are exhausted.
  3. Check agents plan errors <PLAN_ID> — structured error recovery data is absent; only raw exception type and traceback are shown.
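For the "test directly" route in step 1, the sequence can be imitated without the real services. The sketch below uses stand-ins only: FakePlan, persist_recovery_metadata and fail_with_overwrite are hypothetical names that mimic the behaviour described in this issue, not the actual classes or methods, and the metadata values are invented.

```python
import traceback
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakePlan:
    """Stand-in for the real plan object; only error_details is modelled."""

    error_details: Optional[dict[str, Any]] = None


def persist_recovery_metadata(plan: FakePlan) -> None:
    # Mimics the merge that update_error_details() is described as doing.
    plan.error_details = {
        **(plan.error_details or {}),
        "error_category": "transient",
        "error_phase": "execute",
        "retry_count": 3,
        "max_retries": 3,
        "is_retriable": False,
    }


def fail_with_overwrite(plan: FakePlan) -> None:
    # Mimics the direct assignment reported at plan_executor.py:818-822.
    try:
        raise RuntimeError("stub execution failed")
    except RuntimeError as last_exc:
        plan.error_details = {
            "exception_type": type(last_exc).__name__,
            "traceback": traceback.format_exc(),
            "mode": "stub",
        }


plan = FakePlan()
persist_recovery_metadata(plan)  # structured metadata is written first
fail_with_overwrite(plan)        # then the failure path replaces the dict
remaining_keys = sorted(plan.error_details)
```

After both calls, remaining_keys holds only the three raw exception keys, which matches the symptom reported for agents plan errors <PLAN_ID>.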

Subtasks

  • Replace direct assignment plan.error_details = {...} at plan_executor.py:818-822 with a dict merge (e.g., plan.error_details = {**(plan.error_details or {}), "exception_type": ..., "traceback": ..., "mode": "stub"})
  • Replace direct assignment plan.error_details = {...} at plan_executor.py:764-768 (success path) with a dict merge to preserve any pre-existing structured metadata
  • Add a BDD scenario in features/ covering the merge behaviour: verify that after ErrorRecoveryService writes structured metadata and execution subsequently fails, plan.error_details contains both the structured fields and the raw exception fields
  • Verify agents plan errors <PLAN_ID> output includes both structured error recovery fields (error_category, retry_count, recovery_action, etc.) and raw exception info (exception_type, traceback)
  • Update any related unit tests that assert the exact shape of plan.error_details after failure
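For the last subtask, tests that compare plan.error_details against an exact dict would start failing once extra keys survive the merge. One option, sketched here under the assumption that the tests are plain Python assertions, is to check for required keys instead of full equality. missing_fields is a hypothetical helper, and the sample dict values are illustrative only.

```python
from typing import Any, Iterable

# Field names as listed in this issue.
STRUCTURED_FIELDS = (
    "error_category",
    "error_phase",
    "retry_count",
    "max_retries",
    "is_retriable",
    "recovery_action",
    "recovery_hint",
    "recovery_command",
)
RAW_EXCEPTION_FIELDS = ("exception_type", "traceback", "mode")


def missing_fields(details: dict[str, Any], required: Iterable[str]) -> list[str]:
    """Return the required keys that are absent from `details`, sorted."""
    return sorted(set(required) - set(details))


# Shape expected after the fix (values invented for illustration).
merged_details = {
    "error_category": "transient",
    "retry_count": 3,
    "recovery_action": "retry",
    "exception_type": "RuntimeError",
    "traceback": "Traceback (most recent call last): ...",
    "mode": "stub",
}

# Shape produced by the current overwrite.
overwritten_details = {
    "exception_type": "RuntimeError",
    "traceback": "Traceback (most recent call last): ...",
    "mode": "stub",
}

missing_after_fix = missing_fields(
    merged_details, ("error_category", "retry_count", "recovery_action")
)
missing_before_fix = missing_fields(overwritten_details, STRUCTURED_FIELDS)
```

A test written this way keeps passing if further fields are added to plan.error_details later, while still failing when a required field is dropped.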

Definition of Done

  • Direct assignment at plan_executor.py:818-822 replaced with a non-destructive dict merge
  • Direct assignment at plan_executor.py:764-768 replaced with a non-destructive dict merge
  • BDD scenario added and passing that asserts structured metadata is preserved after failure
  • agents plan errors <PLAN_ID> displays both structured error recovery data and raw exception info
  • No regressions in existing plan_executor and error_recovery_service tests
  • All nox stages pass
  • Coverage >= 97%

Backlog note: This issue was discovered during autonomous operation
on milestone v3.3.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-uat-tester

HAL9000 added this to the v3.5.0 milestone 2026-04-09 03:11:49 +00:00
Blocks
#368 Epic: Subplans & Parallelism
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#4036