fix(plan): executor overwrites error_details destroying strategy_decisions_json and hardcodes mode as stub #10874

Closed
opened 2026-04-27 14:35:42 +00:00 by hamza.khyari · 0 comments
Member

Summary

_run_execute_with_stub() in PlanExecutor has three bugs:

  1. Overwrites error_details entirely (line 1046) — replaces the plan's error_details with {"mode": "stub", "tool_calls_count": ..., "sandbox_refs_count": ...}, destroying strategy_decisions_json stored by run_strategize() at line 774. On execute retry, _build_decisions() can't find the stored strategy and falls back to parsing definition_of_done — losing the full decision hierarchy, dependency ordering, and parent/child structure.

  2. "mode": "stub" hardcoded (lines 1049, 1113, 1122) — always reports the execute mode as "stub" even when LLMExecuteActor ran a real LLM call. Misleading for debugging and monitoring.

  3. Method misnamed: _run_execute_with_stub runs whatever actor was passed to the constructor (including LLMExecuteActor). The name implies it only runs the stub actor.
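
To make bug 1 concrete, here is a minimal sketch of the overwrite using plain dicts. The `overwrite` helper and the sample strategy payload are hypothetical stand-ins, not the actual `PlanExecutor` code; only the three written keys are taken from the description above.

```python
# Hypothetical state of plan.error_details after run_strategize() stored the strategy.
error_details = {"strategy_decisions_json": '[{"id": "d1", "parent": null}]'}


def overwrite(details, tool_calls_count, sandbox_refs_count):
    """Mimic the reported behaviour: return a fresh dict and ignore existing keys."""
    return {
        "mode": "stub",  # hardcoded, regardless of which actor ran
        "tool_calls_count": str(tool_calls_count),
        "sandbox_refs_count": str(sandbox_refs_count),
    }


after_execute = overwrite(error_details, 3, 1)

# The stored strategy is gone once execute has written its metadata.
strategy_lost = "strategy_decisions_json" not in after_execute
```

Because the new dict is built from scratch, every key written by an earlier phase disappears, not just `strategy_decisions_json`.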

Metadata

  • Commit Message: fix(plan): preserve strategy_decisions_json in error_details during execute and report actual actor mode
  • Branch: bugfix/executor-error-details-overwrite

Impact

  • Silent strategy loss: After a failed execute + retry, the LLM re-executes with a degraded strategy (parsed from definition_of_done text instead of the full decision tree). This produces lower quality results — the LLM loses step ordering, dependencies, and parent/child relationships.
  • Misleading diagnostics: All plans report mode: stub in error_details regardless of whether a real LLM or stub actor executed them.
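
The degraded retry path can be illustrated with a rough sketch. The real `_build_decisions()` is not shown in this issue, so the function below is an assumption about its shape: it prefers the stored strategy and otherwise parses `definition_of_done` line by line. The decision fields (`id`, `parent`, `depends_on`, `title`) are hypothetical.

```python
import json


def build_decisions(error_details, definition_of_done):
    """Hypothetical lookup: use the stored strategy if present, else parse text."""
    stored = (error_details or {}).get("strategy_decisions_json")
    if stored:
        # Full decision tree, including parent links and dependencies.
        return json.loads(stored)
    # Fallback: one flat decision per non-empty line, with no hierarchy or ordering.
    return [
        {"title": line.strip()}
        for line in definition_of_done.splitlines()
        if line.strip()
    ]


full = build_decisions(
    {"strategy_decisions_json": json.dumps(
        [{"id": "d2", "parent": "d1", "depends_on": ["d1"]}]
    )},
    "step a\nstep b",
)
degraded = build_decisions({"mode": "stub"}, "step a\n\nstep b")
```

In this sketch the fallback result carries only titles, which is the kind of information loss the impact section describes.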

Fix

  1. Merge error_details instead of replacing:

    existing = dict(plan.error_details or {})
    existing.update({
        "tool_calls_count": str(result.tool_calls_count),
        "sandbox_refs_count": str(len(result.sandbox_refs)),
        "mode": type(self._execute_actor).__name__,
    })
    plan.error_details = existing
    
  2. Report actual actor type instead of hardcoded "stub":

    "mode": type(self._execute_actor).__name__,
    
  3. Rename method from _run_execute_with_stub to _run_execute_with_actor.
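
The merge from step 1 can be exercised in isolation. This is a self-contained sketch under stated assumptions: `Plan`, `ExecuteResult`, the empty actor classes, and the `record_execute_metadata` helper are hypothetical stand-ins for the real types, and only the merge logic mirrors the snippet above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Plan:
    """Hypothetical stand-in for the plan model."""
    error_details: Optional[dict] = None


@dataclass
class ExecuteResult:
    """Hypothetical stand-in for the actor's result."""
    tool_calls_count: int = 0
    sandbox_refs: list = field(default_factory=list)


class ExecuteStubActor:
    """Placeholder for the stub actor."""


class LLMExecuteActor:
    """Placeholder for the LLM-backed actor."""


def record_execute_metadata(plan, result, actor):
    """Merge execute metadata into error_details, keeping existing keys."""
    merged = dict(plan.error_details or {})  # copy, and tolerate None
    merged.update({
        "tool_calls_count": str(result.tool_calls_count),
        "sandbox_refs_count": str(len(result.sandbox_refs)),
        "mode": type(actor).__name__,  # actual actor class, not "stub"
    })
    plan.error_details = merged
    return plan


plan = Plan(error_details={"strategy_decisions_json": "[]"})
record_execute_metadata(plan, ExecuteResult(2, ["ref-1"]), LLMExecuteActor())
```

After the call, `plan.error_details` still holds `strategy_decisions_json` alongside the three execute keys, and `mode` is `"LLMExecuteActor"`. One design note: `dict.update` still overwrites keys that share a name, so a `mode` value from an earlier run is replaced by the latest one.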

Subtasks

  • Change line 1046 to merge error_details instead of replacing (preserve strategy_decisions_json)
  • Change lines 1049, 1113, 1122 to use type(self._execute_actor).__name__ instead of "stub"
  • Rename _run_execute_with_stub to _run_execute_with_actor
  • Add Behave test verifying strategy_decisions_json survives execute
  • Add Behave test verifying mode reflects actual actor type

Definition of Done

  • strategy_decisions_json preserved in error_details after execute
  • mode reflects actual actor type (LLMExecuteActor, ExecuteStubActor, etc.)
  • Method renamed
  • Behave tests pass
  • M1 E2E passes
  • All 15 scenarios pass
hamza.khyari added this to the v3.5.0 milestone 2026-04-27 14:35:42 +00:00
Reference
cleveragents/cleveragents-core#10874