UAT: RuntimeExecuteActor.execute() records stub invocations instead of dispatching real tool calls based on decision content #3819

Open
opened 2026-04-06 06:43:46 +00:00 by freemo · 0 comments
Owner

Metadata

  • Branch: fix/runtime-execute-actor-real-tool-dispatch
  • Commit Message: fix(executor): implement real tool dispatch in RuntimeExecuteActor
  • Milestone: (backlog — see note below)
  • Parent Epic: #368

Backlog note: This issue was discovered during autonomous operation
on milestone v3.4.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.

Summary

RuntimeExecuteActor.execute() in src/cleveragents/application/services/plan_execution_context.py (lines 328-442) records a stub tool invocation for every decision node instead of dispatching real tool calls based on the decision content. As a result, the "runtime" execute actor is functionally equivalent to the stub actor: it never executes any tools.

What Was Tested

Code-level analysis of RuntimeExecuteActor.execute() against the specification §19513-19551 (Tool-Based Resource Modification) and §19205-19226 (Execute Phase).

Expected Behavior (from spec)

Per specification §19513-19551:

  • The Execute phase should dispatch real tool calls based on the decision tree
  • LLMs call tools directly (edit_file(), write_file(), etc.)
  • Tools operate on the sandbox
  • ChangeSet is built from actual tool invocations

Per docs/reference/plan_execute.md:

  • RuntimeExecuteActor wraps ToolRunner to execute strategy decisions with changeset capture
  • In later milestones, this will dispatch real tool calls based on the decision content

Actual Behavior

In plan_execution_context.py lines 381-404, for each decision, the actor:

  1. Calls self._tool_runner.discover() (just lists available tools, doesn't execute any)
  2. Creates a ToolInvocation with tool_name="stub/execute-step" — a hardcoded stub name
  3. Records a ChangeEntry with operation=ChangeOperation.MODIFY and path=f"decisions/{decision_id}" — a synthetic path, not a real resource path

No actual tool is ever executed. The ToolRunner.execute() method is never called. The changeset contains only synthetic stub entries.
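The behavior described above can be illustrated with a minimal, self-contained sketch. It does not reproduce the code in plan_execution_context.py. The classes below are simplified stand-ins that reuse the names from this issue, and `RecordingToolRunner` and `stub_execute` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class ChangeOperation(Enum):
    # Stand-in for the real enum; only MODIFY is mentioned in this issue.
    MODIFY = "modify"


@dataclass
class ToolInvocation:
    tool_name: str


@dataclass
class ChangeEntry:
    operation: ChangeOperation
    path: str


@dataclass
class RecordingToolRunner:
    """Hypothetical runner that records which of its methods were called."""

    calls: list = field(default_factory=list)

    def discover(self):
        self.calls.append("discover")
        return ["edit_file", "write_file"]

    def execute(self, tool_name, **arguments):
        self.calls.append("execute")
        return {"tool": tool_name, **arguments}


def stub_execute(tool_runner, decision_ids):
    """Mimics the reported loop: discover, then record synthetic entries."""
    invocations, changes = [], []
    for decision_id in decision_ids:
        tool_runner.discover()  # lists tools but runs none
        invocations.append(ToolInvocation(tool_name="stub/execute-step"))
        changes.append(
            ChangeEntry(ChangeOperation.MODIFY, f"decisions/{decision_id}")
        )
    return invocations, changes
```

Running `stub_execute(RecordingToolRunner(), ["d1", "d2"])` leaves the runner's `calls` list containing only `"discover"` entries, which mirrors the symptom reported here.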

Code Location

  • src/cleveragents/application/services/plan_execution_context.py, lines 381-404
  • The comment on line 335 explicitly states: "For each decision, discovers available tools and records a stub invocation entry for each decision"
  • Line 388: tool_name="stub/execute-step" — hardcoded stub name

Impact

Plans in runtime mode (with execution_context set) appear to execute successfully but no actual tools are invoked. The changeset produced contains only synthetic entries. This means:

  • No real file modifications occur during Execute phase
  • The sandbox is never actually populated with work
  • Apply phase has nothing real to apply

Steps to Reproduce

  1. Create a PlanExecutionContext with a real ToolRunner
  2. Create a RuntimeExecuteActor with the context
  3. Call actor.execute(decisions) with real decisions
  4. Observe that ToolRunner.execute() is never called — only ToolRunner.discover() is called
  5. Observe that the changeset contains only stub/execute-step entries
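The observations in steps 4 and 5 could be automated roughly as follows. This is a sketch under stated assumptions: it uses `unittest.mock.MagicMock` in place of a real ToolRunner, and `find_stub_symptoms`, `run_actor`, and `stubbed_actor` are hypothetical names. The real constructor signatures of PlanExecutionContext and RuntimeExecuteActor are not given in this issue, so the actor is represented by a plain callable that returns `(tool_name, path)` pairs.

```python
from unittest.mock import MagicMock


def find_stub_symptoms(run_actor, decisions):
    """Run an actor callable against a mock runner and report stub symptoms.

    `run_actor(runner, decisions)` is assumed to return a list of
    (tool_name, path) pairs describing the recorded changeset.
    """
    runner = MagicMock(name="ToolRunner")
    runner.discover.return_value = ["edit_file", "write_file"]

    entries = run_actor(runner, decisions)

    symptoms = []
    if not runner.execute.called:
        symptoms.append("ToolRunner.execute() was never called")
    if entries and all(name == "stub/execute-step" for name, _ in entries):
        symptoms.append("changeset contains only stub/execute-step entries")
    return symptoms


def stubbed_actor(runner, decisions):
    """Hypothetical stand-in that behaves the way this issue describes."""
    entries = []
    for decision_id in decisions:
        runner.discover()
        entries.append(("stub/execute-step", f"decisions/{decision_id}"))
    return entries
```

With the stand-in above, `find_stub_symptoms(stubbed_actor, ["d1"])` reports both symptoms. An actor that dispatches through `runner.execute()` and records real tool names would return an empty list.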

Subtasks

  • Implement real tool dispatch in RuntimeExecuteActor.execute() based on decision content
  • Map decision step_text to appropriate tool calls
  • Ensure ToolRunner.execute() is called for each tool invocation
  • Record real ChangeEntry objects from actual tool outputs
  • Add Behave tests for real tool dispatch behavior
  • Run nox (all default sessions), fix any errors
  • Verify coverage >= 97% via nox -s coverage_report
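One possible shape for the first four subtasks is sketched below. It is not the project's implementation: how `step_text` maps to tool calls is left open by this issue, so the mapping is passed in as a callable. `ToolCall`, `ChangeRecord`, `InMemoryToolRunner`, `demo_planner`, and `dispatch_decisions` are all hypothetical names, and the assumed return shape of `execute()` (a dict with `operation` and `path`) is an illustration only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolCall:
    tool_name: str
    arguments: dict


@dataclass(frozen=True)
class ChangeRecord:
    operation: str
    path: str


@dataclass
class InMemoryToolRunner:
    """Hypothetical runner that 'writes' files into a dict-backed sandbox."""

    files: dict = field(default_factory=dict)
    executed: list = field(default_factory=list)

    def execute(self, tool_name, **arguments):
        self.executed.append((tool_name, arguments))
        path = arguments["path"]
        operation = "modify" if path in self.files else "create"
        self.files[path] = arguments.get("content", "")
        return {"operation": operation, "path": path}


def demo_planner(step_text):
    """Toy mapping from step text to tool calls (assumption, not the spec)."""
    verb, _, target = step_text.partition(" ")
    if verb == "write" and target:
        return [ToolCall("write_file", {"path": target, "content": ""})]
    return []


def dispatch_decisions(decisions, plan_tool_calls, tool_runner):
    """Execute every planned tool call and build changes from real results."""
    changes = []
    for decision in decisions:
        for call in plan_tool_calls(decision["step_text"]):
            result = tool_runner.execute(call.tool_name, **call.arguments)
            changes.append(ChangeRecord(result["operation"], result["path"]))
    return changes
```

The point of the sketch is that each change entry is derived from the result of an executed tool call rather than from a synthetic `decisions/{decision_id}` path.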

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • RuntimeExecuteActor.execute() dispatches real tool calls via ToolRunner.execute() based on decision content
  • Changeset contains real change entries from actual tool invocations
  • All existing tests pass
  • New Behave tests cover the real dispatch behavior
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
  • All nox stages pass
  • Coverage >= 97%
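The commit message requirement above (exact first line, a blank line, then detail lines) can be checked mechanically. The helper below is a small illustrative sketch, and `commit_message_is_valid` is a hypothetical name, not part of the project's tooling.

```python
EXPECTED_SUBJECT = (
    "fix(executor): implement real tool dispatch in RuntimeExecuteActor"
)


def commit_message_is_valid(message, expected_subject=EXPECTED_SUBJECT):
    """Check subject line, blank separator line, and a non-empty body."""
    lines = message.split("\n")
    if len(lines) < 3:
        return False
    subject_ok = lines[0] == expected_subject
    separator_ok = lines[1] == ""
    body_ok = any(line.strip() for line in lines[2:])
    return subject_ok and separator_ok and body_ok
```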

Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-new-issue-creator

Blocks
#368 Epic: Subplans & Parallelism
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#3819