agents actor run does not execute tool calls when a skill is attached #11211

Closed
opened 2026-05-14 09:27:12 +00:00 by hurui200320 · 3 comments
Member

Metadata

Commit Message: feat(actor-run): wire ToolCallingRuntime into actor run for skill-based tool calling
Branch: feature/m3-actor-run-tool-calling

Background and Context

agents actor run is the primary way to run a named actor in isolation. The spec (§agents actor run, example 2) shows that when a skill is attached via --skill, the actor should perform real LLM tool calls — the example shows Tool Calls: 6 with --skill local/code-analysis. The spec's general model states: "When an actor references a skill, all of that skill's tools become available to the actor's LLM agent for tool-calling."

Currently the entire actor run execution path goes through SimpleLLMAgent or SimpleToolAgent in the reactive layer — neither of which is capable of LLM tool calling. The ToolCallingRuntime (which implements the proper tool-call loop) is never instantiated in this path.

Current Behavior

When agents actor run local/reviewer "Review the auth module" is executed (with or without --skill), the LLM receives exactly two messages:

SystemMessage: <actor system_prompt>
HumanMessage:  <user prompt>

No tool schemas are passed to the LLM. The LLM cannot call any tools. If --skill is attached, the resolved skill tools are either silently dropped (if the actor config has no base tools: list) or handed to SimpleToolAgent, which only applies hardcoded string transforms (identity, uppercase, lowercase) — it is not an LLM tool-calling agent at all.

Three specific bugs compound this:

  1. SimpleLLMAgent calls llm.invoke(messages) with no tools= binding — the LLM never receives tool schemas.
  2. ToolCallingRuntime is never instantiated in the actor run path; GraphExecutor goes directly to SimpleLLMAgent.process().
  3. Skill tool injection is silently dropped in _make_agent_instance() (reactive/application.py) when the actor config has no base tools: list — the elif branch logs a debug message and discards all resolved skill tools, even though the user explicitly passed --skill.

Built-in tools (read_file, write_file, list_files, etc.) are defined in tool/builtins/file_tools.py and register_file_tools() exists, but that function is never called anywhere in production code.

Expected Behavior

When agents actor run --skill local/my-skill local/my-actor "Do something" is run:

  1. The skill's tools are resolved from the Tool Registry (including built-in tools, MCP tools, and custom tools).
  2. The LLM is invoked with tool schemas bound (via LangChain bind_tools() or equivalent).
  3. If the LLM requests a tool call, the tool is executed and the result is fed back to the LLM.
  4. The loop continues (up to max_iterations) until the LLM produces a final response with no tool calls.
  5. The tool_calls count in the output reflects actual tool invocations.

When no --skill is attached and the actor config has no skills:, the actor runs as a plain conversational LLM (current behaviour — this is correct).
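The five steps above can be sketched as a small loop. This is a minimal illustration, not the real `ToolCallingRuntime`: `ToolCall`, `Reply`, `run_tool_loop` and `scripted_llm` are hypothetical stand-ins, and raising once `max_iterations` is exhausted is an assumption, since the issue does not say what the runtime does in that case.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class Reply:
    content: str
    tool_calls: list = field(default_factory=list)


def run_tool_loop(llm, tools, prompt, max_iterations=5):
    """Call llm(messages) until it answers without requesting a tool."""
    messages = [("human", prompt)]
    executed = 0
    for _ in range(max_iterations):
        reply = llm(messages)
        if not reply.tool_calls:
            # Final response: no tool calls requested.
            return reply.content, executed
        messages.append(("ai", reply))
        for call in reply.tool_calls:
            result = tools[call.name](**call.args)
            executed += 1
            # Feed the tool result back to the LLM on the next turn.
            messages.append(("tool", f"{call.name}: {result}"))
    # Assumption: the real runtime may handle exhaustion differently.
    raise RuntimeError("max_iterations reached without a final response")


def scripted_llm(messages):
    """Stand-in LLM: asks for read_file once, then answers."""
    if not any(role == "tool" for role, _ in messages):
        return Reply("", [ToolCall("read_file", {"path": "src/auth.py"})])
    return Reply("Review complete.")


tools = {"read_file": lambda path: f"<contents of {path}>"}
content, tool_calls = run_tool_loop(scripted_llm, tools, "Review src/auth.py")
```

Here `tool_calls` counts executed tool invocations rather than LLM turns, which is the number the `Tool Calls:` line in the output is expected to report.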

Acceptance Criteria

  • agents actor run --skill local/file-ops local/reviewer "Review src/auth.py" causes the LLM to receive the read_file tool schema and successfully call it to read the file before producing a review.
  • agents actor run local/reviewer "Hello" (no skill) continues to work as a plain LLM call with no tool schemas — no regression.
  • Skill tools are NOT silently dropped when the actor config has no base tools: list; attaching --skill always enables tool calling regardless of the actor's base config.
  • register_file_tools() (or equivalent) is called so built-in tools are available to be resolved when a skill references them.
  • The tool_calls count in the actor run output accurately reflects the number of tool calls made.
  • All existing actor run BDD scenarios continue to pass.

Supporting Information

Key files involved:

  • src/cleveragents/reactive/application.py: _make_agent_instance(), _resolve_skills(), run_single_shot()
  • src/cleveragents/reactive/stream_router.py: SimpleLLMAgent.process()
  • src/cleveragents/tool/actor_runtime.py: ToolCallingRuntime.run_tool_loop() (the correct runtime to use)
  • src/cleveragents/tool/builtins/__init__.py: register_file_tools() (never called in production)
  • src/cleveragents/cli/commands/actor_run.py: CLI entry point
  • src/cleveragents/reactive/graph_executor.py: GraphExecutor._invoke_agent() (isinstance check needs updating)
  • src/cleveragents/application/services/session_caller.py: LangChainSessionCaller (reference pattern for new ToolCallingLLMCaller)

Spec reference: §agents actor run (example 2, tool_calls: 6); §"Dual Role of Tools"; §"Built-in Resource Tools".

Subtasks

  • ST-1 — Fix _make_agent_instance() in reactive/application.py: change the elif self._resolved_skill_tools and not tools: branch to unconditionally merge skill tools (tools = list(self._resolved_skill_tools)); update routing so if tools and agent_cfg.type == "llm" creates ToolCallingAgent, elif tools keeps SimpleToolAgent (non-LLM string-transform actors, no regression), elif agent_cfg.type == "llm" keeps SimpleLLMAgent (no-tools plain LLM, no regression)
  • ST-2 — Create ToolCallingLLMCaller in new file reactive/tool_caller.py: implements the LLMCaller protocol; resolves LLM from actor config via get_provider_registry().create_llm() (same pattern as SimpleLLMAgent._resolve_llm()); on first invoke() calls llm.bind_tools(tool_schemas) then invokes with [SystemMessage, HumanMessage]; on subsequent calls appends previous AIMessage + ToolMessage objects for tool results and invokes again; extracts tool calls from LangChain response (same pattern as LangChainSessionCaller); returns LLMResponse
  • ST-3 — Create ToolCallingAgent in new file reactive/tool_agent.py: __init__(name, actor_config, resolved_tool_entries, builtin_registry); _build_tool_registry() looks up each entry's name in builtin_registry, registers matching ToolSpec objects into a fresh per-actor ToolRegistry (warn and skip unresolvable names); process(content, metadata=None, context=None) creates ToolRunner + ToolCallingRuntime(registry, runner, ToolCallingLLMCaller(actor_config)), calls runtime.run_tool_loop(prompt), stores result, returns .content; also implement process_message_sync() for stream router compatibility; expose last_result: ToolCallRunResult | None for tool_calls count surfacing
  • ST-4 — Initialize shared built-in ToolRegistry in ReactiveCleverAgentsApp.__init__() (reactive/application.py): create self._builtin_registry = ToolRegistry() and call register_file_tools(), register_git_tools(), and register_subplan_tool() on it; pass it to ToolCallingAgent in _make_agent_instance() — this is the missing production call-site for register_file_tools()
  • ST-5 — Update GraphExecutor._invoke_agent() in reactive/graph_executor.py: add ToolCallingAgent to the isinstance(agent, (SimpleToolAgent | SimpleLLMAgent)) check so the global context dict is passed to process() for system prompt Jinja2 rendering (without this, the hasattr fallback branch invokes process() with only one argument)
  • ST-6 — Surface tool_calls count in actor run CLI output: after run_single_shot() completes, query registered agents for ToolCallingAgent.last_result to collect total tool call count; expose as app.last_run_tool_calls: int property; in actor_run.py print Tool Calls: {n} when count > 0 (matches spec example output tool_calls: 6)
  • ST-7 — Write BDD scenarios in new features/actor_run_tool_calling.feature: (A) tool call succeeds — mock LLM returns a read_file call, tool executes, assert tool_calls == 1; (B) multi-turn tool loop — mock LLM makes 2 consecutive calls before final response, assert tool_calls == 2 and iterations == 3; (C) no-skill plain LLM regression — no --skill, mock verifies no tool schemas sent, result returned correctly; (D) skill tools not silently dropped — actor config has no tools: list but --skill is passed, assert ToolCallingAgent is instantiated

Definition of Done

  • agents actor run --skill <skill-with-read_file> <actor> <prompt> successfully invokes read_file via the LLM tool-calling loop
  • agents actor run <actor> <prompt> (no skill) still works as a plain LLM call
  • nox passes (lint, typecheck, unit tests, coverage ≥ 97%)
  • BDD scenarios added for the tool-calling path
  • tool_calls in output is accurate
hurui200320 added this to the v3.2.0 milestone 2026-05-14 09:30:26 +00:00
hurui200320 changed title from agents actor run does not execute tool calls when a skill is attached to DRAFT: agents actor run does not execute tool calls when a skill is attached 2026-05-14 09:30:41 +00:00
hurui200320 changed title from DRAFT: agents actor run does not execute tool calls when a skill is attached to agents actor run does not execute tool calls when a skill is attached 2026-05-15 03:17:18 +00:00
Author
Member

Implementation Notes

Design Decisions

ToolCallingLLMCaller (reactive/tool_caller.py)

  • Implements the LLMCaller protocol from tool/actor_runtime.py
  • Resolves the LLM from actor config using get_provider_registry().create_llm() — same pattern as SimpleLLMAgent._resolve_llm()
  • On first invoke(): builds [SystemMessage, HumanMessage], calls llm.bind_tools(tool_schemas) to produce a tools-bound LLM variant
  • On subsequent calls: appends the prior AIMessage (stored from the previous response) + one ToolMessage per result entry, then re-invokes
  • Tool call extraction follows the same dict-unpacking pattern as LangChainSessionCaller in session_caller.py
  • Prompt sanitization (mechanism 2: boundary markers) is applied to both system_prompt and user content, consistent with SimpleLLMAgent
  • Jinja2 system prompt rendering is intentionally omitted in this iteration (the context dict is passed to ToolCallingAgent.process() but not threaded into the caller); can be added as a follow-on
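The first-call versus subsequent-call behaviour described above could look roughly like the sketch below. It deliberately avoids LangChain: `Msg` and `SketchCaller` are hypothetical stand-ins for `SystemMessage`/`HumanMessage`/`AIMessage`/`ToolMessage` and `ToolCallingLLMCaller`, and the `invoke(prompt, tool_results)` signature is an assumption, not the real `LLMCaller` protocol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Msg:
    role: str  # "system", "human", "ai" or "tool"
    content: str
    tool_call_id: Optional[str] = None


class SketchCaller:
    """Keeps the conversation history across invoke() calls."""

    def __init__(self, system_prompt, model):
        self._system_prompt = system_prompt
        self._model = model  # callable: list[Msg] -> Msg with role "ai"
        self._messages = []
        self._last_ai = None

    def invoke(self, prompt, tool_results=None):
        if not self._messages:
            # First call: system prompt plus the user's prompt.
            self._messages = [
                Msg("system", self._system_prompt),
                Msg("human", prompt),
            ]
        else:
            # Later calls: replay the prior AI turn, then one tool
            # message per result entry.
            if self._last_ai is not None:
                self._messages.append(self._last_ai)
            for call_id, output in tool_results or []:
                self._messages.append(Msg("tool", output, call_id))
        self._last_ai = self._model(list(self._messages))
        return self._last_ai


caller = SketchCaller("You are a reviewer.", lambda msgs: Msg("ai", f"saw {len(msgs)}"))
first = caller.invoke("Review src/auth.py")
second = caller.invoke("Review src/auth.py", [("call-1", "<file contents>")])
```

Storing the previous AI message and appending it only on the next call keeps each tool message directly after the AI turn that requested it, which is the ordering chat-style tool-calling APIs generally expect.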

ToolCallingAgent (reactive/tool_agent.py)

  • _build_tool_registry() creates a fresh per-run local ToolRegistry by looking up each entry's name in the shared _builtin_registry
  • Deduplication via seen: set[str] prevents double-registration when multiple skills reference the same builtin
  • Warn-and-skip policy for unresolvable entries (inline/MCP tools) ensures graceful degradation rather than hard failures
  • process() uses a lazy import of ToolCallingLLMCaller (inside the function body) to avoid circular import between tool_agent and tool_caller
  • last_result is exposed so the app can tally tool_calls count after each run
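The dedup and warn-and-skip behaviour of `_build_tool_registry()` can be sketched as follows. Plain dicts stand in for `ToolRegistry` and `ToolSpec`, and the entry shape (`{"name": ...}`) is an assumption made for illustration only.

```python
import logging

logger = logging.getLogger(__name__)


def build_tool_registry(entries, builtin_registry):
    """Return a fresh name -> spec mapping for one actor."""
    local = {}
    seen = set()
    for entry in entries:
        name = entry["name"]
        if name in seen:
            # Several skills may reference the same builtin.
            continue
        seen.add(name)
        spec = builtin_registry.get(name)
        if spec is None:
            # Inline/MCP tools are not resolvable yet: warn and skip.
            logger.warning("Skipping unresolvable tool %r", name)
            continue
        local[name] = spec
    return local


builtins = {"read_file": object(), "write_file": object()}
entries = [{"name": "read_file"}, {"name": "read_file"}, {"name": "mcp/search"}]
registry = build_tool_registry(entries, builtins)
```

With these inputs only `read_file` ends up in the per-actor registry: the duplicate is ignored and `mcp/search` is logged and skipped.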

ReactiveCleverAgentsApp (reactive/application.py) — ST-1 and ST-4

  • The elif self._resolved_skill_tools and not tools branch that silently dropped skill tools has been removed
  • New routing in _make_agent_instance():
    • tools AND type == "llm" → ToolCallingAgent (real LLM tool calling)
    • tools AND type != "llm" → SimpleToolAgent (string transforms, no regression)
    • no tools AND type == "llm" → SimpleLLMAgent (plain LLM, no regression)
    • no tools AND type != "llm" → identity lambda
  • _builtin_registry is created once in __init__() with all three builtin families registered: register_file_tools(), register_git_tools(), register_subplan_tool() — this is the missing production call-site
  • _tally_tool_calls() iterates stream_router.agents for ToolCallingAgent instances and sums their tool_call_history lengths
  • _last_run_tool_calls is reset at the start of each run_single_shot() call to avoid stale counts across runs
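The four-way routing in `_make_agent_instance()` reduces to a small decision function. The sketch below returns class names as strings so it stays self-contained; `pick_agent_kind` is a hypothetical helper, not a function in the codebase.

```python
def pick_agent_kind(agent_type, tools):
    """Mirror the routing table: tools and type decide the agent class."""
    if tools and agent_type == "llm":
        return "ToolCallingAgent"  # real LLM tool calling
    if tools:
        return "SimpleToolAgent"  # string transforms, unchanged
    if agent_type == "llm":
        return "SimpleLLMAgent"  # plain LLM, unchanged
    return "identity"  # pass-through lambda


kinds = [
    pick_agent_kind("llm", ["read_file"]),
    pick_agent_kind("tool", ["uppercase"]),
    pick_agent_kind("llm", []),
    pick_agent_kind("tool", []),
]
```

Because skill tools are merged into `tools` before this decision, an LLM actor with `--skill` attached always lands in the first branch, even when its config has no base `tools:` list.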

GraphExecutor (reactive/graph_executor.py) — ST-5

  • ToolCallingAgent added to the isinstance(agent, (SimpleToolAgent, SimpleLLMAgent, ToolCallingAgent)) check in _invoke_agent()
  • This ensures the context dict (with Jinja2 variables for system prompt rendering) is passed through to process()
  • Note: ToolCallingAgent.process() currently receives context but does not use it for Jinja2 rendering — that's a follow-on

CLI output (cli/commands/actor_run.py) — ST-6

  • app_exec.last_run_tool_calls is read after the run and printed as Tool Calls: {n} when > 0
  • Test mocks updated to set app_exec.last_run_tool_calls = 0 (Python 3.13 raises TypeError when comparing MagicMock > int via __gt__)
  • Same fix applied to robot/helper_actor_run_signature.py to fix one robot integration test
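The mock failure is easy to reproduce with the standard library alone. In this sketch `app_exec` is just a bare `MagicMock`; nothing from the project is imported.

```python
from unittest.mock import MagicMock

app_exec = MagicMock()

# An unset attribute on a MagicMock is itself a MagicMock, and its
# ordering methods return NotImplemented by default, so "> 0" fails.
try:
    app_exec.last_run_tool_calls > 0
    raised = False
except TypeError:
    raised = True

# The fix used in the test mocks: give the attribute a real int.
app_exec.last_run_tool_calls = 0
show_tool_calls = app_exec.last_run_tool_calls > 0
```

Setting the attribute to `0` also keeps the `Tool Calls:` line out of the output in tests that do not exercise tool calling.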

Key Code Locations

  • reactive/tool_caller.py: ToolCallingLLMCaller (new)
  • reactive/tool_agent.py: ToolCallingAgent (new)
  • reactive/application.py: _make_agent_instance(), _tally_tool_calls(), last_run_tool_calls property, _builtin_registry initialization
  • reactive/graph_executor.py: _invoke_agent() isinstance check
  • cli/commands/actor_run.py: tool_calls display
  • features/actor_run_tool_calling.feature + features/steps/actor_run_tool_calling_steps.py: BDD coverage (22 scenarios)

Test Results

  • lint: passes
  • typecheck: passes (0 errors)
  • unit_tests: 15,750 scenarios pass, 0 fail
  • coverage_report: 96.5% (meets the 96.5% threshold)
  • integration_tests: 1 robot test failure pre-fix (MagicMock comparison bug in helper), fixed by adding last_run_tool_calls = 0 to test mocks

Observability Tests Updated

  • features/reactive_application_coverage_boost.feature scenario "Reactive app skill injection skips LLM agents without tools" renamed and updated to verify the new correct behavior: LLM actor with skill tools attached is now a ToolCallingAgent (not silently kept as SimpleLLMAgent)

Follow-on Work

  • Jinja2 system prompt rendering in ToolCallingLLMCaller.invoke() using the context dict
  • Inline/MCP tool support in ToolCallingAgent._build_tool_registry() (currently warns and skips)
  • ToolCallingAgent currently builds a new ToolRunner per process() call; could be cached for performance if needed
Author
Member

Implementation Note: Fix provider_format bug (review #263845)

Review comment addressed: #263845: bind_tools fails with 'parameters' because the wrong provider format is passed to ToolCallingRuntime.

Root cause

ToolCallingAgent.process() constructed ToolCallingRuntime without specifying provider_format, so it defaulted to ProviderFormat.LANGCHAIN. This caused normalize_tool_schema_for_provider() to emit tool schemas with args_schema keys. LangChain's convert_to_openai_function() (called internally by bind_tools()) only preserves name, description, parameters, and strict — it silently drops args_schema, leaving parameters absent. The LLM provider then rejects the call with 'parameters', and ToolCallingLLMCaller._resolve_llm() catches the exception and falls back to a plain LLM with no tools bound.

Fix

Added static method ToolCallingAgent._resolve_provider_format(actor_config) in src/cleveragents/reactive/tool_agent.py that maps the actor config's provider field to the correct ProviderFormat:

  • "anthropic" → ProviderFormat.ANTHROPIC (emits input_schema, which convert_to_openai_function maps to parameters)
  • All others ("openai", "google", "groq", "azure", "openrouter", "gemini", "cohere", "together", "mock", unknown, None) → ProviderFormat.OPENAI (emits parameters directly)

ProviderFormat.LANGCHAIN is never used in the ToolCallingRuntime → bind_tools() pipeline because its args_schema output is incompatible with convert_to_openai_function().
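The mapping can be sketched as below. The `ProviderFormat` enum here is a local stand-in with assumed member values, the actor config is treated as a plain dict, and exact lowercase matching on `provider` is an assumption; the real `_resolve_provider_format()` may normalise differently.

```python
from enum import Enum


class ProviderFormat(Enum):
    # Stand-in for the project's enum; values are illustrative.
    OPENAI = "openai"
    ANTHROPIC = "anthropic"
    LANGCHAIN = "langchain"


def resolve_provider_format(actor_config):
    """Map the actor's provider to a schema format bind_tools() accepts."""
    provider = actor_config.get("provider")
    if provider == "anthropic":
        return ProviderFormat.ANTHROPIC
    # Every other provider, unknown values and None use the OpenAI shape.
    return ProviderFormat.OPENAI


formats = [
    resolve_provider_format({"provider": "anthropic"}),
    resolve_provider_format({"provider": "groq"}),
    resolve_provider_format({}),
]
```

Note that no input ever yields `ProviderFormat.LANGCHAIN`, which matches the statement that this format is never used in the pipeline.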

Files changed

  • src/cleveragents/reactive/tool_agent.py — Added ProviderFormat import, _resolve_provider_format() static method, and provider_format= kwarg to ToolCallingRuntime(...) constructor call in process().
  • features/actor_run_tool_calling.feature — Added 7 new scenarios (Q-1 through Q-5, R-1) testing provider format resolution and end-to-end format propagation.
  • features/steps/actor_run_tool_calling_steps.py — Added step definitions for the new scenarios.

Quality gates

  • lint ✅ · typecheck ✅ · unit_tests ✅ (15,776 scenarios, 0 failed) · integration_tests ✅ (1,977 passed) · coverage ✅ (96.5%)
  • Commit amended (5936a19d), force pushed.
Author
Member

Implementation note: Tool name encoding fix (per review comment #263890)

Fixed the tool name encoding to use uppercase sentinels instead of __ to avoid collisions with legitimate tool names containing __.

Changes in src/cleveragents/reactive/tool_caller.py:

  • Added _encode_tool_name(name: str) -> str at module level: replaces : with _C_ (Colon) and / with _S_ (Slash).
  • Added _decode_tool_name(name: str) -> str at module level: reverses the encoding.
  • Outbound in _resolve_llm(): schemas are shallow-copied and their name field encoded via _encode_tool_name() before bind_tools().
  • Inbound in invoke(): raw tool call names from the LLM response are decoded via _decode_tool_name() before constructing LLMToolCall objects.

Why uppercase sentinels are safe:

Valid CleverAgents tool names (per the internal name regex) only allow lowercase letters, digits, -, _, :, and /. Uppercase letters like S and C are forbidden — so _S_ and _C_ can never legitimately appear in a valid tool name. Anthropic's pattern ^[a-zA-Z0-9_-]{1,128}$ allows uppercase, making these sentinels pass validation.
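A minimal sketch of the encoding, based only on the substitutions described above. The function names drop the leading underscore to mark them as illustrative, and the sample tool name is hypothetical; the regex is the Anthropic pattern quoted in this comment.

```python
import re

# Anthropic's tool-name pattern, as quoted above.
ANTHROPIC_NAME = re.compile(r"^[a-zA-Z0-9_-]{1,128}$")


def encode_tool_name(name):
    """Replace characters providers reject with uppercase sentinels."""
    return name.replace(":", "_C_").replace("/", "_S_")


def decode_tool_name(name):
    """Reverse encode_tool_name()."""
    return name.replace("_C_", ":").replace("_S_", "/")


original = "local/file-ops:read_file"  # hypothetical tool name
encoded = encode_tool_name(original)
decoded = decode_tool_name(encoded)

# A name containing "__" passes through untouched, which the earlier
# "__"-based scheme could not guarantee.
untouched = decode_tool_name(encode_tool_name("my__tool"))
```

The round trip is lossless only because valid tool names cannot contain uppercase letters, so `_C_` and `_S_` never occur in an unencoded name.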

Tests:

12 new BDD scenarios in features/actor_run_tool_calling.feature section S-U covering encoding, decoding, round-trip correctness, and integration with _resolve_llm and invoke.

Quality gates:

  • lint ✅ · typecheck ✅ · unit_tests ✅ (all 44 tool_calling scenarios pass)
  • integration_tests ✅ (1999/1999) · coverage ✅ (96.51%, pre-existing)
hurui200320 2026-05-18 06:25:41 +00:00
Reference
cleveragents/cleveragents-core#11211