feat(ActorResult): implement ActorResult and NodeUsage types; capture per-node token counts from LangChain responses #14

Open
opened 2026-06-03 05:59:42 +00:00 by hurui200320 · 0 comments

Background

Executor.execute() currently returns a plain str. The CleverThis router needs per-node token counts to calculate billing — prompt_tokens and completion_tokens from each LLM node invocation. These must be returned by the library; the router must not import LangChain directly.

Currently LLMAgent.process_message() reads only response.content from the LangChain response and discards all usage metadata.

Spec references: ADR-2027 (ActorResult and Token Counting), Actor Configuration Standard Glossary

Previously listed as depending on #13 (Executor must exist before the execute() return type can be updated); that dependency is withdrawn. Implement concurrently with #13 on the same feature branch. The structural precondition (Executor existing) is already met by the bot's partial implementation. Both tickets are blocked on #12, and both require modifying the same two methods (_execute_llm() and _execute_graph() in runtime.py), so splitting them across separate branches would cause double churn and merge conflicts on those methods.

Current State (Post-Bot Commit e7a7d39)

A bot pushed e7a7d39 directly to master, partially touching the scope of this ticket.
Three critical deviations from the spec remain:

  1. Wrong module location: ActorResult and NodeUsage are defined in runtime.py rather than the spec'd cleveractors/result.py. All imports and __init__.py re-exports must be updated after the move.

  2. Estimated tokens instead of real LangChain metadata: Every execution path calls _estimate_tokens() (tiktoken when available, 4-chars/token heuristic otherwise). AC2 mandates extraction from response.usage_metadata with response.response_metadata.get("token_usage", {}) as fallback.

  3. LLMAgent and PureLangGraph internals untouched: process_message() and _execute_from_node() still discard all usage metadata — the bot's Executor bypasses them entirely and estimates instead.
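
For reference, deviation 2 could be addressed along these lines. This is a minimal sketch, not code from the repo: the helper name extract_token_usage, the logger name, and the SimpleNamespace stand-ins for LangChain responses are all assumptions for illustration. The key names (input_tokens/output_tokens in usage_metadata, prompt_tokens/completion_tokens in token_usage) follow what LangChain and OpenAI-style providers commonly report; other providers may use different keys.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("cleveractors.usage")  # hypothetical logger name


def extract_token_usage(response) -> tuple[int, int]:
    """Return (prompt_tokens, completion_tokens) for one LLM response.

    Hypothetical helper: prefers usage_metadata, falls back to
    response_metadata["token_usage"], otherwise warns and returns zeros.
    """
    usage = getattr(response, "usage_metadata", None)
    if usage:
        return int(usage.get("input_tokens", 0)), int(usage.get("output_tokens", 0))

    metadata = getattr(response, "response_metadata", None) or {}
    token_usage = metadata.get("token_usage", {})
    if token_usage:
        return (
            int(token_usage.get("prompt_tokens", 0)),
            int(token_usage.get("completion_tokens", 0)),
        )

    logger.warning("No token usage metadata on LLM response; recording 0 tokens")
    return 0, 0


# Stand-ins for LangChain responses (assumed shapes, not real AIMessage objects).
primary = SimpleNamespace(
    content="hi",
    usage_metadata={"input_tokens": 12, "output_tokens": 5, "total_tokens": 17},
    response_metadata={},
)
fallback = SimpleNamespace(
    content="hi",
    usage_metadata=None,
    response_metadata={"token_usage": {"prompt_tokens": 9, "completion_tokens": 3}},
)
missing = SimpleNamespace(content="hi", usage_metadata=None, response_metadata={})

print(extract_token_usage(primary))   # (12, 5)
print(extract_token_usage(fallback))  # (9, 3)
print(extract_token_usage(missing))   # (0, 0), plus a logged warning
```

Unlike _estimate_tokens(), nothing here inspects the message text: the counts are whatever the provider reported, or 0 with a warning when it reported nothing.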

What Is Currently Missing

  • cleveractors/result.py does not exist — ActorResult and NodeUsage are stranded in runtime.py (wrong location per AC1).
  • LLMAgent.process_message() reads only response.content — no token usage captured.
  • PureLangGraph._execute_from_node() does not collect per-node token data.
  • Executor.execute() returns ActorResult but all token counts are estimated (_estimate_tokens()), not read from LangChain usage_metadata.

Acceptance Criteria

  1. Define in cleveractors/result.py:
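    from dataclasses import dataclass

    @dataclass
    class NodeUsage:
        # Token usage for a single LLM node invocation.
        node_id: str
        provider: str
        model: str
        prompt_tokens: int
        completion_tokens: int

    @dataclass
    class ActorResult:
        # Final response plus usage totals and the per-node breakdown.
        response: str
        prompt_tokens: int      # sum over all nodes
        completion_tokens: int  # sum over all nodes
        nodes: list[NodeUsage]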
    
  2. LLMAgent.process_message() extracts token usage from response.usage_metadata (primary) with fallback to response.response_metadata.get("token_usage", {}). If no usage data is available: log a warning and use 0.
  3. process_message() returns token counts alongside the response string.
  4. PureLangGraph._execute_from_node() collects (node_id, provider, model, prompt_tokens, completion_tokens) per LLM node invocation.
  5. Executor.execute() aggregates into ActorResult and returns it (breaking change from str).
  6. Aggregation invariant: result.prompt_tokens == sum(n.prompt_tokens for n in result.nodes), and likewise for completion_tokens (both totals are defined as sums over all nodes).
  7. ActorResult and NodeUsage exported from cleveractors/__init__.py and __all__.
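
Criteria 4 to 6 might fit together roughly as follows. This is a sketch under assumptions: aggregate is a hypothetical helper name, and the node IDs, provider, and model strings are placeholder values rather than anything from the spec. The dataclasses are repeated from criterion 1 so the snippet runs on its own.

```python
from dataclasses import dataclass


@dataclass
class NodeUsage:
    node_id: str
    provider: str
    model: str
    prompt_tokens: int
    completion_tokens: int


@dataclass
class ActorResult:
    response: str
    prompt_tokens: int      # sum over all nodes
    completion_tokens: int  # sum over all nodes
    nodes: list[NodeUsage]


def aggregate(response: str, nodes: list[NodeUsage]) -> ActorResult:
    """Hypothetical helper: derive both totals from the per-node records."""
    return ActorResult(
        response=response,
        prompt_tokens=sum(n.prompt_tokens for n in nodes),
        completion_tokens=sum(n.completion_tokens for n in nodes),
        nodes=list(nodes),
    )


# Placeholder per-node records, as _execute_from_node() might collect them.
collected = [
    NodeUsage("plan", "example-provider", "example-model", 120, 30),
    NodeUsage("answer", "example-provider", "example-model", 200, 85),
]
result = aggregate("final answer", collected)

# The aggregation invariant from criterion 6, checked for both counters.
assert result.prompt_tokens == sum(n.prompt_tokens for n in result.nodes)
assert result.completion_tokens == sum(n.completion_tokens for n in result.nodes)
print(result.prompt_tokens, result.completion_tokens)  # 320 115
```

Computing the totals in one place from the node list means the invariant holds by construction, and an execution with no LLM nodes yields zero totals and an empty nodes list.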

Subtasks

  • Create cleveractors/result.py; move NodeUsage and ActorResult from runtime.py into it; update runtime.py import
  • Refactor LLMAgent.process_message() to extract and return token usage from LangChain response (usage_metadata primary, response_metadata["token_usage"] fallback)
  • Update PureLangGraph._execute_from_node() to collect per-node token usage and thread it back through LLMAgent
  • Remove _estimate_tokens() from runtime.py; update Executor._execute_llm() and _execute_graph() to use real token counts from the refactored LLMAgent/PureLangGraph (coordinate with #13 — both modify the same methods)
  • Export ActorResult and NodeUsage from cleveractors/__init__.py and __all__ (done; verify import path after result.py move)
  • Write tests asserting the aggregation invariant with mock LangChain responses
  • Write tests for missing usage metadata fallback (log warning, counts = 0)
  • Verify project coverage threshold is maintained
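
The fallback tests could be shaped roughly like this. This is a sketch using the standard-library unittest module, and the project's actual test framework may differ. usage_from_response is an assumed stand-in for the logic under test (the real tests would go through LLMAgent.process_message()), and the logger name and make_response helper are likewise hypothetical.

```python
import logging
import unittest
from unittest.mock import MagicMock

logger = logging.getLogger("cleveractors.usage")  # hypothetical logger name


def usage_from_response(response) -> tuple[int, int]:
    """Stand-in for the logic under test (assumed; real code lives in LLMAgent)."""
    usage = response.usage_metadata or {}
    if usage:
        return usage.get("input_tokens", 0), usage.get("output_tokens", 0)
    token_usage = (response.response_metadata or {}).get("token_usage", {})
    if token_usage:
        return token_usage.get("prompt_tokens", 0), token_usage.get("completion_tokens", 0)
    logger.warning("no token usage metadata; counting 0")
    return 0, 0


def make_response(usage_metadata=None, response_metadata=None):
    """Build a mock LangChain-like response with explicit metadata attributes."""
    response = MagicMock()
    response.content = "ok"
    response.usage_metadata = usage_metadata
    response.response_metadata = response_metadata or {}
    return response


class MissingUsageTest(unittest.TestCase):
    def test_missing_metadata_warns_and_counts_zero(self):
        response = make_response()
        # assertLogs fails the test if no WARNING is emitted on this logger.
        with self.assertLogs("cleveractors.usage", level="WARNING") as captured:
            counts = usage_from_response(response)
        self.assertEqual(counts, (0, 0))
        self.assertEqual(len(captured.records), 1)

    def test_fallback_to_response_metadata(self):
        response = make_response(
            response_metadata={"token_usage": {"prompt_tokens": 7, "completion_tokens": 2}}
        )
        self.assertEqual(usage_from_response(response), (7, 2))


# Run the two cases in-process instead of via unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MissingUsageTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

The metadata attributes are set explicitly on the mock because a bare MagicMock would return another truthy mock for usage_metadata and mask the missing-data path.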

Definition of Done

  • All subtasks checked off.
  • executor.execute(msg) returns ActorResult with response, prompt_tokens, completion_tokens, and nodes.
  • Aggregation invariant verified in tests.
  • from cleveractors import ActorResult, NodeUsage works without error.
  • All tests pass. Coverage at or above project threshold.
Reference
cleveragents/cleveractors-core#14