Remove memory_enabled from LLM agent config — make all actors stateless by design #21


2026-05-27T09:25:52Z

hurui200320 commented

2026-05-27 09:25:52 +00:00

Summary

Remove the memory_enabled and max_history fields from the LLM agent configuration. All conversation history management should be the responsibility of the caller, injected per-invocation via context.conversation_history. This makes actor behaviour predictable across all deployment targets and eliminates a class of subtle production bugs.

Problem

memory_enabled: true stores conversation history inside the agent object on the server. This creates deployment-dependent behaviour from the same configuration:

Deployment	Behaviour with `memory_enabled: true`
Long-running process	History accumulates correctly
Process restart / redeploy	History silently lost
Stateless handler (serverless, load-balanced)	History never accumulates — every request starts cold
Horizontally scaled service	History depends on which instance receives the request

The same YAML produces four different behaviours depending on how it is deployed. A developer who builds a working chatbot locally with memory_enabled: true and then deploys it behind a load balancer will encounter a bug that is invisible in the configuration and difficult to diagnose.
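The failure mode can be reproduced without any LLM at all. The sketch below is a toy simulation, not the real agent: `InMemoryAgent` and `turns_remembered` are hypothetical names, and the agent only counts how many earlier turns it can still see. It compares a single long-running process with two instances behind a round-robin balancer.

```python
from itertools import cycle


class InMemoryAgent:
    """Hypothetical stand-in for an agent that keeps history internally,
    the way memory_enabled: true does. Not the real agent class."""

    def __init__(self):
        self.history = []  # lives only inside this process

    def invoke(self, query):
        seen = len(self.history)  # earlier turns this instance remembers
        self.history.append(query)
        return seen


def turns_remembered(num_instances, queries):
    """Round-robin the queries across instances and report how many
    earlier turns each request could see."""
    instances = cycle([InMemoryAgent() for _ in range(num_instances)])
    return [next(instances).invoke(q) for q in queries]


queries = ["hi", "my name is Ada", "what is my name?"]
single = turns_remembered(1, queries)  # one long-running process
scaled = turns_remembered(2, queries)  # two instances behind a balancer
print(single)  # [0, 1, 2]
print(scaled)  # [0, 0, 1]
```

With two instances, the third request lands on an instance that never saw the second turn, so the name is gone even though the configuration is unchanged.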

This is also in tension with §10.2 of the spec, which explicitly delegates cross-invocation persistence to the host environment. memory_enabled puts persistence inside the agent, contradicting the spec's own stated philosophy.

Proposal

1. Remove memory_enabled and max_history from the LLM agent configuration fields (§4.4).
2. Promote the existing context.conversation_history injection mechanism (§4.4.4) as the sole way to supply history to an LLM agent.
3. Keep the graph state messages list (§6.3.1) untouched: this is a different concept (within-execution coordination between nodes, not cross-invocation persistence) and is unaffected by this change.
4. Add a standard session pattern to the spec's appendix showing the canonical stateless usage with a simple dict-based history.
5. Provide a reference truncation utility in accompanying library code so callers do not have to re-implement the max_history logic themselves.

The §4.4.4 clause — "if conversation_history is in context, use it for the current invocation without persisting it" — already exists and already does exactly what is needed. This proposal makes it the only path rather than a secondary one.
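For the reference truncation utility, something as small as the following might be enough. This is only a sketch: the name `truncate_history` is hypothetical, and it assumes max_history counts individual messages rather than user/assistant pairs, which the spec would need to confirm.

```python
def truncate_history(history, max_history):
    """Return the most recent `max_history` messages as a new list.

    Assumption: max_history counts individual messages, not turn pairs.
    None means "no limit". The input list is never mutated.
    """
    if max_history is None:
        return list(history)
    if max_history <= 0:
        # Guard needed because history[-0:] would return the whole list.
        return []
    return list(history[-max_history:])


messages = [{"role": "user", "content": str(i)} for i in range(5)]
recent = truncate_history(messages, 2)
print([m["content"] for m in recent])  # ['3', '4']
```

Returning a copy keeps the caller's full history intact, so the caller can still persist everything while sending only a window to the agent.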

Impact Analysis

No functional loss. Every capability provided by memory_enabled: true is fully replicable via context injection.

CLI tools — the simplest affected use case — require approximately four lines of caller code:

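# Assumes `actor` is an LLM agent that has already been loaded.
history = []  # conversation history lives in the caller, not in the agent
while True:
    query = input("> ")
    history.append({"role": "user", "content": query})
    # Inject the history per invocation under the conversation_history key.
    response = actor.invoke({"conversation_history": history, "query": query})
    history.append({"role": "assistant", "content": response})
    print(response)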

A library-level Session wrapper can reduce this to a single call for the common case.
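A possible shape for that wrapper is sketched below. The issue does not define the `Session` API, so the constructor, the `send` method, and the `EchoActor` stand-in are all assumptions made for illustration. It also assumes history is passed under the `conversation_history` context key named in the proposal.

```python
class EchoActor:
    """Stand-in for a real LLM agent; reports how many messages it received."""

    def invoke(self, context):
        return f"{len(context['conversation_history'])} message(s) seen"


class Session:
    """Hypothetical wrapper that owns the history on the caller's side."""

    def __init__(self, actor):
        self.actor = actor
        self.history = []

    def send(self, query):
        self.history.append({"role": "user", "content": query})
        # Pass a copy so the actor cannot mutate the session's own list.
        response = self.actor.invoke(
            {"conversation_history": list(self.history), "query": query}
        )
        self.history.append({"role": "assistant", "content": response})
        return response


session = Session(EchoActor())
first = session.send("hello")
second = session.send("and again")
print(first)   # 1 message(s) seen
print(second)  # 3 message(s) seen
```

The agent itself stays stateless; discarding the `Session` object discards the conversation, which makes the lifetime of the history explicit.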

Multi-agent pipelines are actually improved. Currently each agent silently maintains its own separate history, so agents in the same pipeline cannot see each other's exchanges. With external history via context.conversation_history, all agents in a pipeline share the same view of the conversation — which is almost always the correct behaviour.
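To illustrate the shared view, here is a rough sketch of a two-agent pipeline driven by one external history. `StubAgent` and `run_pipeline` are hypothetical names, and the stub only reports how many messages it was handed rather than calling a model.

```python
class StubAgent:
    """Stand-in for an LLM agent; holds no history of its own."""

    def __init__(self, name):
        self.name = name

    def invoke(self, context):
        count = len(context["conversation_history"])
        return f"{self.name} saw {count} message(s)"


def run_pipeline(agents, query):
    """Run agents in order, feeding each one the same growing history."""
    history = [{"role": "user", "content": query}]
    for agent in agents:
        reply = agent.invoke(
            {"conversation_history": list(history), "query": query}
        )
        history.append({"role": "assistant", "content": reply})
    return history


transcript = run_pipeline([StubAgent("researcher"), StubAgent("writer")], "draft a summary")
for message in transcript:
    print(message["role"], "->", message["content"])
```

The second agent receives the first agent's reply as part of its history, which is exactly what per-agent internal memory prevents today.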

Graph-based actors are unaffected. Agent nodes inside a graph already receive conversation_history injected from the graph state's messages list (§6.2.2), independently of memory_enabled.

Stateless HTTP endpoints are improved. The current spec encourages a pattern that is incompatible with stateless deployment. Removing memory_enabled makes the spec honest about what a stateless endpoint requires.
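What a stateless endpoint requires can be sketched as a handler that loads history from an external store, invokes the agent, and writes the history back. Everything here is illustrative: `handle_request` is a hypothetical name, the plain dict stands in for whatever store the host provides (a database, a cache), and `CountingActor` is a stub rather than a real agent.

```python
class CountingActor:
    """Stand-in for a real LLM agent; it keeps no state between calls."""

    def invoke(self, context):
        return f"saw {len(context['conversation_history'])} message(s)"


def handle_request(actor, store, session_id, query):
    """Handle one request: load history, invoke, persist, respond."""
    history = list(store.get(session_id, []))  # copy of what was persisted
    history.append({"role": "user", "content": query})
    response = actor.invoke({"conversation_history": history, "query": query})
    history.append({"role": "assistant", "content": response})
    store[session_id] = history  # persistence is the host's job
    return response


store = {}  # stand-in for an external session store
# A fresh actor per request mimics separate instances or cold starts.
r1 = handle_request(CountingActor(), store, "session-1", "hi")
r2 = handle_request(CountingActor(), store, "session-1", "still there?")
print(r1)  # saw 1 message(s)
print(r2)  # saw 3 message(s)
```

Because each request builds a new actor, the second response can only see earlier turns through the store, which is the behaviour that survives restarts and horizontal scaling.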

Spec Consistency Improvement

Removing memory_enabled makes the LLM agent configuration fully consistent with how other time-varying inputs work. Every value that changes at invocation time flows through context:

What varies per invocation	Mechanism
Temperature	`context._temperature_override`
Conversation history	`context.conversation_history` (after this change: the only path)
Any application state	`context.<key>`

This is a uniform model. memory_enabled is the one exception to it, and removing it eliminates the exception.
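Under that model, everything that varies per call ends up in one context mapping. A small helper along these lines could assemble it; `build_context` is a hypothetical name, and the `query` key is borrowed from the CLI example rather than taken from the spec.

```python
def build_context(query, history, temperature=None, **app_state):
    """Assemble a per-invocation context from caller-owned values."""
    context = {
        "query": query,
        "conversation_history": list(history),  # copy; the caller keeps ownership
        **app_state,  # arbitrary application state, exposed as context.<key>
    }
    if temperature is not None:
        context["_temperature_override"] = temperature
    return context


context = build_context("hi", [], temperature=0.2, user_id="u-1")
print(sorted(context))
```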

Migration

Existing configurations using memory_enabled: true would need updating. The migration is mechanical:

Before:

agents:
  chat:
    type: llm
    config:
      provider: openai
      model: gpt-4
      memory_enabled: true
      max_history: 10

After:

agents:
  chat:
    type: llm
    config:
      provider: openai
      model: gpt-4

Calling code adds history management (or uses the provided Session wrapper).

What Is Not Proposed

The graph state messages list (§6.3.1) is not being changed.
The context.conversation_history injection mechanism (§4.4.4) is not being changed — it is being promoted.
No changes to tool agents, composite agents, or graph routes.
No changes to the runtime, operators, conditions, or any other part of the spec.


hurui200320 added labels 2026-05-27 09:25:58 +00:00

hurui200320 added Needs Feedback
