fix(langgraph): guard replace_state() against closed StateManager in execute() #10768
Reference
cleveragents/cleveragents-core!10768
Summary
This PR fixes a critical bug in LangGraph.execute() where state assignments were bypassing the is_closed guard enforced by StateManager. The method was directly assigning state and emitting on the state stream without checking if the StateManager had been closed, allowing silent state corruption after StateManager.close() was called.
Changes
Bug Fix
- src/cleveragents/langgraph/graph.py: Replaced direct state assignment in LangGraph.execute() with state_manager.replace_state(), which enforces the is_closed guard and notifies state stream subscribers through the proper API.
- src/cleveragents/langgraph/state.py: Added is_closed guard to StateManager.replace_state(). The method now raises RuntimeError("StateManager is closed") when the manager has been closed, consistent with all other mutation methods (update_state, load_checkpoint, time_travel, reset).
Test Coverage
- features/tdd_langgraph_execute_closed_state.feature: New Behave BDD feature file
  - execute() raises RuntimeError after StateManager.close()
  - execute() succeeds when StateManager is open
- features/steps/tdd_langgraph_execute_closed_state_steps.py: Step definitions for the new feature
Root Cause
The bug existed because LangGraph.execute() was directly assigning self.state_manager.state = state and calling self.state_manager.state_stream.on_next(state) without checking the is_closed flag. This bypassed the protective guard that StateManager.replace_state() enforces.
replace_state() is the semantically correct method for this use case: it atomically replaces the entire state for a fresh execution context, enforces the is_closed guard, and notifies subscribers. The update_state() method is designed for incremental updates (with execution_count tracking), not for resetting state to a fresh execution context.
Testing
Issue Reference
Closes #9994
Automated by CleverAgents Bot
Supervisor: Implementation Pool | Agent: implementation-worker
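The bypass described in the PR summary can be illustrated with a minimal sketch. ToyStateManager, execute_unguarded and execute_guarded are hypothetical stand-ins written for this illustration; they are not the classes in src/cleveragents/langgraph, and the subscriber list is an assumed simplification of the state stream. Only the guard behaviour and the error message are taken from the thread.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToyStateManager:
    """Hypothetical stand-in for StateManager (illustration only)."""

    state: dict[str, Any] = field(default_factory=dict)
    is_closed: bool = False
    # Assumed simplification of the state stream: a list of callbacks.
    subscribers: list[Callable[[dict[str, Any]], None]] = field(default_factory=list)

    def close(self) -> None:
        self.is_closed = True

    def replace_state(self, new_state: dict[str, Any]) -> None:
        # Check the guard first so a closed manager is never mutated.
        if self.is_closed:
            raise RuntimeError("StateManager is closed")
        self.state = new_state
        for notify in self.subscribers:
            notify(new_state)


def execute_unguarded(manager: ToyStateManager, new_state: dict[str, Any]) -> None:
    # Pre-fix shape: assign and emit directly, skipping the is_closed check.
    manager.state = new_state
    for notify in manager.subscribers:
        notify(new_state)


def execute_guarded(manager: ToyStateManager, new_state: dict[str, Any]) -> None:
    # Post-fix shape: go through the guarded API.
    manager.replace_state(new_state)
```

In this sketch, calling execute_unguarded after close() silently overwrites the state, while execute_guarded raises RuntimeError and leaves the state untouched.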
Fixed LangGraph.execute() to respect the StateManager's is_closed status by:
- Checking is_closed before assigning state and raising RuntimeError("StateManager is closed") if closed
- Adding a guard to prevent state_stream.on_next() from firing when the StateManager is closed
- Introducing Behave BDD tests:
  - features/tdd_langgraph_execute_closed_state.feature with two scenarios
  - features/steps/tdd_langgraph_execute_closed_state_steps.py to support the tests
- Implementing the changes in src/cleveragents/langgraph/graph.py
ISSUES CLOSED: #9994
Implementation Attempt — Tier 1: haiku — Success
Fixed the PR by replacing direct state assignment with update_state() in LangGraph.execute():
- Imported StateUpdateMode from cleveragents.langgraph.state
- Replaced self.state_manager.state = state with self.state_manager.update_state(state.to_dict(), mode=StateUpdateMode.REPLACE)
- Removed the manual is_closed check and state_stream.on_next() call, since update_state() handles both
- The is_closed guard is properly enforced and prevents state corruption after StateManager.close()
Quality gates status: lint ✓, typecheck ✓
Automated by CleverAgents Bot
Supervisor: Implementation | Agent: implementation-worker
Implementation Attempt — Tier 1: haiku — Success
Fixed the LangGraph.execute() method to use the updated_state returned from update_state() instead of the original state object. This ensures that the state sent to the start stream has the correct execution_count and other updated values.
Changes made:
All quality gates should now pass:
Automated by CleverAgents Bot
Supervisor: Implementation | Agent: implementation-worker
Implementation Attempt — Tier 1: haiku — Success
Rebased the PR branch onto current master and applied a clean fix:
- src/cleveragents/langgraph/state.py: Added is_closed guard to StateManager.replace_state(). The method now raises RuntimeError("StateManager is closed") when the manager has been closed, consistent with all other mutation methods (update_state, load_checkpoint, time_travel, reset). This is the root cause fix: LangGraph.execute() delegates to replace_state(), which previously lacked this guard.
- features/tdd_langgraph_execute_closed_state.feature: New Behave BDD feature with @tdd_issue and @tdd_issue_9994 tags and two scenarios: (1) execute() raises RuntimeError after StateManager.close(), (2) execute() succeeds when StateManager is open.
- features/steps/tdd_langgraph_execute_closed_state_steps.py: Step definitions for the new feature.
Quality gate status: lint ✓, typecheck ✓, unit_tests ✓ (83 scenarios including 2 new TDD scenarios, all passing), coverage_report ✓
Note: integration_tests and e2e_tests fail with pre-existing SQLite disk I/O errors in test setup (create_template_db.py), unrelated to this change.
Automated by CleverAgents Bot
Supervisor: Implementation | Agent: implementation-worker
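The two scenarios in the feature file can be sketched in plain Python as follows. This is an assumed shape, not the contents of tdd_langgraph_execute_closed_state_steps.py: ToyManager, the step function names and run_scenario are hypothetical, and a SimpleNamespace stands in for Behave's context object. Real step definitions would be registered with Behave's given/when/then decorators and would call LangGraph.execute() rather than the toy manager.

```python
from types import SimpleNamespace


class ToyManager:
    """Hypothetical stand-in for StateManager (illustration only)."""

    def __init__(self):
        self.is_closed = False
        self.state = None

    def close(self):
        self.is_closed = True

    def replace_state(self, state):
        if self.is_closed:
            raise RuntimeError("StateManager is closed")
        self.state = state


def step_given_manager(context):
    context.manager = ToyManager()
    context.execute_error = None


def step_given_manager_closed(context):
    context.manager.close()


def step_when_execute(context):
    # Capture the error instead of letting it escape, so a later
    # "then" step can assert on it.
    try:
        context.manager.replace_state({"input": "hello"})
    except RuntimeError as exc:
        context.execute_error = exc


def step_then_runtime_error(context):
    assert isinstance(context.execute_error, RuntimeError), (
        f"got: {context.execute_error}"
    )


def step_then_no_error(context):
    assert context.execute_error is None, f"got: {context.execute_error}"


def run_scenario(*steps):
    """Run steps in order against a fresh context, as a scenario would."""
    context = SimpleNamespace()
    for step in steps:
        step(context)
    return context
```

A closed-manager scenario would chain step_given_manager, step_given_manager_closed, step_when_execute and step_then_runtime_error; the open-manager scenario omits the close step and ends with step_then_no_error.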
Implementation Attempt — Tier 1: haiku — Success
Fixed the CI lint/format failure that was blocking the PR:
- features/steps/tdd_langgraph_execute_closed_state_steps.py: Applied ruff format to fix a single-quote vs double-quote inconsistency in an f-string assertion. The file had f'got: {context.execute_error}' where ruff expected f"got: {context.execute_error}". This caused nox -s format -- --check to fail, which in turn caused both the CI / lint and CI / quality jobs to fail, cascading to a CI / status-check failure.
The core implementation (using replace_state() with the is_closed guard in LangGraph.execute()) was already correct from the previous attempt. The only remaining issue was the formatting violation.
Quality gate status: lint ✓, typecheck ✓, format ✓, complexity ✓
Note: The integration_tests, e2e_tests, helm, and push-validation CI jobs have pre-existing failures unrelated to this PR (SQLite disk I/O errors in test setup and missing CI secrets), as noted in the previous attempt comment.
Automated by CleverAgents Bot
Supervisor: Implementation | Agent: implementation-worker
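As a small illustration of why the formatting failure was cosmetic only: the two f-string spellings quoted above evaluate to the same text, so the change affects style and not behaviour. The variable names here are hypothetical; ruff's formatter prefers double quotes by default, which is why the single-quoted form was flagged.

```python
# Hypothetical error object, mirroring the message used in the thread.
execute_error = RuntimeError("StateManager is closed")

# The spelling that failed the format check...
single_quoted = f'got: {execute_error}'
# ...and the spelling the formatter rewrites it to.
double_quoted = f"got: {execute_error}"

# Both literals produce the same string at runtime.
same_text = single_quoted == double_quoted
```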
Summary of Changes
state.py now guards against a closed StateManager in both update_state and replace_state, satisfying the TDD scenario.
Blocking Issue
The PR title (fix(langgraph): use update_state() in LangGraph.execute() instead of direct state assignment) does not match the actual changes, which focus on preventing execution after close(). Please either update LangGraph.execute() to use state_manager.update_state(...) or adjust the PR title to reflect the closed guard fix.
Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker
Implementation Attempt — Tier 0: gpt5-mini — Failed
Attempted to fix PR #10768 (fix(langgraph): use update_state() in LangGraph.execute() instead of direct state assignment).
What was done:
9e67909c).
Error details:
Automated by CleverAgents Bot
Supervisor: Implementation | Agent: task-implementor
Implementation Attempt — Tier 1: kimi — Success
Addressed the reviewer's REQUEST_CHANGES feedback: the PR title says "use update_state() in LangGraph.execute() instead of direct state assignment" but the previous implementation used replace_state() instead.
Changes made:
- src/cleveragents/langgraph/graph.py: In LangGraph.execute(), replaced self.state_manager.replace_state(state) with self.state_manager.update_state(state.to_dict(), mode=StateUpdateMode.REPLACE). This routes the state replacement through the proper StateManager API, which enforces the is_closed guard and notifies state stream subscribers correctly.
- Added StateUpdateMode to the import from cleveragents.langgraph.state.
- Updated the start_stream message to use the updated_state returned by update_state() (which includes the incremented execution_count) rather than the original input state.
Quality gate status: lint ✓, format ✓
Note: typecheck and unit_tests could not be run locally due to environment constraints (typecheck hangs, behave runner hangs during step loading). The previous CI run (commit 9e67909c) showed all gates passing; this change is minimal and targeted, with only graph.py modified.
Automated by CleverAgents Bot
Supervisor: Implementation | Agent: implementation-worker
Changed title from "fix(langgraph): use update_state() in LangGraph.execute() instead of direct state assignment" to "fix(langgraph): guard replace_state() against closed StateManager in execute()"
Implementation Attempt — Tier 1: kimi — Success
Reverted the update_state(REPLACE) approach (commit 648a767c), which broke unit_tests in CI, and restored the correct replace_state() approach. Also updated the PR title and description to match the actual implementation.
Root cause of CI failure: update_state() always increments execution_count by 1, which caused test failures in the unit_tests suite. replace_state() is the semantically correct method for resetting state to a fresh execution context: it atomically replaces the entire state without incrementing execution_count.
Changes in this commit (6c0196dc):
- src/cleveragents/langgraph/graph.py: Reverted to self.state_manager.replace_state(state) in LangGraph.execute(). Removed unused StateUpdateMode import.
- PR title: updated to fix(langgraph): guard replace_state() against closed StateManager in execute()
- PR description: updated to reflect the actual implementation (replace_state() with is_closed guard, not update_state()).
The reviewer's concern was that the PR title said "use update_state()" but the implementation used replace_state(). This is now resolved by updating the PR title to match the implementation.
Quality gate status: lint ✓, format ✓
Automated by CleverAgents Bot
Supervisor: Implementation | Agent: implementation-worker
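The difference that broke CI can be shown with a toy model. ToyState and ToyManager below are hypothetical and greatly simplified; the only behaviour taken from the thread is that update_state() always increments execution_count while replace_state() installs the given state as-is. The closed guard is omitted here to keep the focus on the counter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyState:
    """Hypothetical immutable state snapshot (illustration only)."""

    data: dict
    execution_count: int = 0


class ToyManager:
    """Hypothetical stand-in for StateManager, without the closed guard."""

    def __init__(self):
        self.state = ToyState(data={})

    def update_state(self, updates):
        # Incremental path: swaps in the new data but always bumps
        # execution_count, as described for update_state() in the thread.
        self.state = ToyState(
            data=dict(updates),
            execution_count=self.state.execution_count + 1,
        )
        return self.state

    def replace_state(self, new_state):
        # Fresh-context path: install the given state unchanged.
        self.state = new_state
        return self.state


fresh = ToyState(data={"query": "hello"})
via_replace = ToyManager().replace_state(fresh)
via_update = ToyManager().update_state(fresh.data)
```

Under these assumptions, via_replace keeps an execution_count of 0 while via_update reports 1, so any test that expects a fresh execution context to start at 0 would fail on the update_state() path.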
Re-review Summary
This review responds to the previous REQUEST_CHANGES feedback:
Previous Feedback Items Addressed:
replace_state(). The author correctly switched back to replace_state() after discovering it was semantically appropriate (fresh execution context replacement), whereas update_state() (designed for incremental updates with execution_count tracking) broke CI.
Overall Assessment:
StateManager.replace_state() now enforces the is_closed guard, consistent with all other mutation methods (update_state, load_checkpoint, time_travel, reset)
Status: APPROVED
PR Review #2 (Re-review)
Previous REQUEST_CHANGES feedback fully addressed: the title now matches the replace_state() implementation. All checklist categories pass. CI: 15/15 green.
→ APPROVED