BUG-HUNT: [concurrency] PlanExecutor._run_execute_with_stub double-counts guardrail steps on retry — retries always hit step limit even when max_steps allows them #6559


Open

opened 2026-04-09 21:19:56 +00:00 by HAL9000 · 0 comments

HAL9000 commented

2026-04-09 21:19:56 +00:00

Owner

Bug Report: [concurrency] — `PlanExecutor._run_execute_with_stub` Double-Counts Steps on Retry

Severity Assessment

Impact: When ErrorRecoveryService approves a retry after a recoverable error, the retry keeps counting guardrail steps from where the failed attempt stopped. If max_steps is sized for a single pass over the decisions, the second attempt hits the guardrail step limit on its first check and fails with PlanError: Guardrail step limit reached. This makes the retry logic in _run_execute_with_stub functionally useless in that configuration when AutonomyGuardrailService is configured.
Likelihood: High — affects every plan that (a) has a step limit configured, (b) has error recovery with retries enabled, and (c) encounters a recoverable error during execute.
Priority: High

Location

File: src/cleveragents/application/services/plan_executor.py
Function: _run_execute_with_stub
Lines: ~930–1045
Secondary: _enforce_guardrails_per_step (~line 821) and AutonomyGuardrailService.check_step_limit (~line 104 in autonomy_guardrail_service.py)

Description

_run_execute_with_stub implements a retry loop driven by ErrorRecoveryService. Before invoking the execute actor, each attempt loops over all decisions and calls _enforce_guardrails_per_step() once per decision:

for attempt in range(max_attempts):
    try:
        # Enforce per-step guardrails for each decision
        for _decision in decisions:
            self._enforce_guardrails_per_step(plan_id)   # increments step_count each call
        
        result = self._execute_actor.execute(...)
        # ...
    except Exception as exc:
        # ...
        if self._error_recovery.should_retry(plan_id):
            continue  # go to next attempt

_enforce_guardrails_per_step reads guardrails.step_count, computes next_step = step_count + 1, calls check_step_limit(plan_id, next_step), and check_step_limit permanently sets guardrails.step_count = next_step:

# autonomy_guardrail_service.py
def check_step_limit(self, plan_id, current_step):
    with self._lock:
        guardrails = self._guardrails.get(plan_id)
        guardrails.step_count = current_step   # mutates permanent state
        allowed, reason = guardrails.check_step_limit()
        ...

The step count is never reset between retry attempts. If there are N decisions and the plan's max_steps = N, the flow is:

Attempt 1: steps 1→N are checked and counted → execute actor raises exception → retry approved
Attempt 2: _enforce_guardrails_per_step checks step N+1 → guardrails.step_count = N+1 > N = max_steps → PlanError: Guardrail step limit reached for plan ... at step N+1

The retry immediately fails at the very first decision of the second attempt, regardless of whether the step limit was chosen to allow the retry.
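
The flow above can be reproduced in isolation. The following is a minimal sketch, not the project's code: GuardrailServiceModel, Guardrails, and run_with_retries are hypothetical stand-ins that keep only the behavior described in this report (the count is stored inside check_step_limit and never reset between attempts). It also assumes, for simplicity, that PlanError propagates out of the retry loop.

```python
import threading
from dataclasses import dataclass


class PlanError(Exception):
    """Stand-in for the project's PlanError."""


@dataclass
class Guardrails:
    max_steps: int
    step_count: int = 0


class GuardrailServiceModel:
    """Hypothetical, simplified model of AutonomyGuardrailService (one plan only)."""

    def __init__(self, max_steps: int) -> None:
        self._lock = threading.Lock()
        self.guardrails = Guardrails(max_steps=max_steps)

    def check_step_limit(self, current_step: int) -> bool:
        with self._lock:
            # As in the excerpt: the new count is stored, then the limit is checked.
            self.guardrails.step_count = current_step
            return current_step <= self.guardrails.max_steps


def run_with_retries(service: GuardrailServiceModel, n_decisions: int, max_attempts: int) -> str:
    """Model of the retry loop: guardrail checks first, then the execute call."""
    for attempt in range(max_attempts):
        try:
            for _ in range(n_decisions):
                next_step = service.guardrails.step_count + 1
                if not service.check_step_limit(next_step):
                    raise PlanError(f"Guardrail step limit reached at step {next_step}")
            if attempt == 0:
                # Simulate a recoverable failure in the execute actor.
                raise RuntimeError("recoverable error")
            return "ok"
        except RuntimeError:
            continue  # retry approved; step_count is NOT reset
    return "retries exhausted"


service = GuardrailServiceModel(max_steps=3)
try:
    outcome = run_with_retries(service, n_decisions=3, max_attempts=2)
except PlanError as exc:
    outcome = str(exc)

print(outcome)  # Guardrail step limit reached at step 4
```

With three decisions and max_steps = 3, the first attempt consumes steps 1 to 3 and the retry is rejected at step 4, which matches the N+1 failure described above.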

Evidence

# plan_executor.py ~line 930
def _run_execute_with_stub(self, plan_id, stream_callback=None):
    plan = self._guard_execute(plan_id)
    decisions = self._build_decisions(plan)   # N decisions

    self._lifecycle.start_execute(plan_id)
    # ...
    for attempt in range(max_attempts):
        try:
            for _decision in decisions:
                self._enforce_guardrails_per_step(plan_id)
                # ^ step_count = 1, 2, ..., N on the first attempt (attempt == 0)
                # ^ step_count = N+1, N+2, ... on the first retry (attempt == 1) -- PROBLEM

# plan_executor.py ~line 821
def _enforce_guardrails_per_step(self, plan_id):
    guardrails = self._guardrail_service.get_guardrails(plan_id)
    next_step = guardrails.step_count + 1
    if not self._guardrail_service.check_step_limit(plan_id, next_step):
        raise PlanError(f"Guardrail step limit reached ...")

# autonomy_guardrail_service.py ~line 123
guardrails.step_count = current_step   # permanently sets to next_step

Expected Behavior

Step counting should be scoped to the attempt, not accumulated across all attempts. Each retry should start from the step count recorded before the first attempt (or from 0). The step limit should mean "steps allowed per execution attempt", not "total lifetime steps across all retries".

Actual Behavior

Every retry attempt continues counting from where the previous attempt stopped, because step_count is never reset. With max_steps = 10 and N = 10 decisions, the second attempt checks step 11 and fails, because the first attempt already consumed steps 1–10. A larger limit only delays the failure: the retry is rejected as soon as the cumulative count across attempts exceeds max_steps.
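
The arithmetic behind this can be made explicit. The helper below is a hypothetical illustration (first_rejected_attempt is not part of the codebase). It assumes step_count starts at 0, every attempt runs all N guardrail checks before the execute actor fails, and the count is never reset.

```python
def first_rejected_attempt(n_decisions: int, max_steps: int) -> int:
    """Return the 1-based attempt whose guardrail checks first exceed max_steps.

    Assumes step_count starts at 0, every attempt runs all n_decisions checks,
    and the count is never reset between attempts.
    """
    if n_decisions <= 0 or max_steps < 0:
        raise ValueError("n_decisions must be positive and max_steps non-negative")
    # Attempt k checks steps (k - 1) * N + 1 through k * N; the first rejected
    # step is max_steps + 1, which falls in attempt max_steps // N + 1.
    return max_steps // n_decisions + 1


print(first_rejected_attempt(10, 10))   # 2
print(first_rejected_attempt(10, 100))  # 11
```

Whether the rejected attempt is ever reached depends on how many retries ErrorRecoveryService allows. When max_steps equals the number of decisions, the first retry is already rejected.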

Suggested Fix

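Option 1: Reset guardrails.step_count to 0 (or to its value before the first attempt started) at the beginning of each retry attempt:

# Snapshot the count once, before the first attempt, so retries restart from it.
guardrails = (
    self._guardrail_service.get_guardrails(plan_id)
    if self._guardrail_service else None
)
initial_step_count = guardrails.step_count if guardrails else 0
for attempt in range(max_attempts):
    if attempt > 0 and self._guardrail_service:
        guardrails = self._guardrail_service.get_guardrails(plan_id)
        if guardrails:
            # Note: this write bypasses the service's lock; Option 2 avoids that.
            guardrails.step_count = initial_step_count
    try:
        for _decision in decisions:
            self._enforce_guardrails_per_step(plan_id)
        ...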

Option 2: Add a reset_step_count(plan_id) method to AutonomyGuardrailService and call it before each retry.
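
A minimal sketch of what Option 2 could look like, under stated assumptions: GuardrailServiceSketch, PlanGuardrails, configure, and get_step_count are hypothetical stand-ins for illustration, and the value parameter on reset_step_count is an assumption not found in the original proposal. The point of the sketch is that the reset happens under the same lock that check_step_limit takes.

```python
import threading
from dataclasses import dataclass
from typing import Dict


@dataclass
class PlanGuardrails:
    max_steps: int
    step_count: int = 0


class GuardrailServiceSketch:
    """Hypothetical stand-in for AutonomyGuardrailService."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._guardrails: Dict[str, PlanGuardrails] = {}

    def configure(self, plan_id: str, max_steps: int) -> None:
        with self._lock:
            self._guardrails[plan_id] = PlanGuardrails(max_steps=max_steps)

    def get_step_count(self, plan_id: str) -> int:
        with self._lock:
            return self._guardrails[plan_id].step_count

    def check_step_limit(self, plan_id: str, current_step: int) -> bool:
        with self._lock:
            guardrails = self._guardrails[plan_id]
            guardrails.step_count = current_step
            return current_step <= guardrails.max_steps

    def reset_step_count(self, plan_id: str, value: int = 0) -> None:
        """Restore the step count under the same lock check_step_limit uses."""
        with self._lock:
            guardrails = self._guardrails.get(plan_id)
            if guardrails is not None:
                guardrails.step_count = value


service = GuardrailServiceSketch()
service.configure("plan-1", max_steps=3)

results = []
for attempt in range(2):
    if attempt > 0:
        service.reset_step_count("plan-1")  # called before each retry
    for _ in range(3):
        next_step = service.get_step_count("plan-1") + 1
        results.append(service.check_step_limit("plan-1", next_step))

print(results)  # [True, True, True, True, True, True]
```

Compared with writing guardrails.step_count directly from the executor, routing the reset through the service keeps all mutations of the counter behind one lock.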

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: @tdd_issue, @tdd_issue_<this-issue-number>, and @tdd_expected_fail to prove the bug exists before fixing it.

Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: bug-hunter


HAL9000 added the

labels

2026-04-09 21:26:08 +00:00

HAL9000 added this to the v3.2.0 milestone

2026-04-09 21:27:52 +00:00

HAL9000 referenced this issue