UAT: SubplanExecutionService._record_attempt() always sets was_retried=True — final failed attempt incorrectly reports a subsequent retry occurred #4010

Open
opened 2026-04-06 08:35:00 +00:00 by freemo · 0 comments
Owner

Metadata

  • Branch: fix/subplan-record-attempt-was-retried
  • Commit Message: fix(subplan): set was_retried=False on final failed attempt in _record_attempt()
  • Milestone: None (backlog)
  • Parent Epic: #368

Background and Context

SubplanExecutionService._record_attempt() always creates a SubplanAttempt with was_retried=True, even for the final failed attempt when no subsequent retry will occur. The was_retried field is documented as "Whether a subsequent attempt was made after this one" — it should be False for the last attempt.

This was discovered during UAT code-level analysis of src/cleveragents/application/services/subplan_execution_service.py, specifically _record_attempt() (lines 585–600) and _execute_one_with_retry() (lines 416–493).

Current Behavior (Bug)

@staticmethod
def _record_attempt(
    status: SubplanStatus,
    started_at: datetime,
    error: str | None,
) -> SubplanStatus:
    """Record a failed attempt in the status's previous_attempts."""
    attempt = SubplanAttempt(
        attempt_number=status.attempt_number,
        started_at=started_at,
        completed_at=datetime.now(tz=UTC),
        error=error,
        was_retried=True,  # BUG: always True, even for final attempt
    )
    ...

_record_attempt() is called before should_retry() is checked. If should_retry() returns False, the attempt is recorded with was_retried=True but no retry occurs.

Steps to Reproduce (Code-Level)

# Subplan with max_retries=1 that always fails with a retriable error
config = SubplanConfig(retry_failed=True, max_retries=1)
# After 2 attempts (1 initial + 1 retry), the second attempt is recorded
# with was_retried=True even though no third attempt will occur

Expected Behavior

SubplanAttempt.was_retried should be:

  • True when a subsequent retry attempt was made
  • False when this was the final attempt (retries exhausted or error is non-retriable)

Impact

Post-mortem analysis of subplan failures will show incorrect retry history. The was_retried field is used for "post-mortem analysis and informed retry decisions" (per the SubplanAttempt docstring). Incorrect values could mislead debugging and monitoring tools.

Subtasks

  • Fix _record_attempt() to accept a was_retried parameter (default False)
  • Pass was_retried=True only when should_retry() returns True
  • Add unit test (Behave) verifying the final failed attempt has was_retried=False
  • Verify existing retry tests still pass
  • Run nox (all default sessions), fix any errors
  • Verify coverage >= 97% via nox -s coverage_report

Definition of Done

  • SubplanAttempt.was_retried is True only when a subsequent attempt was actually made
  • The final failed attempt (when retries are exhausted) has was_retried=False
  • Unit tests (Behave) cover both the retried and final-failure cases
  • No regression in existing retry tests
  • All nox stages pass
  • Coverage >= 97%

Backlog note: This issue was discovered during autonomous operation
on milestone v3.4.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-new-issue-creator

## Metadata - **Branch**: `fix/subplan-record-attempt-was-retried` - **Commit Message**: `fix(subplan): set was_retried=False on final failed attempt in _record_attempt()` - **Milestone**: None (backlog) - **Parent Epic**: #368 ## Background and Context `SubplanExecutionService._record_attempt()` always creates a `SubplanAttempt` with `was_retried=True`, even for the final failed attempt when no subsequent retry will occur. The `was_retried` field is documented as "Whether a subsequent attempt was made after this one" — it should be `False` for the last attempt. This was discovered during UAT code-level analysis of `src/cleveragents/application/services/subplan_execution_service.py`, specifically `_record_attempt()` (lines 585–600) and `_execute_one_with_retry()` (lines 416–493). ## Current Behavior (Bug) ```python @staticmethod def _record_attempt( status: SubplanStatus, started_at: datetime, error: str | None, ) -> SubplanStatus: """Record a failed attempt in the status's previous_attempts.""" attempt = SubplanAttempt( attempt_number=status.attempt_number, started_at=started_at, completed_at=datetime.now(tz=UTC), error=error, was_retried=True, # BUG: always True, even for final attempt ) ... ``` `_record_attempt()` is called before `should_retry()` is checked. If `should_retry()` returns `False`, the attempt is recorded with `was_retried=True` but no retry occurs. ### Steps to Reproduce (Code-Level) ```python # Subplan with max_retries=1 that always fails with a retriable error config = SubplanConfig(retry_failed=True, max_retries=1) # After 2 attempts (1 initial + 1 retry), the second attempt is recorded # with was_retried=True even though no third attempt will occur ``` ## Expected Behavior `SubplanAttempt.was_retried` should be: - `True` when a subsequent retry attempt was made - `False` when this was the final attempt (retries exhausted or error is non-retriable) ## Impact Post-mortem analysis of subplan failures will show incorrect retry history. The `was_retried` field is used for "post-mortem analysis and informed retry decisions" (per the `SubplanAttempt` docstring). Incorrect values could mislead debugging and monitoring tools. ## Subtasks - [ ] Fix `_record_attempt()` to accept a `was_retried` parameter (default `False`) - [ ] Pass `was_retried=True` only when `should_retry()` returns `True` - [ ] Add unit test (Behave) verifying the final failed attempt has `was_retried=False` - [ ] Verify existing retry tests still pass - [ ] Run `nox` (all default sessions), fix any errors - [ ] Verify coverage >= 97% via `nox -s coverage_report` ## Definition of Done - [ ] `SubplanAttempt.was_retried` is `True` only when a subsequent attempt was actually made - [ ] The final failed attempt (when retries are exhausted) has `was_retried=False` - [ ] Unit tests (Behave) cover both the retried and final-failure cases - [ ] No regression in existing retry tests - [ ] All nox stages pass - [ ] Coverage >= 97% > **Backlog note:** This issue was discovered during autonomous operation > on milestone v3.4.0. It does not block milestone completion and has been > placed in the backlog for human review and future milestone assignment. --- **Automated by CleverAgents Bot** Supervisor: UAT Testing | Agent: ca-new-issue-creator
HAL9000 added this to the v3.5.0 milestone 2026-04-09 03:12:01 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Blocks
#368 Epic: Subplans & Parallelism
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#4010
No description provided.