UAT: ErrorRecoveryService uses hardcoded max_retries=3 instead of SafetyProfile.max_retries_per_step — retry limit is not driven by the plan's automation profile #4040

Open
opened 2026-04-06 09:00:51 +00:00 by freemo · 0 comments
Owner

Metadata

  • Branch: fix/v3.6.0/error-recovery-service-max-retries-safety-profile
  • Commit Message: fix(error-recovery): read max_retries_per_step from SafetyProfile instead of hardcoded default
  • Milestone: Backlog (no milestone — see backlog note below)
  • Parent Epic: #3370

Backlog note: This issue was discovered during autonomous operation
on milestone v3.6.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.

Bug Description

The spec (docs/specification.md, line 28392) defines max_retries_per_step as a field of SafetyProfile:

max_retries_per_step | integer | 3 | Maximum retry attempts per action step | Limits the number of retries for a single step before escalating to the user. Range: 0–100.

This field is part of the plan's automation profile and should control how many times a failing step is retried. However, ErrorRecoveryService uses its own max_retries constructor parameter (defaulting to 3) and never reads SafetyProfile.max_retries_per_step from the plan.

What Was Tested

  • src/cleveragents/application/services/error_recovery_service.py:54max_retries: int = 3 is a constructor parameter with a hardcoded default
  • src/cleveragents/application/services/error_recovery_service.py:66-68ErrorRecoveryPolicy is created with this hardcoded value
  • src/cleveragents/domain/models/core/safety_profile.py:138-143max_retries_per_step: int = Field(3, ge=0, le=100, ...) is defined in SafetyProfile
  • src/cleveragents/tool/lifecycle.py:916-924max_retries_per_step IS used in tool lifecycle to block execution when exceeded, but this is a separate enforcement path
  • src/cleveragents/application/services/error_recovery_service.py — Never reads plan.automation_profile.safety.max_retries_per_step

Expected Behavior (from spec)

When ErrorRecoveryService determines whether to retry a failed step, it should use plan.automation_profile.safety.max_retries_per_step as the retry limit. This allows operators to configure retry behavior per plan via the automation profile (e.g., manual profile = 3 retries, custom profile = 0 retries for no auto-retry).

Actual Behavior

ErrorRecoveryService always uses the hardcoded max_retries=3 value passed at construction time, regardless of the plan's automation profile. This means:

  1. Changing max_retries_per_step in the automation profile has no effect on ErrorRecoveryService retry behavior
  2. The retry limit is not per-plan but per-service-instance
  3. The spec's intent to make retry behavior configurable via automation profiles is not honored

Code Locations

File Line(s) Issue
src/cleveragents/application/services/error_recovery_service.py 54 Hardcoded max_retries=3 constructor default
src/cleveragents/application/services/error_recovery_service.py 66–68 ErrorRecoveryPolicy created with hardcoded value
src/cleveragents/domain/models/core/safety_profile.py 138–143 max_retries_per_step field (never read by ErrorRecoveryService)
src/cleveragents/domain/models/core/automation_profile.py 446, 469, 492, 515, 538, 561, 584, 607 All built-in profiles set max_retries_per_step=3 (never read by ErrorRecoveryService)

Steps to Reproduce

  1. Create a plan with a custom automation profile where max_retries_per_step=0 (no auto-retry)
  2. Run a plan that fails during execute phase
  3. Observe that ErrorRecoveryService still retries 3 times instead of 0

Subtasks

  • Audit ErrorRecoveryService.__init__ and remove (or demote to fallback) the hardcoded max_retries=3 constructor parameter
  • Update ErrorRecoveryService to accept the plan context and read plan.automation_profile.safety.max_retries_per_step when determining the retry limit
  • Update ErrorRecoveryPolicy creation (lines 66–68) to use the plan's safety profile value
  • Verify consistency with the tool/lifecycle.py:916-924 enforcement path (both should use the same source of truth)
  • Write BDD unit test scenarios (Behave) covering: retry limit respected from SafetyProfile, max_retries_per_step=0 disables auto-retry, custom profile overrides default
  • Update or add integration tests (Robot Framework) if the error recovery flow is covered there
  • Update docstrings and type annotations to reflect the new behaviour

Definition of Done

  • ErrorRecoveryService reads max_retries_per_step from plan.automation_profile.safety at runtime
  • Hardcoded max_retries=3 default is removed or used only as a last-resort fallback with no plan context
  • Setting max_retries_per_step=0 on a plan's automation profile prevents ErrorRecoveryService from retrying
  • BDD unit test scenarios (Behave) cover all retry-limit scenarios
  • All nox stages pass
  • Coverage >= 97%

Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-uat-tester

## Metadata - **Branch**: `fix/v3.6.0/error-recovery-service-max-retries-safety-profile` - **Commit Message**: `fix(error-recovery): read max_retries_per_step from SafetyProfile instead of hardcoded default` - **Milestone**: Backlog (no milestone — see backlog note below) - **Parent Epic**: #3370 > **Backlog note:** This issue was discovered during autonomous operation > on milestone v3.6.0. It does not block milestone completion and has been > placed in the backlog for human review and future milestone assignment. ## Bug Description The spec (`docs/specification.md`, line 28392) defines `max_retries_per_step` as a field of `SafetyProfile`: > `max_retries_per_step` | integer | `3` | Maximum retry attempts per action step | Limits the number of retries for a single step before escalating to the user. Range: 0–100. This field is part of the plan's automation profile and should control how many times a failing step is retried. However, `ErrorRecoveryService` uses its own `max_retries` constructor parameter (defaulting to `3`) and **never reads** `SafetyProfile.max_retries_per_step` from the plan. ### What Was Tested - `src/cleveragents/application/services/error_recovery_service.py:54` — `max_retries: int = 3` is a constructor parameter with a hardcoded default - `src/cleveragents/application/services/error_recovery_service.py:66-68` — `ErrorRecoveryPolicy` is created with this hardcoded value - `src/cleveragents/domain/models/core/safety_profile.py:138-143` — `max_retries_per_step: int = Field(3, ge=0, le=100, ...)` is defined in `SafetyProfile` - `src/cleveragents/tool/lifecycle.py:916-924` — `max_retries_per_step` IS used in tool lifecycle to block execution when exceeded, but this is a separate enforcement path - `src/cleveragents/application/services/error_recovery_service.py` — Never reads `plan.automation_profile.safety.max_retries_per_step` ### Expected Behavior (from spec) When `ErrorRecoveryService` determines whether to retry a failed step, it should use `plan.automation_profile.safety.max_retries_per_step` as the retry limit. This allows operators to configure retry behavior per plan via the automation profile (e.g., `manual` profile = 3 retries, custom profile = 0 retries for no auto-retry). ### Actual Behavior `ErrorRecoveryService` always uses the hardcoded `max_retries=3` value passed at construction time, regardless of the plan's automation profile. This means: 1. Changing `max_retries_per_step` in the automation profile has no effect on `ErrorRecoveryService` retry behavior 2. The retry limit is not per-plan but per-service-instance 3. The spec's intent to make retry behavior configurable via automation profiles is not honored ### Code Locations | File | Line(s) | Issue | |------|---------|-------| | `src/cleveragents/application/services/error_recovery_service.py` | 54 | Hardcoded `max_retries=3` constructor default | | `src/cleveragents/application/services/error_recovery_service.py` | 66–68 | `ErrorRecoveryPolicy` created with hardcoded value | | `src/cleveragents/domain/models/core/safety_profile.py` | 138–143 | `max_retries_per_step` field (never read by `ErrorRecoveryService`) | | `src/cleveragents/domain/models/core/automation_profile.py` | 446, 469, 492, 515, 538, 561, 584, 607 | All built-in profiles set `max_retries_per_step=3` (never read by `ErrorRecoveryService`) | ### Steps to Reproduce 1. Create a plan with a custom automation profile where `max_retries_per_step=0` (no auto-retry) 2. Run a plan that fails during execute phase 3. Observe that `ErrorRecoveryService` still retries 3 times instead of 0 ## Subtasks - [ ] Audit `ErrorRecoveryService.__init__` and remove (or demote to fallback) the hardcoded `max_retries=3` constructor parameter - [ ] Update `ErrorRecoveryService` to accept the plan context and read `plan.automation_profile.safety.max_retries_per_step` when determining the retry limit - [ ] Update `ErrorRecoveryPolicy` creation (lines 66–68) to use the plan's safety profile value - [ ] Verify consistency with the `tool/lifecycle.py:916-924` enforcement path (both should use the same source of truth) - [ ] Write BDD unit test scenarios (Behave) covering: retry limit respected from `SafetyProfile`, `max_retries_per_step=0` disables auto-retry, custom profile overrides default - [ ] Update or add integration tests (Robot Framework) if the error recovery flow is covered there - [ ] Update docstrings and type annotations to reflect the new behaviour ## Definition of Done - [ ] `ErrorRecoveryService` reads `max_retries_per_step` from `plan.automation_profile.safety` at runtime - [ ] Hardcoded `max_retries=3` default is removed or used only as a last-resort fallback with no plan context - [ ] Setting `max_retries_per_step=0` on a plan's automation profile prevents `ErrorRecoveryService` from retrying - [ ] BDD unit test scenarios (Behave) cover all retry-limit scenarios - [ ] All nox stages pass - [ ] Coverage >= 97% --- **Automated by CleverAgents Bot** Supervisor: UAT Testing | Agent: ca-uat-tester
HAL9000 added this to the v3.5.0 milestone 2026-04-09 03:11:46 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveragents-core#4040
No description provided.