UAT: ServiceRetryWiring and ErrorRecoveryService are implemented but not wired into the DI container — retry policies and error recovery are never applied at runtime #4027

Open
opened 2026-04-06 08:45:42 +00:00 by freemo · 0 comments
Owner

Metadata

  • Branch: fix/wire-service-retry-and-error-recovery-into-di-container
  • Commit Message: fix(container): wire ServiceRetryWiring and ErrorRecoveryService into DI container
  • Milestone: None (backlog)
  • Parent Epic: #368

What Was Tested

  • src/cleveragents/application/container.py: searched the entire file (968 lines) for ServiceRetryWiring, ErrorRecoveryService, service_retry_wiring, and error_recovery; zero matches found
  • src/cleveragents/application/services/service_retry_wiring.py: ServiceRetryWiring is only instantiated in docstring examples (lines 18, 595) and in tests, never in production code
  • src/cleveragents/application/services/plan_executor.py:299: PlanExecutor.__init__ accepts error_recovery_service: ErrorRecoveryService | None = None as an optional parameter, but the DI container never provides it, so it defaults to None at runtime
  • When error_recovery_service is None, the retry loop in _run_execute_with_stub (lines 740–811) runs with max_attempts=1 (no retries) and skips all error recording
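
The effect described in the last bullet can be sketched as follows. This is an illustration only, not the code in plan_executor.py: RetryPolicy, StubRecoveryService, run_with_retries, and make_flaky are hypothetical names, and the real _run_execute_with_stub may be structured differently. The sketch assumes only what this issue states: without a recovery service the loop gets a single attempt and records nothing.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RetryPolicy:
    """Hypothetical policy type; the real one is not shown in this issue."""
    max_attempts: int = 3


@dataclass
class StubRecoveryService:
    """Hypothetical stand-in for ErrorRecoveryService."""
    policy: RetryPolicy = field(default_factory=RetryPolicy)
    recorded: list = field(default_factory=list)

    def record_error(self, exc: Exception, attempt: int) -> None:
        # Keep just enough structure to show that recording happened.
        self.recorded.append((type(exc).__name__, attempt))


def run_with_retries(
    step: Callable[[], str],
    error_recovery: Optional[StubRecoveryService] = None,
) -> str:
    # With no recovery service there is no policy to consult, so the loop
    # collapses to a single attempt, matching the behavior reported above.
    if error_recovery is not None:
        max_attempts = error_recovery.policy.max_attempts
    else:
        max_attempts = 1

    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception as exc:
            if error_recovery is not None:
                error_recovery.record_error(exc, attempt)
            if attempt == max_attempts:
                raise
    raise AssertionError("unreachable when max_attempts >= 1")


def make_flaky(failures: int) -> Callable[[], str]:
    """Return a step that fails `failures` times and then succeeds."""
    state = {"left": failures}

    def step() -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("transient failure")
        return "ok"

    return step
```

In this sketch, a step that fails twice succeeds on the third attempt when a service is supplied, while the same step raises on its first failure when the service is None.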

Expected Behavior

Per the spec and the implementation's own docstrings:

  1. ServiceRetryWiring should be instantiated from Settings in the DI container and injected into services that need retry protection
  2. ErrorRecoveryService should be instantiated and injected into PlanExecutor so that:
    • Errors during plan execution are recorded with structured metadata
    • Retry decisions are made based on error category and retry policy
    • Recovery hints are generated and persisted to the plan
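
One way the expected wiring could look, as a sketch only. The issue does not show the DI framework, the constructor signatures, or the Settings fields, so everything below is a stand-in: the classes are stubs that borrow the real names, retry_max_attempts and from_settings are hypothetical, and cached_property merely approximates a singleton provider.

```python
from dataclasses import dataclass
from functools import cached_property
from typing import Optional


@dataclass(frozen=True)
class Settings:
    retry_max_attempts: int = 3  # hypothetical field name


class ServiceRetryWiring:
    """Stub; the real constructor signature is not shown in the issue."""

    def __init__(self, max_attempts: int) -> None:
        self.max_attempts = max_attempts

    @classmethod
    def from_settings(cls, settings: Settings) -> "ServiceRetryWiring":
        # Hypothetical factory illustrating "instantiated from Settings".
        return cls(max_attempts=settings.retry_max_attempts)


class ErrorRecoveryService:
    """Stub; the real dependencies of this service are unknown here."""


class PlanExecutor:
    def __init__(
        self, error_recovery_service: Optional[ErrorRecoveryService] = None
    ) -> None:
        self.error_recovery_service = error_recovery_service


class Container:
    """Hand-rolled container; stands in for whatever container.py uses."""

    def __init__(self, settings: Settings) -> None:
        self.settings = settings

    @cached_property
    def service_retry_wiring(self) -> ServiceRetryWiring:
        # Built once per container, then reused (singleton-like).
        return ServiceRetryWiring.from_settings(self.settings)

    @cached_property
    def error_recovery_service(self) -> ErrorRecoveryService:
        return ErrorRecoveryService()

    def plan_executor(self) -> PlanExecutor:
        # The key change: pass the service explicitly instead of relying
        # on the constructor's None default.
        return PlanExecutor(error_recovery_service=self.error_recovery_service)
```

Whatever provider mechanism the real container uses, the property to preserve is that every PlanExecutor built by the container shares the same non-None ErrorRecoveryService.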

Actual Behavior

  • ServiceRetryWiring is never instantiated in production — all services run without retry protection from this wiring layer (though some have @database_retry decorators directly)
  • ErrorRecoveryService is never injected into PlanExecutor — the executor always runs with error_recovery_service=None, meaning:
    • No structured error recording
    • No retry logic in the execute phase (always 1 attempt)
    • No recovery hints generated
    • agents plan errors <PLAN_ID> shows no structured error recovery data

Code Locations

  • src/cleveragents/application/container.py: no reference to ServiceRetryWiring or ErrorRecoveryService
  • src/cleveragents/application/services/plan_executor.py:299,339: error_recovery_service parameter defaults to None
  • src/cleveragents/application/services/plan_executor.py:740-744: max_attempts is 1 when error_recovery is None
  • src/cleveragents/application/services/service_retry_wiring.py: fully implemented but never used in production

Steps to Reproduce

  1. Run any plan that fails during the execute phase
  2. Check agents plan errors <PLAN_ID> — no structured error recovery data (no error_category, retry_count, recovery_hints)
  3. Observe that failed plans are never retried automatically

Subtasks

  • Add a TDD issue-capture Behave scenario proving ErrorRecoveryService is None in PlanExecutor at runtime
  • Add a TDD issue-capture Behave scenario proving ServiceRetryWiring is never instantiated in the DI container
  • Instantiate ServiceRetryWiring from Settings in container.py and register it as a singleton provider
  • Instantiate ErrorRecoveryService in container.py and register it as a singleton provider
  • Inject ErrorRecoveryService into PlanExecutor via the DI container (remove None default for production wiring)
  • Verify max_attempts in _run_execute_with_stub is driven by the retry policy when error_recovery_service is present
  • Update integration tests to assert structured error recovery data appears in agents plan errors <PLAN_ID> after a failed execute phase
  • Ensure all nox stages pass with coverage >= 97%
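
The issue-capture subtasks amount to a check on the object graph the container builds. The sketch below expresses that check in plain Python rather than as Behave steps. The two build_executor_* factories are hypothetical stand-ins for the container before and after the fix, and the classes are stubs that borrow the real names.

```python
from typing import Callable, Optional


class ErrorRecoveryService:
    """Stub standing in for the real service."""


class PlanExecutor:
    def __init__(
        self, error_recovery_service: Optional[ErrorRecoveryService] = None
    ) -> None:
        self.error_recovery_service = error_recovery_service


def build_executor_unwired() -> PlanExecutor:
    # Models the reported state: the container omits the argument,
    # so the constructor default (None) is used.
    return PlanExecutor()


def build_executor_wired() -> PlanExecutor:
    # Models the intended state after the fix.
    return PlanExecutor(error_recovery_service=ErrorRecoveryService())


def recovery_is_wired(build: Callable[[], PlanExecutor]) -> bool:
    """Return True when the built executor carries a recovery service."""
    return build().error_recovery_service is not None
```

A capture scenario written this way would report False against the current container and True once the wiring subtasks are complete.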

Definition of Done

  • ServiceRetryWiring is instantiated from Settings and registered in container.py
  • ErrorRecoveryService is instantiated and registered in container.py
  • PlanExecutor receives a non-None ErrorRecoveryService from the DI container at runtime
  • Failed plans are retried according to the configured retry policy (not always 1 attempt)
  • agents plan errors <PLAN_ID> returns structured error recovery data (error_category, retry_count, recovery_hints) after a failed execute phase
  • TDD issue-capture tests are green (bug proven before fix, passing after fix)
  • All nox stages pass
  • Coverage >= 97%
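
The structured-data criterion can be checked mechanically. The sketch below assumes the output of agents plan errors <PLAN_ID> can be parsed into one dictionary per error, which this issue does not confirm. Only the three field names come from the text above; missing_recovery_fields is a hypothetical helper.

```python
from typing import Any, Mapping

# Field names taken from this issue's Definition of Done.
REQUIRED_RECOVERY_FIELDS = ("error_category", "retry_count", "recovery_hints")


def missing_recovery_fields(error_entry: Mapping[str, Any]) -> list[str]:
    """List the required recovery fields absent from one error entry."""
    return [name for name in REQUIRED_RECOVERY_FIELDS if name not in error_entry]
```

An integration test could assert that this helper returns an empty list for every error entry produced by a failed execute phase.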

Backlog note: This issue was discovered during autonomous operation
on milestone v3.5.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-uat-tester

HAL9000 added this to the v3.5.0 milestone 2026-04-09 03:11:49 +00:00
Blocks
#368 Epic: Subplans & Parallelism
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#4027