Bug: Total timeout in ServiceRetryWiring can be exceeded by long-running operations #8121

Open
opened 2026-04-13 03:36:46 +00:00 by HAL9000 · 1 comment
Owner

Metadata

  • Module: src/cleveragents/application/services/service_retry_wiring.py
  • Lines: 420, 534
  • SHA: d5446081e3b9f0fc7736afb21e0fd110ce7cae15

Background and Context

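The execute and async_execute methods in ServiceRetryWiring use tenacity.stop_after_delay to enforce a total timeout (MAX_RETRY_TOTAL_TIMEOUT) across all retry attempts combined. This is intended to prevent operations from hanging indefinitely. However, tenacity evaluates stop conditions only after an attempt has finished; it does not interrupt an attempt that is still running. A single long-running attempt can therefore push the total elapsed time well past MAX_RETRY_TOTAL_TIMEOUT.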
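The gap described above can be reproduced with a minimal sketch. It does not use the project's code or tenacity itself: the hand-written loop below only imitates a stop condition that is checked between attempts, and the timeout value and function names are assumptions chosen for the demo.

```python
import time

MAX_RETRY_TOTAL_TIMEOUT = 0.1  # seconds; hypothetical value for the demo


def execute(operation, total_timeout=MAX_RETRY_TOTAL_TIMEOUT):
    """Stand-in retry loop whose deadline is checked only between attempts."""
    start = time.monotonic()
    while True:
        try:
            return operation()
        except Exception:
            # The deadline is consulted here, after the attempt has finished.
            # Nothing interrupts an attempt that is still running.
            if time.monotonic() - start >= total_timeout:
                raise


def slow_failing_operation():
    time.sleep(0.3)  # one attempt already lasts three times the total timeout
    raise RuntimeError("transient failure")


start = time.monotonic()
try:
    execute(slow_failing_operation)
except RuntimeError:
    pass
elapsed = time.monotonic() - start

print(f"configured limit: {MAX_RETRY_TOTAL_TIMEOUT}s, actual: {elapsed:.2f}s")
```

The loop gives up after the first attempt, but only once that attempt has returned, so the elapsed time is bounded by the attempt's duration rather than by the configured limit.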

Expected Behavior

The total timeout should be enforced strictly: an operation that exceeds MAX_RETRY_TOTAL_TIMEOUT should be cancelled. This gives more predictable behavior and prevents long-running operations from consuming resources beyond the configured limit.

For async_execute, this can be achieved by wrapping the AsyncRetrying loop with asyncio.wait_for. A similar solution for the synchronous execute method would require a different approach, possibly involving signals or a separate thread to monitor the execution time.
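A minimal sketch of the asyncio.wait_for approach for the async path follows. To keep it runnable with only the standard library, a plain retry loop stands in for the AsyncRetrying loop; retry_loop, slow_operation, and the timeout value are assumptions for illustration and are not taken from service_retry_wiring.py.

```python
import asyncio
import time

MAX_RETRY_TOTAL_TIMEOUT = 0.2  # seconds; hypothetical value for the demo


async def retry_loop(operation, attempts=3):
    """Stand-in for the AsyncRetrying loop: retry on any ordinary exception."""
    last_error = None
    for _ in range(attempts):
        try:
            return await operation()
        except Exception as exc:  # demo only; real code would filter exception types
            last_error = exc
    raise last_error


async def async_execute(operation, total_timeout=MAX_RETRY_TOTAL_TIMEOUT):
    """Run the whole retry loop under one hard deadline."""
    # wait_for cancels the inner coroutine when the deadline passes,
    # even if an attempt is still in progress.
    return await asyncio.wait_for(retry_loop(operation), timeout=total_timeout)


async def slow_operation():
    await asyncio.sleep(5)  # far longer than the total timeout
    return "done"


async def demo():
    start = time.monotonic()
    try:
        await async_execute(slow_operation)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    return timed_out, time.monotonic() - start


timed_out, elapsed = asyncio.run(demo())
print(f"timed out: {timed_out}, elapsed: {elapsed:.2f}s")
```

Cancellation arrives as asyncio.CancelledError, which does not derive from Exception on Python 3.8 and later, so the retry loop in this sketch does not swallow it. A real retry predicate would need the same care so that a cancelled attempt is not treated as a retryable failure.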

Acceptance Criteria

  • The async_execute method enforces a strict timeout using asyncio.wait_for.
  • The execute method's timeout mechanism is made more accurate, or the documentation explicitly states that the timeout can be exceeded and under what conditions.
  • The behavior is covered by unit tests.
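For the synchronous execute criterion, one of the options mentioned in this issue is a separate thread. The sketch below shows that idea under stated assumptions: execute_with_deadline and slow_operation are hypothetical names, and the callable passed in would be the whole retry loop in practice. It is one possible shape, not the project's implementation.

```python
import concurrent.futures
import time

MAX_RETRY_TOTAL_TIMEOUT = 0.1  # seconds; hypothetical value for the demo


def execute_with_deadline(operation, total_timeout=MAX_RETRY_TOTAL_TIMEOUT):
    """Run `operation` in a worker thread and stop waiting at the deadline."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(operation)
    try:
        # Raises concurrent.futures.TimeoutError if the result is not ready in time.
        return future.result(timeout=total_timeout)
    finally:
        # Do not block on the worker: Python cannot forcibly stop a running thread.
        executor.shutdown(wait=False)


def slow_operation():
    time.sleep(0.5)  # longer than the total timeout
    return "done"


start = time.monotonic()
try:
    execute_with_deadline(slow_operation)
    timed_out = False
except concurrent.futures.TimeoutError:
    timed_out = True
elapsed = time.monotonic() - start

print(f"timed out: {timed_out}, elapsed: {elapsed:.2f}s")
```

The trade-off is that the caller regains control at the deadline while the abandoned operation keeps running in the background until it finishes on its own, and the interpreter still waits for that worker thread at exit. If that is unacceptable, the documentation route in the second criterion may be the more honest choice for the synchronous path.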

Subtasks

  • 1. Implement strict timeout enforcement for async_execute.
  • 2. Investigate and implement a stricter timeout for execute.
  • 3. Update documentation to clarify the timeout behavior.
  • 4. Add unit tests for the new timeout logic.

Definition of Done

  • The refactored code is merged into the master branch.
  • The associated unit tests are passing.

Automated by CleverAgents Bot
Supervisor: Bug Hunt Pool | Agent: bug-hunt-pool-supervisor

HAL9000 added this to the v3.3.0 milestone 2026-04-13 03:36:50 +00:00
Author
Owner

Verified: timeout violations can cause unexpected behavior but are not immediately data-corrupting. Should Have fix for v3.3.0.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Reference
cleveragents/cleveragents-core#8121