UAT: LockService.count_stale_locks() and is_locked() lack exception handling and session cleanup #3997

Open
opened 2026-04-06 08:25:02 +00:00 by freemo · 0 comments
Owner

Metadata

  • Branch: fix/lock-service-session-cleanup
  • Commit Message: fix(concurrency): add exception handling to count_stale_locks and is_locked
  • Milestone: (backlog — see note below)
  • Parent Epic: #362

Bug Description

LockService.count_stale_locks() and LockService.is_locked() do not have try/except blocks for exception handling or session cleanup. This is inconsistent with all other LockService methods (acquire, release, renew, release_all_for_owner, cleanup_expired) which all have proper try/except/rollback patterns.

Expected Behavior

All LockService methods that open a database session should handle exceptions consistently:

  1. Wrap the database operation in a try/except block
  2. Call session.rollback() on unexpected exceptions
  3. Re-raise the exception after rollback

This is the pattern used by all other methods in the same class.

Actual Behavior

count_stale_locks() (lines 398-417 in lock_service.py):

def count_stale_locks(self) -> int:
    now_iso = datetime.now(tz=UTC).isoformat()
    session = self._session_factory()
    stmt = (
        select(func.count())
        .select_from(LockModel)
        .where(LockModel.expires_at < now_iso)
    )
    result: int = session.execute(stmt).scalar_one()
    return result

is_locked() (lines 419-448 in lock_service.py):

def is_locked(self, resource_type: str, resource_id: str) -> bool:
    ...
    session = self._session_factory()
    stmt = (...)
    result: int = session.execute(stmt).scalar_one()
    return result > 0

Neither method has a try/except block. If session.execute() raises an exception (e.g., database connection error, table not found), the exception propagates without rolling back the session, potentially leaving the session in a dirty state.

Impact

  • Database connection pool exhaustion if sessions are not properly cleaned up on error
  • Inconsistent error handling behavior compared to other methods in the same class
  • The agents diagnostics health check (which calls count_stale_locks()) may leave sessions open on database errors

Code Location

src/cleveragents/application/services/lock_service.py:

  • count_stale_locks(): lines 398-417
  • is_locked(): lines 419-448

Compare with the correct pattern in cleanup_expired() (lines 377-396) which has proper try/except/rollback.

Fix

Both methods should be wrapped in try/except blocks:

def count_stale_locks(self) -> int:
    now_iso = datetime.now(tz=UTC).isoformat()
    session = self._session_factory()
    try:
        stmt = (
            select(func.count())
            .select_from(LockModel)
            .where(LockModel.expires_at < now_iso)
        )
        result: int = session.execute(stmt).scalar_one()
        return result
    except Exception:
        session.rollback()
        raise

Subtasks

  • Add try/except/rollback to count_stale_locks()
  • Add try/except/rollback to is_locked()
  • Add BDD scenarios to features/lock_service_coverage.feature for exception rollback paths in these methods
  • Verify coverage >= 97%
  • Run nox and fix any errors

Definition of Done

  • All subtasks completed
  • Both methods have consistent exception handling
  • BDD tests cover the exception rollback paths
  • All nox stages pass
  • Coverage >= 97%

Backlog note: This issue was discovered during autonomous operation
on milestone v3.3.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-new-issue-creator

## Metadata - **Branch**: `fix/lock-service-session-cleanup` - **Commit Message**: `fix(concurrency): add exception handling to count_stale_locks and is_locked` - **Milestone**: (backlog — see note below) - **Parent Epic**: #362 ## Bug Description `LockService.count_stale_locks()` and `LockService.is_locked()` do not have `try/except` blocks for exception handling or session cleanup. This is inconsistent with all other `LockService` methods (`acquire`, `release`, `renew`, `release_all_for_owner`, `cleanup_expired`) which all have proper `try/except/rollback` patterns. ## Expected Behavior All `LockService` methods that open a database session should handle exceptions consistently: 1. Wrap the database operation in a `try/except` block 2. Call `session.rollback()` on unexpected exceptions 3. Re-raise the exception after rollback This is the pattern used by all other methods in the same class. ## Actual Behavior `count_stale_locks()` (lines 398-417 in `lock_service.py`): ```python def count_stale_locks(self) -> int: now_iso = datetime.now(tz=UTC).isoformat() session = self._session_factory() stmt = ( select(func.count()) .select_from(LockModel) .where(LockModel.expires_at < now_iso) ) result: int = session.execute(stmt).scalar_one() return result ``` `is_locked()` (lines 419-448 in `lock_service.py`): ```python def is_locked(self, resource_type: str, resource_id: str) -> bool: ... session = self._session_factory() stmt = (...) result: int = session.execute(stmt).scalar_one() return result > 0 ``` Neither method has a `try/except` block. If `session.execute()` raises an exception (e.g., database connection error, table not found), the exception propagates without rolling back the session, potentially leaving the session in a dirty state. ## Impact - Database connection pool exhaustion if sessions are not properly cleaned up on error - Inconsistent error handling behavior compared to other methods in the same class - The `agents diagnostics` health check (which calls `count_stale_locks()`) may leave sessions open on database errors ## Code Location `src/cleveragents/application/services/lock_service.py`: - `count_stale_locks()`: lines 398-417 - `is_locked()`: lines 419-448 Compare with the correct pattern in `cleanup_expired()` (lines 377-396) which has proper `try/except/rollback`. ## Fix Both methods should be wrapped in `try/except` blocks: ```python def count_stale_locks(self) -> int: now_iso = datetime.now(tz=UTC).isoformat() session = self._session_factory() try: stmt = ( select(func.count()) .select_from(LockModel) .where(LockModel.expires_at < now_iso) ) result: int = session.execute(stmt).scalar_one() return result except Exception: session.rollback() raise ``` ## Subtasks - [ ] Add `try/except/rollback` to `count_stale_locks()` - [ ] Add `try/except/rollback` to `is_locked()` - [ ] Add BDD scenarios to `features/lock_service_coverage.feature` for exception rollback paths in these methods - [ ] Verify coverage >= 97% - [ ] Run `nox` and fix any errors ## Definition of Done - [ ] All subtasks completed - [ ] Both methods have consistent exception handling - [ ] BDD tests cover the exception rollback paths - [ ] All nox stages pass - [ ] Coverage >= 97% > **Backlog note:** This issue was discovered during autonomous operation > on milestone v3.3.0. It does not block milestone completion and has been > placed in the backlog for human review and future milestone assignment. --- **Automated by CleverAgents Bot** Supervisor: UAT Testing | Agent: ca-new-issue-creator
HAL9000 added this to the v3.5.0 milestone 2026-04-09 03:12:13 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Blocks
#362 Epic: Security & Safety Hardening
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#3997
No description provided.