[Bug Hunt][Cycle 2][Resource] SQLite Connection Leak in Checkpoint Management #7082

Open
opened 2026-04-10 07:32:35 +00:00 by HAL9000 · 1 comment
Owner

Metadata

  • Branch: bugfix/m3-sqlite-connection-leak-checkpoint
  • Commit Message: fix(resource): close leaked SQLite connections in checkpoint management
  • Milestone: v3.2.0
  • Parent Epic: #7023

Bug Report: Resource Management — SQLite Connection Leak in Checkpoint Management

Severity Assessment

  • Impact: Memory and file descriptor exhaustion in long-running processes
  • Likelihood: High in workflows using SQLite checkpoints without rollbacks
  • Priority: Critical

Location

  • File: src/cleveragents/resource/handlers/database.py
  • Class: DatabaseResourceHandler
  • Lines: 710-745 (_create_checkpoint_sqlite), 820-870 (_rollback_sqlite)

Description

The DatabaseResourceHandler stores open SQLite connections in the _sqlite_checkpoints instance dictionary but provides no cleanup mechanism for connections that are never rolled back. Each call to create_checkpoint() on an SQLite resource opens a new connection and stores it indefinitely.

Evidence

def _create_checkpoint_sqlite(self, resource: Resource, plan_id: str, checkpoint_id: str) -> CheckpointResult:
    # ...
    conn = _open_sqlite(location)
    conn.execute(f"SAVEPOINT {savepoint_name}")
    # Store the open connection so rollback_to can use it.
    # The connection is never cleaned up if rollback is not called.
    self._sqlite_checkpoints[checkpoint_id] = (conn, savepoint_name)

The _sqlite_checkpoints dictionary grows without bound, and connections are removed only in _rollback_sqlite(). There is no:

  • Cleanup in handler __del__ method
  • Timeout-based connection cleanup
  • Connection pool size limits
  • Cleanup when plans complete successfully without rollback
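The pattern described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: `LeakyHandler`, `count_open`, and the savepoint naming are hypothetical stand-ins, and the standard-library `sqlite3.connect` is assumed in place of `_open_sqlite`. It only illustrates that every checkpoint that is never rolled back keeps its connection open.

```python
import sqlite3


class LeakyHandler:
    """Hypothetical stand-in that mimics the storage pattern in the report."""

    def __init__(self) -> None:
        self._sqlite_checkpoints = {}  # checkpoint_id -> (conn, savepoint_name)
        self._counter = 0

    def create_checkpoint(self, location: str, checkpoint_id: str) -> None:
        # isolation_level=None: the sqlite3 module issues no implicit BEGIN/COMMIT
        conn = sqlite3.connect(location, isolation_level=None)
        self._counter += 1
        savepoint_name = f"sp_{self._counter}"
        conn.execute(f"SAVEPOINT {savepoint_name}")
        self._sqlite_checkpoints[checkpoint_id] = (conn, savepoint_name)

    def rollback(self, checkpoint_id: str) -> None:
        # The only path that removes and closes a stored connection
        conn, savepoint_name = self._sqlite_checkpoints.pop(checkpoint_id)
        try:
            conn.execute(f"ROLLBACK TO {savepoint_name}")
        finally:
            conn.close()


def count_open(handler: LeakyHandler) -> int:
    """Count stored connections that still accept statements."""
    open_count = 0
    for conn, _ in handler._sqlite_checkpoints.values():
        try:
            conn.execute("SELECT 1")
            open_count += 1
        except sqlite3.ProgrammingError:
            pass  # raised when the connection is already closed
    return open_count


handler = LeakyHandler()
for i in range(50):
    handler.create_checkpoint(":memory:", f"cp-{i}")
handler.rollback("cp-0")  # only this checkpoint is ever released

leaked = count_open(handler)
print(f"connections still open: {leaked}")
```

A targeted test for the real handler could assert on the same quantity, the number of stored connections that are still open after the plans have finished.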

Expected Behavior

SQLite connections should be properly managed with:

  • Automatic cleanup when handler is destroyed
  • Timeout-based cleanup for long-running checkpoints
  • Connection limits to prevent resource exhaustion
  • Cleanup hooks when plans complete without rollback

Actual Behavior

Open connections accumulate for the lifetime of the handler and can eventually exhaust:

  • Available SQLite connections
  • File descriptors
  • Process memory
  • Database locks

Suggested Fix

  1. Add __del__ method to close all open checkpoint connections
  2. Implement timeout-based cleanup for stale checkpoints
  3. Add connection pool limits and LRU eviction
  4. Hook into plan completion events to clean up unused checkpoints
  5. Add proper exception handling around connection operations
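One possible shape for the steps above is a small bounded registry that owns the open connections. This is a sketch under assumptions, not the intended implementation: `CheckpointRegistry`, `max_open`, `ttl_seconds`, and `release_plan` are hypothetical names, and eviction is oldest-first, a simplification of the LRU policy suggested above. Because `__del__` is not guaranteed to run promptly, or at all at interpreter exit, it is treated here as a last-resort fallback behind an explicit `close_all()`.

```python
import sqlite3
import time
from collections import OrderedDict


class CheckpointRegistry:
    """Hypothetical bounded owner of open SQLite checkpoint connections."""

    def __init__(self, max_open: int = 32, ttl_seconds: float = 3600.0) -> None:
        self._max_open = max_open
        self._ttl = ttl_seconds
        # checkpoint_id -> (conn, savepoint_name, plan_id, created_at), oldest first
        self._entries = OrderedDict()

    def __len__(self) -> int:
        return len(self._entries)

    def add(self, checkpoint_id, plan_id, conn, savepoint_name) -> None:
        self.evict_stale()
        # Enforce the size limit by closing the oldest checkpoints first
        while len(self._entries) >= self._max_open:
            self.release(next(iter(self._entries)))
        self._entries[checkpoint_id] = (conn, savepoint_name, plan_id, time.monotonic())

    def pop(self, checkpoint_id):
        """Hand ownership back to the caller, e.g. for a rollback."""
        conn, savepoint_name, _, _ = self._entries.pop(checkpoint_id)
        return conn, savepoint_name

    def release(self, checkpoint_id) -> None:
        entry = self._entries.pop(checkpoint_id, None)
        if entry is not None:
            try:
                entry[0].close()
            except sqlite3.Error:
                pass  # a failed close must not block cleanup of the others

    def release_plan(self, plan_id) -> None:
        """Intended to be called from a plan-completion hook."""
        for cid in [c for c, e in self._entries.items() if e[2] == plan_id]:
            self.release(cid)

    def evict_stale(self) -> None:
        cutoff = time.monotonic() - self._ttl
        for cid in [c for c, e in self._entries.items() if e[3] < cutoff]:
            self.release(cid)

    def close_all(self) -> None:
        for cid in list(self._entries):
            self.release(cid)

    def __del__(self) -> None:
        # Best-effort fallback only; callers should use close_all() explicitly
        try:
            self.close_all()
        except Exception:
            pass


registry = CheckpointRegistry(max_open=2)
for i in range(3):
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(f"SAVEPOINT sp_{i}")
    registry.add(f"cp-{i}", "plan-a", conn, f"sp_{i}")
remaining_after_limit = len(registry)  # the oldest checkpoint was evicted

registry.release_plan("plan-a")
remaining_after_plan = len(registry)
```

Evicting a checkpoint makes a later rollback to it impossible, so the limit and timeout would need to be chosen, and evictions probably logged, with that trade-off in mind.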

Category

resource

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: @tdd_issue, @tdd_issue_<this-issue-number>, and @tdd_expected_fail to prove the bug exists before fixing it.

Subtasks

  • Reproduce the connection leak with a targeted test scenario
  • Add __del__ method to DatabaseResourceHandler to close all open _sqlite_checkpoints connections
  • Implement timeout-based cleanup for stale checkpoint connections
  • Add connection pool size limits with LRU eviction policy
  • Hook into plan completion events to clean up unused checkpoints
  • Add proper exception handling around all SQLite connection operations
  • Tests (Behave): Add scenarios for SQLite checkpoint connection lifecycle
  • Tests (Robot): Add integration test for long-running checkpoint resource cleanup
  • Verify coverage >=97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly (fix(resource): close leaked SQLite connections in checkpoint management), followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly (bugfix/m3-sqlite-connection-leak-checkpoint).
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
  • All nox stages pass
  • Coverage >= 97%

Automated by CleverAgents Bot
Supervisor: Bug Hunt Automation | Agent: new-issue-creator

HAL9000 added this to the v3.2.0 milestone 2026-04-10 07:32:43 +00:00
Author
Owner

Verified — Critical resource bug: SQLite connection leak in checkpoint management. MoSCoW: Must-have. Priority: Critical.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Reference
cleveragents/cleveragents-core#7082