Concurrency Bug: Unsafe concurrent access to _checkpoints in FsDirectoryHandler #8113

Open
opened 2026-04-13 03:34:57 +00:00 by HAL9000 · 1 comment
Owner

Metadata

  • Commit message: fix(fs_directory): add threading.Lock to protect _checkpoints from concurrent access
  • Branch name: fix/fs-directory-handler-checkpoints-thread-safety
  • Module: src/cleveragents/resource/handlers/fs_directory.py
  • Class: FsDirectoryHandler
  • Field: self._checkpoints
  • Lines: 272, 306, 337

Background and Context

The FsDirectoryHandler uses an instance-level dictionary, self._checkpoints, to track the filesystem paths of created snapshots. The resolve_handler function in resolver.py caches handler instances, meaning the same FsDirectoryHandler instance can be used concurrently by multiple threads.

The _checkpoints dictionary is accessed in create_checkpoint, rollback_to, and discard_checkpoints without any synchronization. If multiple threads create, roll back, or discard checkpoints on the same resource at the same time, the resulting race conditions can lose checkpoints or leave the dictionary in an incorrect state.
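
To make the failure mode concrete, here is a minimal sketch of one way an unsynchronized read-then-write on a shared dictionary can lose an entry. The issue does not show the handler's code, so everything here is an assumption for illustration: UnsafeCheckpointStore is a hypothetical stand-in, the id scheme based on len() is invented, and the Barrier exists only to force the bad interleaving deterministically.

```python
import threading


class UnsafeCheckpointStore:
    """Hypothetical stand-in; NOT the real FsDirectoryHandler."""

    def __init__(self):
        self._checkpoints = {}  # checkpoint id -> snapshot path
        # Demo-only barrier that forces both threads to interleave badly.
        self._barrier = threading.Barrier(2)

    def create_checkpoint(self, snapshot_path):
        # Step 1 (read): derive the next id from the current dictionary size.
        checkpoint_id = f"cp-{len(self._checkpoints)}"
        # Both threads wait here, so both have already read the same size.
        self._barrier.wait()
        # Step 2 (write): both threads store under the same id,
        # so the second write silently replaces the first.
        self._checkpoints[checkpoint_id] = snapshot_path
        return checkpoint_id


def run_demo():
    store = UnsafeCheckpointStore()
    threads = [
        threading.Thread(target=store.create_checkpoint, args=(f"/tmp/snap-{i}",))
        for i in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store._checkpoints
```

Two checkpoints are requested, but run_demo() returns a dictionary with a single entry, because both threads computed the same id before either one wrote. In real code the interleaving is timing-dependent rather than forced, which is what makes this class of bug hard to reproduce.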

Expected Behavior

All access to the shared self._checkpoints dictionary should be protected by a thread lock to ensure thread safety. Concurrent calls to create_checkpoint and rollback_to must not cause race conditions or data corruption.

Acceptance Criteria

  • A threading.Lock is added to the FsDirectoryHandler.
  • The lock is acquired before any read or write access to self._checkpoints.
  • The lock is released after the access is complete (using a with statement).
  • Concurrent calls to create_checkpoint and rollback_to do not cause race conditions.
  • All existing tests continue to pass.
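
The first three criteria could be met along the following lines. This is a sketch under stated assumptions, not the actual patch: the class name FsDirectoryHandlerSketch, the attribute names _checkpoints_lock and _next_id, and the method signatures are all hypothetical, and the real snapshot and restore work is omitted.

```python
import threading


class FsDirectoryHandlerSketch:
    """Hypothetical stand-in; only the locking pattern is the point."""

    def __init__(self):
        self._checkpoints = {}  # checkpoint id -> snapshot path
        # One lock per handler instance guards every access to _checkpoints.
        self._checkpoints_lock = threading.Lock()
        self._next_id = 0  # assumed id scheme, guarded by the same lock

    def create_checkpoint(self, snapshot_path):
        # Allocate the id and record the path as one atomic step.
        with self._checkpoints_lock:
            checkpoint_id = f"cp-{self._next_id}"
            self._next_id += 1
            self._checkpoints[checkpoint_id] = snapshot_path
        return checkpoint_id

    def rollback_to(self, checkpoint_id):
        # Read under the lock; the with statement releases it on exit,
        # including when an exception is raised.
        with self._checkpoints_lock:
            snapshot_path = self._checkpoints.get(checkpoint_id)
        if snapshot_path is None:
            raise KeyError(checkpoint_id)
        # The real restore from snapshot_path would go here (omitted).
        return snapshot_path

    def discard_checkpoints(self):
        # Copy and clear under the lock so no reader sees a half-cleared dict.
        with self._checkpoints_lock:
            discarded = list(self._checkpoints.values())
            self._checkpoints.clear()
        return discarded
```

Two design points are worth checking against the real code. threading.Lock is not reentrant, so if one locked method calls another locked method on the same instance, threading.RLock would be needed instead. Also, this sketch holds the lock only around dictionary access; whether the filesystem work should be inside the critical section as well depends on how create_checkpoint and rollback_to interact on disk, which the issue does not describe.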

Subtasks

  • 1. Add a threading.Lock to FsDirectoryHandler.__init__.
  • 2. Wrap all accesses to self._checkpoints in create_checkpoint, rollback_to, and discard_checkpoints with the lock.
  • 3. Add a unit test to verify thread safety under concurrent access.
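
For subtask 3, a test might take roughly this shape. It is a sketch only: LockedStore is a hypothetical, self-contained stand-in so the example runs on its own, and a real test would exercise FsDirectoryHandler with the project's own fixtures. The thread count and number of checkpoints are arbitrary choices.

```python
import threading
import unittest
from concurrent.futures import ThreadPoolExecutor


class LockedStore:
    """Hypothetical minimal stand-in for the fixed handler."""

    def __init__(self):
        self._checkpoints = {}
        self._lock = threading.Lock()

    def create_checkpoint(self, snapshot_path):
        with self._lock:
            checkpoint_id = f"cp-{len(self._checkpoints)}"
            self._checkpoints[checkpoint_id] = snapshot_path
        return checkpoint_id

    def rollback_to(self, checkpoint_id):
        with self._lock:
            return self._checkpoints[checkpoint_id]


class ConcurrentCheckpointTest(unittest.TestCase):
    def test_no_checkpoint_is_lost(self):
        store = LockedStore()
        paths = [f"/tmp/snap-{i}" for i in range(200)]

        # Create all checkpoints from a pool of worker threads.
        with ThreadPoolExecutor(max_workers=8) as pool:
            ids = list(pool.map(store.create_checkpoint, paths))

        # Every call must have received a distinct id.
        self.assertEqual(len(set(ids)), len(paths))

        # Every id must still resolve to the path it was created with
        # (pool.map preserves input order, so ids[i] pairs with paths[i]).
        with ThreadPoolExecutor(max_workers=8) as pool:
            restored = list(pool.map(store.rollback_to, ids))
        self.assertEqual(restored, paths)
```

A test like this can only raise confidence. Because races depend on timing, unsynchronized code may also pass it on a given run, so it complements review of the locking rather than replacing it.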

Definition of Done

  • All subtasks are complete.
  • The fix is reviewed and merged into the master branch.
  • Test coverage remains at or above the project threshold (≥ 97%).

Automated by CleverAgents Bot
Agent: new-issue-creator

HAL9000 added this to the v3.4.0 milestone 2026-04-13 03:35:02 +00:00
Author
Owner

Verified: unsafe concurrent access to checkpoints can cause data corruption and checkpoint loss. Given that checkpoints are a safety mechanism, this is a Must Have fix for v3.4.0.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Reference
cleveragents/cleveragents-core#8113