BUG-HUNT: [data-migration] Non-atomic data cleanup operations in constraint migration risk data loss #7295

Open
opened 2026-04-10 15:22:38 +00:00 by HAL9000 · 4 comments
Owner

Background

The migration alembic/versions/a5_006_action_invariants_unique_constraint.py performs data cleanup operations using raw SQL DELETE and UPDATE statements in _deduplicate_action_invariants() and _deduplicate_action_arguments() without proper transaction isolation, pre-migration validation, or post-cleanup verification. If the migration is interrupted mid-process (e.g., due to a crash, timeout, or database error), the database can be left in an inconsistent state with partial data loss — some rows deleted but the unique constraint not yet applied.

Current Behavior

The deduplication functions execute raw SQL operations sequentially without:

  • Pre-migration validation to confirm data state before destructive operations
  • Post-cleanup verification to confirm the expected row count or data integrity
  • Explicit transaction boundaries wrapping the full cleanup sequence
  • Rollback mechanisms if any individual statement fails partway through

For example, _deduplicate_action_invariants() deletes rows based on MIN(id) selection with no verification that the deletion succeeded or that the resulting data satisfies the upcoming unique constraint:

def _deduplicate_action_invariants() -> None:
    op.execute(
        sa.text(
            """
            DELETE FROM action_invariants
            WHERE id NOT IN (
                SELECT MIN(id)
                FROM action_invariants
                GROUP BY action_name, position
            )
            """
        )
    )

Similarly, _deduplicate_action_arguments() executes three separate SQL statements (one DELETE and two UPDATEs) with no atomicity guarantee across the sequence.

Expected Behavior

Per the specification's data integrity and fail-fast principles, migrations that perform destructive data operations must:

  • Wrap all data cleanup steps in explicit transaction boundaries so a failure in any step rolls back the entire cleanup sequence
  • Validate pre-conditions before executing destructive operations (e.g., confirm no unexpected schema state)
  • Verify post-conditions after cleanup (e.g., assert no duplicates remain before attempting constraint creation)
  • Raise an explicit error with a clear message if data cannot be safely deduplicated, rather than silently leaving the database in a partial state

Acceptance Criteria

  • _deduplicate_action_invariants() and _deduplicate_action_arguments() are wrapped in explicit transaction boundaries
  • Pre-migration validation checks are added before destructive operations
  • Post-cleanup verification confirms data integrity before constraint creation proceeds
  • If deduplication cannot complete safely, the migration raises a descriptive error and rolls back cleanly
  • All nox stages pass with coverage ≥ 97%

Supporting Information

  • File: alembic/versions/a5_006_action_invariants_unique_constraint.py
  • Functions: _deduplicate_action_invariants() (lines 27–37), _deduplicate_action_arguments() (lines 44–77)
  • Related issue: #6647 (duplicate constraint creation in same migration file)
  • TDD Note: After this issue is verified, a corresponding Type/Testing issue will be created. The test will use tags @tdd_issue, @tdd_issue_<this-issue-number>, and @tdd_expected_fail to prove the bug exists before fixing it.

Metadata

  • Branch: fix/data-migration-non-atomic-cleanup-data-loss
  • Commit Message: fix(alembic): wrap deduplication operations in explicit transactions with pre/post validation
  • Milestone: (none — backlog)
  • Parent Epic: (none identified — flagged as orphan below)

Subtasks

  • Add explicit transaction boundaries around _deduplicate_action_invariants() cleanup sequence
  • Add explicit transaction boundaries around _deduplicate_action_arguments() cleanup sequence
  • Implement pre-migration validation to check data state before destructive operations
  • Implement post-cleanup verification to assert no duplicates remain before constraint creation
  • Add descriptive error raising with rollback if deduplication cannot complete safely
  • Write BDD scenario(s) demonstrating the bug (tagged @tdd_issue, @tdd_expected_fail)
  • Fix the implementation so all BDD scenarios pass
  • Confirm all nox stages pass with coverage ≥ 97%

Definition of Done

  • _deduplicate_action_invariants() and _deduplicate_action_arguments() execute atomically
  • Pre-condition and post-condition checks are present and tested
  • Migration raises a clear error and rolls back if data cannot be safely cleaned
  • BDD tests tagged @tdd_issue prove the bug and then pass after the fix
  • All nox stages pass
  • Coverage >= 97%

Backlog note: This issue was discovered during autonomous operation
on milestone v3.2.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: new-issue-creator

## Background The migration `alembic/versions/a5_006_action_invariants_unique_constraint.py` performs data cleanup operations using raw SQL `DELETE` and `UPDATE` statements in `_deduplicate_action_invariants()` and `_deduplicate_action_arguments()` without proper transaction isolation, pre-migration validation, or post-cleanup verification. If the migration is interrupted mid-process (e.g., due to a crash, timeout, or database error), the database can be left in an inconsistent state with partial data loss — some rows deleted but the unique constraint not yet applied. ## Current Behavior The deduplication functions execute raw SQL operations sequentially without: - Pre-migration validation to confirm data state before destructive operations - Post-cleanup verification to confirm the expected row count or data integrity - Explicit transaction boundaries wrapping the full cleanup sequence - Rollback mechanisms if any individual statement fails partway through For example, `_deduplicate_action_invariants()` deletes rows based on `MIN(id)` selection with no verification that the deletion succeeded or that the resulting data satisfies the upcoming unique constraint: ```python def _deduplicate_action_invariants() -> None: op.execute( sa.text( """ DELETE FROM action_invariants WHERE id NOT IN ( SELECT MIN(id) FROM action_invariants GROUP BY action_name, position ) """ ) ) ``` Similarly, `_deduplicate_action_arguments()` executes three separate SQL statements (one `DELETE` and two `UPDATE`s) with no atomicity guarantee across the sequence. ## Expected Behavior Per the specification's data integrity and fail-fast principles, migrations that perform destructive data operations must: - Wrap all data cleanup steps in explicit transaction boundaries so a failure in any step rolls back the entire cleanup sequence - Validate pre-conditions before executing destructive operations (e.g., confirm no unexpected schema state) - Verify post-conditions after cleanup (e.g., assert no duplicates remain before attempting constraint creation) - Raise an explicit error with a clear message if data cannot be safely deduplicated, rather than silently leaving the database in a partial state ## Acceptance Criteria - `_deduplicate_action_invariants()` and `_deduplicate_action_arguments()` are wrapped in explicit transaction boundaries - Pre-migration validation checks are added before destructive operations - Post-cleanup verification confirms data integrity before constraint creation proceeds - If deduplication cannot complete safely, the migration raises a descriptive error and rolls back cleanly - All nox stages pass with coverage ≥ 97% ## Supporting Information - **File**: `alembic/versions/a5_006_action_invariants_unique_constraint.py` - **Functions**: `_deduplicate_action_invariants()` (lines 27–37), `_deduplicate_action_arguments()` (lines 44–77) - **Related issue**: #6647 (duplicate constraint creation in same migration file) - **TDD Note**: After this issue is verified, a corresponding `Type/Testing` issue will be created. The test will use tags `@tdd_issue`, `@tdd_issue_<this-issue-number>`, and `@tdd_expected_fail` to prove the bug exists before fixing it. ## Metadata - **Branch**: `fix/data-migration-non-atomic-cleanup-data-loss` - **Commit Message**: `fix(alembic): wrap deduplication operations in explicit transactions with pre/post validation` - **Milestone**: *(none — backlog)* - **Parent Epic**: *(none identified — flagged as orphan below)* ## Subtasks - [ ] Add explicit transaction boundaries around `_deduplicate_action_invariants()` cleanup sequence - [ ] Add explicit transaction boundaries around `_deduplicate_action_arguments()` cleanup sequence - [ ] Implement pre-migration validation to check data state before destructive operations - [ ] Implement post-cleanup verification to assert no duplicates remain before constraint creation - [ ] Add descriptive error raising with rollback if deduplication cannot complete safely - [ ] Write BDD scenario(s) demonstrating the bug (tagged `@tdd_issue`, `@tdd_expected_fail`) - [ ] Fix the implementation so all BDD scenarios pass - [ ] Confirm all nox stages pass with coverage ≥ 97% ## Definition of Done - [ ] `_deduplicate_action_invariants()` and `_deduplicate_action_arguments()` execute atomically - [ ] Pre-condition and post-condition checks are present and tested - [ ] Migration raises a clear error and rolls back if data cannot be safely cleaned - [ ] BDD tests tagged `@tdd_issue` prove the bug and then pass after the fix - [ ] All nox stages pass - [ ] Coverage >= 97% > **Backlog note:** This issue was discovered during autonomous operation > on milestone v3.2.0. It does not block milestone completion and has been > placed in the backlog for human review and future milestone assignment. --- **Automated by CleverAgents Bot** Supervisor: Bug Hunting | Agent: new-issue-creator
Author
Owner

⚠️ Orphan Issue — Needs Manual Parent Epic Linking

This issue was created by the automated new-issue-creator agent. No parent Epic covering Alembic migration safety or data integrity was found in the open issues at creation time.

This issue must be manually linked to an appropriate parent Epic using Forgejo's dependency system (this issue blocks the parent Epic). If no suitable Epic exists, one should be created first.

Suggested parent scope: An Epic covering "Alembic migration safety, atomicity, and data integrity hardening" would be appropriate.

Labels note: The required labels (State/Unverified, Type/Bug, Priority/Backlog) could not be applied automatically due to a security policy restriction on the label API endpoint in this environment. Please apply them manually via the Forgejo UI.


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: new-issue-creator

⚠️ **Orphan Issue — Needs Manual Parent Epic Linking** This issue was created by the automated new-issue-creator agent. No parent Epic covering Alembic migration safety or data integrity was found in the open issues at creation time. This issue must be manually linked to an appropriate parent Epic using Forgejo's dependency system (this issue **blocks** the parent Epic). If no suitable Epic exists, one should be created first. **Suggested parent scope**: An Epic covering "Alembic migration safety, atomicity, and data integrity hardening" would be appropriate. **Labels note**: The required labels (`State/Unverified`, `Type/Bug`, `Priority/Backlog`) could not be applied automatically due to a security policy restriction on the label API endpoint in this environment. Please apply them manually via the Forgejo UI. --- **Automated by CleverAgents Bot** Supervisor: Bug Hunting | Agent: new-issue-creator
Author
Owner

Verified — Data migration bug: non-atomic cleanup operations risk data loss. MoSCoW: Must-have. Priority: High.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Data migration bug: non-atomic cleanup operations risk data loss. MoSCoW: Must-have. Priority: High. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Author
Owner

Verified — Data migration bug: non-atomic cleanup operations risk data loss. MoSCoW: Must-have. Priority: High.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Data migration bug: non-atomic cleanup operations risk data loss. MoSCoW: Must-have. Priority: High. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Author
Owner

Verified — Data migration bug: non-atomic cleanup operations risk data loss. MoSCoW: Must-have. Priority: High.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Data migration bug: non-atomic cleanup operations risk data loss. MoSCoW: Must-have. Priority: High. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#7295
No description provided.