BUG-HUNT: [migration] a5_005 downgrade crashes with CHECK constraint violation when plans have phase='action' or processing_state='applied'/'constrained' #6648

Open
opened 2026-04-09 22:45:11 +00:00 by HAL9000 · 0 comments
Owner

Bug Report: [migration] a5_005_rebaseline_plan_phases Downgrade Fails with Existing Production Data

Severity Assessment

  • Impact: Running alembic downgrade through a5_005 on a database containing any v3_plans row with phase='action' or processing_state IN ('applied', 'constrained') will fail mid-migration with a CHECK constraint violation, leaving the database in a partially migrated state with orphaned _v3_plans_new temp table and deleted child table backups. Data loss or corruption is possible if the failed migration is not rolled back cleanly.
  • Likelihood: High — the action phase is the default phase for new plans after a5_005 upgrades. Any system that has created at least one plan will have phase='action' rows. Similarly, applied and constrained are common terminal states.
  • Priority: High

Location

  • File: alembic/versions/a5_005_rebaseline_plan_phases.py
  • Function: _rebuild_v3_plans() (called by both upgrade() and downgrade()), specifically the data-copy step at lines ~203–207

Description

a5_005 rebuilds v3_plans to update CHECK constraints for phase and processing state. The upgrade adds 'action' to the allowed phases and 'applied'/'constrained' to the allowed states. The downgrade uses the same _rebuild_v3_plans() function with the old (narrower) constraints.

The downgrade data-copy is a straight INSERT INTO ... SELECT ... FROM v3_plans:

conn.execute(
    sa.text(
        f"INSERT INTO _v3_plans_new ({_ALL_DATA_COLUMNS}) "
        f"SELECT {_ALL_DATA_COLUMNS} FROM v3_plans"
    )
)

The _v3_plans_new temp table is created with the old CHECK constraints:

_OLD_PHASE_CK = "phase IN ('strategize', 'execute', 'apply', 'applied')"
_OLD_STATE_CK = (
    "processing_state IN ('queued', 'processing', 'errored', 'complete', 'cancelled')"
)

After upgrade, the v3_plans table allows:

  • phase IN ('action', 'strategize', 'execute', 'apply') — note 'action' is new, 'applied' removed
  • processing_state IN ('queued', 'processing', 'errored', 'complete', 'cancelled', 'applied', 'constrained')'applied' and 'constrained' are new

Therefore, any row with phase='action' will fail the OLD phase CHECK constraint during downgrade, and any row with processing_state='applied' or processing_state='constrained' will fail the OLD state CHECK constraint.

Since 'action' is the new default phase (set via _NEW_PHASE_DEFAULT = "action"), virtually every plan created after a5_005 runs will have phase='action', making this downgrade failure near-certain in production.

The failure occurs after child table data has been moved to backup tables and the original child tables (plan_projects, plan_arguments, plan_invariants) have been dropped — so a failed INSERT at this point leaves the database with:

  1. A stale _v3_plans_new temp table (partially populated or empty)
  2. The original v3_plans table dropped (step 4 of the rebuild)
  3. The original child tables dropped (step 3)
  4. Backup tables (_plan_projects_bak, _plan_arguments_bak, _plan_invariants_bak) still present

Recovery requires manual intervention.

Expected Behavior

The downgrade should either:

  1. Sanitize out-of-range values before the INSERT (e.g., coerce phase='action''strategize'), or
  2. Explicitly refuse to downgrade if any incompatible rows exist, with a clear error message before any destructive operations begin

Actual Behavior

sqlalchemy.exc.IntegrityError: CHECK constraint failed: ck_v3_plans_phase

(or ck_v3_plans_state for applied/constrained states), raised during the INSERT INTO _v3_plans_new step — after child tables have already been dropped.

Suggested Fix

Add a pre-downgrade compatibility check or value remapping before the data copy:

# Option A: Abort early with a clear message
def downgrade() -> None:
    conn = op.get_bind()
    bad_rows = conn.execute(sa.text(
        "SELECT COUNT(*) FROM v3_plans "
        "WHERE phase = 'action' OR processing_state IN ('applied', 'constrained')"
    )).scalar()
    if bad_rows:
        raise RuntimeError(
            f"Cannot downgrade a5_005: {bad_rows} row(s) in v3_plans have "
            "phase='action' or processing_state IN ('applied', 'constrained') "
            "which are not valid in the pre-a5_005 schema. "
            "Manually resolve these rows before downgrading."
        )
    _rebuild_v3_plans(...)

# Option B: Coerce incompatible values (data-lossy but survivable)
conn.execute(sa.text(
    "UPDATE v3_plans SET phase = 'strategize' WHERE phase = 'action'"
))
conn.execute(sa.text(
    "UPDATE v3_plans SET processing_state = 'complete' "
    "WHERE processing_state IN ('applied', 'constrained')"
))

The check should occur before any child table backup/drop operations.

Category

migration / data-loss / boundary

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: @tdd_issue, @tdd_issue_<this-issue-number>, and @tdd_expected_fail to prove the bug exists before fixing it.


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: bug-hunter

## Bug Report: [migration] `a5_005_rebaseline_plan_phases` Downgrade Fails with Existing Production Data ### Severity Assessment - **Impact**: Running `alembic downgrade` through `a5_005` on a database containing any `v3_plans` row with `phase='action'` or `processing_state IN ('applied', 'constrained')` will fail mid-migration with a CHECK constraint violation, leaving the database in a partially migrated state with orphaned `_v3_plans_new` temp table and deleted child table backups. Data loss or corruption is possible if the failed migration is not rolled back cleanly. - **Likelihood**: High — the `action` phase is the default phase for new plans after `a5_005` upgrades. Any system that has created at least one plan will have `phase='action'` rows. Similarly, `applied` and `constrained` are common terminal states. - **Priority**: High ### Location - **File**: `alembic/versions/a5_005_rebaseline_plan_phases.py` - **Function**: `_rebuild_v3_plans()` (called by both `upgrade()` and `downgrade()`), specifically the data-copy step at lines ~203–207 ### Description `a5_005` rebuilds `v3_plans` to update CHECK constraints for phase and processing state. The upgrade adds `'action'` to the allowed phases and `'applied'`/`'constrained'` to the allowed states. The downgrade uses the same `_rebuild_v3_plans()` function with the **old** (narrower) constraints. The downgrade data-copy is a straight `INSERT INTO ... SELECT ... FROM v3_plans`: ```python conn.execute( sa.text( f"INSERT INTO _v3_plans_new ({_ALL_DATA_COLUMNS}) " f"SELECT {_ALL_DATA_COLUMNS} FROM v3_plans" ) ) ``` The `_v3_plans_new` temp table is created with the **old** CHECK constraints: ```python _OLD_PHASE_CK = "phase IN ('strategize', 'execute', 'apply', 'applied')" _OLD_STATE_CK = ( "processing_state IN ('queued', 'processing', 'errored', 'complete', 'cancelled')" ) ``` After upgrade, the `v3_plans` table allows: - `phase IN ('action', 'strategize', 'execute', 'apply')` — note `'action'` is new, `'applied'` removed - `processing_state IN ('queued', 'processing', 'errored', 'complete', 'cancelled', 'applied', 'constrained')` — `'applied'` and `'constrained'` are new Therefore, **any row with `phase='action'` will fail the OLD phase CHECK constraint** during downgrade, and **any row with `processing_state='applied'` or `processing_state='constrained'` will fail the OLD state CHECK constraint**. Since `'action'` is the new default phase (set via `_NEW_PHASE_DEFAULT = "action"`), virtually every plan created after `a5_005` runs will have `phase='action'`, making this downgrade failure near-certain in production. The failure occurs **after** child table data has been moved to backup tables and the original child tables (`plan_projects`, `plan_arguments`, `plan_invariants`) have been dropped — so a failed INSERT at this point leaves the database with: 1. A stale `_v3_plans_new` temp table (partially populated or empty) 2. The original `v3_plans` table dropped (step 4 of the rebuild) 3. The original child tables dropped (step 3) 4. Backup tables (`_plan_projects_bak`, `_plan_arguments_bak`, `_plan_invariants_bak`) still present Recovery requires manual intervention. ### Expected Behavior The downgrade should either: 1. Sanitize out-of-range values before the INSERT (e.g., coerce `phase='action'` → `'strategize'`), or 2. Explicitly refuse to downgrade if any incompatible rows exist, with a clear error message before any destructive operations begin ### Actual Behavior ``` sqlalchemy.exc.IntegrityError: CHECK constraint failed: ck_v3_plans_phase ``` (or `ck_v3_plans_state` for `applied`/`constrained` states), raised during the `INSERT INTO _v3_plans_new` step — after child tables have already been dropped. ### Suggested Fix Add a pre-downgrade compatibility check or value remapping before the data copy: ```python # Option A: Abort early with a clear message def downgrade() -> None: conn = op.get_bind() bad_rows = conn.execute(sa.text( "SELECT COUNT(*) FROM v3_plans " "WHERE phase = 'action' OR processing_state IN ('applied', 'constrained')" )).scalar() if bad_rows: raise RuntimeError( f"Cannot downgrade a5_005: {bad_rows} row(s) in v3_plans have " "phase='action' or processing_state IN ('applied', 'constrained') " "which are not valid in the pre-a5_005 schema. " "Manually resolve these rows before downgrading." ) _rebuild_v3_plans(...) # Option B: Coerce incompatible values (data-lossy but survivable) conn.execute(sa.text( "UPDATE v3_plans SET phase = 'strategize' WHERE phase = 'action'" )) conn.execute(sa.text( "UPDATE v3_plans SET processing_state = 'complete' " "WHERE processing_state IN ('applied', 'constrained')" )) ``` The check should occur **before** any child table backup/drop operations. ### Category `migration` / `data-loss` / `boundary` ### TDD Note After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: `@tdd_issue`, `@tdd_issue_<this-issue-number>`, and `@tdd_expected_fail` to prove the bug exists before fixing it. --- **Automated by CleverAgents Bot** Supervisor: Bug Hunting | Agent: bug-hunter
HAL9000 added this to the v3.2.0 milestone 2026-04-09 23:04:02 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#6648
No description provided.