BUG-HUNT: [database] Alembic migrations are unnecessarily complex and risky #2418

Open
opened 2026-04-03 17:41:17 +00:00 by freemo · 1 comment
Owner

Metadata

  • Branch: fix/database-alembic-migrations-risky-patterns
  • Commit Message: fix(database): refactor risky drop-and-recreate migration patterns to safe ALTER TABLE operations
  • Milestone: v3.6.0 (Post-MVP Release)
  • Parent Epic: #1020

Bug Report: [database] — Alembic migrations are unnecessarily complex and risky

Severity Assessment

  • Impact: High. The current migration strategy involves dropping and recreating tables, which can lead to data loss and service downtime during deployments. The complex backfill logic is a source of potential data corruption.
  • Likelihood: High. The complexity is already present in the codebase and will be a recurring problem for future schema changes.
  • Priority: High

Location

  • File: alembic/versions/a5_003_spec_aligned_actions.py, alembic/versions/a5_004_spec_aligned_plans.py, alembic/versions/a5_005_rebaseline_plan_phases.py, alembic/versions/m8_001_align_plans_schema.py

Description

Several Alembic migrations in the alembic/versions directory are unnecessarily complex and risky. They follow a pattern of dropping and recreating tables to alter constraints or columns, which is a dangerous practice in a production environment.

Specifically:

  • a5_003_spec_aligned_actions.py and a5_004_spec_aligned_plans.py drop and recreate tables, which could be avoided with more granular ALTER TABLE statements.
  • a5_005_rebaseline_plan_phases.py rebuilds the v3_plans table to update a CHECK constraint, which is a very heavy-handed approach.
  • m8_001_align_plans_schema.py contains complex backfill logic with a loop that could cause performance issues or fail silently, leading to data inconsistencies.

This approach indicates a lack of understanding of safe database migration practices and puts the application at risk of data loss and extended downtime during upgrades.

Evidence

(Code snippets from the mentioned migration files)

Expected Behavior

Database migrations should be written defensively to preserve data and minimize downtime. This means:

  • Using ALTER TABLE to add, remove, or modify columns and constraints whenever possible.
  • Avoiding dropping and recreating tables unless absolutely necessary (e.g., for a complete schema overhaul).
  • Writing backfill logic that is idempotent, efficient, and robust against edge cases like cycles in hierarchical data.

Actual Behavior

The migrations use a risky drop-and-recreate pattern and contain complex, potentially fragile backfill logic.

Suggested Fix

  • Refactor the identified migrations to use op.alter_column, op.create_check_constraint, op.drop_constraint, etc., to modify the schema without dropping tables.
  • Simplify the backfill logic in m8_001_align_plans_schema.py to be more robust and less prone to infinite loops. Consider using a more declarative approach if possible.

Category

database | consistency

Subtasks

  • Audit all migration files in alembic/versions/ for drop-and-recreate patterns
  • Refactor a5_003_spec_aligned_actions.py to use op.alter_column / op.add_column / op.drop_column instead of table drop-and-recreate
  • Refactor a5_004_spec_aligned_plans.py to use granular ALTER TABLE operations instead of table drop-and-recreate
  • Refactor a5_005_rebaseline_plan_phases.py to update the CHECK constraint on v3_plans without rebuilding the table
  • Refactor backfill logic in m8_001_align_plans_schema.py to be idempotent, efficient, and safe against cycles in hierarchical data
  • Verify all refactored migrations are reversible (downgrade path tested)
  • Tests (Behave): Add/update scenarios covering migration upgrade and downgrade paths
  • Tests (Robot): Add integration test verifying data integrity across migration steps
  • Verify coverage >=97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • All identified migration files have been refactored to use safe, non-destructive ALTER TABLE operations wherever possible.
  • Backfill logic in m8_001_align_plans_schema.py is idempotent, efficient, and handles edge cases (e.g., cycles in hierarchical data) without risk of infinite loops or silent failures.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
  • All nox stages pass.
  • Coverage >= 97%

Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: ca-new-issue-creator

## Metadata - **Branch**: `fix/database-alembic-migrations-risky-patterns` - **Commit Message**: `fix(database): refactor risky drop-and-recreate migration patterns to safe ALTER TABLE operations` - **Milestone**: v3.6.0 (Post-MVP Release) - **Parent Epic**: #1020 ## Bug Report: [database] — Alembic migrations are unnecessarily complex and risky ### Severity Assessment - **Impact**: High. The current migration strategy involves dropping and recreating tables, which can lead to data loss and service downtime during deployments. The complex backfill logic is a source of potential data corruption. - **Likelihood**: High. The complexity is already present in the codebase and will be a recurring problem for future schema changes. - **Priority**: High ### Location - **File**: `alembic/versions/a5_003_spec_aligned_actions.py`, `alembic/versions/a5_004_spec_aligned_plans.py`, `alembic/versions/a5_005_rebaseline_plan_phases.py`, `alembic/versions/m8_001_align_plans_schema.py` ### Description Several Alembic migrations in the `alembic/versions` directory are unnecessarily complex and risky. They follow a pattern of dropping and recreating tables to alter constraints or columns, which is a dangerous practice in a production environment. Specifically: - `a5_003_spec_aligned_actions.py` and `a5_004_spec_aligned_plans.py` drop and recreate tables, which could be avoided with more granular `ALTER TABLE` statements. - `a5_005_rebaseline_plan_phases.py` rebuilds the `v3_plans` table to update a CHECK constraint, which is a very heavy-handed approach. - `m8_001_align_plans_schema.py` contains complex backfill logic with a loop that could cause performance issues or fail silently, leading to data inconsistencies. This approach indicates a lack of understanding of safe database migration practices and puts the application at risk of data loss and extended downtime during upgrades. ### Evidence (Code snippets from the mentioned migration files) ### Expected Behavior Database migrations should be written defensively to preserve data and minimize downtime. This means: - Using `ALTER TABLE` to add, remove, or modify columns and constraints whenever possible. - Avoiding dropping and recreating tables unless absolutely necessary (e.g., for a complete schema overhaul). - Writing backfill logic that is idempotent, efficient, and robust against edge cases like cycles in hierarchical data. ### Actual Behavior The migrations use a risky drop-and-recreate pattern and contain complex, potentially fragile backfill logic. ### Suggested Fix - Refactor the identified migrations to use `op.alter_column`, `op.create_check_constraint`, `op.drop_constraint`, etc., to modify the schema without dropping tables. - Simplify the backfill logic in `m8_001_align_plans_schema.py` to be more robust and less prone to infinite loops. Consider using a more declarative approach if possible. ### Category database | consistency ## Subtasks - [ ] Audit all migration files in `alembic/versions/` for drop-and-recreate patterns - [ ] Refactor `a5_003_spec_aligned_actions.py` to use `op.alter_column` / `op.add_column` / `op.drop_column` instead of table drop-and-recreate - [ ] Refactor `a5_004_spec_aligned_plans.py` to use granular `ALTER TABLE` operations instead of table drop-and-recreate - [ ] Refactor `a5_005_rebaseline_plan_phases.py` to update the CHECK constraint on `v3_plans` without rebuilding the table - [ ] Refactor backfill logic in `m8_001_align_plans_schema.py` to be idempotent, efficient, and safe against cycles in hierarchical data - [ ] Verify all refactored migrations are reversible (downgrade path tested) - [ ] Tests (Behave): Add/update scenarios covering migration upgrade and downgrade paths - [ ] Tests (Robot): Add integration test verifying data integrity across migration steps - [ ] Verify coverage >=97% via `nox -s coverage_report` - [ ] Run `nox` (all default sessions), fix any errors ## Definition of Done This issue is complete when: - All subtasks above are completed and checked off. - All identified migration files have been refactored to use safe, non-destructive `ALTER TABLE` operations wherever possible. - Backfill logic in `m8_001_align_plans_schema.py` is idempotent, efficient, and handles edge cases (e.g., cycles in hierarchical data) without risk of infinite loops or silent failures. - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation. - The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly. - The commit is submitted as a **pull request** to `master`, reviewed, and **merged** before this issue is marked done. - All nox stages pass. - Coverage >= 97% --- **Automated by CleverAgents Bot** Supervisor: Bug Hunting | Agent: ca-new-issue-creator
freemo added this to the v3.6.0 milestone 2026-04-03 17:41:21 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: High — Risky drop-and-recreate migration patterns can cause data loss in production. This is a database safety issue.
  • Milestone: v3.6.0
  • MoSCoW: Should Have — While the migrations work, the risky patterns should be refactored to safe ALTER TABLE operations before any production deployment. Not blocking development but blocking production readiness.
  • Parent Epic: #1020 (Database resource hierarchy restructuring)

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Issue triaged by project owner: - **State**: Verified - **Priority**: High — Risky drop-and-recreate migration patterns can cause data loss in production. This is a database safety issue. - **Milestone**: v3.6.0 - **MoSCoW**: Should Have — While the migrations work, the risky patterns should be refactored to safe ALTER TABLE operations before any production deployment. Not blocking development but blocking production readiness. - **Parent Epic**: #1020 (Database resource hierarchy restructuring) --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: ca-project-owner
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Blocks
Reference
cleveragents/cleveragents-core#2418
No description provided.