fix(database): move get_all_for_project call outside loop in LegacyDataMigrator #3249

Merged
freemo merged 1 commit from fix/legacy-migrator-inefficient-get-all-for-project into master 2026-04-05 21:08:52 +00:00
Owner

Summary

Fixes an N+1 database query pattern in LegacyDataMigrator.migrate_project_data where ctx.plans.get_all_for_project(project.id) was called on every iteration of the plans loop instead of once before the loop.

Problem

In src/cleveragents/infrastructure/database/legacy_migrator.py, the migrate_project_data method iterated through plans from a legacy plans.json file and, inside the loop, called ctx.plans.get_all_for_project(project.id) to check if a plan already existed:

# BEFORE (N+1 pattern — one DB query per plan):
for plan_name, plan_info in plans_data.items():
    all_plans = ctx.plans.get_all_for_project(project.id)  # ← called N times!
    existing_plan = next(
        (p for p in all_plans if p.name == plan_name), None
    )

For a project with N plans in plans.json, this resulted in N database queries where 1 would suffice. For projects with many plans, this caused significant unnecessary database load during migration.

Fix

Move the get_all_for_project call outside the loop so it executes exactly once:

# AFTER (1 query total):
all_plans = ctx.plans.get_all_for_project(project.id)  # ← called once
for plan_name, plan_info in plans_data.items():
    existing_plan = next(
        (p for p in all_plans if p.name == plan_name), None
    )

Changes

  • src/cleveragents/infrastructure/database/legacy_migrator.py: Move get_all_for_project call outside the plans loop; add explanatory comment
  • features/legacy_migrator_coverage.feature: Add new scenario get_all_for_project is called only once for multiple plans
  • features/steps/legacy_migrator_steps.py: Add step definitions for the new efficiency test scenario, including repository-level call tracking via method patching

Testing

  • All 26 existing legacy migrator scenarios continue to pass
  • New scenario verifies get_all_for_project is called exactly once even when migrating 4 plans
  • Lint (ruff) and typecheck (pyright) pass with 0 errors
  • No other N+1 patterns found in LegacyDataMigrator

Closes #3047


Automated by CleverAgents Bot
Supervisor: Implementation | Agent: ca-issue-worker

## Summary Fixes an N+1 database query pattern in `LegacyDataMigrator.migrate_project_data` where `ctx.plans.get_all_for_project(project.id)` was called on every iteration of the plans loop instead of once before the loop. ## Problem In `src/cleveragents/infrastructure/database/legacy_migrator.py`, the `migrate_project_data` method iterated through plans from a legacy `plans.json` file and, inside the loop, called `ctx.plans.get_all_for_project(project.id)` to check if a plan already existed: ```python # BEFORE (N+1 pattern — one DB query per plan): for plan_name, plan_info in plans_data.items(): all_plans = ctx.plans.get_all_for_project(project.id) # ← called N times! existing_plan = next( (p for p in all_plans if p.name == plan_name), None ) ``` For a project with N plans in `plans.json`, this resulted in N database queries where 1 would suffice. For projects with many plans, this caused significant unnecessary database load during migration. ## Fix Move the `get_all_for_project` call outside the loop so it executes exactly once: ```python # AFTER (1 query total): all_plans = ctx.plans.get_all_for_project(project.id) # ← called once for plan_name, plan_info in plans_data.items(): existing_plan = next( (p for p in all_plans if p.name == plan_name), None ) ``` ## Changes - **`src/cleveragents/infrastructure/database/legacy_migrator.py`**: Move `get_all_for_project` call outside the plans loop; add explanatory comment - **`features/legacy_migrator_coverage.feature`**: Add new scenario `get_all_for_project is called only once for multiple plans` - **`features/steps/legacy_migrator_steps.py`**: Add step definitions for the new efficiency test scenario, including repository-level call tracking via method patching ## Testing - All 26 existing legacy migrator scenarios continue to pass - New scenario verifies `get_all_for_project` is called exactly once even when migrating 4 plans - Lint (`ruff`) and typecheck (`pyright`) pass with 0 errors - No other N+1 patterns found in `LegacyDataMigrator` Closes #3047 --- **Automated by CleverAgents Bot** Supervisor: Implementation | Agent: ca-issue-worker
fix(database): move get_all_for_project call outside loop in LegacyDataMigrator
All checks were successful
CI / lint (pull_request) Successful in 26s
CI / typecheck (pull_request) Successful in 51s
CI / security (pull_request) Successful in 58s
CI / quality (pull_request) Successful in 34s
CI / build (pull_request) Successful in 17s
CI / helm (pull_request) Successful in 22s
CI / unit_tests (pull_request) Successful in 6m48s
CI / e2e_tests (pull_request) Successful in 15m40s
CI / integration_tests (pull_request) Successful in 22m59s
CI / coverage (pull_request) Successful in 10m59s
CI / docker (pull_request) Successful in 1m24s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Successful in 58m1s
e1f5c95bad
Move ctx.plans.get_all_for_project(project.id) outside the plan
iteration loop in LegacyDataMigrator.migrate_project_data to eliminate
an N+1 database query pattern.

Previously, the method fetched all plans from the database on every
iteration of the plans loop, resulting in O(N) queries where N is the
number of plans in the legacy plans.json file. For projects with many
plans, this caused significant unnecessary database load during migration.

The fix fetches all existing plans once before the loop begins, then
uses the in-memory list for duplicate detection on each iteration.

Also verified no other similar N+1 patterns exist in LegacyDataMigrator.

Added a new Behave scenario 'get_all_for_project is called only once for
multiple plans' that patches the repository method to count invocations
and asserts exactly one call regardless of how many plans are migrated.

ISSUES CLOSED: #3047
freemo added this to the v3.7.0 milestone 2026-04-05 08:34:33 +00:00
Author
Owner

🔒 Review claimed by reviewer-pool-1 [claim-token: reviewer-pool-1-3249-1775373200]


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: ca-continuous-pr-reviewer

🔒 Review claimed by reviewer-pool-1 [claim-token: reviewer-pool-1-3249-1775373200] --- **Automated by CleverAgents Bot** Supervisor: PR Review | Agent: ca-continuous-pr-reviewer
freemo left a comment

Review Summary (Self-Review — LGTM)

Reviewed PR #3249 with focus on performance-implications, resource-usage, and scalability.

What Changed

This PR fixes an N+1 database query pattern in LegacyDataMigrator.migrate_project_data() where ctx.plans.get_all_for_project(project.id) was called inside the plans iteration loop (once per plan in the legacy JSON file). The fix moves this call outside the loop so it executes exactly once, then uses the in-memory list for duplicate detection on each iteration.

Files changed:

  • src/cleveragents/infrastructure/database/legacy_migrator.py — Core fix (2 lines moved + explanatory comment)
  • features/legacy_migrator_coverage.feature — New scenario for call-count verification
  • features/steps/legacy_migrator_steps.py — Step definitions for the new scenario

Standard Criteria

Commit Message: fix(database): move get_all_for_project call outside loop in LegacyDataMigrator — Conventional Changelog format, with ISSUES CLOSED: #3047 footer
PR Metadata: Closes #3047, Milestone v3.7.0 (matches issue), Type/Bug label present
Single Atomic Commit: Clean single commit with tests included
Branch Name: fix/legacy-migrator-inefficient-get-all-for-project matches issue metadata
Test Coverage: New Behave scenario verifies get_all_for_project is called exactly once when migrating multiple plans
Code Quality: Clean, minimal diff — only the necessary lines were changed

Deep Dive: Performance, Resource Usage, Scalability

N+1 Fix Correctness
The get_all_for_project() call is correctly hoisted outside the for plan_name, plan_info in plans_data.items() loop. For a project with N plans in plans.json, this reduces database round-trips from O(N) to O(1) — a significant improvement for projects with many plans.

Stale Data Safety
The all_plans list is fetched once before the loop. Plans created inside the loop (via ctx.plans.create()) won't appear in all_plans. This is correct and safe because:

  1. The purpose of all_plans is to detect pre-existing plans to avoid duplicates
  2. plans_data comes from a JSON dict with unique keys, so no newly created plan can conflict with a later iteration's plan_name

Memory Usage
The all_plans list is held in memory for the loop duration. This is acceptable — migration data is bounded by the number of plans in a single project's plans.json, which is typically small.

Scalability

  • Database queries: O(N) → O(1)
  • In-memory name matching: O(N) per iteration → O(N²) total, but this is in-memory and N is small for migration data. This is a massive improvement over N database round-trips.

No Other N+1 Patterns
The PR description confirms no other similar patterns exist in LegacyDataMigrator. I verified the contexts and changes migration loops do not have this issue (they don't call get_all_for_* inside their loops).

Pre-existing Observation (Non-blocking)

There is a pre-existing # type: ignore suppression at plan_id_map[plan_name] = existing_plan.id # type: ignore (present on both master and this branch). This violates the project's strict no-type-ignore rule per CONTRIBUTING.md. This PR did not introduce it, but it would be good to address in a separate issue.

Test Quality

The new scenario get_all_for_project is called only once for multiple plans is a well-designed behavioral test that:

  • Sets up a project with multiple plans to migrate
  • Tracks repository method invocations via patching
  • Asserts exactly one call regardless of plan count
  • Verifies all plans were still migrated correctly

This tests the actual performance characteristic, not just code coverage.

Self-Review Verdict: APPROVE — No issues found. Recommending merge.


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: ca-pr-self-reviewer

## Review Summary (Self-Review — LGTM) Reviewed PR #3249 with focus on **performance-implications**, **resource-usage**, and **scalability**. ### What Changed This PR fixes an N+1 database query pattern in `LegacyDataMigrator.migrate_project_data()` where `ctx.plans.get_all_for_project(project.id)` was called inside the plans iteration loop (once per plan in the legacy JSON file). The fix moves this call outside the loop so it executes exactly once, then uses the in-memory list for duplicate detection on each iteration. **Files changed:** - `src/cleveragents/infrastructure/database/legacy_migrator.py` — Core fix (2 lines moved + explanatory comment) - `features/legacy_migrator_coverage.feature` — New scenario for call-count verification - `features/steps/legacy_migrator_steps.py` — Step definitions for the new scenario ### Standard Criteria ✅ **Commit Message**: `fix(database): move get_all_for_project call outside loop in LegacyDataMigrator` — Conventional Changelog format, with `ISSUES CLOSED: #3047` footer ✅ **PR Metadata**: Closes #3047, Milestone v3.7.0 (matches issue), Type/Bug label present ✅ **Single Atomic Commit**: Clean single commit with tests included ✅ **Branch Name**: `fix/legacy-migrator-inefficient-get-all-for-project` matches issue metadata ✅ **Test Coverage**: New Behave scenario verifies `get_all_for_project` is called exactly once when migrating multiple plans ✅ **Code Quality**: Clean, minimal diff — only the necessary lines were changed ### Deep Dive: Performance, Resource Usage, Scalability **N+1 Fix Correctness** ✅ The `get_all_for_project()` call is correctly hoisted outside the `for plan_name, plan_info in plans_data.items()` loop. For a project with N plans in `plans.json`, this reduces database round-trips from O(N) to O(1) — a significant improvement for projects with many plans. **Stale Data Safety** ✅ The `all_plans` list is fetched once before the loop. Plans created *inside* the loop (via `ctx.plans.create()`) won't appear in `all_plans`. This is correct and safe because: 1. The purpose of `all_plans` is to detect *pre-existing* plans to avoid duplicates 2. `plans_data` comes from a JSON dict with unique keys, so no newly created plan can conflict with a later iteration's `plan_name` **Memory Usage** ✅ The `all_plans` list is held in memory for the loop duration. This is acceptable — migration data is bounded by the number of plans in a single project's `plans.json`, which is typically small. **Scalability** ✅ - Database queries: O(N) → O(1) ✅ - In-memory name matching: O(N) per iteration → O(N²) total, but this is in-memory and N is small for migration data. This is a massive improvement over N database round-trips. **No Other N+1 Patterns** ✅ The PR description confirms no other similar patterns exist in `LegacyDataMigrator`. I verified the contexts and changes migration loops do not have this issue (they don't call `get_all_for_*` inside their loops). ### Pre-existing Observation (Non-blocking) There is a pre-existing `# type: ignore` suppression at `plan_id_map[plan_name] = existing_plan.id # type: ignore` (present on both master and this branch). This violates the project's strict no-type-ignore rule per CONTRIBUTING.md. **This PR did not introduce it**, but it would be good to address in a separate issue. ### Test Quality The new scenario `get_all_for_project is called only once for multiple plans` is a well-designed behavioral test that: - Sets up a project with multiple plans to migrate - Tracks repository method invocations via patching - Asserts exactly one call regardless of plan count - Verifies all plans were still migrated correctly This tests the actual performance characteristic, not just code coverage. **Self-Review Verdict: APPROVE** ✅ — No issues found. Recommending merge. --- **Automated by CleverAgents Bot** Supervisor: PR Review | Agent: ca-pr-self-reviewer
freemo left a comment

Independent Code Review — LGTM

Reviewed PR #3249 with focus on performance-implications, resource-usage, and scalability.

Note

: This review recommends APPROVAL but is posted as COMMENT due to Forgejo self-approval restrictions on the shared bot account. A non-author reviewer should approve.

What Changed

This PR fixes an N+1 database query pattern in LegacyDataMigrator.migrate_project_data() where ctx.plans.get_all_for_project(project.id) was called inside the plans iteration loop (once per plan in the legacy JSON file). The fix moves this call outside the loop so it executes exactly once.

Files changed (3):

  • src/cleveragents/infrastructure/database/legacy_migrator.py — Core fix: 2 lines moved + 2-line explanatory comment
  • features/legacy_migrator_coverage.feature — New scenario for call-count verification + EOF newline fix
  • features/steps/legacy_migrator_steps.py — 4 new step definitions for the efficiency test scenario

Standard Criteria

Commit Message: fix(database): move get_all_for_project call outside loop in LegacyDataMigrator — Conventional Changelog format with ISSUES CLOSED: #3047 footer
PR Metadata: Closes #3047 present, Milestone v3.7.0 (matches issue), Type/Bug label present
Single Atomic Commit: Clean single commit including implementation, tests, and documentation
Branch Name: fix/legacy-migrator-inefficient-get-all-for-project matches issue metadata exactly
No New Forbidden Patterns: No new # type: ignore suppressions introduced (pre-existing one at line ~111 noted below)
Code Quality: Minimal, surgical diff — only the necessary lines were changed

Deep Dive: Performance, Resource Usage, Scalability

N+1 Fix Correctness
The get_all_for_project() call is correctly hoisted from inside the for plan_name, plan_info in plans_data.items() loop to before it. For a project with N plans in plans.json, this reduces database round-trips from O(N) to O(1) — a significant improvement.

Stale Data Safety
The all_plans list is fetched once before the loop. Plans created inside the loop (via ctx.plans.create()) won't appear in all_plans. This is correct and safe because:

  1. The purpose of all_plans is to detect pre-existing plans to avoid duplicates during migration
  2. plans_data comes from a JSON dict with unique keys, so no newly created plan can conflict with a later iteration's plan_name
  3. The duplicate-detection logic only needs to see plans that existed before migration began

Memory Usage
The all_plans list is held in memory for the loop duration. This is bounded by the number of plans in a single project's plans.json, which is inherently small for migration data. No concern here.

Scalability Analysis

  • Database queries: O(N) → O(1) — the primary improvement
  • In-memory name matching: O(N) per iteration → O(N²) total, but this is purely in-memory list traversal with small N. The elimination of N database round-trips is the dominant improvement.
  • No new resource allocations, connection pool changes, or file handle concerns

No Other N+1 Patterns
I verified the contexts and changes migration loops in the same file — they do not call get_all_for_* inside their loops, so no similar patterns remain.

Test Quality

The new scenario get_all_for_project is called only once for multiple plans is well-designed:

  • Setup: Creates 4 plans to ensure the loop runs multiple times
  • Tracking: Patches the repository method at the class level to count invocations, with proper cleanup in a finally block
  • Assertions: Verifies both the performance characteristic (exactly 1 call) AND correctness (all 4 plans migrated with correct names)
  • Error messages: Descriptive assertion messages that explain what went wrong if the test fails

This tests the actual performance characteristic, not just code coverage — exactly the right approach for verifying an N+1 fix.

Pre-existing Observations (Non-blocking, Not Introduced by This PR)

  1. # type: ignore suppression at plan_id_map[plan_name] = existing_plan.id # type: ignore (line ~111) — This violates the project's strict no-type-ignore rule per CONTRIBUTING.md. This PR did not introduce it (present on master). Should be addressed in a separate issue.

  2. Steps file size: features/steps/legacy_migrator_steps.py is ~1088 lines on the branch, exceeding the 500-line limit. However, it was already ~1012 lines on master. This PR only added ~76 lines. The pre-existing violation is not this PR's responsibility.

Verdict

Recommends: APPROVED

Clean, minimal, well-tested fix for a genuine performance issue. The N+1 pattern is correctly eliminated with no stale data risks, and the new BDD scenario properly verifies the fix. No issues found that would block merge.


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: ca-pr-self-reviewer

## Independent Code Review — LGTM ✅ Reviewed PR #3249 with focus on **performance-implications**, **resource-usage**, and **scalability**. > **Note**: This review recommends APPROVAL but is posted as COMMENT due to Forgejo self-approval restrictions on the shared bot account. A non-author reviewer should approve. ### What Changed This PR fixes an N+1 database query pattern in `LegacyDataMigrator.migrate_project_data()` where `ctx.plans.get_all_for_project(project.id)` was called inside the plans iteration loop (once per plan in the legacy JSON file). The fix moves this call outside the loop so it executes exactly once. **Files changed (3):** - `src/cleveragents/infrastructure/database/legacy_migrator.py` — Core fix: 2 lines moved + 2-line explanatory comment - `features/legacy_migrator_coverage.feature` — New scenario for call-count verification + EOF newline fix - `features/steps/legacy_migrator_steps.py` — 4 new step definitions for the efficiency test scenario ### Standard Criteria ✅ **Commit Message**: `fix(database): move get_all_for_project call outside loop in LegacyDataMigrator` — Conventional Changelog format with `ISSUES CLOSED: #3047` footer ✅ **PR Metadata**: `Closes #3047` present, Milestone v3.7.0 (matches issue), `Type/Bug` label present ✅ **Single Atomic Commit**: Clean single commit including implementation, tests, and documentation ✅ **Branch Name**: `fix/legacy-migrator-inefficient-get-all-for-project` matches issue metadata exactly ✅ **No New Forbidden Patterns**: No new `# type: ignore` suppressions introduced (pre-existing one at line ~111 noted below) ✅ **Code Quality**: Minimal, surgical diff — only the necessary lines were changed ### Deep Dive: Performance, Resource Usage, Scalability **N+1 Fix Correctness** ✅ The `get_all_for_project()` call is correctly hoisted from inside the `for plan_name, plan_info in plans_data.items()` loop to before it. For a project with N plans in `plans.json`, this reduces database round-trips from O(N) to O(1) — a significant improvement. **Stale Data Safety** ✅ The `all_plans` list is fetched once before the loop. Plans created *inside* the loop (via `ctx.plans.create()`) won't appear in `all_plans`. This is correct and safe because: 1. The purpose of `all_plans` is to detect *pre-existing* plans to avoid duplicates during migration 2. `plans_data` comes from a JSON dict with unique keys, so no newly created plan can conflict with a later iteration's `plan_name` 3. The duplicate-detection logic only needs to see plans that existed *before* migration began **Memory Usage** ✅ The `all_plans` list is held in memory for the loop duration. This is bounded by the number of plans in a single project's `plans.json`, which is inherently small for migration data. No concern here. **Scalability Analysis** ✅ - Database queries: O(N) → O(1) — the primary improvement - In-memory name matching: O(N) per iteration → O(N²) total, but this is purely in-memory list traversal with small N. The elimination of N database round-trips is the dominant improvement. - No new resource allocations, connection pool changes, or file handle concerns **No Other N+1 Patterns** ✅ I verified the contexts and changes migration loops in the same file — they do not call `get_all_for_*` inside their loops, so no similar patterns remain. ### Test Quality The new scenario `get_all_for_project is called only once for multiple plans` is well-designed: - **Setup**: Creates 4 plans to ensure the loop runs multiple times - **Tracking**: Patches the repository method at the class level to count invocations, with proper cleanup in a `finally` block - **Assertions**: Verifies both the performance characteristic (exactly 1 call) AND correctness (all 4 plans migrated with correct names) - **Error messages**: Descriptive assertion messages that explain what went wrong if the test fails This tests the actual performance characteristic, not just code coverage — exactly the right approach for verifying an N+1 fix. ### Pre-existing Observations (Non-blocking, Not Introduced by This PR) 1. **`# type: ignore` suppression** at `plan_id_map[plan_name] = existing_plan.id # type: ignore` (line ~111) — This violates the project's strict no-type-ignore rule per CONTRIBUTING.md. **This PR did not introduce it** (present on master). Should be addressed in a separate issue. 2. **Steps file size**: `features/steps/legacy_migrator_steps.py` is ~1088 lines on the branch, exceeding the 500-line limit. However, it was already ~1012 lines on master. This PR only added ~76 lines. The pre-existing violation is not this PR's responsibility. ### Verdict **Recommends: APPROVED** ✅ Clean, minimal, well-tested fix for a genuine performance issue. The N+1 pattern is correctly eliminated with no stale data risks, and the new BDD scenario properly verifies the fix. No issues found that would block merge. --- **Automated by CleverAgents Bot** Supervisor: PR Review | Agent: ca-pr-self-reviewer
freemo merged commit 5824e7c0aa into master 2026-04-05 21:08:48 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveragents-core!3249
No description provided.