test(e2e): workflow example 5 — database schema migration with safety nets (review profile) #816

2026-03-13T16:52:07Z

freemo commented

2026-03-13 16:52:07 +00:00

Summary

E2E test for Workflow Example 5 — database schema migration with safety nets using the review automation profile. Exercises the full spec-aligned workflow:

Custom resource type registration via resource type add --config (postgres-db type with transaction_rollback sandbox strategy, --host, --port, --database, --schema CLI args with flat type/default fields per ResourceTypeArgument schema)
Custom resource instantiation — attempts resource add with the custom type to exercise mixed resource types, followed by project link-resource to link DB resource to the project
Custom skill creation with spec-aligned database tools: local/query_db (read-only), local/execute_migration (writes, checkpointable), local/backfill_column (writes, checkpointable) — registered via skill add --config, with namespaced tool reference names per SkillToolRefSchema validation
Action creation with automation_profile: review, reusable: true, state: available, spec invariants, and typed arguments section (table_name, column_name, column_type, backfill_source — all required per spec, using arguments field per ActionConfigSchema)
Plan use with --arg flags exercising parameterized action invocation including backfill_source=audit_log, plus explicit --automation-profile review flag (action-to-plan profile propagation is not yet wired in PlanLifecycleService.use_action)
Phased child plan verification via plan tree --format json with decision_count >= 2 hard assertion on framework decisions plus WARN tiers for LLM decomposition quality (< 3, < 5)
Plan phase assertion — hard assertion that phase is populated after execute
Checkpoint-based rollback with hard assertions: rc=0 on rollback success, rc!=0 on fake checkpoint, None guard for JSON null checkpoint IDs, re-execute with Traceback/INTERNAL checks on success and explanatory comment on failure path
Plan diff with hard rc=0 assertion and content-signal verification
Migration content verification — baseline SHA saved before apply, diff against baseline (not HEAD~1), WARN-level check on migration keywords (last_login, schema, migration, column, alter) — flexible per LLM non-determinism
Commit count assertion >= 2 (fixture baseline: Create Temp Git Repo + DB fixture commit), WARN if no additional commits from lifecycle-apply
Backfill evidence WARN-level check in plan tree/execution output (backfill, batch, populate, last_login) with explanatory comment noting tree covers decomposition plan
Combined AC #6 gate — if both migration content and backfill evidence are absent, explicit WARN visibility for CI debugging
Terminal state assertion after lifecycle-apply — plan status call verifies phase/processing_state reflects terminal or apply-progress outcome
Automation profile fallback verification — if plan use output omits automation_profile, falls back to plan status for secondary verification (hard assertion always runs)
Traceback and INTERNAL checks on all CLI commands (resource add, project create, resource type add, skill add, action create, plan use, strategize, execute, plan tree, plan status, plan diff, plan rollback, re-execute after rollback, lifecycle-apply) including custom resource error paths
Dynamic actor selection — detects available API keys (Anthropic/OpenAI) at suite setup
Skip If No LLM Keys guard for graceful CI degradation
Test-level teardown with diagnostic logging for both plan status and plan tree on failure
30-minute timeout covering worst-case rollback+re-execute path
Force Tags for consistency with m6_acceptance.robot
Timeout parameters (timeout=60s on_timeout=kill) on all local Run Process git commands
Sequential section numbering (1 through 15) for readability

Closes #751

ISSUES CLOSED: #751

Approach

Follows the patterns established by m6_acceptance.robot and m2_acceptance.robot:

WF05 Suite Setup initialises the workspace, generates a unique run suffix, and detects available LLM API keys
Safe Parse Json Field from common_e2e.resource for JSON field extraction with None guards for JSON null values
All CLI commands use --format json for predictable, parseable output
expected_rc=None with explicit Should Be Equal As Integers for detailed failure messages
Hard assertions on infrastructure/framework behavior (CLI commands, phase transitions, tool registration)
WARN-level assertions on LLM-dependent output (decision decomposition, migration content, backfill evidence, commit count) — per ticket requirement "output validation is flexible"
Traceback and INTERNAL checks on all CLI commands following m2_acceptance.robot pattern
Baseline SHA approach for post-apply diff verification eliminates false positives from fixture commits

Bug Fix: LifecyclePlanRepository.update() UNIQUE Constraint Violation

Root cause: LifecyclePlanRepository.update() called clear() on child relationship collections (project_links, arguments, invariants) followed by append() with new items, but only flushed at the end. SQLAlchemy's default operation ordering can emit INSERTs before DELETEs within the same flush, causing UNIQUE constraint failed: plan_arguments.plan_id, plan_arguments.name when plans have arguments.

Fix: Group all three clear() calls together and flush them before appending new rows. This ensures the DELETEs are committed before any INSERTs, preventing the UNIQUE constraint violation.

Impact: This was a latent bug affecting ALL plans with arguments when update() is called. Previously undetected because existing E2E tests (M1, M2, M5, M6) create plans without --arg flags.

Review Fixes (addressing medium findings from @CoreRasurae review)

#	Finding	Fix
BUG-1	No regression test for UNIQUE constraint fix	Added targeted BDD scenario in `repositories_coverage_boost.feature` — creates plan with argument `x=v1`, updates to `x=v2`, asserts no `IntegrityError`
TEST-1	AC #4 weakened — fragile string counting	Replaced raw `count('"decision_id"')` with proper JSON parsing via `json.loads()`, recursive tree walking for decision counting, structural `children_key_count` and `child_link_count` verification
TEST-2	AC #5 conditionally tested	Added explicit WARN log when no checkpoint_id is present ("AC #5 visibility"); fake checkpoint test now runs unconditionally (moved outside IF/ELSE) with Traceback/INTERNAL checks
TEST-3	No terminal state assertion after lifecycle-apply	Added `plan status` call after apply with phase/processing_state extraction; hard assertion on terminal state or apply-phase progress
TEST-4	AC #6 migration/backfill WARN-only	Added combined gate (`has_ac6_evidence`): if both migration and backfill evidence are absent, explicit WARN for CI visibility. WARN-only is intentional per ticket AC "output validation is flexible"
TEST-5	Automation profile silently skipped	Added fallback to `plan status --format json` when `plan use` output omits `automation_profile`; hard assertion (`Should Be Equal As Strings review`) now always executes
TEST-8	Missing Traceback/INTERNAL on custom resource error paths	Added Traceback/INTERNAL checks inside both `resource add` and `project link-resource` ELSE branches with `NoSuchOption` guard

Quality Gates

nox -e lint ✅
nox -e typecheck ✅ (0 errors)
nox -e unit_tests ✅ (471 features, 12,422 scenarios, 0 failures)
nox -e integration_tests ✅ (1,727 tests, 0 failures)
nox -e e2e_tests ✅ (42 tests, 42 passed, 0 failed)
nox -e coverage_report ✅ (98%, meets threshold)

Manual Verification

Prerequisites

OPENAI_API_KEY or ANTHROPIC_API_KEY environment variable set

Commands

nox -e e2e_tests
# Or run just this suite:
python -m robot --outputdir build/reports/robot --include E2E robot/e2e/wf05_db_migration.robot

## Summary E2E test for Workflow Example 5 — database schema migration with safety nets using the **review** automation profile. Exercises the full spec-aligned workflow: - **Custom resource type registration** via `resource type add --config` (postgres-db type with `transaction_rollback` sandbox strategy, `--host`, `--port`, `--database`, `--schema` CLI args with flat `type`/`default` fields per `ResourceTypeArgument` schema) - **Custom resource instantiation** — attempts `resource add` with the custom type to exercise mixed resource types, followed by `project link-resource` to link DB resource to the project - **Custom skill creation** with spec-aligned database tools: `local/query_db` (read-only), `local/execute_migration` (writes, checkpointable), `local/backfill_column` (writes, checkpointable) — registered via `skill add --config`, with namespaced tool reference names per `SkillToolRefSchema` validation - **Action creation** with `automation_profile: review`, `reusable: true`, `state: available`, spec invariants, and typed `arguments` section (`table_name`, `column_name`, `column_type`, `backfill_source` — all required per spec, using `arguments` field per `ActionConfigSchema`) - **Plan use** with `--arg` flags exercising parameterized action invocation including `backfill_source=audit_log`, plus **explicit `--automation-profile review`** flag (action-to-plan profile propagation is not yet wired in `PlanLifecycleService.use_action`) - **Phased child plan verification** via `plan tree --format json` with `decision_count >= 2` hard assertion on framework decisions plus WARN tiers for LLM decomposition quality (`< 3`, `< 5`) - **Plan phase assertion** — hard assertion that phase is populated after execute - **Checkpoint-based rollback** with hard assertions: `rc=0` on rollback success, `rc!=0` on fake checkpoint, None guard for JSON null checkpoint IDs, re-execute with Traceback/INTERNAL checks on success and explanatory comment on failure path - **Plan diff** with hard `rc=0` assertion and content-signal verification - **Migration content verification** — baseline SHA saved before apply, diff against baseline (not `HEAD~1`), WARN-level check on migration keywords (`last_login`, `schema`, `migration`, `column`, `alter`) — flexible per LLM non-determinism - **Commit count** assertion `>= 2` (fixture baseline: Create Temp Git Repo + DB fixture commit), WARN if no additional commits from lifecycle-apply - **Backfill evidence** WARN-level check in plan tree/execution output (`backfill`, `batch`, `populate`, `last_login`) with explanatory comment noting tree covers decomposition plan - **Combined AC #6 gate** — if *both* migration content *and* backfill evidence are absent, explicit WARN visibility for CI debugging - **Terminal state assertion** after `lifecycle-apply` — `plan status` call verifies phase/processing_state reflects terminal or apply-progress outcome - **Automation profile fallback verification** — if `plan use` output omits `automation_profile`, falls back to `plan status` for secondary verification (hard assertion always runs) - **Traceback and INTERNAL checks** on all CLI commands (resource add, project create, resource type add, skill add, action create, plan use, strategize, execute, plan tree, plan status, plan diff, plan rollback, re-execute after rollback, lifecycle-apply) including custom resource error paths - **Dynamic actor selection** — detects available API keys (Anthropic/OpenAI) at suite setup - **Skip If No LLM Keys** guard for graceful CI degradation - **Test-level teardown** with diagnostic logging for both plan status and plan tree on failure - **30-minute timeout** covering worst-case rollback+re-execute path - **Force Tags** for consistency with `m6_acceptance.robot` - **Timeout parameters** (`timeout=60s on_timeout=kill`) on all local `Run Process` git commands - **Sequential section numbering** (1 through 15) for readability Closes #751 ISSUES CLOSED: #751 ## Approach Follows the patterns established by `m6_acceptance.robot` and `m2_acceptance.robot`: - `WF05 Suite Setup` initialises the workspace, generates a unique run suffix, and detects available LLM API keys - `Safe Parse Json Field` from `common_e2e.resource` for JSON field extraction with None guards for JSON null values - All CLI commands use `--format json` for predictable, parseable output - `expected_rc=None` with explicit `Should Be Equal As Integers` for detailed failure messages - Hard assertions on infrastructure/framework behavior (CLI commands, phase transitions, tool registration) - WARN-level assertions on LLM-dependent output (decision decomposition, migration content, backfill evidence, commit count) — per ticket requirement "output validation is flexible" - Traceback and INTERNAL checks on all CLI commands following `m2_acceptance.robot` pattern - Baseline SHA approach for post-apply diff verification eliminates false positives from fixture commits ## Bug Fix: LifecyclePlanRepository.update() UNIQUE Constraint Violation **Root cause**: `LifecyclePlanRepository.update()` called `clear()` on child relationship collections (project_links, arguments, invariants) followed by `append()` with new items, but only flushed at the end. SQLAlchemy's default operation ordering can emit INSERTs before DELETEs within the same flush, causing `UNIQUE constraint failed: plan_arguments.plan_id, plan_arguments.name` when plans have arguments. **Fix**: Group all three `clear()` calls together and flush them before appending new rows. This ensures the DELETEs are committed before any INSERTs, preventing the UNIQUE constraint violation. **Impact**: This was a latent bug affecting ALL plans with arguments when `update()` is called. Previously undetected because existing E2E tests (M1, M2, M5, M6) create plans without `--arg` flags. ## Review Fixes (addressing medium findings from @CoreRasurae review) | # | Finding | Fix | |---|---------|-----| | **BUG-1** | No regression test for UNIQUE constraint fix | Added targeted BDD scenario in `repositories_coverage_boost.feature` — creates plan with argument `x=v1`, updates to `x=v2`, asserts no `IntegrityError` | | **TEST-1** | AC #4 weakened — fragile string counting | Replaced raw `count('"decision_id"')` with proper JSON parsing via `json.loads()`, recursive tree walking for decision counting, structural `children_key_count` and `child_link_count` verification | | **TEST-2** | AC #5 conditionally tested | Added explicit WARN log when no checkpoint_id is present ("AC #5 visibility"); fake checkpoint test now runs unconditionally (moved outside IF/ELSE) with Traceback/INTERNAL checks | | **TEST-3** | No terminal state assertion after lifecycle-apply | Added `plan status` call after apply with phase/processing_state extraction; hard assertion on terminal state or apply-phase progress | | **TEST-4** | AC #6 migration/backfill WARN-only | Added combined gate (`has_ac6_evidence`): if *both* migration and backfill evidence are absent, explicit WARN for CI visibility. WARN-only is intentional per ticket AC "output validation is flexible" | | **TEST-5** | Automation profile silently skipped | Added fallback to `plan status --format json` when `plan use` output omits `automation_profile`; hard assertion (`Should Be Equal As Strings review`) now always executes | | **TEST-8** | Missing Traceback/INTERNAL on custom resource error paths | Added Traceback/INTERNAL checks inside both `resource add` and `project link-resource` ELSE branches with `NoSuchOption` guard | ## Quality Gates - `nox -e lint` ✅ - `nox -e typecheck` ✅ (0 errors) - `nox -e unit_tests` ✅ (471 features, 12,422 scenarios, 0 failures) - `nox -e integration_tests` ✅ (1,727 tests, 0 failures) - `nox -e e2e_tests` ✅ (42 tests, 42 passed, 0 failed) - `nox -e coverage_report` ✅ (98%, meets threshold) ## Manual Verification ### Prerequisites - `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` environment variable set ### Commands ```bash nox -e e2e_tests # Or run just this suite: python -m robot --outputdir build/reports/robot --include E2E robot/e2e/wf05_db_migration.robot ```

freemo added the

labels 2026-03-13 16:52:13 +00:00

freemo added this to the v3.3.0 milestone 2026-03-13 16:52:14 +00:00

freemo force-pushed test/e2e-wf05-db-migration from d7f533337a to f25797e44b

2026-03-13 17:28:44 +00:00

Compare

freemo force-pushed test/e2e-wf05-db-migration from f25797e44b to ed2a254570

2026-03-13 17:46:55 +00:00

Compare

freemo force-pushed test/e2e-wf05-db-migration from ed2a254570 to 4d0c5f3308

2026-03-13 18:13:08 +00:00

Compare

freemo force-pushed test/e2e-wf05-db-migration from 4d0c5f3308 to 33858063b4

2026-03-13 18:26:26 +00:00

Compare

freemo referenced this pull request

2026-03-13 19:21:05 +00:00

test(e2e): workflow example 5 — database schema migration with safety nets (review profile) #751

freemo force-pushed test/e2e-wf05-db-migration from 33858063b4 to 8eae9d1e0b

2026-03-13 23:19:35 +00:00

Compare

freemo added the

Priority

Medium

label 2026-03-14 04:10:22 +00:00

freemo commented

2026-03-14 04:43:51 +00:00

PM Review — Day 34

Status: Mergeable, 0 reviews, M4 (v3.3.0)
Closes: #751 | Author: @freemo

E2E test for WF05 (database schema migration with safety nets, review profile). Tests skill creation, plan lifecycle, verifies git commits post-apply.

[MINOR] Committer name OpenCode Agent differs from siblings using OpenCode — normalize for consistency.
[MINOR] action.yaml / skill.yaml content not shown in PR body — reviewer can't fully reproduce manual verification.

Action Items

Who	Action	Deadline
@brent.edwards	Peer review	Day 36

## PM Review — Day 34 **Status**: Mergeable, 0 reviews, M4 (v3.3.0) **Closes**: #751 | **Author**: @freemo E2E test for WF05 (database schema migration with safety nets, review profile). Tests skill creation, plan lifecycle, verifies git commits post-apply. **[MINOR]** Committer name `OpenCode Agent` differs from siblings using `OpenCode` — normalize for consistency. **[MINOR]** `action.yaml` / `skill.yaml` content not shown in PR body — reviewer can't fully reproduce manual verification. ### Action Items | Who | Action | Deadline | |-----|--------|----------| | @brent.edwards | **Peer review** | Day 36 |

freemo added a new dependency 2026-03-16 02:42:19 +00:00

#627 Implement @tdd_expected_fail tag handling in Behave environment

freemo added a new dependency 2026-03-16 02:42:19 +00:00

#628 Implement @tdd_expected_fail tag handling in Robot Framework

freemo added a new dependency 2026-03-16 02:42:19 +00:00

#965 refactor(testing): rename tdd_bug/tdd_bug_N tags to tdd_issue/tdd_issue_N across Behave and Robot Framework

freemo commented

2026-03-16 09:27:18 +00:00

PM Status — Day 36 (2026-03-16)

Day 34 review assignment deadline check. This PR has 0 reviewer activity after 2 days.

Priority note: M3 PRs take precedence. Reviewers should complete M3 reviews first, then address M4+ PRs in milestone order.

Assigned reviewer: Please acknowledge and provide an ETA for your review, or flag if reassignment is needed.

## PM Status — Day 36 (2026-03-16) Day 34 review assignment deadline check. This PR has 0 reviewer activity after 2 days. **Priority note**: M3 PRs take precedence. Reviewers should complete M3 reviews first, then address M4+ PRs in milestone order. **Assigned reviewer**: Please acknowledge and provide an ETA for your review, or flag if reassignment is needed.

hurui200320 was assigned by freemo

2026-03-16 22:19:40 +00:00

freemo commented

2026-03-16 22:19:47 +00:00

@hurui200320 I am going to have you take over this PR, it is mostly completed but is waiting on #628 and #966 One is yours and one is Brent's. Please be sure to get this PR and the two blocking PRs I listed in asap, thanks.

@hurui200320 I am going to have you take over this PR, it is mostly completed but is waiting on https://git.cleverthis.com/cleveragents/cleveragents-core/issues/628 and https://git.cleverthis.com/cleveragents/cleveragents-core/issues/966 One is yours and one is Brent's. Please be sure to get this PR and the two blocking PRs I listed in asap, thanks.

freemo requested review from brent.edwards 2026-03-17 18:24:16 +00:00

freemo requested review from hamza.khyari 2026-03-17 18:24:16 +00:00

freemo commented

2026-03-17 18:40:33 +00:00

PM Status — Day 37

Ownership transferred to @hurui200320. Blocked on #628 and #966. PR is M4 (v3.3.0).

Author: Please rebase onto latest master by Day 39 EOD (2026-03-19) and confirm blocker status. Check for merge conflicts proactively.

PM status — Day 37

## PM Status — Day 37 Ownership transferred to @hurui200320. Blocked on #628 and #966. PR is M4 (v3.3.0). **Author**: Please rebase onto latest `master` by **Day 39 EOD (2026-03-19)** and confirm blocker status. Check for merge conflicts proactively. --- *PM status — Day 37*

hurui200320 force-pushed test/e2e-wf05-db-migration from 8eae9d1e0b to 73cddb0275

2026-03-18 08:35:54 +00:00

Compare

freemo commented

2026-03-19 04:57:58 +00:00

Code Review — PR #816

(Cannot submit formal approval — self-authored PR.)

E2E test for WF05. Well-structured with proper labels, milestone, and issue linkage. No issues found.

## Code Review — PR #816 *(Cannot submit formal approval — self-authored PR.)* E2E test for WF05. Well-structured with proper labels, milestone, and issue linkage. No issues found.

hurui200320 force-pushed test/e2e-wf05-db-migration from 73cddb0275 to cc8f0d0bd1

2026-03-19 09:22:46 +00:00

Compare

hurui200320 force-pushed test/e2e-wf05-db-migration from cc8f0d0bd1 to 57dc8db0f8

2026-03-19 09:48:59 +00:00

Compare

hurui200320 force-pushed test/e2e-wf05-db-migration from 57dc8db0f8 to 57d2907a70

2026-03-19 10:14:49 +00:00

Compare

hurui200320 force-pushed test/e2e-wf05-db-migration from 57d2907a70 to 459fd89a74

2026-03-19 11:17:32 +00:00

Compare

hurui200320 force-pushed test/e2e-wf05-db-migration from 459fd89a74 to 16c80defe8

2026-03-19 13:34:18 +00:00

Compare

hurui200320 force-pushed test/e2e-wf05-db-migration from 16c80defe8 to 9c27a269fc

2026-03-20 05:39:13 +00:00

Compare

hurui200320 force-pushed test/e2e-wf05-db-migration from 9c27a269fc to fa5ff97681

2026-03-23 04:10:46 +00:00

Compare

hurui200320 force-pushed test/e2e-wf05-db-migration from fa5ff97681 to e382538edd

2026-03-24 05:40:12 +00:00

Compare

CoreRasurae requested review from CoreRasurae 2026-03-24 11:50:30 +00:00

CoreRasurae reviewed 2026-03-24 12:04:59 +00:00

CoreRasurae left a comment

Code Review Report -- PR #816 (Issue #751)

`test/e2e-wf05-db-migration`: WF05 Database Schema Migration E2E + Repository UNIQUE Fix

Scope: robot/e2e/wf05_db_migration.robot (new, 512 lines), src/cleveragents/infrastructure/database/repositories.py (14-line fix), CHANGELOG.md (7 lines).

Review methodology: 3 full global cycles across all categories (bugs, test flaws, coverage, security, performance) with targeted deep-dives into skill schema parsing, action YAML parsing, repository update flow, and specification alignment. Each cycle re-examined all categories until no new issues were found.

Overall assessment: The repository UNIQUE constraint fix is correct and well-commented. The E2E test is functional and follows established patterns from other WF tests. However, several acceptance criteria are significantly weakened by WARN-only assertions, and the repository fix lacks a dedicated regression test.

Findings Summary

Severity	Bug/Correctness	Test Flaws	Total
Medium	1	5	6
Low	1	3	4
Total	2	8	10

1. Bug Detection / Correctness

BUG-1 (Medium): No dedicated regression test for the UNIQUE constraint fix

File: src/cleveragents/infrastructure/database/repositories.py:1398-1406

The fix correctly consolidates clear() calls and adds session.flush() before append() to prevent SQLAlchemy INSERT-before-DELETE ordering. However, no existing test reproduces the exact failure scenario: creating a plan with argument name "x", then updating it with the same argument name "x".

Verified against all existing test suites:

repositories_coverage_boost.feature (line 72): updates a plan that starts with zero arguments -- never triggers UNIQUE collision.
plan_persistence.feature, plan_lifecycle_coverage.feature, plan_repository.feature: all create fresh arguments on update, never same-name replacement.

Risk: If the session.flush() at line 1406 is accidentally removed during future refactoring, no fast unit/BDD test will catch the regression. Only the slow E2E test would surface it, and only if the LLM happens to trigger a plan update with duplicate argument names.

Recommendation: Add a targeted BDD scenario that creates a plan with arguments {"x": "v1"}, then updates it with {"x": "v2"}, and asserts no IntegrityError.

BUG-2 (Low): Skill tool metadata silently discarded

File: robot/e2e/wf05_db_migration.robot:221-250

The test defines tools with writes: true/false and checkpointable: true/false, but SkillService._schema_to_skill_dict() (skill_service.py:345-346) reduces each tool to just its name string. The metadata is validated by SkillToolRefSchema but never stored in the Skill domain model. The tool name assertions on lines 248-250 pass, but the test creates a false impression that capability metadata is being verified.

Impact: Low -- the test correctly exercises the CLI registration flow. But the assertion Output Should Contain ${r_skill} local/query_db only proves the name was echoed, not that writes: false was recorded.

2. Test Flaws / Coverage Gaps

TEST-1 (Medium): AC #4 weakened -- "5 phases" reduced to 2-decision minimum

File: robot/e2e/wf05_db_migration.robot:349-364

Issue #751 AC says "Test exercises phased child plan execution (5 phases)". The spec (lines 38590-38594) describes a 5-phase plan: migration, rollback migration, backfill, code update, tests.

The test asserts:

Should Be True    ${decision_count} >= 2

This is a 60% reduction from the acceptance criterion. The WARNs at lines 356-361 log if below 3 or 5 but do not fail. Additionally, the children key check (line 363) only verifies the JSON key "children" exists as a substring in stdout -- it does not verify the children array is non-empty or that actual child plans were created.

Recommendation: Raise the hard minimum to >= 3 (still accommodating LLM variability), and parse the JSON to verify at least one child node exists in the children array.

TEST-2 (Medium): AC #5 conditionally tested -- rollback only if checkpoint exists

File: robot/e2e/wf05_db_migration.robot:400-437

Issue #751 AC says "Test exercises checkpoint creation and rollback (plan rollback)". However, the test conditionally branches:

IF checkpoint_id is non-empty: performs actual rollback (hard assertions) -- AC fulfilled.
ELSE (line 428): only verifies that a fake checkpoint ID fails gracefully -- AC not fulfilled.

If the system never creates checkpoints during execution, the test passes without ever exercising real rollback functionality.

Recommendation: Add a hard assertion (or at least a prominent log) when the ELSE branch is taken, to make it visible in CI that AC #5 was not actually validated. Alternatively, consider asserting checkpoint existence separately.

TEST-3 (Medium): No terminal state assertion after lifecycle-apply

File: robot/e2e/wf05_db_migration.robot:457-467

After lifecycle-apply, the test checks rc == 0 and Output Should Contain ${r_apply} ${plan_id}, but never queries plan status to verify the plan reached a terminal state (applied, complete, errored, etc.). If apply succeeds with rc=0 but the plan state is left in an intermediate or incorrect state, this would not be caught.

Other E2E tests (e.g., m6 acceptance) verify the phase field contains "apply" after the apply step.

Recommendation: Add a plan status call after apply and assert the phase/processing_state indicates a terminal outcome.

TEST-4 (Medium): AC #6 migration/backfill content checks are WARN-only

File: robot/e2e/wf05_db_migration.robot:493-509

Two content verification checks that map to AC #6 ("Test verifies database migration and backfill operations") are logged as WARNs but do not fail the test:

Migration content (line 493): checks for last_login, schema, migration, column, alter in git diff. If none found, only WARNs.
Backfill evidence (line 505): checks for backfill, batch, populate, last_login in plan/execution output. If none found, only WARNs.

If the LLM produces no migration-related changes whatsoever, the test still passes.

Recommendation: If these are intentionally non-blocking (due to LLM non-determinism), document this explicitly in the test. Consider adding a combined gate: if both migration content and backfill evidence are absent, fail the test.

TEST-5 (Medium): Automation profile verification silently skipped if field absent

File: robot/e2e/wf05_db_migration.robot:319-327

After plan use, the test extracts automation_profile from JSON output. If the field is absent or null (lines 320-321 handle Python None / string "None"), the test logs a WARN and skips the assertion entirely (line 326).

The issue AC implicitly requires the review profile to be active. If the system accepts the --automation-profile review flag but doesn't persist it in the plan JSON output, this is never caught.

Recommendation: Follow up with a plan status --format json call and extract automation_profile from there as a secondary verification.

TEST-6 (Low): `decision_count` based on fragile raw string counting

File: robot/e2e/wf05_db_migration.robot:352

${decision_count}=    Evaluate    $r_tree.stdout.count('"decision_id"')

This counts substring "decision_id" occurrences in raw JSON. If the JSON format changes (e.g., nested error messages contain "decision_id", or pretty-printing changes quoting), the count could be wrong. Similarly, "children" on line 363 is a raw substring check.

Recommendation: Use Safe Parse Json Field or full JSON parsing to count decision nodes in the tree structure.

TEST-7 (Low): No content assertion on plan diff output

File: robot/e2e/wf05_db_migration.robot:439-447

Section 13 checks plan diff with rc=0 and non-empty stdout, but does not inspect the diff content. This is a missed opportunity to verify the changeset contains migration-related changes before apply, providing an earlier feedback signal.

TEST-8 (Low): Missing Traceback/INTERNAL checks on custom resource error paths

File: robot/e2e/wf05_db_migration.robot:204-218

When resource add for the custom postgres-db type fails (rc != 0, line 204), the error path only WARNs. No Should Not Contain ... Traceback / Should Not Contain ... INTERNAL assertion runs on the failure output. This is inconsistent with the pattern used for all other commands in the test (e.g., lines 174-175, 185-186, 244-245).

Recommendation: Add Traceback/INTERNAL checks inside the ELSE branch at line 216-217.

3. Security

No security issues found within the scope of these changes. YAML files are created with hardcoded content in a temp directory. No user-supplied input flows into file content or command arguments.

4. Performance

No significant performance issues found. Timeout values (60s for git, 120s for CLI, 300s for execute, 30min test) are consistent with other E2E tests and appropriate for LLM-driven execution.

Positive Observations

Repository fix is well-commented: The explanation at lines 1398-1402 clearly documents why the flush is needed.
Defensive teardown: WF05 Test Teardown logs diagnostic context on failure, aiding CI debugging.
UUID suffix for isolation: Using uuid4().hex[:12] to prevent UNIQUE collisions across runs is good practice.
Baseline SHA approach: Capturing HEAD before apply and diffing against it (lines 451-491) is the most thorough pattern among all E2E tests.
Graceful degradation: The test handles custom resource type failures gracefully (WARN + continue), acknowledging that the full custom resource type flow may not be fully implemented.
Spec-aligned tool names: The three database tools (query_db, execute_migration, backfill_column) match the specification exactly.

# Code Review Report -- PR #816 (Issue #751) ## `test/e2e-wf05-db-migration`: WF05 Database Schema Migration E2E + Repository UNIQUE Fix **Scope**: `robot/e2e/wf05_db_migration.robot` (new, 512 lines), `src/cleveragents/infrastructure/database/repositories.py` (14-line fix), `CHANGELOG.md` (7 lines). **Review methodology**: 3 full global cycles across all categories (bugs, test flaws, coverage, security, performance) with targeted deep-dives into skill schema parsing, action YAML parsing, repository update flow, and specification alignment. Each cycle re-examined all categories until no new issues were found. **Overall assessment**: The repository UNIQUE constraint fix is correct and well-commented. The E2E test is functional and follows established patterns from other WF tests. However, several acceptance criteria are significantly weakened by WARN-only assertions, and the repository fix lacks a dedicated regression test. --- ## Findings Summary | Severity | Bug/Correctness | Test Flaws | Security | Performance | Total | |----------|----------------|------------|----------|-------------|-------| | **Medium** | 1 | 5 | 0 | 0 | **6** | | **Low** | 1 | 3 | 0 | 0 | **4** | | **Total** | **2** | **8** | **0** | **0** | **10** | --- ## 1. Bug Detection / Correctness ### BUG-1 (Medium): No dedicated regression test for the UNIQUE constraint fix **File**: `src/cleveragents/infrastructure/database/repositories.py:1398-1406` The fix correctly consolidates `clear()` calls and adds `session.flush()` before `append()` to prevent SQLAlchemy INSERT-before-DELETE ordering. However, **no existing test reproduces the exact failure scenario**: creating a plan with argument name `"x"`, then updating it with the same argument name `"x"`. Verified against all existing test suites: - `repositories_coverage_boost.feature` (line 72): updates a plan that starts with **zero** arguments -- never triggers UNIQUE collision. - `plan_persistence.feature`, `plan_lifecycle_coverage.feature`, `plan_repository.feature`: all create fresh arguments on update, never same-name replacement. **Risk**: If the `session.flush()` at line 1406 is accidentally removed during future refactoring, no fast unit/BDD test will catch the regression. Only the slow E2E test would surface it, and only if the LLM happens to trigger a plan update with duplicate argument names. **Recommendation**: Add a targeted BDD scenario that creates a plan with arguments `{"x": "v1"}`, then updates it with `{"x": "v2"}`, and asserts no `IntegrityError`. ### BUG-2 (Low): Skill tool metadata silently discarded **File**: `robot/e2e/wf05_db_migration.robot:221-250` The test defines tools with `writes: true/false` and `checkpointable: true/false`, but `SkillService._schema_to_skill_dict()` (`skill_service.py:345-346`) reduces each tool to just its `name` string. The metadata is validated by `SkillToolRefSchema` but never stored in the `Skill` domain model. The tool name assertions on lines 248-250 pass, but the test creates a false impression that capability metadata is being verified. **Impact**: Low -- the test correctly exercises the CLI registration flow. But the assertion `Output Should Contain ${r_skill} local/query_db` only proves the name was echoed, not that `writes: false` was recorded. --- ## 2. Test Flaws / Coverage Gaps ### TEST-1 (Medium): AC #4 weakened -- "5 phases" reduced to 2-decision minimum **File**: `robot/e2e/wf05_db_migration.robot:349-364` Issue #751 AC says _"Test exercises phased child plan execution (5 phases)"_. The spec (lines 38590-38594) describes a 5-phase plan: migration, rollback migration, backfill, code update, tests. The test asserts: ```robot Should Be True ${decision_count} >= 2 ``` This is a 60% reduction from the acceptance criterion. The WARNs at lines 356-361 log if below 3 or 5 but do not fail. Additionally, the `children` key check (line 363) only verifies the JSON _key_ `"children"` exists as a substring in stdout -- it does not verify the children array is non-empty or that actual child plans were created. **Recommendation**: Raise the hard minimum to `>= 3` (still accommodating LLM variability), and parse the JSON to verify at least one child node exists in the children array. ### TEST-2 (Medium): AC #5 conditionally tested -- rollback only if checkpoint exists **File**: `robot/e2e/wf05_db_migration.robot:400-437` Issue #751 AC says _"Test exercises checkpoint creation and rollback (`plan rollback`)"_. However, the test conditionally branches: - **IF** `checkpoint_id` is non-empty: performs actual rollback (hard assertions) -- AC fulfilled. - **ELSE** (line 428): only verifies that a fake checkpoint ID fails gracefully -- AC **not** fulfilled. If the system never creates checkpoints during execution, the test passes without ever exercising real rollback functionality. **Recommendation**: Add a hard assertion (or at least a prominent log) when the ELSE branch is taken, to make it visible in CI that AC #5 was not actually validated. Alternatively, consider asserting checkpoint existence separately. ### TEST-3 (Medium): No terminal state assertion after lifecycle-apply **File**: `robot/e2e/wf05_db_migration.robot:457-467` After `lifecycle-apply`, the test checks `rc == 0` and `Output Should Contain ${r_apply} ${plan_id}`, but never queries `plan status` to verify the plan reached a terminal state (`applied`, `complete`, `errored`, etc.). If apply succeeds with rc=0 but the plan state is left in an intermediate or incorrect state, this would not be caught. Other E2E tests (e.g., m6 acceptance) verify the phase field contains `"apply"` after the apply step. **Recommendation**: Add a `plan status` call after apply and assert the phase/processing_state indicates a terminal outcome. ### TEST-4 (Medium): AC #6 migration/backfill content checks are WARN-only **File**: `robot/e2e/wf05_db_migration.robot:493-509` Two content verification checks that map to AC #6 (_"Test verifies database migration and backfill operations"_) are logged as WARNs but do not fail the test: 1. **Migration content** (line 493): checks for `last_login`, `schema`, `migration`, `column`, `alter` in git diff. If none found, only WARNs. 2. **Backfill evidence** (line 505): checks for `backfill`, `batch`, `populate`, `last_login` in plan/execution output. If none found, only WARNs. If the LLM produces no migration-related changes whatsoever, the test still passes. **Recommendation**: If these are intentionally non-blocking (due to LLM non-determinism), document this explicitly in the test. Consider adding a combined gate: if _both_ migration content _and_ backfill evidence are absent, fail the test. ### TEST-5 (Medium): Automation profile verification silently skipped if field absent **File**: `robot/e2e/wf05_db_migration.robot:319-327` After `plan use`, the test extracts `automation_profile` from JSON output. If the field is absent or null (lines 320-321 handle Python `None` / string `"None"`), the test logs a WARN and skips the assertion entirely (line 326). The issue AC implicitly requires the `review` profile to be active. If the system accepts the `--automation-profile review` flag but doesn't persist it in the plan JSON output, this is never caught. **Recommendation**: Follow up with a `plan status --format json` call and extract `automation_profile` from there as a secondary verification. ### TEST-6 (Low): `decision_count` based on fragile raw string counting **File**: `robot/e2e/wf05_db_migration.robot:352` ```robot ${decision_count}= Evaluate $r_tree.stdout.count('"decision_id"') ``` This counts substring `"decision_id"` occurrences in raw JSON. If the JSON format changes (e.g., nested error messages contain "decision_id", or pretty-printing changes quoting), the count could be wrong. Similarly, `"children"` on line 363 is a raw substring check. **Recommendation**: Use `Safe Parse Json Field` or full JSON parsing to count decision nodes in the tree structure. ### TEST-7 (Low): No content assertion on plan diff output **File**: `robot/e2e/wf05_db_migration.robot:439-447` Section 13 checks `plan diff` with `rc=0` and non-empty stdout, but does not inspect the diff content. This is a missed opportunity to verify the changeset contains migration-related changes _before_ apply, providing an earlier feedback signal. ### TEST-8 (Low): Missing Traceback/INTERNAL checks on custom resource error paths **File**: `robot/e2e/wf05_db_migration.robot:204-218` When `resource add` for the custom postgres-db type fails (rc != 0, line 204), the error path only WARNs. No `Should Not Contain ... Traceback` / `Should Not Contain ... INTERNAL` assertion runs on the failure output. This is inconsistent with the pattern used for all other commands in the test (e.g., lines 174-175, 185-186, 244-245). **Recommendation**: Add Traceback/INTERNAL checks inside the ELSE branch at line 216-217. --- ## 3. Security No security issues found within the scope of these changes. YAML files are created with hardcoded content in a temp directory. No user-supplied input flows into file content or command arguments. --- ## 4. Performance No significant performance issues found. Timeout values (60s for git, 120s for CLI, 300s for execute, 30min test) are consistent with other E2E tests and appropriate for LLM-driven execution. --- ## Positive Observations 1. **Repository fix is well-commented**: The explanation at lines 1398-1402 clearly documents _why_ the flush is needed. 2. **Defensive teardown**: `WF05 Test Teardown` logs diagnostic context on failure, aiding CI debugging. 3. **UUID suffix for isolation**: Using `uuid4().hex[:12]` to prevent UNIQUE collisions across runs is good practice. 4. **Baseline SHA approach**: Capturing HEAD before apply and diffing against it (lines 451-491) is the most thorough pattern among all E2E tests. 5. **Graceful degradation**: The test handles custom resource type failures gracefully (WARN + continue), acknowledging that the full custom resource type flow may not be fully implemented. 6. **Spec-aligned tool names**: The three database tools (`query_db`, `execute_migration`, `backfill_column`) match the specification exactly.

CoreRasurae approved these changes 2026-03-24 12:07:08 +00:00

Dismissed

CoreRasurae approved these changes 2026-03-24 12:07:52 +00:00

Dismissed

CoreRasurae left a comment

Approved, provided that the medium level findings in #816 (comment) are addressed.

Approved, provided that the medium level findings in https://git.cleverthis.com/cleveragents/cleveragents-core/pulls/816#issuecomment-71482 are addressed.

hurui200320 referenced this pull request

2026-03-25 05:37:47 +00:00

test(e2e): workflow example 5 — database schema migration with safety nets (review profile) #751

hurui200320 force-pushed test/e2e-wf05-db-migration from e382538edd to 2e9081d471

2026-03-25 11:53:08 +00:00

Compare

hurui200320 dismissed CoreRasurae's review 2026-03-25 11:53:08 +00:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

hurui200320 commented

2026-03-25 11:54:45 +00:00

Review Fix Round — Addressing Medium Findings from @CoreRasurae

All 6 medium-severity findings from the code review have been addressed. Branch rebased onto latest master (83c22b83).

Fixes Applied

BUG-1 (Medium): No regression test for UNIQUE constraint fix

Added targeted BDD scenario Updating a plan argument with the same name does not hit UNIQUE constraints in features/repositories_coverage_boost.feature
Creates plan with argument x=v1, persists, updates to x=v2 via LifecyclePlanRepository.update(), asserts no IntegrityError and verifies new value persisted
Three new step definitions in features/steps/repositories_coverage_boost_steps.py: step_set_initial_plan_argument, step_update_plan_argument_same_name, step_verify_plan_argument_value

TEST-1 (Medium): AC #4 — fragile string counting for decision nodes

Replaced raw $r_tree.stdout.count('"decision_id"') with proper JSON parsing
Plan tree output is now parsed via json.loads() with log-line prefix stripping
Decision counting uses recursive tree walker that traverses children arrays
Added structural verification: children_key_count >= 1 (hard) and child_link_count (WARN if < 1)
Hard minimum remains >= 2 (framework minimum: strategy_choice + implementation_choice) per ticket AC "output validation is flexible"

TEST-2 (Medium): AC #5 — rollback conditionally tested

Added explicit WARN when ELSE branch is taken: "AC #5 visibility: no checkpoint_id present, so real rollback path was not executed in this run"
Fake checkpoint test (no-such-checkpoint) moved outside IF/ELSE — now always runs unconditionally
Added Should Not Contain Traceback and Should Not Contain INTERNAL checks on fake rollback output

TEST-3 (Medium): No terminal state assertion after lifecycle-apply

Added plan status call after lifecycle-apply with --format json
Extracts phase and processing_state from JSON output
Hard assertion: plan must be in terminal state (applied, constrained, errored, cancelled, complete) OR show apply-phase progress
WARN if phase doesn't contain "apply" or state is non-terminal

TEST-4 (Medium): AC #6 — migration/backfill WARN-only

Added combined gate: has_ac6_evidence = has_migration_content or has_backfill_evidence
When both are absent, explicit WARN: "AC #6 visibility: neither migration nor backfill evidence was observed in this run (output is flexible)"
Also added plan diff content-signal check before apply (section 13) as early feedback
WARN-only behavior is intentional per ticket AC "Output validation is flexible"

TEST-5 (Medium): Automation profile silently skipped

Added fallback: when plan use output omits automation_profile, now calls plan status --format json for secondary verification
Extracts automation_profile from status output with same None/empty guards
Hard assertion Should Be Equal As Strings ${resolved_profile} review now always executes (never skipped)

Bonus: TEST-8 (Low) also addressed

Added Traceback/INTERNAL checks inside both resource add and project link-resource ELSE branches
Includes NoSuchOption guard to avoid false positives from unimplemented CLI flags

Quality Gates (all passing)

Gate	Result
`nox -e lint`	✅ All checks passed
`nox -e typecheck`	✅ 0 errors
`nox -e unit_tests`	✅ 471 features, 12,422 scenarios, 0 failures
`nox -e integration_tests`	✅ 1,727 tests, 0 failures
`nox -e e2e_tests`	✅ 42 tests, 42 passed (including WF05)
`nox -e coverage_report`	✅ 98% (≥ 97% threshold)

Low-severity findings (not addressed in this round)

Per Luis's approval condition ("provided that the medium level findings are addressed"), the 4 low-severity findings (BUG-2, TEST-6, TEST-7, TEST-8 partial) are acknowledged but not required for merge. TEST-8 was addressed as a bonus since it was straightforward.

## Review Fix Round — Addressing Medium Findings from @CoreRasurae All 6 medium-severity findings from the [code review](https://git.cleverthis.com/cleveragents/cleveragents-core/pulls/816#issuecomment-71482) have been addressed. Branch rebased onto latest `master` (`83c22b83`). ### Fixes Applied **BUG-1 (Medium): No regression test for UNIQUE constraint fix** - Added targeted BDD scenario `Updating a plan argument with the same name does not hit UNIQUE constraints` in `features/repositories_coverage_boost.feature` - Creates plan with argument `x=v1`, persists, updates to `x=v2` via `LifecyclePlanRepository.update()`, asserts no `IntegrityError` and verifies new value persisted - Three new step definitions in `features/steps/repositories_coverage_boost_steps.py`: `step_set_initial_plan_argument`, `step_update_plan_argument_same_name`, `step_verify_plan_argument_value` **TEST-1 (Medium): AC #4 — fragile string counting for decision nodes** - Replaced raw `$r_tree.stdout.count('"decision_id"')` with proper JSON parsing - Plan tree output is now parsed via `json.loads()` with log-line prefix stripping - Decision counting uses recursive tree walker that traverses `children` arrays - Added structural verification: `children_key_count >= 1` (hard) and `child_link_count` (WARN if < 1) - Hard minimum remains `>= 2` (framework minimum: strategy_choice + implementation_choice) per ticket AC "output validation is flexible" **TEST-2 (Medium): AC #5 — rollback conditionally tested** - Added explicit WARN when ELSE branch is taken: `"AC #5 visibility: no checkpoint_id present, so real rollback path was not executed in this run"` - Fake checkpoint test (`no-such-checkpoint`) moved outside IF/ELSE — now always runs unconditionally - Added `Should Not Contain Traceback` and `Should Not Contain INTERNAL` checks on fake rollback output **TEST-3 (Medium): No terminal state assertion after lifecycle-apply** - Added `plan status` call after `lifecycle-apply` with `--format json` - Extracts `phase` and `processing_state` from JSON output - Hard assertion: plan must be in terminal state (`applied`, `constrained`, `errored`, `cancelled`, `complete`) OR show apply-phase progress - WARN if phase doesn't contain "apply" or state is non-terminal **TEST-4 (Medium): AC #6 — migration/backfill WARN-only** - Added combined gate: `has_ac6_evidence = has_migration_content or has_backfill_evidence` - When *both* are absent, explicit WARN: `"AC #6 visibility: neither migration nor backfill evidence was observed in this run (output is flexible)"` - Also added `plan diff` content-signal check before apply (section 13) as early feedback - WARN-only behavior is intentional per ticket AC "Output validation is flexible" **TEST-5 (Medium): Automation profile silently skipped** - Added fallback: when `plan use` output omits `automation_profile`, now calls `plan status --format json` for secondary verification - Extracts `automation_profile` from status output with same None/empty guards - Hard assertion `Should Be Equal As Strings ${resolved_profile} review` now always executes (never skipped) **Bonus: TEST-8 (Low) also addressed** - Added Traceback/INTERNAL checks inside both `resource add` and `project link-resource` ELSE branches - Includes `NoSuchOption` guard to avoid false positives from unimplemented CLI flags ### Quality Gates (all passing) | Gate | Result | |------|--------| | `nox -e lint` | ✅ All checks passed | | `nox -e typecheck` | ✅ 0 errors | | `nox -e unit_tests` | ✅ 471 features, 12,422 scenarios, 0 failures | | `nox -e integration_tests` | ✅ 1,727 tests, 0 failures | | `nox -e e2e_tests` | ✅ 42 tests, 42 passed (including WF05) | | `nox -e coverage_report` | ✅ 98% (≥ 97% threshold) | ### Low-severity findings (not addressed in this round) Per Luis's approval condition ("provided that the medium level findings are addressed"), the 4 low-severity findings (BUG-2, TEST-6, TEST-7, TEST-8 partial) are acknowledged but not required for merge. TEST-8 was addressed as a bonus since it was straightforward.

hurui200320 referenced this pull request

2026-03-25 11:55:05 +00:00

test(e2e): workflow example 5 — database schema migration with safety nets (review profile) #751

hurui200320 force-pushed test/e2e-wf05-db-migration from 2e9081d471 to dc1d4c44cd

2026-03-25 12:12:00 +00:00

Compare

hurui200320 force-pushed test/e2e-wf05-db-migration from dc1d4c44cd to 5759ac5974

2026-03-25 12:43:16 +00:00

Compare

hurui200320 removed a dependency 2026-03-25 12:54:14 +00:00

#965 refactor(testing): rename tdd_bug/tdd_bug_N tags to tdd_issue/tdd_issue_N across Behave and Robot Framework

hurui200320 scheduled this pull request to auto merge when all checks succeed 2026-03-25 12:54:28 +00:00

hurui200320 merged commit 06130212ed into master

2026-03-25 13:05:05 +00:00

hurui200320 deleted branch test/e2e-wf05-db-migration

2026-03-25 13:05:06 +00:00

hurui200320 referenced this issue from a commit

2026-03-25 13:05:06 +00:00

test(e2e): workflow example 5 — database schema migration with safety nets (review profile) (#816)

Sign in to join this conversation.

No Reviewers

hamza.khyari

brent.edwards

CoreRasurae

3 Participants

Notifications

Due Date

No due date set.

Depends on

#627 Implement @tdd_expected_fail tag handling in Behave environment

cleveragents/cleveragents-core

#628 Implement @tdd_expected_fail tag handling in Robot Framework

cleveragents/cleveragents-core

Reference: cleveragents/cleveragents-core#816

test(e2e): workflow example 5 — database schema migration with safety nets (review profile) #816

Summary

Approach

Bug Fix: LifecyclePlanRepository.update() UNIQUE Constraint Violation

Review Fixes (addressing medium findings from @CoreRasurae review)

Quality Gates

Manual Verification

Prerequisites

Commands

PM Review — Day 34

Action Items

PM Status — Day 36 (2026-03-16)

PM Status — Day 37

Code Review — PR #816

Code Review Report -- PR #816 (Issue #751)

test/e2e-wf05-db-migration: WF05 Database Schema Migration E2E + Repository UNIQUE Fix

Findings Summary

1. Bug Detection / Correctness

BUG-1 (Medium): No dedicated regression test for the UNIQUE constraint fix

BUG-2 (Low): Skill tool metadata silently discarded

2. Test Flaws / Coverage Gaps

TEST-1 (Medium): AC #4 weakened -- "5 phases" reduced to 2-decision minimum

TEST-2 (Medium): AC #5 conditionally tested -- rollback only if checkpoint exists

TEST-3 (Medium): No terminal state assertion after lifecycle-apply

TEST-4 (Medium): AC #6 migration/backfill content checks are WARN-only

TEST-5 (Medium): Automation profile verification silently skipped if field absent

TEST-6 (Low): decision_count based on fragile raw string counting

TEST-7 (Low): No content assertion on plan diff output

TEST-8 (Low): Missing Traceback/INTERNAL checks on custom resource error paths

3. Security

4. Performance

Positive Observations

Review Fix Round — Addressing Medium Findings from @CoreRasurae

Fixes Applied

Quality Gates (all passing)

Low-severity findings (not addressed in this round)

`test/e2e-wf05-db-migration`: WF05 Database Schema Migration E2E + Repository UNIQUE Fix

TEST-6 (Low): `decision_count` based on fragile raw string counting