test(e2e): workflow example 5 — database schema migration with safety nets (review profile) #816

Merged
hurui200320 merged 1 commit from test/e2e-wf05-db-migration into master 2026-03-25 13:05:05 +00:00
Owner

Summary

E2E test for Workflow Example 5 — database schema migration with safety nets using the review automation profile. Exercises the full spec-aligned workflow:

  • Custom resource type registration via resource type add --config (postgres-db type with transaction_rollback sandbox strategy, --host, --port, --database, --schema CLI args with flat type/default fields per ResourceTypeArgument schema)
  • Custom resource instantiation — attempts resource add with the custom type to exercise mixed resource types, followed by project link-resource to link DB resource to the project
  • Custom skill creation with spec-aligned database tools: local/query_db (read-only), local/execute_migration (writes, checkpointable), local/backfill_column (writes, checkpointable) — registered via skill add --config, with namespaced tool reference names per SkillToolRefSchema validation
  • Action creation with automation_profile: review, reusable: true, state: available, spec invariants, and typed arguments section (table_name, column_name, column_type, backfill_source — all required per spec, using arguments field per ActionConfigSchema)
  • Plan use with --arg flags exercising parameterized action invocation including backfill_source=audit_log, plus explicit --automation-profile review flag (action-to-plan profile propagation is not yet wired in PlanLifecycleService.use_action)
  • Phased child plan verification via plan tree --format json with decision_count >= 2 hard assertion on framework decisions plus WARN tiers for LLM decomposition quality (< 3, < 5)
  • Plan phase assertion — hard assertion that phase is populated after execute
  • Checkpoint-based rollback with hard assertions: rc=0 on rollback success, rc!=0 on fake checkpoint, None guard for JSON null checkpoint IDs, re-execute with Traceback/INTERNAL checks on success and explanatory comment on failure path
  • Plan diff with hard rc=0 assertion and content-signal verification
  • Migration content verification — baseline SHA saved before apply, diff against baseline (not HEAD~1), WARN-level check on migration keywords (last_login, schema, migration, column, alter) — flexible per LLM non-determinism
  • Commit count assertion >= 2 (fixture baseline: Create Temp Git Repo + DB fixture commit), WARN if no additional commits from lifecycle-apply
  • Backfill evidence WARN-level check in plan tree/execution output (backfill, batch, populate, last_login) with explanatory comment noting tree covers decomposition plan
  • Combined AC #6 gate — if both migration content and backfill evidence are absent, explicit WARN visibility for CI debugging
  • Terminal state assertion after lifecycle-applyplan status call verifies phase/processing_state reflects terminal or apply-progress outcome
  • Automation profile fallback verification — if plan use output omits automation_profile, falls back to plan status for secondary verification (hard assertion always runs)
  • Traceback and INTERNAL checks on all CLI commands (resource add, project create, resource type add, skill add, action create, plan use, strategize, execute, plan tree, plan status, plan diff, plan rollback, re-execute after rollback, lifecycle-apply) including custom resource error paths
  • Dynamic actor selection — detects available API keys (Anthropic/OpenAI) at suite setup
  • Skip If No LLM Keys guard for graceful CI degradation
  • Test-level teardown with diagnostic logging for both plan status and plan tree on failure
  • 30-minute timeout covering worst-case rollback+re-execute path
  • Force Tags for consistency with m6_acceptance.robot
  • Timeout parameters (timeout=60s on_timeout=kill) on all local Run Process git commands
  • Sequential section numbering (1 through 15) for readability

Closes #751

ISSUES CLOSED: #751

Approach

Follows the patterns established by m6_acceptance.robot and m2_acceptance.robot:

  • WF05 Suite Setup initialises the workspace, generates a unique run suffix, and detects available LLM API keys
  • Safe Parse Json Field from common_e2e.resource for JSON field extraction with None guards for JSON null values
  • All CLI commands use --format json for predictable, parseable output
  • expected_rc=None with explicit Should Be Equal As Integers for detailed failure messages
  • Hard assertions on infrastructure/framework behavior (CLI commands, phase transitions, tool registration)
  • WARN-level assertions on LLM-dependent output (decision decomposition, migration content, backfill evidence, commit count) — per ticket requirement "output validation is flexible"
  • Traceback and INTERNAL checks on all CLI commands following m2_acceptance.robot pattern
  • Baseline SHA approach for post-apply diff verification eliminates false positives from fixture commits

Bug Fix: LifecyclePlanRepository.update() UNIQUE Constraint Violation

Root cause: LifecyclePlanRepository.update() called clear() on child relationship collections (project_links, arguments, invariants) followed by append() with new items, but only flushed at the end. SQLAlchemy's default operation ordering can emit INSERTs before DELETEs within the same flush, causing UNIQUE constraint failed: plan_arguments.plan_id, plan_arguments.name when plans have arguments.

Fix: Group all three clear() calls together and flush them before appending new rows. This ensures the DELETEs are committed before any INSERTs, preventing the UNIQUE constraint violation.

Impact: This was a latent bug affecting ALL plans with arguments when update() is called. Previously undetected because existing E2E tests (M1, M2, M5, M6) create plans without --arg flags.

Review Fixes (addressing medium findings from @CoreRasurae review)

# Finding Fix
BUG-1 No regression test for UNIQUE constraint fix Added targeted BDD scenario in repositories_coverage_boost.feature — creates plan with argument x=v1, updates to x=v2, asserts no IntegrityError
TEST-1 AC #4 weakened — fragile string counting Replaced raw count('"decision_id"') with proper JSON parsing via json.loads(), recursive tree walking for decision counting, structural children_key_count and child_link_count verification
TEST-2 AC #5 conditionally tested Added explicit WARN log when no checkpoint_id is present ("AC #5 visibility"); fake checkpoint test now runs unconditionally (moved outside IF/ELSE) with Traceback/INTERNAL checks
TEST-3 No terminal state assertion after lifecycle-apply Added plan status call after apply with phase/processing_state extraction; hard assertion on terminal state or apply-phase progress
TEST-4 AC #6 migration/backfill WARN-only Added combined gate (has_ac6_evidence): if both migration and backfill evidence are absent, explicit WARN for CI visibility. WARN-only is intentional per ticket AC "output validation is flexible"
TEST-5 Automation profile silently skipped Added fallback to plan status --format json when plan use output omits automation_profile; hard assertion (Should Be Equal As Strings review) now always executes
TEST-8 Missing Traceback/INTERNAL on custom resource error paths Added Traceback/INTERNAL checks inside both resource add and project link-resource ELSE branches with NoSuchOption guard

Quality Gates

  • nox -e lint
  • nox -e typecheck (0 errors)
  • nox -e unit_tests (471 features, 12,422 scenarios, 0 failures)
  • nox -e integration_tests (1,727 tests, 0 failures)
  • nox -e e2e_tests (42 tests, 42 passed, 0 failed)
  • nox -e coverage_report (98%, meets threshold)

Manual Verification

Prerequisites

  • OPENAI_API_KEY or ANTHROPIC_API_KEY environment variable set

Commands

nox -e e2e_tests
# Or run just this suite:
python -m robot --outputdir build/reports/robot --include E2E robot/e2e/wf05_db_migration.robot
## Summary E2E test for Workflow Example 5 — database schema migration with safety nets using the **review** automation profile. Exercises the full spec-aligned workflow: - **Custom resource type registration** via `resource type add --config` (postgres-db type with `transaction_rollback` sandbox strategy, `--host`, `--port`, `--database`, `--schema` CLI args with flat `type`/`default` fields per `ResourceTypeArgument` schema) - **Custom resource instantiation** — attempts `resource add` with the custom type to exercise mixed resource types, followed by `project link-resource` to link DB resource to the project - **Custom skill creation** with spec-aligned database tools: `local/query_db` (read-only), `local/execute_migration` (writes, checkpointable), `local/backfill_column` (writes, checkpointable) — registered via `skill add --config`, with namespaced tool reference names per `SkillToolRefSchema` validation - **Action creation** with `automation_profile: review`, `reusable: true`, `state: available`, spec invariants, and typed `arguments` section (`table_name`, `column_name`, `column_type`, `backfill_source` — all required per spec, using `arguments` field per `ActionConfigSchema`) - **Plan use** with `--arg` flags exercising parameterized action invocation including `backfill_source=audit_log`, plus **explicit `--automation-profile review`** flag (action-to-plan profile propagation is not yet wired in `PlanLifecycleService.use_action`) - **Phased child plan verification** via `plan tree --format json` with `decision_count >= 2` hard assertion on framework decisions plus WARN tiers for LLM decomposition quality (`< 3`, `< 5`) - **Plan phase assertion** — hard assertion that phase is populated after execute - **Checkpoint-based rollback** with hard assertions: `rc=0` on rollback success, `rc!=0` on fake checkpoint, None guard for JSON null checkpoint IDs, re-execute with Traceback/INTERNAL checks on success and explanatory comment on failure path - **Plan diff** with hard `rc=0` assertion and content-signal verification - **Migration content verification** — baseline SHA saved before apply, diff against baseline (not `HEAD~1`), WARN-level check on migration keywords (`last_login`, `schema`, `migration`, `column`, `alter`) — flexible per LLM non-determinism - **Commit count** assertion `>= 2` (fixture baseline: Create Temp Git Repo + DB fixture commit), WARN if no additional commits from lifecycle-apply - **Backfill evidence** WARN-level check in plan tree/execution output (`backfill`, `batch`, `populate`, `last_login`) with explanatory comment noting tree covers decomposition plan - **Combined AC #6 gate** — if *both* migration content *and* backfill evidence are absent, explicit WARN visibility for CI debugging - **Terminal state assertion** after `lifecycle-apply` — `plan status` call verifies phase/processing_state reflects terminal or apply-progress outcome - **Automation profile fallback verification** — if `plan use` output omits `automation_profile`, falls back to `plan status` for secondary verification (hard assertion always runs) - **Traceback and INTERNAL checks** on all CLI commands (resource add, project create, resource type add, skill add, action create, plan use, strategize, execute, plan tree, plan status, plan diff, plan rollback, re-execute after rollback, lifecycle-apply) including custom resource error paths - **Dynamic actor selection** — detects available API keys (Anthropic/OpenAI) at suite setup - **Skip If No LLM Keys** guard for graceful CI degradation - **Test-level teardown** with diagnostic logging for both plan status and plan tree on failure - **30-minute timeout** covering worst-case rollback+re-execute path - **Force Tags** for consistency with `m6_acceptance.robot` - **Timeout parameters** (`timeout=60s on_timeout=kill`) on all local `Run Process` git commands - **Sequential section numbering** (1 through 15) for readability Closes #751 ISSUES CLOSED: #751 ## Approach Follows the patterns established by `m6_acceptance.robot` and `m2_acceptance.robot`: - `WF05 Suite Setup` initialises the workspace, generates a unique run suffix, and detects available LLM API keys - `Safe Parse Json Field` from `common_e2e.resource` for JSON field extraction with None guards for JSON null values - All CLI commands use `--format json` for predictable, parseable output - `expected_rc=None` with explicit `Should Be Equal As Integers` for detailed failure messages - Hard assertions on infrastructure/framework behavior (CLI commands, phase transitions, tool registration) - WARN-level assertions on LLM-dependent output (decision decomposition, migration content, backfill evidence, commit count) — per ticket requirement "output validation is flexible" - Traceback and INTERNAL checks on all CLI commands following `m2_acceptance.robot` pattern - Baseline SHA approach for post-apply diff verification eliminates false positives from fixture commits ## Bug Fix: LifecyclePlanRepository.update() UNIQUE Constraint Violation **Root cause**: `LifecyclePlanRepository.update()` called `clear()` on child relationship collections (project_links, arguments, invariants) followed by `append()` with new items, but only flushed at the end. SQLAlchemy's default operation ordering can emit INSERTs before DELETEs within the same flush, causing `UNIQUE constraint failed: plan_arguments.plan_id, plan_arguments.name` when plans have arguments. **Fix**: Group all three `clear()` calls together and flush them before appending new rows. This ensures the DELETEs are committed before any INSERTs, preventing the UNIQUE constraint violation. **Impact**: This was a latent bug affecting ALL plans with arguments when `update()` is called. Previously undetected because existing E2E tests (M1, M2, M5, M6) create plans without `--arg` flags. ## Review Fixes (addressing medium findings from @CoreRasurae review) | # | Finding | Fix | |---|---------|-----| | **BUG-1** | No regression test for UNIQUE constraint fix | Added targeted BDD scenario in `repositories_coverage_boost.feature` — creates plan with argument `x=v1`, updates to `x=v2`, asserts no `IntegrityError` | | **TEST-1** | AC #4 weakened — fragile string counting | Replaced raw `count('"decision_id"')` with proper JSON parsing via `json.loads()`, recursive tree walking for decision counting, structural `children_key_count` and `child_link_count` verification | | **TEST-2** | AC #5 conditionally tested | Added explicit WARN log when no checkpoint_id is present ("AC #5 visibility"); fake checkpoint test now runs unconditionally (moved outside IF/ELSE) with Traceback/INTERNAL checks | | **TEST-3** | No terminal state assertion after lifecycle-apply | Added `plan status` call after apply with phase/processing_state extraction; hard assertion on terminal state or apply-phase progress | | **TEST-4** | AC #6 migration/backfill WARN-only | Added combined gate (`has_ac6_evidence`): if *both* migration and backfill evidence are absent, explicit WARN for CI visibility. WARN-only is intentional per ticket AC "output validation is flexible" | | **TEST-5** | Automation profile silently skipped | Added fallback to `plan status --format json` when `plan use` output omits `automation_profile`; hard assertion (`Should Be Equal As Strings review`) now always executes | | **TEST-8** | Missing Traceback/INTERNAL on custom resource error paths | Added Traceback/INTERNAL checks inside both `resource add` and `project link-resource` ELSE branches with `NoSuchOption` guard | ## Quality Gates - `nox -e lint` ✅ - `nox -e typecheck` ✅ (0 errors) - `nox -e unit_tests` ✅ (471 features, 12,422 scenarios, 0 failures) - `nox -e integration_tests` ✅ (1,727 tests, 0 failures) - `nox -e e2e_tests` ✅ (42 tests, 42 passed, 0 failed) - `nox -e coverage_report` ✅ (98%, meets threshold) ## Manual Verification ### Prerequisites - `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` environment variable set ### Commands ```bash nox -e e2e_tests # Or run just this suite: python -m robot --outputdir build/reports/robot --include E2E robot/e2e/wf05_db_migration.robot ```
freemo added this to the v3.3.0 milestone 2026-03-13 16:52:14 +00:00
freemo force-pushed test/e2e-wf05-db-migration from d7f533337a
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 18s
CI / e2e_tests (pull_request) Failing after 26s
CI / security (pull_request) Successful in 32s
CI / typecheck (pull_request) Successful in 37s
CI / unit_tests (pull_request) Successful in 1m58s
CI / integration_tests (pull_request) Successful in 2m37s
CI / docker (pull_request) Successful in 47s
CI / coverage (pull_request) Successful in 4m44s
CI / benchmark-regression (pull_request) Successful in 34m2s
to f25797e44b
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 14s
CI / quality (pull_request) Successful in 24s
CI / security (pull_request) Successful in 31s
CI / typecheck (pull_request) Successful in 37s
CI / e2e_tests (pull_request) Failing after 37s
CI / unit_tests (pull_request) Successful in 2m34s
CI / docker (pull_request) Successful in 48s
CI / integration_tests (pull_request) Successful in 3m55s
CI / coverage (pull_request) Successful in 10m35s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-13 17:28:44 +00:00
Compare
freemo force-pushed test/e2e-wf05-db-migration from f25797e44b
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 14s
CI / quality (pull_request) Successful in 24s
CI / security (pull_request) Successful in 31s
CI / typecheck (pull_request) Successful in 37s
CI / e2e_tests (pull_request) Failing after 37s
CI / unit_tests (pull_request) Successful in 2m34s
CI / docker (pull_request) Successful in 48s
CI / integration_tests (pull_request) Successful in 3m55s
CI / coverage (pull_request) Successful in 10m35s
CI / benchmark-regression (pull_request) Has been cancelled
to ed2a254570
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 13s
CI / quality (pull_request) Successful in 15s
CI / build (pull_request) Successful in 15s
CI / security (pull_request) Successful in 32s
CI / typecheck (pull_request) Successful in 34s
CI / e2e_tests (pull_request) Failing after 36s
CI / integration_tests (pull_request) Successful in 3m9s
CI / unit_tests (pull_request) Successful in 3m23s
CI / docker (pull_request) Successful in 37s
CI / coverage (pull_request) Successful in 5m37s
CI / benchmark-regression (pull_request) Failing after 18m49s
2026-03-13 17:46:55 +00:00
Compare
freemo force-pushed test/e2e-wf05-db-migration from ed2a254570
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 13s
CI / quality (pull_request) Successful in 15s
CI / build (pull_request) Successful in 15s
CI / security (pull_request) Successful in 32s
CI / typecheck (pull_request) Successful in 34s
CI / e2e_tests (pull_request) Failing after 36s
CI / integration_tests (pull_request) Successful in 3m9s
CI / unit_tests (pull_request) Successful in 3m23s
CI / docker (pull_request) Successful in 37s
CI / coverage (pull_request) Successful in 5m37s
CI / benchmark-regression (pull_request) Failing after 18m49s
to 4d0c5f3308
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 18s
CI / build (pull_request) Successful in 23s
CI / e2e_tests (pull_request) Failing after 31s
CI / typecheck (pull_request) Successful in 35s
CI / security (pull_request) Successful in 36s
CI / unit_tests (pull_request) Successful in 2m20s
CI / docker (pull_request) Successful in 8s
CI / integration_tests (pull_request) Successful in 3m10s
CI / coverage (pull_request) Successful in 5m56s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-13 18:13:08 +00:00
Compare
freemo force-pushed test/e2e-wf05-db-migration from 4d0c5f3308
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 18s
CI / build (pull_request) Successful in 23s
CI / e2e_tests (pull_request) Failing after 31s
CI / typecheck (pull_request) Successful in 35s
CI / security (pull_request) Successful in 36s
CI / unit_tests (pull_request) Successful in 2m20s
CI / docker (pull_request) Successful in 8s
CI / integration_tests (pull_request) Successful in 3m10s
CI / coverage (pull_request) Successful in 5m56s
CI / benchmark-regression (pull_request) Has been cancelled
to 33858063b4
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 14s
CI / security (pull_request) Successful in 28s
CI / build (pull_request) Successful in 16s
CI / typecheck (pull_request) Successful in 1m2s
CI / e2e_tests (pull_request) Successful in 1m2s
CI / unit_tests (pull_request) Successful in 2m7s
CI / docker (pull_request) Successful in 8s
CI / integration_tests (pull_request) Successful in 3m2s
CI / coverage (pull_request) Successful in 5m19s
CI / benchmark-regression (pull_request) Successful in 34m42s
2026-03-13 18:26:26 +00:00
Compare
freemo force-pushed test/e2e-wf05-db-migration from 33858063b4
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 14s
CI / security (pull_request) Successful in 28s
CI / build (pull_request) Successful in 16s
CI / typecheck (pull_request) Successful in 1m2s
CI / e2e_tests (pull_request) Successful in 1m2s
CI / unit_tests (pull_request) Successful in 2m7s
CI / docker (pull_request) Successful in 8s
CI / integration_tests (pull_request) Successful in 3m2s
CI / coverage (pull_request) Successful in 5m19s
CI / benchmark-regression (pull_request) Successful in 34m42s
to 8eae9d1e0b
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 19s
CI / build (pull_request) Successful in 14s
CI / security (pull_request) Successful in 29s
CI / e2e_tests (pull_request) Failing after 30s
CI / typecheck (pull_request) Successful in 56s
CI / unit_tests (pull_request) Successful in 3m1s
CI / integration_tests (pull_request) Successful in 4m55s
CI / docker (pull_request) Successful in 13s
CI / coverage (pull_request) Successful in 4m55s
CI / benchmark-regression (pull_request) Failing after 38m20s
2026-03-13 23:19:35 +00:00
Compare
Author
Owner

PM Review — Day 34

Status: Mergeable, 0 reviews, M4 (v3.3.0)
Closes: #751 | Author: @freemo

E2E test for WF05 (database schema migration with safety nets, review profile). Tests skill creation, plan lifecycle, verifies git commits post-apply.

[MINOR] Committer name OpenCode Agent differs from siblings using OpenCode — normalize for consistency.
[MINOR] action.yaml / skill.yaml content not shown in PR body — reviewer can't fully reproduce manual verification.

Action Items

Who Action Deadline
@brent.edwards Peer review Day 36
## PM Review — Day 34 **Status**: Mergeable, 0 reviews, M4 (v3.3.0) **Closes**: #751 | **Author**: @freemo E2E test for WF05 (database schema migration with safety nets, review profile). Tests skill creation, plan lifecycle, verifies git commits post-apply. **[MINOR]** Committer name `OpenCode Agent` differs from siblings using `OpenCode` — normalize for consistency. **[MINOR]** `action.yaml` / `skill.yaml` content not shown in PR body — reviewer can't fully reproduce manual verification. ### Action Items | Who | Action | Deadline | |-----|--------|----------| | @brent.edwards | **Peer review** | Day 36 |
Author
Owner

PM Status — Day 36 (2026-03-16)

Day 34 review assignment deadline check. This PR has 0 reviewer activity after 2 days.

Priority note: M3 PRs take precedence. Reviewers should complete M3 reviews first, then address M4+ PRs in milestone order.

Assigned reviewer: Please acknowledge and provide an ETA for your review, or flag if reassignment is needed.

## PM Status — Day 36 (2026-03-16) Day 34 review assignment deadline check. This PR has 0 reviewer activity after 2 days. **Priority note**: M3 PRs take precedence. Reviewers should complete M3 reviews first, then address M4+ PRs in milestone order. **Assigned reviewer**: Please acknowledge and provide an ETA for your review, or flag if reassignment is needed.
Author
Owner

@hurui200320 I am going to have you take over this PR, it is mostly completed but is waiting on #628 and #966 One is yours and one is Brent's. Please be sure to get this PR and the two blocking PRs I listed in asap, thanks.

@hurui200320 I am going to have you take over this PR, it is mostly completed but is waiting on https://git.cleverthis.com/cleveragents/cleveragents-core/issues/628 and https://git.cleverthis.com/cleveragents/cleveragents-core/issues/966 One is yours and one is Brent's. Please be sure to get this PR and the two blocking PRs I listed in asap, thanks.
Author
Owner

PM Status — Day 37

Ownership transferred to @hurui200320. Blocked on #628 and #966. PR is M4 (v3.3.0).

Author: Please rebase onto latest master by Day 39 EOD (2026-03-19) and confirm blocker status. Check for merge conflicts proactively.


PM status — Day 37

## PM Status — Day 37 Ownership transferred to @hurui200320. Blocked on #628 and #966. PR is M4 (v3.3.0). **Author**: Please rebase onto latest `master` by **Day 39 EOD (2026-03-19)** and confirm blocker status. Check for merge conflicts proactively. --- *PM status — Day 37*
hurui200320 force-pushed test/e2e-wf05-db-migration from 8eae9d1e0b
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 19s
CI / build (pull_request) Successful in 14s
CI / security (pull_request) Successful in 29s
CI / e2e_tests (pull_request) Failing after 30s
CI / typecheck (pull_request) Successful in 56s
CI / unit_tests (pull_request) Successful in 3m1s
CI / integration_tests (pull_request) Successful in 4m55s
CI / docker (pull_request) Successful in 13s
CI / coverage (pull_request) Successful in 4m55s
CI / benchmark-regression (pull_request) Failing after 38m20s
to 73cddb0275
Some checks failed
CI / lint (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 31s
CI / typecheck (pull_request) Successful in 38s
CI / security (pull_request) Successful in 49s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 15s
CI / integration_tests (pull_request) Successful in 3m40s
CI / e2e_tests (pull_request) Failing after 4m1s
CI / unit_tests (pull_request) Successful in 6m17s
CI / docker (pull_request) Successful in 57s
CI / coverage (pull_request) Successful in 9m2s
CI / benchmark-regression (pull_request) Successful in 38m51s
2026-03-18 08:35:54 +00:00
Compare
Author
Owner

Code Review — PR #816

(Cannot submit formal approval — self-authored PR.)

E2E test for WF05. Well-structured with proper labels, milestone, and issue linkage. No issues found.

## Code Review — PR #816 *(Cannot submit formal approval — self-authored PR.)* E2E test for WF05. Well-structured with proper labels, milestone, and issue linkage. No issues found.
hurui200320 force-pushed test/e2e-wf05-db-migration from 73cddb0275
Some checks failed
CI / lint (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 31s
CI / typecheck (pull_request) Successful in 38s
CI / security (pull_request) Successful in 49s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 15s
CI / integration_tests (pull_request) Successful in 3m40s
CI / e2e_tests (pull_request) Failing after 4m1s
CI / unit_tests (pull_request) Successful in 6m17s
CI / docker (pull_request) Successful in 57s
CI / coverage (pull_request) Successful in 9m2s
CI / benchmark-regression (pull_request) Successful in 38m51s
to cc8f0d0bd1
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 27s
CI / quality (pull_request) Successful in 35s
CI / security (pull_request) Successful in 53s
CI / typecheck (pull_request) Successful in 55s
CI / unit_tests (pull_request) Successful in 3m1s
CI / docker (pull_request) Successful in 15s
CI / integration_tests (pull_request) Successful in 3m45s
CI / e2e_tests (pull_request) Failing after 4m5s
CI / coverage (pull_request) Successful in 7m5s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-19 09:22:46 +00:00
Compare
hurui200320 force-pushed test/e2e-wf05-db-migration from cc8f0d0bd1
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 27s
CI / quality (pull_request) Successful in 35s
CI / security (pull_request) Successful in 53s
CI / typecheck (pull_request) Successful in 55s
CI / unit_tests (pull_request) Successful in 3m1s
CI / docker (pull_request) Successful in 15s
CI / integration_tests (pull_request) Successful in 3m45s
CI / e2e_tests (pull_request) Failing after 4m5s
CI / coverage (pull_request) Successful in 7m5s
CI / benchmark-regression (pull_request) Has been cancelled
to 57dc8db0f8
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 17s
CI / lint (pull_request) Successful in 19s
CI / quality (pull_request) Successful in 36s
CI / typecheck (pull_request) Successful in 45s
CI / security (pull_request) Successful in 48s
CI / unit_tests (pull_request) Successful in 3m12s
CI / integration_tests (pull_request) Successful in 3m41s
CI / docker (pull_request) Successful in 56s
CI / e2e_tests (pull_request) Failing after 5m33s
CI / coverage (pull_request) Successful in 7m12s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-19 09:48:59 +00:00
Compare
hurui200320 force-pushed test/e2e-wf05-db-migration from 57dc8db0f8
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 17s
CI / lint (pull_request) Successful in 19s
CI / quality (pull_request) Successful in 36s
CI / typecheck (pull_request) Successful in 45s
CI / security (pull_request) Successful in 48s
CI / unit_tests (pull_request) Successful in 3m12s
CI / integration_tests (pull_request) Successful in 3m41s
CI / docker (pull_request) Successful in 56s
CI / e2e_tests (pull_request) Failing after 5m33s
CI / coverage (pull_request) Successful in 7m12s
CI / benchmark-regression (pull_request) Has been cancelled
to 57d2907a70
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 16s
CI / lint (pull_request) Successful in 18s
CI / quality (pull_request) Successful in 39s
CI / typecheck (pull_request) Successful in 46s
CI / security (pull_request) Successful in 53s
CI / unit_tests (pull_request) Successful in 3m27s
CI / integration_tests (pull_request) Successful in 3m38s
CI / docker (pull_request) Successful in 1m8s
CI / e2e_tests (pull_request) Failing after 5m20s
CI / coverage (pull_request) Successful in 7m2s
CI / benchmark-regression (pull_request) Successful in 37m50s
2026-03-19 10:14:49 +00:00
Compare
hurui200320 force-pushed test/e2e-wf05-db-migration from 57d2907a70
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 16s
CI / lint (pull_request) Successful in 18s
CI / quality (pull_request) Successful in 39s
CI / typecheck (pull_request) Successful in 46s
CI / security (pull_request) Successful in 53s
CI / unit_tests (pull_request) Successful in 3m27s
CI / integration_tests (pull_request) Successful in 3m38s
CI / docker (pull_request) Successful in 1m8s
CI / e2e_tests (pull_request) Failing after 5m20s
CI / coverage (pull_request) Successful in 7m2s
CI / benchmark-regression (pull_request) Successful in 37m50s
to 459fd89a74
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 19s
CI / build (pull_request) Successful in 18s
CI / quality (pull_request) Successful in 34s
CI / typecheck (pull_request) Successful in 41s
CI / security (pull_request) Successful in 1m4s
CI / integration_tests (pull_request) Successful in 5m46s
CI / unit_tests (pull_request) Successful in 5m50s
CI / e2e_tests (pull_request) Failing after 6m21s
CI / docker (pull_request) Successful in 56s
CI / coverage (pull_request) Successful in 7m9s
CI / benchmark-regression (pull_request) Successful in 37m40s
2026-03-19 11:17:32 +00:00
Compare
hurui200320 force-pushed test/e2e-wf05-db-migration from 459fd89a74
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 19s
CI / build (pull_request) Successful in 18s
CI / quality (pull_request) Successful in 34s
CI / typecheck (pull_request) Successful in 41s
CI / security (pull_request) Successful in 1m4s
CI / integration_tests (pull_request) Successful in 5m46s
CI / unit_tests (pull_request) Successful in 5m50s
CI / e2e_tests (pull_request) Failing after 6m21s
CI / docker (pull_request) Successful in 56s
CI / coverage (pull_request) Successful in 7m9s
CI / benchmark-regression (pull_request) Successful in 37m40s
to 16c80defe8
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / build (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 29s
CI / typecheck (pull_request) Successful in 45s
CI / security (pull_request) Successful in 56s
CI / unit_tests (pull_request) Successful in 3m12s
CI / integration_tests (pull_request) Successful in 3m41s
CI / docker (pull_request) Successful in 56s
CI / e2e_tests (pull_request) Successful in 7m0s
CI / coverage (pull_request) Successful in 7m1s
CI / benchmark-regression (pull_request) Successful in 37m57s
2026-03-19 13:34:18 +00:00
Compare
hurui200320 force-pushed test/e2e-wf05-db-migration from 16c80defe8
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / build (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 29s
CI / typecheck (pull_request) Successful in 45s
CI / security (pull_request) Successful in 56s
CI / unit_tests (pull_request) Successful in 3m12s
CI / integration_tests (pull_request) Successful in 3m41s
CI / docker (pull_request) Successful in 56s
CI / e2e_tests (pull_request) Successful in 7m0s
CI / coverage (pull_request) Successful in 7m1s
CI / benchmark-regression (pull_request) Successful in 37m57s
to 9c27a269fc
All checks were successful
CI / lint (pull_request) Successful in 15s
CI / typecheck (pull_request) Successful in 39s
CI / security (pull_request) Successful in 51s
CI / quality (pull_request) Successful in 27s
CI / unit_tests (pull_request) Successful in 3m35s
CI / integration_tests (pull_request) Successful in 3m51s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 15s
CI / e2e_tests (pull_request) Successful in 7m27s
CI / coverage (pull_request) Successful in 7m2s
CI / docker (pull_request) Successful in 1m10s
CI / benchmark-regression (pull_request) Successful in 38m31s
2026-03-20 05:39:13 +00:00
Compare
hurui200320 force-pushed test/e2e-wf05-db-migration from 9c27a269fc
All checks were successful
CI / lint (pull_request) Successful in 15s
CI / typecheck (pull_request) Successful in 39s
CI / security (pull_request) Successful in 51s
CI / quality (pull_request) Successful in 27s
CI / unit_tests (pull_request) Successful in 3m35s
CI / integration_tests (pull_request) Successful in 3m51s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 15s
CI / e2e_tests (pull_request) Successful in 7m27s
CI / coverage (pull_request) Successful in 7m2s
CI / docker (pull_request) Successful in 1m10s
CI / benchmark-regression (pull_request) Successful in 38m31s
to fa5ff97681
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 16s
CI / lint (pull_request) Successful in 34s
CI / quality (pull_request) Successful in 3m49s
CI / typecheck (pull_request) Successful in 4m4s
CI / security (pull_request) Successful in 4m25s
CI / unit_tests (pull_request) Successful in 7m0s
CI / integration_tests (pull_request) Successful in 7m2s
CI / docker (pull_request) Successful in 1m8s
CI / coverage (pull_request) Successful in 11m33s
CI / status-check (pull_request) Successful in 1s
CI / e2e_tests (pull_request) Successful in 9m44s
CI / benchmark-regression (pull_request) Successful in 1h14m29s
2026-03-23 04:10:46 +00:00
Compare
hurui200320 force-pushed test/e2e-wf05-db-migration from fa5ff97681
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 16s
CI / lint (pull_request) Successful in 34s
CI / quality (pull_request) Successful in 3m49s
CI / typecheck (pull_request) Successful in 4m4s
CI / security (pull_request) Successful in 4m25s
CI / unit_tests (pull_request) Successful in 7m0s
CI / integration_tests (pull_request) Successful in 7m2s
CI / docker (pull_request) Successful in 1m8s
CI / coverage (pull_request) Successful in 11m33s
CI / status-check (pull_request) Successful in 1s
CI / e2e_tests (pull_request) Successful in 9m44s
CI / benchmark-regression (pull_request) Successful in 1h14m29s
to e382538edd
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 16s
CI / lint (pull_request) Successful in 3m19s
CI / typecheck (pull_request) Successful in 3m53s
CI / quality (pull_request) Successful in 3m55s
CI / security (pull_request) Successful in 4m2s
CI / integration_tests (pull_request) Successful in 8m41s
CI / unit_tests (pull_request) Successful in 8m49s
CI / e2e_tests (pull_request) Successful in 9m35s
CI / docker (pull_request) Successful in 1m5s
CI / coverage (pull_request) Successful in 11m20s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 56m14s
2026-03-24 05:40:12 +00:00
Compare
CoreRasurae left a comment

Code Review Report -- PR #816 (Issue #751)

test/e2e-wf05-db-migration: WF05 Database Schema Migration E2E + Repository UNIQUE Fix

Scope: robot/e2e/wf05_db_migration.robot (new, 512 lines), src/cleveragents/infrastructure/database/repositories.py (14-line fix), CHANGELOG.md (7 lines).

Review methodology: 3 full global cycles across all categories (bugs, test flaws, coverage, security, performance) with targeted deep-dives into skill schema parsing, action YAML parsing, repository update flow, and specification alignment. Each cycle re-examined all categories until no new issues were found.

Overall assessment: The repository UNIQUE constraint fix is correct and well-commented. The E2E test is functional and follows established patterns from other WF tests. However, several acceptance criteria are significantly weakened by WARN-only assertions, and the repository fix lacks a dedicated regression test.


Findings Summary

Severity Bug/Correctness Test Flaws Security Performance Total
Medium 1 5 0 0 6
Low 1 3 0 0 4
Total 2 8 0 0 10

1. Bug Detection / Correctness

BUG-1 (Medium): No dedicated regression test for the UNIQUE constraint fix

File: src/cleveragents/infrastructure/database/repositories.py:1398-1406

The fix correctly consolidates clear() calls and adds session.flush() before append() to prevent SQLAlchemy INSERT-before-DELETE ordering. However, no existing test reproduces the exact failure scenario: creating a plan with argument name "x", then updating it with the same argument name "x".

Verified against all existing test suites:

  • repositories_coverage_boost.feature (line 72): updates a plan that starts with zero arguments -- never triggers UNIQUE collision.
  • plan_persistence.feature, plan_lifecycle_coverage.feature, plan_repository.feature: all create fresh arguments on update, never same-name replacement.

Risk: If the session.flush() at line 1406 is accidentally removed during future refactoring, no fast unit/BDD test will catch the regression. Only the slow E2E test would surface it, and only if the LLM happens to trigger a plan update with duplicate argument names.

Recommendation: Add a targeted BDD scenario that creates a plan with arguments {"x": "v1"}, then updates it with {"x": "v2"}, and asserts no IntegrityError.

BUG-2 (Low): Skill tool metadata silently discarded

File: robot/e2e/wf05_db_migration.robot:221-250

The test defines tools with writes: true/false and checkpointable: true/false, but SkillService._schema_to_skill_dict() (skill_service.py:345-346) reduces each tool to just its name string. The metadata is validated by SkillToolRefSchema but never stored in the Skill domain model. The tool name assertions on lines 248-250 pass, but the test creates a false impression that capability metadata is being verified.

Impact: Low -- the test correctly exercises the CLI registration flow. But the assertion Output Should Contain ${r_skill} local/query_db only proves the name was echoed, not that writes: false was recorded.


2. Test Flaws / Coverage Gaps

TEST-1 (Medium): AC #4 weakened -- "5 phases" reduced to 2-decision minimum

File: robot/e2e/wf05_db_migration.robot:349-364

Issue #751 AC says "Test exercises phased child plan execution (5 phases)". The spec (lines 38590-38594) describes a 5-phase plan: migration, rollback migration, backfill, code update, tests.

The test asserts:

Should Be True    ${decision_count} >= 2

This is a 60% reduction from the acceptance criterion. The WARNs at lines 356-361 log if below 3 or 5 but do not fail. Additionally, the children key check (line 363) only verifies the JSON key "children" exists as a substring in stdout -- it does not verify the children array is non-empty or that actual child plans were created.

Recommendation: Raise the hard minimum to >= 3 (still accommodating LLM variability), and parse the JSON to verify at least one child node exists in the children array.

TEST-2 (Medium): AC #5 conditionally tested -- rollback only if checkpoint exists

File: robot/e2e/wf05_db_migration.robot:400-437

Issue #751 AC says "Test exercises checkpoint creation and rollback (plan rollback)". However, the test conditionally branches:

  • IF checkpoint_id is non-empty: performs actual rollback (hard assertions) -- AC fulfilled.
  • ELSE (line 428): only verifies that a fake checkpoint ID fails gracefully -- AC not fulfilled.

If the system never creates checkpoints during execution, the test passes without ever exercising real rollback functionality.

Recommendation: Add a hard assertion (or at least a prominent log) when the ELSE branch is taken, to make it visible in CI that AC #5 was not actually validated. Alternatively, consider asserting checkpoint existence separately.

TEST-3 (Medium): No terminal state assertion after lifecycle-apply

File: robot/e2e/wf05_db_migration.robot:457-467

After lifecycle-apply, the test checks rc == 0 and Output Should Contain ${r_apply} ${plan_id}, but never queries plan status to verify the plan reached a terminal state (applied, complete, errored, etc.). If apply succeeds with rc=0 but the plan state is left in an intermediate or incorrect state, this would not be caught.

Other E2E tests (e.g., m6 acceptance) verify the phase field contains "apply" after the apply step.

Recommendation: Add a plan status call after apply and assert the phase/processing_state indicates a terminal outcome.

TEST-4 (Medium): AC #6 migration/backfill content checks are WARN-only

File: robot/e2e/wf05_db_migration.robot:493-509

Two content verification checks that map to AC #6 ("Test verifies database migration and backfill operations") are logged as WARNs but do not fail the test:

  1. Migration content (line 493): checks for last_login, schema, migration, column, alter in git diff. If none found, only WARNs.
  2. Backfill evidence (line 505): checks for backfill, batch, populate, last_login in plan/execution output. If none found, only WARNs.

If the LLM produces no migration-related changes whatsoever, the test still passes.

Recommendation: If these are intentionally non-blocking (due to LLM non-determinism), document this explicitly in the test. Consider adding a combined gate: if both migration content and backfill evidence are absent, fail the test.

TEST-5 (Medium): Automation profile verification silently skipped if field absent

File: robot/e2e/wf05_db_migration.robot:319-327

After plan use, the test extracts automation_profile from JSON output. If the field is absent or null (lines 320-321 handle Python None / string "None"), the test logs a WARN and skips the assertion entirely (line 326).

The issue AC implicitly requires the review profile to be active. If the system accepts the --automation-profile review flag but doesn't persist it in the plan JSON output, this is never caught.

Recommendation: Follow up with a plan status --format json call and extract automation_profile from there as a secondary verification.

TEST-6 (Low): decision_count based on fragile raw string counting

File: robot/e2e/wf05_db_migration.robot:352

${decision_count}=    Evaluate    $r_tree.stdout.count('"decision_id"')

This counts substring "decision_id" occurrences in raw JSON. If the JSON format changes (e.g., nested error messages contain "decision_id", or pretty-printing changes quoting), the count could be wrong. Similarly, "children" on line 363 is a raw substring check.

Recommendation: Use Safe Parse Json Field or full JSON parsing to count decision nodes in the tree structure.

TEST-7 (Low): No content assertion on plan diff output

File: robot/e2e/wf05_db_migration.robot:439-447

Section 13 checks plan diff with rc=0 and non-empty stdout, but does not inspect the diff content. This is a missed opportunity to verify the changeset contains migration-related changes before apply, providing an earlier feedback signal.

TEST-8 (Low): Missing Traceback/INTERNAL checks on custom resource error paths

File: robot/e2e/wf05_db_migration.robot:204-218

When resource add for the custom postgres-db type fails (rc != 0, line 204), the error path only WARNs. No Should Not Contain ... Traceback / Should Not Contain ... INTERNAL assertion runs on the failure output. This is inconsistent with the pattern used for all other commands in the test (e.g., lines 174-175, 185-186, 244-245).

Recommendation: Add Traceback/INTERNAL checks inside the ELSE branch at line 216-217.


3. Security

No security issues found within the scope of these changes. YAML files are created with hardcoded content in a temp directory. No user-supplied input flows into file content or command arguments.


4. Performance

No significant performance issues found. Timeout values (60s for git, 120s for CLI, 300s for execute, 30min test) are consistent with other E2E tests and appropriate for LLM-driven execution.


Positive Observations

  1. Repository fix is well-commented: The explanation at lines 1398-1402 clearly documents why the flush is needed.
  2. Defensive teardown: WF05 Test Teardown logs diagnostic context on failure, aiding CI debugging.
  3. UUID suffix for isolation: Using uuid4().hex[:12] to prevent UNIQUE collisions across runs is good practice.
  4. Baseline SHA approach: Capturing HEAD before apply and diffing against it (lines 451-491) is the most thorough pattern among all E2E tests.
  5. Graceful degradation: The test handles custom resource type failures gracefully (WARN + continue), acknowledging that the full custom resource type flow may not be fully implemented.
  6. Spec-aligned tool names: The three database tools (query_db, execute_migration, backfill_column) match the specification exactly.
# Code Review Report -- PR #816 (Issue #751) ## `test/e2e-wf05-db-migration`: WF05 Database Schema Migration E2E + Repository UNIQUE Fix **Scope**: `robot/e2e/wf05_db_migration.robot` (new, 512 lines), `src/cleveragents/infrastructure/database/repositories.py` (14-line fix), `CHANGELOG.md` (7 lines). **Review methodology**: 3 full global cycles across all categories (bugs, test flaws, coverage, security, performance) with targeted deep-dives into skill schema parsing, action YAML parsing, repository update flow, and specification alignment. Each cycle re-examined all categories until no new issues were found. **Overall assessment**: The repository UNIQUE constraint fix is correct and well-commented. The E2E test is functional and follows established patterns from other WF tests. However, several acceptance criteria are significantly weakened by WARN-only assertions, and the repository fix lacks a dedicated regression test. --- ## Findings Summary | Severity | Bug/Correctness | Test Flaws | Security | Performance | Total | |----------|----------------|------------|----------|-------------|-------| | **Medium** | 1 | 5 | 0 | 0 | **6** | | **Low** | 1 | 3 | 0 | 0 | **4** | | **Total** | **2** | **8** | **0** | **0** | **10** | --- ## 1. Bug Detection / Correctness ### BUG-1 (Medium): No dedicated regression test for the UNIQUE constraint fix **File**: `src/cleveragents/infrastructure/database/repositories.py:1398-1406` The fix correctly consolidates `clear()` calls and adds `session.flush()` before `append()` to prevent SQLAlchemy INSERT-before-DELETE ordering. However, **no existing test reproduces the exact failure scenario**: creating a plan with argument name `"x"`, then updating it with the same argument name `"x"`. Verified against all existing test suites: - `repositories_coverage_boost.feature` (line 72): updates a plan that starts with **zero** arguments -- never triggers UNIQUE collision. - `plan_persistence.feature`, `plan_lifecycle_coverage.feature`, `plan_repository.feature`: all create fresh arguments on update, never same-name replacement. **Risk**: If the `session.flush()` at line 1406 is accidentally removed during future refactoring, no fast unit/BDD test will catch the regression. Only the slow E2E test would surface it, and only if the LLM happens to trigger a plan update with duplicate argument names. **Recommendation**: Add a targeted BDD scenario that creates a plan with arguments `{"x": "v1"}`, then updates it with `{"x": "v2"}`, and asserts no `IntegrityError`. ### BUG-2 (Low): Skill tool metadata silently discarded **File**: `robot/e2e/wf05_db_migration.robot:221-250` The test defines tools with `writes: true/false` and `checkpointable: true/false`, but `SkillService._schema_to_skill_dict()` (`skill_service.py:345-346`) reduces each tool to just its `name` string. The metadata is validated by `SkillToolRefSchema` but never stored in the `Skill` domain model. The tool name assertions on lines 248-250 pass, but the test creates a false impression that capability metadata is being verified. **Impact**: Low -- the test correctly exercises the CLI registration flow. But the assertion `Output Should Contain ${r_skill} local/query_db` only proves the name was echoed, not that `writes: false` was recorded. --- ## 2. Test Flaws / Coverage Gaps ### TEST-1 (Medium): AC #4 weakened -- "5 phases" reduced to 2-decision minimum **File**: `robot/e2e/wf05_db_migration.robot:349-364` Issue #751 AC says _"Test exercises phased child plan execution (5 phases)"_. The spec (lines 38590-38594) describes a 5-phase plan: migration, rollback migration, backfill, code update, tests. The test asserts: ```robot Should Be True ${decision_count} >= 2 ``` This is a 60% reduction from the acceptance criterion. The WARNs at lines 356-361 log if below 3 or 5 but do not fail. Additionally, the `children` key check (line 363) only verifies the JSON _key_ `"children"` exists as a substring in stdout -- it does not verify the children array is non-empty or that actual child plans were created. **Recommendation**: Raise the hard minimum to `>= 3` (still accommodating LLM variability), and parse the JSON to verify at least one child node exists in the children array. ### TEST-2 (Medium): AC #5 conditionally tested -- rollback only if checkpoint exists **File**: `robot/e2e/wf05_db_migration.robot:400-437` Issue #751 AC says _"Test exercises checkpoint creation and rollback (`plan rollback`)"_. However, the test conditionally branches: - **IF** `checkpoint_id` is non-empty: performs actual rollback (hard assertions) -- AC fulfilled. - **ELSE** (line 428): only verifies that a fake checkpoint ID fails gracefully -- AC **not** fulfilled. If the system never creates checkpoints during execution, the test passes without ever exercising real rollback functionality. **Recommendation**: Add a hard assertion (or at least a prominent log) when the ELSE branch is taken, to make it visible in CI that AC #5 was not actually validated. Alternatively, consider asserting checkpoint existence separately. ### TEST-3 (Medium): No terminal state assertion after lifecycle-apply **File**: `robot/e2e/wf05_db_migration.robot:457-467` After `lifecycle-apply`, the test checks `rc == 0` and `Output Should Contain ${r_apply} ${plan_id}`, but never queries `plan status` to verify the plan reached a terminal state (`applied`, `complete`, `errored`, etc.). If apply succeeds with rc=0 but the plan state is left in an intermediate or incorrect state, this would not be caught. Other E2E tests (e.g., m6 acceptance) verify the phase field contains `"apply"` after the apply step. **Recommendation**: Add a `plan status` call after apply and assert the phase/processing_state indicates a terminal outcome. ### TEST-4 (Medium): AC #6 migration/backfill content checks are WARN-only **File**: `robot/e2e/wf05_db_migration.robot:493-509` Two content verification checks that map to AC #6 (_"Test verifies database migration and backfill operations"_) are logged as WARNs but do not fail the test: 1. **Migration content** (line 493): checks for `last_login`, `schema`, `migration`, `column`, `alter` in git diff. If none found, only WARNs. 2. **Backfill evidence** (line 505): checks for `backfill`, `batch`, `populate`, `last_login` in plan/execution output. If none found, only WARNs. If the LLM produces no migration-related changes whatsoever, the test still passes. **Recommendation**: If these are intentionally non-blocking (due to LLM non-determinism), document this explicitly in the test. Consider adding a combined gate: if _both_ migration content _and_ backfill evidence are absent, fail the test. ### TEST-5 (Medium): Automation profile verification silently skipped if field absent **File**: `robot/e2e/wf05_db_migration.robot:319-327` After `plan use`, the test extracts `automation_profile` from JSON output. If the field is absent or null (lines 320-321 handle Python `None` / string `"None"`), the test logs a WARN and skips the assertion entirely (line 326). The issue AC implicitly requires the `review` profile to be active. If the system accepts the `--automation-profile review` flag but doesn't persist it in the plan JSON output, this is never caught. **Recommendation**: Follow up with a `plan status --format json` call and extract `automation_profile` from there as a secondary verification. ### TEST-6 (Low): `decision_count` based on fragile raw string counting **File**: `robot/e2e/wf05_db_migration.robot:352` ```robot ${decision_count}= Evaluate $r_tree.stdout.count('"decision_id"') ``` This counts substring `"decision_id"` occurrences in raw JSON. If the JSON format changes (e.g., nested error messages contain "decision_id", or pretty-printing changes quoting), the count could be wrong. Similarly, `"children"` on line 363 is a raw substring check. **Recommendation**: Use `Safe Parse Json Field` or full JSON parsing to count decision nodes in the tree structure. ### TEST-7 (Low): No content assertion on plan diff output **File**: `robot/e2e/wf05_db_migration.robot:439-447` Section 13 checks `plan diff` with `rc=0` and non-empty stdout, but does not inspect the diff content. This is a missed opportunity to verify the changeset contains migration-related changes _before_ apply, providing an earlier feedback signal. ### TEST-8 (Low): Missing Traceback/INTERNAL checks on custom resource error paths **File**: `robot/e2e/wf05_db_migration.robot:204-218` When `resource add` for the custom postgres-db type fails (rc != 0, line 204), the error path only WARNs. No `Should Not Contain ... Traceback` / `Should Not Contain ... INTERNAL` assertion runs on the failure output. This is inconsistent with the pattern used for all other commands in the test (e.g., lines 174-175, 185-186, 244-245). **Recommendation**: Add Traceback/INTERNAL checks inside the ELSE branch at line 216-217. --- ## 3. Security No security issues found within the scope of these changes. YAML files are created with hardcoded content in a temp directory. No user-supplied input flows into file content or command arguments. --- ## 4. Performance No significant performance issues found. Timeout values (60s for git, 120s for CLI, 300s for execute, 30min test) are consistent with other E2E tests and appropriate for LLM-driven execution. --- ## Positive Observations 1. **Repository fix is well-commented**: The explanation at lines 1398-1402 clearly documents _why_ the flush is needed. 2. **Defensive teardown**: `WF05 Test Teardown` logs diagnostic context on failure, aiding CI debugging. 3. **UUID suffix for isolation**: Using `uuid4().hex[:12]` to prevent UNIQUE collisions across runs is good practice. 4. **Baseline SHA approach**: Capturing HEAD before apply and diffing against it (lines 451-491) is the most thorough pattern among all E2E tests. 5. **Graceful degradation**: The test handles custom resource type failures gracefully (WARN + continue), acknowledging that the full custom resource type flow may not be fully implemented. 6. **Spec-aligned tool names**: The three database tools (`query_db`, `execute_migration`, `backfill_column`) match the specification exactly.
CoreRasurae approved these changes 2026-03-24 12:07:08 +00:00
Dismissed
CoreRasurae approved these changes 2026-03-24 12:07:52 +00:00
Dismissed
CoreRasurae left a comment

Approved, provided that the medium level findings in #816 (comment) are addressed.

Approved, provided that the medium level findings in https://git.cleverthis.com/cleveragents/cleveragents-core/pulls/816#issuecomment-71482 are addressed.
hurui200320 force-pushed test/e2e-wf05-db-migration from e382538edd
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 16s
CI / lint (pull_request) Successful in 3m19s
CI / typecheck (pull_request) Successful in 3m53s
CI / quality (pull_request) Successful in 3m55s
CI / security (pull_request) Successful in 4m2s
CI / integration_tests (pull_request) Successful in 8m41s
CI / unit_tests (pull_request) Successful in 8m49s
CI / e2e_tests (pull_request) Successful in 9m35s
CI / docker (pull_request) Successful in 1m5s
CI / coverage (pull_request) Successful in 11m20s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 56m14s
to 2e9081d471
Some checks are pending
CI / benchmark-publish (pull_request) Has been skipped
CI / e2e_tests (pull_request) Has started running
CI / build (pull_request) Successful in 15s
CI / lint (pull_request) Successful in 3m47s
CI / typecheck (pull_request) Successful in 4m29s
CI / benchmark-regression (pull_request) Has started running
CI / security (pull_request) Successful in 4m7s
CI / quality (pull_request) Successful in 3m39s
CI / unit_tests (pull_request) Successful in 3m59s
CI / docker (pull_request) Successful in 58s
CI / integration_tests (pull_request) Successful in 5m55s
CI / coverage (pull_request) Successful in 11m43s
CI / status-check (pull_request) Successful in 1s
2026-03-25 11:53:08 +00:00
Compare
hurui200320 dismissed CoreRasurae's review 2026-03-25 11:53:08 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

Member

Review Fix Round — Addressing Medium Findings from @CoreRasurae

All 6 medium-severity findings from the code review have been addressed. Branch rebased onto latest master (83c22b83).

Fixes Applied

BUG-1 (Medium): No regression test for UNIQUE constraint fix

  • Added targeted BDD scenario Updating a plan argument with the same name does not hit UNIQUE constraints in features/repositories_coverage_boost.feature
  • Creates plan with argument x=v1, persists, updates to x=v2 via LifecyclePlanRepository.update(), asserts no IntegrityError and verifies new value persisted
  • Three new step definitions in features/steps/repositories_coverage_boost_steps.py: step_set_initial_plan_argument, step_update_plan_argument_same_name, step_verify_plan_argument_value

TEST-1 (Medium): AC #4 — fragile string counting for decision nodes

  • Replaced raw $r_tree.stdout.count('"decision_id"') with proper JSON parsing
  • Plan tree output is now parsed via json.loads() with log-line prefix stripping
  • Decision counting uses recursive tree walker that traverses children arrays
  • Added structural verification: children_key_count >= 1 (hard) and child_link_count (WARN if < 1)
  • Hard minimum remains >= 2 (framework minimum: strategy_choice + implementation_choice) per ticket AC "output validation is flexible"

TEST-2 (Medium): AC #5 — rollback conditionally tested

  • Added explicit WARN when ELSE branch is taken: "AC #5 visibility: no checkpoint_id present, so real rollback path was not executed in this run"
  • Fake checkpoint test (no-such-checkpoint) moved outside IF/ELSE — now always runs unconditionally
  • Added Should Not Contain Traceback and Should Not Contain INTERNAL checks on fake rollback output

TEST-3 (Medium): No terminal state assertion after lifecycle-apply

  • Added plan status call after lifecycle-apply with --format json
  • Extracts phase and processing_state from JSON output
  • Hard assertion: plan must be in terminal state (applied, constrained, errored, cancelled, complete) OR show apply-phase progress
  • WARN if phase doesn't contain "apply" or state is non-terminal

TEST-4 (Medium): AC #6 — migration/backfill WARN-only

  • Added combined gate: has_ac6_evidence = has_migration_content or has_backfill_evidence
  • When both are absent, explicit WARN: "AC #6 visibility: neither migration nor backfill evidence was observed in this run (output is flexible)"
  • Also added plan diff content-signal check before apply (section 13) as early feedback
  • WARN-only behavior is intentional per ticket AC "Output validation is flexible"

TEST-5 (Medium): Automation profile silently skipped

  • Added fallback: when plan use output omits automation_profile, now calls plan status --format json for secondary verification
  • Extracts automation_profile from status output with same None/empty guards
  • Hard assertion Should Be Equal As Strings ${resolved_profile} review now always executes (never skipped)

Bonus: TEST-8 (Low) also addressed

  • Added Traceback/INTERNAL checks inside both resource add and project link-resource ELSE branches
  • Includes NoSuchOption guard to avoid false positives from unimplemented CLI flags

Quality Gates (all passing)

Gate Result
nox -e lint All checks passed
nox -e typecheck 0 errors
nox -e unit_tests 471 features, 12,422 scenarios, 0 failures
nox -e integration_tests 1,727 tests, 0 failures
nox -e e2e_tests 42 tests, 42 passed (including WF05)
nox -e coverage_report 98% (≥ 97% threshold)

Low-severity findings (not addressed in this round)

Per Luis's approval condition ("provided that the medium level findings are addressed"), the 4 low-severity findings (BUG-2, TEST-6, TEST-7, TEST-8 partial) are acknowledged but not required for merge. TEST-8 was addressed as a bonus since it was straightforward.

## Review Fix Round — Addressing Medium Findings from @CoreRasurae All 6 medium-severity findings from the [code review](https://git.cleverthis.com/cleveragents/cleveragents-core/pulls/816#issuecomment-71482) have been addressed. Branch rebased onto latest `master` (`83c22b83`). ### Fixes Applied **BUG-1 (Medium): No regression test for UNIQUE constraint fix** - Added targeted BDD scenario `Updating a plan argument with the same name does not hit UNIQUE constraints` in `features/repositories_coverage_boost.feature` - Creates plan with argument `x=v1`, persists, updates to `x=v2` via `LifecyclePlanRepository.update()`, asserts no `IntegrityError` and verifies new value persisted - Three new step definitions in `features/steps/repositories_coverage_boost_steps.py`: `step_set_initial_plan_argument`, `step_update_plan_argument_same_name`, `step_verify_plan_argument_value` **TEST-1 (Medium): AC #4 — fragile string counting for decision nodes** - Replaced raw `$r_tree.stdout.count('"decision_id"')` with proper JSON parsing - Plan tree output is now parsed via `json.loads()` with log-line prefix stripping - Decision counting uses recursive tree walker that traverses `children` arrays - Added structural verification: `children_key_count >= 1` (hard) and `child_link_count` (WARN if < 1) - Hard minimum remains `>= 2` (framework minimum: strategy_choice + implementation_choice) per ticket AC "output validation is flexible" **TEST-2 (Medium): AC #5 — rollback conditionally tested** - Added explicit WARN when ELSE branch is taken: `"AC #5 visibility: no checkpoint_id present, so real rollback path was not executed in this run"` - Fake checkpoint test (`no-such-checkpoint`) moved outside IF/ELSE — now always runs unconditionally - Added `Should Not Contain Traceback` and `Should Not Contain INTERNAL` checks on fake rollback output **TEST-3 (Medium): No terminal state assertion after lifecycle-apply** - Added `plan status` call after `lifecycle-apply` with `--format json` - Extracts `phase` and `processing_state` from JSON output - Hard assertion: plan must be in terminal state (`applied`, `constrained`, `errored`, `cancelled`, `complete`) OR show apply-phase progress - WARN if phase doesn't contain "apply" or state is non-terminal **TEST-4 (Medium): AC #6 — migration/backfill WARN-only** - Added combined gate: `has_ac6_evidence = has_migration_content or has_backfill_evidence` - When *both* are absent, explicit WARN: `"AC #6 visibility: neither migration nor backfill evidence was observed in this run (output is flexible)"` - Also added `plan diff` content-signal check before apply (section 13) as early feedback - WARN-only behavior is intentional per ticket AC "Output validation is flexible" **TEST-5 (Medium): Automation profile silently skipped** - Added fallback: when `plan use` output omits `automation_profile`, now calls `plan status --format json` for secondary verification - Extracts `automation_profile` from status output with same None/empty guards - Hard assertion `Should Be Equal As Strings ${resolved_profile} review` now always executes (never skipped) **Bonus: TEST-8 (Low) also addressed** - Added Traceback/INTERNAL checks inside both `resource add` and `project link-resource` ELSE branches - Includes `NoSuchOption` guard to avoid false positives from unimplemented CLI flags ### Quality Gates (all passing) | Gate | Result | |------|--------| | `nox -e lint` | ✅ All checks passed | | `nox -e typecheck` | ✅ 0 errors | | `nox -e unit_tests` | ✅ 471 features, 12,422 scenarios, 0 failures | | `nox -e integration_tests` | ✅ 1,727 tests, 0 failures | | `nox -e e2e_tests` | ✅ 42 tests, 42 passed (including WF05) | | `nox -e coverage_report` | ✅ 98% (≥ 97% threshold) | ### Low-severity findings (not addressed in this round) Per Luis's approval condition ("provided that the medium level findings are addressed"), the 4 low-severity findings (BUG-2, TEST-6, TEST-7, TEST-8 partial) are acknowledged but not required for merge. TEST-8 was addressed as a bonus since it was straightforward.
hurui200320 force-pushed test/e2e-wf05-db-migration from 2e9081d471
Some checks are pending
CI / benchmark-publish (pull_request) Has been skipped
CI / e2e_tests (pull_request) Has started running
CI / build (pull_request) Successful in 15s
CI / lint (pull_request) Successful in 3m47s
CI / typecheck (pull_request) Successful in 4m29s
CI / benchmark-regression (pull_request) Has started running
CI / security (pull_request) Successful in 4m7s
CI / quality (pull_request) Successful in 3m39s
CI / unit_tests (pull_request) Successful in 3m59s
CI / docker (pull_request) Successful in 58s
CI / integration_tests (pull_request) Successful in 5m55s
CI / coverage (pull_request) Successful in 11m43s
CI / status-check (pull_request) Successful in 1s
to dc1d4c44cd
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / typecheck (pull_request) Successful in 1m1s
CI / build (pull_request) Successful in 13s
CI / lint (pull_request) Successful in 3m22s
CI / quality (pull_request) Successful in 3m40s
CI / integration_tests (pull_request) Successful in 4m36s
CI / unit_tests (pull_request) Successful in 6m55s
CI / coverage (pull_request) Successful in 10m5s
CI / security (pull_request) Successful in 4m2s
CI / docker (pull_request) Successful in 1m1s
CI / status-check (pull_request) Successful in 1s
CI / e2e_tests (pull_request) Successful in 9m5s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-25 12:12:00 +00:00
Compare
hurui200320 force-pushed test/e2e-wf05-db-migration from dc1d4c44cd
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / typecheck (pull_request) Successful in 1m1s
CI / build (pull_request) Successful in 13s
CI / lint (pull_request) Successful in 3m22s
CI / quality (pull_request) Successful in 3m40s
CI / integration_tests (pull_request) Successful in 4m36s
CI / unit_tests (pull_request) Successful in 6m55s
CI / coverage (pull_request) Successful in 10m5s
CI / security (pull_request) Successful in 4m2s
CI / docker (pull_request) Successful in 1m1s
CI / status-check (pull_request) Successful in 1s
CI / e2e_tests (pull_request) Successful in 9m5s
CI / benchmark-regression (pull_request) Has been cancelled
to 5759ac5974
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 22s
CI / lint (pull_request) Successful in 3m18s
CI / security (pull_request) Successful in 3m56s
CI / typecheck (pull_request) Successful in 3m57s
CI / e2e_tests (pull_request) Successful in 10m0s
CI / coverage (pull_request) Successful in 7m4s
CI / unit_tests (pull_request) Successful in 3m47s
CI / docker (pull_request) Successful in 1m1s
CI / integration_tests (pull_request) Successful in 5m53s
CI / quality (pull_request) Successful in 3m37s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-25 12:43:16 +00:00
Compare
hurui200320 scheduled this pull request to auto merge when all checks succeed 2026-03-25 12:54:28 +00:00
hurui200320 deleted branch test/e2e-wf05-db-migration 2026-03-25 13:05:06 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveragents-core!816
No description provided.