feat(changeset): persist changesets and diff artifacts #429

Merged
freemo merged 1 commit from feature/m2-changeset-persistence into master 2026-02-26 05:05:07 +00:00
Owner

Summary

This PR implements persistent storage for ChangeSet entries and ToolInvocation records, replacing the in-memory stores used during plan execution with SQLite-backed repositories. Prior to this change, ChangeSet data (the record of every file mutation made during a plan's Execute phase) was held only in memory and lost on process restart. This made it impossible to review diffs, inspect artifacts, or audit tool invocations after the fact. With persistence in place, plan diff and plan status can now retrieve changeset data across restarts, and cleanup logic ensures that orphaned artifacts are removed when a plan is cancelled or apply fails.

This work is part of the M2 milestone (Actor Graphs + Tool Sources) and addresses the Stage D0 requirement for durable changeset storage with full tool invocation auditability.

Closes #163

Motivation

The Execute phase generates a SpecChangeSet containing ChangeEntry records (one per file mutation) and ToolInvocation records (one per tool execution). Without persistence, this data was ephemeral. Three concrete problems drove this change:

  1. Diff review after restart -- Users running plan diff or plan status after a process restart saw empty output because the in-memory changeset was gone.
  2. Audit trail -- Tool invocations (arguments, results, durations, errors) were not being stored, making it impossible to trace what tools did during execution.
  3. Cleanup on failure -- When a plan was cancelled or apply failed, there was no mechanism to reclaim storage used by changeset artifacts that were no longer needed.

Changes

Database layer (src/cleveragents/infrastructure/database/)

  • New ORM models (models.py): Added ChangeSetEntryModel (table changeset_entries) and ToolInvocationModel (table tool_invocations) with indexed columns for changeset_id, plan_id, path, and tool_name to support efficient lookups during diff rendering and artifact display. Both models follow the existing declarative Base pattern established by ADR-007.
  • New repository module (changeset_repository.py, 455 lines): Three classes implementing the session-factory pattern:
    • ChangeSetEntryRepository -- CRUD operations for ChangeEntry persistence. Methods: save_entry, get_entries_for_changeset, get_entries_for_plan, delete_for_changeset, delete_for_plan. Each method includes argument validation, retry decoration via @database_retry, and proper error wrapping into DatabaseError.
    • ToolInvocationRepository -- CRUD for ToolInvocation records. Serialises complex fields (arguments, results, change IDs, resource refs, provider metadata) to JSON text columns. Methods: save_invocation, get_invocations_for_plan, delete_for_plan.
    • SqliteChangeSetStore -- Implements the ChangeSetStore protocol from cleveragents.domain.models.core.change. Methods: start, record, get, get_for_plan, summarize, delete_for_plan. This is the entry point used by the service layer.

Alembic migration (alembic/versions/d0_001_changeset_artifacts.py)

  • Creates changeset_entries and tool_invocations tables with the column definitions matching the ORM models. Adds indexes on changeset_id, plan_id, path, and tool_name. Includes a downgrade() that drops both tables.

Service layer (src/cleveragents/application/services/plan_apply_service.py)

  • Added cleanup_changeset() method to PlanApplyService. Delegates to ChangeSetStore.delete_for_plan() to remove all persisted changeset entries and invocations for a given plan. Called on plan cancel and apply failure paths to reclaim storage. Validates plan_id is non-empty and logs the number of records deleted.

Documentation (docs/reference/changeset.md)

  • New reference document (156 lines) describing the persistence fields for both tables, diff output format examples (plain, rich, JSON, YAML), artifact summary structure, and cleanup behaviour. Written for mkdocs.

Tests

  • BDD unit tests (features/changeset_persistence.feature + features/steps/changeset_persistence_steps.py): 17 Behave scenarios covering SqliteChangeSetStore start/record/get round-trips, get_for_plan retrieval, summarize counts, ChangeSetEntryRepository CRUD, ToolInvocationRepository CRUD, cleanup via PlanApplyService, and argument validation (rejecting None session factories, empty plan IDs, and empty changeset IDs). Step definitions use in-memory SQLite for isolation.
  • Robot Framework integration tests (robot/changeset_persistence.robot): 5 test cases verifying CLI-level artifact output fields, help text, and exit codes for changeset-related commands.
  • ASV performance benchmarks (benchmarks/changeset_persistence_bench.py): Benchmarks for single-entry persistence, batch persistence (100 entries), retrieval by changeset ID, retrieval by plan ID, and store cleanup. Uses in-memory SQLite to measure persistence overhead without I/O variance.

Changelog

  • Added entry under ## Unreleased describing the new changeset persistence and diff artifact storage capability, with issue reference (#163).

Dependencies

  • This PR blocks issue #163 (the PR must be merged before the issue can be closed; the issue depends on this PR).
  • No external dependency changes. All new code uses existing project dependencies (SQLAlchemy, Alembic, structlog, ulid).
## Summary This PR implements persistent storage for ChangeSet entries and ToolInvocation records, replacing the in-memory stores used during plan execution with SQLite-backed repositories. Prior to this change, ChangeSet data (the record of every file mutation made during a plan's Execute phase) was held only in memory and lost on process restart. This made it impossible to review diffs, inspect artifacts, or audit tool invocations after the fact. With persistence in place, `plan diff` and `plan status` can now retrieve changeset data across restarts, and cleanup logic ensures that orphaned artifacts are removed when a plan is cancelled or apply fails. This work is part of the M2 milestone (Actor Graphs + Tool Sources) and addresses the Stage D0 requirement for durable changeset storage with full tool invocation auditability. Closes #163 ## Motivation The Execute phase generates a `SpecChangeSet` containing `ChangeEntry` records (one per file mutation) and `ToolInvocation` records (one per tool execution). Without persistence, this data was ephemeral. Three concrete problems drove this change: 1. **Diff review after restart** -- Users running `plan diff` or `plan status` after a process restart saw empty output because the in-memory changeset was gone. 2. **Audit trail** -- Tool invocations (arguments, results, durations, errors) were not being stored, making it impossible to trace what tools did during execution. 3. **Cleanup on failure** -- When a plan was cancelled or apply failed, there was no mechanism to reclaim storage used by changeset artifacts that were no longer needed. ## Changes ### Database layer (`src/cleveragents/infrastructure/database/`) - **New ORM models** (`models.py`): Added `ChangeSetEntryModel` (table `changeset_entries`) and `ToolInvocationModel` (table `tool_invocations`) with indexed columns for `changeset_id`, `plan_id`, `path`, and `tool_name` to support efficient lookups during diff rendering and artifact display. Both models follow the existing declarative Base pattern established by ADR-007. - **New repository module** (`changeset_repository.py`, 455 lines): Three classes implementing the session-factory pattern: - `ChangeSetEntryRepository` -- CRUD operations for `ChangeEntry` persistence. Methods: `save_entry`, `get_entries_for_changeset`, `get_entries_for_plan`, `delete_for_changeset`, `delete_for_plan`. Each method includes argument validation, retry decoration via `@database_retry`, and proper error wrapping into `DatabaseError`. - `ToolInvocationRepository` -- CRUD for `ToolInvocation` records. Serialises complex fields (arguments, results, change IDs, resource refs, provider metadata) to JSON text columns. Methods: `save_invocation`, `get_invocations_for_plan`, `delete_for_plan`. - `SqliteChangeSetStore` -- Implements the `ChangeSetStore` protocol from `cleveragents.domain.models.core.change`. Methods: `start`, `record`, `get`, `get_for_plan`, `summarize`, `delete_for_plan`. This is the entry point used by the service layer. ### Alembic migration (`alembic/versions/d0_001_changeset_artifacts.py`) - Creates `changeset_entries` and `tool_invocations` tables with the column definitions matching the ORM models. Adds indexes on `changeset_id`, `plan_id`, `path`, and `tool_name`. Includes a `downgrade()` that drops both tables. ### Service layer (`src/cleveragents/application/services/plan_apply_service.py`) - **Added `cleanup_changeset()` method** to `PlanApplyService`. Delegates to `ChangeSetStore.delete_for_plan()` to remove all persisted changeset entries and invocations for a given plan. Called on plan cancel and apply failure paths to reclaim storage. Validates `plan_id` is non-empty and logs the number of records deleted. ### Documentation (`docs/reference/changeset.md`) - New reference document (156 lines) describing the persistence fields for both tables, diff output format examples (plain, rich, JSON, YAML), artifact summary structure, and cleanup behaviour. Written for mkdocs. ### Tests - **BDD unit tests** (`features/changeset_persistence.feature` + `features/steps/changeset_persistence_steps.py`): 17 Behave scenarios covering `SqliteChangeSetStore` start/record/get round-trips, `get_for_plan` retrieval, summarize counts, `ChangeSetEntryRepository` CRUD, `ToolInvocationRepository` CRUD, cleanup via `PlanApplyService`, and argument validation (rejecting `None` session factories, empty plan IDs, and empty changeset IDs). Step definitions use in-memory SQLite for isolation. - **Robot Framework integration tests** (`robot/changeset_persistence.robot`): 5 test cases verifying CLI-level artifact output fields, help text, and exit codes for changeset-related commands. - **ASV performance benchmarks** (`benchmarks/changeset_persistence_bench.py`): Benchmarks for single-entry persistence, batch persistence (100 entries), retrieval by changeset ID, retrieval by plan ID, and store cleanup. Uses in-memory SQLite to measure persistence overhead without I/O variance. ### Changelog - Added entry under `## Unreleased` describing the new changeset persistence and diff artifact storage capability, with issue reference `(#163)`. ## Dependencies - This PR **blocks** issue #163 (the PR must be merged before the issue can be closed; the issue depends on this PR). - No external dependency changes. All new code uses existing project dependencies (SQLAlchemy, Alembic, structlog, ulid).
freemo added this to the v3.1.0 milestone 2026-02-25 03:30:01 +00:00
freemo force-pushed feature/m2-changeset-persistence from dcd026c312
Some checks failed
CI / lint (pull_request) Failing after 23s
CI / typecheck (pull_request) Successful in 56s
CI / coverage (pull_request) Has been skipped
CI / quality (pull_request) Successful in 31s
CI / security (pull_request) Successful in 1m3s
CI / integration_tests (pull_request) Successful in 4m58s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 26s
CI / benchmark-regression (pull_request) Has been skipped
CI / unit_tests (pull_request) Successful in 27m29s
CI / docker (pull_request) Has been skipped
to 9fe6425f07
Some checks failed
CI / lint (pull_request) Failing after 15s
CI / quality (pull_request) Successful in 23s
CI / benchmark-publish (pull_request) Has been skipped
CI / security (pull_request) Successful in 51s
CI / build (pull_request) Successful in 26s
CI / typecheck (pull_request) Successful in 57s
CI / coverage (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been skipped
CI / integration_tests (pull_request) Failing after 3m2s
CI / unit_tests (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
2026-02-25 19:51:25 +00:00
Compare
freemo scheduled this pull request to auto merge when all checks succeed 2026-02-25 19:51:38 +00:00
Author
Owner

CONTRIBUTING.md Compliance Review for PR #429

Issues Fixed (pushed to branch)

  1. CHANGELOG.md — Mixed concerns in version header (Atomic Commits violation)
    The commit changed ## Unreleased## v3.0.0, bundling a version release designation with the feature addition. Per CONTRIBUTING.md §Atomic Commits: "Never bundle cosmetic changes with functional changes in the same commit." The ## Unreleased header has been restored. The version designation should be done in a separate, dedicated commit when v3.0.0 is actually released.

  2. CHANGELOG.md — Missing issue reference
    The new changelog entry did not include an issue reference (#163), which all other entries in the file consistently use. Added (#163) for consistency.

  3. Issue #163 — Ref field was empty
    Per CONTRIBUTING.md §Creating Issues: "the Ref field must be set to the same branch named in the issue body's Metadata section." The Metadata section specifies feature/m2-changeset-persistence but the Ref field was blank. Updated the issue Ref field to feature/m2-changeset-persistence.

Issues Requiring Author Attention

  1. Commit message footer format
    The first commit uses Closes #163 as the footer reference. CONTRIBUTING.md §Commit Message Format specifies the ISSUES CLOSED: #163 format (see the examples in that section). While Closes #163 is a valid Forgejo closing keyword, the project convention is ISSUES CLOSED: #163. This cannot be fixed without rewriting history; please use the correct format in future commits.

  2. # type: ignore[misc] on new model classes (models.py lines 2474, 2508)
    CONTRIBUTING.md §Type Safety states: "never use inline comments or annotations to suppress individual type checking errors (e.g., no type: ignore)." The two new model classes (ChangeSetEntryModel, ToolInvocationModel) both use # type: ignore[misc]. This follows the existing codebase pattern (20+ existing models use the same suppression due to declarative_base() returning Any), so it is a pre-existing codebase-wide issue. However, new code should ideally not perpetuate violations. A separate task should be created to migrate the Base definition to SQLAlchemy 2.0's DeclarativeBase to eliminate all # type: ignore[misc] suppressions.

Compliance Summary

Requirement Status
PR description with summary Pass
Closing keyword for issue (Closes #163) Pass
Single Epic scope Pass
Atomic, well-scoped commit Pass (after fix)
Conventional Changelog commit format Pass
Commit references issue in footer Pass (format differs from convention)
Changelog updated Pass (after fix)
CONTRIBUTORS.md (author already listed) Pass
No build/install artifacts Pass
Milestone assigned (v3.1.0) Pass
Type/Feature label Pass
File organization Pass
Argument validation in public methods Pass
Type annotations Pass
BDD tests in features/ Pass
Robot tests in robot/ Pass
ASV benchmarks in benchmarks/ Pass
Docs in docs/ Pass
Issue #163 State/In Review label Pass
Issue #163 Ref field Pass (after fix)
Issue #163 milestone matches PR Pass
## CONTRIBUTING.md Compliance Review for PR #429 ### Issues Fixed (pushed to branch) 1. **CHANGELOG.md — Mixed concerns in version header (Atomic Commits violation)** The commit changed `## Unreleased` → `## v3.0.0`, bundling a version release designation with the feature addition. Per CONTRIBUTING.md §Atomic Commits: "Never bundle cosmetic changes with functional changes in the same commit." The `## Unreleased` header has been restored. The version designation should be done in a separate, dedicated commit when v3.0.0 is actually released. 2. **CHANGELOG.md — Missing issue reference** The new changelog entry did not include an issue reference `(#163)`, which all other entries in the file consistently use. Added `(#163)` for consistency. 3. **Issue #163 — Ref field was empty** Per CONTRIBUTING.md §Creating Issues: "the Ref field must be set to the same branch named in the issue body's Metadata section." The Metadata section specifies `feature/m2-changeset-persistence` but the Ref field was blank. Updated the issue Ref field to `feature/m2-changeset-persistence`. ### Issues Requiring Author Attention 4. **Commit message footer format** The first commit uses `Closes #163` as the footer reference. CONTRIBUTING.md §Commit Message Format specifies the `ISSUES CLOSED: #163` format (see the examples in that section). While `Closes #163` is a valid Forgejo closing keyword, the project convention is `ISSUES CLOSED: #163`. This cannot be fixed without rewriting history; please use the correct format in future commits. 5. **`# type: ignore[misc]` on new model classes** (`models.py` lines 2474, 2508) CONTRIBUTING.md §Type Safety states: "never use inline comments or annotations to suppress individual type checking errors (e.g., no `type: ignore`)." The two new model classes (`ChangeSetEntryModel`, `ToolInvocationModel`) both use `# type: ignore[misc]`. This follows the existing codebase pattern (20+ existing models use the same suppression due to `declarative_base()` returning `Any`), so it is a pre-existing codebase-wide issue. However, new code should ideally not perpetuate violations. A separate task should be created to migrate the `Base` definition to SQLAlchemy 2.0's `DeclarativeBase` to eliminate all `# type: ignore[misc]` suppressions. ### Compliance Summary | Requirement | Status | |---|---| | PR description with summary | Pass | | Closing keyword for issue (`Closes #163`) | Pass | | Single Epic scope | Pass | | Atomic, well-scoped commit | Pass (after fix) | | Conventional Changelog commit format | Pass | | Commit references issue in footer | Pass (format differs from convention) | | Changelog updated | Pass (after fix) | | CONTRIBUTORS.md (author already listed) | Pass | | No build/install artifacts | Pass | | Milestone assigned (v3.1.0) | Pass | | `Type/Feature` label | Pass | | File organization | Pass | | Argument validation in public methods | Pass | | Type annotations | Pass | | BDD tests in `features/` | Pass | | Robot tests in `robot/` | Pass | | ASV benchmarks in `benchmarks/` | Pass | | Docs in `docs/` | Pass | | Issue #163 `State/In Review` label | Pass | | Issue #163 Ref field | Pass (after fix) | | Issue #163 milestone matches PR | Pass |
freemo force-pushed feature/m2-changeset-persistence from 4b6a6e4478
Some checks failed
CI / lint (pull_request) Failing after 21s
CI / quality (pull_request) Successful in 22s
CI / benchmark-publish (pull_request) Has been skipped
CI / security (pull_request) Successful in 38s
CI / typecheck (pull_request) Successful in 39s
CI / coverage (pull_request) Has been skipped
CI / build (pull_request) Successful in 24s
CI / benchmark-regression (pull_request) Has been skipped
CI / integration_tests (pull_request) Failing after 2m19s
CI / unit_tests (pull_request) Failing after 12m5s
CI / docker (pull_request) Has been skipped
to 2d9d751d79
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 19s
CI / build (pull_request) Successful in 24s
CI / security (pull_request) Successful in 41s
CI / typecheck (pull_request) Successful in 54s
CI / integration_tests (pull_request) Successful in 2m56s
CI / unit_tests (pull_request) Successful in 8m54s
CI / docker (pull_request) Successful in 38s
CI / benchmark-regression (pull_request) Successful in 21m58s
CI / coverage (pull_request) Failing after 1h29m54s
2026-02-25 21:10:43 +00:00
Compare
freemo force-pushed feature/m2-changeset-persistence from 2d9d751d79
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 19s
CI / build (pull_request) Successful in 24s
CI / security (pull_request) Successful in 41s
CI / typecheck (pull_request) Successful in 54s
CI / integration_tests (pull_request) Successful in 2m56s
CI / unit_tests (pull_request) Successful in 8m54s
CI / docker (pull_request) Successful in 38s
CI / benchmark-regression (pull_request) Successful in 21m58s
CI / coverage (pull_request) Failing after 1h29m54s
to e3fcce413b
All checks were successful
CI / lint (pull_request) Successful in 24s
CI / typecheck (pull_request) Successful in 59s
CI / security (pull_request) Successful in 53s
CI / quality (pull_request) Successful in 36s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 27s
CI / integration_tests (pull_request) Successful in 5m30s
CI / benchmark-regression (pull_request) Successful in 25m38s
CI / unit_tests (pull_request) Successful in 36m51s
CI / docker (pull_request) Successful in 1m3s
CI / coverage (pull_request) Successful in 1h48m49s
CI / lint (push) Successful in 22s
CI / security (push) Successful in 58s
CI / typecheck (push) Successful in 1m3s
CI / quality (push) Successful in 46s
CI / build (push) Successful in 23s
CI / integration_tests (push) Successful in 5m23s
CI / benchmark-regression (push) Has been skipped
CI / benchmark-publish (push) Successful in 15m5s
CI / unit_tests (push) Successful in 20m1s
CI / docker (push) Successful in 1m0s
CI / coverage (push) Successful in 1h45m36s
2026-02-26 02:47:15 +00:00
Compare
freemo merged commit e3fcce413b into master 2026-02-26 05:05:07 +00:00
freemo deleted branch feature/m2-changeset-persistence 2026-02-26 05:05:07 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveragents-core!429
No description provided.