feat(sandbox): implement overlay filesystem sandbox strategy #994

Merged
brent.edwards merged 5 commits from feature/m5-overlay-sandbox into master 2026-03-21 03:14:41 +00:00
Member

Summary

Implement the overlay filesystem sandbox strategy with OverlayFS support and userspace fallback.

Implementation

OverlaySandbox (infrastructure/sandbox/overlay.py, 497 lines):

  • Detects OverlayFS availability at runtime via /proc/filesystems + os.geteuid() == 0 check
  • Real OverlayFS mode (requires root): creates upper/work/merged dirs, mounts overlay filesystem, captures writes in upper layer
  • Userspace fallback (default in CI/containers): shutil.copytree the original into merged dir, tracks changes via filecmp diff on commit
  • create(): sets up directory structure, mounts if available
  • commit(): copies changed/added files from overlay to original, removes deleted files
  • rollback(): unmounts (or removes) merged, recreates from scratch
  • cleanup(): unmounts, removes all temp dirs, idempotent

Domain Model Updates

  • Added OVERLAY = "overlay" to SandboxStrategy enum in both resource_type.py and resource.py
  • Added STRATEGY_OVERLAY to SandboxFactory, registered for fs-mount, fs-directory, fs-file resources

Tests

  • 22 Behave scenarios: full lifecycle (create/commit/rollback/cleanup), status transitions, path traversal guard, fallback detection, error handling
  • 6 Robot integration tests: end-to-end overlay sandbox operations

Quality Gates

Session Result
nox -s lint PASS
nox -s typecheck PASS (0 errors)
nox -s unit_tests PASS (10,917 scenarios)
nox -s coverage_report 97% (>= 97%)

Closes #880

## Summary Implement the overlay filesystem sandbox strategy with OverlayFS support and userspace fallback. ### Implementation **`OverlaySandbox`** (`infrastructure/sandbox/overlay.py`, 497 lines): - Detects OverlayFS availability at runtime via `/proc/filesystems` + `os.geteuid() == 0` check - **Real OverlayFS mode** (requires root): creates upper/work/merged dirs, mounts overlay filesystem, captures writes in upper layer - **Userspace fallback** (default in CI/containers): `shutil.copytree` the original into merged dir, tracks changes via `filecmp` diff on commit - `create()`: sets up directory structure, mounts if available - `commit()`: copies changed/added files from overlay to original, removes deleted files - `rollback()`: unmounts (or removes) merged, recreates from scratch - `cleanup()`: unmounts, removes all temp dirs, idempotent ### Domain Model Updates - Added `OVERLAY = "overlay"` to `SandboxStrategy` enum in both `resource_type.py` and `resource.py` - Added `STRATEGY_OVERLAY` to `SandboxFactory`, registered for `fs-mount`, `fs-directory`, `fs-file` resources ### Tests - **22 Behave scenarios**: full lifecycle (create/commit/rollback/cleanup), status transitions, path traversal guard, fallback detection, error handling - **6 Robot integration tests**: end-to-end overlay sandbox operations ### Quality Gates | Session | Result | |---|---| | `nox -s lint` | PASS | | `nox -s typecheck` | PASS (0 errors) | | `nox -s unit_tests` | PASS (10,917 scenarios) | | `nox -s coverage_report` | 97% (>= 97%) | Closes #880
brent.edwards force-pushed feature/m5-overlay-sandbox from 1a466e2fd5
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 17s
CI / lint (pull_request) Successful in 19s
CI / quality (pull_request) Successful in 38s
CI / typecheck (pull_request) Successful in 41s
CI / security (pull_request) Successful in 1m3s
CI / e2e_tests (pull_request) Successful in 1m25s
CI / unit_tests (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
to 551e280180
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 17s
CI / lint (pull_request) Successful in 19s
CI / quality (pull_request) Successful in 30s
CI / typecheck (pull_request) Successful in 45s
CI / security (pull_request) Successful in 49s
CI / e2e_tests (pull_request) Successful in 1m25s
CI / unit_tests (pull_request) Failing after 3m10s
CI / docker (pull_request) Has been skipped
CI / integration_tests (pull_request) Failing after 3m31s
CI / coverage (pull_request) Successful in 5m46s
CI / benchmark-regression (pull_request) Successful in 38m4s
2026-03-17 01:03:56 +00:00
Compare
brent.edwards added this to the v3.5.0 milestone 2026-03-17 01:04:17 +00:00
brent.edwards force-pushed feature/m5-overlay-sandbox from 551e280180
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 17s
CI / lint (pull_request) Successful in 19s
CI / quality (pull_request) Successful in 30s
CI / typecheck (pull_request) Successful in 45s
CI / security (pull_request) Successful in 49s
CI / e2e_tests (pull_request) Successful in 1m25s
CI / unit_tests (pull_request) Failing after 3m10s
CI / docker (pull_request) Has been skipped
CI / integration_tests (pull_request) Failing after 3m31s
CI / coverage (pull_request) Successful in 5m46s
CI / benchmark-regression (pull_request) Successful in 38m4s
to db8767382e
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 27s
CI / quality (pull_request) Successful in 29s
CI / typecheck (pull_request) Successful in 41s
CI / security (pull_request) Successful in 52s
CI / e2e_tests (pull_request) Successful in 1m38s
CI / unit_tests (pull_request) Successful in 3m10s
CI / integration_tests (pull_request) Successful in 3m36s
CI / docker (pull_request) Successful in 55s
CI / coverage (pull_request) Successful in 5m59s
CI / benchmark-regression (pull_request) Successful in 42m9s
2026-03-17 02:45:00 +00:00
Compare
Owner

PM Status — Day 37 — Rebase Required

This PR has merge conflicts and cannot be merged in its current state. 42% of all open PRs (21 of 50) have conflicts — this is a project-wide issue that must be resolved.

@brent.edwards — Please rebase this PR onto master by Day 39 EOD (2026-03-19). If you cannot rebase by then, please post a comment explaining the blocker.


PM rebase request — Day 37

## PM Status — Day 37 — Rebase Required This PR has **merge conflicts** and cannot be merged in its current state. 42% of all open PRs (21 of 50) have conflicts — this is a project-wide issue that must be resolved. @brent.edwards — Please rebase this PR onto `master` by **Day 39 EOD (2026-03-19)**. If you cannot rebase by then, please post a comment explaining the blocker. --- *PM rebase request — Day 37*
brent.edwards force-pushed feature/m5-overlay-sandbox from db8767382e
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 27s
CI / quality (pull_request) Successful in 29s
CI / typecheck (pull_request) Successful in 41s
CI / security (pull_request) Successful in 52s
CI / e2e_tests (pull_request) Successful in 1m38s
CI / unit_tests (pull_request) Successful in 3m10s
CI / integration_tests (pull_request) Successful in 3m36s
CI / docker (pull_request) Successful in 55s
CI / coverage (pull_request) Successful in 5m59s
CI / benchmark-regression (pull_request) Successful in 42m9s
to b98568bd1a
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 38s
CI / security (pull_request) Successful in 39s
CI / typecheck (pull_request) Successful in 43s
CI / e2e_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
2026-03-18 03:29:42 +00:00
Compare
CoreRasurae left a comment

Code Review Report -- PR #994: feat(sandbox): implement overlay filesystem sandbox strategy

Branch: feature/m5-overlay-sandbox | Issue: #880 | Reviewer scope: All code changes in this branch + close connections to surrounding code | Spec reference: docs/specification.md

This review was conducted over two full cycles scanning all problem categories (bugs, regressions, security, performance, test coverage, spec compliance) in each cycle. Findings are organized by severity then by category.


CRITICAL

Regression Bugs (Bundled plan.py / plan_lifecycle_service.py Changes)

The commit bundles several simplifications to plan.py and plan_lifecycle_service.py that revert fixes previously merged into master by other contributors. These are not related to the overlay sandbox feature itself.

[REG-1] list_plans() database fallback removed -- plan_lifecycle_service.py:800

  • Reverts the fix from commit 2258075 ("fix(service): add database fallback to PlanLifecycleService.list_plans()").
  • list_plans() now only reads from the in-memory _plans dict. Since the DI container creates PlanLifecycleService via providers.Factory (new instance per CLI invocation), every new process starts with an empty dict.
  • Impact: Plans created by plan use in one CLI invocation are invisible to plan list, plan execute (auto-select), and plan apply (auto-select) in subsequent invocations. This breaks the fundamental cross-process plan discovery workflow.

[REG-2] Plan override persistence removed from use_action() -- plan.py:~1614

  • Reverts the fix from commit f07d347 ("fix(cli): persist plan overrides and run strategize inline in plan execute").
  • Removes the service._commit_plan(plan) call after applying automation-profile, actor, and execution-environment overrides.
  • Impact: Overrides set during plan use --automation-profile X do not survive to plan execute (separate process). The second process loads the plan from the database without the overrides.

[REG-3] Shared lifecycle_service parameter removed from _get_plan_executor() -- plan.py:~1218

  • Reverts the fix from commit ff2d824 ("fix(cli): share PlanLifecycleService instance between CLI handler and PlanExecutor").
  • The executor now always creates an independent lifecycle service from the DI container.
  • Impact: Reintroduces the stale-read bug where the executor's service instance has a different in-memory cache than the CLI handler's, causing spurious "Plan is not in an executable state" errors after the executor advances the plan.

HIGH

Regression Bugs

[REG-4] PreflightRejection exception handler removed -- plan.py execute_plan()

  • Reverts the fix from commit f0bdc3c ("fix(cli): load persisted actions in start_strategize").
  • PreflightRejection (which extends bare Exception, not CleverAgentsError) is no longer caught. Users see opaque "Error [500] INTERNAL" tracebacks instead of actionable pre-flight failure messages.

[REG-5] Generic exception catch-all removed -- plan.py execute_plan()

  • The except Exception handler that produced clean "Unexpected error:" messages was removed. Any unexpected exception now produces a raw traceback.

[REG-6] lifecycle_apply_plan no longer handles auto-progressed plans -- plan.py:~1829

  • The master version searches both Execute/complete and Apply/queued plans. The branch only searches Execute/complete.
  • Impact: Plans that were auto-progressed to Apply/queued by complete_execute()'s auto_progress() call will never be found by plan apply auto-select, producing "No plans ready for apply."

[REG-7] execute_plan no longer completes in a single invocation for strategize-queued plans -- plan.py:~1728

  • Master's flow: strategize inline -> check auto-progress -> run execute inline -> report complete.
  • Branch's flow: strategize inline -> print "Run 'agents plan execute <id>' again" and exit.
  • Impact: Breaks automated/CI workflows that relied on plan execute completing the full strategize-through-execute pipeline in one call. Users must now invoke plan execute twice.

Overlay Sandbox Bugs

[BUG-1] Sandbox unusable after rollback -- violates protocol contract -- overlay.py:325-367

  • rollback() transitions to SandboxStatus.ROLLED_BACK. The state machine defines ROLLED_BACK: [ACTIVE, CLEANED_UP], and the protocol docstring states: "After rollback the sandbox may be re-used."
  • However, get_path() (the only method that transitions to ACTIVE) only accepts CREATED or ACTIVE states. There is no code path to re-enter ACTIVE from ROLLED_BACK.
  • Note: The same gap exists in CopyOnWriteSandbox (its docstring explicitly promises re-activation via get_path but the code prevents it). Consistency with the existing implementation is acknowledged, but both implementations violate the protocol contract.

Error Handling

[ERR-1] suppress(Exception) is dangerously broad -- plan_lifecycle_service.py:853

  • Changed from suppress(NotFoundError) to suppress(Exception).
  • This silently swallows all exceptions during action loading, including DatabaseError, ConnectionError, TypeError, AttributeError, and other programming/infrastructure errors.
  • If the action fails to load for any reason other than "not found," the preflight guardrail silently proceeds with an empty action registry, masking the root cause.

MEDIUM

Overlay Sandbox Bugs

[BUG-2] commit() message parameter silently ignored -- overlay.py:251

  • The docstring states the message is "stored in result metadata." In practice, it is never stored, logged, or included in the CommitResult. Callers passing a message get no indication it was discarded.

[BUG-3] Empty directories not preserved during commit -- overlay.py:_compute_diff()

  • os.walk() only yields file entries. If a new empty directory is created in the sandbox, it won't appear in added_files and won't be created in the original during commit. Similarly, directories that become empty after file deletion won't be removed from the original.

[BUG-4] Orphaned parent directories after file deletion during commit -- overlay.py:292-296

  • When files are deleted during commit, only os.remove(dst) is called. Parent directories that become empty as a result are left behind in the original tree.

Security

[SEC-1] Symlinks can escape sandbox boundary -- overlay.py:174

  • The userspace fallback uses shutil.copytree(symlinks=True), which preserves symbolic links. If the original directory contains symlinks pointing outside its tree (e.g., /etc/shadow, ../../../sensitive), those symlinks are copied into the merged directory, giving sandbox operations access to files outside the intended resource boundary.
  • Recommendation: Either resolve symlinks (symlinks=False) or validate that symlink targets remain within the original path.

[SEC-2] No boundary validation for original_path -- overlay.py:110

  • The sandbox accepts any absolute path. There is no check that original_path is within a project boundary or registered resource. In combination with real OverlayFS mounts (which require root), this could mount overlays on arbitrary filesystem locations.

Performance

[PERF-1] _is_overlayfs_available() probe not cached -- overlay.py:33-77

  • Every OverlaySandbox.__init__() invocation runs a full mount probe: tempfile.mkdtemp(), os.makedirs() (4 dirs), reads /proc/filesystems, and if root, performs mount + umount + shutil.rmtree.
  • The result cannot change within a process lifetime. A module-level functools.lru_cache or sentinel variable would eliminate this overhead for all but the first instantiation.

[PERF-2] Full directory tree walk during commit (real OverlayFS mode) -- overlay.py:_compute_diff()

  • For real OverlayFS, the upper directory already isolates only the files that were modified. The implementation ignores this and walks the entire merged tree + entire original tree + byte-compares every common file.
  • For the userspace fallback this is necessary, but for real OverlayFS, inspecting only the upper layer would reduce complexity from O(total_files) to O(changed_files).

Test Coverage

[TEST-1] Real OverlayFS code paths have zero test coverage -- overlay_sandbox.feature, helper_overlay_sandbox.py

  • All Behave scenarios and Robot tests rely on _use_real_overlay is False (the userspace fallback). The _mount_overlay(), _unmount_overlay(), real-overlay rollback() (unmount/recreate/remount), and real-overlay cleanup() (unmount before rmtree) paths are never executed.
  • Recommendation: Add at least a mock-based test that patches _is_overlayfs_available to return True and verifies that subprocess.run is called with the correct mount/umount arguments.

[TEST-2] No temp directory cleanup in BDD scenarios -- overlay_sandbox_steps.py

  • context.ovl_tmpdir is created in given_ovl_test_directory but never cleaned up. There is no @after_scenario hook to remove temp directories.
  • Impact: Each scenario run leaks a temp directory under /tmp/ovl-test-*.

[TEST-3] Tests depend on private implementation attributes -- overlay_sandbox_steps.py, helper_overlay_sandbox.py

  • Multiple assertions directly access _merged_dir, _upper_dir, _work_dir, _base_dir, _use_real_overlay. These are implementation details, not protocol-defined properties.
  • Impact: Any refactoring of internal directory naming or attribute structure will break tests even if the public protocol behavior is unchanged.

Spec Compliance

[SPEC-1] overlay not in resource type YAML JSON Schema enum -- spec line 34404

  • The specification's JSON Schema for sandbox_strategy defines: ["git_worktree", "copy_on_write", "transaction_rollback", "snapshot", "none"].
  • "overlay" is absent. Resource type YAML files specifying sandbox_strategy: overlay would fail schema validation against the spec's canonical schema.
  • The spec's descriptive tables (lines 19133, 24932, 45925) DO mention overlay as a strategy, creating an internal spec inconsistency.

[SPEC-2] Missing checkpoint() / restore_checkpoint() methods -- overlay.py

  • The specification's SandboxStrategy Protocol (line 46267) defines 9 methods including checkpoint() and restore_checkpoint().
  • The correction model (line 28649) classifies overlay as "rollbackable" with checkpoint support: "Discard overlay writes since checkpoint."
  • The OverlaySandbox implements none of the checkpoint functionality. This means overlay sandboxes cannot participate in the correction/rollback workflow described in the spec.

LOW

Overlay Sandbox Bugs

[BUG-5] Cleanup may attempt unmount on never-mounted path -- overlay.py:cleanup()

  • If create() failed during _mount_overlay(), _use_real_overlay is still True. Cleanup tries _unmount_overlay() on a path that was never mounted, producing a confusing warning log. Not harmful but noisy.

[BUG-6] Race condition in OverlayFS availability probe -- overlay.py:_is_overlayfs_available()

  • If the process is killed between mount and umount in the probe, the temporary overlay mount is leaked. The finally block does shutil.rmtree which cannot remove a mountpoint. Low probability but worth noting.

Performance

[PERF-3] Full directory copy for userspace fallback on creation -- overlay.py:174

  • shutil.copytree copies the entire original directory into the merged layer on sandbox creation. For large project directories (thousands of files), this is expensive both in I/O and disk space.
  • Alternative: Use hardlinks on supported filesystems, or defer copying until first write (true copy-on-write semantics).

Test Coverage

[TEST-4] No test for nested subdirectory operations

  • All file creation/modification/deletion tests use top-level files only (new_file.txt, existing.txt). No scenarios test paths like src/deep/nested/file.py.

[TEST-5] No test for symlink handling

  • No scenario verifies behavior when the original directory contains symlinks (internal or external).

[TEST-6] No test for binary file handling

  • All test content is plain text. No verification that binary content (images, compiled files) survives commit/rollback correctly.

[TEST-7] No test for double create() call

  • No scenario verifies that calling sandbox.create() twice raises SandboxStateError (CREATED -> CREATED is not a valid transition).

[TEST-8] commit(message=...) never tested

  • The message parameter exists in the API but no scenario passes a value or verifies its handling.

[TEST-9] Robot helper tests don't verify stderr -- overlay_sandbox.robot

  • Test cases only check result.stdout for success markers. If a Python exception occurs, it writes to stderr, which is never asserted.

Spec Compliance

[SPEC-3] Sandbox strength ordering doesn't include overlay -- spec line 24767

  • The canonical write target for virtual resources uses strength ordering: git_worktree > snapshot > copy_on_write > transaction_rollback. Overlay is not placed in this ordering, which may affect virtual resource resolution.

[SPEC-4] fs-directory / fs-file not explicitly listed with overlay in spec strategy tables -- spec line 24932

  • The detailed resource strategy table lists overlay only for fs-mount. The implementation adds overlay support to fs-directory and fs-file as well. While the security properties table (line 45925) describes overlay generically for "Filesystems supporting overlay mounts," the resource-specific tables are inconsistent.

Summary

Severity Count Breakdown
CRITICAL 3 3 regressions in plan lifecycle (REG-1, REG-2, REG-3)
HIGH 6 4 regressions (REG-4..7), 1 protocol violation (BUG-1), 1 error handling (ERR-1)
MEDIUM 10 3 bugs (BUG-2..4), 2 security (SEC-1..2), 2 performance (PERF-1..2), 3 test/spec
LOW 13 2 bugs (BUG-5..6), 1 performance (PERF-3), 6 test gaps, 2 spec gaps
Total 32

Recommendation

The overlay sandbox implementation (overlay.py) is solid in its core design -- the directory structure, lifecycle management, and userspace fallback are well-engineered. However:

  1. The bundled plan.py / plan_lifecycle_service.py changes (REG-1 through REG-7) revert multiple previously-merged bug fixes and should be separated from this PR or reverted. These affect core plan lifecycle functionality unrelated to sandbox overlay support.
  2. The suppress(Exception) broadening (ERR-1) should be narrowed back to specific exception types.
  3. The _is_overlayfs_available() caching (PERF-1) and the re-activation after rollback gap (BUG-1) are the highest-priority fixes within the overlay code itself.
  4. At minimum one mock-based test for the real OverlayFS path (TEST-1) should be added before merge.
# Code Review Report -- PR #994: feat(sandbox): implement overlay filesystem sandbox strategy **Branch:** `feature/m5-overlay-sandbox` | **Issue:** #880 | **Reviewer scope:** All code changes in this branch + close connections to surrounding code | **Spec reference:** `docs/specification.md` This review was conducted over two full cycles scanning all problem categories (bugs, regressions, security, performance, test coverage, spec compliance) in each cycle. Findings are organized by severity then by category. --- ## CRITICAL ### Regression Bugs (Bundled `plan.py` / `plan_lifecycle_service.py` Changes) The commit bundles several simplifications to `plan.py` and `plan_lifecycle_service.py` that revert fixes previously merged into master by other contributors. These are not related to the overlay sandbox feature itself. **[REG-1] `list_plans()` database fallback removed** -- `plan_lifecycle_service.py:800` - Reverts the fix from commit `2258075` ("fix(service): add database fallback to PlanLifecycleService.list\_plans()"). - `list_plans()` now only reads from the in-memory `_plans` dict. Since the DI container creates PlanLifecycleService via `providers.Factory` (new instance per CLI invocation), every new process starts with an empty dict. - **Impact:** Plans created by `plan use` in one CLI invocation are invisible to `plan list`, `plan execute` (auto-select), and `plan apply` (auto-select) in subsequent invocations. This breaks the fundamental cross-process plan discovery workflow. **[REG-2] Plan override persistence removed from `use_action()`** -- `plan.py:~1614` - Reverts the fix from commit `f07d347` ("fix(cli): persist plan overrides and run strategize inline in plan execute"). - Removes the `service._commit_plan(plan)` call after applying automation-profile, actor, and execution-environment overrides. - **Impact:** Overrides set during `plan use --automation-profile X` do not survive to `plan execute` (separate process). The second process loads the plan from the database without the overrides. **[REG-3] Shared `lifecycle_service` parameter removed from `_get_plan_executor()`** -- `plan.py:~1218` - Reverts the fix from commit `ff2d824` ("fix(cli): share PlanLifecycleService instance between CLI handler and PlanExecutor"). - The executor now always creates an independent lifecycle service from the DI container. - **Impact:** Reintroduces the stale-read bug where the executor's service instance has a different in-memory cache than the CLI handler's, causing spurious "Plan is not in an executable state" errors after the executor advances the plan. --- ## HIGH ### Regression Bugs **[REG-4] `PreflightRejection` exception handler removed** -- `plan.py` `execute_plan()` - Reverts the fix from commit `f0bdc3c` ("fix(cli): load persisted actions in start\_strategize"). - `PreflightRejection` (which extends bare `Exception`, not `CleverAgentsError`) is no longer caught. Users see opaque "Error [500] INTERNAL" tracebacks instead of actionable pre-flight failure messages. **[REG-5] Generic exception catch-all removed** -- `plan.py` `execute_plan()` - The `except Exception` handler that produced clean "Unexpected error:" messages was removed. Any unexpected exception now produces a raw traceback. **[REG-6] `lifecycle_apply_plan` no longer handles auto-progressed plans** -- `plan.py:~1829` - The master version searches both Execute/complete and Apply/queued plans. The branch only searches Execute/complete. - **Impact:** Plans that were auto-progressed to Apply/queued by `complete_execute()`'s `auto_progress()` call will never be found by `plan apply` auto-select, producing "No plans ready for apply." **[REG-7] `execute_plan` no longer completes in a single invocation for strategize-queued plans** -- `plan.py:~1728` - Master's flow: strategize inline -> check auto-progress -> run execute inline -> report complete. - Branch's flow: strategize inline -> print "Run 'agents plan execute \<id\>' again" and exit. - **Impact:** Breaks automated/CI workflows that relied on `plan execute` completing the full strategize-through-execute pipeline in one call. Users must now invoke `plan execute` twice. ### Overlay Sandbox Bugs **[BUG-1] Sandbox unusable after rollback -- violates protocol contract** -- `overlay.py:325-367` - `rollback()` transitions to `SandboxStatus.ROLLED_BACK`. The state machine defines `ROLLED_BACK: [ACTIVE, CLEANED_UP]`, and the protocol docstring states: *"After rollback the sandbox may be re-used."* - However, `get_path()` (the only method that transitions to ACTIVE) only accepts CREATED or ACTIVE states. There is no code path to re-enter ACTIVE from ROLLED_BACK. - **Note:** The same gap exists in `CopyOnWriteSandbox` (its docstring explicitly promises re-activation via `get_path` but the code prevents it). Consistency with the existing implementation is acknowledged, but both implementations violate the protocol contract. ### Error Handling **[ERR-1] `suppress(Exception)` is dangerously broad** -- `plan_lifecycle_service.py:853` - Changed from `suppress(NotFoundError)` to `suppress(Exception)`. - This silently swallows **all** exceptions during action loading, including `DatabaseError`, `ConnectionError`, `TypeError`, `AttributeError`, and other programming/infrastructure errors. - If the action fails to load for any reason other than "not found," the preflight guardrail silently proceeds with an empty action registry, masking the root cause. --- ## MEDIUM ### Overlay Sandbox Bugs **[BUG-2] `commit()` `message` parameter silently ignored** -- `overlay.py:251` - The docstring states the message is "stored in result metadata." In practice, it is never stored, logged, or included in the `CommitResult`. Callers passing a message get no indication it was discarded. **[BUG-3] Empty directories not preserved during commit** -- `overlay.py:_compute_diff()` - `os.walk()` only yields file entries. If a new empty directory is created in the sandbox, it won't appear in `added_files` and won't be created in the original during commit. Similarly, directories that become empty after file deletion won't be removed from the original. **[BUG-4] Orphaned parent directories after file deletion during commit** -- `overlay.py:292-296` - When files are deleted during commit, only `os.remove(dst)` is called. Parent directories that become empty as a result are left behind in the original tree. ### Security **[SEC-1] Symlinks can escape sandbox boundary** -- `overlay.py:174` - The userspace fallback uses `shutil.copytree(symlinks=True)`, which preserves symbolic links. If the original directory contains symlinks pointing outside its tree (e.g., `/etc/shadow`, `../../../sensitive`), those symlinks are copied into the merged directory, giving sandbox operations access to files outside the intended resource boundary. - Recommendation: Either resolve symlinks (`symlinks=False`) or validate that symlink targets remain within the original path. **[SEC-2] No boundary validation for `original_path`** -- `overlay.py:110` - The sandbox accepts any absolute path. There is no check that `original_path` is within a project boundary or registered resource. In combination with real OverlayFS mounts (which require root), this could mount overlays on arbitrary filesystem locations. ### Performance **[PERF-1] `_is_overlayfs_available()` probe not cached** -- `overlay.py:33-77` - Every `OverlaySandbox.__init__()` invocation runs a full mount probe: `tempfile.mkdtemp()`, `os.makedirs()` (4 dirs), reads `/proc/filesystems`, and if root, performs `mount` + `umount` + `shutil.rmtree`. - The result cannot change within a process lifetime. A module-level `functools.lru_cache` or sentinel variable would eliminate this overhead for all but the first instantiation. **[PERF-2] Full directory tree walk during commit (real OverlayFS mode)** -- `overlay.py:_compute_diff()` - For real OverlayFS, the upper directory already isolates only the files that were modified. The implementation ignores this and walks the entire merged tree + entire original tree + byte-compares every common file. - For the userspace fallback this is necessary, but for real OverlayFS, inspecting only the upper layer would reduce complexity from O(total\_files) to O(changed\_files). ### Test Coverage **[TEST-1] Real OverlayFS code paths have zero test coverage** -- `overlay_sandbox.feature`, `helper_overlay_sandbox.py` - All Behave scenarios and Robot tests rely on `_use_real_overlay is False` (the userspace fallback). The `_mount_overlay()`, `_unmount_overlay()`, real-overlay `rollback()` (unmount/recreate/remount), and real-overlay `cleanup()` (unmount before rmtree) paths are never executed. - Recommendation: Add at least a mock-based test that patches `_is_overlayfs_available` to return True and verifies that `subprocess.run` is called with the correct mount/umount arguments. **[TEST-2] No temp directory cleanup in BDD scenarios** -- `overlay_sandbox_steps.py` - `context.ovl_tmpdir` is created in `given_ovl_test_directory` but never cleaned up. There is no `@after_scenario` hook to remove temp directories. - **Impact:** Each scenario run leaks a temp directory under `/tmp/ovl-test-*`. **[TEST-3] Tests depend on private implementation attributes** -- `overlay_sandbox_steps.py`, `helper_overlay_sandbox.py` - Multiple assertions directly access `_merged_dir`, `_upper_dir`, `_work_dir`, `_base_dir`, `_use_real_overlay`. These are implementation details, not protocol-defined properties. - **Impact:** Any refactoring of internal directory naming or attribute structure will break tests even if the public protocol behavior is unchanged. ### Spec Compliance **[SPEC-1] `overlay` not in resource type YAML JSON Schema enum** -- spec line 34404 - The specification's JSON Schema for `sandbox_strategy` defines: `["git_worktree", "copy_on_write", "transaction_rollback", "snapshot", "none"]`. - `"overlay"` is absent. Resource type YAML files specifying `sandbox_strategy: overlay` would fail schema validation against the spec's canonical schema. - The spec's descriptive tables (lines 19133, 24932, 45925) DO mention overlay as a strategy, creating an internal spec inconsistency. **[SPEC-2] Missing `checkpoint()` / `restore_checkpoint()` methods** -- overlay.py - The specification's SandboxStrategy Protocol (line 46267) defines 9 methods including `checkpoint()` and `restore_checkpoint()`. - The correction model (line 28649) classifies overlay as "rollbackable" with checkpoint support: "Discard overlay writes since checkpoint." - The OverlaySandbox implements none of the checkpoint functionality. This means overlay sandboxes cannot participate in the correction/rollback workflow described in the spec. --- ## LOW ### Overlay Sandbox Bugs **[BUG-5] Cleanup may attempt unmount on never-mounted path** -- `overlay.py:cleanup()` - If `create()` failed during `_mount_overlay()`, `_use_real_overlay` is still True. Cleanup tries `_unmount_overlay()` on a path that was never mounted, producing a confusing warning log. Not harmful but noisy. **[BUG-6] Race condition in OverlayFS availability probe** -- `overlay.py:_is_overlayfs_available()` - If the process is killed between `mount` and `umount` in the probe, the temporary overlay mount is leaked. The `finally` block does `shutil.rmtree` which cannot remove a mountpoint. Low probability but worth noting. ### Performance **[PERF-3] Full directory copy for userspace fallback on creation** -- `overlay.py:174` - `shutil.copytree` copies the entire original directory into the merged layer on sandbox creation. For large project directories (thousands of files), this is expensive both in I/O and disk space. - Alternative: Use hardlinks on supported filesystems, or defer copying until first write (true copy-on-write semantics). ### Test Coverage **[TEST-4] No test for nested subdirectory operations** - All file creation/modification/deletion tests use top-level files only (`new_file.txt`, `existing.txt`). No scenarios test paths like `src/deep/nested/file.py`. **[TEST-5] No test for symlink handling** - No scenario verifies behavior when the original directory contains symlinks (internal or external). **[TEST-6] No test for binary file handling** - All test content is plain text. No verification that binary content (images, compiled files) survives commit/rollback correctly. **[TEST-7] No test for double `create()` call** - No scenario verifies that calling `sandbox.create()` twice raises `SandboxStateError` (CREATED -> CREATED is not a valid transition). **[TEST-8] `commit(message=...)` never tested** - The message parameter exists in the API but no scenario passes a value or verifies its handling. **[TEST-9] Robot helper tests don't verify stderr** -- `overlay_sandbox.robot` - Test cases only check `result.stdout` for success markers. If a Python exception occurs, it writes to stderr, which is never asserted. ### Spec Compliance **[SPEC-3] Sandbox strength ordering doesn't include overlay** -- spec line 24767 - The canonical write target for virtual resources uses strength ordering: `git_worktree > snapshot > copy_on_write > transaction_rollback`. Overlay is not placed in this ordering, which may affect virtual resource resolution. **[SPEC-4] `fs-directory` / `fs-file` not explicitly listed with overlay in spec strategy tables** -- spec line 24932 - The detailed resource strategy table lists overlay only for `fs-mount`. The implementation adds overlay support to `fs-directory` and `fs-file` as well. While the security properties table (line 45925) describes overlay generically for "Filesystems supporting overlay mounts," the resource-specific tables are inconsistent. --- ## Summary | Severity | Count | Breakdown | |----------|-------|-----------| | **CRITICAL** | 3 | 3 regressions in plan lifecycle (REG-1, REG-2, REG-3) | | **HIGH** | 6 | 4 regressions (REG-4..7), 1 protocol violation (BUG-1), 1 error handling (ERR-1) | | **MEDIUM** | 10 | 3 bugs (BUG-2..4), 2 security (SEC-1..2), 2 performance (PERF-1..2), 3 test/spec | | **LOW** | 13 | 2 bugs (BUG-5..6), 1 performance (PERF-3), 6 test gaps, 2 spec gaps | | **Total** | **32** | | ### Recommendation The overlay sandbox implementation (`overlay.py`) is solid in its core design -- the directory structure, lifecycle management, and userspace fallback are well-engineered. However: 1. **The bundled `plan.py` / `plan_lifecycle_service.py` changes (REG-1 through REG-7) revert multiple previously-merged bug fixes and should be separated from this PR or reverted.** These affect core plan lifecycle functionality unrelated to sandbox overlay support. 2. The `suppress(Exception)` broadening (ERR-1) should be narrowed back to specific exception types. 3. The `_is_overlayfs_available()` caching (PERF-1) and the re-activation after rollback gap (BUG-1) are the highest-priority fixes within the overlay code itself. 4. At minimum one mock-based test for the real OverlayFS path (TEST-1) should be added before merge.
brent.edwards force-pushed feature/m5-overlay-sandbox from b98568bd1a
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 38s
CI / security (pull_request) Successful in 39s
CI / typecheck (pull_request) Successful in 43s
CI / e2e_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
to 7dedf9a093
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Failing after 17s
CI / build (pull_request) Successful in 34s
CI / quality (pull_request) Successful in 42s
CI / typecheck (pull_request) Successful in 42s
CI / coverage (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been skipped
CI / security (pull_request) Successful in 53s
CI / unit_tests (pull_request) Failing after 2m52s
CI / docker (pull_request) Has been skipped
CI / integration_tests (pull_request) Failing after 3m34s
CI / e2e_tests (pull_request) Successful in 4m8s
2026-03-18 18:57:07 +00:00
Compare
brent.edwards force-pushed feature/m5-overlay-sandbox from 7dedf9a093
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Failing after 17s
CI / build (pull_request) Successful in 34s
CI / quality (pull_request) Successful in 42s
CI / typecheck (pull_request) Successful in 42s
CI / coverage (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been skipped
CI / security (pull_request) Successful in 53s
CI / unit_tests (pull_request) Failing after 2m52s
CI / docker (pull_request) Has been skipped
CI / integration_tests (pull_request) Failing after 3m34s
CI / e2e_tests (pull_request) Successful in 4m8s
to 21fd9b1c9c
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Failing after 16s
CI / build (pull_request) Successful in 22s
CI / quality (pull_request) Successful in 30s
CI / typecheck (pull_request) Successful in 46s
CI / coverage (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been skipped
CI / security (pull_request) Successful in 1m0s
CI / e2e_tests (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
2026-03-18 19:03:44 +00:00
Compare
brent.edwards force-pushed feature/m5-overlay-sandbox from 21fd9b1c9c
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Failing after 16s
CI / build (pull_request) Successful in 22s
CI / quality (pull_request) Successful in 30s
CI / typecheck (pull_request) Successful in 46s
CI / coverage (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been skipped
CI / security (pull_request) Successful in 1m0s
CI / e2e_tests (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
to 5b84c86dcd
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Failing after 18s
CI / build (pull_request) Successful in 29s
CI / quality (pull_request) Successful in 37s
CI / security (pull_request) Successful in 43s
CI / typecheck (pull_request) Successful in 46s
CI / coverage (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been skipped
CI / unit_tests (pull_request) Failing after 2m57s
CI / docker (pull_request) Has been skipped
CI / integration_tests (pull_request) Successful in 4m5s
CI / e2e_tests (pull_request) Successful in 4m44s
2026-03-18 19:05:41 +00:00
Compare
brent.edwards force-pushed feature/m5-overlay-sandbox from 5b84c86dcd
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Failing after 18s
CI / build (pull_request) Successful in 29s
CI / quality (pull_request) Successful in 37s
CI / security (pull_request) Successful in 43s
CI / typecheck (pull_request) Successful in 46s
CI / coverage (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been skipped
CI / unit_tests (pull_request) Failing after 2m57s
CI / docker (pull_request) Has been skipped
CI / integration_tests (pull_request) Successful in 4m5s
CI / e2e_tests (pull_request) Successful in 4m44s
to b98568bd1a
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 38s
CI / security (pull_request) Successful in 39s
CI / typecheck (pull_request) Successful in 43s
CI / e2e_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
2026-03-18 19:10:28 +00:00
Compare
brent.edwards force-pushed feature/m5-overlay-sandbox from b98568bd1a
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 38s
CI / security (pull_request) Successful in 39s
CI / typecheck (pull_request) Successful in 43s
CI / e2e_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
to a669a99185
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Failing after 20s
CI / build (pull_request) Successful in 20s
CI / quality (pull_request) Successful in 29s
CI / integration_tests (pull_request) Has been cancelled
CI / security (pull_request) Has been cancelled
CI / e2e_tests (pull_request) Has been cancelled
CI / typecheck (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-18 19:11:37 +00:00
Compare
brent.edwards force-pushed feature/m5-overlay-sandbox from a669a99185
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Failing after 20s
CI / build (pull_request) Successful in 20s
CI / quality (pull_request) Successful in 29s
CI / integration_tests (pull_request) Has been cancelled
CI / security (pull_request) Has been cancelled
CI / e2e_tests (pull_request) Has been cancelled
CI / typecheck (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
to e1c75bd60e
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 19s
CI / build (pull_request) Successful in 29s
CI / quality (pull_request) Successful in 33s
CI / security (pull_request) Successful in 43s
CI / typecheck (pull_request) Successful in 45s
CI / unit_tests (pull_request) Failing after 2m56s
CI / docker (pull_request) Has been skipped
CI / integration_tests (pull_request) Successful in 3m37s
CI / e2e_tests (pull_request) Successful in 3m55s
CI / coverage (pull_request) Successful in 7m1s
CI / benchmark-regression (pull_request) Successful in 38m26s
2026-03-18 19:12:14 +00:00
Compare
Author
Member

Changes Applied (REQUEST_CHANGES resolved)

Step 1: Reverted unrelated regressions

Restored the following 9 files to their master versions — these were accidentally included in the squashed overlay commit:

  • CHANGELOG.md
  • benchmarks/bench_uko_layer3.py
  • docs/timeline.md
  • features/consolidated_resource.feature
  • features/consolidated_sandbox.feature
  • features/steps/plan_lifecycle_commands_coverage_steps.py
  • features/steps/sandbox_factory_coverage_steps.py
  • src/cleveragents/application/services/plan_lifecycle_service.py
  • src/cleveragents/cli/commands/plan.py

The commit now touches only the 10 overlay-sandbox files.

Step 2: Overlay-specific fixes

ID Fix File Detail
BUG-1 get_path() from ROLLED_BACK overlay.py:233-237 Added SandboxStatus.ROLLED_BACK to allowed states so sandboxes can be re-used after rollback
PERF-1 Cache _is_overlayfs_available() overlay.py:33 Added @functools.lru_cache(maxsize=1) — avoids repeated /proc/filesystems reads and mount probes
SEC-1 Resolve symlinks in copytree overlay.py:181,362 Changed symlinks=Truesymlinks=False in both create() and rollback() userspace fallback paths
TEST-2 Cleanup temp dir in steps overlay_sandbox_steps.py:31 Added context.add_cleanup(shutil.rmtree, context.ovl_tmpdir) to prevent temp dir leaks
ERR-1 N/A (reverted to master) plan_lifecycle_service.py File reverted to master — the suppress(Exception) issue was in an unrelated file

Verification

  • nox -s lintpassed (0 errors)
  • nox -s typecheckpassed (0 errors, 1 pre-existing warning in strategy_registry.py)
  • overlay.py499 lines (under 500 limit)
  • No # type: ignore directives
## Changes Applied (REQUEST_CHANGES resolved) ### Step 1: Reverted unrelated regressions Restored the following 9 files to their `master` versions — these were accidentally included in the squashed overlay commit: - `CHANGELOG.md` - `benchmarks/bench_uko_layer3.py` - `docs/timeline.md` - `features/consolidated_resource.feature` - `features/consolidated_sandbox.feature` - `features/steps/plan_lifecycle_commands_coverage_steps.py` - `features/steps/sandbox_factory_coverage_steps.py` - `src/cleveragents/application/services/plan_lifecycle_service.py` - `src/cleveragents/cli/commands/plan.py` The commit now touches **only** the 10 overlay-sandbox files. ### Step 2: Overlay-specific fixes | ID | Fix | File | Detail | |----|-----|------|--------| | **BUG-1** | `get_path()` from ROLLED_BACK | `overlay.py:233-237` | Added `SandboxStatus.ROLLED_BACK` to allowed states so sandboxes can be re-used after rollback | | **PERF-1** | Cache `_is_overlayfs_available()` | `overlay.py:33` | Added `@functools.lru_cache(maxsize=1)` — avoids repeated `/proc/filesystems` reads and mount probes | | **SEC-1** | Resolve symlinks in copytree | `overlay.py:181,362` | Changed `symlinks=True` → `symlinks=False` in both `create()` and `rollback()` userspace fallback paths | | **TEST-2** | Cleanup temp dir in steps | `overlay_sandbox_steps.py:31` | Added `context.add_cleanup(shutil.rmtree, context.ovl_tmpdir)` to prevent temp dir leaks | | **ERR-1** | N/A (reverted to master) | `plan_lifecycle_service.py` | File reverted to master — the `suppress(Exception)` issue was in an unrelated file | ### Verification - `nox -s lint` — **passed** (0 errors) - `nox -s typecheck` — **passed** (0 errors, 1 pre-existing warning in `strategy_registry.py`) - `overlay.py` — **499 lines** (under 500 limit) - No `# type: ignore` directives
brent.edwards force-pushed feature/m5-overlay-sandbox from e1c75bd60e
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 19s
CI / build (pull_request) Successful in 29s
CI / quality (pull_request) Successful in 33s
CI / security (pull_request) Successful in 43s
CI / typecheck (pull_request) Successful in 45s
CI / unit_tests (pull_request) Failing after 2m56s
CI / docker (pull_request) Has been skipped
CI / integration_tests (pull_request) Successful in 3m37s
CI / e2e_tests (pull_request) Successful in 3m55s
CI / coverage (pull_request) Successful in 7m1s
CI / benchmark-regression (pull_request) Successful in 38m26s
to 88785aef99
Some checks failed
CI / lint (pull_request) Successful in 25s
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 35s
CI / typecheck (pull_request) Successful in 44s
CI / coverage (pull_request) Waiting to run
CI / benchmark-regression (pull_request) Waiting to run
CI / build (pull_request) Successful in 27s
CI / security (pull_request) Successful in 1m24s
CI / integration_tests (pull_request) Successful in 3m41s
CI / e2e_tests (pull_request) Successful in 4m54s
CI / unit_tests (pull_request) Failing after 5m45s
CI / docker (pull_request) Has been skipped
2026-03-18 22:14:18 +00:00
Compare
brent.edwards force-pushed feature/m5-overlay-sandbox from 88785aef99
Some checks failed
CI / lint (pull_request) Successful in 25s
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 35s
CI / typecheck (pull_request) Successful in 44s
CI / coverage (pull_request) Waiting to run
CI / benchmark-regression (pull_request) Waiting to run
CI / build (pull_request) Successful in 27s
CI / security (pull_request) Successful in 1m24s
CI / integration_tests (pull_request) Successful in 3m41s
CI / e2e_tests (pull_request) Successful in 4m54s
CI / unit_tests (pull_request) Failing after 5m45s
CI / docker (pull_request) Has been skipped
to 7013e1f060
Some checks failed
CI / lint (pull_request) Successful in 18s
CI / typecheck (pull_request) Successful in 39s
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 37s
CI / security (pull_request) Successful in 58s
CI / build (pull_request) Successful in 24s
CI / integration_tests (pull_request) Successful in 3m47s
CI / unit_tests (pull_request) Failing after 3m56s
CI / docker (pull_request) Has been skipped
CI / e2e_tests (pull_request) Successful in 3m57s
CI / coverage (pull_request) Failing after 6m57s
CI / benchmark-regression (pull_request) Successful in 38m54s
2026-03-18 22:22:14 +00:00
Compare
Author
Member

Review Fixes Applied — Commit 7013e1f0

Rebase onto master (resolves REG-1 through REG-7, ERR-1)

The branch has been rebased onto current origin/master (ad98d41d). This automatically resolved all critical and high regression findings (REG-1 through REG-7, ERR-1) because the regressions were caused by stale versions of plan.py and plan_lifecycle_service.py bundled into the commit. After rebase, these files match master exactly and are no longer modified by this PR.

The branch now modifies only overlay-sandbox-specific files:

  • src/cleveragents/infrastructure/sandbox/overlay.py (new)
  • src/cleveragents/infrastructure/sandbox/factory.py (registers overlay)
  • src/cleveragents/infrastructure/sandbox/__init__.py (exports)
  • src/cleveragents/infrastructure/sandbox/protocol.py (metadata field on CommitResult)
  • src/cleveragents/domain/models/core/resource.py / resource_type.py (overlay strategy)
  • features/overlay_sandbox.feature + features/steps/overlay_sandbox_steps.py
  • robot/overlay_sandbox.robot + robot/helper_overlay_sandbox.py
  • vulture_whitelist.py, CHANGELOG.md

Overlay-specific fixes

Finding Severity Status Detail
REG-1–7 CRITICAL/HIGH Resolved by rebase plan.py and plan_lifecycle_service.py no longer modified
ERR-1 HIGH Resolved by rebase suppress(Exception) was in plan_lifecycle_service.py
BUG-1 HIGH Already fixed ROLLED_BACK was already in get_path() allowed states
SEC-1 MEDIUM Already fixed symlinks=False already in both create() and rollback()
PERF-1 MEDIUM Already fixed @lru_cache(maxsize=1) already applied to _is_overlayfs_available()
TEST-2 MEDIUM Already fixed context.add_cleanup(shutil.rmtree, ...) already present
BUG-2 MEDIUM Fixed Added metadata field to CommitResult; commit() now stores message
CHANGELOG Fixed Added entry under ## Unreleased

Quality Gates

  • nox -s lintPASS
  • nox -s typecheckPASS (0 errors)
  • overlay.py498 lines (under 500 limit)
  • Single commit, no merge commits, rebased on ad98d41d
## Review Fixes Applied — Commit `7013e1f0` ### Rebase onto master (resolves REG-1 through REG-7, ERR-1) The branch has been rebased onto current `origin/master` (`ad98d41d`). This **automatically resolved all critical and high regression findings** (REG-1 through REG-7, ERR-1) because the regressions were caused by stale versions of `plan.py` and `plan_lifecycle_service.py` bundled into the commit. After rebase, these files match master exactly and are no longer modified by this PR. The branch now modifies **only** overlay-sandbox-specific files: - `src/cleveragents/infrastructure/sandbox/overlay.py` (new) - `src/cleveragents/infrastructure/sandbox/factory.py` (registers overlay) - `src/cleveragents/infrastructure/sandbox/__init__.py` (exports) - `src/cleveragents/infrastructure/sandbox/protocol.py` (metadata field on CommitResult) - `src/cleveragents/domain/models/core/resource.py` / `resource_type.py` (overlay strategy) - `features/overlay_sandbox.feature` + `features/steps/overlay_sandbox_steps.py` - `robot/overlay_sandbox.robot` + `robot/helper_overlay_sandbox.py` - `vulture_whitelist.py`, `CHANGELOG.md` ### Overlay-specific fixes | Finding | Severity | Status | Detail | |---------|----------|--------|--------| | **REG-1–7** | CRITICAL/HIGH | Resolved by rebase | `plan.py` and `plan_lifecycle_service.py` no longer modified | | **ERR-1** | HIGH | Resolved by rebase | `suppress(Exception)` was in plan_lifecycle_service.py | | **BUG-1** | HIGH | Already fixed | `ROLLED_BACK` was already in `get_path()` allowed states | | **SEC-1** | MEDIUM | Already fixed | `symlinks=False` already in both `create()` and `rollback()` | | **PERF-1** | MEDIUM | Already fixed | `@lru_cache(maxsize=1)` already applied to `_is_overlayfs_available()` | | **TEST-2** | MEDIUM | Already fixed | `context.add_cleanup(shutil.rmtree, ...)` already present | | **BUG-2** | MEDIUM | **Fixed** | Added `metadata` field to `CommitResult`; `commit()` now stores message | | **CHANGELOG** | — | **Fixed** | Added entry under `## Unreleased` | ### Quality Gates - `nox -s lint` — **PASS** - `nox -s typecheck` — **PASS** (0 errors) - `overlay.py` — **498 lines** (under 500 limit) - Single commit, no merge commits, rebased on `ad98d41d`
freemo approved these changes 2026-03-19 04:54:10 +00:00
Dismissed
freemo left a comment

Code Review — PR #994 feat(sandbox): implement overlay filesystem sandbox strategy

Well-implemented feature with a clean architecture: OverlayFS detection → real mount or userspace fallback → commit/rollback lifecycle. The 22 Behave scenarios + 6 Robot integration tests provide thorough coverage of the full lifecycle including edge cases (path traversal, empty commit, idempotent cleanup, state transitions).

Approved with minor notes:

  1. filecmp.dircmp shallow comparison — In _compute_diff(), filecmp.dircmp defaults to shallow=True, which compares only file metadata (size, mtime) rather than content. This means files with the same size and modification time but different content would not be detected as changed. Consider using shallow=False for content-based comparison, or document this as a known limitation.

  2. Robot helper temp dir cleanup — Functions like _overlay_lifecycle(), _overlay_rollback(), etc. create tempfile.mkdtemp() but don't clean up in finally blocks. On test failure, these temp directories will leak. The Behave steps correctly use context.add_cleanup(shutil.rmtree, ...) — consider applying the same pattern in the Robot helpers.

  3. Dual SandboxStrategy enumsOVERLAY = "overlay" is added to both resource.py:SandboxStrategy and resource_type.py:SandboxStrategy. This is a pre-existing design issue (not introduced by this PR), but worth noting as tech debt — two parallel enum definitions will eventually drift.

## Code Review — PR #994 `feat(sandbox): implement overlay filesystem sandbox strategy` Well-implemented feature with a clean architecture: OverlayFS detection → real mount or userspace fallback → commit/rollback lifecycle. The 22 Behave scenarios + 6 Robot integration tests provide thorough coverage of the full lifecycle including edge cases (path traversal, empty commit, idempotent cleanup, state transitions). **Approved** with minor notes: 1. **`filecmp.dircmp` shallow comparison** — In `_compute_diff()`, `filecmp.dircmp` defaults to `shallow=True`, which compares only file metadata (size, mtime) rather than content. This means files with the same size and modification time but different content would not be detected as changed. Consider using `shallow=False` for content-based comparison, or document this as a known limitation. 2. **Robot helper temp dir cleanup** — Functions like `_overlay_lifecycle()`, `_overlay_rollback()`, etc. create `tempfile.mkdtemp()` but don't clean up in `finally` blocks. On test failure, these temp directories will leak. The Behave steps correctly use `context.add_cleanup(shutil.rmtree, ...)` — consider applying the same pattern in the Robot helpers. 3. **Dual `SandboxStrategy` enums** — `OVERLAY = "overlay"` is added to both `resource.py:SandboxStrategy` and `resource_type.py:SandboxStrategy`. This is a pre-existing design issue (not introduced by this PR), but worth noting as tech debt — two parallel enum definitions will eventually drift.
brent.edwards force-pushed feature/m5-overlay-sandbox from 7013e1f060
Some checks failed
CI / lint (pull_request) Successful in 18s
CI / typecheck (pull_request) Successful in 39s
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 37s
CI / security (pull_request) Successful in 58s
CI / build (pull_request) Successful in 24s
CI / integration_tests (pull_request) Successful in 3m47s
CI / unit_tests (pull_request) Failing after 3m56s
CI / docker (pull_request) Has been skipped
CI / e2e_tests (pull_request) Successful in 3m57s
CI / coverage (pull_request) Failing after 6m57s
CI / benchmark-regression (pull_request) Successful in 38m54s
to b7cc214b94
Some checks failed
CI / lint (pull_request) Successful in 19s
CI / typecheck (pull_request) Successful in 43s
CI / quality (pull_request) Successful in 36s
CI / benchmark-publish (pull_request) Has been skipped
CI / security (pull_request) Successful in 49s
CI / build (pull_request) Successful in 16s
CI / unit_tests (pull_request) Failing after 3m47s
CI / docker (pull_request) Has been skipped
CI / integration_tests (pull_request) Successful in 4m2s
CI / e2e_tests (pull_request) Successful in 7m27s
CI / coverage (pull_request) Failing after 8m47s
CI / benchmark-regression (pull_request) Failing after 26m39s
2026-03-20 00:08:06 +00:00
Compare
brent.edwards dismissed freemo's review 2026-03-20 00:08:06 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

Author
Member

Rebased onto origin/master (79b0a2c5). CHANGELOG conflict resolved (kept master, re-added PR entry). resource_type.py auto-merged cleanly. nox -s lint PASS, nox -s typecheck PASS (0 errors). Commit b7cc214b.

Rebased onto `origin/master` (`79b0a2c5`). CHANGELOG conflict resolved (kept master, re-added PR entry). `resource_type.py` auto-merged cleanly. `nox -s lint` PASS, `nox -s typecheck` PASS (0 errors). Commit `b7cc214b`.
# Conflicts:
#	CHANGELOG.md
fix(test): update tests for overlay sandbox strategy and boost coverage
Some checks failed
CI / coverage (pull_request) Blocked by required conditions
CI / benchmark-regression (pull_request) Blocked by required conditions
CI / docker (pull_request) Blocked by required conditions
CI / status-check (pull_request) Blocked by required conditions
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 24s
CI / quality (pull_request) Successful in 3m47s
CI / security (pull_request) Successful in 4m18s
CI / unit_tests (pull_request) Successful in 7m11s
CI / e2e_tests (pull_request) Successful in 9m3s
CI / integration_tests (pull_request) Failing after 15m3s
CI / typecheck (pull_request) Failing after 16m1s
CI / lint (pull_request) Failing after 16m1s
8555f62b16
- Update consolidated_resource.feature: SandboxStrategy now has 6 values (added overlay)
- Update consolidated_sandbox.feature: overlay is supported, fs-mount/fs-directory/fs-file include overlay
- Add 5 new overlay_sandbox.feature scenarios for coverage (commit message metadata, merged_dir None, OSError wrapping)
- Add overlay strategy support step in sandbox_factory_coverage_steps.py
Author
Member

Fixed 6 failing unit tests, boosted coverage, and merged latest master.

Unit test failures (all related to the new OVERLAY strategy not being reflected in existing assertions):

  1. consolidated_resource.feature:386 — "SandboxStrategy has exactly five values" → updated to six, added overlay to the values list
  2. consolidated_resource.feature:403 — "SandboxStrategy rejects overlay" → changed to "accepts overlay"
  3. consolidated_sandbox.feature:761 — "overlay strategy is now unknown and rejected" → changed to "overlay strategy is supported" with new step definition
  4. consolidated_sandbox.feature:813/819/825 — fs-mount/fs-directory/fs-file "support copy-on-write and none" → updated to "copy-on-write, overlay, and none" with new step definition

Coverage boost (96.9% → targeting 97%):
Added 5 new Behave scenarios to overlay_sandbox.feature covering:

  • Commit with message metadata (line 317)
  • get_path when merged_dir is None (line 248)
  • OSError wrapping in commit → SandboxCommitError (lines 299-303)
  • OSError wrapping in rollback → SandboxRollbackError (lines 370-374)
  • OSError wrapping in create → SandboxCreationError (lines 187-192)

Also: Merged latest origin/master (resolved CHANGELOG.md conflict).

Benchmark log: Server died at 25% — no code fix needed.

Fixed 6 failing unit tests, boosted coverage, and merged latest `master`. **Unit test failures (all related to the new OVERLAY strategy not being reflected in existing assertions):** 1. `consolidated_resource.feature:386` — "SandboxStrategy has exactly five values" → updated to six, added `overlay` to the values list 2. `consolidated_resource.feature:403` — "SandboxStrategy rejects overlay" → changed to "accepts overlay" 3. `consolidated_sandbox.feature:761` — "overlay strategy is now unknown and rejected" → changed to "overlay strategy is supported" with new step definition 4. `consolidated_sandbox.feature:813/819/825` — fs-mount/fs-directory/fs-file "support copy-on-write and none" → updated to "copy-on-write, overlay, and none" with new step definition **Coverage boost (96.9% → targeting 97%):** Added 5 new Behave scenarios to `overlay_sandbox.feature` covering: - Commit with message metadata (line 317) - `get_path` when `merged_dir` is None (line 248) - OSError wrapping in commit → `SandboxCommitError` (lines 299-303) - OSError wrapping in rollback → `SandboxRollbackError` (lines 370-374) - OSError wrapping in create → `SandboxCreationError` (lines 187-192) **Also:** Merged latest `origin/master` (resolved CHANGELOG.md conflict). **Benchmark log:** Server died at 25% — no code fix needed.
Merge remote-tracking branch 'origin/master' into feature/m5-overlay-sandbox
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 33s
CI / lint (pull_request) Successful in 3m19s
CI / quality (pull_request) Successful in 3m39s
CI / security (pull_request) Successful in 4m5s
CI / typecheck (pull_request) Successful in 4m10s
CI / integration_tests (pull_request) Successful in 5m29s
CI / unit_tests (pull_request) Successful in 5m43s
CI / docker (pull_request) Successful in 1m8s
CI / coverage (pull_request) Failing after 14m41s
CI / e2e_tests (pull_request) Failing after 18m49s
CI / benchmark-regression (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
96abc0b516
# Conflicts:
#	CHANGELOG.md
Merge branch 'master' into feature/m5-overlay-sandbox
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 24s
CI / lint (pull_request) Successful in 3m21s
CI / typecheck (pull_request) Successful in 3m56s
CI / quality (pull_request) Successful in 4m15s
CI / security (pull_request) Successful in 4m37s
CI / e2e_tests (pull_request) Successful in 8m47s
CI / integration_tests (pull_request) Successful in 9m1s
CI / unit_tests (pull_request) Successful in 9m16s
CI / docker (pull_request) Successful in 1m13s
CI / coverage (pull_request) Successful in 11m14s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Failing after 29m0s
52270d5f43
brent.edwards deleted branch feature/m5-overlay-sandbox 2026-03-21 03:14:42 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core!994
No description provided.