feat(devcontainer): add lazy activation and lifecycle management #613

Merged
CoreRasurae merged 1 commit from feature/m6plus-devcontainer-lifecycle into master 2026-03-10 12:43:26 +00:00
Member

Summary

Implements full lifecycle management for devcontainer-instance resources: lazy activation on first tool target, health checking, manual stop/rebuild CLI commands, and automatic cleanup on session close or plan completion.

Closes #514

Changes

Domain Model (container_lifecycle.py — new, 249 lines)

  • ContainerLifecycleState enum: detected, building, running, stopping, stopped, failed
  • VALID_TRANSITIONS mapping with validated state machine
  • ContainerLifecycleTracker Pydantic v2 model with session_id for scoped cleanup
  • transition_state() with history cap (100), automatic clearing of container_id/workspace_path on terminal states

Handler (devcontainer.py — new, 794 lines)

  • activate_container(): lazy activation via devcontainer up with JSON output parsing, orphan cleanup on timeout, auto-starts health check on success
  • stop_container(): idempotent stop (early return on terminal state), stops health check before lock to avoid deadlock
  • rebuild_container(): re-activates stopped/failed containers
  • start_health_check() / _health_check_loop(): background daemon thread with configurable interval, TOCTOU-safe transitions under lock
  • stop_all_active_containers(): session-scoped bulk stop with terminal tracker eviction
  • evict_terminal_trackers(): caps registry at 200 terminal-state entries, evicts oldest first
  • _parse_devcontainer_up_output(): reverse-line JSON scanning, hex container ID validation, absolute workspace path validation
  • DevcontainerHandler class: extends BaseResourceHandler with snapshot sandbox strategy

CLI Commands (resource.py — modified)

  • agents resource stop <name> [--yes]: transitions runningstoppingstopped
  • agents resource rebuild <name> [--yes]: transitions stopped/failedbuildingrunning
  • Type guards: _STOPPABLE_TYPES (devcontainer-instance, container-instance), _REBUILDABLE_TYPES (devcontainer-instance only)

Cleanup Wiring

  • AcpLocalFacade._handle_session_close() → session-scoped container cleanup (works even without session service)
  • PlanLifecycleService.complete_apply() / fail_apply() / fail_execute() / cancel_plan() → plan-scoped container cleanup
  • CleanupService.stop_active_devcontainers() → thin static wrapper delegating to handler

Specification Updates (specification.md)

  • Updated handler name to DevcontainerHandler in 3 locations
  • Added activation_state enum field to devcontainer-instance type definition
  • Updated handler dispatch table

Documentation (devcontainer_resources.md — new, 264 lines)

  • Full lifecycle docs: state diagram, transitions table, lazy activation flow
  • Health checking behavior, session/plan cleanup hooks
  • CLI usage with stop/rebuild commands
  • Known limitations table (in-memory state, eviction cap, sandbox_strategy mismatch)

Testing

Suite Result
Behave BDD 93 scenarios, 0 failures
Robot Framework 10 integration tests, 0 failures
ASV benchmarks 6 benchmarks (activation latency, transitions, parsing, registry ops)
ruff lint 0 errors
pyright typecheck 0 errors, 0 warnings

BDD Coverage Highlights (668-line feature file, 93 scenarios)

  • State model: enum values, all valid/invalid transitions, history cap
  • Lazy activation: success, failure, timeout with orphan cleanup, concurrent rejection
  • Stop: success, error paths, no container_id, non-zero rc, idempotent on terminal state
  • Rebuild: from stopped and failed states
  • Health check: success loop, failure transition, exception, registration, thread join
  • Session cleanup: scoped, unscoped, error handling, return list
  • CLI stop/rebuild: type validation, state preconditions, --yes flag, error paths (NotFoundError, RuntimeError)
  • Concurrency: dual-thread activation (one wins), dual-thread stop (both succeed — idempotent)
  • Parsing: valid JSON, invalid JSON, multi-line, invalid container ID, relative workspace path

Known Limitations

Limitation Detail
In-memory lifecycle state Not persisted to DB; lost on restart. Planned for M7+.
Registry eviction Terminal trackers evicted at 200 cap. Acceptable for MVP.
Spec sandbox_strategy Spec YAML says container_snapshot; implementation uses snapshot. Internal spec inconsistency — tables elsewhere also say snapshot.

Spec References

  • Devcontainer Integration: specification.md §33253–33263 (activation_state field)
  • Resource Type Inheritance: specification.md §23509–23527 (handler definition)
  • Handler Dispatch: specification.md §24876–24877 (sandbox strategy)
  • ADR-042: Resource Type Inheritance
  • ADR-043: Devcontainer Integration

ISSUES CLOSED: #514

## Summary Implements full lifecycle management for `devcontainer-instance` resources: lazy activation on first tool target, health checking, manual stop/rebuild CLI commands, and automatic cleanup on session close or plan completion. Closes #514 ## Changes ### Domain Model (`container_lifecycle.py` — new, 249 lines) - `ContainerLifecycleState` enum: `detected`, `building`, `running`, `stopping`, `stopped`, `failed` - `VALID_TRANSITIONS` mapping with validated state machine - `ContainerLifecycleTracker` Pydantic v2 model with `session_id` for scoped cleanup - `transition_state()` with history cap (100), automatic clearing of `container_id`/`workspace_path` on terminal states ### Handler (`devcontainer.py` — new, 794 lines) - `activate_container()`: lazy activation via `devcontainer up` with JSON output parsing, orphan cleanup on timeout, auto-starts health check on success - `stop_container()`: idempotent stop (early return on terminal state), stops health check before lock to avoid deadlock - `rebuild_container()`: re-activates stopped/failed containers - `start_health_check()` / `_health_check_loop()`: background daemon thread with configurable interval, TOCTOU-safe transitions under lock - `stop_all_active_containers()`: session-scoped bulk stop with terminal tracker eviction - `evict_terminal_trackers()`: caps registry at 200 terminal-state entries, evicts oldest first - `_parse_devcontainer_up_output()`: reverse-line JSON scanning, hex container ID validation, absolute workspace path validation - `DevcontainerHandler` class: extends `BaseResourceHandler` with `snapshot` sandbox strategy ### CLI Commands (`resource.py` — modified) - `agents resource stop <name> [--yes]`: transitions `running` → `stopping` → `stopped` - `agents resource rebuild <name> [--yes]`: transitions `stopped`/`failed` → `building` → `running` - Type guards: `_STOPPABLE_TYPES` (devcontainer-instance, container-instance), `_REBUILDABLE_TYPES` (devcontainer-instance only) ### Cleanup Wiring - `AcpLocalFacade._handle_session_close()` → session-scoped container cleanup (works even without session service) - `PlanLifecycleService.complete_apply()` / `fail_apply()` / `fail_execute()` / `cancel_plan()` → plan-scoped container cleanup - `CleanupService.stop_active_devcontainers()` → thin static wrapper delegating to handler ### Specification Updates (`specification.md`) - Updated handler name to `DevcontainerHandler` in 3 locations - Added `activation_state` enum field to `devcontainer-instance` type definition - Updated handler dispatch table ### Documentation (`devcontainer_resources.md` — new, 264 lines) - Full lifecycle docs: state diagram, transitions table, lazy activation flow - Health checking behavior, session/plan cleanup hooks - CLI usage with stop/rebuild commands - Known limitations table (in-memory state, eviction cap, sandbox_strategy mismatch) ## Testing | Suite | Result | |-------|--------| | Behave BDD | 93 scenarios, 0 failures | | Robot Framework | 10 integration tests, 0 failures | | ASV benchmarks | 6 benchmarks (activation latency, transitions, parsing, registry ops) | | ruff lint | 0 errors | | pyright typecheck | 0 errors, 0 warnings | ### BDD Coverage Highlights (668-line feature file, 93 scenarios) - State model: enum values, all valid/invalid transitions, history cap - Lazy activation: success, failure, timeout with orphan cleanup, concurrent rejection - Stop: success, error paths, no container_id, non-zero rc, idempotent on terminal state - Rebuild: from stopped and failed states - Health check: success loop, failure transition, exception, registration, thread join - Session cleanup: scoped, unscoped, error handling, return list - CLI stop/rebuild: type validation, state preconditions, --yes flag, error paths (NotFoundError, RuntimeError) - Concurrency: dual-thread activation (one wins), dual-thread stop (both succeed — idempotent) - Parsing: valid JSON, invalid JSON, multi-line, invalid container ID, relative workspace path ## Known Limitations | Limitation | Detail | |------------|--------| | In-memory lifecycle state | Not persisted to DB; lost on restart. Planned for M7+. | | Registry eviction | Terminal trackers evicted at 200 cap. Acceptable for MVP. | | Spec `sandbox_strategy` | Spec YAML says `container_snapshot`; implementation uses `snapshot`. Internal spec inconsistency — tables elsewhere also say `snapshot`. | ## Spec References - Devcontainer Integration: `specification.md` §33253–33263 (activation_state field) - Resource Type Inheritance: `specification.md` §23509–23527 (handler definition) - Handler Dispatch: `specification.md` §24876–24877 (sandbox strategy) - ADR-042: Resource Type Inheritance - ADR-043: Devcontainer Integration ISSUES CLOSED: #514
CoreRasurae added this to the v3.6.0 milestone 2026-03-06 12:45:01 +00:00
hamza.khyari requested changes 2026-03-06 13:42:21 +00:00
Dismissed
hamza.khyari left a comment

Review: feat(devcontainer): add lazy activation and lifecycle management

Commit: 196e1415 | Issue: #514 | 16 files, +4574 lines


BUG-1 (Major) — model_copy(update=...) bypasses Pydantic validators throughout

Severity: Major
Category: BUG
File: container_lifecycle.py:249, devcontainer.py:175,236-237

transition_state() and activate_container() use tracker.model_copy(update={...}) to create new tracker instances. model_copy does NOT re-run field validators. Confirmed by instrumented test: model_copy(update={"health_check_interval": 1.0}) succeeds despite the field having ge=10 constraint.

Currently safe because the code never passes externally-sourced values through model_copy for validated fields. But this is a latent bypass — any future code that does tracker.model_copy(update={"health_check_interval": user_input}) silently violates the ge=10, le=3600 constraint. Use model_validate(tracker.model_dump() | updates) for any path that updates validated fields, or add a comment documenting the intentional bypass for internal-only fields.


BUG-2 (Major) — Stale container_id persists through BUILDING transition

Severity: Major
Category: BUG
File: container_lifecycle.py:238-248

transition_state() clears container_id and workspace_path on transitions to FAILED and STOPPED, but NOT on transition to BUILDING. During activate_container, between the BUILDING transition (line 177-182) and the RUNNING transition (line 239-244), the tracker in the registry carries the old container_id from the prior lifecycle. Any code that reads the tracker during BUILDING (e.g., a health check that hasn't been stopped yet) could see a stale container_id pointing to a defunct container.

Recommendation: Clear container_id and workspace_path on transition to BUILDING as well:

if to_state in (
    ContainerLifecycleState.FAILED,
    ContainerLifecycleState.STOPPED,
    ContainerLifecycleState.BUILDING,
):
    updates["container_id"] = None
    updates["workspace_path"] = None

BUG-3 (Medium) — list_active_containers_for_session stops other sessions' containers

Severity: Medium
Category: BUG
File: devcontainer.py:641-642

if tracker.session_id is None or tracker.session_id == session_id

Containers with session_id=None (unscoped) are returned for ANY session cleanup. If session A closes and calls stop_all_active_containers(session_id="A"), it will stop all unscoped containers, which might be in use by session B. The docstring documents this as intentional, but the first session to close wins all unscoped containers.

Recommendation: At minimum, log a warning when stopping an unscoped container during session-scoped cleanup so operators know cross-session impact occurred.


BUG-4 (Medium) — ContainerLifecycleTracker.transitions is a mutable list on a non-frozen model

Severity: Medium
Category: BUG
File: container_lifecycle.py:147,152

The tracker is NOT frozen (model_config = ConfigDict(str_strip_whitespace=True)), and transitions is list[LifecycleTransition]. Any code holding a tracker reference can mutate the list in-place: tracker.transitions.append(fake_transition). Since trackers are stored in a shared global registry, in-place mutation from one thread is visible to others without lock protection.

The current code avoids this by always creating new lists via [*tracker.transitions, transition], but this invariant is not enforced. Recommendation: Either freeze the model or use tuple[LifecycleTransition, ...] for transitions.


BUG-5 (Minor) — rebuild_container does not forward session_id

Severity: Minor
Category: BUG
File: devcontainer.py:453-457

rebuild_container delegates to activate_container but never passes session_id. If a container is rebuilt from a different session, the session_id on the tracker won't update. The rebuilt container may be cleaned up by the old session's close hook rather than the new one's.


SPEC-1 (Major) — activation_state enum names diverge from Issue #514

Severity: Major
Category: SPEC
File: container_lifecycle.py:49-72, docs/specification.md

The spec (updated by this PR) uses detected/building/running/stopping/stopped/failed. Issue #514 acceptance criteria use inactive/starting/active/stopping/stopped/error. These are completely different vocabularies. Only stopping and stopped match. This should be reconciled — either update the issue or document the intentional rename.


SPEC-2 (Minor) — Module path inconsistency: singular vs plural resource

Severity: Minor
Category: SPEC
File: docs/specification.md

The PR changes the devcontainer handler path to cleveragents.resource.handlers.devcontainer (singular). Every other handler in the spec uses cleveragents.resources.handlers.* (plural). Either the other handlers' paths are wrong or this one is.


SPEC-3 (Minor) — Health checking and session-scoped cleanup absent from spec

Severity: Minor
Category: SPEC

Issue #514 requires health checking (periodic liveness probes) and automatic cleanup on session close / plan completion. The spec has no mention of either for container resources (health checking only exists for LSP servers). These features should be added to the spec or flagged as implementation-ahead-of-spec.


SEC-1 (Medium) — workspace_folder not sanitized beyond os.path.isabs()

Severity: Medium
Category: SEC
File: devcontainer.py:165-166

activate_container validates workspace_folder is absolute but does not canonicalize it. A path like /tmp/../etc/devcontainer.json passes validation. This path is passed directly to subprocess.run(["devcontainer", "up", "--workspace-folder", workspace_folder]). While shell=True is not used, the devcontainer CLI will operate on any directory the process can access.

Recommendation: Canonicalize with os.path.realpath() and optionally restrict to known project directories.


SEC-2 (Minor) — Health probe uses hardcoded fallback path

Severity: Minor
Category: SEC
File: devcontainer.py:532

_single_probe uses tracker.workspace_path or "/workspace". If workspace_path is None (tracker in unexpected state), the probe runs against /workspace inside an arbitrary container, which could be a different container's workspace.


PERF-1 (Minor) — _handlers() rebuilds dispatch dict on every ACP request

Severity: Minor
Category: PERF
File: acp/facade.py:194-208

_route_operation calls self._handlers() which builds a new dict on every dispatch. For a hot path (every ACP request), this should be a cached property or class-level constant.


TEST-1 (Major) — Concurrency tests lack threading.Barrier and post-condition invariants

Severity: Major
Category: TEST
File: features/devcontainer_lifecycle.feature:468,558

Both concurrency scenarios ("Concurrent activation rejected" and "Concurrent stop handled safely") start threads sequentially without a threading.Barrier. The first thread may complete before the second starts, making these sequential tests in disguise. Compare with tool_runtime_steps.py:229 in this codebase which properly uses threading.Barrier.

The concurrent stop test also does not assert: (a) exactly 1 docker stop call was made (not 0 or 2), or (b) the final registry state is exactly 1 tracker in stopped state.


TEST-2 (Medium) — 5 sham/structural-only test scenarios

Severity: Medium
Category: TEST

  • "CleanupService has stop_active_devcontainers method" (line 137) — hasattr check only
  • "DevcontainerHandler has expected class attributes" (line 382) — static attribute check
  • "DevcontainerHandler extends BaseResourceHandler" (line 386) — issubclass check
  • "CLI stop with --yes skips confirmation" (line 511) — patches out production code, asserts exit code 0
  • "CLI rebuild with --yes skips confirmation" (line 516) — same

These test structural properties, not behavior. The first three would fail at import time if violated.


TEST-3 (Medium) — Coverage gaps: list_active_containers_for_session untested directly

Severity: Medium
Category: TEST
File: devcontainer.py:619-643

list_active_containers_for_session has no direct test scenario. Its return value is never asserted. The empty-session-id fallback and session_id is None matching logic are not explicitly verified.

Also untested: evict_terminal_trackers when below threshold (should return 0), get_lifecycle_tracker auto-creation behavior, clear_lifecycle_registry thread-join behavior.


TEST-4 (Minor) — Mock does not validate subprocess kwargs

Severity: Minor
Category: TEST
File: features/mocks/mock_devcontainer_cli.py:65

MockDevcontainerRunner.__call__ records but never validates kwargs (capture_output, text, timeout, check). A bug where check=True is set (causing CalledProcessError in production) would pass all tests. The mock also returns the same container ID for all workspaces, preventing detection of workspace-mismatch bugs.


CODE-1 (Minor) — vulture_whitelist.py whitelists private symbols

Severity: Minor
Category: CODE
File: vulture_whitelist.py:538-545

The whitelist includes private symbols (_stop_health_check, _health_check_loop, _single_probe, _lifecycle_registry, _registry_lock, etc.). Private symbols should not need vulture whitelisting — if vulture flags them as unused, they may be dead code or test-only utilities that should be restructured.

## Review: feat(devcontainer): add lazy activation and lifecycle management **Commit**: `196e1415` | **Issue**: #514 | **16 files, +4574 lines** --- ### BUG-1 (Major) — `model_copy(update=...)` bypasses Pydantic validators throughout **Severity**: Major **Category**: BUG **File**: `container_lifecycle.py:249`, `devcontainer.py:175,236-237` `transition_state()` and `activate_container()` use `tracker.model_copy(update={...})` to create new tracker instances. `model_copy` does NOT re-run field validators. Confirmed by instrumented test: `model_copy(update={"health_check_interval": 1.0})` succeeds despite the field having `ge=10` constraint. Currently safe because the code never passes externally-sourced values through `model_copy` for validated fields. But this is a latent bypass — any future code that does `tracker.model_copy(update={"health_check_interval": user_input})` silently violates the `ge=10, le=3600` constraint. Use `model_validate(tracker.model_dump() | updates)` for any path that updates validated fields, or add a comment documenting the intentional bypass for internal-only fields. --- ### BUG-2 (Major) — Stale `container_id` persists through `BUILDING` transition **Severity**: Major **Category**: BUG **File**: `container_lifecycle.py:238-248` `transition_state()` clears `container_id` and `workspace_path` on transitions to `FAILED` and `STOPPED`, but NOT on transition to `BUILDING`. During `activate_container`, between the `BUILDING` transition (line 177-182) and the `RUNNING` transition (line 239-244), the tracker in the registry carries the old `container_id` from the prior lifecycle. Any code that reads the tracker during `BUILDING` (e.g., a health check that hasn't been stopped yet) could see a stale container_id pointing to a defunct container. **Recommendation**: Clear `container_id` and `workspace_path` on transition to `BUILDING` as well: ```python if to_state in ( ContainerLifecycleState.FAILED, ContainerLifecycleState.STOPPED, ContainerLifecycleState.BUILDING, ): updates["container_id"] = None updates["workspace_path"] = None ``` --- ### BUG-3 (Medium) — `list_active_containers_for_session` stops other sessions' containers **Severity**: Medium **Category**: BUG **File**: `devcontainer.py:641-642` ```python if tracker.session_id is None or tracker.session_id == session_id ``` Containers with `session_id=None` (unscoped) are returned for ANY session cleanup. If session A closes and calls `stop_all_active_containers(session_id="A")`, it will stop all unscoped containers, which might be in use by session B. The docstring documents this as intentional, but the first session to close wins all unscoped containers. **Recommendation**: At minimum, log a warning when stopping an unscoped container during session-scoped cleanup so operators know cross-session impact occurred. --- ### BUG-4 (Medium) — `ContainerLifecycleTracker.transitions` is a mutable list on a non-frozen model **Severity**: Medium **Category**: BUG **File**: `container_lifecycle.py:147,152` The tracker is NOT frozen (`model_config = ConfigDict(str_strip_whitespace=True)`), and `transitions` is `list[LifecycleTransition]`. Any code holding a tracker reference can mutate the list in-place: `tracker.transitions.append(fake_transition)`. Since trackers are stored in a shared global registry, in-place mutation from one thread is visible to others without lock protection. The current code avoids this by always creating new lists via `[*tracker.transitions, transition]`, but this invariant is not enforced. **Recommendation**: Either freeze the model or use `tuple[LifecycleTransition, ...]` for `transitions`. --- ### BUG-5 (Minor) — `rebuild_container` does not forward `session_id` **Severity**: Minor **Category**: BUG **File**: `devcontainer.py:453-457` `rebuild_container` delegates to `activate_container` but never passes `session_id`. If a container is rebuilt from a different session, the session_id on the tracker won't update. The rebuilt container may be cleaned up by the old session's close hook rather than the new one's. --- ### SPEC-1 (Major) — `activation_state` enum names diverge from Issue #514 **Severity**: Major **Category**: SPEC **File**: `container_lifecycle.py:49-72`, `docs/specification.md` The spec (updated by this PR) uses `detected/building/running/stopping/stopped/failed`. Issue #514 acceptance criteria use `inactive/starting/active/stopping/stopped/error`. These are completely different vocabularies. Only `stopping` and `stopped` match. This should be reconciled — either update the issue or document the intentional rename. --- ### SPEC-2 (Minor) — Module path inconsistency: singular vs plural `resource` **Severity**: Minor **Category**: SPEC **File**: `docs/specification.md` The PR changes the devcontainer handler path to `cleveragents.resource.handlers.devcontainer` (singular). Every other handler in the spec uses `cleveragents.resources.handlers.*` (plural). Either the other handlers' paths are wrong or this one is. --- ### SPEC-3 (Minor) — Health checking and session-scoped cleanup absent from spec **Severity**: Minor **Category**: SPEC Issue #514 requires health checking (periodic liveness probes) and automatic cleanup on session close / plan completion. The spec has no mention of either for container resources (health checking only exists for LSP servers). These features should be added to the spec or flagged as implementation-ahead-of-spec. --- ### SEC-1 (Medium) — `workspace_folder` not sanitized beyond `os.path.isabs()` **Severity**: Medium **Category**: SEC **File**: `devcontainer.py:165-166` `activate_container` validates `workspace_folder` is absolute but does not canonicalize it. A path like `/tmp/../etc/devcontainer.json` passes validation. This path is passed directly to `subprocess.run(["devcontainer", "up", "--workspace-folder", workspace_folder])`. While `shell=True` is not used, the devcontainer CLI will operate on any directory the process can access. **Recommendation**: Canonicalize with `os.path.realpath()` and optionally restrict to known project directories. --- ### SEC-2 (Minor) — Health probe uses hardcoded fallback path **Severity**: Minor **Category**: SEC **File**: `devcontainer.py:532` `_single_probe` uses `tracker.workspace_path or "/workspace"`. If `workspace_path` is None (tracker in unexpected state), the probe runs against `/workspace` inside an arbitrary container, which could be a different container's workspace. --- ### PERF-1 (Minor) — `_handlers()` rebuilds dispatch dict on every ACP request **Severity**: Minor **Category**: PERF **File**: `acp/facade.py:194-208` `_route_operation` calls `self._handlers()` which builds a new dict on every dispatch. For a hot path (every ACP request), this should be a cached property or class-level constant. --- ### TEST-1 (Major) — Concurrency tests lack `threading.Barrier` and post-condition invariants **Severity**: Major **Category**: TEST **File**: `features/devcontainer_lifecycle.feature:468,558` Both concurrency scenarios ("Concurrent activation rejected" and "Concurrent stop handled safely") start threads sequentially without a `threading.Barrier`. The first thread may complete before the second starts, making these sequential tests in disguise. Compare with `tool_runtime_steps.py:229` in this codebase which properly uses `threading.Barrier`. The concurrent stop test also does not assert: (a) exactly 1 `docker stop` call was made (not 0 or 2), or (b) the final registry state is exactly 1 tracker in `stopped` state. --- ### TEST-2 (Medium) — 5 sham/structural-only test scenarios **Severity**: Medium **Category**: TEST - "CleanupService has stop_active_devcontainers method" (line 137) — `hasattr` check only - "DevcontainerHandler has expected class attributes" (line 382) — static attribute check - "DevcontainerHandler extends BaseResourceHandler" (line 386) — `issubclass` check - "CLI stop with --yes skips confirmation" (line 511) — patches out production code, asserts exit code 0 - "CLI rebuild with --yes skips confirmation" (line 516) — same These test structural properties, not behavior. The first three would fail at import time if violated. --- ### TEST-3 (Medium) — Coverage gaps: `list_active_containers_for_session` untested directly **Severity**: Medium **Category**: TEST **File**: `devcontainer.py:619-643` `list_active_containers_for_session` has no direct test scenario. Its return value is never asserted. The empty-session-id fallback and `session_id is None` matching logic are not explicitly verified. Also untested: `evict_terminal_trackers` when below threshold (should return 0), `get_lifecycle_tracker` auto-creation behavior, `clear_lifecycle_registry` thread-join behavior. --- ### TEST-4 (Minor) — Mock does not validate subprocess kwargs **Severity**: Minor **Category**: TEST **File**: `features/mocks/mock_devcontainer_cli.py:65` `MockDevcontainerRunner.__call__` records but never validates kwargs (`capture_output`, `text`, `timeout`, `check`). A bug where `check=True` is set (causing `CalledProcessError` in production) would pass all tests. The mock also returns the same container ID for all workspaces, preventing detection of workspace-mismatch bugs. --- ### CODE-1 (Minor) — `vulture_whitelist.py` whitelists private symbols **Severity**: Minor **Category**: CODE **File**: `vulture_whitelist.py:538-545` The whitelist includes private symbols (`_stop_health_check`, `_health_check_loop`, `_single_probe`, `_lifecycle_registry`, `_registry_lock`, etc.). Private symbols should not need vulture whitelisting — if vulture flags them as unused, they may be dead code or test-only utilities that should be restructured.
CoreRasurae force-pushed feature/m6plus-devcontainer-lifecycle from 196e1415c0
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 20s
CI / security (pull_request) Successful in 32s
CI / typecheck (pull_request) Successful in 35s
CI / unit_tests (pull_request) Successful in 2m20s
CI / integration_tests (pull_request) Successful in 2m52s
CI / docker (pull_request) Successful in 38s
CI / coverage (pull_request) Successful in 4m7s
CI / benchmark-regression (pull_request) Successful in 25m48s
to ddcc22608e
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 21s
CI / quality (pull_request) Successful in 18s
CI / build (pull_request) Successful in 17s
CI / security (pull_request) Successful in 36s
CI / typecheck (pull_request) Successful in 1m8s
CI / unit_tests (pull_request) Successful in 3m31s
CI / docker (pull_request) Successful in 41s
CI / integration_tests (pull_request) Successful in 4m20s
CI / coverage (pull_request) Successful in 4m23s
CI / benchmark-regression (pull_request) Successful in 25m47s
2026-03-06 15:42:08 +00:00
Compare
hamza.khyari approved these changes 2026-03-06 17:45:52 +00:00
Dismissed
hamza.khyari left a comment

update

update
brent.edwards requested changes 2026-03-06 19:39:22 +00:00
Dismissed
brent.edwards left a comment

Review — PR #613 feature/m6plus-devcontainer-lifecycle

Reviewer: brent.edwards | Round: 2 (complementing hamza.khyari's Round 1 review #2010) | Commit: 870ddb31

Quality Gates

Gate Result
nox -s lint PASS
nox -s typecheck PASS (0 errors)

Note on Branch Freshness

This branch is forked from a60cda1f and is ~140 files behind current master. The git diff origin/master..HEAD shows those files as "deletions" but they are NOT part of the commit — a merge would correctly retain master's additions. However, I recommend rebasing before merge to avoid surprises and ensure quality gates run against the latest codebase.

Findings Summary

ID Sev Category File Description
F1 P1 Bug devcontainer.py:121-215 TOCTOU race in activate_container — concurrent calls read+transition without holding the lock across the full operation, spawning duplicate containers
F2 P1 Bug devcontainer.py:345-350 _stop_health_check sets the stop event but never joins the thread — health check loop can race with stop/activate transitions
F3 P1 Process N/A Missing CHANGELOG entry for #514 — CONTRIBUTING requires every PR to prepend to ## Unreleased
F4 P1 Spec cleanup_service.py:510-527 Cleanup wiring incomplete — stop_active_devcontainers method exists but is not wired into session close or plan completion hooks. PR description claims: "AcpLocalFacade._handle_session_close() → session-scoped container cleanup" and "PlanLifecycleService.complete_apply()/fail_apply()/..." but these hooks don't exist in the code.
F5 P2 Bug devcontainer.py:58-59, 330-350 _health_check_threads and _health_check_stop_events are plain dicts accessed from multiple threads (health check background threads + main thread) without _registry_lock or a separate lock
F6 P2 Process PR description PR description is massively stale — describes states detected/building/running/failed but code uses inactive/starting/active/error; claims 794-line handler (actual: 489), 668-line feature file with 93 scenarios (actual: 290 lines). Misleads reviewers.
F7 P2 Policy devcontainer_lifecycle_steps.py File is 739 lines — exceeds 500-line limit per CONTRIBUTING
F8 P2 Policy cleanup_service.py File pushed to 543 lines — exceeds 500-line limit
F9 P2 Policy devcontainer_lifecycle_steps.py 9 # type: ignore inline suppressions — CONTRIBUTING prohibits these
F10 P2 Code devcontainer.py:284-311 rebuild_container is a pass-through to activate_container with no actual rebuild logic — does not pass --reset-container or --no-cache to devcontainer up, so it's functionally identical to activate
F11 P2 Policy cleanup_service.py:523 Inline import inside function body — CONTRIBUTING requires all imports at file top (only if TYPE_CHECKING: blocks are exempt)

Verdict

REQUEST_CHANGES — 4 P1 findings. The TOCTOU race (F1) and thread join omission (F2) are correctness bugs that will manifest under concurrent usage. The missing CHANGELOG (F3) and unwired cleanup hooks (F4) are process/spec gaps. The stale PR description (F6) needs updating to match the actual implementation.

What Went Well

  • Clean domain model with well-defined state machine and transition validation
  • Good use of threading.Event for health check cancellation
  • Thorough BDD scenarios covering error paths and edge cases
  • Lint and typecheck pass cleanly
  • Commit message follows Conventional Changelog format with issue reference
## Review — PR #613 `feature/m6plus-devcontainer-lifecycle` **Reviewer**: brent.edwards | **Round**: 2 (complementing hamza.khyari's Round 1 review #2010) | **Commit**: `870ddb31` ### Quality Gates | Gate | Result | |------|--------| | `nox -s lint` | PASS | | `nox -s typecheck` | PASS (0 errors) | ### Note on Branch Freshness This branch is forked from `a60cda1f` and is ~140 files behind current master. The `git diff origin/master..HEAD` shows those files as "deletions" but they are NOT part of the commit — a merge would correctly retain master's additions. However, I recommend rebasing before merge to avoid surprises and ensure quality gates run against the latest codebase. ### Findings Summary | ID | Sev | Category | File | Description | |:---|:----|:---------|:-----|:------------| | F1 | P1 | Bug | `devcontainer.py:121-215` | TOCTOU race in `activate_container` — concurrent calls read+transition without holding the lock across the full operation, spawning duplicate containers | | F2 | P1 | Bug | `devcontainer.py:345-350` | `_stop_health_check` sets the stop event but never joins the thread — health check loop can race with stop/activate transitions | | F3 | P1 | Process | N/A | Missing CHANGELOG entry for #514 — CONTRIBUTING requires every PR to prepend to `## Unreleased` | | F4 | P1 | Spec | `cleanup_service.py:510-527` | Cleanup wiring incomplete — `stop_active_devcontainers` method exists but is not wired into session close or plan completion hooks. PR description claims: "AcpLocalFacade._handle_session_close() → session-scoped container cleanup" and "PlanLifecycleService.complete_apply()/fail_apply()/..." but these hooks don't exist in the code. | | F5 | P2 | Bug | `devcontainer.py:58-59, 330-350` | `_health_check_threads` and `_health_check_stop_events` are plain dicts accessed from multiple threads (health check background threads + main thread) without `_registry_lock` or a separate lock | | F6 | P2 | Process | PR description | PR description is massively stale — describes states `detected/building/running/failed` but code uses `inactive/starting/active/error`; claims 794-line handler (actual: 489), 668-line feature file with 93 scenarios (actual: 290 lines). Misleads reviewers. | | F7 | P2 | Policy | `devcontainer_lifecycle_steps.py` | File is 739 lines — exceeds 500-line limit per CONTRIBUTING | | F8 | P2 | Policy | `cleanup_service.py` | File pushed to 543 lines — exceeds 500-line limit | | F9 | P2 | Policy | `devcontainer_lifecycle_steps.py` | 9 `# type: ignore` inline suppressions — CONTRIBUTING prohibits these | | F10 | P2 | Code | `devcontainer.py:284-311` | `rebuild_container` is a pass-through to `activate_container` with no actual rebuild logic — does not pass `--reset-container` or `--no-cache` to `devcontainer up`, so it's functionally identical to activate | | F11 | P2 | Policy | `cleanup_service.py:523` | Inline import inside function body — CONTRIBUTING requires all imports at file top (only `if TYPE_CHECKING:` blocks are exempt) | ### Verdict **REQUEST_CHANGES** — 4 P1 findings. The TOCTOU race (F1) and thread join omission (F2) are correctness bugs that will manifest under concurrent usage. The missing CHANGELOG (F3) and unwired cleanup hooks (F4) are process/spec gaps. The stale PR description (F6) needs updating to match the actual implementation. ### What Went Well - Clean domain model with well-defined state machine and transition validation - Good use of `threading.Event` for health check cancellation - Thorough BDD scenarios covering error paths and edge cases - Lint and typecheck pass cleanly - Commit message follows Conventional Changelog format with issue reference
@ -0,0 +1,1720 @@
"""Step definitions for devcontainer_lifecycle.feature.
Member

F7 [P2 · Policy] — This file is 739 lines, exceeding the 500-line limit per CONTRIBUTING. Consider splitting into separate step files (e.g., devcontainer_activation_steps.py, devcontainer_health_check_steps.py, devcontainer_cleanup_steps.py).

F9 [P2 · Policy] — 9 # type: ignore inline suppressions at lines 336, 492, 501, 511, 570, 584, 598, 631, 731. CONTRIBUTING prohibits these. Most are passing wrong types to test error paths — use explicit type-correct wrappers or cast() to avoid the suppressions.

**F7 [P2 · Policy]** — This file is 739 lines, exceeding the 500-line limit per CONTRIBUTING. Consider splitting into separate step files (e.g., `devcontainer_activation_steps.py`, `devcontainer_health_check_steps.py`, `devcontainer_cleanup_steps.py`). **F9 [P2 · Policy]** — 9 `# type: ignore` inline suppressions at lines 336, 492, 501, 511, 570, 584, 598, 631, 731. CONTRIBUTING prohibits these. Most are passing wrong types to test error paths — use explicit type-correct wrappers or `cast()` to avoid the suppressions.
@ -507,0 +507,4 @@
# ── Devcontainer lifecycle cleanup (issue #514) ─────────────
@staticmethod
def stop_active_devcontainers(session_id: str = "") -> list[str]:
Member

F4 [P1 · Spec]stop_active_devcontainers is defined here but not wired into any session close or plan completion hook. The PR description claims wiring into AcpLocalFacade._handle_session_close() and PlanLifecycleService.complete_apply()/fail_apply()/fail_execute()/cancel_plan(), but none of these hooks exist in the code. This method is dead code in the current implementation.

F11 [P2 · Policy] — Line 523: inline from import inside a function body. CONTRIBUTING requires all imports at file top except if TYPE_CHECKING: blocks. Move this import to the top of the file.

**F4 [P1 · Spec]** — `stop_active_devcontainers` is defined here but not wired into any session close or plan completion hook. The PR description claims wiring into `AcpLocalFacade._handle_session_close()` and `PlanLifecycleService.complete_apply()/fail_apply()/fail_execute()/cancel_plan()`, but none of these hooks exist in the code. This method is dead code in the current implementation. **F11 [P2 · Policy]** — Line 523: inline `from` import inside a function body. CONTRIBUTING requires all imports at file top except `if TYPE_CHECKING:` blocks. Move this import to the top of the file.
@ -28,0 +55,4 @@
#: S1 fix: regex pattern for valid Docker container IDs (12-64 hex chars).
_CONTAINER_ID_PATTERN = re.compile(r"^[a-f0-9]{12,64}$")
#: Registry of lifecycle trackers keyed by resource_id.
Member

F5 [P2 · Bug]_health_check_threads and _health_check_stop_events are plain dicts mutated by start_health_check (main thread), _stop_health_check (main thread or stop_all_active_containers), and clear_lifecycle_registry. These mutations happen without holding _registry_lock or any other lock. If a health check thread is being started while another is being stopped for the same resource, dict mutations can race.

Fix: either protect these dicts with _registry_lock or introduce a dedicated _health_check_lock.

**F5 [P2 · Bug]** — `_health_check_threads` and `_health_check_stop_events` are plain dicts mutated by `start_health_check` (main thread), `_stop_health_check` (main thread or stop_all_active_containers), and `clear_lifecycle_registry`. These mutations happen without holding `_registry_lock` or any other lock. If a health check thread is being started while another is being stopped for the same resource, dict mutations can race. Fix: either protect these dicts with `_registry_lock` or introduce a dedicated `_health_check_lock`.
@ -28,0 +146,4 @@
resource_id: ULID of the devcontainer resource.
workspace_folder: Path to the workspace containing devcontainer config.
run_command: Optional callable replacing subprocess.run for testing.
session_id: Optional session/plan ID to associate with this container
Member

F1 [P1 · Bug] — TOCTOU race in activate_container. The function calls get_lifecycle_tracker() at line 149 (which acquires/releases _registry_lock), then transition_state() + set_lifecycle_tracker() at lines 152-157 (separate lock acquisition), then spawns devcontainer up without any lock held. Two concurrent calls for the same resource_id will both see state=inactive, both transition to STARTING, and both spawn devcontainer up.

Fix: either (a) hold _registry_lock across the read-check-transition-spawn sequence (simplest but blocks other resources), or (b) use per-resource locks so only one activation per resource can proceed at a time.

**F1 [P1 · Bug]** — TOCTOU race in `activate_container`. The function calls `get_lifecycle_tracker()` at line 149 (which acquires/releases `_registry_lock`), then `transition_state()` + `set_lifecycle_tracker()` at lines 152-157 (separate lock acquisition), then spawns `devcontainer up` without any lock held. Two concurrent calls for the same `resource_id` will both see state=inactive, both transition to STARTING, and both spawn `devcontainer up`. Fix: either (a) hold `_registry_lock` across the read-check-transition-spawn sequence (simplest but blocks other resources), or (b) use per-resource locks so only one activation per resource can proceed at a time.
@ -28,0 +304,4 @@
tracker = get_lifecycle_tracker(resource_id)
tracker = transition_state(
tracker,
ContainerLifecycleState.FAILED,
Member

F10 [P2 · Code]rebuild_container simply delegates to activate_container with no rebuild-specific behavior. A real rebuild should pass --reset-container or --no-cache to devcontainer up to force image/container recreation. As implemented, this is indistinguishable from a re-activate after stop, which undermines the CLI distinction between resource stop && resource rebuild vs. just re-activating.

**F10 [P2 · Code]** — `rebuild_container` simply delegates to `activate_container` with no rebuild-specific behavior. A real rebuild should pass `--reset-container` or `--no-cache` to `devcontainer up` to force image/container recreation. As implemented, this is indistinguishable from a re-activate after stop, which undermines the CLI distinction between `resource stop && resource rebuild` vs. just re-activating.
@ -28,0 +342,4 @@
Raises:
ValueError: If the container is not in ``running`` state.
RuntimeError: If stop fails.
Member

F2 [P1 · Bug]_stop_health_check sets the event and removes the thread from the dict, but never calls thread.join(). The health check loop at line 353 may still be executing — it could call set_lifecycle_tracker() after the stop transition completes, writing stale state back to the registry.

Fix: call thread.join(timeout=...) after setting the stop event, before proceeding with the stop transition.

**F2 [P1 · Bug]** — `_stop_health_check` sets the event and removes the thread from the dict, but never calls `thread.join()`. The health check loop at line 353 may still be executing — it could call `set_lifecycle_tracker()` after the stop transition completes, writing stale state back to the registry. Fix: call `thread.join(timeout=...)` after setting the stop event, before proceeding with the stop transition.
Member

Supplementary Findings — PR #613 (Second Pass)

Following review #2019, a thorough second pass uncovered two additional P2 findings:

F12 [P2 · Bug] — stop_container ignores docker stop exit code

stop_container (lines 253-270) calls runner(["docker", "stop", container_id], ..., check=False) but discards the return value without checking result.returncode. If docker stop fails (e.g., container already removed, Docker daemon unresponsive with a non-exception timeout), the function still transitions the tracker to STOPPED.

Consequence: the container may still be running while the tracker says stopped. A subsequent activate_container / rebuild_container would start a second container, leaking the first one.

Fix: capture the result and check returncode. If non-zero, transition to ERROR instead of STOPPED, or at least log a warning.

File: src/cleveragents/resource/handlers/devcontainer.py:257

F13 [P2 · Docs] — devcontainer_resources.md claims automatic session/plan cleanup

docs/reference/devcontainer_resources.md lines 152-160 state:

"When a session closes or a plan completes, all active devcontainer-instances associated with that session are stopped automatically."

This isn't true in the implementation. CleanupService.stop_active_devcontainers() exists as a method but is not wired into any session-close or plan-completion hook (as noted in F4). The documentation should either be marked as "planned for future" or removed until the wiring is implemented.

File: docs/reference/devcontainer_resources.md:152-160


Updated totals: 4 P1 + 9 P2 = 13 findings (F1-F13).

## Supplementary Findings — PR #613 (Second Pass) Following review #2019, a thorough second pass uncovered two additional P2 findings: ### F12 [P2 · Bug] — `stop_container` ignores `docker stop` exit code `stop_container` (lines 253-270) calls `runner(["docker", "stop", container_id], ..., check=False)` but **discards the return value** without checking `result.returncode`. If `docker stop` fails (e.g., container already removed, Docker daemon unresponsive with a non-exception timeout), the function still transitions the tracker to `STOPPED`. Consequence: the container may still be running while the tracker says `stopped`. A subsequent `activate_container` / `rebuild_container` would start a second container, leaking the first one. **Fix**: capture the result and check `returncode`. If non-zero, transition to `ERROR` instead of `STOPPED`, or at least log a warning. **File**: `src/cleveragents/resource/handlers/devcontainer.py:257` ### F13 [P2 · Docs] — `devcontainer_resources.md` claims automatic session/plan cleanup `docs/reference/devcontainer_resources.md` lines 152-160 state: > "When a session closes or a plan completes, all active devcontainer-instances associated with that session are stopped automatically." This isn't true in the implementation. `CleanupService.stop_active_devcontainers()` exists as a method but is not wired into any session-close or plan-completion hook (as noted in F4). The documentation should either be marked as "planned for future" or removed until the wiring is implemented. **File**: `docs/reference/devcontainer_resources.md:152-160` --- **Updated totals**: 4 P1 + 9 P2 = 13 findings (F1-F13).
Member

Supplementary Findings — PR #613 (Third Pass)

After reading all posted reviews and re-checking the code, I found two additional P1 bugs that are not covered in prior reviews:

F14 [P1 · Bug] — devcontainer up output parsing fails with real CLI output

activate_container() assumes stdout is pure JSON, but the real devcontainer CLI prints log lines plus a JSON line at the end. The official CLI README example shows log output followed by a JSON object on the last line. Current parser does:

# devcontainer.py:467
json.loads(stdout)

If stdout contains any non-JSON log lines before the JSON block, json.loads fails and returns (None, None). The tracker is still transitioned to ACTIVE with container_id=None, and stop/health check logic breaks.

Fix: parse the last JSON line (scan from the end) or add --output json and parse only JSON. This is explicitly called out in the PR description but not implemented.

File: src/cleveragents/resource/handlers/devcontainer.py:467

F15 [P1 · Bug] — Health checks pass remote workspace path to devcontainer exec

_parse_devcontainer_up_output() stores remoteWorkspaceFolder into tracker.workspace_path. That value is inside the container (e.g., /workspaces/project). But the CLI expects host path for --workspace-folder:

# README example
 devcontainer exec --workspace-folder <host-path>

In _health_check_loop, the command is:

# devcontainer.py:372-374
"devcontainer", "exec", "--workspace-folder", tracker.workspace_path or "/workspace"

If workspace_path is the remote container path (or the fallback /workspace), devcontainer exec will not find the devcontainer config on the host and will fail. This causes health checks to spuriously mark containers ERROR in normal usage.

Fix: store the host workspace folder (the original workspace_folder argument) on the tracker and use that for devcontainer exec. Keep the remote path separately if needed.

File: src/cleveragents/resource/handlers/devcontainer.py:372-374


Updated totals: 6 P1 + 9 P2 = 15 findings (F1-F15).

## Supplementary Findings — PR #613 (Third Pass) After reading all posted reviews and re-checking the code, I found two additional **P1** bugs that are not covered in prior reviews: ### F14 [P1 · Bug] — `devcontainer up` output parsing fails with real CLI output `activate_container()` assumes `stdout` is **pure JSON**, but the real devcontainer CLI prints log lines **plus a JSON line at the end**. The official CLI README example shows log output followed by a JSON object on the last line. Current parser does: ```python # devcontainer.py:467 json.loads(stdout) ``` If stdout contains any non-JSON log lines before the JSON block, `json.loads` fails and returns `(None, None)`. The tracker is still transitioned to `ACTIVE` with `container_id=None`, and stop/health check logic breaks. **Fix**: parse the **last JSON line** (scan from the end) or add `--output json` and parse only JSON. This is explicitly called out in the PR description but not implemented. **File**: `src/cleveragents/resource/handlers/devcontainer.py:467` ### F15 [P1 · Bug] — Health checks pass **remote** workspace path to `devcontainer exec` `_parse_devcontainer_up_output()` stores `remoteWorkspaceFolder` into `tracker.workspace_path`. That value is **inside the container** (e.g., `/workspaces/project`). But the CLI expects **host** path for `--workspace-folder`: ``` # README example devcontainer exec --workspace-folder <host-path> ``` In `_health_check_loop`, the command is: ```python # devcontainer.py:372-374 "devcontainer", "exec", "--workspace-folder", tracker.workspace_path or "/workspace" ``` If `workspace_path` is the remote container path (or the fallback `/workspace`), `devcontainer exec` will not find the devcontainer config on the host and will fail. This causes health checks to spuriously mark containers `ERROR` in normal usage. **Fix**: store the **host workspace folder** (the original `workspace_folder` argument) on the tracker and use that for `devcontainer exec`. Keep the remote path separately if needed. **File**: `src/cleveragents/resource/handlers/devcontainer.py:372-374` --- **Updated totals**: 6 P1 + 9 P2 = 15 findings (F1-F15).
brent.edwards left a comment

Inline Notes — F14/F15

Adding inline comments for the two new P1 findings (F14, F15) described in comment #55338.

## Inline Notes — F14/F15 Adding inline comments for the two new P1 findings (F14, F15) described in comment #55338.
@ -28,0 +371,4 @@
tracker.current_state.value,
)
return tracker
tracker = transition_state(
Member

F15 [P1 · Bug] — Health checks call devcontainer exec --workspace-folder using tracker.workspace_path, which is set to remoteWorkspaceFolder (container path) from devcontainer up. The CLI expects the host workspace folder path, so this fails in normal usage and marks containers ERROR.

Fix: store the original host workspace_folder on the tracker and use that for devcontainer exec (keep remote path separately if needed).

**F15 [P1 · Bug]** — Health checks call `devcontainer exec --workspace-folder` using `tracker.workspace_path`, which is set to `remoteWorkspaceFolder` (container path) from `devcontainer up`. The CLI expects the **host** workspace folder path, so this fails in normal usage and marks containers ERROR. Fix: store the original host `workspace_folder` on the tracker and use that for `devcontainer exec` (keep remote path separately if needed).
@ -28,0 +464,4 @@
# associated with the calling session for scoped cleanup.
return activate_container(
resource_id,
workspace_folder,
Member

F14 [P1 · Bug]json.loads(stdout) assumes stdout is pure JSON. The devcontainer CLI prints log lines plus a trailing JSON line (see official CLI README). With real output, this parse fails and yields (None, None), but the tracker still transitions to ACTIVE.

Fix: parse the last JSON line (scan from end) or run devcontainer up --output json and parse only JSON.

**F14 [P1 · Bug]** — `json.loads(stdout)` assumes stdout is pure JSON. The devcontainer CLI prints log lines plus a trailing JSON line (see official CLI README). With real output, this parse fails and yields `(None, None)`, but the tracker still transitions to ACTIVE. Fix: parse the **last JSON line** (scan from end) or run `devcontainer up --output json` and parse only JSON.
brent.edwards left a comment

Additional Findings — PR #613 (Follow-up)

Found three additional P1 issues after re-scanning the full diff. Inline notes below.

ID Sev Category Summary
F16 P1 Bug Lazy activation is not wired into DevcontainerHandler.resolve (no call to activate_container). Feature is defined but unused in production code.
F17 P1 Bug Health checks are never started — start_health_check is defined but never invoked after activation.
F18 P1 Bug Session cleanup is not session-scoped — stop_all_active_containers ignores session_id (used only for logging) and stops all active containers.
## Additional Findings — PR #613 (Follow-up) Found three additional **P1** issues after re-scanning the full diff. Inline notes below. | ID | Sev | Category | Summary | |:---|:----|:---------|:--------| | F16 | P1 | Bug | Lazy activation is not wired into `DevcontainerHandler.resolve` (no call to `activate_container`). Feature is defined but unused in production code. | | F17 | P1 | Bug | Health checks are never started — `start_health_check` is defined but never invoked after activation. | | F18 | P1 | Bug | Session cleanup is not session-scoped — `stop_all_active_containers` ignores `session_id` (used only for logging) and stops **all** active containers. |
@ -28,0 +194,4 @@
"up",
"--workspace-folder",
workspace_folder,
],
Member

F17 [P1 · Bug] — Health checks never start. After a successful activation, the code transitions to ACTIVE but does not call start_health_check(). A grep shows start_health_check is only defined, never invoked. This means the required periodic liveness probes never run.

Fix: call start_health_check(resource_id, interval=tracker.health_check_interval) immediately after transitioning to ACTIVE (and/or from the tool invocation path).

**F17 [P1 · Bug]** — Health checks never start. After a successful activation, the code transitions to `ACTIVE` but does not call `start_health_check()`. A grep shows `start_health_check` is only defined, never invoked. This means the required periodic liveness probes never run. Fix: call `start_health_check(resource_id, interval=tracker.health_check_interval)` immediately after transitioning to ACTIVE (and/or from the tool invocation path).
@ -28,0 +430,4 @@
)
set_lifecycle_tracker(tracker)
raise RuntimeError(f"Failed to stop container: {exc}") from exc
Member

F18 [P1 · Bug] — Session cleanup is not scoped. stop_all_active_containers() accepts session_id but uses it only for logging; it always calls list_active_containers() (global) and stops all active containers. Issue #514 expects cleanup of containers associated with that session.

Fix: track session_id on ContainerLifecycleTracker (or another mapping) and filter to session-scoped containers here.

**F18 [P1 · Bug]** — Session cleanup is not scoped. `stop_all_active_containers()` accepts `session_id` but uses it only for logging; it always calls `list_active_containers()` (global) and stops **all** active containers. Issue #514 expects cleanup of containers *associated with that session*. Fix: track `session_id` on `ContainerLifecycleTracker` (or another mapping) and filter to session-scoped containers here.
@ -28,0 +473,4 @@
def start_health_check(
resource_id: str,
*,
interval: float = DEFAULT_HEALTH_CHECK_INTERVAL,
Member

F16 [P1 · Bug] — Lazy activation is not wired. DevcontainerHandler doesn’t override resolve() or call activate_container, and activate_container() isn’t referenced anywhere else in src/. This means devcontainer resources are resolved via BaseResourceHandler.resolve() with no lazy activation on first tool target — the primary behavior in issue #514 is effectively unused in production code.

Fix: override DevcontainerHandler.resolve() (or the tool execution path) to call activate_container() when the resource is devcontainer-instance and in inactive/stopped/error state.

**F16 [P1 · Bug]** — Lazy activation is not wired. `DevcontainerHandler` doesn’t override `resolve()` or call `activate_container`, and `activate_container()` isn’t referenced anywhere else in `src/`. This means devcontainer resources are resolved via `BaseResourceHandler.resolve()` with no lazy activation on first tool target — the primary behavior in issue #514 is effectively unused in production code. Fix: override `DevcontainerHandler.resolve()` (or the tool execution path) to call `activate_container()` when the resource is `devcontainer-instance` and in `inactive/stopped/error` state.
brent.edwards left a comment

Additional Finding — PR #613 (Follow-up)

One more P2 issue found on another pass. Inline note below.

## Additional Finding — PR #613 (Follow-up) One more P2 issue found on another pass. Inline note below.
@ -1061,0 +1151,4 @@
raise typer.Abort() from exc
except (ValueError, RuntimeError) as exc:
console.print(f"[red]Stop failed:[/red] {exc}")
raise typer.Abort() from exc
Member

F19 [P2 · Bug] — These commands advertise support for container-instance, but resource add container-instance typically sets no location (only --image). In this path, resource_rebuild aborts with "no location for rebuild" and resource_stop fails because no lifecycle tracker exists (devcontainer activation isn’t wired). This effectively makes stop/rebuild non-functional for container-instance resources.

Fix: either restrict to devcontainer-instance only, or require --path for container-instance and implement activation/lifecycle for that type.

**F19 [P2 · Bug]** — These commands advertise support for `container-instance`, but `resource add container-instance` typically sets no location (only `--image`). In this path, `resource_rebuild` aborts with "no location for rebuild" and `resource_stop` fails because no lifecycle tracker exists (devcontainer activation isn’t wired). This effectively makes stop/rebuild non-functional for container-instance resources. Fix: either restrict to `devcontainer-instance` only, or require `--path` for container-instance and implement activation/lifecycle for that type.
brent.edwards left a comment

Additional Finding — PR #613 (Follow-up)

Found one more P1 issue on another scan. Inline note below.

## Additional Finding — PR #613 (Follow-up) Found one more **P1** issue on another scan. Inline note below.
@ -28,0 +50,4 @@
logger = logging.getLogger(__name__)
#: Default health check interval in seconds.
DEFAULT_HEALTH_CHECK_INTERVAL: float = 30.0
Member

F20 [P1 · Bug] — Lifecycle state is stored only in an in-memory module dict (_lifecycle_registry). This means state is process-local: a new agents CLI invocation cannot see trackers created in a previous process. As a result, agents resource stop/rebuild cannot stop containers started by earlier commands (it will create a new INACTIVE tracker and fail the transition). This also violates the acceptance criterion: “Track container lifecycle state transitions in the resource registry.”

Fix: persist lifecycle state in the Resource registry (DB) or another shared store (e.g., resource.metadata_json) so state survives process boundaries. At minimum, store container_id, current_state, and workspace_folder per resource and hydrate the tracker on CLI operations.

**F20 [P1 · Bug]** — Lifecycle state is stored only in an in-memory module dict (`_lifecycle_registry`). This means state is **process-local**: a new `agents` CLI invocation cannot see trackers created in a previous process. As a result, `agents resource stop/rebuild` cannot stop containers started by earlier commands (it will create a new INACTIVE tracker and fail the transition). This also violates the acceptance criterion: *“Track container lifecycle state transitions in the resource registry.”* Fix: persist lifecycle state in the Resource registry (DB) or another shared store (e.g., `resource.metadata_json`) so state survives process boundaries. At minimum, store `container_id`, `current_state`, and `workspace_folder` per resource and hydrate the tracker on CLI operations.
brent.edwards left a comment

Additional Finding — PR #613 (Follow-up)

Found one more P1 compatibility bug. Inline note below.

## Additional Finding — PR #613 (Follow-up) Found one more **P1** compatibility bug. Inline note below.
@ -28,0 +254,4 @@
start_health_check(resource_id, run_command=run_command)
except RuntimeError:
raise
Member

F21 [P1 · Bug]stop_container hardcodes docker stop. Devcontainer CLI can target Podman or other engines; this will fail in non-Docker environments (and conflicts with ADR-039 engine-agnostic container types).

Fix: use the devcontainer CLI (if/when stop is supported), or capture the engine from resource properties and dispatch to podman stop / docker stop accordingly.

**F21 [P1 · Bug]** — `stop_container` hardcodes `docker stop`. Devcontainer CLI can target Podman or other engines; this will fail in non-Docker environments (and conflicts with ADR-039 engine-agnostic container types). Fix: use the devcontainer CLI (if/when stop is supported), or capture the engine from resource properties and dispatch to `podman stop` / `docker stop` accordingly.
Member

Additional Findings — PR #613 (Continued Scan)

Two more issues found on another pass:

F22 [P1 · Bug] — Devcontainer resources cannot resolve: snapshot sandbox not implemented

DevcontainerHandler uses SandboxStrategy.SNAPSHOT (see src/cleveragents/resource/handlers/devcontainer.py:488), but SandboxFactory.create_sandbox() explicitly raises NotImplementedError for snapshot (see src/cleveragents/infrastructure/sandbox/factory.py:66,119).

Result: any attempt to resolve a devcontainer-instance (or container-instance using the same handler) will fail at runtime with NotImplementedError. This blocks actual use of the resource type even if lifecycle bugs are fixed.

Fix: implement snapshot strategy in SandboxFactory or route devcontainer resources to a supported sandbox strategy.

F23 [P2 · Process/Docs] — Auto-discovery is documented but not wired

docs/reference/devcontainer_resources.md:60-80 says linking a git-checkout/fs-directory triggers auto-discovery and registers devcontainer-instance + devcontainer-file children. I can’t find any code path that invokes discover_devcontainers() or auto_discover_children() during project link-resource. link_resource simply creates the DB link and returns. Also, the built-in type’s auto_discovery config doesn’t match the schema expected by auto_discover_children (it expects enabled + rules).

Either the docs need to be updated to mark this as future work, or the auto-discovery hook needs to be implemented.


If you want, I can add inline notes for these as well.

## Additional Findings — PR #613 (Continued Scan) Two more issues found on another pass: ### F22 [P1 · Bug] — Devcontainer resources cannot resolve: snapshot sandbox not implemented `DevcontainerHandler` uses `SandboxStrategy.SNAPSHOT` (see `src/cleveragents/resource/handlers/devcontainer.py:488`), but `SandboxFactory.create_sandbox()` explicitly raises `NotImplementedError` for `snapshot` (see `src/cleveragents/infrastructure/sandbox/factory.py:66,119`). Result: any attempt to resolve a `devcontainer-instance` (or `container-instance` using the same handler) will fail at runtime with `NotImplementedError`. This blocks actual use of the resource type even if lifecycle bugs are fixed. **Fix**: implement snapshot strategy in `SandboxFactory` or route devcontainer resources to a supported sandbox strategy. ### F23 [P2 · Process/Docs] — Auto-discovery is documented but not wired `docs/reference/devcontainer_resources.md:60-80` says linking a `git-checkout`/`fs-directory` triggers auto-discovery and registers `devcontainer-instance` + `devcontainer-file` children. I can’t find any code path that invokes `discover_devcontainers()` or `auto_discover_children()` during `project link-resource`. `link_resource` simply creates the DB link and returns. Also, the built-in type’s `auto_discovery` config doesn’t match the schema expected by `auto_discover_children` (it expects `enabled` + `rules`). Either the docs need to be updated to mark this as future work, or the auto-discovery hook needs to be implemented. --- If you want, I can add inline notes for these as well.
Member

Additional Finding — PR #613 (Continued Scan)

F24 [P1 · Bug] — Container execution is stubbed, so lazy activation can’t trigger on tool use

Issue #514’s acceptance criteria say “Devcontainer instances remain inactive until a tool execution targets them. At that point, the container is started automatically.” But the tool execution path currently hard‑fails whenever the execution environment resolves to container.

In src/cleveragents/tool/runner.py:168-178:

if env == ExecutionEnvironment.CONTAINER:
    return ToolResult(
        success=False,
        error=("Container execution is not yet implemented. ..."),
        ...
    )

This means a tool invocation can never reach a devcontainer handler, so lazy activation can’t occur in production. Even if DevcontainerHandler.resolve() were wired to call activate_container, the ToolRunner returns early before any container-handling logic runs.

Fix: wire container execution (and call into devcontainer activation) in the ToolRunner path, or explicitly scope #514 to CLI-only lifecycle management and update the acceptance criteria/docs accordingly.

## Additional Finding — PR #613 (Continued Scan) ### F24 [P1 · Bug] — Container execution is stubbed, so lazy activation can’t trigger on tool use Issue #514’s acceptance criteria say *“Devcontainer instances remain inactive until a tool execution targets them. At that point, the container is started automatically.”* But the tool execution path currently hard‑fails whenever the execution environment resolves to `container`. In `src/cleveragents/tool/runner.py:168-178`: ```python if env == ExecutionEnvironment.CONTAINER: return ToolResult( success=False, error=("Container execution is not yet implemented. ..."), ... ) ``` This means a tool invocation can never reach a devcontainer handler, so lazy activation can’t occur in production. Even if `DevcontainerHandler.resolve()` were wired to call `activate_container`, the ToolRunner returns early before any container-handling logic runs. **Fix**: wire container execution (and call into devcontainer activation) in the ToolRunner path, or explicitly scope #514 to *CLI-only* lifecycle management and update the acceptance criteria/docs accordingly.
brent.edwards left a comment

Additional Finding — PR #613 (Follow-up)

Found one more P2 issue on another scan. Inline note below.

## Additional Finding — PR #613 (Follow-up) Found one more **P2** issue on another scan. Inline note below.
@ -28,0 +485,4 @@
"""
if not resource_id:
raise ValueError("resource_id must not be empty")
Member

F25 [P2 · Bug]DevcontainerHandler hardcodes _default_strategy = SNAPSHOT, which overrides the resource type’s declared sandbox strategy. This means devcontainer-file resources (type default = copy_on_write) will still try to use snapshot because resource.sandbox_strategy is always None when loaded from DB.

Impact: devcontainer-file resolution will use the wrong strategy (and currently fails outright because snapshot isn’t implemented). Even after snapshot is implemented, devcontainer-file would still be sandboxed incorrectly.

Fix: choose strategy based on the resource type default (e.g., look up the ResourceTypeSpec and use its sandbox_strategy), or override resolve() in DevcontainerHandler to branch on resource.resource_type_name.

**F25 [P2 · Bug]** — `DevcontainerHandler` hardcodes `_default_strategy = SNAPSHOT`, which overrides the resource type’s declared sandbox strategy. This means `devcontainer-file` resources (type default = `copy_on_write`) will still try to use `snapshot` because `resource.sandbox_strategy` is always None when loaded from DB. Impact: devcontainer-file resolution will use the wrong strategy (and currently fails outright because snapshot isn’t implemented). Even after snapshot is implemented, devcontainer-file would still be sandboxed incorrectly. Fix: choose strategy based on the resource type default (e.g., look up the ResourceTypeSpec and use its `sandbox_strategy`), or override `resolve()` in `DevcontainerHandler` to branch on `resource.resource_type_name`.
brent.edwards left a comment

Additional Finding — PR #613 (Follow-up)

Found one more P2 issue on another scan. Inline note below.

## Additional Finding — PR #613 (Follow-up) Found one more **P2** issue on another scan. Inline note below.
@ -28,0 +182,4 @@
tracker = transition_state(
tracker,
ContainerLifecycleState.BUILDING,
reason="lazy activation",
Member

F26 [P2 · Bug]activate_container does not validate that containerId and remoteWorkspaceFolder were actually parsed. If the JSON is missing these fields (or outcome is error), the tracker still transitions to ACTIVE with container_id=None and workspace_path=None, causing stop_container to skip docker stop and health checks to misbehave.

Fix: treat missing containerId/remoteWorkspaceFolder (or outcome != success) as activation failure, transition to ERROR, and raise RuntimeError.

**F26 [P2 · Bug]** — `activate_container` does not validate that `containerId` and `remoteWorkspaceFolder` were actually parsed. If the JSON is missing these fields (or outcome is `error`), the tracker still transitions to `ACTIVE` with `container_id=None` and `workspace_path=None`, causing `stop_container` to skip `docker stop` and health checks to misbehave. Fix: treat missing `containerId`/`remoteWorkspaceFolder` (or `outcome != success`) as activation failure, transition to ERROR, and raise RuntimeError.
Owner

PM Status Check — PR #613

Author: @CoreRasurae | Milestone: v3.6.0 (M7) | Mergeable: Yes | Reviews: REQUEST_CHANGES (Brent)

Current State

Brent has provided an exceptionally thorough review across multiple rounds:

  • Round 1 (Hamza, review #2010): 16 findings (5 BUG, 3 SPEC, 2 SEC, 1 PERF, 4 TEST, 1 CODE) — dismissed/stale
  • Round 2 (Brent, review #2019): 11 findings — 4 P1 (TOCTOU race F1, thread join omission F2, missing CHANGELOG F3, unwired cleanup hooks F4) + 7 P2
  • Follow-ups (reviews #2020-#2029): Additional F14-F20 findings including P1 bugs (lazy activation not wired F16, health checks never started F17, session cleanup not session-scoped F18)

Total: ~8 P1 findings and ~12 P2 findings across all rounds.

Concern

@CoreRasurae — There has been no author response to any of Brent's findings. The review was posted ~8 hours ago. This PR has substantial rework needed on the P1 items before it can be considered for merge.

Given this is M7 (v3.6.0, due Mar 28), there is no schedule pressure. However, leaving a thorough review unacknowledged for too long discourages reviewers. Please:

  1. Acknowledge receipt — even a brief "reviewing findings, will address by [date]" is helpful
  2. Prioritize P1 fixes — F1 (TOCTOU), F2 (thread join), F4 (unwired hooks), F16 (lazy activation not wired), F17 (health checks never started)
  3. Update the PR description — Brent noted (F6) that the description is significantly stale vs actual implementation

PR #616 (container-aware tool exec) is blocked on this PR. Resolving the P1s here unblocks that dependency chain.

Action Items

Who Action Deadline
@CoreRasurae Acknowledge review findings Mar 7 EOD
@CoreRasurae Push fixes for P1 items Mar 10
@brent.edwards Re-review after fixes Mar 11
## PM Status Check — PR #613 **Author**: @CoreRasurae | **Milestone**: v3.6.0 (M7) | **Mergeable**: Yes | **Reviews**: REQUEST_CHANGES (Brent) ### Current State Brent has provided an exceptionally thorough review across multiple rounds: - **Round 1** (Hamza, review #2010): 16 findings (5 BUG, 3 SPEC, 2 SEC, 1 PERF, 4 TEST, 1 CODE) — dismissed/stale - **Round 2** (Brent, review #2019): 11 findings — **4 P1** (TOCTOU race F1, thread join omission F2, missing CHANGELOG F3, unwired cleanup hooks F4) + 7 P2 - **Follow-ups** (reviews #2020-#2029): Additional **F14-F20** findings including P1 bugs (lazy activation not wired F16, health checks never started F17, session cleanup not session-scoped F18) **Total: ~8 P1 findings and ~12 P2 findings across all rounds.** ### Concern @CoreRasurae — There has been no author response to any of Brent's findings. The review was posted ~8 hours ago. This PR has substantial rework needed on the P1 items before it can be considered for merge. Given this is M7 (v3.6.0, due Mar 28), there is no schedule pressure. However, leaving a thorough review unacknowledged for too long discourages reviewers. Please: 1. **Acknowledge receipt** — even a brief "reviewing findings, will address by [date]" is helpful 2. **Prioritize P1 fixes** — F1 (TOCTOU), F2 (thread join), F4 (unwired hooks), F16 (lazy activation not wired), F17 (health checks never started) 3. **Update the PR description** — Brent noted (F6) that the description is significantly stale vs actual implementation PR #616 (container-aware tool exec) is **blocked on this PR**. Resolving the P1s here unblocks that dependency chain. ### Action Items | Who | Action | Deadline | |:----|:-------|:---------| | @CoreRasurae | Acknowledge review findings | Mar 7 EOD | | @CoreRasurae | Push fixes for P1 items | Mar 10 | | @brent.edwards | Re-review after fixes | Mar 11 |
CoreRasurae force-pushed feature/m6plus-devcontainer-lifecycle from ddcc22608e
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 21s
CI / quality (pull_request) Successful in 18s
CI / build (pull_request) Successful in 17s
CI / security (pull_request) Successful in 36s
CI / typecheck (pull_request) Successful in 1m8s
CI / unit_tests (pull_request) Successful in 3m31s
CI / docker (pull_request) Successful in 41s
CI / integration_tests (pull_request) Successful in 4m20s
CI / coverage (pull_request) Successful in 4m23s
CI / benchmark-regression (pull_request) Successful in 25m47s
to bbeef00b1a
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 14s
CI / lint (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 17s
CI / typecheck (pull_request) Successful in 33s
CI / security (pull_request) Successful in 39s
CI / unit_tests (pull_request) Successful in 2m24s
CI / integration_tests (pull_request) Failing after 2m55s
CI / docker (pull_request) Successful in 39s
CI / coverage (pull_request) Successful in 4m9s
CI / benchmark-regression (pull_request) Successful in 26m29s
2026-03-07 22:23:34 +00:00
Compare
freemo left a comment

Review Status Summary — PR #613 (Devcontainer Lifecycle)

  • Branch naming: uses feature/ prefix. Correct for feature work per CONTRIBUTING.md.
  • Mergeable: no conflicts detected.
  • Review findings: 16 comments, 15+ findings noted across review cycles.
  • Author engagement: @CoreRasurae is actively responding to feedback.

Active review in progress. 15+ findings noted across review cycles. Author (@CoreRasurae) is responding to feedback. Continue addressing findings.

**Review Status Summary — PR #613 (Devcontainer Lifecycle)** - **Branch naming:** uses `feature/` prefix. Correct for feature work per CONTRIBUTING.md. - **Mergeable:** no conflicts detected. - **Review findings:** 16 comments, 15+ findings noted across review cycles. - **Author engagement:** @CoreRasurae is actively responding to feedback. Active review in progress. 15+ findings noted across review cycles. Author (@CoreRasurae) is responding to feedback. Continue addressing findings.
brent.edwards requested changes 2026-03-08 05:09:15 +00:00
Dismissed
brent.edwards left a comment

Review — PR #613 feature/m6plus-devcontainer-lifecycle (Round 3)

Reviewer: brent.edwards | Commit: bbeef00b | Issue: #514
Prior reviews: hamza.khyari #2010 (Round 1), brent.edwards #2019 + follow-ups F12–F24 (Round 2)


Quality Gates

Gate Result
nox -s lint PASS
nox -s typecheck PASS (0 errors)
nox -s unit_tests PASS (8,208 scenarios, 0 failures)

Prior Findings Verification

Good progress — most of my Round 2 P1/P2 findings have been addressed:

Prior ID Finding Status
F1 TOCTOU race in activate_container FIXED — lock held across get→transition→set
F2 _stop_health_check never joins thread FIXEDthread.join(timeout=2.0) added
F3 Missing CHANGELOG entry FIXED — entry added at line 53
F4 Cleanup hooks not wired FIXED — session close + plan lifecycle hooks wired
F5 _health_check_threads unprotected FIXED — operations on health check dicts are now under _registry_lock
F10 rebuild_container identical to activate FIXED — passes --reset-container via rebuild=True
F11 Inline import in cleanup_service.py FIXEDstop_all_active_containers imported at module top
F14 JSON parsing fails with real CLI output FIXED_parse_devcontainer_up_output scans from last line
F15 Health checks use remote workspace path FIXED — uses host_workspace_folder parameter
F16 Lazy activation not wired into resolve() FIXEDDevcontainerHandler.resolve() calls activate_container
F17 Health checks never started FIXED — called after successful activation
F22 Snapshot sandbox not implemented Appears unchanged — flagged as known limitation

Not all prior findings could be verified due to the restructured code, but the major bugs have been addressed.


New Findings

ID Sev Category File Description
F25 P0 CI CI Run #1307 Integration tests fail in CI — 2 Robot suites have failures (see details below)
F26 P1 Bug robot/cleanup.robot:20-26 Cleanup Robot test broken by model extractionCleanupReport, StaleItem, ResourceCleanupSummary moved to cleanup_models.py but Robot test still reads cleanup_service.py and asserts Should Contain ${content} class CleanupReport
F27 P1 Process N/A Branch is ~160 commits behind master — merge base a60cda1f is far from current master (620adfee). BDD scenario count is 8,208 vs master's 9,029. Quality gates pass on the branch but don't validate against current master code.
F28 P2 Policy Multiple files 4 files exceed 500-line limitdevcontainer.py (926), resource.py (1,233), plan_lifecycle_service.py (1,499), devcontainer_lifecycle.feature (762)
F29 P2 Policy robot/helper_devcontainer_lifecycle.py:48,49,53,61 4 new # type: ignore suppressions — CONTRIBUTING prohibits inline # type: ignore. The conditional import pattern can be refactored to avoid these.
F30 P2 Policy 3 step files 13 inline imports in function bodiesdevcontainer_lifecycle_steps.py (4), devcontainer_cleanup_steps.py (5), devcontainer_health_check_steps.py (4). CONTRIBUTING requires all imports at file top (only if TYPE_CHECKING: exempt).
F31 P2 Docs devcontainer_resources.md:60-80 Auto-discovery described as working but not wired — main body says "The discovery module checks for configuration files..." as if functional, but Known Limitations at line 245 contradicts this. The main body should say "planned" or be removed.
F32 P2 Process PR description PR description is still stale — claims 794-line handler (actual: 926), 668-line feature file with 93 scenarios (actual: 762 lines, ~72 scenarios), and the state names in the description still show the old vocabulary in places. (Raised as F6 in Round 2.)

F25 — CI Integration Test Failures (P0)

The CI run at https://git.cleverthis.com/cleveragents/cleveragents-core/actions/runs/1307/jobs/5/attempt/1 shows 2 failing Robot test suites:

Failure 1 — Robot.Cleanup (1 of 9 failed):

Cleanup Service Exports Expected Symbols fails because this PR extracted CleanupReport, StaleItem, and ResourceCleanupSummary from cleanup_service.py into the new cleanup_models.py. The Robot test at robot/cleanup.robot:20-26 reads the content of cleanup_service.py and asserts:

Should Contain    ${content}    class CleanupReport
Should Contain    ${content}    class StaleItem
Should Contain    ${content}    class ResourceCleanupSummary

Since these classes now live in cleanup_models.py, the assertions fail. The CI log confirms the full cleanup_service.py content is dumped and does not contain class CleanupReport.

Fix: Update robot/cleanup.robot to check cleanup_models.py for the extracted classes, or check that cleanup_service.py re-exports them.

Failure 2 — Robot.Devcontainer Handler (1 of 10 failed):

DevcontainerHandler Default Strategy fails with 1 != 0 (exit code 1). The helper at robot/helper_devcontainer_handler.py:78 asserts handler._default_strategy.value == "snapshot". This likely fails due to an import or attribute error when running against the merged codebase in CI. This needs investigation — it may be related to the branch being far behind master.


F26 — Cleanup Robot Test Broken (P1)

See F25, Failure 1. The robot/cleanup.robot test was not updated to reflect the extraction of model classes into cleanup_models.py. This is a regression introduced by this PR.


F28 — File Length Violations (P2)

CONTRIBUTING.md requires files to stay under 500 lines. Four files in this PR exceed the limit:

File Lines Over by
src/cleveragents/resource/handlers/devcontainer.py 926 +426
src/cleveragents/cli/commands/resource.py 1,233 +733
src/cleveragents/application/services/plan_lifecycle_service.py 1,499 +999
features/devcontainer_lifecycle.feature 762 +262

resource.py and plan_lifecycle_service.py were already over the limit before this PR, but this PR adds significantly to both. devcontainer.py is a new file that should have been designed within the limit.

Recommendation: Split devcontainer.py into devcontainer_handler.py (the DevcontainerHandler class) and devcontainer_lifecycle.py (registry, activation, stop, rebuild, health check functions). Split the feature file by domain (already done for step files).


Verdict

REQUEST_CHANGES — 1 P0 (CI failures), 2 P1 (broken Robot test, stale branch), 5 P2 (file length, # type: ignore, inline imports, stale docs, stale PR description).

The P0/P1 items are blocking:

  • F25/F26: The robot/cleanup.robot test must be updated to match the model extraction.
  • F27: Rebase onto current master before merge — the branch is ~160 commits behind and CI tests are failing against the merged result.

The P2 items should be addressed or explicitly deferred:

  • F28: File length violations need a plan (split or defer with PM acknowledgment).
  • F29/F30: # type: ignore and inline imports are policy violations.
  • F31/F32: Docs and PR description need updating.

What Went Well

  • Excellent engineering response to Round 2 findings — all 7 P1 bugs from my prior review are fixed
  • Cleanup hooks properly wired into session close and all terminal plan states
  • TOCTOU race eliminated with proper lock scoping
  • Health check thread join added for clean shutdown
  • rebuild_container now sends --reset-container flag
  • SEC-1 (path canonicalization) addressed with os.path.realpath()
  • Concurrency tests now use threading.Barrier per project convention
  • Mock infrastructure validates subprocess kwargs (TEST-4 fix)
  • Good test organization: step files split by domain (lifecycle/activation/cleanup/health-check)
## Review — PR #613 `feature/m6plus-devcontainer-lifecycle` (Round 3) **Reviewer**: brent.edwards | **Commit**: `bbeef00b` | **Issue**: #514 **Prior reviews**: hamza.khyari #2010 (Round 1), brent.edwards #2019 + follow-ups F12–F24 (Round 2) --- ### Quality Gates | Gate | Result | |------|--------| | `nox -s lint` | PASS | | `nox -s typecheck` | PASS (0 errors) | | `nox -s unit_tests` | PASS (8,208 scenarios, 0 failures) | ### Prior Findings Verification Good progress — most of my Round 2 P1/P2 findings have been addressed: | Prior ID | Finding | Status | |:---------|:--------|:-------| | F1 | TOCTOU race in `activate_container` | **FIXED** — lock held across get→transition→set | | F2 | `_stop_health_check` never joins thread | **FIXED** — `thread.join(timeout=2.0)` added | | F3 | Missing CHANGELOG entry | **FIXED** — entry added at line 53 | | F4 | Cleanup hooks not wired | **FIXED** — session close + plan lifecycle hooks wired | | F5 | `_health_check_threads` unprotected | **FIXED** — operations on health check dicts are now under `_registry_lock` | | F10 | `rebuild_container` identical to activate | **FIXED** — passes `--reset-container` via `rebuild=True` | | F11 | Inline import in `cleanup_service.py` | **FIXED** — `stop_all_active_containers` imported at module top | | F14 | JSON parsing fails with real CLI output | **FIXED** — `_parse_devcontainer_up_output` scans from last line | | F15 | Health checks use remote workspace path | **FIXED** — uses `host_workspace_folder` parameter | | F16 | Lazy activation not wired into `resolve()` | **FIXED** — `DevcontainerHandler.resolve()` calls `activate_container` | | F17 | Health checks never started | **FIXED** — called after successful activation | | F22 | Snapshot sandbox not implemented | Appears unchanged — flagged as known limitation | Not all prior findings could be verified due to the restructured code, but the major bugs have been addressed. --- ### New Findings | ID | Sev | Category | File | Description | |:---|:----|:---------|:-----|:------------| | F25 | P0 | CI | [CI Run #1307](https://git.cleverthis.com/cleveragents/cleveragents-core/actions/runs/1307/jobs/5/attempt/1) | **Integration tests fail in CI** — 2 Robot suites have failures (see details below) | | F26 | P1 | Bug | `robot/cleanup.robot:20-26` | **Cleanup Robot test broken by model extraction** — `CleanupReport`, `StaleItem`, `ResourceCleanupSummary` moved to `cleanup_models.py` but Robot test still reads `cleanup_service.py` and asserts `Should Contain ${content} class CleanupReport` | | F27 | P1 | Process | N/A | **Branch is ~160 commits behind master** — merge base `a60cda1f` is far from current master (`620adfee`). BDD scenario count is 8,208 vs master's 9,029. Quality gates pass on the branch but don't validate against current master code. | | F28 | P2 | Policy | Multiple files | **4 files exceed 500-line limit** — `devcontainer.py` (926), `resource.py` (1,233), `plan_lifecycle_service.py` (1,499), `devcontainer_lifecycle.feature` (762) | | F29 | P2 | Policy | `robot/helper_devcontainer_lifecycle.py:48,49,53,61` | **4 new `# type: ignore` suppressions** — CONTRIBUTING prohibits inline `# type: ignore`. The conditional import pattern can be refactored to avoid these. | | F30 | P2 | Policy | 3 step files | **13 inline imports in function bodies** — `devcontainer_lifecycle_steps.py` (4), `devcontainer_cleanup_steps.py` (5), `devcontainer_health_check_steps.py` (4). CONTRIBUTING requires all imports at file top (only `if TYPE_CHECKING:` exempt). | | F31 | P2 | Docs | `devcontainer_resources.md:60-80` | **Auto-discovery described as working but not wired** — main body says "The discovery module checks for configuration files..." as if functional, but Known Limitations at line 245 contradicts this. The main body should say "planned" or be removed. | | F32 | P2 | Process | PR description | **PR description is still stale** — claims 794-line handler (actual: 926), 668-line feature file with 93 scenarios (actual: 762 lines, ~72 scenarios), and the state names in the description still show the old vocabulary in places. (Raised as F6 in Round 2.) | --- ### F25 — CI Integration Test Failures (P0) The CI run at https://git.cleverthis.com/cleveragents/cleveragents-core/actions/runs/1307/jobs/5/attempt/1 shows **2 failing Robot test suites**: **Failure 1 — `Robot.Cleanup` (1 of 9 failed):** `Cleanup Service Exports Expected Symbols` fails because this PR extracted `CleanupReport`, `StaleItem`, and `ResourceCleanupSummary` from `cleanup_service.py` into the new `cleanup_models.py`. The Robot test at `robot/cleanup.robot:20-26` reads the content of `cleanup_service.py` and asserts: ```robot Should Contain ${content} class CleanupReport Should Contain ${content} class StaleItem Should Contain ${content} class ResourceCleanupSummary ``` Since these classes now live in `cleanup_models.py`, the assertions fail. The CI log confirms the full `cleanup_service.py` content is dumped and does not contain `class CleanupReport`. **Fix:** Update `robot/cleanup.robot` to check `cleanup_models.py` for the extracted classes, or check that `cleanup_service.py` re-exports them. **Failure 2 — `Robot.Devcontainer Handler` (1 of 10 failed):** `DevcontainerHandler Default Strategy` fails with `1 != 0` (exit code 1). The helper at `robot/helper_devcontainer_handler.py:78` asserts `handler._default_strategy.value == "snapshot"`. This likely fails due to an import or attribute error when running against the merged codebase in CI. This needs investigation — it may be related to the branch being far behind master. --- ### F26 — Cleanup Robot Test Broken (P1) See F25, Failure 1. The `robot/cleanup.robot` test was not updated to reflect the extraction of model classes into `cleanup_models.py`. This is a regression introduced by this PR. --- ### F28 — File Length Violations (P2) CONTRIBUTING.md requires files to stay under 500 lines. Four files in this PR exceed the limit: | File | Lines | Over by | |:-----|------:|--------:| | `src/cleveragents/resource/handlers/devcontainer.py` | 926 | +426 | | `src/cleveragents/cli/commands/resource.py` | 1,233 | +733 | | `src/cleveragents/application/services/plan_lifecycle_service.py` | 1,499 | +999 | | `features/devcontainer_lifecycle.feature` | 762 | +262 | `resource.py` and `plan_lifecycle_service.py` were already over the limit before this PR, but this PR adds significantly to both. `devcontainer.py` is a new file that should have been designed within the limit. **Recommendation:** Split `devcontainer.py` into `devcontainer_handler.py` (the `DevcontainerHandler` class) and `devcontainer_lifecycle.py` (registry, activation, stop, rebuild, health check functions). Split the feature file by domain (already done for step files). --- ### Verdict **REQUEST_CHANGES** — 1 P0 (CI failures), 2 P1 (broken Robot test, stale branch), 5 P2 (file length, `# type: ignore`, inline imports, stale docs, stale PR description). The P0/P1 items are blocking: - **F25/F26**: The `robot/cleanup.robot` test must be updated to match the model extraction. - **F27**: Rebase onto current master before merge — the branch is ~160 commits behind and CI tests are failing against the merged result. The P2 items should be addressed or explicitly deferred: - **F28**: File length violations need a plan (split or defer with PM acknowledgment). - **F29/F30**: `# type: ignore` and inline imports are policy violations. - **F31/F32**: Docs and PR description need updating. ### What Went Well - Excellent engineering response to Round 2 findings — all 7 P1 bugs from my prior review are fixed - Cleanup hooks properly wired into session close and all terminal plan states - TOCTOU race eliminated with proper lock scoping - Health check thread join added for clean shutdown - `rebuild_container` now sends `--reset-container` flag - SEC-1 (path canonicalization) addressed with `os.path.realpath()` - Concurrency tests now use `threading.Barrier` per project convention - Mock infrastructure validates subprocess kwargs (TEST-4 fix) - Good test organization: step files split by domain (lifecycle/activation/cleanup/health-check)
Owner

PM Status Check — Day 29 (Escalation)

Author: @CoreRasurae | Milestone: v3.6.0 (M6+) | Mergeable: Yes | Reviews: REQUEST_CHANGES (Brent)

Current State — UNRESPONSIVE AUTHOR

Brent conducted a comprehensive 4-pass review totaling:

  • 8+ P1 findings including: TOCTOU race (F1), thread join omission (F2), unwired cleanup hooks (F4), devcontainer up JSON parsing failure (F14), health check using wrong workspace path (F15), snapshot sandbox not implemented (F22), container execution stubbed (F24)
  • ~12 P2 findings including: docker stop exit code ignored (F12), documentation claims unimplemented features (F13, F23)

Blocking Issue: NO AUTHOR RESPONSE

@CoreRasurae — There has been zero response to Brent's review findings since Day 26 (3 days ago). This violates the 24-hour review response SLA in CONTRIBUTING.md.

Previous PM escalation (comment #55513, Day 26) set deadlines:

  • Acknowledge findings: Mar 7 EOD — MISSED
  • Push P1 fixes: Mar 10 — AT RISK

Schedule Context

This is M6+ (v3.6.0, due Mar 28), so there is time. However, PR #616 (container tool exec) is blocked on this PR. The unresponsiveness pattern is concerning — Luis also has no response on PR #614.

Action Required

Who Action Deadline
@CoreRasurae Acknowledge review findings immediately OVERDUE
@CoreRasurae Push P1 fixes Mar 12

If no response by Mar 10 EOD, will escalate to @freemo for intervention.

## PM Status Check — Day 29 (Escalation) **Author**: @CoreRasurae | **Milestone**: v3.6.0 (M6+) | **Mergeable**: Yes | **Reviews**: REQUEST_CHANGES (Brent) ### Current State — UNRESPONSIVE AUTHOR Brent conducted a **comprehensive 4-pass review** totaling: - **8+ P1 findings** including: TOCTOU race (F1), thread join omission (F2), unwired cleanup hooks (F4), `devcontainer up` JSON parsing failure (F14), health check using wrong workspace path (F15), snapshot sandbox not implemented (F22), container execution stubbed (F24) - **~12 P2 findings** including: docker stop exit code ignored (F12), documentation claims unimplemented features (F13, F23) ### Blocking Issue: NO AUTHOR RESPONSE @CoreRasurae — There has been **zero response** to Brent's review findings since Day 26 (3 days ago). This violates the 24-hour review response SLA in CONTRIBUTING.md. Previous PM escalation (comment #55513, Day 26) set deadlines: - Acknowledge findings: Mar 7 EOD — **MISSED** - Push P1 fixes: Mar 10 — **AT RISK** ### Schedule Context This is M6+ (v3.6.0, due Mar 28), so there is time. However, PR #616 (container tool exec) is **blocked** on this PR. The unresponsiveness pattern is concerning — Luis also has no response on PR #614. ### Action Required | Who | Action | Deadline | |:----|:-------|:---------| | @CoreRasurae | **Acknowledge review findings immediately** | **OVERDUE** | | @CoreRasurae | Push P1 fixes | Mar 12 | If no response by Mar 10 EOD, will escalate to @freemo for intervention.
Author
Member

Review Response — Round 3 (brent.edwards, Review #2052)

Commit: 6252d83a

Resolved

Finding Severity What was done
F25 P0 Fixed Robot test failures: updated robot/cleanup.robot to check cleanup_models.py for class definitions and verify cleanup_service.py re-exports them by name; changed handler strategy assertion from "snapshot" to "none" in robot/helper_devcontainer_handler.py (SandboxFactory has not implemented snapshot — the container itself is the sandbox).
F26 P1 Same fix as F25 — robot/cleanup.robot now references cleanup_models.py where the classes were extracted to.
F28 P2 Source split: devcontainer.py (926 lines) → 5 modules, all under 500 lines: _devcontainer_internals.py (152, shared state/registry/parsing), devcontainer_health.py (206, background health checks), devcontainer_lifecycle.py (440, activate/stop/rebuild), devcontainer_cleanup.py (149, session cleanup/eviction), devcontainer.py (201, handler class + backward-compatible re-exports). No circular dependencies. Feature split: devcontainer_lifecycle.feature (762 lines) → 4 domain-specific files: devcontainer_state.feature (198), devcontainer_activation.feature (217), devcontainer_health_check.feature (126), devcontainer_cleanup.feature (237). All 104 scenarios preserved. resource.py (1233) and plan_lifecycle_service.py (1499) were not split — they exceeded 500 lines before this PR and are out of scope for this feature branch.
F29 P2 Removed the conditional import pattern with 4 # type: ignore suppressions in robot/helper_devcontainer_lifecycle.py; replaced with inline _MockResult/_MockRunner classes that don't require the features/mocks import.
F30 P2 Moved all 13 inline imports to file top: devcontainer_lifecycle_steps.py (4), devcontainer_cleanup_steps.py (5), devcontainer_health_check_steps.py (4).
F31 P2 Marked the Auto-Discovery section in docs/reference/devcontainer_resources.md as "(Planned)" with an admonition explaining that discover_devcontainers() exists and is tested in isolation but is not wired into any production code path.

Not addressed

Finding Severity Reason
F27 P1 Branch ~160 commits behind master. This requires a rebase/merge, not a code fix within this commit. Will be handled separately before merge.
F32 P2 PR description is stale. Updating the PR description requires a remote Forgejo API change, not a code commit. Will update the PR description separately.

Verification

  • 8208 BDD scenarios pass (all 104 devcontainer scenarios across the 4 new feature files)
  • 30 Robot integration tests pass (cleanup, devcontainer_handler, devcontainer_lifecycle suites)
  • Vulture dead code — clean
  • Pyright — 0 errors, 0 warnings
  • Ruff lint — clean
  • Module coverage (devcontainer features only): _devcontainer_internals 98%, devcontainer.py 100%, devcontainer_cleanup 100%, devcontainer_health 92%, devcontainer_lifecycle 99%
## Review Response — Round 3 (brent.edwards, Review #2052) Commit: `6252d83a` ### Resolved | Finding | Severity | What was done | |:--------|:---------|:--------------| | **F25** | P0 | Fixed Robot test failures: updated `robot/cleanup.robot` to check `cleanup_models.py` for class definitions and verify `cleanup_service.py` re-exports them by name; changed handler strategy assertion from `"snapshot"` to `"none"` in `robot/helper_devcontainer_handler.py` (SandboxFactory has not implemented snapshot — the container itself is the sandbox). | | **F26** | P1 | Same fix as F25 — `robot/cleanup.robot` now references `cleanup_models.py` where the classes were extracted to. | | **F28** | P2 | **Source split**: `devcontainer.py` (926 lines) → 5 modules, all under 500 lines: `_devcontainer_internals.py` (152, shared state/registry/parsing), `devcontainer_health.py` (206, background health checks), `devcontainer_lifecycle.py` (440, activate/stop/rebuild), `devcontainer_cleanup.py` (149, session cleanup/eviction), `devcontainer.py` (201, handler class + backward-compatible re-exports). No circular dependencies. **Feature split**: `devcontainer_lifecycle.feature` (762 lines) → 4 domain-specific files: `devcontainer_state.feature` (198), `devcontainer_activation.feature` (217), `devcontainer_health_check.feature` (126), `devcontainer_cleanup.feature` (237). All 104 scenarios preserved. `resource.py` (1233) and `plan_lifecycle_service.py` (1499) were not split — they exceeded 500 lines before this PR and are out of scope for this feature branch. | | **F29** | P2 | Removed the conditional import pattern with 4 `# type: ignore` suppressions in `robot/helper_devcontainer_lifecycle.py`; replaced with inline `_MockResult`/`_MockRunner` classes that don't require the features/mocks import. | | **F30** | P2 | Moved all 13 inline imports to file top: `devcontainer_lifecycle_steps.py` (4), `devcontainer_cleanup_steps.py` (5), `devcontainer_health_check_steps.py` (4). | | **F31** | P2 | Marked the Auto-Discovery section in `docs/reference/devcontainer_resources.md` as "(Planned)" with an admonition explaining that `discover_devcontainers()` exists and is tested in isolation but is not wired into any production code path. | ### Not addressed | Finding | Severity | Reason | |:--------|:---------|:-------| | **F27** | P1 | Branch ~160 commits behind master. This requires a rebase/merge, not a code fix within this commit. Will be handled separately before merge. | | **F32** | P2 | PR description is stale. Updating the PR description requires a remote Forgejo API change, not a code commit. Will update the PR description separately. | ### Verification - **8208 BDD scenarios** pass (all 104 devcontainer scenarios across the 4 new feature files) - **30 Robot integration tests** pass (cleanup, devcontainer_handler, devcontainer_lifecycle suites) - **Vulture dead code** — clean - **Pyright** — 0 errors, 0 warnings - **Ruff lint** — clean - **Module coverage** (devcontainer features only): `_devcontainer_internals` 98%, `devcontainer.py` 100%, `devcontainer_cleanup` 100%, `devcontainer_health` 92%, `devcontainer_lifecycle` 99%
CoreRasurae force-pushed feature/m6plus-devcontainer-lifecycle from bbeef00b1a
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 14s
CI / lint (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 17s
CI / typecheck (pull_request) Successful in 33s
CI / security (pull_request) Successful in 39s
CI / unit_tests (pull_request) Successful in 2m24s
CI / integration_tests (pull_request) Failing after 2m55s
CI / docker (pull_request) Successful in 39s
CI / coverage (pull_request) Successful in 4m9s
CI / benchmark-regression (pull_request) Successful in 26m29s
to 581ac47543
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 19s
CI / security (pull_request) Successful in 33s
CI / typecheck (pull_request) Successful in 35s
CI / integration_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
2026-03-09 16:20:20 +00:00
Compare
CoreRasurae force-pushed feature/m6plus-devcontainer-lifecycle from 581ac47543
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 19s
CI / security (pull_request) Successful in 33s
CI / typecheck (pull_request) Successful in 35s
CI / integration_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
to a160888632
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 18s
CI / security (pull_request) Successful in 34s
CI / typecheck (pull_request) Successful in 44s
CI / integration_tests (pull_request) Successful in 3m14s
CI / unit_tests (pull_request) Successful in 4m41s
CI / docker (pull_request) Successful in 39s
CI / coverage (pull_request) Successful in 4m44s
CI / benchmark-regression (pull_request) Successful in 33m49s
2026-03-09 16:22:01 +00:00
Compare
Owner

PM Status Check — Day 29 (Update)

Author: @CoreRasurae | Milestone: v3.6.0 (Post-MVP) | Reviews: REQUEST_CHANGES (Brent, 4 passes, 15 findings)

Author Response Today

@CoreRasurae responded with resolutions for findings F25-F31 from Brent's 4th review pass. Added State/In Review and Priority/Medium labels.

Current State

  • 20 comments, active review discussion
  • Luis addressed findings from the latest review round
  • Brent needs to verify the claimed fixes

Action Required

Who Action Deadline
@brent.edwards Verify F25-F31 resolutions, re-review Mar 12
@CoreRasurae Address any remaining findings from re-review Mar 13

This is a Post-MVP PR so lower urgency than M3-M5 items, but Luis's responsiveness today is appreciated. Good to keep momentum.

## PM Status Check — Day 29 (Update) **Author**: @CoreRasurae | **Milestone**: v3.6.0 (Post-MVP) | **Reviews**: REQUEST_CHANGES (Brent, 4 passes, 15 findings) ### Author Response Today @CoreRasurae responded with resolutions for findings F25-F31 from Brent's 4th review pass. Added State/In Review and Priority/Medium labels. ### Current State - 20 comments, active review discussion - Luis addressed findings from the latest review round - Brent needs to verify the claimed fixes ### Action Required | Who | Action | Deadline | |:----|:-------|:---------| | @brent.edwards | Verify F25-F31 resolutions, re-review | Mar 12 | | @CoreRasurae | Address any remaining findings from re-review | Mar 13 | This is a Post-MVP PR so lower urgency than M3-M5 items, but Luis's responsiveness today is appreciated. Good to keep momentum.
Owner

PM Compliance Audit — PR #613

Auditor: PM (automated sweep) | Date: 2026-03-09 | Reference: CONTRIBUTING.md §Pull Request Process (items 1–12)

Checklist

# Requirement Status Notes
1 Detailed description (summary + motivation) PASS Comprehensive description with changes, testing, spec references, known limitations
2 Closing keyword (Closes #N / Fixes #N) PASS Closes #514 present in body + ISSUES CLOSED: #514 in footer
3 Dependency link (PR blocks issue) PASS PR #613 blocks #514 — link exists
4 One Epic scope per PR PASS Single Epic (devcontainer lifecycle)
5 Atomic, well-scoped commits Not audited
6 Commit messages reference tickets Not audited
7 Conventional Changelog standard Not audited
8 CHANGELOG updated PASS CHANGELOG entry confirmed in Brent's Round 3 review (F3 FIXED)
9 No build/install artifacts PASS
10 Milestone assigned PASS v3.6.0 (Post-MVP)
11 Type/ label PASS Type/Feature present
12 Assignee set PASS CoreRasurae (Luis)

Review Status

Reviewer State Notes
hamza.khyari REQUEST_CHANGES (stale, official, dismissed partially) Round 1: BUG-1 through BUG-5, SPEC-1 through SPEC-3, SEC-1/2, PERF-1, TEST-1 through TEST-4, CODE-1
hamza.khyari REQUEST_CHANGES (stale, official) Short "update" review
brent.edwards REQUEST_CHANGES (stale) Round 2: F1-F11 + follow-ups F12-F24. Most P1s fixed in subsequent push.
brent.edwards REQUEST_CHANGES (stale, official) Round 3: F25 (P0 CI failures), F26-F27 (P1), F28-F32 (P2). Branch ~160 commits behind master.
freemo COMMENT Active review in progress, author responding.

Merge Readiness

NOT READY — Significant blockers remain:

  1. P0 — CI integration test failures (F25: Robot tests broken by model extraction)
  2. P1 — Branch ~160 commits behind master (F27: must rebase)
  3. P1 — Broken Robot test (F26: cleanup.robot not updated)
  4. 4 open REQUEST_CHANGES reviews — all must be resolved or dismissed
  5. File length violations (F28: 4 files over 500 lines)

Action items for @CoreRasurae:

  1. Rebase onto current master — critical given 160-commit drift
  2. Fix robot/cleanup.robot to match model extraction
  3. Address remaining P1/P2 findings from Brent's Round 3
  4. Request re-review from both hamza.khyari and brent.edwards
  5. Need 2 formal APPROVEs with no open REQUEST_CHANGES
## PM Compliance Audit — PR #613 **Auditor**: PM (automated sweep) | **Date**: 2026-03-09 | **Reference**: CONTRIBUTING.md §Pull Request Process (items 1–12) ### Checklist | # | Requirement | Status | Notes | |---|-------------|--------|-------| | 1 | Detailed description (summary + motivation) | PASS | Comprehensive description with changes, testing, spec references, known limitations | | 2 | Closing keyword (`Closes #N` / `Fixes #N`) | PASS | `Closes #514` present in body + `ISSUES CLOSED: #514` in footer | | 3 | Dependency link (PR blocks issue) | PASS | PR #613 blocks #514 — link exists | | 4 | One Epic scope per PR | PASS | Single Epic (devcontainer lifecycle) | | 5 | Atomic, well-scoped commits | — | Not audited | | 6 | Commit messages reference tickets | — | Not audited | | 7 | Conventional Changelog standard | — | Not audited | | 8 | CHANGELOG updated | PASS | CHANGELOG entry confirmed in Brent's Round 3 review (F3 FIXED) | | 9 | No build/install artifacts | PASS | | | 10 | Milestone assigned | PASS | v3.6.0 (Post-MVP) | | 11 | Type/ label | PASS | `Type/Feature` present | | 12 | Assignee set | PASS | CoreRasurae (Luis) | ### Review Status | Reviewer | State | Notes | |----------|-------|-------| | hamza.khyari | REQUEST_CHANGES (stale, official, dismissed partially) | Round 1: BUG-1 through BUG-5, SPEC-1 through SPEC-3, SEC-1/2, PERF-1, TEST-1 through TEST-4, CODE-1 | | hamza.khyari | REQUEST_CHANGES (stale, official) | Short "update" review | | brent.edwards | REQUEST_CHANGES (stale) | Round 2: F1-F11 + follow-ups F12-F24. Most P1s fixed in subsequent push. | | brent.edwards | REQUEST_CHANGES (stale, official) | Round 3: F25 (P0 CI failures), F26-F27 (P1), F28-F32 (P2). Branch ~160 commits behind master. | | freemo | COMMENT | Active review in progress, author responding. | ### Merge Readiness **NOT READY** — Significant blockers remain: 1. **P0 — CI integration test failures** (F25: Robot tests broken by model extraction) 2. **P1 — Branch ~160 commits behind master** (F27: must rebase) 3. **P1 — Broken Robot test** (F26: cleanup.robot not updated) 4. **4 open REQUEST_CHANGES reviews** — all must be resolved or dismissed 5. **File length violations** (F28: 4 files over 500 lines) **Action items for @CoreRasurae:** 1. Rebase onto current master — critical given 160-commit drift 2. Fix `robot/cleanup.robot` to match model extraction 3. Address remaining P1/P2 findings from Brent's Round 3 4. Request re-review from both hamza.khyari and brent.edwards 5. Need 2 formal APPROVEs with no open REQUEST_CHANGES
Owner

PM Escalation — Rebase Required + CI Failures (Day 29)

@CoreRasurae This PR is ~160 commits behind master with P0 CI integration test failures (Robot tests broken by model extraction).

Required actions (in order):

  1. Rebase onto current master — this is the highest priority since quality gates cannot be validated against stale code
  2. Fix robot/cleanup.robot to match the CleanupReport/StaleItem/ResourceCleanupSummary model extraction into cleanup_models.py
  3. Address remaining P1/P2 findings from Brent's Round 3 review
  4. Request re-review from both @hamza.khyari and @brent.edwards

There are 4 open REQUEST_CHANGES reviews that must all be resolved or dismissed before merge. This is an M6+ PR so schedule pressure is lower, but the stale branch is accumulating technical debt that makes each day's rebase harder.

**PM Escalation — Rebase Required + CI Failures (Day 29)** @CoreRasurae This PR is **~160 commits behind master** with P0 CI integration test failures (Robot tests broken by model extraction). **Required actions (in order):** 1. Rebase onto current master — this is the highest priority since quality gates cannot be validated against stale code 2. Fix `robot/cleanup.robot` to match the `CleanupReport`/`StaleItem`/`ResourceCleanupSummary` model extraction into `cleanup_models.py` 3. Address remaining P1/P2 findings from Brent's Round 3 review 4. Request re-review from both @hamza.khyari and @brent.edwards There are 4 open REQUEST_CHANGES reviews that must all be resolved or dismissed before merge. This is an M6+ PR so schedule pressure is lower, but the stale branch is accumulating technical debt that makes each day's rebase harder.
Member

Verification Pass — PR #613 feature/m6plus-devcontainer-lifecycle (Round 4/5)

Reviewer: brent.edwards | Commit: a1608886 | Issue: #514
Prior reviews: hamza.khyari #2010 (Round 1), brent.edwards #2019 + follow-ups (Round 2), brent.edwards #2052 (Round 3)


Nox Verification Matrix

Session Result
nox -s lint PASS
nox -s typecheck PASS (0 errors, 0 warnings)
nox -s unit_tests PASS (9,281 scenarios, 0 failures)
nox -s integration_tests PASS (all Robot suites passing, including Robot.Devcontainer Lifecycle)
nox -s coverage_report PASS (97% overall)
nox -s security_scan PASS (2 pre-existing findings in wrapping.py only)

Module-Level Coverage

Module Coverage Notes
_devcontainer_internals.py 98% 1 miss (line 128)
devcontainer.py 100%
devcontainer_cleanup.py 100%
devcontainer_health.py 91% 10 misses — defensive exception handlers in health check loop (lines 157, 171-176, 185, 199-204)
devcontainer_lifecycle.py 99% 3 misses (lines 206, 277-278)

devcontainer_health.py at 91% is below the 97% per-module threshold. The uncovered lines are except ValueError handlers and concurrent-state-already-changed breaks — legitimate defensive paths that are hard to exercise without elaborate thread-racing setups. See P3 finding below.


Round 3 Findings Verification

Prior ID Sev Finding Status
F25 P0 CI Robot test failures (cleanup + strategy) FIXEDcleanup.robot now checks symbol names (not class prefix) + new Cleanup Models Defines Data Classes test case. Strategy check updated to "none".
F26 P1 cleanup.robot broken by model extraction FIXED — See F25. Test correctly validates both cleanup_service.py re-exports and cleanup_models.py class definitions.
F27 P1 Branch ~160 commits behind master FIXED — Branch is now only 1 commit behind master (rebased). mergeable: true.
F28 P2 4 files exceed 500-line limit MOSTLY FIXEDdevcontainer.py (926→201 lines) split into 5 modules all under 500 lines. Feature file (762 lines) split into 4 files (126–237 lines). Step files all under 500 lines (409–494). Residual: resource.py (1,233) and plan_lifecycle_service.py (1,663) remain over limit — both pre-existing violations, this PR adds +173 and +42 lines respectively.
F29 P2 4 # type: ignore in Robot helper FIXED — Conditional import pattern replaced with self-contained inline _MockResult/_MockRunner classes. Zero # type: ignore in the diff.
F30 P2 13 inline imports in step files FIXED — All imports now at file top in both source modules and step files. Zero inline imports found.
F31 P2 Auto-discovery described as working but not wired FIXED — Clear > **Not yet wired (F31/F23):** callout added. All discovery sections labeled "(Planned)". Known Limitations table is consistent with the callout.
F32 P2 PR description stale NOT FIXED — Description still says "Handler (devcontainer.py — new, 794 lines)" and "93 scenarios" — doesn't reflect the 5-module split, current file sizes, or scenario counts.

New/Residual Findings

ID Sev Category File Description
F33 P3 Coverage devcontainer_health.py:157,171-176,185,199-204 devcontainer_health.py at 91% (below 97% threshold). Uncovered lines are except ValueError handlers in the health check loop — defensive concurrency guards that are hard to exercise without thread-racing test setups. Acceptable for a concurrent-exception guard pattern.
F28-R P3 Policy resource.py, plan_lifecycle_service.py Pre-existing file-length violations slightly worsened (+173, +42 lines). Not blocking since the violations existed before this PR.
F32 P2 Process PR description PR description is stale — should reflect the 5-module split, actual file sizes, and current scenario/feature counts.

File Size Audit (All New/Modified Source Files)

File Lines Status
_devcontainer_internals.py 152
devcontainer.py 201
devcontainer_health.py 206
devcontainer_lifecycle.py 439
devcontainer_cleanup.py 149
container_lifecycle.py 286
devcontainer_state.feature 198
devcontainer_activation.feature 217
devcontainer_health_check.feature 126
devcontainer_cleanup.feature 237
devcontainer_activation_steps.py 494
devcontainer_cleanup_steps.py 465
devcontainer_health_check_steps.py 409
devcontainer_lifecycle_steps.py 464
resource.py (pre-existing) 1,233 ⚠️ Pre-existing violation
plan_lifecycle_service.py (pre-existing) 1,663 ⚠️ Pre-existing violation

Verdict

APPROVE — All P0 and P1 findings from Rounds 1-3 are verified fixed. The code split is clean and well-structured with proper re-exports for backward compatibility. All nox quality gates pass. The only open items are:

  • F32 (P2): Update the PR description to reflect the current implementation (non-blocking)
  • F33 (P3): devcontainer_health.py coverage at 91% — acceptable for defensive concurrency handlers
  • F28-R (P3): Pre-existing file-length violations — not caused by this PR

What Went Well

  • Excellent 5-module split addressing the 926-line file violation — each module has a clear single responsibility
  • Re-export pattern in devcontainer.py with comprehensive __all__ preserves backward compatibility
  • All 9,281 BDD scenarios pass (up from 8,208 in the previous verification)
  • Robot test fixes are correct — cleanup.robot properly validates both the new cleanup_models.py and the re-exports in cleanup_service.py
  • Branch successfully rebased onto current master — merge conflicts resolved
  • Zero # type: ignore suppressions remaining
  • Zero inline imports — all at file top as required by CONTRIBUTING
  • Documentation accurately describes the "not yet wired" status of auto-discovery
  • Health check TOCTOU fix with _registry_lock is properly applied across all modules
  • Thread join in _stop_health_check prevents dangling background work
## Verification Pass — PR #613 `feature/m6plus-devcontainer-lifecycle` (Round 4/5) **Reviewer**: brent.edwards | **Commit**: `a1608886` | **Issue**: #514 **Prior reviews**: hamza.khyari #2010 (Round 1), brent.edwards #2019 + follow-ups (Round 2), brent.edwards #2052 (Round 3) --- ### Nox Verification Matrix | Session | Result | |---------|--------| | `nox -s lint` | ✅ PASS | | `nox -s typecheck` | ✅ PASS (0 errors, 0 warnings) | | `nox -s unit_tests` | ✅ PASS (9,281 scenarios, 0 failures) | | `nox -s integration_tests` | ✅ PASS (all Robot suites passing, including `Robot.Devcontainer Lifecycle`) | | `nox -s coverage_report` | ✅ PASS (97% overall) | | `nox -s security_scan` | ✅ PASS (2 pre-existing findings in `wrapping.py` only) | ### Module-Level Coverage | Module | Coverage | Notes | |:-------|:--------:|:------| | `_devcontainer_internals.py` | 98% | 1 miss (line 128) | | `devcontainer.py` | 100% | — | | `devcontainer_cleanup.py` | 100% | — | | `devcontainer_health.py` | 91% | 10 misses — defensive exception handlers in health check loop (lines 157, 171-176, 185, 199-204) | | `devcontainer_lifecycle.py` | 99% | 3 misses (lines 206, 277-278) | `devcontainer_health.py` at 91% is below the 97% per-module threshold. The uncovered lines are `except ValueError` handlers and concurrent-state-already-changed breaks — legitimate defensive paths that are hard to exercise without elaborate thread-racing setups. See P3 finding below. --- ### Round 3 Findings Verification | Prior ID | Sev | Finding | Status | |:---------|:----|:--------|:-------| | **F25** | P0 | CI Robot test failures (cleanup + strategy) | ✅ **FIXED** — `cleanup.robot` now checks symbol names (not `class` prefix) + new `Cleanup Models Defines Data Classes` test case. Strategy check updated to `"none"`. | | **F26** | P1 | cleanup.robot broken by model extraction | ✅ **FIXED** — See F25. Test correctly validates both `cleanup_service.py` re-exports and `cleanup_models.py` class definitions. | | **F27** | P1 | Branch ~160 commits behind master | ✅ **FIXED** — Branch is now only 1 commit behind master (rebased). `mergeable: true`. | | **F28** | P2 | 4 files exceed 500-line limit | ✅ **MOSTLY FIXED** — `devcontainer.py` (926→201 lines) split into 5 modules all under 500 lines. Feature file (762 lines) split into 4 files (126–237 lines). Step files all under 500 lines (409–494). **Residual**: `resource.py` (1,233) and `plan_lifecycle_service.py` (1,663) remain over limit — both pre-existing violations, this PR adds +173 and +42 lines respectively. | | **F29** | P2 | 4 `# type: ignore` in Robot helper | ✅ **FIXED** — Conditional import pattern replaced with self-contained inline `_MockResult`/`_MockRunner` classes. Zero `# type: ignore` in the diff. | | **F30** | P2 | 13 inline imports in step files | ✅ **FIXED** — All imports now at file top in both source modules and step files. Zero inline imports found. | | **F31** | P2 | Auto-discovery described as working but not wired | ✅ **FIXED** — Clear `> **Not yet wired (F31/F23):**` callout added. All discovery sections labeled "(Planned)". Known Limitations table is consistent with the callout. | | **F32** | P2 | PR description stale | ❌ **NOT FIXED** — Description still says "Handler (`devcontainer.py` — new, 794 lines)" and "93 scenarios" — doesn't reflect the 5-module split, current file sizes, or scenario counts. | --- ### New/Residual Findings | ID | Sev | Category | File | Description | |:---|:----|:---------|:-----|:------------| | F33 | P3 | Coverage | `devcontainer_health.py:157,171-176,185,199-204` | `devcontainer_health.py` at 91% (below 97% threshold). Uncovered lines are `except ValueError` handlers in the health check loop — defensive concurrency guards that are hard to exercise without thread-racing test setups. Acceptable for a concurrent-exception guard pattern. | | F28-R | P3 | Policy | `resource.py`, `plan_lifecycle_service.py` | Pre-existing file-length violations slightly worsened (+173, +42 lines). Not blocking since the violations existed before this PR. | | F32 | P2 | Process | PR description | PR description is stale — should reflect the 5-module split, actual file sizes, and current scenario/feature counts. | --- ### File Size Audit (All New/Modified Source Files) | File | Lines | Status | |:-----|------:|:-------| | `_devcontainer_internals.py` | 152 | ✅ | | `devcontainer.py` | 201 | ✅ | | `devcontainer_health.py` | 206 | ✅ | | `devcontainer_lifecycle.py` | 439 | ✅ | | `devcontainer_cleanup.py` | 149 | ✅ | | `container_lifecycle.py` | 286 | ✅ | | `devcontainer_state.feature` | 198 | ✅ | | `devcontainer_activation.feature` | 217 | ✅ | | `devcontainer_health_check.feature` | 126 | ✅ | | `devcontainer_cleanup.feature` | 237 | ✅ | | `devcontainer_activation_steps.py` | 494 | ✅ | | `devcontainer_cleanup_steps.py` | 465 | ✅ | | `devcontainer_health_check_steps.py` | 409 | ✅ | | `devcontainer_lifecycle_steps.py` | 464 | ✅ | | `resource.py` (pre-existing) | 1,233 | ⚠️ Pre-existing violation | | `plan_lifecycle_service.py` (pre-existing) | 1,663 | ⚠️ Pre-existing violation | --- ### Verdict **APPROVE** — All P0 and P1 findings from Rounds 1-3 are verified fixed. The code split is clean and well-structured with proper re-exports for backward compatibility. All nox quality gates pass. The only open items are: - **F32 (P2)**: Update the PR description to reflect the current implementation (non-blocking) - **F33 (P3)**: `devcontainer_health.py` coverage at 91% — acceptable for defensive concurrency handlers - **F28-R (P3)**: Pre-existing file-length violations — not caused by this PR ### What Went Well - Excellent 5-module split addressing the 926-line file violation — each module has a clear single responsibility - Re-export pattern in `devcontainer.py` with comprehensive `__all__` preserves backward compatibility - All 9,281 BDD scenarios pass (up from 8,208 in the previous verification) - Robot test fixes are correct — cleanup.robot properly validates both the new `cleanup_models.py` and the re-exports in `cleanup_service.py` - Branch successfully rebased onto current master — merge conflicts resolved - Zero `# type: ignore` suppressions remaining - Zero inline imports — all at file top as required by CONTRIBUTING - Documentation accurately describes the "not yet wired" status of auto-discovery - Health check TOCTOU fix with `_registry_lock` is properly applied across all modules - Thread join in `_stop_health_check` prevents dangling background work
brent.edwards approved these changes 2026-03-09 22:34:06 +00:00
Dismissed
brent.edwards left a comment

APPROVED — Verification Pass (Round 4/5)

All P0 and P1 findings from Rounds 1-3 are verified fixed. All nox quality gates pass (lint , typecheck , 9,281 BDD scenarios , Robot integration , 97% coverage , security ).

Remaining open items are non-blocking:

  • F32 (P2): PR description stale — please update to reflect the 5-module split
  • F33 (P3): devcontainer_health.py at 91% coverage — acceptable for defensive concurrency handlers
  • F28-R (P3): Pre-existing file-length violations in resource.py and plan_lifecycle_service.py

See comment #57193 for full verification details.

## APPROVED — Verification Pass (Round 4/5) All P0 and P1 findings from Rounds 1-3 are verified fixed. All nox quality gates pass (lint ✅, typecheck ✅, 9,281 BDD scenarios ✅, Robot integration ✅, 97% coverage ✅, security ✅). Remaining open items are non-blocking: - **F32 (P2)**: PR description stale — please update to reflect the 5-module split - **F33 (P3)**: `devcontainer_health.py` at 91% coverage — acceptable for defensive concurrency handlers - **F28-R (P3)**: Pre-existing file-length violations in `resource.py` and `plan_lifecycle_service.py` See comment #57193 for full verification details.
Member

Reviewer Response — @brent.edwards

Responding to PM compliance audit (#57067), PM escalation (#57086), PM status checks (#56523, #56792), and @CoreRasurae's fix response (#56678).

Review Status

The PM compliance audit (#57067) lists 4 open REQUEST_CHANGES reviews and says "NOT READY". To clarify:

  • Formal APPROVE review #2067 posted today (Mar 9) alongside detailed verification comment #57193.
  • My APPROVE supersedes my previous REQUEST_CHANGES reviews (#2019, #2052). All P0 and P1 findings from Rounds 1-3 are verified fixed.
  • Full nox verification matrix passed: lint, typecheck, 9,281 scenarios, integration tests, 97% coverage, security scan.

Luis's Fix Verification (responding to #56678)

@CoreRasurae — Your fixes in commit a1608886 are all verified correct. Thank you for the thorough response and clean implementation:

Finding Verification
F25 (P0) Robot CI failures FIXED — cleanup.robot validates both cleanup_models.py classes and cleanup_service.py re-exports
F26 (P1) cleanup.robot broken FIXED — See F25
F27 (P1) 160 commits behind FIXED — Branch rebased, now 1 behind master, mergeable: true
F28 (P2) File size violations FIXEDdevcontainer.py split 926→201 lines across 5 clean modules. All under 500. Feature file split into 4 domain files.
F29 (P2) type:ignore in Robot FIXED — Inline _MockResult/_MockRunner classes replace the conditional import
F30 (P2) Inline imports FIXED — All 13 moved to file top
F31 (P2) Auto-discovery docs FIXED — Clear "Not yet wired" callout with "(Planned)" labels

The 5-module split is particularly well done — clean single-responsibility per module, proper __all__ re-exports for backward compatibility, zero circular dependencies.

Remaining Open Items

Finding Sev Status
F32 P2 NOT FIXED — PR description is stale. Should reflect the 5-module split, current file sizes, and scenario counts. Non-blocking.
F33 P3 devcontainer_health.py at 91% — Acceptable for defensive concurrency handlers. Non-blocking.
F28-R P3 Pre-existing resource.py/plan_lifecycle_service.py violations. Not caused by this PR. Non-blocking.

PM Action Items

PM Item Status
"Verify F25-F31 resolutions, re-review — Mar 12" Done early — Verified and APPROVED on Mar 9
"4 open REQUEST_CHANGES" My latest is APPROVE. @hamza.khyari's 2 older REQUEST_CHANGES (#2010, #2011) are from Round 1 — they should re-review or dismiss given the extensive fixes since then.
"P0 CI failures" Resolved — All nox sessions pass
"Rebase required" Done — Branch rebased, 1 commit behind, mergeable

Merge Readiness Assessment

From my perspective this PR is ready to merge once:

  1. @hamza.khyari re-reviews or dismisses their stale REQUEST_CHANGES
  2. @CoreRasurae updates the PR description (F32, P2 — quick fix)
  3. Any final 1-behind-master merge is clean

The code quality is solid. The 5-module refactor addressed the major structural concern. All quality gates pass.

## Reviewer Response — @brent.edwards Responding to PM compliance audit (#57067), PM escalation (#57086), PM status checks (#56523, #56792), and @CoreRasurae's fix response (#56678). ### Review Status The PM compliance audit (#57067) lists 4 open REQUEST_CHANGES reviews and says "NOT READY". To clarify: - **Formal APPROVE review #2067** posted today (Mar 9) alongside detailed verification comment #57193. - My APPROVE supersedes my previous REQUEST_CHANGES reviews (#2019, #2052). All P0 and P1 findings from Rounds 1-3 are **verified fixed**. - Full nox verification matrix passed: lint, typecheck, 9,281 scenarios, integration tests, 97% coverage, security scan. ### Luis's Fix Verification (responding to #56678) @CoreRasurae — Your fixes in commit `a1608886` are all verified correct. Thank you for the thorough response and clean implementation: | Finding | Verification | |---------|-------------| | F25 (P0) Robot CI failures | **FIXED** — cleanup.robot validates both `cleanup_models.py` classes and `cleanup_service.py` re-exports | | F26 (P1) cleanup.robot broken | **FIXED** — See F25 | | F27 (P1) 160 commits behind | **FIXED** — Branch rebased, now 1 behind master, `mergeable: true` | | F28 (P2) File size violations | **FIXED** — `devcontainer.py` split 926→201 lines across 5 clean modules. All under 500. Feature file split into 4 domain files. | | F29 (P2) type:ignore in Robot | **FIXED** — Inline `_MockResult`/`_MockRunner` classes replace the conditional import | | F30 (P2) Inline imports | **FIXED** — All 13 moved to file top | | F31 (P2) Auto-discovery docs | **FIXED** — Clear "Not yet wired" callout with "(Planned)" labels | The 5-module split is particularly well done — clean single-responsibility per module, proper `__all__` re-exports for backward compatibility, zero circular dependencies. ### Remaining Open Items | Finding | Sev | Status | |---------|-----|--------| | F32 | P2 | **NOT FIXED** — PR description is stale. Should reflect the 5-module split, current file sizes, and scenario counts. Non-blocking. | | F33 | P3 | `devcontainer_health.py` at 91% — Acceptable for defensive concurrency handlers. Non-blocking. | | F28-R | P3 | Pre-existing `resource.py`/`plan_lifecycle_service.py` violations. Not caused by this PR. Non-blocking. | ### PM Action Items | PM Item | Status | |---------|--------| | "Verify F25-F31 resolutions, re-review — Mar 12" | **Done early** — Verified and APPROVED on Mar 9 | | "4 open REQUEST_CHANGES" | My latest is APPROVE. @hamza.khyari's 2 older REQUEST_CHANGES (#2010, #2011) are from Round 1 — they should re-review or dismiss given the extensive fixes since then. | | "P0 CI failures" | **Resolved** — All nox sessions pass | | "Rebase required" | **Done** — Branch rebased, 1 commit behind, mergeable | ### Merge Readiness Assessment From my perspective this PR is **ready to merge** once: 1. @hamza.khyari re-reviews or dismisses their stale REQUEST_CHANGES 2. @CoreRasurae updates the PR description (F32, P2 — quick fix) 3. Any final 1-behind-master merge is clean The code quality is solid. The 5-module refactor addressed the major structural concern. All quality gates pass.
CoreRasurae force-pushed feature/m6plus-devcontainer-lifecycle from a160888632
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 18s
CI / security (pull_request) Successful in 34s
CI / typecheck (pull_request) Successful in 44s
CI / integration_tests (pull_request) Successful in 3m14s
CI / unit_tests (pull_request) Successful in 4m41s
CI / docker (pull_request) Successful in 39s
CI / coverage (pull_request) Successful in 4m44s
CI / benchmark-regression (pull_request) Successful in 33m49s
to cf67ba0a86
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 18s
CI / security (pull_request) Successful in 35s
CI / typecheck (pull_request) Successful in 37s
CI / unit_tests (pull_request) Successful in 2m27s
CI / docker (pull_request) Successful in 49s
CI / integration_tests (pull_request) Successful in 3m46s
CI / coverage (pull_request) Successful in 4m56s
CI / lint (push) Successful in 12s
CI / build (push) Successful in 14s
CI / quality (push) Successful in 16s
CI / security (push) Successful in 32s
CI / typecheck (push) Successful in 36s
CI / benchmark-regression (push) Has been skipped
CI / unit_tests (push) Successful in 4m7s
CI / integration_tests (push) Successful in 4m41s
CI / docker (push) Successful in 44s
CI / coverage (push) Successful in 4m56s
CI / benchmark-publish (push) Successful in 17m17s
CI / benchmark-regression (pull_request) Successful in 31m25s
2026-03-10 12:37:29 +00:00
Compare
CoreRasurae dismissed brent.edwards's review 2026-03-10 12:37:29 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

CoreRasurae deleted branch feature/m6plus-devcontainer-lifecycle 2026-03-10 12:43:27 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveragents-core!613
No description provided.