feat(devcontainer): add lazy activation and lifecycle management #613

CoreRasurae · 2026-03-06T12:44:52Z

CoreRasurae commented

2026-03-06 12:44:52 +00:00

Summary

Implements full lifecycle management for devcontainer-instance resources: lazy activation on first tool target, health checking, manual stop/rebuild CLI commands, and automatic cleanup on session close or plan completion.

Closes #514

Changes

Domain Model (`container_lifecycle.py` — new, 249 lines)

ContainerLifecycleState enum: detected, building, running, stopping, stopped, failed
VALID_TRANSITIONS mapping with validated state machine
ContainerLifecycleTracker Pydantic v2 model with session_id for scoped cleanup
transition_state() with history cap (100), automatic clearing of container_id/workspace_path on terminal states

Handler (`devcontainer.py` — new, 794 lines)

activate_container(): lazy activation via devcontainer up with JSON output parsing, orphan cleanup on timeout, auto-starts health check on success
stop_container(): idempotent stop (early return on terminal state), stops health check before lock to avoid deadlock
rebuild_container(): re-activates stopped/failed containers
start_health_check() / _health_check_loop(): background daemon thread with configurable interval, TOCTOU-safe transitions under lock
stop_all_active_containers(): session-scoped bulk stop with terminal tracker eviction
evict_terminal_trackers(): caps registry at 200 terminal-state entries, evicts oldest first
_parse_devcontainer_up_output(): reverse-line JSON scanning, hex container ID validation, absolute workspace path validation
DevcontainerHandler class: extends BaseResourceHandler with snapshot sandbox strategy

CLI Commands (`resource.py` — modified)

agents resource stop <name> [--yes]: transitions running → stopping → stopped
agents resource rebuild <name> [--yes]: transitions stopped/failed → building → running
Type guards: _STOPPABLE_TYPES (devcontainer-instance, container-instance), _REBUILDABLE_TYPES (devcontainer-instance only)

Cleanup Wiring

AcpLocalFacade._handle_session_close() → session-scoped container cleanup (works even without session service)
PlanLifecycleService.complete_apply() / fail_apply() / fail_execute() / cancel_plan() → plan-scoped container cleanup
CleanupService.stop_active_devcontainers() → thin static wrapper delegating to handler

Specification Updates (`specification.md`)

Updated handler name to DevcontainerHandler in 3 locations
Added activation_state enum field to devcontainer-instance type definition
Updated handler dispatch table

Documentation (`devcontainer_resources.md` — new, 264 lines)

Full lifecycle docs: state diagram, transitions table, lazy activation flow
Health checking behavior, session/plan cleanup hooks
CLI usage with stop/rebuild commands
Known limitations table (in-memory state, eviction cap, sandbox_strategy mismatch)

Testing

Suite	Result
Behave BDD	93 scenarios, 0 failures
Robot Framework	10 integration tests, 0 failures
ASV benchmarks	6 benchmarks (activation latency, transitions, parsing, registry ops)
ruff lint	0 errors
pyright typecheck	0 errors, 0 warnings

BDD Coverage Highlights (668-line feature file, 93 scenarios)

State model: enum values, all valid/invalid transitions, history cap
Lazy activation: success, failure, timeout with orphan cleanup, concurrent rejection
Stop: success, error paths, no container_id, non-zero rc, idempotent on terminal state
Rebuild: from stopped and failed states
Health check: success loop, failure transition, exception, registration, thread join
Session cleanup: scoped, unscoped, error handling, return list
CLI stop/rebuild: type validation, state preconditions, --yes flag, error paths (NotFoundError, RuntimeError)
Concurrency: dual-thread activation (one wins), dual-thread stop (both succeed — idempotent)
Parsing: valid JSON, invalid JSON, multi-line, invalid container ID, relative workspace path

Known Limitations

Limitation	Detail
In-memory lifecycle state	Not persisted to DB; lost on restart. Planned for M7+.
Registry eviction	Terminal trackers evicted at 200 cap. Acceptable for MVP.
Spec `sandbox_strategy`	Spec YAML says `container_snapshot`; implementation uses `snapshot`. Internal spec inconsistency — tables elsewhere also say `snapshot`.

Spec References

Devcontainer Integration: specification.md §33253–33263 (activation_state field)
Resource Type Inheritance: specification.md §23509–23527 (handler definition)
Handler Dispatch: specification.md §24876–24877 (sandbox strategy)
ADR-042: Resource Type Inheritance
ADR-043: Devcontainer Integration

ISSUES CLOSED: #514

## Summary Implements full lifecycle management for `devcontainer-instance` resources: lazy activation on first tool target, health checking, manual stop/rebuild CLI commands, and automatic cleanup on session close or plan completion. Closes #514 ## Changes ### Domain Model (`container_lifecycle.py` — new, 249 lines) - `ContainerLifecycleState` enum: `detected`, `building`, `running`, `stopping`, `stopped`, `failed` - `VALID_TRANSITIONS` mapping with validated state machine - `ContainerLifecycleTracker` Pydantic v2 model with `session_id` for scoped cleanup - `transition_state()` with history cap (100), automatic clearing of `container_id`/`workspace_path` on terminal states ### Handler (`devcontainer.py` — new, 794 lines) - `activate_container()`: lazy activation via `devcontainer up` with JSON output parsing, orphan cleanup on timeout, auto-starts health check on success - `stop_container()`: idempotent stop (early return on terminal state), stops health check before lock to avoid deadlock - `rebuild_container()`: re-activates stopped/failed containers - `start_health_check()` / `_health_check_loop()`: background daemon thread with configurable interval, TOCTOU-safe transitions under lock - `stop_all_active_containers()`: session-scoped bulk stop with terminal tracker eviction - `evict_terminal_trackers()`: caps registry at 200 terminal-state entries, evicts oldest first - `_parse_devcontainer_up_output()`: reverse-line JSON scanning, hex container ID validation, absolute workspace path validation - `DevcontainerHandler` class: extends `BaseResourceHandler` with `snapshot` sandbox strategy ### CLI Commands (`resource.py` — modified) - `agents resource stop <name> [--yes]`: transitions `running` → `stopping` → `stopped` - `agents resource rebuild <name> [--yes]`: transitions `stopped`/`failed` → `building` → `running` - Type guards: `_STOPPABLE_TYPES` (devcontainer-instance, container-instance), `_REBUILDABLE_TYPES` (devcontainer-instance only) ### Cleanup Wiring - `AcpLocalFacade._handle_session_close()` → session-scoped container cleanup (works even without session service) - `PlanLifecycleService.complete_apply()` / `fail_apply()` / `fail_execute()` / `cancel_plan()` → plan-scoped container cleanup - `CleanupService.stop_active_devcontainers()` → thin static wrapper delegating to handler ### Specification Updates (`specification.md`) - Updated handler name to `DevcontainerHandler` in 3 locations - Added `activation_state` enum field to `devcontainer-instance` type definition - Updated handler dispatch table ### Documentation (`devcontainer_resources.md` — new, 264 lines) - Full lifecycle docs: state diagram, transitions table, lazy activation flow - Health checking behavior, session/plan cleanup hooks - CLI usage with stop/rebuild commands - Known limitations table (in-memory state, eviction cap, sandbox_strategy mismatch) ## Testing | Suite | Result | |-------|--------| | Behave BDD | 93 scenarios, 0 failures | | Robot Framework | 10 integration tests, 0 failures | | ASV benchmarks | 6 benchmarks (activation latency, transitions, parsing, registry ops) | | ruff lint | 0 errors | | pyright typecheck | 0 errors, 0 warnings | ### BDD Coverage Highlights (668-line feature file, 93 scenarios) - State model: enum values, all valid/invalid transitions, history cap - Lazy activation: success, failure, timeout with orphan cleanup, concurrent rejection - Stop: success, error paths, no container_id, non-zero rc, idempotent on terminal state - Rebuild: from stopped and failed states - Health check: success loop, failure transition, exception, registration, thread join - Session cleanup: scoped, unscoped, error handling, return list - CLI stop/rebuild: type validation, state preconditions, --yes flag, error paths (NotFoundError, RuntimeError) - Concurrency: dual-thread activation (one wins), dual-thread stop (both succeed — idempotent) - Parsing: valid JSON, invalid JSON, multi-line, invalid container ID, relative workspace path ## Known Limitations | Limitation | Detail | |------------|--------| | In-memory lifecycle state | Not persisted to DB; lost on restart. Planned for M7+. | | Registry eviction | Terminal trackers evicted at 200 cap. Acceptable for MVP. | | Spec `sandbox_strategy` | Spec YAML says `container_snapshot`; implementation uses `snapshot`. Internal spec inconsistency — tables elsewhere also say `snapshot`. | ## Spec References - Devcontainer Integration: `specification.md` §33253–33263 (activation_state field) - Resource Type Inheritance: `specification.md` §23509–23527 (handler definition) - Handler Dispatch: `specification.md` §24876–24877 (sandbox strategy) - ADR-042: Resource Type Inheritance - ADR-043: Devcontainer Integration ISSUES CLOSED: #514

CoreRasurae self-assigned this

2026-03-06 12:45:01 +00:00

CoreRasurae added this to the v3.6.0 milestone

2026-03-06 12:45:01 +00:00

CoreRasurae added the

Type

Feature

label

2026-03-06 12:45:07 +00:00

hamza.khyari requested changes

2026-03-06 13:42:21 +00:00

Dismissed

hamza.khyari left a comment

Review: feat(devcontainer): add lazy activation and lifecycle management

Commit: 196e1415 | Issue: #514 | 16 files, +4574 lines

BUG-1 (Major) — `model_copy(update=...)` bypasses Pydantic validators throughout

Severity: Major
Category: BUG
File: container_lifecycle.py:249, devcontainer.py:175,236-237

transition_state() and activate_container() use tracker.model_copy(update={...}) to create new tracker instances. model_copy does NOT re-run field validators. Confirmed by instrumented test: model_copy(update={"health_check_interval": 1.0}) succeeds despite the field having ge=10 constraint.

Currently safe because the code never passes externally-sourced values through model_copy for validated fields. But this is a latent bypass — any future code that does tracker.model_copy(update={"health_check_interval": user_input}) silently violates the ge=10, le=3600 constraint. Use model_validate(tracker.model_dump() | updates) for any path that updates validated fields, or add a comment documenting the intentional bypass for internal-only fields.

BUG-2 (Major) — Stale `container_id` persists through `BUILDING` transition

Severity: Major
Category: BUG
File: container_lifecycle.py:238-248

transition_state() clears container_id and workspace_path on transitions to FAILED and STOPPED, but NOT on transition to BUILDING. During activate_container, between the BUILDING transition (line 177-182) and the RUNNING transition (line 239-244), the tracker in the registry carries the old container_id from the prior lifecycle. Any code that reads the tracker during BUILDING (e.g., a health check that hasn't been stopped yet) could see a stale container_id pointing to a defunct container.

Recommendation: Clear container_id and workspace_path on transition to BUILDING as well:

if to_state in (
    ContainerLifecycleState.FAILED,
    ContainerLifecycleState.STOPPED,
    ContainerLifecycleState.BUILDING,
):
    updates["container_id"] = None
    updates["workspace_path"] = None

BUG-3 (Medium) — `list_active_containers_for_session` stops other sessions' containers

Severity: Medium
Category: BUG
File: devcontainer.py:641-642

if tracker.session_id is None or tracker.session_id == session_id

Containers with session_id=None (unscoped) are returned for ANY session cleanup. If session A closes and calls stop_all_active_containers(session_id="A"), it will stop all unscoped containers, which might be in use by session B. The docstring documents this as intentional, but the first session to close wins all unscoped containers.

Recommendation: At minimum, log a warning when stopping an unscoped container during session-scoped cleanup so operators know cross-session impact occurred.

BUG-4 (Medium) — `ContainerLifecycleTracker.transitions` is a mutable list on a non-frozen model

Severity: Medium
Category: BUG
File: container_lifecycle.py:147,152

The tracker is NOT frozen (model_config = ConfigDict(str_strip_whitespace=True)), and transitions is list[LifecycleTransition]. Any code holding a tracker reference can mutate the list in-place: tracker.transitions.append(fake_transition). Since trackers are stored in a shared global registry, in-place mutation from one thread is visible to others without lock protection.

The current code avoids this by always creating new lists via [*tracker.transitions, transition], but this invariant is not enforced. Recommendation: Either freeze the model or use tuple[LifecycleTransition, ...] for transitions.

BUG-5 (Minor) — `rebuild_container` does not forward `session_id`

Severity: Minor
Category: BUG
File: devcontainer.py:453-457

rebuild_container delegates to activate_container but never passes session_id. If a container is rebuilt from a different session, the session_id on the tracker won't update. The rebuilt container may be cleaned up by the old session's close hook rather than the new one's.

SPEC-1 (Major) — `activation_state` enum names diverge from Issue #514

Severity: Major
Category: SPEC
File: container_lifecycle.py:49-72, docs/specification.md

The spec (updated by this PR) uses detected/building/running/stopping/stopped/failed. Issue #514 acceptance criteria use inactive/starting/active/stopping/stopped/error. These are completely different vocabularies. Only stopping and stopped match. This should be reconciled — either update the issue or document the intentional rename.

SPEC-2 (Minor) — Module path inconsistency: singular vs plural `resource`

Severity: Minor
Category: SPEC
File: docs/specification.md

The PR changes the devcontainer handler path to cleveragents.resource.handlers.devcontainer (singular). Every other handler in the spec uses cleveragents.resources.handlers.* (plural). Either the other handlers' paths are wrong or this one is.

SPEC-3 (Minor) — Health checking and session-scoped cleanup absent from spec

Severity: Minor
Category: SPEC

Issue #514 requires health checking (periodic liveness probes) and automatic cleanup on session close / plan completion. The spec has no mention of either for container resources (health checking only exists for LSP servers). These features should be added to the spec or flagged as implementation-ahead-of-spec.

SEC-1 (Medium) — `workspace_folder` not sanitized beyond `os.path.isabs()`

Severity: Medium
Category: SEC
File: devcontainer.py:165-166

activate_container validates workspace_folder is absolute but does not canonicalize it. A path like /tmp/../etc/devcontainer.json passes validation. This path is passed directly to subprocess.run(["devcontainer", "up", "--workspace-folder", workspace_folder]). While shell=True is not used, the devcontainer CLI will operate on any directory the process can access.

Recommendation: Canonicalize with os.path.realpath() and optionally restrict to known project directories.

SEC-2 (Minor) — Health probe uses hardcoded fallback path

Severity: Minor
Category: SEC
File: devcontainer.py:532

_single_probe uses tracker.workspace_path or "/workspace". If workspace_path is None (tracker in unexpected state), the probe runs against /workspace inside an arbitrary container, which could be a different container's workspace.

PERF-1 (Minor) — `_handlers()` rebuilds dispatch dict on every ACP request

Severity: Minor
Category: PERF
File: acp/facade.py:194-208

_route_operation calls self._handlers() which builds a new dict on every dispatch. For a hot path (every ACP request), this should be a cached property or class-level constant.

TEST-1 (Major) — Concurrency tests lack `threading.Barrier` and post-condition invariants

Severity: Major
Category: TEST
File: features/devcontainer_lifecycle.feature:468,558

Both concurrency scenarios ("Concurrent activation rejected" and "Concurrent stop handled safely") start threads sequentially without a threading.Barrier. The first thread may complete before the second starts, making these sequential tests in disguise. Compare with tool_runtime_steps.py:229 in this codebase which properly uses threading.Barrier.

The concurrent stop test also does not assert: (a) exactly 1 docker stop call was made (not 0 or 2), or (b) the final registry state is exactly 1 tracker in stopped state.

TEST-2 (Medium) — 5 sham/structural-only test scenarios

Severity: Medium
Category: TEST

"CleanupService has stop_active_devcontainers method" (line 137) — hasattr check only
"DevcontainerHandler has expected class attributes" (line 382) — static attribute check
"DevcontainerHandler extends BaseResourceHandler" (line 386) — issubclass check
"CLI stop with --yes skips confirmation" (line 511) — patches out production code, asserts exit code 0
"CLI rebuild with --yes skips confirmation" (line 516) — same

These test structural properties, not behavior. The first three would fail at import time if violated.

TEST-3 (Medium) — Coverage gaps: `list_active_containers_for_session` untested directly

Severity: Medium
Category: TEST
File: devcontainer.py:619-643

list_active_containers_for_session has no direct test scenario. Its return value is never asserted. The empty-session-id fallback and session_id is None matching logic are not explicitly verified.

Also untested: evict_terminal_trackers when below threshold (should return 0), get_lifecycle_tracker auto-creation behavior, clear_lifecycle_registry thread-join behavior.

TEST-4 (Minor) — Mock does not validate subprocess kwargs

Severity: Minor
Category: TEST
File: features/mocks/mock_devcontainer_cli.py:65

MockDevcontainerRunner.__call__ records but never validates kwargs (capture_output, text, timeout, check). A bug where check=True is set (causing CalledProcessError in production) would pass all tests. The mock also returns the same container ID for all workspaces, preventing detection of workspace-mismatch bugs.

CODE-1 (Minor) — `vulture_whitelist.py` whitelists private symbols

Severity: Minor
Category: CODE
File: vulture_whitelist.py:538-545

The whitelist includes private symbols (_stop_health_check, _health_check_loop, _single_probe, _lifecycle_registry, _registry_lock, etc.). Private symbols should not need vulture whitelisting — if vulture flags them as unused, they may be dead code or test-only utilities that should be restructured.

## Review: feat(devcontainer): add lazy activation and lifecycle management **Commit**: `196e1415` | **Issue**: #514 | **16 files, +4574 lines** --- ### BUG-1 (Major) — `model_copy(update=...)` bypasses Pydantic validators throughout **Severity**: Major **Category**: BUG **File**: `container_lifecycle.py:249`, `devcontainer.py:175,236-237` `transition_state()` and `activate_container()` use `tracker.model_copy(update={...})` to create new tracker instances. `model_copy` does NOT re-run field validators. Confirmed by instrumented test: `model_copy(update={"health_check_interval": 1.0})` succeeds despite the field having `ge=10` constraint. Currently safe because the code never passes externally-sourced values through `model_copy` for validated fields. But this is a latent bypass — any future code that does `tracker.model_copy(update={"health_check_interval": user_input})` silently violates the `ge=10, le=3600` constraint. Use `model_validate(tracker.model_dump() | updates)` for any path that updates validated fields, or add a comment documenting the intentional bypass for internal-only fields. --- ### BUG-2 (Major) — Stale `container_id` persists through `BUILDING` transition **Severity**: Major **Category**: BUG **File**: `container_lifecycle.py:238-248` `transition_state()` clears `container_id` and `workspace_path` on transitions to `FAILED` and `STOPPED`, but NOT on transition to `BUILDING`. During `activate_container`, between the `BUILDING` transition (line 177-182) and the `RUNNING` transition (line 239-244), the tracker in the registry carries the old `container_id` from the prior lifecycle. Any code that reads the tracker during `BUILDING` (e.g., a health check that hasn't been stopped yet) could see a stale container_id pointing to a defunct container. **Recommendation**: Clear `container_id` and `workspace_path` on transition to `BUILDING` as well: ```python if to_state in ( ContainerLifecycleState.FAILED, ContainerLifecycleState.STOPPED, ContainerLifecycleState.BUILDING, ): updates["container_id"] = None updates["workspace_path"] = None ``` --- ### BUG-3 (Medium) — `list_active_containers_for_session` stops other sessions' containers **Severity**: Medium **Category**: BUG **File**: `devcontainer.py:641-642` ```python if tracker.session_id is None or tracker.session_id == session_id ``` Containers with `session_id=None` (unscoped) are returned for ANY session cleanup. If session A closes and calls `stop_all_active_containers(session_id="A")`, it will stop all unscoped containers, which might be in use by session B. The docstring documents this as intentional, but the first session to close wins all unscoped containers. **Recommendation**: At minimum, log a warning when stopping an unscoped container during session-scoped cleanup so operators know cross-session impact occurred. --- ### BUG-4 (Medium) — `ContainerLifecycleTracker.transitions` is a mutable list on a non-frozen model **Severity**: Medium **Category**: BUG **File**: `container_lifecycle.py:147,152` The tracker is NOT frozen (`model_config = ConfigDict(str_strip_whitespace=True)`), and `transitions` is `list[LifecycleTransition]`. Any code holding a tracker reference can mutate the list in-place: `tracker.transitions.append(fake_transition)`. Since trackers are stored in a shared global registry, in-place mutation from one thread is visible to others without lock protection. The current code avoids this by always creating new lists via `[*tracker.transitions, transition]`, but this invariant is not enforced. **Recommendation**: Either freeze the model or use `tuple[LifecycleTransition, ...]` for `transitions`. --- ### BUG-5 (Minor) — `rebuild_container` does not forward `session_id` **Severity**: Minor **Category**: BUG **File**: `devcontainer.py:453-457` `rebuild_container` delegates to `activate_container` but never passes `session_id`. If a container is rebuilt from a different session, the session_id on the tracker won't update. The rebuilt container may be cleaned up by the old session's close hook rather than the new one's. --- ### SPEC-1 (Major) — `activation_state` enum names diverge from Issue #514 **Severity**: Major **Category**: SPEC **File**: `container_lifecycle.py:49-72`, `docs/specification.md` The spec (updated by this PR) uses `detected/building/running/stopping/stopped/failed`. Issue #514 acceptance criteria use `inactive/starting/active/stopping/stopped/error`. These are completely different vocabularies. Only `stopping` and `stopped` match. This should be reconciled — either update the issue or document the intentional rename. --- ### SPEC-2 (Minor) — Module path inconsistency: singular vs plural `resource` **Severity**: Minor **Category**: SPEC **File**: `docs/specification.md` The PR changes the devcontainer handler path to `cleveragents.resource.handlers.devcontainer` (singular). Every other handler in the spec uses `cleveragents.resources.handlers.*` (plural). Either the other handlers' paths are wrong or this one is. --- ### SPEC-3 (Minor) — Health checking and session-scoped cleanup absent from spec **Severity**: Minor **Category**: SPEC Issue #514 requires health checking (periodic liveness probes) and automatic cleanup on session close / plan completion. The spec has no mention of either for container resources (health checking only exists for LSP servers). These features should be added to the spec or flagged as implementation-ahead-of-spec. --- ### SEC-1 (Medium) — `workspace_folder` not sanitized beyond `os.path.isabs()` **Severity**: Medium **Category**: SEC **File**: `devcontainer.py:165-166` `activate_container` validates `workspace_folder` is absolute but does not canonicalize it. A path like `/tmp/../etc/devcontainer.json` passes validation. This path is passed directly to `subprocess.run(["devcontainer", "up", "--workspace-folder", workspace_folder])`. While `shell=True` is not used, the devcontainer CLI will operate on any directory the process can access. **Recommendation**: Canonicalize with `os.path.realpath()` and optionally restrict to known project directories. --- ### SEC-2 (Minor) — Health probe uses hardcoded fallback path **Severity**: Minor **Category**: SEC **File**: `devcontainer.py:532` `_single_probe` uses `tracker.workspace_path or "/workspace"`. If `workspace_path` is None (tracker in unexpected state), the probe runs against `/workspace` inside an arbitrary container, which could be a different container's workspace. --- ### PERF-1 (Minor) — `_handlers()` rebuilds dispatch dict on every ACP request **Severity**: Minor **Category**: PERF **File**: `acp/facade.py:194-208` `_route_operation` calls `self._handlers()` which builds a new dict on every dispatch. For a hot path (every ACP request), this should be a cached property or class-level constant. --- ### TEST-1 (Major) — Concurrency tests lack `threading.Barrier` and post-condition invariants **Severity**: Major **Category**: TEST **File**: `features/devcontainer_lifecycle.feature:468,558` Both concurrency scenarios ("Concurrent activation rejected" and "Concurrent stop handled safely") start threads sequentially without a `threading.Barrier`. The first thread may complete before the second starts, making these sequential tests in disguise. Compare with `tool_runtime_steps.py:229` in this codebase which properly uses `threading.Barrier`. The concurrent stop test also does not assert: (a) exactly 1 `docker stop` call was made (not 0 or 2), or (b) the final registry state is exactly 1 tracker in `stopped` state. --- ### TEST-2 (Medium) — 5 sham/structural-only test scenarios **Severity**: Medium **Category**: TEST - "CleanupService has stop_active_devcontainers method" (line 137) — `hasattr` check only - "DevcontainerHandler has expected class attributes" (line 382) — static attribute check - "DevcontainerHandler extends BaseResourceHandler" (line 386) — `issubclass` check - "CLI stop with --yes skips confirmation" (line 511) — patches out production code, asserts exit code 0 - "CLI rebuild with --yes skips confirmation" (line 516) — same These test structural properties, not behavior. The first three would fail at import time if violated. --- ### TEST-3 (Medium) — Coverage gaps: `list_active_containers_for_session` untested directly **Severity**: Medium **Category**: TEST **File**: `devcontainer.py:619-643` `list_active_containers_for_session` has no direct test scenario. Its return value is never asserted. The empty-session-id fallback and `session_id is None` matching logic are not explicitly verified. Also untested: `evict_terminal_trackers` when below threshold (should return 0), `get_lifecycle_tracker` auto-creation behavior, `clear_lifecycle_registry` thread-join behavior. --- ### TEST-4 (Minor) — Mock does not validate subprocess kwargs **Severity**: Minor **Category**: TEST **File**: `features/mocks/mock_devcontainer_cli.py:65` `MockDevcontainerRunner.__call__` records but never validates kwargs (`capture_output`, `text`, `timeout`, `check`). A bug where `check=True` is set (causing `CalledProcessError` in production) would pass all tests. The mock also returns the same container ID for all workspaces, preventing detection of workspace-mismatch bugs. --- ### CODE-1 (Minor) — `vulture_whitelist.py` whitelists private symbols **Severity**: Minor **Category**: CODE **File**: `vulture_whitelist.py:538-545` The whitelist includes private symbols (`_stop_health_check`, `_health_check_loop`, `_single_probe`, `_lifecycle_registry`, `_registry_lock`, etc.). Private symbols should not need vulture whitelisting — if vulture flags them as unused, they may be dead code or test-only utilities that should be restructured.

CoreRasurae force-pushed feature/m6plus-devcontainer-lifecycle from 196e1415c0

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / lint (pull_request) Successful in 14s

Details

CI / build (pull_request) Successful in 17s

Details

CI / quality (pull_request) Successful in 20s

Details

CI / security (pull_request) Successful in 32s

Details

CI / typecheck (pull_request) Successful in 35s

Details

CI / unit_tests (pull_request) Successful in 2m20s

Details

CI / integration_tests (pull_request) Successful in 2m52s

Details

CI / docker (pull_request) Successful in 38s

Details

CI / coverage (pull_request) Successful in 4m7s

Details

CI / benchmark-regression (pull_request) Successful in 25m48s

Details

to ddcc22608e

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / lint (pull_request) Successful in 21s

Details

CI / quality (pull_request) Successful in 18s

Details

CI / build (pull_request) Successful in 17s

Details

CI / security (pull_request) Successful in 36s

Details

CI / typecheck (pull_request) Successful in 1m8s

Details

CI / unit_tests (pull_request) Successful in 3m31s

Details

CI / docker (pull_request) Successful in 41s

Details

CI / integration_tests (pull_request) Successful in 4m20s

Details

CI / coverage (pull_request) Successful in 4m23s

Details

CI / benchmark-regression (pull_request) Successful in 25m47s

Details

2026-03-06 15:42:08 +00:00

Compare

hamza.khyari approved these changes

2026-03-06 17:45:52 +00:00

Dismissed

hamza.khyari requested changes

2026-03-06 17:46:35 +00:00

hamza.khyari left a comment

update

brent.edwards requested changes

2026-03-06 19:39:22 +00:00

Dismissed

brent.edwards left a comment

Review — PR #613 `feature/m6plus-devcontainer-lifecycle`

Reviewer: brent.edwards | Round: 2 (complementing hamza.khyari's Round 1 review #2010) | Commit: 870ddb31

Quality Gates

Gate	Result
`nox -s lint`	PASS
`nox -s typecheck`	PASS (0 errors)

Note on Branch Freshness

This branch is forked from a60cda1f and is ~140 files behind current master. The git diff origin/master..HEAD shows those files as "deletions" but they are NOT part of the commit — a merge would correctly retain master's additions. However, I recommend rebasing before merge to avoid surprises and ensure quality gates run against the latest codebase.

Findings Summary

ID	Sev	Category	File	Description
F1	P1	Bug	`devcontainer.py:121-215`	TOCTOU race in `activate_container` — concurrent calls read+transition without holding the lock across the full operation, spawning duplicate containers
F2	P1	Bug	`devcontainer.py:345-350`	`_stop_health_check` sets the stop event but never joins the thread — health check loop can race with stop/activate transitions
F3	P1	Process	N/A	Missing CHANGELOG entry for #514 — CONTRIBUTING requires every PR to prepend to `## Unreleased`
F4	P1	Spec	`cleanup_service.py:510-527`	Cleanup wiring incomplete — `stop_active_devcontainers` method exists but is not wired into session close or plan completion hooks. PR description claims: "AcpLocalFacade._handle_session_close() → session-scoped container cleanup" and "PlanLifecycleService.complete_apply()/fail_apply()/..." but these hooks don't exist in the code.
F5	P2	Bug	`devcontainer.py:58-59, 330-350`	`_health_check_threads` and `_health_check_stop_events` are plain dicts accessed from multiple threads (health check background threads + main thread) without `_registry_lock` or a separate lock
F6	P2	Process	PR description	PR description is massively stale — describes states `detected/building/running/failed` but code uses `inactive/starting/active/error`; claims 794-line handler (actual: 489), 668-line feature file with 93 scenarios (actual: 290 lines). Misleads reviewers.
F7	P2	Policy	`devcontainer_lifecycle_steps.py`	File is 739 lines — exceeds 500-line limit per CONTRIBUTING
F8	P2	Policy	`cleanup_service.py`	File pushed to 543 lines — exceeds 500-line limit
F9	P2	Policy	`devcontainer_lifecycle_steps.py`	9 `# type: ignore` inline suppressions — CONTRIBUTING prohibits these
F10	P2	Code	`devcontainer.py:284-311`	`rebuild_container` is a pass-through to `activate_container` with no actual rebuild logic — does not pass `--reset-container` or `--no-cache` to `devcontainer up`, so it's functionally identical to activate
F11	P2	Policy	`cleanup_service.py:523`	Inline import inside function body — CONTRIBUTING requires all imports at file top (only `if TYPE_CHECKING:` blocks are exempt)

Verdict

REQUEST_CHANGES — 4 P1 findings. The TOCTOU race (F1) and thread join omission (F2) are correctness bugs that will manifest under concurrent usage. The missing CHANGELOG (F3) and unwired cleanup hooks (F4) are process/spec gaps. The stale PR description (F6) needs updating to match the actual implementation.

What Went Well

Clean domain model with well-defined state machine and transition validation
Good use of threading.Event for health check cancellation
Thorough BDD scenarios covering error paths and edge cases
Lint and typecheck pass cleanly
Commit message follows Conventional Changelog format with issue reference

## Review — PR #613 `feature/m6plus-devcontainer-lifecycle` **Reviewer**: brent.edwards | **Round**: 2 (complementing hamza.khyari's Round 1 review #2010) | **Commit**: `870ddb31` ### Quality Gates | Gate | Result | |------|--------| | `nox -s lint` | PASS | | `nox -s typecheck` | PASS (0 errors) | ### Note on Branch Freshness This branch is forked from `a60cda1f` and is ~140 files behind current master. The `git diff origin/master..HEAD` shows those files as "deletions" but they are NOT part of the commit — a merge would correctly retain master's additions. However, I recommend rebasing before merge to avoid surprises and ensure quality gates run against the latest codebase. ### Findings Summary | ID | Sev | Category | File | Description | |:---|:----|:---------|:-----|:------------| | F1 | P1 | Bug | `devcontainer.py:121-215` | TOCTOU race in `activate_container` — concurrent calls read+transition without holding the lock across the full operation, spawning duplicate containers | | F2 | P1 | Bug | `devcontainer.py:345-350` | `_stop_health_check` sets the stop event but never joins the thread — health check loop can race with stop/activate transitions | | F3 | P1 | Process | N/A | Missing CHANGELOG entry for #514 — CONTRIBUTING requires every PR to prepend to `## Unreleased` | | F4 | P1 | Spec | `cleanup_service.py:510-527` | Cleanup wiring incomplete — `stop_active_devcontainers` method exists but is not wired into session close or plan completion hooks. PR description claims: "AcpLocalFacade._handle_session_close() → session-scoped container cleanup" and "PlanLifecycleService.complete_apply()/fail_apply()/..." but these hooks don't exist in the code. | | F5 | P2 | Bug | `devcontainer.py:58-59, 330-350` | `_health_check_threads` and `_health_check_stop_events` are plain dicts accessed from multiple threads (health check background threads + main thread) without `_registry_lock` or a separate lock | | F6 | P2 | Process | PR description | PR description is massively stale — describes states `detected/building/running/failed` but code uses `inactive/starting/active/error`; claims 794-line handler (actual: 489), 668-line feature file with 93 scenarios (actual: 290 lines). Misleads reviewers. | | F7 | P2 | Policy | `devcontainer_lifecycle_steps.py` | File is 739 lines — exceeds 500-line limit per CONTRIBUTING | | F8 | P2 | Policy | `cleanup_service.py` | File pushed to 543 lines — exceeds 500-line limit | | F9 | P2 | Policy | `devcontainer_lifecycle_steps.py` | 9 `# type: ignore` inline suppressions — CONTRIBUTING prohibits these | | F10 | P2 | Code | `devcontainer.py:284-311` | `rebuild_container` is a pass-through to `activate_container` with no actual rebuild logic — does not pass `--reset-container` or `--no-cache` to `devcontainer up`, so it's functionally identical to activate | | F11 | P2 | Policy | `cleanup_service.py:523` | Inline import inside function body — CONTRIBUTING requires all imports at file top (only `if TYPE_CHECKING:` blocks are exempt) | ### Verdict **REQUEST_CHANGES** — 4 P1 findings. The TOCTOU race (F1) and thread join omission (F2) are correctness bugs that will manifest under concurrent usage. The missing CHANGELOG (F3) and unwired cleanup hooks (F4) are process/spec gaps. The stale PR description (F6) needs updating to match the actual implementation. ### What Went Well - Clean domain model with well-defined state machine and transition validation - Good use of `threading.Event` for health check cancellation - Thorough BDD scenarios covering error paths and edge cases - Lint and typecheck pass cleanly - Commit message follows Conventional Changelog format with issue reference

features/steps/devcontainer_lifecycle_steps.py Outdated

					
				@ -0,0 +1,1720 @@

				"""Step definitions for devcontainer_lifecycle.feature.

F7 [P2 · Policy] — This file is 739 lines, exceeding the 500-line limit per CONTRIBUTING. Consider splitting into separate step files (e.g., devcontainer_activation_steps.py, devcontainer_health_check_steps.py, devcontainer_cleanup_steps.py).

F9 [P2 · Policy] — 9 # type: ignore inline suppressions at lines 336, 492, 501, 511, 570, 584, 598, 631, 731. CONTRIBUTING prohibits these. Most are passing wrong types to test error paths — use explicit type-correct wrappers or cast() to avoid the suppressions.

**F7 [P2 · Policy]** — This file is 739 lines, exceeding the 500-line limit per CONTRIBUTING. Consider splitting into separate step files (e.g., `devcontainer_activation_steps.py`, `devcontainer_health_check_steps.py`, `devcontainer_cleanup_steps.py`). **F9 [P2 · Policy]** — 9 `# type: ignore` inline suppressions at lines 336, 492, 501, 511, 570, 584, 598, 631, 731. CONTRIBUTING prohibits these. Most are passing wrong types to test error paths — use explicit type-correct wrappers or `cast()` to avoid the suppressions.

src/cleveragents/application/services/cleanup_service.py Outdated

					
				@ -507,0 +507,4 @@

				    # ── Devcontainer lifecycle cleanup (issue #514) ─────────────

				    @staticmethod

				    def stop_active_devcontainers(session_id: str = "") -> list[str]:

F4 [P1 · Spec] — stop_active_devcontainers is defined here but not wired into any session close or plan completion hook. The PR description claims wiring into AcpLocalFacade._handle_session_close() and PlanLifecycleService.complete_apply()/fail_apply()/fail_execute()/cancel_plan(), but none of these hooks exist in the code. This method is dead code in the current implementation.

F11 [P2 · Policy] — Line 523: inline from import inside a function body. CONTRIBUTING requires all imports at file top except if TYPE_CHECKING: blocks. Move this import to the top of the file.

**F4 [P1 · Spec]** — `stop_active_devcontainers` is defined here but not wired into any session close or plan completion hook. The PR description claims wiring into `AcpLocalFacade._handle_session_close()` and `PlanLifecycleService.complete_apply()/fail_apply()/fail_execute()/cancel_plan()`, but none of these hooks exist in the code. This method is dead code in the current implementation. **F11 [P2 · Policy]** — Line 523: inline `from` import inside a function body. CONTRIBUTING requires all imports at file top except `if TYPE_CHECKING:` blocks. Move this import to the top of the file.

src/cleveragents/resource/handlers/devcontainer.py Outdated

					
				@ -28,0 +55,4 @@

				#: S1 fix: regex pattern for valid Docker container IDs (12-64 hex chars).

				_CONTAINER_ID_PATTERN = re.compile(r"^[a-f0-9]{12,64}$")

				#: Registry of lifecycle trackers keyed by resource_id.

F5 [P2 · Bug] — _health_check_threads and _health_check_stop_events are plain dicts mutated by start_health_check (main thread), _stop_health_check (main thread or stop_all_active_containers), and clear_lifecycle_registry. These mutations happen without holding _registry_lock or any other lock. If a health check thread is being started while another is being stopped for the same resource, dict mutations can race.

Fix: either protect these dicts with _registry_lock or introduce a dedicated _health_check_lock.

**F5 [P2 · Bug]** — `_health_check_threads` and `_health_check_stop_events` are plain dicts mutated by `start_health_check` (main thread), `_stop_health_check` (main thread or stop_all_active_containers), and `clear_lifecycle_registry`. These mutations happen without holding `_registry_lock` or any other lock. If a health check thread is being started while another is being stopped for the same resource, dict mutations can race. Fix: either protect these dicts with `_registry_lock` or introduce a dedicated `_health_check_lock`.

src/cleveragents/resource/handlers/devcontainer.py Outdated

					
				@ -28,0 +146,4 @@

				        resource_id: ULID of the devcontainer resource.

				        workspace_folder: Path to the workspace containing devcontainer config.

				        run_command: Optional callable replacing subprocess.run for testing.

				        session_id: Optional session/plan ID to associate with this container

F1 [P1 · Bug] — TOCTOU race in activate_container. The function calls get_lifecycle_tracker() at line 149 (which acquires/releases _registry_lock), then transition_state() + set_lifecycle_tracker() at lines 152-157 (separate lock acquisition), then spawns devcontainer up without any lock held. Two concurrent calls for the same resource_id will both see state=inactive, both transition to STARTING, and both spawn devcontainer up.

Fix: either (a) hold _registry_lock across the read-check-transition-spawn sequence (simplest but blocks other resources), or (b) use per-resource locks so only one activation per resource can proceed at a time.

**F1 [P1 · Bug]** — TOCTOU race in `activate_container`. The function calls `get_lifecycle_tracker()` at line 149 (which acquires/releases `_registry_lock`), then `transition_state()` + `set_lifecycle_tracker()` at lines 152-157 (separate lock acquisition), then spawns `devcontainer up` without any lock held. Two concurrent calls for the same `resource_id` will both see state=inactive, both transition to STARTING, and both spawn `devcontainer up`. Fix: either (a) hold `_registry_lock` across the read-check-transition-spawn sequence (simplest but blocks other resources), or (b) use per-resource locks so only one activation per resource can proceed at a time.

src/cleveragents/resource/handlers/devcontainer.py Outdated

					
				@ -28,0 +304,4 @@

				            tracker = get_lifecycle_tracker(resource_id)

				            tracker = transition_state(

				                tracker,

				                ContainerLifecycleState.FAILED,

F10 [P2 · Code] — rebuild_container simply delegates to activate_container with no rebuild-specific behavior. A real rebuild should pass --reset-container or --no-cache to devcontainer up to force image/container recreation. As implemented, this is indistinguishable from a re-activate after stop, which undermines the CLI distinction between resource stop && resource rebuild vs. just re-activating.

**F10 [P2 · Code]** — `rebuild_container` simply delegates to `activate_container` with no rebuild-specific behavior. A real rebuild should pass `--reset-container` or `--no-cache` to `devcontainer up` to force image/container recreation. As implemented, this is indistinguishable from a re-activate after stop, which undermines the CLI distinction between `resource stop && resource rebuild` vs. just re-activating.

src/cleveragents/resource/handlers/devcontainer.py Outdated

					
				@ -28,0 +342,4 @@

				    Raises:

				        ValueError: If the container is not in ``running`` state.

				        RuntimeError: If stop fails.

F2 [P1 · Bug] — _stop_health_check sets the event and removes the thread from the dict, but never calls thread.join(). The health check loop at line 353 may still be executing — it could call set_lifecycle_tracker() after the stop transition completes, writing stale state back to the registry.

Fix: call thread.join(timeout=...) after setting the stop event, before proceeding with the stop transition.

**F2 [P1 · Bug]** — `_stop_health_check` sets the event and removes the thread from the dict, but never calls `thread.join()`. The health check loop at line 353 may still be executing — it could call `set_lifecycle_tracker()` after the stop transition completes, writing stale state back to the registry. Fix: call `thread.join(timeout=...)` after setting the stop event, before proceeding with the stop transition.

brent.edwards commented

2026-03-06 20:21:55 +00:00

Supplementary Findings — PR #613 (Second Pass)

Following review #2019, a thorough second pass uncovered two additional P2 findings:

F12 [P2 · Bug] — `stop_container` ignores `docker stop` exit code

stop_container (lines 253-270) calls runner(["docker", "stop", container_id], ..., check=False) but discards the return value without checking result.returncode. If docker stop fails (e.g., container already removed, Docker daemon unresponsive with a non-exception timeout), the function still transitions the tracker to STOPPED.

Consequence: the container may still be running while the tracker says stopped. A subsequent activate_container / rebuild_container would start a second container, leaking the first one.

Fix: capture the result and check returncode. If non-zero, transition to ERROR instead of STOPPED, or at least log a warning.

File: src/cleveragents/resource/handlers/devcontainer.py:257

F13 [P2 · Docs] — `devcontainer_resources.md` claims automatic session/plan cleanup

docs/reference/devcontainer_resources.md lines 152-160 state:

"When a session closes or a plan completes, all active devcontainer-instances associated with that session are stopped automatically."

This isn't true in the implementation. CleanupService.stop_active_devcontainers() exists as a method but is not wired into any session-close or plan-completion hook (as noted in F4). The documentation should either be marked as "planned for future" or removed until the wiring is implemented.

File: docs/reference/devcontainer_resources.md:152-160

Updated totals: 4 P1 + 9 P2 = 13 findings (F1-F13).

## Supplementary Findings — PR #613 (Second Pass) Following review #2019, a thorough second pass uncovered two additional P2 findings: ### F12 [P2 · Bug] — `stop_container` ignores `docker stop` exit code `stop_container` (lines 253-270) calls `runner(["docker", "stop", container_id], ..., check=False)` but **discards the return value** without checking `result.returncode`. If `docker stop` fails (e.g., container already removed, Docker daemon unresponsive with a non-exception timeout), the function still transitions the tracker to `STOPPED`. Consequence: the container may still be running while the tracker says `stopped`. A subsequent `activate_container` / `rebuild_container` would start a second container, leaking the first one. **Fix**: capture the result and check `returncode`. If non-zero, transition to `ERROR` instead of `STOPPED`, or at least log a warning. **File**: `src/cleveragents/resource/handlers/devcontainer.py:257` ### F13 [P2 · Docs] — `devcontainer_resources.md` claims automatic session/plan cleanup `docs/reference/devcontainer_resources.md` lines 152-160 state: > "When a session closes or a plan completes, all active devcontainer-instances associated with that session are stopped automatically." This isn't true in the implementation. `CleanupService.stop_active_devcontainers()` exists as a method but is not wired into any session-close or plan-completion hook (as noted in F4). The documentation should either be marked as "planned for future" or removed until the wiring is implemented. **File**: `docs/reference/devcontainer_resources.md:152-160` --- **Updated totals**: 4 P1 + 9 P2 = 13 findings (F1-F13).

brent.edwards commented

2026-03-06 20:28:24 +00:00

Supplementary Findings — PR #613 (Third Pass)

After reading all posted reviews and re-checking the code, I found two additional P1 bugs that are not covered in prior reviews:

F14 [P1 · Bug] — `devcontainer up` output parsing fails with real CLI output

activate_container() assumes stdout is pure JSON, but the real devcontainer CLI prints log lines plus a JSON line at the end. The official CLI README example shows log output followed by a JSON object on the last line. Current parser does:

# devcontainer.py:467
json.loads(stdout)

If stdout contains any non-JSON log lines before the JSON block, json.loads fails and returns (None, None). The tracker is still transitioned to ACTIVE with container_id=None, and stop/health check logic breaks.

Fix: parse the last JSON line (scan from the end) or add --output json and parse only JSON. This is explicitly called out in the PR description but not implemented.

File: src/cleveragents/resource/handlers/devcontainer.py:467

F15 [P1 · Bug] — Health checks pass remote workspace path to `devcontainer exec`

_parse_devcontainer_up_output() stores remoteWorkspaceFolder into tracker.workspace_path. That value is inside the container (e.g., /workspaces/project). But the CLI expects host path for --workspace-folder:

# README example
 devcontainer exec --workspace-folder <host-path>

In _health_check_loop, the command is:

# devcontainer.py:372-374
"devcontainer", "exec", "--workspace-folder", tracker.workspace_path or "/workspace"

If workspace_path is the remote container path (or the fallback /workspace), devcontainer exec will not find the devcontainer config on the host and will fail. This causes health checks to spuriously mark containers ERROR in normal usage.

Fix: store the host workspace folder (the original workspace_folder argument) on the tracker and use that for devcontainer exec. Keep the remote path separately if needed.

File: src/cleveragents/resource/handlers/devcontainer.py:372-374

Updated totals: 6 P1 + 9 P2 = 15 findings (F1-F15).

## Supplementary Findings — PR #613 (Third Pass) After reading all posted reviews and re-checking the code, I found two additional **P1** bugs that are not covered in prior reviews: ### F14 [P1 · Bug] — `devcontainer up` output parsing fails with real CLI output `activate_container()` assumes `stdout` is **pure JSON**, but the real devcontainer CLI prints log lines **plus a JSON line at the end**. The official CLI README example shows log output followed by a JSON object on the last line. Current parser does: ```python # devcontainer.py:467 json.loads(stdout) ``` If stdout contains any non-JSON log lines before the JSON block, `json.loads` fails and returns `(None, None)`. The tracker is still transitioned to `ACTIVE` with `container_id=None`, and stop/health check logic breaks. **Fix**: parse the **last JSON line** (scan from the end) or add `--output json` and parse only JSON. This is explicitly called out in the PR description but not implemented. **File**: `src/cleveragents/resource/handlers/devcontainer.py:467` ### F15 [P1 · Bug] — Health checks pass **remote** workspace path to `devcontainer exec` `_parse_devcontainer_up_output()` stores `remoteWorkspaceFolder` into `tracker.workspace_path`. That value is **inside the container** (e.g., `/workspaces/project`). But the CLI expects **host** path for `--workspace-folder`: ``` # README example devcontainer exec --workspace-folder <host-path> ``` In `_health_check_loop`, the command is: ```python # devcontainer.py:372-374 "devcontainer", "exec", "--workspace-folder", tracker.workspace_path or "/workspace" ``` If `workspace_path` is the remote container path (or the fallback `/workspace`), `devcontainer exec` will not find the devcontainer config on the host and will fail. This causes health checks to spuriously mark containers `ERROR` in normal usage. **Fix**: store the **host workspace folder** (the original `workspace_folder` argument) on the tracker and use that for `devcontainer exec`. Keep the remote path separately if needed. **File**: `src/cleveragents/resource/handlers/devcontainer.py:372-374` --- **Updated totals**: 6 P1 + 9 P2 = 15 findings (F1-F15).

brent.edwards reviewed

2026-03-06 20:31:32 +00:00

brent.edwards left a comment

Inline Notes — F14/F15

Adding inline comments for the two new P1 findings (F14, F15) described in comment #55338.

## Inline Notes — F14/F15 Adding inline comments for the two new P1 findings (F14, F15) described in comment #55338.

src/cleveragents/resource/handlers/devcontainer.py Outdated

					
				@ -28,0 +371,4 @@

				                tracker.current_state.value,

				            )

				            return tracker

				        tracker = transition_state(

F15 [P1 · Bug] — Health checks call devcontainer exec --workspace-folder using tracker.workspace_path, which is set to remoteWorkspaceFolder (container path) from devcontainer up. The CLI expects the host workspace folder path, so this fails in normal usage and marks containers ERROR.

Fix: store the original host workspace_folder on the tracker and use that for devcontainer exec (keep remote path separately if needed).

**F15 [P1 · Bug]** — Health checks call `devcontainer exec --workspace-folder` using `tracker.workspace_path`, which is set to `remoteWorkspaceFolder` (container path) from `devcontainer up`. The CLI expects the **host** workspace folder path, so this fails in normal usage and marks containers ERROR. Fix: store the original host `workspace_folder` on the tracker and use that for `devcontainer exec` (keep remote path separately if needed).

src/cleveragents/resource/handlers/devcontainer.py Outdated

					
				@ -28,0 +464,4 @@

				    # associated with the calling session for scoped cleanup.

				    return activate_container(

				        resource_id,

				        workspace_folder,

F14 [P1 · Bug] — json.loads(stdout) assumes stdout is pure JSON. The devcontainer CLI prints log lines plus a trailing JSON line (see official CLI README). With real output, this parse fails and yields (None, None), but the tracker still transitions to ACTIVE.

Fix: parse the last JSON line (scan from end) or run devcontainer up --output json and parse only JSON.

**F14 [P1 · Bug]** — `json.loads(stdout)` assumes stdout is pure JSON. The devcontainer CLI prints log lines plus a trailing JSON line (see official CLI README). With real output, this parse fails and yields `(None, None)`, but the tracker still transitions to ACTIVE. Fix: parse the **last JSON line** (scan from end) or run `devcontainer up --output json` and parse only JSON.

brent.edwards reviewed

2026-03-06 20:39:58 +00:00

brent.edwards left a comment

Additional Findings — PR #613 (Follow-up)

Found three additional P1 issues after re-scanning the full diff. Inline notes below.

ID	Sev	Category	Summary
F16	P1	Bug	Lazy activation is not wired into `DevcontainerHandler.resolve` (no call to `activate_container`). Feature is defined but unused in production code.
F17	P1	Bug	Health checks are never started — `start_health_check` is defined but never invoked after activation.
F18	P1	Bug	Session cleanup is not session-scoped — `stop_all_active_containers` ignores `session_id` (used only for logging) and stops all active containers.

## Additional Findings — PR #613 (Follow-up) Found three additional **P1** issues after re-scanning the full diff. Inline notes below. | ID | Sev | Category | Summary | |:---|:----|:---------|:--------| | F16 | P1 | Bug | Lazy activation is not wired into `DevcontainerHandler.resolve` (no call to `activate_container`). Feature is defined but unused in production code. | | F17 | P1 | Bug | Health checks are never started — `start_health_check` is defined but never invoked after activation. | | F18 | P1 | Bug | Session cleanup is not session-scoped — `stop_all_active_containers` ignores `session_id` (used only for logging) and stops **all** active containers. |

src/cleveragents/resource/handlers/devcontainer.py Outdated

					
				@ -28,0 +194,4 @@

				                "up",

				                "--workspace-folder",

				                workspace_folder,

				            ],

F17 [P1 · Bug] — Health checks never start. After a successful activation, the code transitions to ACTIVE but does not call start_health_check(). A grep shows start_health_check is only defined, never invoked. This means the required periodic liveness probes never run.

Fix: call start_health_check(resource_id, interval=tracker.health_check_interval) immediately after transitioning to ACTIVE (and/or from the tool invocation path).

**F17 [P1 · Bug]** — Health checks never start. After a successful activation, the code transitions to `ACTIVE` but does not call `start_health_check()`. A grep shows `start_health_check` is only defined, never invoked. This means the required periodic liveness probes never run. Fix: call `start_health_check(resource_id, interval=tracker.health_check_interval)` immediately after transitioning to ACTIVE (and/or from the tool invocation path).

src/cleveragents/resource/handlers/devcontainer.py Outdated

					
				@ -28,0 +430,4 @@

				            )

				            set_lifecycle_tracker(tracker)

				        raise RuntimeError(f"Failed to stop container: {exc}") from exc

F18 [P1 · Bug] — Session cleanup is not scoped. stop_all_active_containers() accepts session_id but uses it only for logging; it always calls list_active_containers() (global) and stops all active containers. Issue #514 expects cleanup of containers associated with that session.

Fix: track session_id on ContainerLifecycleTracker (or another mapping) and filter to session-scoped containers here.

**F18 [P1 · Bug]** — Session cleanup is not scoped. `stop_all_active_containers()` accepts `session_id` but uses it only for logging; it always calls `list_active_containers()` (global) and stops **all** active containers. Issue #514 expects cleanup of containers *associated with that session*. Fix: track `session_id` on `ContainerLifecycleTracker` (or another mapping) and filter to session-scoped containers here.

src/cleveragents/resource/handlers/devcontainer.py Outdated

					
				@ -28,0 +473,4 @@

				def start_health_check(

				    resource_id: str,

				    *,

				    interval: float = DEFAULT_HEALTH_CHECK_INTERVAL,

F16 [P1 · Bug] — Lazy activation is not wired. DevcontainerHandler doesn’t override resolve() or call activate_container, and activate_container() isn’t referenced anywhere else in src/. This means devcontainer resources are resolved via BaseResourceHandler.resolve() with no lazy activation on first tool target — the primary behavior in issue #514 is effectively unused in production code.

Fix: override DevcontainerHandler.resolve() (or the tool execution path) to call activate_container() when the resource is devcontainer-instance and in inactive/stopped/error state.

**F16 [P1 · Bug]** — Lazy activation is not wired. `DevcontainerHandler` doesn’t override `resolve()` or call `activate_container`, and `activate_container()` isn’t referenced anywhere else in `src/`. This means devcontainer resources are resolved via `BaseResourceHandler.resolve()` with no lazy activation on first tool target — the primary behavior in issue #514 is effectively unused in production code. Fix: override `DevcontainerHandler.resolve()` (or the tool execution path) to call `activate_container()` when the resource is `devcontainer-instance` and in `inactive/stopped/error` state.

brent.edwards reviewed

2026-03-06 20:49:30 +00:00

brent.edwards left a comment

Additional Finding — PR #613 (Follow-up)

One more P2 issue found on another pass. Inline note below.

## Additional Finding — PR #613 (Follow-up) One more P2 issue found on another pass. Inline note below.

src/cleveragents/cli/commands/resource.py Outdated

					
				@ -1061,0 +1151,4 @@

				        raise typer.Abort() from exc

				    except (ValueError, RuntimeError) as exc:

				        console.print(f"[red]Stop failed:[/red] {exc}")

				        raise typer.Abort() from exc

F19 [P2 · Bug] — These commands advertise support for container-instance, but resource add container-instance typically sets no location (only --image). In this path, resource_rebuild aborts with "no location for rebuild" and resource_stop fails because no lifecycle tracker exists (devcontainer activation isn’t wired). This effectively makes stop/rebuild non-functional for container-instance resources.

Fix: either restrict to devcontainer-instance only, or require --path for container-instance and implement activation/lifecycle for that type.

**F19 [P2 · Bug]** — These commands advertise support for `container-instance`, but `resource add container-instance` typically sets no location (only `--image`). In this path, `resource_rebuild` aborts with "no location for rebuild" and `resource_stop` fails because no lifecycle tracker exists (devcontainer activation isn’t wired). This effectively makes stop/rebuild non-functional for container-instance resources. Fix: either restrict to `devcontainer-instance` only, or require `--path` for container-instance and implement activation/lifecycle for that type.

brent.edwards reviewed

2026-03-06 20:55:58 +00:00

brent.edwards left a comment

Additional Finding — PR #613 (Follow-up)

Found one more P1 issue on another scan. Inline note below.

## Additional Finding — PR #613 (Follow-up) Found one more **P1** issue on another scan. Inline note below.

src/cleveragents/resource/handlers/devcontainer.py Outdated

					
				@ -28,0 +50,4 @@

				logger = logging.getLogger(__name__)

				#: Default health check interval in seconds.

				DEFAULT_HEALTH_CHECK_INTERVAL: float = 30.0

F20 [P1 · Bug] — Lifecycle state is stored only in an in-memory module dict (_lifecycle_registry). This means state is process-local: a new agents CLI invocation cannot see trackers created in a previous process. As a result, agents resource stop/rebuild cannot stop containers started by earlier commands (it will create a new INACTIVE tracker and fail the transition). This also violates the acceptance criterion: “Track container lifecycle state transitions in the resource registry.”

Fix: persist lifecycle state in the Resource registry (DB) or another shared store (e.g., resource.metadata_json) so state survives process boundaries. At minimum, store container_id, current_state, and workspace_folder per resource and hydrate the tracker on CLI operations.

**F20 [P1 · Bug]** — Lifecycle state is stored only in an in-memory module dict (`_lifecycle_registry`). This means state is **process-local**: a new `agents` CLI invocation cannot see trackers created in a previous process. As a result, `agents resource stop/rebuild` cannot stop containers started by earlier commands (it will create a new INACTIVE tracker and fail the transition). This also violates the acceptance criterion: *“Track container lifecycle state transitions in the resource registry.”* Fix: persist lifecycle state in the Resource registry (DB) or another shared store (e.g., `resource.metadata_json`) so state survives process boundaries. At minimum, store `container_id`, `current_state`, and `workspace_folder` per resource and hydrate the tracker on CLI operations.

brent.edwards reviewed

2026-03-06 21:03:07 +00:00

brent.edwards left a comment

Additional Finding — PR #613 (Follow-up)

Found one more P1 compatibility bug. Inline note below.

## Additional Finding — PR #613 (Follow-up) Found one more **P1** compatibility bug. Inline note below.

src/cleveragents/resource/handlers/devcontainer.py Outdated

					
				@ -28,0 +254,4 @@

				        start_health_check(resource_id, run_command=run_command)

				    except RuntimeError:

				        raise

F21 [P1 · Bug] — stop_container hardcodes docker stop. Devcontainer CLI can target Podman or other engines; this will fail in non-Docker environments (and conflicts with ADR-039 engine-agnostic container types).

Fix: use the devcontainer CLI (if/when stop is supported), or capture the engine from resource properties and dispatch to podman stop / docker stop accordingly.

**F21 [P1 · Bug]** — `stop_container` hardcodes `docker stop`. Devcontainer CLI can target Podman or other engines; this will fail in non-Docker environments (and conflicts with ADR-039 engine-agnostic container types). Fix: use the devcontainer CLI (if/when stop is supported), or capture the engine from resource properties and dispatch to `podman stop` / `docker stop` accordingly.

brent.edwards commented

2026-03-06 21:17:40 +00:00

Additional Findings — PR #613 (Continued Scan)

Two more issues found on another pass:

F22 [P1 · Bug] — Devcontainer resources cannot resolve: snapshot sandbox not implemented

DevcontainerHandler uses SandboxStrategy.SNAPSHOT (see src/cleveragents/resource/handlers/devcontainer.py:488), but SandboxFactory.create_sandbox() explicitly raises NotImplementedError for snapshot (see src/cleveragents/infrastructure/sandbox/factory.py:66,119).

Result: any attempt to resolve a devcontainer-instance (or container-instance using the same handler) will fail at runtime with NotImplementedError. This blocks actual use of the resource type even if lifecycle bugs are fixed.

Fix: implement snapshot strategy in SandboxFactory or route devcontainer resources to a supported sandbox strategy.

F23 [P2 · Process/Docs] — Auto-discovery is documented but not wired

docs/reference/devcontainer_resources.md:60-80 says linking a git-checkout/fs-directory triggers auto-discovery and registers devcontainer-instance + devcontainer-file children. I can’t find any code path that invokes discover_devcontainers() or auto_discover_children() during project link-resource. link_resource simply creates the DB link and returns. Also, the built-in type’s auto_discovery config doesn’t match the schema expected by auto_discover_children (it expects enabled + rules).

Either the docs need to be updated to mark this as future work, or the auto-discovery hook needs to be implemented.

If you want, I can add inline notes for these as well.

## Additional Findings — PR #613 (Continued Scan) Two more issues found on another pass: ### F22 [P1 · Bug] — Devcontainer resources cannot resolve: snapshot sandbox not implemented `DevcontainerHandler` uses `SandboxStrategy.SNAPSHOT` (see `src/cleveragents/resource/handlers/devcontainer.py:488`), but `SandboxFactory.create_sandbox()` explicitly raises `NotImplementedError` for `snapshot` (see `src/cleveragents/infrastructure/sandbox/factory.py:66,119`). Result: any attempt to resolve a `devcontainer-instance` (or `container-instance` using the same handler) will fail at runtime with `NotImplementedError`. This blocks actual use of the resource type even if lifecycle bugs are fixed. **Fix**: implement snapshot strategy in `SandboxFactory` or route devcontainer resources to a supported sandbox strategy. ### F23 [P2 · Process/Docs] — Auto-discovery is documented but not wired `docs/reference/devcontainer_resources.md:60-80` says linking a `git-checkout`/`fs-directory` triggers auto-discovery and registers `devcontainer-instance` + `devcontainer-file` children. I can’t find any code path that invokes `discover_devcontainers()` or `auto_discover_children()` during `project link-resource`. `link_resource` simply creates the DB link and returns. Also, the built-in type’s `auto_discovery` config doesn’t match the schema expected by `auto_discover_children` (it expects `enabled` + `rules`). Either the docs need to be updated to mark this as future work, or the auto-discovery hook needs to be implemented. --- If you want, I can add inline notes for these as well.

brent.edwards commented

2026-03-06 21:30:58 +00:00

Additional Finding — PR #613 (Continued Scan)

F24 [P1 · Bug] — Container execution is stubbed, so lazy activation can’t trigger on tool use

Issue #514’s acceptance criteria say “Devcontainer instances remain inactive until a tool execution targets them. At that point, the container is started automatically.” But the tool execution path currently hard‑fails whenever the execution environment resolves to container.

In src/cleveragents/tool/runner.py:168-178:

if env == ExecutionEnvironment.CONTAINER:
    return ToolResult(
        success=False,
        error=("Container execution is not yet implemented. ..."),
        ...
    )

This means a tool invocation can never reach a devcontainer handler, so lazy activation can’t occur in production. Even if DevcontainerHandler.resolve() were wired to call activate_container, the ToolRunner returns early before any container-handling logic runs.

Fix: wire container execution (and call into devcontainer activation) in the ToolRunner path, or explicitly scope #514 to CLI-only lifecycle management and update the acceptance criteria/docs accordingly.

## Additional Finding — PR #613 (Continued Scan) ### F24 [P1 · Bug] — Container execution is stubbed, so lazy activation can’t trigger on tool use Issue #514’s acceptance criteria say *“Devcontainer instances remain inactive until a tool execution targets them. At that point, the container is started automatically.”* But the tool execution path currently hard‑fails whenever the execution environment resolves to `container`. In `src/cleveragents/tool/runner.py:168-178`: ```python if env == ExecutionEnvironment.CONTAINER: return ToolResult( success=False, error=("Container execution is not yet implemented. ..."), ... ) ``` This means a tool invocation can never reach a devcontainer handler, so lazy activation can’t occur in production. Even if `DevcontainerHandler.resolve()` were wired to call `activate_container`, the ToolRunner returns early before any container-handling logic runs. **Fix**: wire container execution (and call into devcontainer activation) in the ToolRunner path, or explicitly scope #514 to *CLI-only* lifecycle management and update the acceptance criteria/docs accordingly.

brent.edwards reviewed

2026-03-06 21:41:03 +00:00

brent.edwards left a comment

Additional Finding — PR #613 (Follow-up)

Found one more P2 issue on another scan. Inline note below.

## Additional Finding — PR #613 (Follow-up) Found one more **P2** issue on another scan. Inline note below.

src/cleveragents/resource/handlers/devcontainer.py Outdated

					
				@ -28,0 +485,4 @@

				    """

				    if not resource_id:

				        raise ValueError("resource_id must not be empty")

F25 [P2 · Bug] — DevcontainerHandler hardcodes _default_strategy = SNAPSHOT, which overrides the resource type’s declared sandbox strategy. This means devcontainer-file resources (type default = copy_on_write) will still try to use snapshot because resource.sandbox_strategy is always None when loaded from DB.

Impact: devcontainer-file resolution will use the wrong strategy (and currently fails outright because snapshot isn’t implemented). Even after snapshot is implemented, devcontainer-file would still be sandboxed incorrectly.

Fix: choose strategy based on the resource type default (e.g., look up the ResourceTypeSpec and use its sandbox_strategy), or override resolve() in DevcontainerHandler to branch on resource.resource_type_name.

**F25 [P2 · Bug]** — `DevcontainerHandler` hardcodes `_default_strategy = SNAPSHOT`, which overrides the resource type’s declared sandbox strategy. This means `devcontainer-file` resources (type default = `copy_on_write`) will still try to use `snapshot` because `resource.sandbox_strategy` is always None when loaded from DB. Impact: devcontainer-file resolution will use the wrong strategy (and currently fails outright because snapshot isn’t implemented). Even after snapshot is implemented, devcontainer-file would still be sandboxed incorrectly. Fix: choose strategy based on the resource type default (e.g., look up the ResourceTypeSpec and use its `sandbox_strategy`), or override `resolve()` in `DevcontainerHandler` to branch on `resource.resource_type_name`.

brent.edwards reviewed

2026-03-06 21:55:32 +00:00

brent.edwards left a comment

Additional Finding — PR #613 (Follow-up)

Found one more P2 issue on another scan. Inline note below.

## Additional Finding — PR #613 (Follow-up) Found one more **P2** issue on another scan. Inline note below.

src/cleveragents/resource/handlers/devcontainer.py Outdated

					
				@ -28,0 +182,4 @@

				        tracker = transition_state(

				            tracker,

				            ContainerLifecycleState.BUILDING,

				            reason="lazy activation",

F26 [P2 · Bug] — activate_container does not validate that containerId and remoteWorkspaceFolder were actually parsed. If the JSON is missing these fields (or outcome is error), the tracker still transitions to ACTIVE with container_id=None and workspace_path=None, causing stop_container to skip docker stop and health checks to misbehave.

Fix: treat missing containerId/remoteWorkspaceFolder (or outcome != success) as activation failure, transition to ERROR, and raise RuntimeError.

**F26 [P2 · Bug]** — `activate_container` does not validate that `containerId` and `remoteWorkspaceFolder` were actually parsed. If the JSON is missing these fields (or outcome is `error`), the tracker still transitions to `ACTIVE` with `container_id=None` and `workspace_path=None`, causing `stop_container` to skip `docker stop` and health checks to misbehave. Fix: treat missing `containerId`/`remoteWorkspaceFolder` (or `outcome != success`) as activation failure, transition to ERROR, and raise RuntimeError.

freemo commented

2026-03-06 23:37:55 +00:00

PM Status Check — PR #613

Author: @CoreRasurae | Milestone: v3.6.0 (M7) | Mergeable: Yes | Reviews: REQUEST_CHANGES (Brent)

Current State

Brent has provided an exceptionally thorough review across multiple rounds:

Round 1 (Hamza, review #2010): 16 findings (5 BUG, 3 SPEC, 2 SEC, 1 PERF, 4 TEST, 1 CODE) — dismissed/stale
Round 2 (Brent, review #2019): 11 findings — 4 P1 (TOCTOU race F1, thread join omission F2, missing CHANGELOG F3, unwired cleanup hooks F4) + 7 P2
Follow-ups (reviews #2020-#2029): Additional F14-F20 findings including P1 bugs (lazy activation not wired F16, health checks never started F17, session cleanup not session-scoped F18)

Total: ~8 P1 findings and ~12 P2 findings across all rounds.

Concern

@CoreRasurae — There has been no author response to any of Brent's findings. The review was posted ~8 hours ago. This PR has substantial rework needed on the P1 items before it can be considered for merge.

Given this is M7 (v3.6.0, due Mar 28), there is no schedule pressure. However, leaving a thorough review unacknowledged for too long discourages reviewers. Please:

Acknowledge receipt — even a brief "reviewing findings, will address by [date]" is helpful
Prioritize P1 fixes — F1 (TOCTOU), F2 (thread join), F4 (unwired hooks), F16 (lazy activation not wired), F17 (health checks never started)
Update the PR description — Brent noted (F6) that the description is significantly stale vs actual implementation

PR #616 (container-aware tool exec) is blocked on this PR. Resolving the P1s here unblocks that dependency chain.

Action Items

Who	Action	Deadline
@CoreRasurae	Acknowledge review findings	Mar 7 EOD
@CoreRasurae	Push fixes for P1 items	Mar 10
@brent.edwards	Re-review after fixes	Mar 11

## PM Status Check — PR #613 **Author**: @CoreRasurae | **Milestone**: v3.6.0 (M7) | **Mergeable**: Yes | **Reviews**: REQUEST_CHANGES (Brent) ### Current State Brent has provided an exceptionally thorough review across multiple rounds: - **Round 1** (Hamza, review #2010): 16 findings (5 BUG, 3 SPEC, 2 SEC, 1 PERF, 4 TEST, 1 CODE) — dismissed/stale - **Round 2** (Brent, review #2019): 11 findings — **4 P1** (TOCTOU race F1, thread join omission F2, missing CHANGELOG F3, unwired cleanup hooks F4) + 7 P2 - **Follow-ups** (reviews #2020-#2029): Additional **F14-F20** findings including P1 bugs (lazy activation not wired F16, health checks never started F17, session cleanup not session-scoped F18) **Total: ~8 P1 findings and ~12 P2 findings across all rounds.** ### Concern @CoreRasurae — There has been no author response to any of Brent's findings. The review was posted ~8 hours ago. This PR has substantial rework needed on the P1 items before it can be considered for merge. Given this is M7 (v3.6.0, due Mar 28), there is no schedule pressure. However, leaving a thorough review unacknowledged for too long discourages reviewers. Please: 1. **Acknowledge receipt** — even a brief "reviewing findings, will address by [date]" is helpful 2. **Prioritize P1 fixes** — F1 (TOCTOU), F2 (thread join), F4 (unwired hooks), F16 (lazy activation not wired), F17 (health checks never started) 3. **Update the PR description** — Brent noted (F6) that the description is significantly stale vs actual implementation PR #616 (container-aware tool exec) is **blocked on this PR**. Resolving the P1s here unblocks that dependency chain. ### Action Items | Who | Action | Deadline | |:----|:-------|:---------| | @CoreRasurae | Acknowledge review findings | Mar 7 EOD | | @CoreRasurae | Push fixes for P1 items | Mar 10 | | @brent.edwards | Re-review after fixes | Mar 11 |

freemo referenced this pull request

2026-03-06 23:38:00 +00:00

feat(devcontainer): add container-aware tool execution and I/O forwarding #616

CoreRasurae force-pushed feature/m6plus-devcontainer-lifecycle from ddcc22608e

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / lint (pull_request) Successful in 21s

Details

CI / quality (pull_request) Successful in 18s

Details

CI / build (pull_request) Successful in 17s

Details

CI / security (pull_request) Successful in 36s

Details

CI / typecheck (pull_request) Successful in 1m8s

Details

CI / unit_tests (pull_request) Successful in 3m31s

Details

CI / docker (pull_request) Successful in 41s

Details

CI / integration_tests (pull_request) Successful in 4m20s

Details

CI / coverage (pull_request) Successful in 4m23s

Details

CI / benchmark-regression (pull_request) Successful in 25m47s

Details

to bbeef00b1a

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / build (pull_request) Successful in 14s

Details

CI / lint (pull_request) Successful in 17s

Details

CI / quality (pull_request) Successful in 17s

Details

CI / typecheck (pull_request) Successful in 33s

Details

CI / security (pull_request) Successful in 39s

Details

CI / unit_tests (pull_request) Successful in 2m24s

Details

CI / integration_tests (pull_request) Failing after 2m55s

Details

CI / docker (pull_request) Successful in 39s

Details

CI / coverage (pull_request) Successful in 4m9s

Details

CI / benchmark-regression (pull_request) Successful in 26m29s

Details

2026-03-07 22:23:34 +00:00

Compare

freemo reviewed

2026-03-07 23:13:17 +00:00

freemo left a comment

Review Status Summary — PR #613 (Devcontainer Lifecycle)

Branch naming: uses feature/ prefix. Correct for feature work per CONTRIBUTING.md.
Mergeable: no conflicts detected.
Review findings: 16 comments, 15+ findings noted across review cycles.
Author engagement: @CoreRasurae is actively responding to feedback.

Active review in progress. 15+ findings noted across review cycles. Author (@CoreRasurae) is responding to feedback. Continue addressing findings.

**Review Status Summary — PR #613 (Devcontainer Lifecycle)** - **Branch naming:** uses `feature/` prefix. Correct for feature work per CONTRIBUTING.md. - **Mergeable:** no conflicts detected. - **Review findings:** 16 comments, 15+ findings noted across review cycles. - **Author engagement:** @CoreRasurae is actively responding to feedback. Active review in progress. 15+ findings noted across review cycles. Author (@CoreRasurae) is responding to feedback. Continue addressing findings.

brent.edwards requested changes

2026-03-08 05:09:15 +00:00

Dismissed

brent.edwards left a comment

Review — PR #613 `feature/m6plus-devcontainer-lifecycle` (Round 3)

Reviewer: brent.edwards | Commit: bbeef00b | Issue: #514
Prior reviews: hamza.khyari #2010 (Round 1), brent.edwards #2019 + follow-ups F12–F24 (Round 2)

Quality Gates

Gate	Result
`nox -s lint`	PASS
`nox -s typecheck`	PASS (0 errors)
`nox -s unit_tests`	PASS (8,208 scenarios, 0 failures)

Prior Findings Verification

Good progress — most of my Round 2 P1/P2 findings have been addressed:

Prior ID	Finding	Status
F1	TOCTOU race in `activate_container`	FIXED — lock held across get→transition→set
F2	`_stop_health_check` never joins thread	FIXED — `thread.join(timeout=2.0)` added
F3	Missing CHANGELOG entry	FIXED — entry added at line 53
F4	Cleanup hooks not wired	FIXED — session close + plan lifecycle hooks wired
F5	`_health_check_threads` unprotected	FIXED — operations on health check dicts are now under `_registry_lock`
F10	`rebuild_container` identical to activate	FIXED — passes `--reset-container` via `rebuild=True`
F11	Inline import in `cleanup_service.py`	FIXED — `stop_all_active_containers` imported at module top
F14	JSON parsing fails with real CLI output	FIXED — `_parse_devcontainer_up_output` scans from last line
F15	Health checks use remote workspace path	FIXED — uses `host_workspace_folder` parameter
F16	Lazy activation not wired into `resolve()`	FIXED — `DevcontainerHandler.resolve()` calls `activate_container`
F17	Health checks never started	FIXED — called after successful activation
F22	Snapshot sandbox not implemented	Appears unchanged — flagged as known limitation

Not all prior findings could be verified due to the restructured code, but the major bugs have been addressed.

New Findings

ID	Sev	Category	File	Description
F25	P0	CI	CI Run #1307	Integration tests fail in CI — 2 Robot suites have failures (see details below)
F26	P1	Bug	`robot/cleanup.robot:20-26`	Cleanup Robot test broken by model extraction — `CleanupReport`, `StaleItem`, `ResourceCleanupSummary` moved to `cleanup_models.py` but Robot test still reads `cleanup_service.py` and asserts `Should Contain ${content} class CleanupReport`
F27	P1	Process	N/A	Branch is ~160 commits behind master — merge base `a60cda1f` is far from current master (`620adfee`). BDD scenario count is 8,208 vs master's 9,029. Quality gates pass on the branch but don't validate against current master code.
F28	P2	Policy	Multiple files	4 files exceed 500-line limit — `devcontainer.py` (926), `resource.py` (1,233), `plan_lifecycle_service.py` (1,499), `devcontainer_lifecycle.feature` (762)
F29	P2	Policy	`robot/helper_devcontainer_lifecycle.py:48,49,53,61`	4 new `# type: ignore` suppressions — CONTRIBUTING prohibits inline `# type: ignore`. The conditional import pattern can be refactored to avoid these.
F30	P2	Policy	3 step files	13 inline imports in function bodies — `devcontainer_lifecycle_steps.py` (4), `devcontainer_cleanup_steps.py` (5), `devcontainer_health_check_steps.py` (4). CONTRIBUTING requires all imports at file top (only `if TYPE_CHECKING:` exempt).
F31	P2	Docs	`devcontainer_resources.md:60-80`	Auto-discovery described as working but not wired — main body says "The discovery module checks for configuration files..." as if functional, but Known Limitations at line 245 contradicts this. The main body should say "planned" or be removed.
F32	P2	Process	PR description	PR description is still stale — claims 794-line handler (actual: 926), 668-line feature file with 93 scenarios (actual: 762 lines, ~72 scenarios), and the state names in the description still show the old vocabulary in places. (Raised as F6 in Round 2.)

F25 — CI Integration Test Failures (P0)

The CI run at https://git.cleverthis.com/cleveragents/cleveragents-core/actions/runs/1307/jobs/5/attempt/1 shows 2 failing Robot test suites:

Failure 1 — Robot.Cleanup (1 of 9 failed):

Cleanup Service Exports Expected Symbols fails because this PR extracted CleanupReport, StaleItem, and ResourceCleanupSummary from cleanup_service.py into the new cleanup_models.py. The Robot test at robot/cleanup.robot:20-26 reads the content of cleanup_service.py and asserts:

Should Contain    ${content}    class CleanupReport
Should Contain    ${content}    class StaleItem
Should Contain    ${content}    class ResourceCleanupSummary

Since these classes now live in cleanup_models.py, the assertions fail. The CI log confirms the full cleanup_service.py content is dumped and does not contain class CleanupReport.

Fix: Update robot/cleanup.robot to check cleanup_models.py for the extracted classes, or check that cleanup_service.py re-exports them.

Failure 2 — Robot.Devcontainer Handler (1 of 10 failed):

DevcontainerHandler Default Strategy fails with 1 != 0 (exit code 1). The helper at robot/helper_devcontainer_handler.py:78 asserts handler._default_strategy.value == "snapshot". This likely fails due to an import or attribute error when running against the merged codebase in CI. This needs investigation — it may be related to the branch being far behind master.

F26 — Cleanup Robot Test Broken (P1)

See F25, Failure 1. The robot/cleanup.robot test was not updated to reflect the extraction of model classes into cleanup_models.py. This is a regression introduced by this PR.

F28 — File Length Violations (P2)

CONTRIBUTING.md requires files to stay under 500 lines. Four files in this PR exceed the limit:

File	Lines	Over by
`src/cleveragents/resource/handlers/devcontainer.py`	926	+426
`src/cleveragents/cli/commands/resource.py`	1,233	+733
`src/cleveragents/application/services/plan_lifecycle_service.py`	1,499	+999
`features/devcontainer_lifecycle.feature`	762	+262

resource.py and plan_lifecycle_service.py were already over the limit before this PR, but this PR adds significantly to both. devcontainer.py is a new file that should have been designed within the limit.

Recommendation: Split devcontainer.py into devcontainer_handler.py (the DevcontainerHandler class) and devcontainer_lifecycle.py (registry, activation, stop, rebuild, health check functions). Split the feature file by domain (already done for step files).

Verdict

REQUEST_CHANGES — 1 P0 (CI failures), 2 P1 (broken Robot test, stale branch), 5 P2 (file length, # type: ignore, inline imports, stale docs, stale PR description).

The P0/P1 items are blocking:

F25/F26: The robot/cleanup.robot test must be updated to match the model extraction.
F27: Rebase onto current master before merge — the branch is ~160 commits behind and CI tests are failing against the merged result.

The P2 items should be addressed or explicitly deferred:

F28: File length violations need a plan (split or defer with PM acknowledgment).
F29/F30: # type: ignore and inline imports are policy violations.
F31/F32: Docs and PR description need updating.

What Went Well

Excellent engineering response to Round 2 findings — all 7 P1 bugs from my prior review are fixed
Cleanup hooks properly wired into session close and all terminal plan states
TOCTOU race eliminated with proper lock scoping
Health check thread join added for clean shutdown
rebuild_container now sends --reset-container flag
SEC-1 (path canonicalization) addressed with os.path.realpath()
Concurrency tests now use threading.Barrier per project convention
Mock infrastructure validates subprocess kwargs (TEST-4 fix)
Good test organization: step files split by domain (lifecycle/activation/cleanup/health-check)

## Review — PR #613 `feature/m6plus-devcontainer-lifecycle` (Round 3) **Reviewer**: brent.edwards | **Commit**: `bbeef00b` | **Issue**: #514 **Prior reviews**: hamza.khyari #2010 (Round 1), brent.edwards #2019 + follow-ups F12–F24 (Round 2) --- ### Quality Gates | Gate | Result | |------|--------| | `nox -s lint` | PASS | | `nox -s typecheck` | PASS (0 errors) | | `nox -s unit_tests` | PASS (8,208 scenarios, 0 failures) | ### Prior Findings Verification Good progress — most of my Round 2 P1/P2 findings have been addressed: | Prior ID | Finding | Status | |:---------|:--------|:-------| | F1 | TOCTOU race in `activate_container` | **FIXED** — lock held across get→transition→set | | F2 | `_stop_health_check` never joins thread | **FIXED** — `thread.join(timeout=2.0)` added | | F3 | Missing CHANGELOG entry | **FIXED** — entry added at line 53 | | F4 | Cleanup hooks not wired | **FIXED** — session close + plan lifecycle hooks wired | | F5 | `_health_check_threads` unprotected | **FIXED** — operations on health check dicts are now under `_registry_lock` | | F10 | `rebuild_container` identical to activate | **FIXED** — passes `--reset-container` via `rebuild=True` | | F11 | Inline import in `cleanup_service.py` | **FIXED** — `stop_all_active_containers` imported at module top | | F14 | JSON parsing fails with real CLI output | **FIXED** — `_parse_devcontainer_up_output` scans from last line | | F15 | Health checks use remote workspace path | **FIXED** — uses `host_workspace_folder` parameter | | F16 | Lazy activation not wired into `resolve()` | **FIXED** — `DevcontainerHandler.resolve()` calls `activate_container` | | F17 | Health checks never started | **FIXED** — called after successful activation | | F22 | Snapshot sandbox not implemented | Appears unchanged — flagged as known limitation | Not all prior findings could be verified due to the restructured code, but the major bugs have been addressed. --- ### New Findings | ID | Sev | Category | File | Description | |:---|:----|:---------|:-----|:------------| | F25 | P0 | CI | [CI Run #1307](https://git.cleverthis.com/cleveragents/cleveragents-core/actions/runs/1307/jobs/5/attempt/1) | **Integration tests fail in CI** — 2 Robot suites have failures (see details below) | | F26 | P1 | Bug | `robot/cleanup.robot:20-26` | **Cleanup Robot test broken by model extraction** — `CleanupReport`, `StaleItem`, `ResourceCleanupSummary` moved to `cleanup_models.py` but Robot test still reads `cleanup_service.py` and asserts `Should Contain ${content} class CleanupReport` | | F27 | P1 | Process | N/A | **Branch is ~160 commits behind master** — merge base `a60cda1f` is far from current master (`620adfee`). BDD scenario count is 8,208 vs master's 9,029. Quality gates pass on the branch but don't validate against current master code. | | F28 | P2 | Policy | Multiple files | **4 files exceed 500-line limit** — `devcontainer.py` (926), `resource.py` (1,233), `plan_lifecycle_service.py` (1,499), `devcontainer_lifecycle.feature` (762) | | F29 | P2 | Policy | `robot/helper_devcontainer_lifecycle.py:48,49,53,61` | **4 new `# type: ignore` suppressions** — CONTRIBUTING prohibits inline `# type: ignore`. The conditional import pattern can be refactored to avoid these. | | F30 | P2 | Policy | 3 step files | **13 inline imports in function bodies** — `devcontainer_lifecycle_steps.py` (4), `devcontainer_cleanup_steps.py` (5), `devcontainer_health_check_steps.py` (4). CONTRIBUTING requires all imports at file top (only `if TYPE_CHECKING:` exempt). | | F31 | P2 | Docs | `devcontainer_resources.md:60-80` | **Auto-discovery described as working but not wired** — main body says "The discovery module checks for configuration files..." as if functional, but Known Limitations at line 245 contradicts this. The main body should say "planned" or be removed. | | F32 | P2 | Process | PR description | **PR description is still stale** — claims 794-line handler (actual: 926), 668-line feature file with 93 scenarios (actual: 762 lines, ~72 scenarios), and the state names in the description still show the old vocabulary in places. (Raised as F6 in Round 2.) | --- ### F25 — CI Integration Test Failures (P0) The CI run at https://git.cleverthis.com/cleveragents/cleveragents-core/actions/runs/1307/jobs/5/attempt/1 shows **2 failing Robot test suites**: **Failure 1 — `Robot.Cleanup` (1 of 9 failed):** `Cleanup Service Exports Expected Symbols` fails because this PR extracted `CleanupReport`, `StaleItem`, and `ResourceCleanupSummary` from `cleanup_service.py` into the new `cleanup_models.py`. The Robot test at `robot/cleanup.robot:20-26` reads the content of `cleanup_service.py` and asserts: ```robot Should Contain ${content} class CleanupReport Should Contain ${content} class StaleItem Should Contain ${content} class ResourceCleanupSummary ``` Since these classes now live in `cleanup_models.py`, the assertions fail. The CI log confirms the full `cleanup_service.py` content is dumped and does not contain `class CleanupReport`. **Fix:** Update `robot/cleanup.robot` to check `cleanup_models.py` for the extracted classes, or check that `cleanup_service.py` re-exports them. **Failure 2 — `Robot.Devcontainer Handler` (1 of 10 failed):** `DevcontainerHandler Default Strategy` fails with `1 != 0` (exit code 1). The helper at `robot/helper_devcontainer_handler.py:78` asserts `handler._default_strategy.value == "snapshot"`. This likely fails due to an import or attribute error when running against the merged codebase in CI. This needs investigation — it may be related to the branch being far behind master. --- ### F26 — Cleanup Robot Test Broken (P1) See F25, Failure 1. The `robot/cleanup.robot` test was not updated to reflect the extraction of model classes into `cleanup_models.py`. This is a regression introduced by this PR. --- ### F28 — File Length Violations (P2) CONTRIBUTING.md requires files to stay under 500 lines. Four files in this PR exceed the limit: | File | Lines | Over by | |:-----|------:|--------:| | `src/cleveragents/resource/handlers/devcontainer.py` | 926 | +426 | | `src/cleveragents/cli/commands/resource.py` | 1,233 | +733 | | `src/cleveragents/application/services/plan_lifecycle_service.py` | 1,499 | +999 | | `features/devcontainer_lifecycle.feature` | 762 | +262 | `resource.py` and `plan_lifecycle_service.py` were already over the limit before this PR, but this PR adds significantly to both. `devcontainer.py` is a new file that should have been designed within the limit. **Recommendation:** Split `devcontainer.py` into `devcontainer_handler.py` (the `DevcontainerHandler` class) and `devcontainer_lifecycle.py` (registry, activation, stop, rebuild, health check functions). Split the feature file by domain (already done for step files). --- ### Verdict **REQUEST_CHANGES** — 1 P0 (CI failures), 2 P1 (broken Robot test, stale branch), 5 P2 (file length, `# type: ignore`, inline imports, stale docs, stale PR description). The P0/P1 items are blocking: - **F25/F26**: The `robot/cleanup.robot` test must be updated to match the model extraction. - **F27**: Rebase onto current master before merge — the branch is ~160 commits behind and CI tests are failing against the merged result. The P2 items should be addressed or explicitly deferred: - **F28**: File length violations need a plan (split or defer with PM acknowledgment). - **F29/F30**: `# type: ignore` and inline imports are policy violations. - **F31/F32**: Docs and PR description need updating. ### What Went Well - Excellent engineering response to Round 2 findings — all 7 P1 bugs from my prior review are fixed - Cleanup hooks properly wired into session close and all terminal plan states - TOCTOU race eliminated with proper lock scoping - Health check thread join added for clean shutdown - `rebuild_container` now sends `--reset-container` flag - SEC-1 (path canonicalization) addressed with `os.path.realpath()` - Concurrency tests now use `threading.Barrier` per project convention - Mock infrastructure validates subprocess kwargs (TEST-4 fix) - Good test organization: step files split by domain (lifecycle/activation/cleanup/health-check)

freemo commented

2026-03-09 02:48:22 +00:00

PM Status Check — Day 29 (Escalation)

Author: @CoreRasurae | Milestone: v3.6.0 (M6+) | Mergeable: Yes | Reviews: REQUEST_CHANGES (Brent)

Current State — UNRESPONSIVE AUTHOR

Brent conducted a comprehensive 4-pass review totaling:

8+ P1 findings including: TOCTOU race (F1), thread join omission (F2), unwired cleanup hooks (F4), devcontainer up JSON parsing failure (F14), health check using wrong workspace path (F15), snapshot sandbox not implemented (F22), container execution stubbed (F24)
~12 P2 findings including: docker stop exit code ignored (F12), documentation claims unimplemented features (F13, F23)

Blocking Issue: NO AUTHOR RESPONSE

@CoreRasurae — There has been zero response to Brent's review findings since Day 26 (3 days ago). This violates the 24-hour review response SLA in CONTRIBUTING.md.

Previous PM escalation (comment #55513, Day 26) set deadlines:

Acknowledge findings: Mar 7 EOD — MISSED
Push P1 fixes: Mar 10 — AT RISK

Schedule Context

This is M6+ (v3.6.0, due Mar 28), so there is time. However, PR #616 (container tool exec) is blocked on this PR. The unresponsiveness pattern is concerning — Luis also has no response on PR #614.

Action Required

Who	Action	Deadline
@CoreRasurae	Acknowledge review findings immediately	OVERDUE
@CoreRasurae	Push P1 fixes	Mar 12

If no response by Mar 10 EOD, will escalate to @freemo for intervention.

## PM Status Check — Day 29 (Escalation) **Author**: @CoreRasurae | **Milestone**: v3.6.0 (M6+) | **Mergeable**: Yes | **Reviews**: REQUEST_CHANGES (Brent) ### Current State — UNRESPONSIVE AUTHOR Brent conducted a **comprehensive 4-pass review** totaling: - **8+ P1 findings** including: TOCTOU race (F1), thread join omission (F2), unwired cleanup hooks (F4), `devcontainer up` JSON parsing failure (F14), health check using wrong workspace path (F15), snapshot sandbox not implemented (F22), container execution stubbed (F24) - **~12 P2 findings** including: docker stop exit code ignored (F12), documentation claims unimplemented features (F13, F23) ### Blocking Issue: NO AUTHOR RESPONSE @CoreRasurae — There has been **zero response** to Brent's review findings since Day 26 (3 days ago). This violates the 24-hour review response SLA in CONTRIBUTING.md. Previous PM escalation (comment #55513, Day 26) set deadlines: - Acknowledge findings: Mar 7 EOD — **MISSED** - Push P1 fixes: Mar 10 — **AT RISK** ### Schedule Context This is M6+ (v3.6.0, due Mar 28), so there is time. However, PR #616 (container tool exec) is **blocked** on this PR. The unresponsiveness pattern is concerning — Luis also has no response on PR #614. ### Action Required | Who | Action | Deadline | |:----|:-------|:---------| | @CoreRasurae | **Acknowledge review findings immediately** | **OVERDUE** | | @CoreRasurae | Push P1 fixes | Mar 12 | If no response by Mar 10 EOD, will escalate to @freemo for intervention.

freemo referenced this pull request

2026-03-09 02:48:29 +00:00

feat(async): wire retry policies into services #614

freemo referenced this pull request

2026-03-09 02:48:40 +00:00

feat(devcontainer): add container-aware tool execution and I/O forwarding #616

CoreRasurae commented

2026-03-09 15:28:02 +00:00

Review Response — Round 3 (brent.edwards, Review #2052)

Commit: 6252d83a

Resolved

Finding	Severity	What was done
F25	P0	Fixed Robot test failures: updated `robot/cleanup.robot` to check `cleanup_models.py` for class definitions and verify `cleanup_service.py` re-exports them by name; changed handler strategy assertion from `"snapshot"` to `"none"` in `robot/helper_devcontainer_handler.py` (SandboxFactory has not implemented snapshot — the container itself is the sandbox).
F26	P1	Same fix as F25 — `robot/cleanup.robot` now references `cleanup_models.py` where the classes were extracted to.
F28	P2	Source split: `devcontainer.py` (926 lines) → 5 modules, all under 500 lines: `_devcontainer_internals.py` (152, shared state/registry/parsing), `devcontainer_health.py` (206, background health checks), `devcontainer_lifecycle.py` (440, activate/stop/rebuild), `devcontainer_cleanup.py` (149, session cleanup/eviction), `devcontainer.py` (201, handler class + backward-compatible re-exports). No circular dependencies. Feature split: `devcontainer_lifecycle.feature` (762 lines) → 4 domain-specific files: `devcontainer_state.feature` (198), `devcontainer_activation.feature` (217), `devcontainer_health_check.feature` (126), `devcontainer_cleanup.feature` (237). All 104 scenarios preserved. `resource.py` (1233) and `plan_lifecycle_service.py` (1499) were not split — they exceeded 500 lines before this PR and are out of scope for this feature branch.
F29	P2	Removed the conditional import pattern with 4 `# type: ignore` suppressions in `robot/helper_devcontainer_lifecycle.py`; replaced with inline `_MockResult`/`_MockRunner` classes that don't require the features/mocks import.
F30	P2	Moved all 13 inline imports to file top: `devcontainer_lifecycle_steps.py` (4), `devcontainer_cleanup_steps.py` (5), `devcontainer_health_check_steps.py` (4).
F31	P2	Marked the Auto-Discovery section in `docs/reference/devcontainer_resources.md` as "(Planned)" with an admonition explaining that `discover_devcontainers()` exists and is tested in isolation but is not wired into any production code path.

Not addressed

Finding	Severity	Reason
F27	P1	Branch ~160 commits behind master. This requires a rebase/merge, not a code fix within this commit. Will be handled separately before merge.
F32	P2	PR description is stale. Updating the PR description requires a remote Forgejo API change, not a code commit. Will update the PR description separately.

Verification

8208 BDD scenarios pass (all 104 devcontainer scenarios across the 4 new feature files)
30 Robot integration tests pass (cleanup, devcontainer_handler, devcontainer_lifecycle suites)
Vulture dead code — clean
Pyright — 0 errors, 0 warnings
Ruff lint — clean
Module coverage (devcontainer features only): _devcontainer_internals 98%, devcontainer.py 100%, devcontainer_cleanup 100%, devcontainer_health 92%, devcontainer_lifecycle 99%

## Review Response — Round 3 (brent.edwards, Review #2052) Commit: `6252d83a` ### Resolved | Finding | Severity | What was done | |:--------|:---------|:--------------| | **F25** | P0 | Fixed Robot test failures: updated `robot/cleanup.robot` to check `cleanup_models.py` for class definitions and verify `cleanup_service.py` re-exports them by name; changed handler strategy assertion from `"snapshot"` to `"none"` in `robot/helper_devcontainer_handler.py` (SandboxFactory has not implemented snapshot — the container itself is the sandbox). | | **F26** | P1 | Same fix as F25 — `robot/cleanup.robot` now references `cleanup_models.py` where the classes were extracted to. | | **F28** | P2 | **Source split**: `devcontainer.py` (926 lines) → 5 modules, all under 500 lines: `_devcontainer_internals.py` (152, shared state/registry/parsing), `devcontainer_health.py` (206, background health checks), `devcontainer_lifecycle.py` (440, activate/stop/rebuild), `devcontainer_cleanup.py` (149, session cleanup/eviction), `devcontainer.py` (201, handler class + backward-compatible re-exports). No circular dependencies. **Feature split**: `devcontainer_lifecycle.feature` (762 lines) → 4 domain-specific files: `devcontainer_state.feature` (198), `devcontainer_activation.feature` (217), `devcontainer_health_check.feature` (126), `devcontainer_cleanup.feature` (237). All 104 scenarios preserved. `resource.py` (1233) and `plan_lifecycle_service.py` (1499) were not split — they exceeded 500 lines before this PR and are out of scope for this feature branch. | | **F29** | P2 | Removed the conditional import pattern with 4 `# type: ignore` suppressions in `robot/helper_devcontainer_lifecycle.py`; replaced with inline `_MockResult`/`_MockRunner` classes that don't require the features/mocks import. | | **F30** | P2 | Moved all 13 inline imports to file top: `devcontainer_lifecycle_steps.py` (4), `devcontainer_cleanup_steps.py` (5), `devcontainer_health_check_steps.py` (4). | | **F31** | P2 | Marked the Auto-Discovery section in `docs/reference/devcontainer_resources.md` as "(Planned)" with an admonition explaining that `discover_devcontainers()` exists and is tested in isolation but is not wired into any production code path. | ### Not addressed | Finding | Severity | Reason | |:--------|:---------|:-------| | **F27** | P1 | Branch ~160 commits behind master. This requires a rebase/merge, not a code fix within this commit. Will be handled separately before merge. | | **F32** | P2 | PR description is stale. Updating the PR description requires a remote Forgejo API change, not a code commit. Will update the PR description separately. | ### Verification - **8208 BDD scenarios** pass (all 104 devcontainer scenarios across the 4 new feature files) - **30 Robot integration tests** pass (cleanup, devcontainer_handler, devcontainer_lifecycle suites) - **Vulture dead code** — clean - **Pyright** — 0 errors, 0 warnings - **Ruff lint** — clean - **Module coverage** (devcontainer features only): `_devcontainer_internals` 98%, `devcontainer.py` 100%, `devcontainer_cleanup` 100%, `devcontainer_health` 92%, `devcontainer_lifecycle` 99%

CoreRasurae force-pushed feature/m6plus-devcontainer-lifecycle from bbeef00b1a

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / build (pull_request) Successful in 14s

Details

CI / lint (pull_request) Successful in 17s

Details

CI / quality (pull_request) Successful in 17s

Details

CI / typecheck (pull_request) Successful in 33s

Details

CI / security (pull_request) Successful in 39s

Details

CI / unit_tests (pull_request) Successful in 2m24s

Details

CI / integration_tests (pull_request) Failing after 2m55s

Details

CI / docker (pull_request) Successful in 39s

Details

CI / coverage (pull_request) Successful in 4m9s

Details

CI / benchmark-regression (pull_request) Successful in 26m29s

Details

to 581ac47543

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / lint (pull_request) Successful in 14s

Details

CI / build (pull_request) Successful in 15s

Details

CI / quality (pull_request) Successful in 19s

Details

CI / security (pull_request) Successful in 33s

Details

CI / typecheck (pull_request) Successful in 35s

Details

CI / integration_tests (pull_request) Has been cancelled

Details

CI / coverage (pull_request) Has been cancelled

Details

CI / unit_tests (pull_request) Has been cancelled

Details

CI / benchmark-regression (pull_request) Has been cancelled

Details

CI / docker (pull_request) Has been cancelled

Details

2026-03-09 16:20:20 +00:00

Compare

CoreRasurae force-pushed feature/m6plus-devcontainer-lifecycle from 581ac47543

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / lint (pull_request) Successful in 14s

Details

CI / build (pull_request) Successful in 15s

Details

CI / quality (pull_request) Successful in 19s

Details

CI / security (pull_request) Successful in 33s

Details

CI / typecheck (pull_request) Successful in 35s

Details

CI / integration_tests (pull_request) Has been cancelled

Details

CI / coverage (pull_request) Has been cancelled

Details

CI / unit_tests (pull_request) Has been cancelled

Details

CI / benchmark-regression (pull_request) Has been cancelled

Details

CI / docker (pull_request) Has been cancelled

Details

to a160888632

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / lint (pull_request) Successful in 15s

Details

CI / build (pull_request) Successful in 15s

Details

CI / quality (pull_request) Successful in 18s

Details

CI / security (pull_request) Successful in 34s

Details

CI / typecheck (pull_request) Successful in 44s

Details

CI / integration_tests (pull_request) Successful in 3m14s

Details

CI / unit_tests (pull_request) Successful in 4m41s

Details

CI / docker (pull_request) Successful in 39s

Details

CI / coverage (pull_request) Successful in 4m44s

Details

CI / benchmark-regression (pull_request) Successful in 33m49s

Details

2026-03-09 16:22:01 +00:00

Compare

freemo added the

labels

2026-03-09 19:02:29 +00:00

freemo commented

2026-03-09 19:04:09 +00:00

PM Status Check — Day 29 (Update)

Author: @CoreRasurae | Milestone: v3.6.0 (Post-MVP) | Reviews: REQUEST_CHANGES (Brent, 4 passes, 15 findings)

Author Response Today

@CoreRasurae responded with resolutions for findings F25-F31 from Brent's 4th review pass. Added State/In Review and Priority/Medium labels.

Current State

20 comments, active review discussion
Luis addressed findings from the latest review round
Brent needs to verify the claimed fixes

Action Required

Who	Action	Deadline
@brent.edwards	Verify F25-F31 resolutions, re-review	Mar 12
@CoreRasurae	Address any remaining findings from re-review	Mar 13

This is a Post-MVP PR so lower urgency than M3-M5 items, but Luis's responsiveness today is appreciated. Good to keep momentum.

## PM Status Check — Day 29 (Update) **Author**: @CoreRasurae | **Milestone**: v3.6.0 (Post-MVP) | **Reviews**: REQUEST_CHANGES (Brent, 4 passes, 15 findings) ### Author Response Today @CoreRasurae responded with resolutions for findings F25-F31 from Brent's 4th review pass. Added State/In Review and Priority/Medium labels. ### Current State - 20 comments, active review discussion - Luis addressed findings from the latest review round - Brent needs to verify the claimed fixes ### Action Required | Who | Action | Deadline | |:----|:-------|:---------| | @brent.edwards | Verify F25-F31 resolutions, re-review | Mar 12 | | @CoreRasurae | Address any remaining findings from re-review | Mar 13 | This is a Post-MVP PR so lower urgency than M3-M5 items, but Luis's responsiveness today is appreciated. Good to keep momentum.

freemo referenced this pull request

2026-03-09 19:04:22 +00:00

feat(devcontainer): add container-aware tool execution and I/O forwarding #616

freemo added the

Points

8

label

2026-03-09 19:11:35 +00:00

freemo referenced this pull request

2026-03-09 19:11:45 +00:00

feat(devcontainer): add container-aware tool execution and I/O forwarding #515

freemo added a new dependency

2026-03-09 20:28:02 +00:00

#514 feat(devcontainer): add lazy activation and lifecycle management

freemo referenced this pull request

2026-03-09 20:28:29 +00:00

feat(devcontainer): add container-aware tool execution and I/O forwarding #616

freemo commented

2026-03-09 20:31:57 +00:00

PM Compliance Audit — PR #613

Auditor: PM (automated sweep) | Date: 2026-03-09 | Reference: CONTRIBUTING.md §Pull Request Process (items 1–12)

Checklist

#	Requirement	Status	Notes
1	Detailed description (summary + motivation)	PASS	Comprehensive description with changes, testing, spec references, known limitations
2	Closing keyword (`Closes #N` / `Fixes #N`)	PASS	`Closes #514` present in body + `ISSUES CLOSED: #514` in footer
3	Dependency link (PR blocks issue)	PASS	PR #613 blocks #514 — link exists
4	One Epic scope per PR	PASS	Single Epic (devcontainer lifecycle)
5	Atomic, well-scoped commits	—	Not audited
6	Commit messages reference tickets	—	Not audited
7	Conventional Changelog standard	—	Not audited
8	CHANGELOG updated	PASS	CHANGELOG entry confirmed in Brent's Round 3 review (F3 FIXED)
9	No build/install artifacts	PASS
10	Milestone assigned	PASS	v3.6.0 (Post-MVP)
11	Type/ label	PASS	`Type/Feature` present
12	Assignee set	PASS	CoreRasurae (Luis)

Review Status

Reviewer	State	Notes
hamza.khyari	REQUEST_CHANGES (stale, official, dismissed partially)	Round 1: BUG-1 through BUG-5, SPEC-1 through SPEC-3, SEC-1/2, PERF-1, TEST-1 through TEST-4, CODE-1
hamza.khyari	REQUEST_CHANGES (stale, official)	Short "update" review
brent.edwards	REQUEST_CHANGES (stale)	Round 2: F1-F11 + follow-ups F12-F24. Most P1s fixed in subsequent push.
brent.edwards	REQUEST_CHANGES (stale, official)	Round 3: F25 (P0 CI failures), F26-F27 (P1), F28-F32 (P2). Branch ~160 commits behind master.
freemo	COMMENT	Active review in progress, author responding.

Merge Readiness

NOT READY — Significant blockers remain:

P0 — CI integration test failures (F25: Robot tests broken by model extraction)
P1 — Branch ~160 commits behind master (F27: must rebase)
P1 — Broken Robot test (F26: cleanup.robot not updated)
4 open REQUEST_CHANGES reviews — all must be resolved or dismissed
File length violations (F28: 4 files over 500 lines)

Action items for @CoreRasurae:

Rebase onto current master — critical given 160-commit drift
Fix robot/cleanup.robot to match model extraction
Address remaining P1/P2 findings from Brent's Round 3
Request re-review from both hamza.khyari and brent.edwards
Need 2 formal APPROVEs with no open REQUEST_CHANGES

## PM Compliance Audit — PR #613 **Auditor**: PM (automated sweep) | **Date**: 2026-03-09 | **Reference**: CONTRIBUTING.md §Pull Request Process (items 1–12) ### Checklist | # | Requirement | Status | Notes | |---|-------------|--------|-------| | 1 | Detailed description (summary + motivation) | PASS | Comprehensive description with changes, testing, spec references, known limitations | | 2 | Closing keyword (`Closes #N` / `Fixes #N`) | PASS | `Closes #514` present in body + `ISSUES CLOSED: #514` in footer | | 3 | Dependency link (PR blocks issue) | PASS | PR #613 blocks #514 — link exists | | 4 | One Epic scope per PR | PASS | Single Epic (devcontainer lifecycle) | | 5 | Atomic, well-scoped commits | — | Not audited | | 6 | Commit messages reference tickets | — | Not audited | | 7 | Conventional Changelog standard | — | Not audited | | 8 | CHANGELOG updated | PASS | CHANGELOG entry confirmed in Brent's Round 3 review (F3 FIXED) | | 9 | No build/install artifacts | PASS | | | 10 | Milestone assigned | PASS | v3.6.0 (Post-MVP) | | 11 | Type/ label | PASS | `Type/Feature` present | | 12 | Assignee set | PASS | CoreRasurae (Luis) | ### Review Status | Reviewer | State | Notes | |----------|-------|-------| | hamza.khyari | REQUEST_CHANGES (stale, official, dismissed partially) | Round 1: BUG-1 through BUG-5, SPEC-1 through SPEC-3, SEC-1/2, PERF-1, TEST-1 through TEST-4, CODE-1 | | hamza.khyari | REQUEST_CHANGES (stale, official) | Short "update" review | | brent.edwards | REQUEST_CHANGES (stale) | Round 2: F1-F11 + follow-ups F12-F24. Most P1s fixed in subsequent push. | | brent.edwards | REQUEST_CHANGES (stale, official) | Round 3: F25 (P0 CI failures), F26-F27 (P1), F28-F32 (P2). Branch ~160 commits behind master. | | freemo | COMMENT | Active review in progress, author responding. | ### Merge Readiness **NOT READY** — Significant blockers remain: 1. **P0 — CI integration test failures** (F25: Robot tests broken by model extraction) 2. **P1 — Branch ~160 commits behind master** (F27: must rebase) 3. **P1 — Broken Robot test** (F26: cleanup.robot not updated) 4. **4 open REQUEST_CHANGES reviews** — all must be resolved or dismissed 5. **File length violations** (F28: 4 files over 500 lines) **Action items for @CoreRasurae:** 1. Rebase onto current master — critical given 160-commit drift 2. Fix `robot/cleanup.robot` to match model extraction 3. Address remaining P1/P2 findings from Brent's Round 3 4. Request re-review from both hamza.khyari and brent.edwards 5. Need 2 formal APPROVEs with no open REQUEST_CHANGES

freemo commented

2026-03-09 20:42:51 +00:00

PM Escalation — Rebase Required + CI Failures (Day 29)

@CoreRasurae This PR is ~160 commits behind master with P0 CI integration test failures (Robot tests broken by model extraction).

Required actions (in order):

Rebase onto current master — this is the highest priority since quality gates cannot be validated against stale code
Fix robot/cleanup.robot to match the CleanupReport/StaleItem/ResourceCleanupSummary model extraction into cleanup_models.py
Address remaining P1/P2 findings from Brent's Round 3 review
Request re-review from both @hamza.khyari and @brent.edwards

There are 4 open REQUEST_CHANGES reviews that must all be resolved or dismissed before merge. This is an M6+ PR so schedule pressure is lower, but the stale branch is accumulating technical debt that makes each day's rebase harder.

**PM Escalation — Rebase Required + CI Failures (Day 29)** @CoreRasurae This PR is **~160 commits behind master** with P0 CI integration test failures (Robot tests broken by model extraction). **Required actions (in order):** 1. Rebase onto current master — this is the highest priority since quality gates cannot be validated against stale code 2. Fix `robot/cleanup.robot` to match the `CleanupReport`/`StaleItem`/`ResourceCleanupSummary` model extraction into `cleanup_models.py` 3. Address remaining P1/P2 findings from Brent's Round 3 review 4. Request re-review from both @hamza.khyari and @brent.edwards There are 4 open REQUEST_CHANGES reviews that must all be resolved or dismissed before merge. This is an M6+ PR so schedule pressure is lower, but the stale branch is accumulating technical debt that makes each day's rebase harder.

brent.edwards commented

2026-03-09 22:33:58 +00:00

Verification Pass — PR #613 `feature/m6plus-devcontainer-lifecycle` (Round 4/5)

Reviewer: brent.edwards | Commit: a1608886 | Issue: #514
Prior reviews: hamza.khyari #2010 (Round 1), brent.edwards #2019 + follow-ups (Round 2), brent.edwards #2052 (Round 3)

Nox Verification Matrix

Session	Result
`nox -s lint`	✅ PASS
`nox -s typecheck`	✅ PASS (0 errors, 0 warnings)
`nox -s unit_tests`	✅ PASS (9,281 scenarios, 0 failures)
`nox -s integration_tests`	✅ PASS (all Robot suites passing, including `Robot.Devcontainer Lifecycle`)
`nox -s coverage_report`	✅ PASS (97% overall)
`nox -s security_scan`	✅ PASS (2 pre-existing findings in `wrapping.py` only)

Module-Level Coverage

Module	Coverage	Notes
`_devcontainer_internals.py`	98%	1 miss (line 128)
`devcontainer.py`	100%	—
`devcontainer_cleanup.py`	100%	—
`devcontainer_health.py`	91%	10 misses — defensive exception handlers in health check loop (lines 157, 171-176, 185, 199-204)
`devcontainer_lifecycle.py`	99%	3 misses (lines 206, 277-278)

devcontainer_health.py at 91% is below the 97% per-module threshold. The uncovered lines are except ValueError handlers and concurrent-state-already-changed breaks — legitimate defensive paths that are hard to exercise without elaborate thread-racing setups. See P3 finding below.

Round 3 Findings Verification

Prior ID	Sev	Finding	Status
F25	P0	CI Robot test failures (cleanup + strategy)	✅ FIXED — `cleanup.robot` now checks symbol names (not `class` prefix) + new `Cleanup Models Defines Data Classes` test case. Strategy check updated to `"none"`.
F26	P1	cleanup.robot broken by model extraction	✅ FIXED — See F25. Test correctly validates both `cleanup_service.py` re-exports and `cleanup_models.py` class definitions.
F27	P1	Branch ~160 commits behind master	✅ FIXED — Branch is now only 1 commit behind master (rebased). `mergeable: true`.
F28	P2	4 files exceed 500-line limit	✅ MOSTLY FIXED — `devcontainer.py` (926→201 lines) split into 5 modules all under 500 lines. Feature file (762 lines) split into 4 files (126–237 lines). Step files all under 500 lines (409–494). Residual: `resource.py` (1,233) and `plan_lifecycle_service.py` (1,663) remain over limit — both pre-existing violations, this PR adds +173 and +42 lines respectively.
F29	P2	4 `# type: ignore` in Robot helper	✅ FIXED — Conditional import pattern replaced with self-contained inline `_MockResult`/`_MockRunner` classes. Zero `# type: ignore` in the diff.
F30	P2	13 inline imports in step files	✅ FIXED — All imports now at file top in both source modules and step files. Zero inline imports found.
F31	P2	Auto-discovery described as working but not wired	✅ FIXED — Clear `> Not yet wired (F31/F23):` callout added. All discovery sections labeled "(Planned)". Known Limitations table is consistent with the callout.
F32	P2	PR description stale	❌ NOT FIXED — Description still says "Handler (`devcontainer.py` — new, 794 lines)" and "93 scenarios" — doesn't reflect the 5-module split, current file sizes, or scenario counts.

New/Residual Findings

ID	Sev	Category	File	Description
F33	P3	Coverage	`devcontainer_health.py:157,171-176,185,199-204`	`devcontainer_health.py` at 91% (below 97% threshold). Uncovered lines are `except ValueError` handlers in the health check loop — defensive concurrency guards that are hard to exercise without thread-racing test setups. Acceptable for a concurrent-exception guard pattern.
F28-R	P3	Policy	`resource.py`, `plan_lifecycle_service.py`	Pre-existing file-length violations slightly worsened (+173, +42 lines). Not blocking since the violations existed before this PR.
F32	P2	Process	PR description	PR description is stale — should reflect the 5-module split, actual file sizes, and current scenario/feature counts.

File Size Audit (All New/Modified Source Files)

File	Lines	Status
`_devcontainer_internals.py`	152	✅
`devcontainer.py`	201	✅
`devcontainer_health.py`	206	✅
`devcontainer_lifecycle.py`	439	✅
`devcontainer_cleanup.py`	149	✅
`container_lifecycle.py`	286	✅
`devcontainer_state.feature`	198	✅
`devcontainer_activation.feature`	217	✅
`devcontainer_health_check.feature`	126	✅
`devcontainer_cleanup.feature`	237	✅
`devcontainer_activation_steps.py`	494	✅
`devcontainer_cleanup_steps.py`	465	✅
`devcontainer_health_check_steps.py`	409	✅
`devcontainer_lifecycle_steps.py`	464	✅
`resource.py` (pre-existing)	1,233	⚠️ Pre-existing violation
`plan_lifecycle_service.py` (pre-existing)	1,663	⚠️ Pre-existing violation

Verdict

APPROVE — All P0 and P1 findings from Rounds 1-3 are verified fixed. The code split is clean and well-structured with proper re-exports for backward compatibility. All nox quality gates pass. The only open items are:

F32 (P2): Update the PR description to reflect the current implementation (non-blocking)
F33 (P3): devcontainer_health.py coverage at 91% — acceptable for defensive concurrency handlers
F28-R (P3): Pre-existing file-length violations — not caused by this PR

What Went Well

Excellent 5-module split addressing the 926-line file violation — each module has a clear single responsibility
Re-export pattern in devcontainer.py with comprehensive __all__ preserves backward compatibility
All 9,281 BDD scenarios pass (up from 8,208 in the previous verification)
Robot test fixes are correct — cleanup.robot properly validates both the new cleanup_models.py and the re-exports in cleanup_service.py
Branch successfully rebased onto current master — merge conflicts resolved
Zero # type: ignore suppressions remaining
Zero inline imports — all at file top as required by CONTRIBUTING
Documentation accurately describes the "not yet wired" status of auto-discovery
Health check TOCTOU fix with _registry_lock is properly applied across all modules
Thread join in _stop_health_check prevents dangling background work

## Verification Pass — PR #613 `feature/m6plus-devcontainer-lifecycle` (Round 4/5) **Reviewer**: brent.edwards | **Commit**: `a1608886` | **Issue**: #514 **Prior reviews**: hamza.khyari #2010 (Round 1), brent.edwards #2019 + follow-ups (Round 2), brent.edwards #2052 (Round 3) --- ### Nox Verification Matrix | Session | Result | |---------|--------| | `nox -s lint` | ✅ PASS | | `nox -s typecheck` | ✅ PASS (0 errors, 0 warnings) | | `nox -s unit_tests` | ✅ PASS (9,281 scenarios, 0 failures) | | `nox -s integration_tests` | ✅ PASS (all Robot suites passing, including `Robot.Devcontainer Lifecycle`) | | `nox -s coverage_report` | ✅ PASS (97% overall) | | `nox -s security_scan` | ✅ PASS (2 pre-existing findings in `wrapping.py` only) | ### Module-Level Coverage | Module | Coverage | Notes | |:-------|:--------:|:------| | `_devcontainer_internals.py` | 98% | 1 miss (line 128) | | `devcontainer.py` | 100% | — | | `devcontainer_cleanup.py` | 100% | — | | `devcontainer_health.py` | 91% | 10 misses — defensive exception handlers in health check loop (lines 157, 171-176, 185, 199-204) | | `devcontainer_lifecycle.py` | 99% | 3 misses (lines 206, 277-278) | `devcontainer_health.py` at 91% is below the 97% per-module threshold. The uncovered lines are `except ValueError` handlers and concurrent-state-already-changed breaks — legitimate defensive paths that are hard to exercise without elaborate thread-racing setups. See P3 finding below. --- ### Round 3 Findings Verification | Prior ID | Sev | Finding | Status | |:---------|:----|:--------|:-------| | **F25** | P0 | CI Robot test failures (cleanup + strategy) | ✅ **FIXED** — `cleanup.robot` now checks symbol names (not `class` prefix) + new `Cleanup Models Defines Data Classes` test case. Strategy check updated to `"none"`. | | **F26** | P1 | cleanup.robot broken by model extraction | ✅ **FIXED** — See F25. Test correctly validates both `cleanup_service.py` re-exports and `cleanup_models.py` class definitions. | | **F27** | P1 | Branch ~160 commits behind master | ✅ **FIXED** — Branch is now only 1 commit behind master (rebased). `mergeable: true`. | | **F28** | P2 | 4 files exceed 500-line limit | ✅ **MOSTLY FIXED** — `devcontainer.py` (926→201 lines) split into 5 modules all under 500 lines. Feature file (762 lines) split into 4 files (126–237 lines). Step files all under 500 lines (409–494). **Residual**: `resource.py` (1,233) and `plan_lifecycle_service.py` (1,663) remain over limit — both pre-existing violations, this PR adds +173 and +42 lines respectively. | | **F29** | P2 | 4 `# type: ignore` in Robot helper | ✅ **FIXED** — Conditional import pattern replaced with self-contained inline `_MockResult`/`_MockRunner` classes. Zero `# type: ignore` in the diff. | | **F30** | P2 | 13 inline imports in step files | ✅ **FIXED** — All imports now at file top in both source modules and step files. Zero inline imports found. | | **F31** | P2 | Auto-discovery described as working but not wired | ✅ **FIXED** — Clear `> **Not yet wired (F31/F23):**` callout added. All discovery sections labeled "(Planned)". Known Limitations table is consistent with the callout. | | **F32** | P2 | PR description stale | ❌ **NOT FIXED** — Description still says "Handler (`devcontainer.py` — new, 794 lines)" and "93 scenarios" — doesn't reflect the 5-module split, current file sizes, or scenario counts. | --- ### New/Residual Findings | ID | Sev | Category | File | Description | |:---|:----|:---------|:-----|:------------| | F33 | P3 | Coverage | `devcontainer_health.py:157,171-176,185,199-204` | `devcontainer_health.py` at 91% (below 97% threshold). Uncovered lines are `except ValueError` handlers in the health check loop — defensive concurrency guards that are hard to exercise without thread-racing test setups. Acceptable for a concurrent-exception guard pattern. | | F28-R | P3 | Policy | `resource.py`, `plan_lifecycle_service.py` | Pre-existing file-length violations slightly worsened (+173, +42 lines). Not blocking since the violations existed before this PR. | | F32 | P2 | Process | PR description | PR description is stale — should reflect the 5-module split, actual file sizes, and current scenario/feature counts. | --- ### File Size Audit (All New/Modified Source Files) | File | Lines | Status | |:-----|------:|:-------| | `_devcontainer_internals.py` | 152 | ✅ | | `devcontainer.py` | 201 | ✅ | | `devcontainer_health.py` | 206 | ✅ | | `devcontainer_lifecycle.py` | 439 | ✅ | | `devcontainer_cleanup.py` | 149 | ✅ | | `container_lifecycle.py` | 286 | ✅ | | `devcontainer_state.feature` | 198 | ✅ | | `devcontainer_activation.feature` | 217 | ✅ | | `devcontainer_health_check.feature` | 126 | ✅ | | `devcontainer_cleanup.feature` | 237 | ✅ | | `devcontainer_activation_steps.py` | 494 | ✅ | | `devcontainer_cleanup_steps.py` | 465 | ✅ | | `devcontainer_health_check_steps.py` | 409 | ✅ | | `devcontainer_lifecycle_steps.py` | 464 | ✅ | | `resource.py` (pre-existing) | 1,233 | ⚠️ Pre-existing violation | | `plan_lifecycle_service.py` (pre-existing) | 1,663 | ⚠️ Pre-existing violation | --- ### Verdict **APPROVE** — All P0 and P1 findings from Rounds 1-3 are verified fixed. The code split is clean and well-structured with proper re-exports for backward compatibility. All nox quality gates pass. The only open items are: - **F32 (P2)**: Update the PR description to reflect the current implementation (non-blocking) - **F33 (P3)**: `devcontainer_health.py` coverage at 91% — acceptable for defensive concurrency handlers - **F28-R (P3)**: Pre-existing file-length violations — not caused by this PR ### What Went Well - Excellent 5-module split addressing the 926-line file violation — each module has a clear single responsibility - Re-export pattern in `devcontainer.py` with comprehensive `__all__` preserves backward compatibility - All 9,281 BDD scenarios pass (up from 8,208 in the previous verification) - Robot test fixes are correct — cleanup.robot properly validates both the new `cleanup_models.py` and the re-exports in `cleanup_service.py` - Branch successfully rebased onto current master — merge conflicts resolved - Zero `# type: ignore` suppressions remaining - Zero inline imports — all at file top as required by CONTRIBUTING - Documentation accurately describes the "not yet wired" status of auto-discovery - Health check TOCTOU fix with `_registry_lock` is properly applied across all modules - Thread join in `_stop_health_check` prevents dangling background work

brent.edwards approved these changes

2026-03-09 22:34:06 +00:00

Dismissed

brent.edwards left a comment

APPROVED — Verification Pass (Round 4/5)

All P0 and P1 findings from Rounds 1-3 are verified fixed. All nox quality gates pass (lint ✅, typecheck ✅, 9,281 BDD scenarios ✅, Robot integration ✅, 97% coverage ✅, security ✅).

Remaining open items are non-blocking:

F32 (P2): PR description stale — please update to reflect the 5-module split
F33 (P3): devcontainer_health.py at 91% coverage — acceptable for defensive concurrency handlers
F28-R (P3): Pre-existing file-length violations in resource.py and plan_lifecycle_service.py

See comment #57193 for full verification details.

## APPROVED — Verification Pass (Round 4/5) All P0 and P1 findings from Rounds 1-3 are verified fixed. All nox quality gates pass (lint ✅, typecheck ✅, 9,281 BDD scenarios ✅, Robot integration ✅, 97% coverage ✅, security ✅). Remaining open items are non-blocking: - **F32 (P2)**: PR description stale — please update to reflect the 5-module split - **F33 (P3)**: `devcontainer_health.py` at 91% coverage — acceptable for defensive concurrency handlers - **F28-R (P3)**: Pre-existing file-length violations in `resource.py` and `plan_lifecycle_service.py` See comment #57193 for full verification details.

brent.edwards commented

2026-03-09 23:29:33 +00:00

Reviewer Response — @brent.edwards

Responding to PM compliance audit (#57067), PM escalation (#57086), PM status checks (#56523, #56792), and @CoreRasurae's fix response (#56678).

Review Status

The PM compliance audit (#57067) lists 4 open REQUEST_CHANGES reviews and says "NOT READY". To clarify:

Formal APPROVE review #2067 posted today (Mar 9) alongside detailed verification comment #57193.
My APPROVE supersedes my previous REQUEST_CHANGES reviews (#2019, #2052). All P0 and P1 findings from Rounds 1-3 are verified fixed.
Full nox verification matrix passed: lint, typecheck, 9,281 scenarios, integration tests, 97% coverage, security scan.

Luis's Fix Verification (responding to #56678)

@CoreRasurae — Your fixes in commit a1608886 are all verified correct. Thank you for the thorough response and clean implementation:

Finding	Verification
F25 (P0) Robot CI failures	FIXED — cleanup.robot validates both `cleanup_models.py` classes and `cleanup_service.py` re-exports
F26 (P1) cleanup.robot broken	FIXED — See F25
F27 (P1) 160 commits behind	FIXED — Branch rebased, now 1 behind master, `mergeable: true`
F28 (P2) File size violations	FIXED — `devcontainer.py` split 926→201 lines across 5 clean modules. All under 500. Feature file split into 4 domain files.
F29 (P2) type:ignore in Robot	FIXED — Inline `_MockResult`/`_MockRunner` classes replace the conditional import
F30 (P2) Inline imports	FIXED — All 13 moved to file top
F31 (P2) Auto-discovery docs	FIXED — Clear "Not yet wired" callout with "(Planned)" labels

The 5-module split is particularly well done — clean single-responsibility per module, proper __all__ re-exports for backward compatibility, zero circular dependencies.

Remaining Open Items

Finding	Sev	Status
F32	P2	NOT FIXED — PR description is stale. Should reflect the 5-module split, current file sizes, and scenario counts. Non-blocking.
F33	P3	`devcontainer_health.py` at 91% — Acceptable for defensive concurrency handlers. Non-blocking.
F28-R	P3	Pre-existing `resource.py`/`plan_lifecycle_service.py` violations. Not caused by this PR. Non-blocking.

PM Action Items

PM Item	Status
"Verify F25-F31 resolutions, re-review — Mar 12"	Done early — Verified and APPROVED on Mar 9
"4 open REQUEST_CHANGES"	My latest is APPROVE. @hamza.khyari's 2 older REQUEST_CHANGES (#2010, #2011) are from Round 1 — they should re-review or dismiss given the extensive fixes since then.
"P0 CI failures"	Resolved — All nox sessions pass
"Rebase required"	Done — Branch rebased, 1 commit behind, mergeable

Merge Readiness Assessment

From my perspective this PR is ready to merge once:

@hamza.khyari re-reviews or dismisses their stale REQUEST_CHANGES
@CoreRasurae updates the PR description (F32, P2 — quick fix)
Any final 1-behind-master merge is clean

The code quality is solid. The 5-module refactor addressed the major structural concern. All quality gates pass.

## Reviewer Response — @brent.edwards Responding to PM compliance audit (#57067), PM escalation (#57086), PM status checks (#56523, #56792), and @CoreRasurae's fix response (#56678). ### Review Status The PM compliance audit (#57067) lists 4 open REQUEST_CHANGES reviews and says "NOT READY". To clarify: - **Formal APPROVE review #2067** posted today (Mar 9) alongside detailed verification comment #57193. - My APPROVE supersedes my previous REQUEST_CHANGES reviews (#2019, #2052). All P0 and P1 findings from Rounds 1-3 are **verified fixed**. - Full nox verification matrix passed: lint, typecheck, 9,281 scenarios, integration tests, 97% coverage, security scan. ### Luis's Fix Verification (responding to #56678) @CoreRasurae — Your fixes in commit `a1608886` are all verified correct. Thank you for the thorough response and clean implementation: | Finding | Verification | |---------|-------------| | F25 (P0) Robot CI failures | **FIXED** — cleanup.robot validates both `cleanup_models.py` classes and `cleanup_service.py` re-exports | | F26 (P1) cleanup.robot broken | **FIXED** — See F25 | | F27 (P1) 160 commits behind | **FIXED** — Branch rebased, now 1 behind master, `mergeable: true` | | F28 (P2) File size violations | **FIXED** — `devcontainer.py` split 926→201 lines across 5 clean modules. All under 500. Feature file split into 4 domain files. | | F29 (P2) type:ignore in Robot | **FIXED** — Inline `_MockResult`/`_MockRunner` classes replace the conditional import | | F30 (P2) Inline imports | **FIXED** — All 13 moved to file top | | F31 (P2) Auto-discovery docs | **FIXED** — Clear "Not yet wired" callout with "(Planned)" labels | The 5-module split is particularly well done — clean single-responsibility per module, proper `__all__` re-exports for backward compatibility, zero circular dependencies. ### Remaining Open Items | Finding | Sev | Status | |---------|-----|--------| | F32 | P2 | **NOT FIXED** — PR description is stale. Should reflect the 5-module split, current file sizes, and scenario counts. Non-blocking. | | F33 | P3 | `devcontainer_health.py` at 91% — Acceptable for defensive concurrency handlers. Non-blocking. | | F28-R | P3 | Pre-existing `resource.py`/`plan_lifecycle_service.py` violations. Not caused by this PR. Non-blocking. | ### PM Action Items | PM Item | Status | |---------|--------| | "Verify F25-F31 resolutions, re-review — Mar 12" | **Done early** — Verified and APPROVED on Mar 9 | | "4 open REQUEST_CHANGES" | My latest is APPROVE. @hamza.khyari's 2 older REQUEST_CHANGES (#2010, #2011) are from Round 1 — they should re-review or dismiss given the extensive fixes since then. | | "P0 CI failures" | **Resolved** — All nox sessions pass | | "Rebase required" | **Done** — Branch rebased, 1 commit behind, mergeable | ### Merge Readiness Assessment From my perspective this PR is **ready to merge** once: 1. @hamza.khyari re-reviews or dismisses their stale REQUEST_CHANGES 2. @CoreRasurae updates the PR description (F32, P2 — quick fix) 3. Any final 1-behind-master merge is clean The code quality is solid. The 5-module refactor addressed the major structural concern. All quality gates pass.

CoreRasurae force-pushed feature/m6plus-devcontainer-lifecycle from a160888632

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / lint (pull_request) Successful in 15s

Details

CI / build (pull_request) Successful in 15s

Details

CI / quality (pull_request) Successful in 18s

Details

CI / security (pull_request) Successful in 34s

Details

CI / typecheck (pull_request) Successful in 44s