feat(devcontainer): add lazy activation and lifecycle management #613
No reviewers
Labels
No labels
auto/needs-reevaluation
controller-managed
auto/blocked-by-deps
auto/ci-timeout
auto/claimed-implementer
auto/claimed-merge
auto/claimed-reviewer
auto/driver-down
auto/invariant-violation
auto/last-attempt-tier-0
auto/last-attempt-tier-1
auto/last-attempt-tier-2
auto/last-attempt-tier-min
Automation Tracking
auto/needs-conflict-resolution
auto/needs-implementer
auto/postmortem
auto/ready-to-merge
auto/restart-throttled
auto/revert
auto/sentinel
auto/stale-inactivity
auto/unstable
Blocked
Bounty
$100
Bounty
$1000
Bounty
$10000
Bounty
$20
Bounty
$2000
Bounty
$250
Bounty
$50
Bounty
$500
Bounty
$5000
Bounty
$750
MoSCoW
Could have
MoSCoW
Must have
MoSCoW
Should have
Needs Feedback
Points
1
Points
13
Points
2
Points
21
Points
3
Points
34
Points
5
Points
55
Points
8
Points
88
Priority
Backlog
Priority
CI Blocker
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Signed-off: Owner
Signed-off: Scrum Master
Signed-off: Tech Lead
Spike
State
Completed
State
Duplicate
State
In Progress
State
In Review
State
Paused
State
Unverified
State
Verified
State
Wont Do
Type
Automation
Type
Bug
Type
Discussion
Type
Documentation
Type
Epic
Type
Feature
Type
Legendary
Type
Refactor
Type
Support
Type
Task
Type
Testing
No project
No assignees
4 participants
Notifications
Due date
No due date set.
Blocks
#514 feat(devcontainer): add lazy activation and lifecycle management
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core!613
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "feature/m6plus-devcontainer-lifecycle"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Implements full lifecycle management for
devcontainer-instanceresources: lazy activation on first tool target, health checking, manual stop/rebuild CLI commands, and automatic cleanup on session close or plan completion.Closes #514
Changes
Domain Model (
container_lifecycle.py— new, 249 lines)ContainerLifecycleStateenum:detected,building,running,stopping,stopped,failedVALID_TRANSITIONSmapping with validated state machineContainerLifecycleTrackerPydantic v2 model withsession_idfor scoped cleanuptransition_state()with history cap (100), automatic clearing ofcontainer_id/workspace_pathon terminal statesHandler (
devcontainer.py— new, 794 lines)activate_container(): lazy activation viadevcontainer upwith JSON output parsing, orphan cleanup on timeout, auto-starts health check on successstop_container(): idempotent stop (early return on terminal state), stops health check before lock to avoid deadlockrebuild_container(): re-activates stopped/failed containersstart_health_check()/_health_check_loop(): background daemon thread with configurable interval, TOCTOU-safe transitions under lockstop_all_active_containers(): session-scoped bulk stop with terminal tracker evictionevict_terminal_trackers(): caps registry at 200 terminal-state entries, evicts oldest first_parse_devcontainer_up_output(): reverse-line JSON scanning, hex container ID validation, absolute workspace path validationDevcontainerHandlerclass: extendsBaseResourceHandlerwithsnapshotsandbox strategyCLI Commands (
resource.py— modified)agents resource stop <name> [--yes]: transitionsrunning→stopping→stoppedagents resource rebuild <name> [--yes]: transitionsstopped/failed→building→running_STOPPABLE_TYPES(devcontainer-instance, container-instance),_REBUILDABLE_TYPES(devcontainer-instance only)Cleanup Wiring
AcpLocalFacade._handle_session_close()→ session-scoped container cleanup (works even without session service)PlanLifecycleService.complete_apply()/fail_apply()/fail_execute()/cancel_plan()→ plan-scoped container cleanupCleanupService.stop_active_devcontainers()→ thin static wrapper delegating to handlerSpecification Updates (
specification.md)DevcontainerHandlerin 3 locationsactivation_stateenum field todevcontainer-instancetype definitionDocumentation (
devcontainer_resources.md— new, 264 lines)Testing
BDD Coverage Highlights (668-line feature file, 93 scenarios)
Known Limitations
sandbox_strategycontainer_snapshot; implementation usessnapshot. Internal spec inconsistency — tables elsewhere also saysnapshot.Spec References
specification.md§33253–33263 (activation_state field)specification.md§23509–23527 (handler definition)specification.md§24876–24877 (sandbox strategy)ISSUES CLOSED: #514
Review: feat(devcontainer): add lazy activation and lifecycle management
Commit:
196e1415| Issue: #514 | 16 files, +4574 linesBUG-1 (Major) —
model_copy(update=...)bypasses Pydantic validators throughoutSeverity: Major
Category: BUG
File:
container_lifecycle.py:249,devcontainer.py:175,236-237transition_state()andactivate_container()usetracker.model_copy(update={...})to create new tracker instances.model_copydoes NOT re-run field validators. Confirmed by instrumented test:model_copy(update={"health_check_interval": 1.0})succeeds despite the field havingge=10constraint.Currently safe because the code never passes externally-sourced values through
model_copyfor validated fields. But this is a latent bypass — any future code that doestracker.model_copy(update={"health_check_interval": user_input})silently violates thege=10, le=3600constraint. Usemodel_validate(tracker.model_dump() | updates)for any path that updates validated fields, or add a comment documenting the intentional bypass for internal-only fields.BUG-2 (Major) — Stale
container_idpersists throughBUILDINGtransitionSeverity: Major
Category: BUG
File:
container_lifecycle.py:238-248transition_state()clearscontainer_idandworkspace_pathon transitions toFAILEDandSTOPPED, but NOT on transition toBUILDING. Duringactivate_container, between theBUILDINGtransition (line 177-182) and theRUNNINGtransition (line 239-244), the tracker in the registry carries the oldcontainer_idfrom the prior lifecycle. Any code that reads the tracker duringBUILDING(e.g., a health check that hasn't been stopped yet) could see a stale container_id pointing to a defunct container.Recommendation: Clear
container_idandworkspace_pathon transition toBUILDINGas well:BUG-3 (Medium) —
list_active_containers_for_sessionstops other sessions' containersSeverity: Medium
Category: BUG
File:
devcontainer.py:641-642Containers with
session_id=None(unscoped) are returned for ANY session cleanup. If session A closes and callsstop_all_active_containers(session_id="A"), it will stop all unscoped containers, which might be in use by session B. The docstring documents this as intentional, but the first session to close wins all unscoped containers.Recommendation: At minimum, log a warning when stopping an unscoped container during session-scoped cleanup so operators know cross-session impact occurred.
BUG-4 (Medium) —
ContainerLifecycleTracker.transitionsis a mutable list on a non-frozen modelSeverity: Medium
Category: BUG
File:
container_lifecycle.py:147,152The tracker is NOT frozen (
model_config = ConfigDict(str_strip_whitespace=True)), andtransitionsislist[LifecycleTransition]. Any code holding a tracker reference can mutate the list in-place:tracker.transitions.append(fake_transition). Since trackers are stored in a shared global registry, in-place mutation from one thread is visible to others without lock protection.The current code avoids this by always creating new lists via
[*tracker.transitions, transition], but this invariant is not enforced. Recommendation: Either freeze the model or usetuple[LifecycleTransition, ...]fortransitions.BUG-5 (Minor) —
rebuild_containerdoes not forwardsession_idSeverity: Minor
Category: BUG
File:
devcontainer.py:453-457rebuild_containerdelegates toactivate_containerbut never passessession_id. If a container is rebuilt from a different session, the session_id on the tracker won't update. The rebuilt container may be cleaned up by the old session's close hook rather than the new one's.SPEC-1 (Major) —
activation_stateenum names diverge from Issue #514Severity: Major
Category: SPEC
File:
container_lifecycle.py:49-72,docs/specification.mdThe spec (updated by this PR) uses
detected/building/running/stopping/stopped/failed. Issue #514 acceptance criteria useinactive/starting/active/stopping/stopped/error. These are completely different vocabularies. Onlystoppingandstoppedmatch. This should be reconciled — either update the issue or document the intentional rename.SPEC-2 (Minor) — Module path inconsistency: singular vs plural
resourceSeverity: Minor
Category: SPEC
File:
docs/specification.mdThe PR changes the devcontainer handler path to
cleveragents.resource.handlers.devcontainer(singular). Every other handler in the spec usescleveragents.resources.handlers.*(plural). Either the other handlers' paths are wrong or this one is.SPEC-3 (Minor) — Health checking and session-scoped cleanup absent from spec
Severity: Minor
Category: SPEC
Issue #514 requires health checking (periodic liveness probes) and automatic cleanup on session close / plan completion. The spec has no mention of either for container resources (health checking only exists for LSP servers). These features should be added to the spec or flagged as implementation-ahead-of-spec.
SEC-1 (Medium) —
workspace_foldernot sanitized beyondos.path.isabs()Severity: Medium
Category: SEC
File:
devcontainer.py:165-166activate_containervalidatesworkspace_folderis absolute but does not canonicalize it. A path like/tmp/../etc/devcontainer.jsonpasses validation. This path is passed directly tosubprocess.run(["devcontainer", "up", "--workspace-folder", workspace_folder]). Whileshell=Trueis not used, the devcontainer CLI will operate on any directory the process can access.Recommendation: Canonicalize with
os.path.realpath()and optionally restrict to known project directories.SEC-2 (Minor) — Health probe uses hardcoded fallback path
Severity: Minor
Category: SEC
File:
devcontainer.py:532_single_probeusestracker.workspace_path or "/workspace". Ifworkspace_pathis None (tracker in unexpected state), the probe runs against/workspaceinside an arbitrary container, which could be a different container's workspace.PERF-1 (Minor) —
_handlers()rebuilds dispatch dict on every ACP requestSeverity: Minor
Category: PERF
File:
acp/facade.py:194-208_route_operationcallsself._handlers()which builds a new dict on every dispatch. For a hot path (every ACP request), this should be a cached property or class-level constant.TEST-1 (Major) — Concurrency tests lack
threading.Barrierand post-condition invariantsSeverity: Major
Category: TEST
File:
features/devcontainer_lifecycle.feature:468,558Both concurrency scenarios ("Concurrent activation rejected" and "Concurrent stop handled safely") start threads sequentially without a
threading.Barrier. The first thread may complete before the second starts, making these sequential tests in disguise. Compare withtool_runtime_steps.py:229in this codebase which properly usesthreading.Barrier.The concurrent stop test also does not assert: (a) exactly 1
docker stopcall was made (not 0 or 2), or (b) the final registry state is exactly 1 tracker instoppedstate.TEST-2 (Medium) — 5 sham/structural-only test scenarios
Severity: Medium
Category: TEST
hasattrcheck onlyissubclasscheckThese test structural properties, not behavior. The first three would fail at import time if violated.
TEST-3 (Medium) — Coverage gaps:
list_active_containers_for_sessionuntested directlySeverity: Medium
Category: TEST
File:
devcontainer.py:619-643list_active_containers_for_sessionhas no direct test scenario. Its return value is never asserted. The empty-session-id fallback andsession_id is Nonematching logic are not explicitly verified.Also untested:
evict_terminal_trackerswhen below threshold (should return 0),get_lifecycle_trackerauto-creation behavior,clear_lifecycle_registrythread-join behavior.TEST-4 (Minor) — Mock does not validate subprocess kwargs
Severity: Minor
Category: TEST
File:
features/mocks/mock_devcontainer_cli.py:65MockDevcontainerRunner.__call__records but never validates kwargs (capture_output,text,timeout,check). A bug wherecheck=Trueis set (causingCalledProcessErrorin production) would pass all tests. The mock also returns the same container ID for all workspaces, preventing detection of workspace-mismatch bugs.CODE-1 (Minor) —
vulture_whitelist.pywhitelists private symbolsSeverity: Minor
Category: CODE
File:
vulture_whitelist.py:538-545The whitelist includes private symbols (
_stop_health_check,_health_check_loop,_single_probe,_lifecycle_registry,_registry_lock, etc.). Private symbols should not need vulture whitelisting — if vulture flags them as unused, they may be dead code or test-only utilities that should be restructured.196e1415c0ddcc22608eupdate
Review — PR #613
feature/m6plus-devcontainer-lifecycleReviewer: brent.edwards | Round: 2 (complementing hamza.khyari's Round 1 review #2010) | Commit:
870ddb31Quality Gates
nox -s lintnox -s typecheckNote on Branch Freshness
This branch is forked from
a60cda1fand is ~140 files behind current master. Thegit diff origin/master..HEADshows those files as "deletions" but they are NOT part of the commit — a merge would correctly retain master's additions. However, I recommend rebasing before merge to avoid surprises and ensure quality gates run against the latest codebase.Findings Summary
devcontainer.py:121-215activate_container— concurrent calls read+transition without holding the lock across the full operation, spawning duplicate containersdevcontainer.py:345-350_stop_health_checksets the stop event but never joins the thread — health check loop can race with stop/activate transitions## Unreleasedcleanup_service.py:510-527stop_active_devcontainersmethod exists but is not wired into session close or plan completion hooks. PR description claims: "AcpLocalFacade._handle_session_close() → session-scoped container cleanup" and "PlanLifecycleService.complete_apply()/fail_apply()/..." but these hooks don't exist in the code.devcontainer.py:58-59, 330-350_health_check_threadsand_health_check_stop_eventsare plain dicts accessed from multiple threads (health check background threads + main thread) without_registry_lockor a separate lockdetected/building/running/failedbut code usesinactive/starting/active/error; claims 794-line handler (actual: 489), 668-line feature file with 93 scenarios (actual: 290 lines). Misleads reviewers.devcontainer_lifecycle_steps.pycleanup_service.pydevcontainer_lifecycle_steps.py# type: ignoreinline suppressions — CONTRIBUTING prohibits thesedevcontainer.py:284-311rebuild_containeris a pass-through toactivate_containerwith no actual rebuild logic — does not pass--reset-containeror--no-cachetodevcontainer up, so it's functionally identical to activatecleanup_service.py:523if TYPE_CHECKING:blocks are exempt)Verdict
REQUEST_CHANGES — 4 P1 findings. The TOCTOU race (F1) and thread join omission (F2) are correctness bugs that will manifest under concurrent usage. The missing CHANGELOG (F3) and unwired cleanup hooks (F4) are process/spec gaps. The stale PR description (F6) needs updating to match the actual implementation.
What Went Well
threading.Eventfor health check cancellation@ -0,0 +1,1720 @@"""Step definitions for devcontainer_lifecycle.feature.F7 [P2 · Policy] — This file is 739 lines, exceeding the 500-line limit per CONTRIBUTING. Consider splitting into separate step files (e.g.,
devcontainer_activation_steps.py,devcontainer_health_check_steps.py,devcontainer_cleanup_steps.py).F9 [P2 · Policy] — 9
# type: ignoreinline suppressions at lines 336, 492, 501, 511, 570, 584, 598, 631, 731. CONTRIBUTING prohibits these. Most are passing wrong types to test error paths — use explicit type-correct wrappers orcast()to avoid the suppressions.@ -507,0 +507,4 @@# ── Devcontainer lifecycle cleanup (issue #514) ─────────────@staticmethoddef stop_active_devcontainers(session_id: str = "") -> list[str]:F4 [P1 · Spec] —
stop_active_devcontainersis defined here but not wired into any session close or plan completion hook. The PR description claims wiring intoAcpLocalFacade._handle_session_close()andPlanLifecycleService.complete_apply()/fail_apply()/fail_execute()/cancel_plan(), but none of these hooks exist in the code. This method is dead code in the current implementation.F11 [P2 · Policy] — Line 523: inline
fromimport inside a function body. CONTRIBUTING requires all imports at file top exceptif TYPE_CHECKING:blocks. Move this import to the top of the file.@ -28,0 +55,4 @@#: S1 fix: regex pattern for valid Docker container IDs (12-64 hex chars)._CONTAINER_ID_PATTERN = re.compile(r"^[a-f0-9]{12,64}$")#: Registry of lifecycle trackers keyed by resource_id.F5 [P2 · Bug] —
_health_check_threadsand_health_check_stop_eventsare plain dicts mutated bystart_health_check(main thread),_stop_health_check(main thread or stop_all_active_containers), andclear_lifecycle_registry. These mutations happen without holding_registry_lockor any other lock. If a health check thread is being started while another is being stopped for the same resource, dict mutations can race.Fix: either protect these dicts with
_registry_lockor introduce a dedicated_health_check_lock.@ -28,0 +146,4 @@resource_id: ULID of the devcontainer resource.workspace_folder: Path to the workspace containing devcontainer config.run_command: Optional callable replacing subprocess.run for testing.session_id: Optional session/plan ID to associate with this containerF1 [P1 · Bug] — TOCTOU race in
activate_container. The function callsget_lifecycle_tracker()at line 149 (which acquires/releases_registry_lock), thentransition_state()+set_lifecycle_tracker()at lines 152-157 (separate lock acquisition), then spawnsdevcontainer upwithout any lock held. Two concurrent calls for the sameresource_idwill both see state=inactive, both transition to STARTING, and both spawndevcontainer up.Fix: either (a) hold
_registry_lockacross the read-check-transition-spawn sequence (simplest but blocks other resources), or (b) use per-resource locks so only one activation per resource can proceed at a time.@ -28,0 +304,4 @@tracker = get_lifecycle_tracker(resource_id)tracker = transition_state(tracker,ContainerLifecycleState.FAILED,F10 [P2 · Code] —
rebuild_containersimply delegates toactivate_containerwith no rebuild-specific behavior. A real rebuild should pass--reset-containeror--no-cachetodevcontainer upto force image/container recreation. As implemented, this is indistinguishable from a re-activate after stop, which undermines the CLI distinction betweenresource stop && resource rebuildvs. just re-activating.@ -28,0 +342,4 @@Raises:ValueError: If the container is not in ``running`` state.RuntimeError: If stop fails.F2 [P1 · Bug] —
_stop_health_checksets the event and removes the thread from the dict, but never callsthread.join(). The health check loop at line 353 may still be executing — it could callset_lifecycle_tracker()after the stop transition completes, writing stale state back to the registry.Fix: call
thread.join(timeout=...)after setting the stop event, before proceeding with the stop transition.Supplementary Findings — PR #613 (Second Pass)
Following review #2019, a thorough second pass uncovered two additional P2 findings:
F12 [P2 · Bug] —
stop_containerignoresdocker stopexit codestop_container(lines 253-270) callsrunner(["docker", "stop", container_id], ..., check=False)but discards the return value without checkingresult.returncode. Ifdocker stopfails (e.g., container already removed, Docker daemon unresponsive with a non-exception timeout), the function still transitions the tracker toSTOPPED.Consequence: the container may still be running while the tracker says
stopped. A subsequentactivate_container/rebuild_containerwould start a second container, leaking the first one.Fix: capture the result and check
returncode. If non-zero, transition toERRORinstead ofSTOPPED, or at least log a warning.File:
src/cleveragents/resource/handlers/devcontainer.py:257F13 [P2 · Docs] —
devcontainer_resources.mdclaims automatic session/plan cleanupdocs/reference/devcontainer_resources.mdlines 152-160 state:This isn't true in the implementation.
CleanupService.stop_active_devcontainers()exists as a method but is not wired into any session-close or plan-completion hook (as noted in F4). The documentation should either be marked as "planned for future" or removed until the wiring is implemented.File:
docs/reference/devcontainer_resources.md:152-160Updated totals: 4 P1 + 9 P2 = 13 findings (F1-F13).
Supplementary Findings — PR #613 (Third Pass)
After reading all posted reviews and re-checking the code, I found two additional P1 bugs that are not covered in prior reviews:
F14 [P1 · Bug] —
devcontainer upoutput parsing fails with real CLI outputactivate_container()assumesstdoutis pure JSON, but the real devcontainer CLI prints log lines plus a JSON line at the end. The official CLI README example shows log output followed by a JSON object on the last line. Current parser does:If stdout contains any non-JSON log lines before the JSON block,
json.loadsfails and returns(None, None). The tracker is still transitioned toACTIVEwithcontainer_id=None, and stop/health check logic breaks.Fix: parse the last JSON line (scan from the end) or add
--output jsonand parse only JSON. This is explicitly called out in the PR description but not implemented.File:
src/cleveragents/resource/handlers/devcontainer.py:467F15 [P1 · Bug] — Health checks pass remote workspace path to
devcontainer exec_parse_devcontainer_up_output()storesremoteWorkspaceFolderintotracker.workspace_path. That value is inside the container (e.g.,/workspaces/project). But the CLI expects host path for--workspace-folder:In
_health_check_loop, the command is:If
workspace_pathis the remote container path (or the fallback/workspace),devcontainer execwill not find the devcontainer config on the host and will fail. This causes health checks to spuriously mark containersERRORin normal usage.Fix: store the host workspace folder (the original
workspace_folderargument) on the tracker and use that fordevcontainer exec. Keep the remote path separately if needed.File:
src/cleveragents/resource/handlers/devcontainer.py:372-374Updated totals: 6 P1 + 9 P2 = 15 findings (F1-F15).
Inline Notes — F14/F15
Adding inline comments for the two new P1 findings (F14, F15) described in comment #55338.
@ -28,0 +371,4 @@tracker.current_state.value,)return trackertracker = transition_state(F15 [P1 · Bug] — Health checks call
devcontainer exec --workspace-folderusingtracker.workspace_path, which is set toremoteWorkspaceFolder(container path) fromdevcontainer up. The CLI expects the host workspace folder path, so this fails in normal usage and marks containers ERROR.Fix: store the original host
workspace_folderon the tracker and use that fordevcontainer exec(keep remote path separately if needed).@ -28,0 +464,4 @@# associated with the calling session for scoped cleanup.return activate_container(resource_id,workspace_folder,F14 [P1 · Bug] —
json.loads(stdout)assumes stdout is pure JSON. The devcontainer CLI prints log lines plus a trailing JSON line (see official CLI README). With real output, this parse fails and yields(None, None), but the tracker still transitions to ACTIVE.Fix: parse the last JSON line (scan from end) or run
devcontainer up --output jsonand parse only JSON.Additional Findings — PR #613 (Follow-up)
Found three additional P1 issues after re-scanning the full diff. Inline notes below.
DevcontainerHandler.resolve(no call toactivate_container). Feature is defined but unused in production code.start_health_checkis defined but never invoked after activation.stop_all_active_containersignoressession_id(used only for logging) and stops all active containers.@ -28,0 +194,4 @@"up","--workspace-folder",workspace_folder,],F17 [P1 · Bug] — Health checks never start. After a successful activation, the code transitions to
ACTIVEbut does not callstart_health_check(). A grep showsstart_health_checkis only defined, never invoked. This means the required periodic liveness probes never run.Fix: call
start_health_check(resource_id, interval=tracker.health_check_interval)immediately after transitioning to ACTIVE (and/or from the tool invocation path).@ -28,0 +430,4 @@)set_lifecycle_tracker(tracker)raise RuntimeError(f"Failed to stop container: {exc}") from excF18 [P1 · Bug] — Session cleanup is not scoped.
stop_all_active_containers()acceptssession_idbut uses it only for logging; it always callslist_active_containers()(global) and stops all active containers. Issue #514 expects cleanup of containers associated with that session.Fix: track
session_idonContainerLifecycleTracker(or another mapping) and filter to session-scoped containers here.@ -28,0 +473,4 @@def start_health_check(resource_id: str,*,interval: float = DEFAULT_HEALTH_CHECK_INTERVAL,F16 [P1 · Bug] — Lazy activation is not wired.
DevcontainerHandlerdoesn’t overrideresolve()or callactivate_container, andactivate_container()isn’t referenced anywhere else insrc/. This means devcontainer resources are resolved viaBaseResourceHandler.resolve()with no lazy activation on first tool target — the primary behavior in issue #514 is effectively unused in production code.Fix: override
DevcontainerHandler.resolve()(or the tool execution path) to callactivate_container()when the resource isdevcontainer-instanceand ininactive/stopped/errorstate.Additional Finding — PR #613 (Follow-up)
One more P2 issue found on another pass. Inline note below.
@ -1061,0 +1151,4 @@raise typer.Abort() from excexcept (ValueError, RuntimeError) as exc:console.print(f"[red]Stop failed:[/red] {exc}")raise typer.Abort() from excF19 [P2 · Bug] — These commands advertise support for
container-instance, butresource add container-instancetypically sets no location (only--image). In this path,resource_rebuildaborts with "no location for rebuild" andresource_stopfails because no lifecycle tracker exists (devcontainer activation isn’t wired). This effectively makes stop/rebuild non-functional for container-instance resources.Fix: either restrict to
devcontainer-instanceonly, or require--pathfor container-instance and implement activation/lifecycle for that type.Additional Finding — PR #613 (Follow-up)
Found one more P1 issue on another scan. Inline note below.
@ -28,0 +50,4 @@logger = logging.getLogger(__name__)#: Default health check interval in seconds.DEFAULT_HEALTH_CHECK_INTERVAL: float = 30.0F20 [P1 · Bug] — Lifecycle state is stored only in an in-memory module dict (
_lifecycle_registry). This means state is process-local: a newagentsCLI invocation cannot see trackers created in a previous process. As a result,agents resource stop/rebuildcannot stop containers started by earlier commands (it will create a new INACTIVE tracker and fail the transition). This also violates the acceptance criterion: “Track container lifecycle state transitions in the resource registry.”Fix: persist lifecycle state in the Resource registry (DB) or another shared store (e.g.,
resource.metadata_json) so state survives process boundaries. At minimum, storecontainer_id,current_state, andworkspace_folderper resource and hydrate the tracker on CLI operations.Additional Finding — PR #613 (Follow-up)
Found one more P1 compatibility bug. Inline note below.
@ -28,0 +254,4 @@start_health_check(resource_id, run_command=run_command)except RuntimeError:raiseF21 [P1 · Bug] —
stop_containerhardcodesdocker stop. Devcontainer CLI can target Podman or other engines; this will fail in non-Docker environments (and conflicts with ADR-039 engine-agnostic container types).Fix: use the devcontainer CLI (if/when stop is supported), or capture the engine from resource properties and dispatch to
podman stop/docker stopaccordingly.Additional Findings — PR #613 (Continued Scan)
Two more issues found on another pass:
F22 [P1 · Bug] — Devcontainer resources cannot resolve: snapshot sandbox not implemented
DevcontainerHandlerusesSandboxStrategy.SNAPSHOT(seesrc/cleveragents/resource/handlers/devcontainer.py:488), butSandboxFactory.create_sandbox()explicitly raisesNotImplementedErrorforsnapshot(seesrc/cleveragents/infrastructure/sandbox/factory.py:66,119).Result: any attempt to resolve a
devcontainer-instance(orcontainer-instanceusing the same handler) will fail at runtime withNotImplementedError. This blocks actual use of the resource type even if lifecycle bugs are fixed.Fix: implement snapshot strategy in
SandboxFactoryor route devcontainer resources to a supported sandbox strategy.F23 [P2 · Process/Docs] — Auto-discovery is documented but not wired
docs/reference/devcontainer_resources.md:60-80says linking agit-checkout/fs-directorytriggers auto-discovery and registersdevcontainer-instance+devcontainer-filechildren. I can’t find any code path that invokesdiscover_devcontainers()orauto_discover_children()duringproject link-resource.link_resourcesimply creates the DB link and returns. Also, the built-in type’sauto_discoveryconfig doesn’t match the schema expected byauto_discover_children(it expectsenabled+rules).Either the docs need to be updated to mark this as future work, or the auto-discovery hook needs to be implemented.
If you want, I can add inline notes for these as well.
Additional Finding — PR #613 (Continued Scan)
F24 [P1 · Bug] — Container execution is stubbed, so lazy activation can’t trigger on tool use
Issue #514’s acceptance criteria say “Devcontainer instances remain inactive until a tool execution targets them. At that point, the container is started automatically.” But the tool execution path currently hard‑fails whenever the execution environment resolves to
container.In
src/cleveragents/tool/runner.py:168-178:This means a tool invocation can never reach a devcontainer handler, so lazy activation can’t occur in production. Even if
DevcontainerHandler.resolve()were wired to callactivate_container, the ToolRunner returns early before any container-handling logic runs.Fix: wire container execution (and call into devcontainer activation) in the ToolRunner path, or explicitly scope #514 to CLI-only lifecycle management and update the acceptance criteria/docs accordingly.
Additional Finding — PR #613 (Follow-up)
Found one more P2 issue on another scan. Inline note below.
@ -28,0 +485,4 @@"""if not resource_id:raise ValueError("resource_id must not be empty")F25 [P2 · Bug] —
DevcontainerHandlerhardcodes_default_strategy = SNAPSHOT, which overrides the resource type’s declared sandbox strategy. This meansdevcontainer-fileresources (type default =copy_on_write) will still try to usesnapshotbecauseresource.sandbox_strategyis always None when loaded from DB.Impact: devcontainer-file resolution will use the wrong strategy (and currently fails outright because snapshot isn’t implemented). Even after snapshot is implemented, devcontainer-file would still be sandboxed incorrectly.
Fix: choose strategy based on the resource type default (e.g., look up the ResourceTypeSpec and use its
sandbox_strategy), or overrideresolve()inDevcontainerHandlerto branch onresource.resource_type_name.Additional Finding — PR #613 (Follow-up)
Found one more P2 issue on another scan. Inline note below.
@ -28,0 +182,4 @@tracker = transition_state(tracker,ContainerLifecycleState.BUILDING,reason="lazy activation",F26 [P2 · Bug] —
activate_containerdoes not validate thatcontainerIdandremoteWorkspaceFolderwere actually parsed. If the JSON is missing these fields (or outcome iserror), the tracker still transitions toACTIVEwithcontainer_id=Noneandworkspace_path=None, causingstop_containerto skipdocker stopand health checks to misbehave.Fix: treat missing
containerId/remoteWorkspaceFolder(oroutcome != success) as activation failure, transition to ERROR, and raise RuntimeError.PM Status Check — PR #613
Author: @CoreRasurae | Milestone: v3.6.0 (M7) | Mergeable: Yes | Reviews: REQUEST_CHANGES (Brent)
Current State
Brent has provided an exceptionally thorough review across multiple rounds:
Total: ~8 P1 findings and ~12 P2 findings across all rounds.
Concern
@CoreRasurae — There has been no author response to any of Brent's findings. The review was posted ~8 hours ago. This PR has substantial rework needed on the P1 items before it can be considered for merge.
Given this is M7 (v3.6.0, due Mar 28), there is no schedule pressure. However, leaving a thorough review unacknowledged for too long discourages reviewers. Please:
PR #616 (container-aware tool exec) is blocked on this PR. Resolving the P1s here unblocks that dependency chain.
Action Items
ddcc22608ebbeef00b1aReview Status Summary — PR #613 (Devcontainer Lifecycle)
feature/prefix. Correct for feature work per CONTRIBUTING.md.Active review in progress. 15+ findings noted across review cycles. Author (@CoreRasurae) is responding to feedback. Continue addressing findings.
Review — PR #613
feature/m6plus-devcontainer-lifecycle(Round 3)Reviewer: brent.edwards | Commit:
bbeef00b| Issue: #514Prior reviews: hamza.khyari #2010 (Round 1), brent.edwards #2019 + follow-ups F12–F24 (Round 2)
Quality Gates
nox -s lintnox -s typechecknox -s unit_testsPrior Findings Verification
Good progress — most of my Round 2 P1/P2 findings have been addressed:
activate_container_stop_health_checknever joins threadthread.join(timeout=2.0)added_health_check_threadsunprotected_registry_lockrebuild_containeridentical to activate--reset-containerviarebuild=Truecleanup_service.pystop_all_active_containersimported at module top_parse_devcontainer_up_outputscans from last linehost_workspace_folderparameterresolve()DevcontainerHandler.resolve()callsactivate_containerNot all prior findings could be verified due to the restructured code, but the major bugs have been addressed.
New Findings
robot/cleanup.robot:20-26CleanupReport,StaleItem,ResourceCleanupSummarymoved tocleanup_models.pybut Robot test still readscleanup_service.pyand assertsShould Contain ${content} class CleanupReporta60cda1fis far from current master (620adfee). BDD scenario count is 8,208 vs master's 9,029. Quality gates pass on the branch but don't validate against current master code.devcontainer.py(926),resource.py(1,233),plan_lifecycle_service.py(1,499),devcontainer_lifecycle.feature(762)robot/helper_devcontainer_lifecycle.py:48,49,53,61# type: ignoresuppressions — CONTRIBUTING prohibits inline# type: ignore. The conditional import pattern can be refactored to avoid these.devcontainer_lifecycle_steps.py(4),devcontainer_cleanup_steps.py(5),devcontainer_health_check_steps.py(4). CONTRIBUTING requires all imports at file top (onlyif TYPE_CHECKING:exempt).devcontainer_resources.md:60-80F25 — CI Integration Test Failures (P0)
The CI run at https://git.cleverthis.com/cleveragents/cleveragents-core/actions/runs/1307/jobs/5/attempt/1 shows 2 failing Robot test suites:
Failure 1 —
Robot.Cleanup(1 of 9 failed):Cleanup Service Exports Expected Symbolsfails because this PR extractedCleanupReport,StaleItem, andResourceCleanupSummaryfromcleanup_service.pyinto the newcleanup_models.py. The Robot test atrobot/cleanup.robot:20-26reads the content ofcleanup_service.pyand asserts:Since these classes now live in
cleanup_models.py, the assertions fail. The CI log confirms the fullcleanup_service.pycontent is dumped and does not containclass CleanupReport.Fix: Update
robot/cleanup.robotto checkcleanup_models.pyfor the extracted classes, or check thatcleanup_service.pyre-exports them.Failure 2 —
Robot.Devcontainer Handler(1 of 10 failed):DevcontainerHandler Default Strategyfails with1 != 0(exit code 1). The helper atrobot/helper_devcontainer_handler.py:78assertshandler._default_strategy.value == "snapshot". This likely fails due to an import or attribute error when running against the merged codebase in CI. This needs investigation — it may be related to the branch being far behind master.F26 — Cleanup Robot Test Broken (P1)
See F25, Failure 1. The
robot/cleanup.robottest was not updated to reflect the extraction of model classes intocleanup_models.py. This is a regression introduced by this PR.F28 — File Length Violations (P2)
CONTRIBUTING.md requires files to stay under 500 lines. Four files in this PR exceed the limit:
src/cleveragents/resource/handlers/devcontainer.pysrc/cleveragents/cli/commands/resource.pysrc/cleveragents/application/services/plan_lifecycle_service.pyfeatures/devcontainer_lifecycle.featureresource.pyandplan_lifecycle_service.pywere already over the limit before this PR, but this PR adds significantly to both.devcontainer.pyis a new file that should have been designed within the limit.Recommendation: Split
devcontainer.pyintodevcontainer_handler.py(theDevcontainerHandlerclass) anddevcontainer_lifecycle.py(registry, activation, stop, rebuild, health check functions). Split the feature file by domain (already done for step files).Verdict
REQUEST_CHANGES — 1 P0 (CI failures), 2 P1 (broken Robot test, stale branch), 5 P2 (file length,
# type: ignore, inline imports, stale docs, stale PR description).The P0/P1 items are blocking:
robot/cleanup.robottest must be updated to match the model extraction.The P2 items should be addressed or explicitly deferred:
# type: ignoreand inline imports are policy violations.What Went Well
rebuild_containernow sends--reset-containerflagos.path.realpath()threading.Barrierper project conventionPM Status Check — Day 29 (Escalation)
Author: @CoreRasurae | Milestone: v3.6.0 (M6+) | Mergeable: Yes | Reviews: REQUEST_CHANGES (Brent)
Current State — UNRESPONSIVE AUTHOR
Brent conducted a comprehensive 4-pass review totaling:
devcontainer upJSON parsing failure (F14), health check using wrong workspace path (F15), snapshot sandbox not implemented (F22), container execution stubbed (F24)Blocking Issue: NO AUTHOR RESPONSE
@CoreRasurae — There has been zero response to Brent's review findings since Day 26 (3 days ago). This violates the 24-hour review response SLA in CONTRIBUTING.md.
Previous PM escalation (comment #55513, Day 26) set deadlines:
Schedule Context
This is M6+ (v3.6.0, due Mar 28), so there is time. However, PR #616 (container tool exec) is blocked on this PR. The unresponsiveness pattern is concerning — Luis also has no response on PR #614.
Action Required
If no response by Mar 10 EOD, will escalate to @freemo for intervention.
Review Response — Round 3 (brent.edwards, Review #2052)
Commit:
6252d83aResolved
robot/cleanup.robotto checkcleanup_models.pyfor class definitions and verifycleanup_service.pyre-exports them by name; changed handler strategy assertion from"snapshot"to"none"inrobot/helper_devcontainer_handler.py(SandboxFactory has not implemented snapshot — the container itself is the sandbox).robot/cleanup.robotnow referencescleanup_models.pywhere the classes were extracted to.devcontainer.py(926 lines) → 5 modules, all under 500 lines:_devcontainer_internals.py(152, shared state/registry/parsing),devcontainer_health.py(206, background health checks),devcontainer_lifecycle.py(440, activate/stop/rebuild),devcontainer_cleanup.py(149, session cleanup/eviction),devcontainer.py(201, handler class + backward-compatible re-exports). No circular dependencies. Feature split:devcontainer_lifecycle.feature(762 lines) → 4 domain-specific files:devcontainer_state.feature(198),devcontainer_activation.feature(217),devcontainer_health_check.feature(126),devcontainer_cleanup.feature(237). All 104 scenarios preserved.resource.py(1233) andplan_lifecycle_service.py(1499) were not split — they exceeded 500 lines before this PR and are out of scope for this feature branch.# type: ignoresuppressions inrobot/helper_devcontainer_lifecycle.py; replaced with inline_MockResult/_MockRunnerclasses that don't require the features/mocks import.devcontainer_lifecycle_steps.py(4),devcontainer_cleanup_steps.py(5),devcontainer_health_check_steps.py(4).docs/reference/devcontainer_resources.mdas "(Planned)" with an admonition explaining thatdiscover_devcontainers()exists and is tested in isolation but is not wired into any production code path.Not addressed
Verification
_devcontainer_internals98%,devcontainer.py100%,devcontainer_cleanup100%,devcontainer_health92%,devcontainer_lifecycle99%bbeef00b1a581ac47543581ac47543a160888632PM Status Check — Day 29 (Update)
Author: @CoreRasurae | Milestone: v3.6.0 (Post-MVP) | Reviews: REQUEST_CHANGES (Brent, 4 passes, 15 findings)
Author Response Today
@CoreRasurae responded with resolutions for findings F25-F31 from Brent's 4th review pass. Added State/In Review and Priority/Medium labels.
Current State
Action Required
This is a Post-MVP PR so lower urgency than M3-M5 items, but Luis's responsiveness today is appreciated. Good to keep momentum.
PM Compliance Audit — PR #613
Auditor: PM (automated sweep) | Date: 2026-03-09 | Reference: CONTRIBUTING.md §Pull Request Process (items 1–12)
Checklist
Closes #N/Fixes #N)Closes #514present in body +ISSUES CLOSED: #514in footerType/FeaturepresentReview Status
Merge Readiness
NOT READY — Significant blockers remain:
Action items for @CoreRasurae:
robot/cleanup.robotto match model extractionPM Escalation — Rebase Required + CI Failures (Day 29)
@CoreRasurae This PR is ~160 commits behind master with P0 CI integration test failures (Robot tests broken by model extraction).
Required actions (in order):
robot/cleanup.robotto match theCleanupReport/StaleItem/ResourceCleanupSummarymodel extraction intocleanup_models.pyThere are 4 open REQUEST_CHANGES reviews that must all be resolved or dismissed before merge. This is an M6+ PR so schedule pressure is lower, but the stale branch is accumulating technical debt that makes each day's rebase harder.
Verification Pass — PR #613
feature/m6plus-devcontainer-lifecycle(Round 4/5)Reviewer: brent.edwards | Commit:
a1608886| Issue: #514Prior reviews: hamza.khyari #2010 (Round 1), brent.edwards #2019 + follow-ups (Round 2), brent.edwards #2052 (Round 3)
Nox Verification Matrix
nox -s lintnox -s typechecknox -s unit_testsnox -s integration_testsRobot.Devcontainer Lifecycle)nox -s coverage_reportnox -s security_scanwrapping.pyonly)Module-Level Coverage
_devcontainer_internals.pydevcontainer.pydevcontainer_cleanup.pydevcontainer_health.pydevcontainer_lifecycle.pydevcontainer_health.pyat 91% is below the 97% per-module threshold. The uncovered lines areexcept ValueErrorhandlers and concurrent-state-already-changed breaks — legitimate defensive paths that are hard to exercise without elaborate thread-racing setups. See P3 finding below.Round 3 Findings Verification
cleanup.robotnow checks symbol names (notclassprefix) + newCleanup Models Defines Data Classestest case. Strategy check updated to"none".cleanup_service.pyre-exports andcleanup_models.pyclass definitions.mergeable: true.devcontainer.py(926→201 lines) split into 5 modules all under 500 lines. Feature file (762 lines) split into 4 files (126–237 lines). Step files all under 500 lines (409–494). Residual:resource.py(1,233) andplan_lifecycle_service.py(1,663) remain over limit — both pre-existing violations, this PR adds +173 and +42 lines respectively.# type: ignorein Robot helper_MockResult/_MockRunnerclasses. Zero# type: ignorein the diff.> **Not yet wired (F31/F23):**callout added. All discovery sections labeled "(Planned)". Known Limitations table is consistent with the callout.devcontainer.py— new, 794 lines)" and "93 scenarios" — doesn't reflect the 5-module split, current file sizes, or scenario counts.New/Residual Findings
devcontainer_health.py:157,171-176,185,199-204devcontainer_health.pyat 91% (below 97% threshold). Uncovered lines areexcept ValueErrorhandlers in the health check loop — defensive concurrency guards that are hard to exercise without thread-racing test setups. Acceptable for a concurrent-exception guard pattern.resource.py,plan_lifecycle_service.pyFile Size Audit (All New/Modified Source Files)
_devcontainer_internals.pydevcontainer.pydevcontainer_health.pydevcontainer_lifecycle.pydevcontainer_cleanup.pycontainer_lifecycle.pydevcontainer_state.featuredevcontainer_activation.featuredevcontainer_health_check.featuredevcontainer_cleanup.featuredevcontainer_activation_steps.pydevcontainer_cleanup_steps.pydevcontainer_health_check_steps.pydevcontainer_lifecycle_steps.pyresource.py(pre-existing)plan_lifecycle_service.py(pre-existing)Verdict
APPROVE — All P0 and P1 findings from Rounds 1-3 are verified fixed. The code split is clean and well-structured with proper re-exports for backward compatibility. All nox quality gates pass. The only open items are:
devcontainer_health.pycoverage at 91% — acceptable for defensive concurrency handlersWhat Went Well
devcontainer.pywith comprehensive__all__preserves backward compatibilitycleanup_models.pyand the re-exports incleanup_service.py# type: ignoresuppressions remaining_registry_lockis properly applied across all modules_stop_health_checkprevents dangling background workAPPROVED — Verification Pass (Round 4/5)
All P0 and P1 findings from Rounds 1-3 are verified fixed. All nox quality gates pass (lint ✅, typecheck ✅, 9,281 BDD scenarios ✅, Robot integration ✅, 97% coverage ✅, security ✅).
Remaining open items are non-blocking:
devcontainer_health.pyat 91% coverage — acceptable for defensive concurrency handlersresource.pyandplan_lifecycle_service.pySee comment #57193 for full verification details.
Reviewer Response — @brent.edwards
Responding to PM compliance audit (#57067), PM escalation (#57086), PM status checks (#56523, #56792), and @CoreRasurae's fix response (#56678).
Review Status
The PM compliance audit (#57067) lists 4 open REQUEST_CHANGES reviews and says "NOT READY". To clarify:
Luis's Fix Verification (responding to #56678)
@CoreRasurae — Your fixes in commit
a1608886are all verified correct. Thank you for the thorough response and clean implementation:cleanup_models.pyclasses andcleanup_service.pyre-exportsmergeable: truedevcontainer.pysplit 926→201 lines across 5 clean modules. All under 500. Feature file split into 4 domain files._MockResult/_MockRunnerclasses replace the conditional importThe 5-module split is particularly well done — clean single-responsibility per module, proper
__all__re-exports for backward compatibility, zero circular dependencies.Remaining Open Items
devcontainer_health.pyat 91% — Acceptable for defensive concurrency handlers. Non-blocking.resource.py/plan_lifecycle_service.pyviolations. Not caused by this PR. Non-blocking.PM Action Items
Merge Readiness Assessment
From my perspective this PR is ready to merge once:
The code quality is solid. The 5-module refactor addressed the major structural concern. All quality gates pass.
a160888632cf67ba0a86New commits pushed, approval review dismissed automatically according to repository settings