feat(devcontainer): add container-aware tool execution and I/O forwarding #515

Closed
opened 2026-03-02 16:23:08 +00:00 by freemo · 4 comments
Owner

Metadata

  • Commit Message: feat(devcontainer): add container-aware tool execution and I/O forwarding
  • Branch: feature/m6plus-container-tool-exec

Background and Context

The specification (§Execution > Execution Environment Routing, §Resources > Devcontainer Integration) defines that when execution_environment is set to container, tools must execute inside the target container rather than on the host. This requires I/O forwarding between host and container, path mapping for file-based tools, and result retrieval. This builds on the execution environment routing (#512) and devcontainer lifecycle (#514).

Expected Behavior

When a tool executes in container mode, the tool runner transparently routes execution through devcontainer exec, handles path mapping between host and container filesystems, and retrieves results. File tools write to the container filesystem; results are synced back to the host sandbox.

Acceptance Criteria

  • Implement container tool execution via devcontainer exec with proper argument escaping.
  • Implement host-to-container path mapping for file paths in tool arguments.
  • Implement container-to-host result retrieval for file-based tool outputs.
  • Handle container execution timeouts with configurable limits.
  • Handle container execution failures with structured error reporting.
  • Support both built-in tools and MCP tools executing in container mode.
  • Maintain tool execution audit trail with container metadata (container_id, image, exec_time).

Definition of Done

This issue is complete when:

  • All subtasks below are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.

Subtasks

  • Code [Luis]: Implement ContainerToolExecutor in src/cleveragents/tool/ that wraps tool execution via devcontainer exec.
  • Code [Luis]: Implement PathMapper utility for host-to-container and container-to-host path translation.
  • Code [Luis]: Wire ContainerToolExecutor into the tool runner when ExecutionEnvironmentResolver returns container.
  • Code [Luis]: Implement result retrieval — sync container filesystem changes back to host sandbox.
  • Code [Luis]: Add timeout handling and structured error reporting for container execution failures.
  • Code [Luis]: Add container metadata (container_id, image, exec_time) to ToolInvocation audit trail.
  • Docs [Luis]: Update docs/reference/execution_environment.md with container execution details.
  • Tests (Behave) [Luis]: Add features/container_tool_exec.feature covering: basic container execution, path mapping, timeout handling, error reporting, audit trail.
  • Tests (Robot) [Luis]: Add robot/container_tool_exec.robot integration tests.
  • Tests (ASV) [Luis]: Add benchmarks/container_tool_exec_bench.py for execution overhead measurement.
  • Quality [Luis]: Verify coverage >=97% via nox -s coverage_report.
  • Quality [Luis]: Run nox (all default sessions, including benchmark), fix any errors.

Parent: #398 (Epic: Post-MVP Resources)
Blocked by: #512, #514

## Metadata - **Commit Message**: `feat(devcontainer): add container-aware tool execution and I/O forwarding` - **Branch**: `feature/m6plus-container-tool-exec` ## Background and Context The specification (§Execution > Execution Environment Routing, §Resources > Devcontainer Integration) defines that when execution_environment is set to `container`, tools must execute inside the target container rather than on the host. This requires I/O forwarding between host and container, path mapping for file-based tools, and result retrieval. This builds on the execution environment routing (#512) and devcontainer lifecycle (#514). ## Expected Behavior When a tool executes in container mode, the tool runner transparently routes execution through `devcontainer exec`, handles path mapping between host and container filesystems, and retrieves results. File tools write to the container filesystem; results are synced back to the host sandbox. ## Acceptance Criteria - [x] Implement container tool execution via `devcontainer exec` with proper argument escaping. - [x] Implement host-to-container path mapping for file paths in tool arguments. - [x] Implement container-to-host result retrieval for file-based tool outputs. - [x] Handle container execution timeouts with configurable limits. - [x] Handle container execution failures with structured error reporting. - [x] Support both built-in tools and MCP tools executing in container mode. - [x] Maintain tool execution audit trail with container metadata (container_id, image, exec_time). ## Definition of Done This issue is complete when: - All subtasks below are completed and checked off. - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation. - The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly. - The commit is submitted as a **pull request** to `master`, reviewed, and **merged** before this issue is marked done. ## Subtasks - [x] Code [Luis]: Implement `ContainerToolExecutor` in `src/cleveragents/tool/` that wraps tool execution via `devcontainer exec`. - [x] Code [Luis]: Implement `PathMapper` utility for host-to-container and container-to-host path translation. - [x] Code [Luis]: Wire `ContainerToolExecutor` into the tool runner when `ExecutionEnvironmentResolver` returns `container`. - [x] Code [Luis]: Implement result retrieval — sync container filesystem changes back to host sandbox. - [x] Code [Luis]: Add timeout handling and structured error reporting for container execution failures. - [x] Code [Luis]: Add container metadata (container_id, image, exec_time) to `ToolInvocation` audit trail. - [x] Docs [Luis]: Update `docs/reference/execution_environment.md` with container execution details. - [x] Tests (Behave) [Luis]: Add `features/container_tool_exec.feature` covering: basic container execution, path mapping, timeout handling, error reporting, audit trail. - [x] Tests (Robot) [Luis]: Add `robot/container_tool_exec.robot` integration tests. - [x] Tests (ASV) [Luis]: Add `benchmarks/container_tool_exec_bench.py` for execution overhead measurement. - [x] Quality [Luis]: Verify coverage >=97% via `nox -s coverage_report`. - [x] Quality [Luis]: Run `nox` (all default sessions, including benchmark), fix any errors. Parent: #398 (Epic: Post-MVP Resources) Blocked by: #512, #514
freemo added this to the v3.6.0 milestone 2026-03-02 16:24:38 +00:00
Member

Implementation Notes

Architecture Decisions

Pydantic BaseModel for all domain types: The codebase has an architecture test (features/architecture.feature:38) enforcing that all dataclasses use Pydantic BaseModel. All new types (ContainerConfig, ContainerMetadata, ContainerExecutionError, ContainerTimeoutError, _ExecResult) use BaseModel with ConfigDict(frozen=True) where immutability is required.

Error hierarchy: ContainerExecutionError and ContainerTimeoutError extend both Exception and BaseModel to support both structured data and Python exception semantics. ContainerTimeoutError inherits from ContainerExecutionError and sets timed_out=True by default.

PathMapper design: Bidirectional path mapping between host and container filesystems. Paths outside the configured root are returned unchanged (passthrough). Uses pathlib.PurePosixPath for platform-independent path manipulation.

ToolRunner integration: Added optional container_executor parameter to ToolRunner.__init__() to maintain backward compatibility. When execution environment resolves to CONTAINER and no executor is configured, returns a structured ToolResult error rather than raising an exception.

File Locations

Component File Lines
ContainerToolExecutor src/cleveragents/tool/container_executor.py Full file (~240 lines)
ContainerConfig src/cleveragents/tool/container_executor.py ~40-80
ContainerMetadata src/cleveragents/tool/container_executor.py ~85-120
PathMapper src/cleveragents/tool/path_mapper.py Full file (~40 lines)
ToolRunner routing src/cleveragents/tool/runner.py 51-63, 168-179
ToolInvocation.container_metadata src/cleveragents/domain/models/core/change.py ~475

Test Results

Session Result
lint Pass
typecheck Pass (0 Pyright errors)
unit_tests 8910 scenarios, 8909 passed, 1 pre-existing error (context_strategy_registry:124)
integration_tests 1287/1287 passed, 0 failed
coverage_report 97% overall (meets threshold)
benchmark Pass (all 5 new ASV benchmarks execute)
security_scan Pass
dead_code Pass (vulture whitelist updated)
docs Pass (mkdocs build successful)
build Pass (wheel built successfully)

Key Discovery: Robot Framework = Escaping

Robot Framework's Run Process keyword interprets = characters in continuation lines as keyword arguments. When passing inline Python code containing assignments (e.g., m = PathMapper(...)) via -c, Robot misparses them. Fix: Use ${script}= Catenate SEPARATOR=\n to build multi-line Python scripts as variables before passing to Run Process. This pattern is consistent with robot/changeset_persistence.robot.

Pre-existing Error

features/context_strategy_registry.feature:124 ("Default strategy config values") has a pre-existing error that exists on master branch. This is NOT caused by our changes.

PR

PR #616 - feature/m6plus-container-tool-exec -> master

## Implementation Notes ### Architecture Decisions **Pydantic BaseModel for all domain types**: The codebase has an architecture test (`features/architecture.feature:38`) enforcing that all dataclasses use Pydantic BaseModel. All new types (`ContainerConfig`, `ContainerMetadata`, `ContainerExecutionError`, `ContainerTimeoutError`, `_ExecResult`) use `BaseModel` with `ConfigDict(frozen=True)` where immutability is required. **Error hierarchy**: `ContainerExecutionError` and `ContainerTimeoutError` extend both `Exception` and `BaseModel` to support both structured data and Python exception semantics. `ContainerTimeoutError` inherits from `ContainerExecutionError` and sets `timed_out=True` by default. **PathMapper design**: Bidirectional path mapping between host and container filesystems. Paths outside the configured root are returned unchanged (passthrough). Uses `pathlib.PurePosixPath` for platform-independent path manipulation. **ToolRunner integration**: Added optional `container_executor` parameter to `ToolRunner.__init__()` to maintain backward compatibility. When execution environment resolves to `CONTAINER` and no executor is configured, returns a structured `ToolResult` error rather than raising an exception. ### File Locations | Component | File | Lines | |---|---|---| | ContainerToolExecutor | `src/cleveragents/tool/container_executor.py` | Full file (~240 lines) | | ContainerConfig | `src/cleveragents/tool/container_executor.py` | ~40-80 | | ContainerMetadata | `src/cleveragents/tool/container_executor.py` | ~85-120 | | PathMapper | `src/cleveragents/tool/path_mapper.py` | Full file (~40 lines) | | ToolRunner routing | `src/cleveragents/tool/runner.py` | 51-63, 168-179 | | ToolInvocation.container_metadata | `src/cleveragents/domain/models/core/change.py` | ~475 | ### Test Results | Session | Result | |---|---| | lint | Pass | | typecheck | Pass (0 Pyright errors) | | unit_tests | 8910 scenarios, 8909 passed, 1 pre-existing error (context_strategy_registry:124) | | integration_tests | 1287/1287 passed, 0 failed | | coverage_report | 97% overall (meets threshold) | | benchmark | Pass (all 5 new ASV benchmarks execute) | | security_scan | Pass | | dead_code | Pass (vulture whitelist updated) | | docs | Pass (mkdocs build successful) | | build | Pass (wheel built successfully) | ### Key Discovery: Robot Framework `=` Escaping Robot Framework's `Run Process` keyword interprets `=` characters in continuation lines as keyword arguments. When passing inline Python code containing assignments (e.g., `m = PathMapper(...)`) via `-c`, Robot misparses them. **Fix**: Use `${script}= Catenate SEPARATOR=\n` to build multi-line Python scripts as variables before passing to `Run Process`. This pattern is consistent with `robot/changeset_persistence.robot`. ### Pre-existing Error `features/context_strategy_registry.feature:124` ("Default strategy config values") has a pre-existing error that exists on master branch. This is NOT caused by our changes. ### PR PR #616 - `feature/m6plus-container-tool-exec` -> `master`
Author
Owner

PM Status — Day 29 (2026-03-09)

@CoreRasurae — Acknowledged your implementation completion for container-aware tool execution. All quality gates passing (8910 BDD, 1287 Robot, 97% coverage) is excellent.

Dependency note: PR #616 (your implementation) is blocked on PR #613 (lazy activation and lifecycle). PR #613 needs to merge first before #616 can be rebased and reviewed.

Action items:

  1. Please ensure PR #613 is rebased and conflict-free — it currently has merge conflicts
  2. Once #613 merges, rebase #616 against master
  3. Both PRs need the empty body issue resolved — please add implementation notes to the PR descriptions per CONTRIBUTING.md

Timeline: These are M5 features (v3.4.0, due 2026-03-06 — already past due). Prioritize getting #613 through review first.

**PM Status — Day 29 (2026-03-09)** @CoreRasurae — Acknowledged your implementation completion for container-aware tool execution. All quality gates passing (8910 BDD, 1287 Robot, 97% coverage) is excellent. **Dependency note:** PR #616 (your implementation) is blocked on PR #613 (lazy activation and lifecycle). PR #613 needs to merge first before #616 can be rebased and reviewed. **Action items:** 1. Please ensure PR #613 is rebased and conflict-free — it currently has merge conflicts 2. Once #613 merges, rebase #616 against master 3. Both PRs need the empty body issue resolved — please add implementation notes to the PR descriptions per CONTRIBUTING.md **Timeline:** These are M5 features (v3.4.0, due 2026-03-06 — already past due). Prioritize getting #613 through review first.
Member

Code Review Fixes Applied (PR #616, Review #2064)

All validated findings from the 3-cycle code review have been applied and amended into commit 6fa6342a.

Production Fixes (9 applied)

ID Severity Summary
B1 HIGH sync_results_to_host now writes captured result.stdout content to host_path via Path.write_text()
B2 HIGH devcontainer exec uses --container-id when set, falls back to --workspace-folder with new host_workspace_folder field on ContainerConfig
B3 MEDIUM _map_input_paths recurses into dicts inside lists via _map_value_host_to_container() helper
B4 MEDIUM _map_output_paths recurses into dicts inside lists via _map_value_container_to_host() helper
B5 MEDIUM _build_exec_command JSON serialization errors now return clean ToolResult(success=False)
B6 LOW timeout_seconds=0 no longer silently ignored; Field(gt=0) validator added
B7 LOW _build_sync_command docstrings corrected
B8 LOW logger.warning() emitted when /tmp/sandbox fallback path is used
SC2 INFO Reference doc updated with "Container Targeting" and "Container Prerequisites" sections

Test Additions (11 new scenarios, T1-T8)

  • T1: sync_results_to_host verifies file content written to real tmpdir
  • T2: Nested dicts-in-lists mapping for _map_input_paths
  • T3: Nested dicts-in-lists mapping for _map_output_paths
  • T4: sync_results_to_host failure raises ContainerExecutionError
  • T5: Custom timeout_seconds override scenario
  • T6: _parse_output edge cases (JSON array, malformed JSON)
  • T7: Empty host_sandbox_path fallback covered by B8 warning
  • T8: _build_exec_command structure with --container-id and --workspace-folder variants

Deferred (out of scope)

ID Severity Reason
S1 LOW Stderr secret redaction requires a broader mechanism not scoped to this issue
P1 LOW Async variant is a significant new feature beyond #515 scope
SC1 LOW Doc describes current 4-level precedence accurately; 6-level is #512's concern

Validation

  • nox -e lint
  • nox -e typecheck (0 errors)
  • nox -e unit_tests (8,919 passed; 1 errored scenario is pre-existing on master at context_strategy_registry.feature:124)
## Code Review Fixes Applied (PR #616, Review #2064) All validated findings from the 3-cycle code review have been applied and amended into commit `6fa6342a`. ### Production Fixes (9 applied) | ID | Severity | Summary | |----|----------|---------| | **B1** | HIGH | `sync_results_to_host` now writes captured `result.stdout` content to `host_path` via `Path.write_text()` | | **B2** | HIGH | `devcontainer exec` uses `--container-id` when set, falls back to `--workspace-folder` with new `host_workspace_folder` field on `ContainerConfig` | | **B3** | MEDIUM | `_map_input_paths` recurses into dicts inside lists via `_map_value_host_to_container()` helper | | **B4** | MEDIUM | `_map_output_paths` recurses into dicts inside lists via `_map_value_container_to_host()` helper | | **B5** | MEDIUM | `_build_exec_command` JSON serialization errors now return clean `ToolResult(success=False)` | | **B6** | LOW | `timeout_seconds=0` no longer silently ignored; `Field(gt=0)` validator added | | **B7** | LOW | `_build_sync_command` docstrings corrected | | **B8** | LOW | `logger.warning()` emitted when `/tmp/sandbox` fallback path is used | | **SC2** | INFO | Reference doc updated with "Container Targeting" and "Container Prerequisites" sections | ### Test Additions (11 new scenarios, T1-T8) - **T1**: `sync_results_to_host` verifies file content written to real tmpdir - **T2**: Nested dicts-in-lists mapping for `_map_input_paths` - **T3**: Nested dicts-in-lists mapping for `_map_output_paths` - **T4**: `sync_results_to_host` failure raises `ContainerExecutionError` - **T5**: Custom `timeout_seconds` override scenario - **T6**: `_parse_output` edge cases (JSON array, malformed JSON) - **T7**: Empty `host_sandbox_path` fallback covered by B8 warning - **T8**: `_build_exec_command` structure with `--container-id` and `--workspace-folder` variants ### Deferred (out of scope) | ID | Severity | Reason | |----|----------|--------| | **S1** | LOW | Stderr secret redaction requires a broader mechanism not scoped to this issue | | **P1** | LOW | Async variant is a significant new feature beyond #515 scope | | **SC1** | LOW | Doc describes current 4-level precedence accurately; 6-level is #512's concern | ### Validation - `nox -e lint` ✅ - `nox -e typecheck` ✅ (0 errors) - `nox -e unit_tests` ✅ (8,919 passed; 1 errored scenario is pre-existing on master at `context_strategy_registry.feature:124`)
Author
Owner

PM Acknowledgment (Day 31):

Thank you @CoreRasurae for addressing the review findings.

Status: PR #616 has 22 comments and 4 rounds of REQUEST_CHANGES from @brent.edwards. Merge conflict present.

Action needed: Rebase PR #616 and request re-review from Brent. Address any remaining findings from Round 4.

Priority: Medium — after TDD infra (#627, #629). This is M6 (v3.6.0) work.

**PM Acknowledgment (Day 31)**: Thank you @CoreRasurae for addressing the review findings. **Status**: PR #616 has 22 comments and 4 rounds of `REQUEST_CHANGES` from @brent.edwards. Merge conflict present. **Action needed**: Rebase PR #616 and request re-review from Brent. Address any remaining findings from Round 4. **Priority**: Medium — after TDD infra (#627, #629). This is M6 (v3.6.0) work.
Sign in to join this conversation.
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveragents-core#515
No description provided.