fix(cli): honour project-level execution-env-priority in resolution #1136

2026-03-23T13:20:05Z

hamza.khyari commented

2026-03-23 13:20:05 +00:00

Summary

- Thread plan_env and project_env through the tool execution chain so the ExecutionEnvironmentResolver receives project-level execution environment values.
- The resolver's precedence logic (tool > plan > project > default) was already correct, but callers never passed project_env, so tools always fell through to the global HOST default.

Closes #1080

Dependency

Requires PR #1135 to be merged first. PR #1135 adds the CLI flag (project context set --execution-env-priority) and persistence for project-level execution environment. This PR (#1136) threads the persisted value through the execution chain to the resolver.

Root Cause

The bug is in the call chain, not the resolver. ToolCallRouter.route(), ToolCallingRuntime._execute_tool_call(), and PlanExecutionContext never threaded project_env through to ToolRunner.execute(). The resolver received project_env=None for every invocation, so precedence level 2 was always skipped.
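
To illustrate why a correct resolver can still yield the wrong result, here is a minimal sketch of the precedence order described above. The function names, the `"host"` spelling of the default, and the two `execute_*` helpers are hypothetical stand-ins, not the actual ExecutionEnvironmentResolver or ToolRunner API.

```python
from __future__ import annotations

# Assumed spelling of the global HOST default; the real constant may differ.
DEFAULT_ENV = "host"


def resolve_env(
    tool_env: str | None = None,
    plan_env: str | None = None,
    project_env: str | None = None,
) -> str:
    """Return the first non-None value in tool > plan > project > default order."""
    for candidate in (tool_env, plan_env, project_env):
        if candidate is not None:
            return candidate
    return DEFAULT_ENV


def execute_before_fix(project_env: str | None) -> str:
    # Hypothetical caller: it holds project_env but never forwards it,
    # so the resolver only ever sees None and falls back to the default.
    return resolve_env()


def execute_after_fix(project_env: str | None) -> str:
    # Hypothetical caller after the fix: the value is passed through.
    return resolve_env(project_env=project_env)


if __name__ == "__main__":
    print(execute_before_fix("container"))
    print(execute_after_fix("container"))
```

The resolver logic is identical in both cases; only the caller changes, which matches the description of the bug as a call-chain problem.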

Changes

| File | Change |
|---|---|
| `PlanExecutionContext` | Added `plan_env` and `project_env` fields + properties |
| `ToolCallRouter` | Accept `plan_env`/`project_env` in constructor, pass to `runner.execute()` in `route()` and `route_streaming()` |
| `ToolCallingRuntime` | Accept `plan_env`/`project_env`, pass to `runner.execute()` in the direct-runner fallback path |
| `tool_router_steps.py` | Updated monkey-patched execute stubs to accept explicit `plan_env`/`project_env` keyword arguments |
| `exec_env_project_override.feature` | 4 new BDD scenarios proving plan_env/project_env reach the resolver via ToolCallRouter |
| `CHANGELOG.md` | Entry for #1080 |

Tests

4 BDD scenarios (22 steps) verify the full chain:

1. project_env="container" reaches resolver
2. plan_env="host" reaches resolver
3. Both plan_env and project_env reach resolver simultaneously
4. Neither passed → resolver receives None for both

All 99 existing tool_router + execution_environment scenarios pass.
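
As a rough plain-Python approximation of what the first and fourth scenarios check, the sketch below uses a recording runner and a stand-in router. `RecordingRunner` and `SketchRouter` are hypothetical names invented for illustration; the real `ToolCallRouter` constructor and `runner.execute()` signature may take additional arguments.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class RecordingRunner:
    """Hypothetical runner double that records the keyword arguments it receives."""

    calls: list[dict[str, Any]] = field(default_factory=list)

    def execute(
        self,
        tool_name: str,
        *,
        plan_env: str | None = None,
        project_env: str | None = None,
    ) -> str:
        self.calls.append(
            {"tool": tool_name, "plan_env": plan_env, "project_env": project_env}
        )
        return "ok"


@dataclass
class SketchRouter:
    """Stand-in for a router that stores env values and forwards them on each call."""

    runner: RecordingRunner
    plan_env: str | None = None
    project_env: str | None = None

    def route(self, tool_name: str) -> str:
        return self.runner.execute(
            tool_name, plan_env=self.plan_env, project_env=self.project_env
        )


if __name__ == "__main__":
    runner = RecordingRunner()
    SketchRouter(runner, project_env="container").route("ls")
    print(runner.calls[0])
```

A test built this way asserts on what the runner received rather than on the router's internals, so it would fail if the forwarding were dropped again.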

Note on E2E Test

The issue mentions removing @tdd_expected_fail from a WF17 precedence level 2 test case. No such test exists — the WF17 E2E suite has not been written yet.

ISSUES CLOSED: #1080


hamza.khyari added the

labels 2026-03-23 13:20:16 +00:00

hamza.khyari added this to the v3.5.0 milestone 2026-03-23 13:20:17 +00:00

hamza.khyari added the

MoSCoW

Must have

label 2026-03-23 13:20:25 +00:00

hamza.khyari self-assigned this 2026-03-23 13:34:53 +00:00

freemo requested changes 2026-03-24 15:24:37 +00:00

Dismissed

freemo left a comment

Review: REQUEST CHANGES

The diff itself is clean, minimal, and well-structured — adding plan_env/project_env parameters to PlanExecutionContext, ToolCallRouter, and ToolCallingRuntime. The commit messages are excellent and the code follows existing patterns. However, there are two significant gaps.

Blocking Issues

1. No production caller passes the new parameters. The PR adds plan_env/project_env to the constructors of PlanExecutionContext, ToolCallRouter, and ToolCallingRuntime, but no production construction site is modified in this diff. All existing callers still instantiate these objects without passing env values. This means the plumbing is in place, but unless a companion change at the orchestration/CLI layer actually passes these values when constructing the router and runtime, the bug remains latent in production.

   If the top-level wiring occurs in a separate PR (layered approach) or via a DI container, please clarify this in the PR description and link to the companion PR.
2. No new behavioral tests. The only test change is a signature fixup to 2 existing monkey-patched stubs (adding **_kwargs). There is no test that verifies the actual fix, e.g. constructing a ToolCallRouter(plan_env="container") and asserting the runner receives it. For a Priority/Critical bug fix, there must be at least one test proving the behavior change works end-to-end.

Minor Issues

**_kwargs in test stubs is pragmatic but fragile — it silently swallows any future keyword changes without failing. Consider explicitly accepting the new params instead.
No BDD scenario for the end-to-end project-level override path. Per CONTRIBUTING.md §Testing Philosophy: "Every coding task must include or update tests at multiple levels."

Action Items

Clarify where the top-level wiring occurs (or add it to this PR)
Add at least one test that constructs ToolCallRouter with project_env="container" and asserts the runner's execute() receives it
Consider adding a BDD scenario for end-to-end project-level env override
Replace **_kwargs with explicit parameter names in test stubs
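
The difference between the two stub styles can be seen in a small sketch. The stub names and the misspelled keyword are made up for illustration and do not come from tool_router_steps.py.

```python
def stub_with_kwargs(tool_name, **_kwargs):
    """Catch-all stub: accepts any keyword, including misspelled ones."""
    return "ok"


def stub_explicit(tool_name, *, plan_env=None, project_env=None):
    """Explicit stub: only the two known keyword arguments are accepted."""
    return "ok"


def accepts(stub, **kwargs):
    """Return True if the stub accepts the given keywords, False on TypeError."""
    try:
        stub("ls", **kwargs)
    except TypeError:
        return False
    return True


if __name__ == "__main__":
    # "projct_env" is a deliberate typo standing in for a future signature drift.
    print(accepts(stub_with_kwargs, projct_env="container"))
    print(accepts(stub_explicit, projct_env="container"))
    print(accepts(stub_explicit, project_env="container"))
```

With the explicit signature, a renamed or misspelled keyword surfaces as a `TypeError` in the test run instead of being silently discarded.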


freemo approved these changes 2026-03-27 17:10:04 +00:00

Dismissed

freemo left a comment

Review: fix(cli): honour project-level execution-env-priority in resolution

Approved with comments. The code change is correct and minimal.

Issues to Address

1. Unrelated merge commits (Medium)
The branch contains 8 unrelated merge commits from master (LSP runtime, resource handler, plan lifecycle, etc.). Should be rebased to contain only the 2 relevant commits for a cleaner diff and review.

2. Missing integration tests (Medium)
The only test change fixes existing stubs to accept the new **_kwargs. No new test scenarios verify the actual precedence behavior end-to-end. At least one test that confirms project-level execution-env-priority is actually honored during tool execution would strengthen confidence.

What's Good

Clean, minimal change: correctly threads plan_env/project_env through PlanExecutionContext → ToolCallingRuntime → ToolCallRouter → runner.execute().
Type safety: all new parameters are str | None with None defaults, consistent with existing patterns.
No new error paths introduced — parameters are simply threaded through.
CHANGELOG entry references #1080.

Note

PRs #1135 and #1136 are related — #1135 adds the CLI flag and persistence, #1136 threads the value through execution. They should be merged in order: #1135 first, #1136 second.


freemo requested review from CoreRasurae 2026-03-28 21:28:06 +00:00

freemo requested changes 2026-03-28 23:21:59 +00:00

Dismissed

freemo left a comment

Day 48 Planning Review — Bug Fix PR for #1080

The core fix (injecting project_env into the execution environment resolution chain) is architecturally correct. However, several issues must be resolved:

Blocking issues:

1. Two commits: Must be squashed into one. Per CONTRIBUTING.md, each PR should contain exactly one atomic commit.
2. Merge conflicts (mergeable: false): Rebase required.
3. No milestone assigned: Set to v3.5.0 to match linked bug #1080.
4. Closing keyword format: Uses ISSUES CLOSED: #1080 (custom trailer) instead of Closes #1080. Forgejo may not auto-close the issue with the custom format. Add Closes #1080 to the PR body.
5. No new tests: As @freemo flagged in review #2706, no behavioral test proves the fix works. The only changes are fixing existing monkey-patched stubs. At minimum, add a test that verifies project_env reaches the resolver.
6. Dependency on #1135: This PR requires #1135 to be merged first (confirmed by @freemo's review #2787). This should be documented as a blocking dependency in the PR description.
7. No @tdd_expected_fail removal: Author acknowledges the WF17 test suite doesn't exist. Acceptable if no TDD test exists for this bug.

Requested changes: Squash commits, rebase, set milestone, add Closes #1080, add behavioral test, document #1135 dependency.


freemo approved these changes 2026-03-30 04:19:54 +00:00

Dismissed

freemo left a comment

Review: APPROVED

Well-written PR with clear root cause analysis and focused changes. The 5-file, 53-line change is surgically scoped.

Notes

Good: The PR description transparently notes the missing WF17 E2E test and explains why.
Good: Root cause is correctly identified — callers never passed project_env, not a resolver logic bug.
Minor: The **_kwargs signature in test mock overrides is unusual but pragmatic for forward compatibility.
The PlanExecutionContext additions (plan_env, project_env properties) are clean and well-documented.


freemo requested changes 2026-03-30 04:48:41 +00:00

freemo left a comment

Updated Review (Deep Pass): REQUEST CHANGES

My initial review approved this PR. The deep review reveals a significant gap.

New Finding: Zero New Test Coverage

This PR threads plan_env and project_env parameters through PlanExecutionContext -> ToolCallingRuntime -> ToolCallRouter -> ToolRunner.execute(), but includes no new BDD scenarios or Robot tests proving the resolver actually uses these values. The only test changes update existing mock signatures to accept **_kwargs for compatibility.

Per CONTRIBUTING.md §Multi-Level Testing Mandate: "Every coding task must include or update tests at multiple levels." This PR adds production logic (6 call sites passing plan_env/project_env) but zero new tests verifying the behavior. There should be at least:

1. A BDD scenario verifying project-level env overrides plan-level env at the correct precedence
2. A scenario verifying the resolver receives and uses both values

Previous finding still applies:

The **_kwargs signature in test mocks is pragmatic for forward compatibility
The PR description is well-written with clear root cause analysis
router.py at 909 lines is a pre-existing 500-line violation


hamza.khyari force-pushed bugfix/m8-exec-env-precedence-level2 from d709717105 to 11450c6e67

2026-03-30 11:04:02 +00:00


hamza.khyari force-pushed bugfix/m8-exec-env-precedence-level2 from 11450c6e67 to 01400e8c43

2026-03-30 11:38:15 +00:00


hamza.khyari commented

2026-03-30 11:38:34 +00:00

Self-Review: Production Path Analysis

Rebased onto master (01400e8c). All 99 affected Behave scenarios pass.

Critical Finding: Production Bypass

Deep trace of the production plan execute path reveals it does not use the tool-calling pipeline:

CLI plan execute
  → _get_plan_executor()          # builds PlanExecutor with LLMExecuteActor
  → PlanExecutor.run_execute()
    → _run_execute_with_stub()     # execution_context is None
      → LLMExecuteActor.execute()
        → llm.invoke()             # DIRECT LangChain LLM call
        → _parse_file_blocks()     # regex parse of text output
        → returns ExecuteResult

ToolRunner.execute(), ToolCallRouter, ToolCallingRuntime, and PlanExecutionContext are never instantiated in this path. The production execution goes through LLMExecuteActor which sends a prompt directly to the LLM and parses text output with regex.

What This Means for #1080

| Path | Uses tool calling? | project_env threaded? |
|------|-------------------|----------------------|
| Production (`LLMExecuteActor`) | No | N/A (no env resolution occurs) |
| Tool-calling pipeline (`ToolCallRouter` → `ToolRunner`) | Yes | Yes (this PR fixes it) |
| Tests/benchmarks | Yes | Yes (verified by 4 BDD scenarios) |

This PR correctly fixes the tool-calling pipeline's project_env threading. When the tool-calling pipeline becomes the production path (replacing LLMExecuteActor), the fix will be active. Currently, the production plan execute path doesn't do tool calling or env resolution at all.

Recommendation

This PR should be merged as-is — the fix is correct for the tool-calling pipeline. A separate issue should track migrating the production plan execute path from LLMExecuteActor (direct LLM invoke) to the tool-calling pipeline (ToolCallingRuntime → ToolCallRouter → ToolRunner), at which point this fix becomes production-active.


hamza.khyari force-pushed bugfix/m8-exec-env-precedence-level2 from 01400e8c43 to cef18b7a0b

2026-03-30 12:11:33 +00:00


hamza.khyari force-pushed bugfix/m8-exec-env-precedence-level2 from cef18b7a0b to 2207f03150

2026-03-31 10:40:31 +00:00


hamza.khyari scheduled this pull request to auto merge when all checks succeed 2026-03-31 10:43:49 +00:00

hamza.khyari force-pushed bugfix/m8-exec-env-precedence-level2 from 2207f03150 to cd9cb9e889

2026-03-31 11:31:09 +00:00


hamza.khyari canceled auto merging this pull request when all checks succeed 2026-03-31 11:39:52 +00:00

hamza.khyari scheduled this pull request to auto merge when all checks succeed 2026-03-31 11:43:03 +00:00

hamza.khyari merged commit d27fb6d1f0 into master

2026-03-31 11:53:09 +00:00

hamza.khyari referenced this issue from a commit

2026-03-31 11:53:11 +00:00

Merge pull request 'fix(cli): honour project-level execution-env-priority in resolution' (#1136) from bugfix/m8-exec-env-precedence-level2 into master

hamza.khyari deleted branch bugfix/m8-exec-env-precedence-level2

2026-03-31 11:53:20 +00:00


Reference: cleveragents/cleveragents-core#1136