feat(tool): add tool-level execution environment preferences #970
No reviewers
Labels
No labels
auto/needs-reevaluation
controller-managed
auto/blocked-by-deps
auto/ci-timeout
auto/claimed-implementer
auto/claimed-merge
auto/claimed-reviewer
auto/driver-down
auto/invariant-violation
auto/last-attempt-tier-0
auto/last-attempt-tier-1
auto/last-attempt-tier-2
auto/last-attempt-tier-min
Automation Tracking
auto/needs-conflict-resolution
auto/needs-implementer
auto/postmortem
auto/ready-to-merge
auto/restart-throttled
auto/revert
auto/sentinel
auto/stale-inactivity
auto/unstable
Blocked
Bounty
$100
Bounty
$1000
Bounty
$10000
Bounty
$20
Bounty
$2000
Bounty
$250
Bounty
$50
Bounty
$500
Bounty
$5000
Bounty
$750
MoSCoW
Could have
MoSCoW
Must have
MoSCoW
Should have
Needs Feedback
Points
1
Points
13
Points
2
Points
21
Points
3
Points
34
Points
5
Points
55
Points
8
Points
88
Priority
Backlog
Priority
CI Blocker
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Signed-off: Owner
Signed-off: Scrum Master
Signed-off: Tech Lead
Spike
State
Completed
State
Duplicate
State
In Progress
State
In Review
State
Paused
State
Unverified
State
Verified
State
Wont Do
Type
Automation
Type
Bug
Type
Discussion
Type
Documentation
Type
Epic
Type
Feature
Type
Legendary
Type
Refactor
Type
Support
Type
Task
Type
Testing
No project
No assignees
3 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
cleveragents/cleveragents-core!970
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "feature/m5-tool-env-prefs"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Add tool-level execution environment preferences with four modes:
required,preferred,specific, andnone.Changes
New model (
domain/models/core/execution_environment_preference.py):EnvironmentPreferenceModeenum: REQUIRED, PREFERRED, SPECIFIC, NONEExecutionEnvironmentPreferencefrozen Pydantic model withmodeandtarget_resourcefieldstarget_resourcerequired iff mode is SPECIFICToolSpec & Tool model updates:
execution_environment: ExecutionEnvironmentPreferencefield to bothToolSpec(runtime) andTool(domain)Tool.from_config()parsesexecution_environmentfrom YAML config dictsToolRunner integration (
tool/runner.py):_env_resolver.resolve(), checksspec.execution_environment.mode:ContainerUnavailableErrorif resolved env is not CONTAINERtool_envwithtarget_resourceTests
Quality Gates
nox -s lintnox -s typechecknox -s unit_testsnox -s integration_testsCloses #879
6c6d7ec3dd3bfd07758cPM Day 36 Triage: MERGE CONFLICT. @brent.edwards rebase onto master needed. This closes #879 and is an M6 feature targeting v3.5.0. Reviewer: @CoreRasurae — please review once rebased and conflicts are resolved.
PM Status — Day 37 — Rebase Required
This PR has merge conflicts and cannot be merged in its current state. 42% of all open PRs (21 of 50) have conflicts — this is a project-wide issue that must be resolved.
@brent.edwards — Please rebase this PR onto
masterby Day 39 EOD (2026-03-19). If you cannot rebase by then, please post a comment explaining the blocker.PM rebase request — Day 37
Code Review — PR #970
Reviewed against the full review protocol (Phases 0–9). Typecheck, lint, and all 27 Behave scenarios pass on the current branch. The core design (domain model + ToolRunner integration) is solid. However, there are blocking issues that need resolution before merge.
Critical (must fix)
F1: Scope creep — 7 unrelated files modified
The PR modifies
robot/architecture.robot,robot/cli_core.robot,robot/core_cli_commands.robot,robot/database_integration.robot,robot/scientific_paper_basic.robot,robot/scientific_paper_e2e_test.robot, andsrc/cleveragents/infrastructure/database/unit_of_work.py. These are Robot assertion tightening and a UoW lifecycle fix — neither related to tool env preferences. Please revert these and submit as separate PRs.F2: PR labels — Remove
MoSCoW/Should haveandPoints/5from the PR. These belong on the issue only.F3: Issue #879 subtasks — All 8 subtasks and 6 acceptance criteria are still unchecked.
F4: Merge conflicts — Branch is 52 commits behind master (
mergeable: false). Thedomain/models/core/__init__.pydiff removesSandboxDiffEntry,DiffView,SandboxRef,SandboxStrategyProtocolwhich were likely added on master after this branch diverged — these will conflict on rebase.Major (must fix)
F5:
SPECIFICmode doesn't route to the named target (runner.py:167-169)When
pref.mode == SPECIFIC, the code setseffective_tool_env = "container"but never passespref.target_resourceto the resolver or container executor. SPECIFIC currently behaves identically to REQUIRED — the named target is silently ignored. This is the primary value proposition of SPECIFIC mode.F6:
PREFERREDsemantics need documentation (runner.py:173-178)PREFERRED only upgrades to container when
effective_tool_env is None. If the caller passestool_env="host", the tool's preference is silently ignored. This may be intentional (caller override takes precedence) but contradicts the issue description ("try container, fall back to host"). Either document the current behavior or adjust the logic.F7: Missing
__all__(execution_environment_preference.py)Module needs:
F8:
providers/__init__.pydocstring — Unrelated to this feature. Move to a separate commit.Minor (non-blocking)
F9: Behave step (line 449) and Robot helper (line 239) use bare
MagicMock()forcontainer_executor. Should becreate_autospec(ContainerToolExecutor).F10: Verify
vulture_whitelist.pyadditions are limited to the new symbols.F11: No
structlogmessages in the preference routing block (runner.py:163-209). These routing decisions (REQUIRED fail, PREFERRED fallback, SPECIFIC override) should be logged for debugging.F12: The 4 ToolRunner Behave scenarios could share a
Backgroundto reduce setup duplication.Summary
The domain model is clean and well-validated. F5 (SPECIFIC not routing to target) is the most important functional bug — the rest are process and hygiene.
Code Review — PR #970 (Round 2)
Follow-up to the initial review. These are additional findings discovered during deeper scenario tracing, Pydantic round-trip testing, and boundary analysis. All confirmed empirically against the branch.
Major
F13: Whitespace-only
target_resourcepasses validation (execution_environment_preference.py:52,66)Confirmed:
The validator uses
not self.target_resourcewhich isFalsefor whitespace strings. The model'sConfigDicthasfrozen=Truebut nostr_strip_whitespace=True. A whitespace-only target is meaningless and will silently cause routing failures.Fix: add
str_strip_whitespace=Trueto the ConfigDict.F14: Rebase will silently drop
SandboxStrategyProtocoland 3 related symbols (related to F4)The PR's
__init__.pyis missing thesandbox_strategyimports (DiffView,SandboxDiffEntry,SandboxRef,SandboxStrategyProtocol) that were added to master in commit2688c857after this branch diverged. On rebase, the PR's__init__.pypatch will either conflict or silently remove these symbols. The author must manually verify all sandbox_strategy exports survive.F15: SPECIFIC mode failure gives generic error, no target name (
runner.py:203-208)Confirmed: when SPECIFIC mode targeting
local/api-devfails, the error is:No mention of the target
local/api-dev. Add a SPECIFIC-mode handler before theelsebranch:Minor
F16:
as_cli_dictuses string comparison instead of enum (tool.py:533)All other comparisons in this PR use enum members (e.g.,
pref.mode == EnvironmentPreferenceMode.SPECIFIC). Should beee.mode != EnvironmentPreferenceMode.NONEfor consistency.F17: No test for SPECIFIC mode failure path
The feature tests REQUIRED failure and PREFERRED fallback, but no scenario covers SPECIFIC mode when the container is unavailable.
F18: No boundary test for whitespace-only
target_resourceOnce F13 is fixed, add a scenario verifying rejection.
F19: No test for
from_configwith invalidexecution_environmenttypeexecution_environment: "required"(string instead of dict) raises a rawValidationError. Should have a scenario verifying the error.F20:
from_configtreatsexecution_environment: nullas default silentlyMay be intentional, but should be documented.
F21:
str_strip_whitespaceinconsistency betweenTooland nestedExecutionEnvironmentPreferenceConfirmed:
Tool.description = " padded "gets stripped to"padded", butpref.target_resource = " padded "stays as-is. Fix overlaps with F13.Round 2 Summary
check comments
Response to @hamza.khyari's Review (Rounds 1+2) — PR #970
Thank you for the comprehensive review, Hamza. The SPECIFIC mode routing bug (F5) is a great catch — that's the primary value prop of the feature.
Critical
F1 (Scope creep — 7 unrelated files): Agreed. Will revert the Robot assertion tightening and UoW lifecycle fix to separate PRs. The commit should contain only the tool env preference feature files.
F2 (PR labels): Will remove
MoSCoW/Should haveandPoints/5.F3 (Issue subtasks): Will check off completed items on #879.
F4 (Merge conflicts / rebase): Will rebase onto current master. Will manually verify all
sandbox_strategyexports survive, per your note aboutSandboxStrategyProtocoletc.Major (Round 1)
F5 (SPECIFIC mode doesn't route to named target): Confirmed bug. Will pass
pref.target_resourcethrough toresolve_and_validate()and the container executor, so SPECIFIC mode actually uses the named container resource.F6 (PREFERRED semantics): Will document: "caller override takes precedence over tool preference — PREFERRED only upgrades when no caller override is set."
F7 (Missing
__all__): Will add.F8 (Unrelated docstring change): Will move to separate commit.
Major (Round 2)
F13 (Whitespace-only
target_resource): Will addstr_strip_whitespace=Trueto the model ConfigDict.F14 (Rebase symbol loss): Will carefully merge
sandbox_strategyimports during rebase.F15 (SPECIFIC error message generic): Will add a SPECIFIC-mode handler with the target resource name in the error message.
Minor
F9: Will use
create_autospec(ContainerToolExecutor).F10: Will verify vulture whitelist.
F11: Will add
structlogmessages for routing decisions.F12: Will add a
Backgroundfor shared setup.F16: Will use enum comparison instead of string.
F17-F21: Will add boundary test scenarios for SPECIFIC failure, whitespace target, invalid config type, null config, and strip inconsistency.
Working on all fixes now. The rebase (F4) will be done first to establish a clean base for the other changes.
3bfd07758c1400f1a5b91400f1a5b9ddd1c4b7bcddd1c4b7bc7c7e4d5cecReview Fixes Applied — Commit
7c7e4d5cBranch rebased onto
origin/master(ad98d41d). Scope creep (F1) and merge conflicts (F4) automatically resolved by rebase. All actionable findings addressed.__all__)create_autospec)MagicMock()withcreate_autospec(ContainerToolExecutor)in both test filesstr_strip_whitespace=Truein ConfigDicttarget_resourceEnvironmentPreferenceMode.NONEQuality Gates
nox -s lint— PASSnox -s typecheck— PASS (0 errors)Code Review — PR #970
Tool-level execution environment preferences. Proper labels, milestone, and issue linkage (#879). Approved.
7c7e4d5cec514d36a531New commits pushed, approval review dismissed automatically according to repository settings
Rebased onto
origin/master(79b0a2c5). CHANGELOG conflict resolved (kept master, re-added PR entry).nox -s lintPASS,nox -s typecheckPASS (0 errors). Commit514d36a5.