TDD: Write failing test for #1080 — execution environment resolution ignores project-level override #1101

Closed
opened 2026-03-22 16:30:11 +00:00 by freemo · 2 comments
Owner

Metadata

  • Commit Message: test: add TDD bug-capture test for #1080 — execution env resolution precedence
  • Branch: tdd/m5-exec-env-resolution

Background and Context

This is the TDD counterpart to bug #1080. Per the project's Test-Driven Development workflow for bugs (see CONTRIBUTING.md > Bug Fix Workflow), the first step in fixing any bug is to write a test that captures the buggy behavior. The test is tagged with @tdd_bug, @tdd_bug_1080, and @tdd_expected_fail so that it passes CI while the bug is still unfixed. Once the fix is implemented in #1080, the @tdd_expected_fail tag will be removed and the test will run normally.

See #1080 for full bug details.

Expected Behavior

A new test exists that:

  1. Captures the exact failure described in #1080.
  2. Is tagged with @tdd_bug, @tdd_bug_1080, and @tdd_expected_fail.
  3. Passes CI via the expected-failure mechanism (the underlying assertion fails, confirming the bug exists, but the tag inversion causes the test to pass).
  4. Would fail CI if the bug were fixed without removing the @tdd_expected_fail tag.
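Points 3 and 4 describe a tag inversion that fits in a small decision rule. The sketch below is a hypothetical illustration, not the project's actual Behave hook, which this issue does not show. The function name ci_outcome is invented, and tag names are written without the leading @.

```python
def ci_outcome(tags: set[str], assertions_passed: bool) -> bool:
    """Return True if CI should treat the scenario as passing (hypothetical rule)."""
    if "tdd_expected_fail" in tags:
        # Inverted: the scenario "passes" only while the bug still reproduces.
        return not assertions_passed
    # Untagged scenarios pass or fail on their assertions as usual.
    return assertions_passed


bug_tags = {"tdd_bug", "tdd_bug_1080", "tdd_expected_fail"}
print(ci_outcome(bug_tags, assertions_passed=False))  # True: bug present, CI passes
print(ci_outcome(bug_tags, assertions_passed=True))   # False: bug fixed, tag not removed
```

The second call shows point 4. Once the fix lands, the expected-fail tag must be removed, or the now-passing assertion turns into a CI failure.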

Acceptance Criteria

  • [x] A test is written that captures the bug behavior described in #1080.
  • [x] The test is tagged with @tdd_bug, @tdd_bug_1080, and @tdd_expected_fail.
  • [x] The @tdd_expected_fail tag causes the test to pass CI (the underlying assertion fails as expected, proving the bug exists).
  • [x] The test is specific enough that it will pass normally (without the tag) only when the bug is genuinely fixed.
  • [x] Tag validation rules pass: @tdd_bug_1080 is accompanied by @tdd_bug, and @tdd_expected_fail is accompanied by both @tdd_bug and @tdd_bug_1080.
  • [ ] A pull request is opened from the branch to master, CI passes, and the PR is merged through the normal merge process.
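Here is a minimal sketch of the tag validation rules from these criteria. It assumes each scenario's tags are available as a set of names without the leading @. The function tag_errors and the generic @tdd_bug_<N> pattern are assumptions, and the project's real validator may enforce more, such as matching the issue number.

```python
import re

# Hypothetical pattern for issue-specific bug tags such as tdd_bug_1080.
BUG_ID_TAG = re.compile(r"tdd_bug_\d+")


def tag_errors(tags: set[str]) -> list[str]:
    """Return rule violations for one scenario's tags (empty list if valid)."""
    errors = []
    has_bug = "tdd_bug" in tags
    has_bug_id = any(BUG_ID_TAG.fullmatch(tag) for tag in tags)
    # Rule 1: an issue-specific tag needs the generic @tdd_bug tag.
    if has_bug_id and not has_bug:
        errors.append("@tdd_bug_<N> requires @tdd_bug")
    # Rule 2: @tdd_expected_fail needs both the generic and the specific tag.
    if "tdd_expected_fail" in tags and not (has_bug and has_bug_id):
        errors.append("@tdd_expected_fail requires @tdd_bug and @tdd_bug_<N>")
    return errors


print(tag_errors({"tdd_bug", "tdd_bug_1080", "tdd_expected_fail"}))  # []
```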

Definition of Done

This issue is complete when:

  • All subtasks below are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the test and what bug behavior it captures.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, CI passes, and the PR is merged before this issue is marked done.
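The commit layout required here (exact subject line, a blank line, then a descriptive body) can be checked mechanically. This helper is hypothetical and is not part of the project's tooling. The subject string is copied from Metadata, with \u2014 standing for the em dash it contains.

```python
SUBJECT = "test: add TDD bug-capture test for #1080 \u2014 execution env resolution precedence"


def follows_commit_layout(message: str, expected_subject: str) -> bool:
    """Check: exact first line, blank second line, at least one body line."""
    lines = message.splitlines()
    if len(lines) < 3 or lines[0] != expected_subject or lines[1] != "":
        return False
    # The body must say something about the test and the bug it captures.
    return any(line.strip() for line in lines[2:])


msg = SUBJECT + "\n\nAdds a Behave scenario tagged @tdd_expected_fail for #1080.\n"
print(follows_commit_layout(msg, SUBJECT))  # True
```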

Subtasks

  • [x] Code: Analyze bug #1080 to identify the exact failure condition, including the inputs, state, and code path that trigger the bug.
  • [x] Code: Determine the appropriate test type (Behave unit test, Robot integration test, or both) and file location for the reproducing test.
  • [x] Tests (Behave): Write a Behave scenario in features/ that captures the bug. Tag the scenario with @tdd_bug, @tdd_bug_1080, and @tdd_expected_fail. The scenario must exercise the specific code path that triggers the bug and assert the correct expected behavior (which currently fails due to the bug). Name the scenario descriptively to indicate it is a bug regression test.
  • [x] Tests (Robot): If the bug involves integration-level behavior, add a Robot test in robot/ with equivalent tags. If purely unit-level, mark N/A with justification.
  • [x] Docs: Add a comment in the test file explaining this test captures bug #1080 and uses @tdd_expected_fail until the fix is merged.
  • [x] Quality: Verify CI passes with the tagged test. Confirm the underlying assertion fails for the correct reason.
  • [x] Quality: Verify tag validation rules pass.
  • [x] Quality: Verify coverage >=97% via nox -s coverage_report. If coverage is below 97%, review the current unit test coverage report at build/coverage.xml and use it to write new Behave-based unit tests to improve code coverage.
  • [ ] Quality: Run nox (all default sessions) and fix any errors so that nox passes across the entire code base.
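For the coverage subtask, this sketch reads build/coverage.xml and lists the files that pull the total below 97%. It makes two assumptions about how nox -s coverage_report is configured. First, the report is in the Cobertura-style XML that coverage.py's "coverage xml" command writes, with a root line-rate attribute and one class element per source file. Second, the gate applies to line coverage.

```python
import xml.etree.ElementTree as ET

THRESHOLD = 0.97  # assumed to be a line-coverage threshold


def coverage_summary(xml_text: str, threshold: float = THRESHOLD):
    """Return (overall line rate, [(filename, rate)] below threshold, lowest first)."""
    root = ET.fromstring(xml_text)
    overall = float(root.get("line-rate", "0"))
    low = [
        (cls.get("filename"), float(cls.get("line-rate", "0")))
        for cls in root.iter("class")  # one <class> per source file
        if float(cls.get("line-rate", "0")) < threshold
    ]
    low.sort(key=lambda item: item[1])
    return overall, low


# Usage, assuming the report already exists:
# with open("build/coverage.xml", encoding="utf-8") as f:
#     overall, low = coverage_summary(f.read())
```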
freemo added this to the v3.5.0 milestone 2026-03-22 16:30:11 +00:00
Member

Implementation Note: Bug #1080 Analysis

Failure Condition Identified

The bug is in cleveragents.application.services.execution_environment_resolver.ExecutionEnvironmentResolver.resolve(). The spec (§Execution Environment Routing, lines 19329–19372 of docs/specification.md) defines a 6-level precedence chain with priority interleaving:

  1. Plan-level execution_environment with priority: override
  2. Project-level execution_environment with priority: override ← the broken level
  3. Nearest-ancestor devcontainer
  4. Plan-level execution_environment with priority: fallback
  5. Project-level execution_environment with priority: fallback
  6. Host (default)
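To illustrate the interleaving, the resolver can walk the six levels in order and return the first one that applies. This is a hypothetical sketch, not the planned fix. EnvSetting and resolve_by_spec are invented names. The devcontainer level is modelled as an already-discovered environment name. The tool_env argument of the current signature is left out because the six levels above do not mention it.

```python
from dataclasses import dataclass
from typing import Literal, Optional

Priority = Literal["override", "fallback"]


@dataclass(frozen=True)
class EnvSetting:
    """An execution_environment value plus its declared priority (hypothetical)."""
    env: str
    priority: Priority


def resolve_by_spec(
    plan: Optional[EnvSetting] = None,
    project: Optional[EnvSetting] = None,
    devcontainer: Optional[str] = None,
    default: str = "host",
) -> str:
    """Return the environment from the first applicable spec level."""
    candidates = (
        plan.env if plan and plan.priority == "override" else None,        # level 1
        project.env if project and project.priority == "override" else None,  # level 2
        devcontainer,                                                         # level 3
        plan.env if plan and plan.priority == "fallback" else None,        # level 4
        project.env if project and project.priority == "fallback" else None,  # level 5
    )
    for env in candidates:
        if env is not None:
            return env
    return default                                                            # level 6


print(resolve_by_spec(plan=EnvSetting("host", "fallback"),
                      project=EnvSetting("container", "override")))  # container
```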

The current resolve() method uses a flat 4-level chain without priority awareness:

```python
def resolve(self, tool_env=None, plan_env=None, project_env=None, default=None):
    for raw in (tool_env, plan_env, project_env, default):
        if raw is not None:
            return self._coerce(raw)
    return self.DEFAULT
```

This means:

  • plan_env always beats project_env, regardless of whether the plan has fallback priority
  • There is no way to express that a project-level override (priority: override) should beat a plan-level fallback
  • The 6-level interleaving from the spec (override priorities evaluated first, then devcontainer, then fallbacks) is not implemented

Specific Bug Scenario

When plan_env="host" with plan_env_priority="fallback" and project_env="container" with project_env_priority="override", the current resolver returns "host" (plan wins). The spec says the result should be "container" (project override at level 2 beats plan fallback at level 4).
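This scenario can be reproduced with a stand-alone copy of the flat chain quoted above. The copy makes two assumptions. It treats _coerce as an identity function, although the real one presumably converts the raw value into the resolver's environment type. It also takes DEFAULT to be "host", matching spec level 6.

```python
DEFAULT = "host"  # assumption: matches spec level 6


def flat_resolve(tool_env=None, plan_env=None, project_env=None, default=None):
    """Stand-alone copy of the current flat chain (_coerce treated as identity)."""
    for raw in (tool_env, plan_env, project_env, default):
        if raw is not None:
            return raw
    return DEFAULT


# The scenario's priorities have nowhere to go: the signature has no
# parameter for them, so only the raw values reach the resolver.
actual = flat_resolve(plan_env="host", project_env="container")
expected = "container"  # level 2 (project override) beats level 4 (plan fallback)
print(actual, expected)  # host container
```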

Test Approach

  • Type: Behave unit test (the resolver is a pure domain service with no integration dependencies)
  • Robot: N/A — this is purely unit-level logic; no external services, no I/O, no integration boundaries
  • File: features/tdd_exec_env_resolution_precedence.feature with steps in features/steps/tdd_exec_env_resolution_precedence_steps.py
  • Strategy: Call the resolver's resolve() and assert correct 6-level precedence behavior. The assertion will fail because the resolver doesn't implement priority-aware interleaving. The @tdd_expected_fail tag will invert this to a CI pass.
Member

Implementation Note: Test Design and Results

Test Files Created

  • Feature: features/tdd_exec_env_resolution_precedence.feature
  • Steps: features/steps/tdd_exec_env_resolution_precedence_steps.py

Test Design

The feature file has 3 scenarios:

  1. Bug #1080 - Project-level override beats plan-level fallback — Tagged @tdd_expected_fail. Calls ExecutionEnvironmentResolver.resolve() with plan_env="host" and project_env="container", simulating a project with priority: override and a plan with priority: fallback. Asserts the result is "container" (spec level 2 beats level 4). Currently fails with "host" because the resolver uses a flat chain. The @tdd_expected_fail tag inverts this to a CI pass.

  2. Regression guard - Plan-level override still beats project-level override — No expected-fail tag. Verifies that plan_env at override priority (level 1) still beats project_env at override priority (level 2). Passes today and serves as a regression guard.

  3. Regression guard - Project-level override beats host default — No expected-fail tag. Verifies the simple case where project_env is set and no plan_env exists. Passes today.
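A hypothetical, table-driven mirror of the three scenarios shows why all three pass CI today. It runs them against a stand-alone copy of the flat chain, with _coerce treated as identity. The bug scenario's assertion fails and is inverted by the expected-fail tag, while the two guards hold outright. Input values for the regression guards are illustrative, since this note does not list them.

```python
DEFAULT = "host"


def flat_resolve(tool_env=None, plan_env=None, project_env=None, default=None):
    """Stand-alone copy of today's flat chain (_coerce treated as identity)."""
    for raw in (tool_env, plan_env, project_env, default):
        if raw is not None:
            return raw
    return DEFAULT


# (scenario name, tags, resolver kwargs, spec-expected result)
SCENARIOS = [
    ("Bug #1080 - Project-level override beats plan-level fallback",
     {"tdd_bug", "tdd_bug_1080", "tdd_expected_fail"},
     {"plan_env": "host", "project_env": "container"}, "container"),
    ("Regression guard - Plan-level override still beats project-level override",
     set(), {"plan_env": "container", "project_env": "host"}, "container"),
    ("Regression guard - Project-level override beats host default",
     set(), {"project_env": "container"}, "container"),
]


def run(scenarios):
    """Return {name: (assertion_held, ci_passed)} with expected-fail inversion."""
    results = {}
    for name, tags, kwargs, expected in scenarios:
        held = flat_resolve(**kwargs) == expected
        ci_passed = (not held) if "tdd_expected_fail" in tags else held
        results[name] = (held, ci_passed)
    return results


for name, (held, ci_passed) in run(SCENARIOS).items():
    print(f"{'PASS' if ci_passed else 'FAIL'} (assertion held: {held}) {name}")
```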

Robot Test Decision

N/A — The ExecutionEnvironmentResolver is a pure domain service with no I/O, external services, or integration boundaries. All behavior is testable at the unit level with Behave. No Robot test needed.

Quality Gate Results

  • nox -s lint ✅
  • nox -s typecheck ✅
  • nox -s unit_tests ✅ (463 features passed, 0 failed, 12234 scenarios passed, 0 failed)
  • nox -s coverage_report ✅ (98.4% ≥ 97% threshold)
  • Tag validation ✅ (the @tdd_expected_fail/@tdd_bug/@tdd_bug_1080 combination passes all tag rules)
brent.edwards added reference tdd/m5-exec-env-resolution 2026-03-23 01:10:09 +00:00
Reference
cleveragents/cleveragents-core#1101