Implement @tdd_expected_fail tag handling in Behave environment #627

Closed
opened 2026-03-07 21:31:52 +00:00 by freemo · 5 comments
Owner

Metadata

  • Commit Message: feat(testing): implement @tdd_expected_fail tag handling in Behave environment
  • Branch: feature/m5-behave-tdd-tags

Background and Context

The project's Bug Fix Workflow (see CONTRIBUTING.md) requires a three-tag system for TDD bug-capture tests:

  • @tdd_bug — generic filter tag, always present on TDD bug tests
  • @tdd_bug_<N> — links to specific bug issue number, always present
  • @tdd_expected_fail — behavioral switch that inverts test pass/fail

The Behave test environment must handle these tags so that:

  • When @tdd_expected_fail is present, a test that fails (bug still exists) is reported as passed
  • When @tdd_expected_fail is present, a test that passes (bug was fixed without removing the tag) is reported as failed with a clear error message
  • Tag validation rules are enforced: @tdd_bug_<N> requires @tdd_bug, and @tdd_expected_fail requires both

This allows TDD bug-capture tests to pass CI while the bug is still unfixed, without requiring any exceptions to the "all tests must pass" rule.
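
The validation rules above can be sketched as a small pure function. This is only an illustration, not the project's actual code: the function signature, the constants, and the error strings are assumptions, and it assumes tag names arrive without the leading `@` (as Behave stores them) and that `<N>` is a numeric issue number.

```python
import re

# Assumed constants; tag names are written without the leading "@".
TDD_BUG = "tdd_bug"
TDD_EXPECTED_FAIL = "tdd_expected_fail"
TDD_BUG_N = re.compile(r"^tdd_bug_\d+$")  # assumes <N> is a numeric issue id


def validate_tdd_tags(tags):
    """Return a list of validation errors; an empty list means the tags are consistent."""
    tags = set(tags)
    has_bug = TDD_BUG in tags
    has_bug_n = any(TDD_BUG_N.match(tag) for tag in tags)
    errors = []
    # Rule 1: @tdd_bug_<N> requires @tdd_bug.
    if has_bug_n and not has_bug:
        errors.append("@tdd_bug_<N> requires @tdd_bug")
    # Rule 2: @tdd_expected_fail requires both of the other tags.
    if TDD_EXPECTED_FAIL in tags and not (has_bug and has_bug_n):
        errors.append("@tdd_expected_fail requires both @tdd_bug and @tdd_bug_<N>")
    return errors
```

A `before_scenario` hook could call such a helper with the scenario's tags and raise if the returned list is non-empty.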

Expected Behavior

Behave environment hooks in features/environment.py correctly handle the @tdd_expected_fail tag by inverting test results for tagged scenarios, and enforce tag consistency rules.

Acceptance Criteria

  • Scenarios tagged @tdd_expected_fail that fail have their result inverted to pass (expected failure).
  • Scenarios tagged @tdd_expected_fail that pass have their result inverted to fail with message: "Bug appears to be fixed. Remove the @tdd_expected_fail tag and verify the fix through the bug fix workflow."
  • A scenario with @tdd_bug_<N> but missing @tdd_bug causes a clear validation error.
  • A scenario with @tdd_expected_fail but missing @tdd_bug or @tdd_bug_<N> causes a clear validation error.
  • The @tdd_bug tag can be used to filter/list all TDD bug tests (e.g., behave --tags=@tdd_bug via nox).
  • Normal (non-tagged) test execution is completely unaffected.
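
The first two criteria and the last one amount to a small truth table. A minimal sketch, assuming a hypothetical helper named `reported_outcome` that receives the scenario's tag names (without the leading `@`) and the raw failed flag:

```python
# Message text taken from the acceptance criteria above.
FIXED_MESSAGE = (
    "Bug appears to be fixed. Remove the @tdd_expected_fail tag and "
    "verify the fix through the bug fix workflow."
)


def reported_outcome(tags, failed):
    """Map (tags, raw failed flag) to (reported failed flag, message or None)."""
    if "tdd_expected_fail" not in tags:
        # Untagged scenarios are passed through untouched.
        return failed, None
    if failed:
        # Expected failure: the bug still exists, so report a pass.
        return False, None
    # Unexpected pass: the bug seems fixed but the tag is still present.
    return True, FIXED_MESSAGE
```

Keeping the decision in a pure function like this makes it easy to test separately from the Behave hooks.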

Definition of Done

This issue is complete when:

  • All subtasks below are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.

Subtasks

  • Code: Add after_scenario hook to features/environment.py that detects @tdd_expected_fail tag and inverts the scenario's pass/fail result.
  • Code: Implement tag validation in before_scenario hook: if @tdd_bug_<N> is present, verify @tdd_bug is also present. If @tdd_expected_fail is present, verify both @tdd_bug and at least one @tdd_bug_<N> are present. Raise a clear error if validation fails.
  • Code: When a @tdd_expected_fail scenario unexpectedly passes, set the failure message to: "Bug appears to be fixed. Remove the @tdd_expected_fail tag from this scenario and verify the fix through the bug fix workflow. See CONTRIBUTING.md > Bug Fix Workflow."
  • Code: Ensure the tag handling does not interfere with Behave's normal reporting, coverage measurement, or other environment hooks.
  • Docs: Add inline documentation in features/environment.py explaining the three-tag system and referencing CONTRIBUTING.md > TDD Bug Test Tags.
  • Tests (Behave): Add test scenarios that verify: (a) a @tdd_expected_fail scenario that fails is reported as passed, (b) a @tdd_expected_fail scenario that passes is reported as failed, (c) tag validation catches missing @tdd_bug, (d) non-tagged scenarios are unaffected.
  • Tests (Robot): N/A — this is Behave-specific infrastructure.
  • Tests (ASV): N/A — no performance-sensitive code.
  • Quality: Verify coverage >=97% via nox -s coverage_report. If coverage is <97%, review the current unit test coverage report at build/coverage.xml and use it to write new Behave-based unit tests that improve coverage. Specifically, write descriptively named Behave-style unit tests that target the uncovered lines in whichever file has the most of them. Then rerun nox -s coverage_report to verify that all tests pass and coverage is >=97%. Mark this complete only once coverage is >=97%; otherwise repeat this task as many times as needed.
  • Quality: Run nox (all default sessions, including benchmark) and fix any errors so that nox passes across the entire code base. Do not ignore any failure, even if it seems unrelated to this commit; fix it.
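
For the coverage subtask, the threshold comparison can be sketched as below. This assumes build/coverage.xml is a Cobertura-style report (the format coverage.py writes) whose root element carries a `line-rate` attribute; the helper name `coverage_ok` and the sample XML are made up for illustration, and nox -s coverage_report remains the authoritative check.

```python
import xml.etree.ElementTree as ET

THRESHOLD = 0.97  # the >=97% gate from the subtask above


def coverage_ok(xml_text, threshold=THRESHOLD):
    """Return True if the report's overall line rate meets the threshold."""
    root = ET.fromstring(xml_text)
    # Assumption: the root <coverage> element has a "line-rate" attribute in [0, 1].
    return float(root.get("line-rate")) >= threshold


# Made-up sample input standing in for the contents of build/coverage.xml.
SAMPLE = '<coverage line-rate="0.987" branch-rate="0.9"></coverage>'
print(coverage_ok(SAMPLE))
```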
freemo added this to the v3.4.0 milestone 2026-03-07 23:06:37 +00:00
Author
Owner

Triage — Day 27

Status: Triaged and labeled.

Actions Taken

  • Added labels: Type/Feature, Priority/High, MoSCoW/Must have, State/Unverified, Points/5
  • Assigned to: @brent.edwards (QA lead, most familiar with test infrastructure)
  • Milestone: v3.4.0 (M5 — TDD infrastructure)

Context

This is a blocking dependency for all 9 TDD bug counterpart issues (#630-#638). The Behave @tdd_expected_fail tag handler is required for TDD tests to pass CI while bugs remain unfixed.

However, TDD counterpart issues (#630-#638) have been created now and can be worked on immediately — the tests can be written and will simply fail in CI until this infrastructure is in place. The important thing is that the test code exists on the tdd/ branches.

Priority Note

While labeled Priority/High (not Critical — it's infrastructure, not a bug), this should be prioritized as it unblocks the entire TDD bug fix pipeline.

freemo modified the milestone from v3.4.0 to v3.2.0 2026-03-09 20:09:23 +00:00
Author
Owner

PM Note (Day 29) — Reassignment and Milestone Change

Changes:

  • Assignee: @brent.edwards → @CoreRasurae
  • Milestone: v3.4.0 → v3.2.0
  • State: Unverified → Verified

Rationale: TDD infrastructure (#627, #628, #629) is a prerequisite for the formal bug-fix TDD workflow described in CONTRIBUTING.md. Without the @tdd_expected_fail tag handling, the team is using @wip as a workaround, which does not enforce the quality gate. Moving these to v3.2.0 reflects their actual priority as blocking infrastructure.

Brent currently has 15 issues / 55 SP and is a single point of failure for M3 bug fixes that have active branches. Luis has capacity and experience with the Behave test infrastructure.

@CoreRasurae — This involves adding @tdd_expected_fail tag support to the Behave environment configuration (features/environment.py). Tests tagged with this marker should be expected to fail (proving a bug exists) and should pass the test suite without being treated as regressions. See CONTRIBUTING.md's Bug Fix Workflow section for the full specification.

Member

Implementation Summary

Branch: feature/m5-behave-tdd-tags
Commit: 2382f9cf649ae256d82cedff1b26389ab8b0c58b
PR: #665

Changes

  • features/environment.py: Added validate_tdd_tags() and should_invert_result() helper functions. Implemented tag validation in before_scenario (enforces @tdd_bug + @tdd_bug_<N> prerequisites for @tdd_expected_fail). Implemented result inversion via a Scenario.run() monkey-patch applied in before_all: @tdd_expected_fail scenarios that fail are reported as passed; scenarios that unexpectedly pass are reported as failed with a guidance message.

  • features/testing/tdd_tag_validation.feature: 13 BDD scenarios covering tag validation (missing @tdd_bug, missing @tdd_bug_<N>, valid combinations) and result inversion behavior.

  • features/testing/tdd_expected_fail_demo.feature: 1 demo scenario with @tdd_expected_fail demonstrating expected-failure inversion.

  • features/steps/tdd_tag_validation_steps.py: Step definitions for all tag validation scenarios.

Design Decision

Behave's Scenario.run() returns a local failed boolean that after_scenario hooks cannot modify. The TDD tag inversion was therefore implemented via a Scenario.run() monkey-patch applied in before_all, which wraps the original method to invert the return value for @tdd_expected_fail tagged scenarios.
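
The wrapping approach described above can be sketched roughly as follows. This is not the code from PR #665: `FakeScenario` is a stand-in for Behave's Scenario class so the example runs without Behave installed, and `install_inversion` is a hypothetical name. The sketch only flips the returned flag; attaching the guidance message and updating the recorded status are left out.

```python
import functools


class FakeScenario:
    """Stand-in for Behave's Scenario class (assumption, for illustration only)."""

    def __init__(self, tags, failed):
        self.tags = tags
        self._failed = failed

    def run(self, runner):
        # Like the method described above, returns True when the scenario failed.
        return self._failed


def install_inversion(scenario_cls):
    """Wrap scenario_cls.run so the failed flag is inverted for tagged scenarios."""
    original_run = scenario_cls.run

    @functools.wraps(original_run)
    def patched_run(self, runner):
        failed = original_run(self, runner)
        if "tdd_expected_fail" in self.tags:
            return not failed
        return failed

    scenario_cls.run = patched_run


# In the real environment this would be called once, e.g. from before_all.
install_inversion(FakeScenario)
```

Because the wrapper delegates to the original method first, untagged scenarios run exactly as before and only the final return value differs for tagged ones.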

Quality Gates

| Gate | Result |
|------|--------|
| nox -s lint | PASS |
| nox -s typecheck | PASS (0 errors) |
| nox -s unit_tests | PASS (9713 scenarios, 351 features) |
| nox -s integration_tests | PASS (1342 tests) |
| nox -s coverage_report | PASS (98.7% >= 97% threshold) |
Author
Owner

PM Acknowledgment (Day 31):

Thank you @CoreRasurae. PR #665 is submitted — good progress.

Status: PR #665 has no reviews yet and a merge conflict. This is the #1 item on the project critical path — it unblocks the CI quality gate (#629) which unblocks all bug fix PR merges.

Action needed:

  1. Rebase PR #665 against current develop to resolve merge conflict
  2. Request review from @brent.edwards (QA specialist familiar with the test infrastructure)

Priority: CRITICAL — this blocks M3 closure.

Author
Owner

PM Follow-up — Day 31 (2026-03-11)

PR #665 still has a merge conflict and no reviews.

@CoreRasurae — please rebase PR #665 onto current master at your earliest convenience. This is on the critical path for the TDD infrastructure pipeline.

Once rebased, we need a reviewer assigned. @hurui200320 — would you be available to review PR #665 given your experience implementing the Robot counterpart (#628/PR #673)?

Reference
cleveragents/cleveragents-core#627