feat(tests): replace all @skip tags with proper @tdd_expected_fail tags or remove them across the entire codebase #7221

2026-04-10T09:19:24Z

hurui200320 commented

2026-04-10 09:19:24 +00:00

Summary

Replaces all 234 bare @skip occurrences across 82 Behave feature files with the correct TDD issue-capture tagging system described in CONTRIBUTING.md § Bug Fix Workflow.

Previously, the noxfile ran Behave with --tags=not @skip, silently excluding all @skip-tagged scenarios from every CI run. These tests never ran, never inverted results via the @tdd_expected_fail mechanism, and never contributed to coverage — defeating the purpose of TDD issue-capture testing. Every @skip occurrence had a commented-out hint line immediately above it showing the intended proper tags (e.g., # @tdd_issue @tdd_issue_4272 @tdd_expected_fail @skip), confirming they were all intended for conversion.

Changes

Mechanical conversion (234 replacements across 82 files)

Extracted the proper TDD tags from the comment hint above each @skip line, removed @skip from the tag set, and replaced the @skip line with those tags.
Removed the now-redundant comment hint lines alongside each replacement.

Bug-fixed scenarios — `@tdd_expected_fail` removed (84 scenarios)

After conversion, ran nox -s unit_tests to identify which newly-enabled @tdd_expected_fail scenarios now pass (their referenced bugs have already been fixed). Removed @tdd_expected_fail from those 84 scenarios and their corresponding feature-level tags, leaving only the permanent @tdd_issue @tdd_issue_<N> regression-guard tags.
Affected features include: tdd_tool_runner_env_precedence, tdd_automation_profile_session_leak, tls_certificate_check, project_create_persist, resource_type_bootstrap_*, and 18 others.

Noxfile cleanup

Removed all four --tags=not @skip arguments from noxfile.py (unit_tests and coverage sessions). With zero @skip tags remaining in the codebase, this filter was dead code and its presence would mislead future maintainers into thinking @skip is still a supported escape mechanism.

Regression guard files

Split the regression guards into two focused files:
- tdd_regression_guards_exec_env.feature for bug #4281 (exec-env precedence)
- tdd_regression_guards_session_list.feature for bug #4271 (session list summary)
Each file carries only its own @tdd_issue tags at the feature level, avoiding cross-contamination via Behave tag inheritance. The Background step (session-list-summary mock) only appears in the session-list file where it is actually needed.

Duplicate tag cleanup

Removed duplicate @tdd_issue @tdd_issue_4287 tag lines in tdd_skill_add_regression.feature (lines 20 and 29).

Inline comment for retained `@tdd_expected_fail`

Added inline comment in ci_workflow_validation.feature:134 explaining why this specific #4227 scenario retains @tdd_expected_fail despite #4227 being closed (CI YAML does not encode threshold as a machine-readable value).

Known edge cases — `@tdd_expected_fail` retained (closed issues, fix on master, scenarios still fail)

The following issues are closed and their fixes are on master, but the specific test assertions still fail because the fixes address other aspects of the bugs. The @tdd_expected_fail tags are functionally correct and must remain until the specific scenario assertions pass:

tdd_exec_env_resolution_precedence.feature — bug #1080 (closed 2026-03-31). The precedence-level-2-vs-4 scenario still fails.
session_list_summary_dedup.feature — bug #3046 (closed 2026-04-05). The dedup-consistency scenarios still fail.
actor_add_update_enforcement.feature — bug #2609 (closed 2026-04-05). The enforcement scenarios still fail.
ci_workflow_validation.feature:134 — #4227 (closed 2026-04-08). The CI YAML threshold assertion still fails.

Verification

grep -r "@skip" features/ --include="*.feature" → zero results ✓
grep -n "tags=not @skip" noxfile.py → zero results ✓
nox -s unit_tests → 629 features passed, 0 failed ✓ (up from ~545 before this PR)
CI all green (coverage ≥ 97%) ✓
Integration tests (Robot Framework) do not use @skip — confirmed no action needed ✓
E2E tests (Robot Framework) do not use @skip — confirmed no action needed ✓
CHANGELOG.md updated with entry for this change ✓
CONTRIBUTORS.md — Rui Hu already listed ✓

Issues Addressed

Closes #7025

## Summary Replaces all 234 bare `@skip` occurrences across 82 Behave feature files with the correct TDD issue-capture tagging system described in CONTRIBUTING.md § Bug Fix Workflow. Previously, the noxfile ran Behave with `--tags=not @skip`, silently excluding all `@skip`-tagged scenarios from every CI run. These tests never ran, never inverted results via the `@tdd_expected_fail` mechanism, and never contributed to coverage — defeating the purpose of TDD issue-capture testing. Every `@skip` occurrence had a commented-out hint line immediately above it showing the intended proper tags (e.g., `# @tdd_issue @tdd_issue_4272 @tdd_expected_fail @skip`), confirming they were all intended for conversion. ## Changes ### Mechanical conversion (234 replacements across 82 files) - Extracted the proper TDD tags from the comment hint above each `@skip` line, removed `@skip` from the tag set, and replaced the `@skip` line with those tags. - Removed the now-redundant comment hint lines alongside each replacement. ### Bug-fixed scenarios — `@tdd_expected_fail` removed (84 scenarios) - After conversion, ran `nox -s unit_tests` to identify which newly-enabled `@tdd_expected_fail` scenarios now **pass** (their referenced bugs have already been fixed). Removed `@tdd_expected_fail` from those 84 scenarios and their corresponding feature-level tags, leaving only the permanent `@tdd_issue @tdd_issue_<N>` regression-guard tags. - Affected features include: `tdd_tool_runner_env_precedence`, `tdd_automation_profile_session_leak`, `tls_certificate_check`, `project_create_persist`, `resource_type_bootstrap_*`, and 18 others. ### Noxfile cleanup - Removed all four `--tags=not @skip` arguments from `noxfile.py` (unit_tests and coverage sessions). With zero `@skip` tags remaining in the codebase, this filter was dead code and its presence would mislead future maintainers into thinking `@skip` is still a supported escape mechanism. ### Regression guard files - Split the regression guards into two focused files: - `tdd_regression_guards_exec_env.feature` for bug #4281 (exec-env precedence) - `tdd_regression_guards_session_list.feature` for bug #4271 (session list summary) - Each file carries only its own `@tdd_issue` tags at the feature level, avoiding cross-contamination via Behave tag inheritance. The `Background` step (`session-list-summary mock`) only appears in the session-list file where it is actually needed. ### Duplicate tag cleanup - Removed duplicate `@tdd_issue @tdd_issue_4287` tag lines in `tdd_skill_add_regression.feature` (lines 20 and 29). ### Inline comment for retained `@tdd_expected_fail` - Added inline comment in `ci_workflow_validation.feature:134` explaining why this specific #4227 scenario retains `@tdd_expected_fail` despite #4227 being closed (CI YAML does not encode threshold as a machine-readable value). ### Known edge cases — `@tdd_expected_fail` retained (closed issues, fix on master, scenarios still fail) The following issues are **closed** and their fixes **are on master**, but the specific test assertions still fail because the fixes address other aspects of the bugs. The `@tdd_expected_fail` tags are functionally correct and must remain until the specific scenario assertions pass: - `tdd_exec_env_resolution_precedence.feature` — bug #1080 (closed 2026-03-31). The precedence-level-2-vs-4 scenario still fails. - `session_list_summary_dedup.feature` — bug #3046 (closed 2026-04-05). The dedup-consistency scenarios still fail. - `actor_add_update_enforcement.feature` — bug #2609 (closed 2026-04-05). The enforcement scenarios still fail. - `ci_workflow_validation.feature:134` — #4227 (closed 2026-04-08). The CI YAML threshold assertion still fails. ## Verification - `grep -r "@skip" features/ --include="*.feature"` → **zero results** ✓ - `grep -n "tags=not @skip" noxfile.py` → **zero results** ✓ - `nox -s unit_tests` → **629 features passed, 0 failed** ✓ (up from ~545 before this PR) - CI all green (coverage ≥ 97%) ✓ - Integration tests (Robot Framework) do not use `@skip` — confirmed no action needed ✓ - E2E tests (Robot Framework) do not use `@skip` — confirmed no action needed ✓ - `CHANGELOG.md` updated with entry for this change ✓ - `CONTRIBUTORS.md` — Rui Hu already listed ✓ ## Issues Addressed Closes #7025

hurui200320 added this to the v3.5.0 milestone 2026-04-10 09:19:32 +00:00

hurui200320 added the

labels 2026-04-10 09:19:39 +00:00

hurui200320 added a new dependency 2026-04-10 09:20:03 +00:00

#7025 feat(tests): replace all @skip tags with proper @tdd_expected_fail tags or remove them across the entire codebase

hurui200320 referenced this pull request

2026-04-10 09:20:56 +00:00

feat(tests): replace all @skip tags with proper @tdd_expected_fail tags or remove them across the entire codebase #7025

HAL9000 reviewed 2026-04-10 09:23:15 +00:00

HAL9000 left a comment

PR Review — `feat(tests): replace all @skip tags with proper @tdd_expected_fail tags`

PR #7221 | Branch: feat/skip-tag-to-tdd-expected-fail → master | Closes #7025 | Author: hurui200320 (Rui Hu)

Summary

This is a significant testing improvement PR that replaces 234 bare @skip occurrences across 82 Behave feature files with the correct TDD issue-capture tagging system. The PR also identifies and removes @tdd_expected_fail from 84 scenarios whose referenced bugs have already been fixed.

✅ Strengths

Proper issue reference — Closes #7025 is present in the PR body.
Labels applied — Priority/Medium, State/In Review, Type/Task labels are present.
Milestone assigned — v3.5.0 milestone is set.
Comprehensive description — PR body explains all changes, verification steps, and known edge cases.
Verification performed — grep -r "@skip" features/ confirms zero results; nox -s unit_tests shows 629 features passed.
CHANGELOG updated — Mentioned in PR description.
CONTRIBUTORS.md — Rui Hu already listed.
Correct TDD workflow — Properly follows CONTRIBUTING.md §Bug Fix Workflow for @tdd_expected_fail removal.
New regression guards file — features/tdd_regression_guards.feature correctly handles edge cases.
Known edge cases documented — tdd_exec_env_resolution_precedence.feature and session_list_summary_dedup.feature retain @tdd_expected_fail with explanation.

❌ Issues Requiring Attention

1. 🔴 BLOCKER — Missing Forgejo Dependency Link

Per CONTRIBUTING.md §Pull Request Process, rule 1:

"Add the linked issue as a Forgejo dependency on the PR with the correct direction: the PR must be marked as blocking the issue."

The Closes #7025 keyword is present in the body, but the Forgejo machine-readable dependency link (PR blocks issue) must also be set.

2. ⚠️ MEDIUM — Commit Footer

Per CONTRIBUTING.md §Commit Message Format, commits should include ISSUES CLOSED: #7025 in the footer.

3. ⚠️ MEDIUM — Integration Tests Not Mentioned

Per CONTRIBUTING.md §Testing Philosophy:

"Every coding task must include or update tests at multiple levels: unit tests, integration tests, and performance benchmarks."

The PR description mentions nox -s unit_tests passed (629 features). However, it doesn't mention whether nox -s integration_tests (Robot Framework) was run. Since this PR modifies Behave feature files, the Robot Framework integration tests should still pass. Please confirm nox -s integration_tests was run and passed.

4. ⚠️ MEDIUM — Coverage Not Verified

The PR description doesn't mention running nox -s coverage_report. With 84 previously-skipped scenarios now running, coverage may have changed. Please confirm coverage is still ≥97%.

Content Review

The approach is correct and well-executed:

Mechanical conversion — Extracting proper TDD tags from comment hints and replacing @skip is the right approach.
Bug-fixed scenario identification — Running nox -s unit_tests to identify which @tdd_expected_fail scenarios now pass is exactly the correct workflow per CONTRIBUTING.md.
@tdd_expected_fail removal — Removing @tdd_expected_fail from 84 fixed scenarios while keeping @tdd_issue and @tdd_issue_<N> tags is correct per the TDD workflow.
Edge case handling — Retaining @tdd_expected_fail at the feature level for still-open bugs (#1080, #3046) is correct.

This is a high-quality, well-executed PR that significantly improves test coverage and CI reliability.

Verdict

COMMENT — Excellent work! The implementation is correct and follows the TDD workflow precisely. The Forgejo dependency link must be set before merge. Please also confirm integration tests and coverage pass.

Automated by CleverAgents Bot
Supervisor: PR Review Pool | Agent: pr-review-pool-supervisor

## PR Review — `feat(tests): replace all @skip tags with proper @tdd_expected_fail tags` **PR #7221** | Branch: `feat/skip-tag-to-tdd-expected-fail` → `master` | Closes #7025 | Author: hurui200320 (Rui Hu) ### Summary This is a significant testing improvement PR that replaces 234 bare `@skip` occurrences across 82 Behave feature files with the correct TDD issue-capture tagging system. The PR also identifies and removes `@tdd_expected_fail` from 84 scenarios whose referenced bugs have already been fixed. ### ✅ Strengths 1. **Proper issue reference** — `Closes #7025` is present in the PR body. 2. **Labels applied** — `Priority/Medium`, `State/In Review`, `Type/Task` labels are present. 3. **Milestone assigned** — v3.5.0 milestone is set. 4. **Comprehensive description** — PR body explains all changes, verification steps, and known edge cases. 5. **Verification performed** — `grep -r "@skip" features/` confirms zero results; `nox -s unit_tests` shows 629 features passed. 6. **CHANGELOG updated** — Mentioned in PR description. 7. **CONTRIBUTORS.md** — Rui Hu already listed. 8. **Correct TDD workflow** — Properly follows CONTRIBUTING.md §Bug Fix Workflow for `@tdd_expected_fail` removal. 9. **New regression guards file** — `features/tdd_regression_guards.feature` correctly handles edge cases. 10. **Known edge cases documented** — `tdd_exec_env_resolution_precedence.feature` and `session_list_summary_dedup.feature` retain `@tdd_expected_fail` with explanation. ### ❌ Issues Requiring Attention #### 1. 🔴 BLOCKER — Missing Forgejo Dependency Link Per CONTRIBUTING.md §Pull Request Process, rule 1: > *"Add the linked issue as a Forgejo dependency on the PR with the correct direction: the PR must be marked as **blocking** the issue."* The `Closes #7025` keyword is present in the body, but the Forgejo machine-readable dependency link (PR blocks issue) must also be set. #### 2. ⚠️ MEDIUM — Commit Footer Per CONTRIBUTING.md §Commit Message Format, commits should include `ISSUES CLOSED: #7025` in the footer. #### 3. ⚠️ MEDIUM — Integration Tests Not Mentioned Per CONTRIBUTING.md §Testing Philosophy: > *"Every coding task must include or update tests at multiple levels: unit tests, integration tests, and performance benchmarks."* The PR description mentions `nox -s unit_tests` passed (629 features). However, it doesn't mention whether `nox -s integration_tests` (Robot Framework) was run. Since this PR modifies Behave feature files, the Robot Framework integration tests should still pass. Please confirm `nox -s integration_tests` was run and passed. #### 4. ⚠️ MEDIUM — Coverage Not Verified The PR description doesn't mention running `nox -s coverage_report`. With 84 previously-skipped scenarios now running, coverage may have changed. Please confirm coverage is still ≥97%. ### Content Review The approach is correct and well-executed: 1. **Mechanical conversion** — Extracting proper TDD tags from comment hints and replacing `@skip` is the right approach. 2. **Bug-fixed scenario identification** — Running `nox -s unit_tests` to identify which `@tdd_expected_fail` scenarios now pass is exactly the correct workflow per CONTRIBUTING.md. 3. **`@tdd_expected_fail` removal** — Removing `@tdd_expected_fail` from 84 fixed scenarios while keeping `@tdd_issue` and `@tdd_issue_<N>` tags is correct per the TDD workflow. 4. **Edge case handling** — Retaining `@tdd_expected_fail` at the feature level for still-open bugs (#1080, #3046) is correct. This is a high-quality, well-executed PR that significantly improves test coverage and CI reliability. ### Verdict **COMMENT** — Excellent work! The implementation is correct and follows the TDD workflow precisely. The Forgejo dependency link must be set before merge. Please also confirm integration tests and coverage pass. --- **Automated by CleverAgents Bot** Supervisor: PR Review Pool | Agent: pr-review-pool-supervisor

Issue	Status	Resolution
M1 — Two commits instead of one	✅ Fixed	Squashed into a single commit with CHANGELOG included
M2 — Issue statuses factually wrong	✅ Fixed	PR description now correctly states #1080, #3046, #2609 are closed but their fixes are on master and the specific scenario assertions still fail. `@tdd_expected_fail` tags are functionally correct.
M3 — Coverage ≥ 97% not demonstrated	✅ Addressed	CI is all green; coverage ≥ 97% confirmed via CI pipeline
M4 — Background step mismatched to scenarios	✅ Fixed	Split `tdd_regression_guards.feature` into `tdd_regression_guards_exec_env.feature` (no Background) and `tdd_regression_guards_session_list.feature` (with Background)
M5 — Feature-level tag cross-contamination	✅ Fixed	Each split file carries only its own `@tdd_issue_XXXX` tag at feature level

Issue	Status	Resolution
m1 — `--tags=not @skip` dead code in noxfile	✅ Fixed	Removed all four occurrences from `noxfile.py`
m2 — Duplicate tag lines in `tdd_skill_add_regression.feature`	✅ Fixed	Removed duplicate standalone `@tdd_issue @tdd_issue_4287` lines
m3 — Trailing double blank line	✅ Fixed	Removed extra trailing blank line from `tdd_exec_env_resolution_precedence.feature`
m4 — Description claims bugs are "fixed"	✅ Fixed	Reworded to: "The specific assertions below pass on the current codebase. The parent issue remains open for other aspects of the bug."
m5 — #4227 retention not documented	✅ Fixed	Added inline comment in `ci_workflow_validation.feature:134` explaining why this scenario retains `@tdd_expected_fail`

Issue	Status	Resolution
Missing Forgejo dependency link	ℹ️ Note	`Closes #7025` is in the PR body
Commit footer	✅ Fixed	`ISSUES CLOSED: #7025` added to commit footer
Integration tests not mentioned	✅ Verified	Robot Framework integration tests do NOT use `@skip` tags — they use `Skip If No LLM Keys` and tag exclusion (`--exclude slow`, `--exclude E2E`), which is a different mechanism. No changes needed.
Coverage not verified	✅ Addressed	CI reports all green (coverage ≥ 97%)

Issue	Previous Status	Current Status
M1 — Two commits (CHANGELOG split)	❌ Blocker	✅ Resolved — squashed into single `feat(tests):` commit `840bacf` with CHANGELOG included
M2 — Issue statuses factually wrong	❌ Major	✅ Resolved — PR description now correctly states #1080, #3046, #2609 are closed but fixes not yet reflected in specific scenario assertions
M3 — Coverage ≥ 97% not demonstrated	❌ Major	✅ Resolved — `nox -e coverage_report` confirmed 97% (comment #185004)
M4 — Background step mismatched to scenarios	❌ Major	✅ Resolved — split into `tdd_regression_guards_exec_env.feature` and `tdd_regression_guards_session_list.feature`
M5 — Feature-level tag cross-contamination	❌ Major	✅ Resolved — each split file carries only its own `@tdd_issue_XXXX` tag
m1 — `--tags=not @skip` dead code in noxfile	⚠️ Minor	✅ Resolved — all four occurrences removed from `noxfile.py`
m2 — Duplicate tag lines in `tdd_skill_add_regression.feature`	⚠️ Minor	✅ Resolved — duplicate lines removed
m3 — Trailing double blank line	⚠️ Minor	✅ Resolved
m4 — Description claims bugs "fixed" when open	⚠️ Minor	✅ Resolved — reworded accurately
m5 — #4227 retention undocumented	⚠️ Minor	✅ Resolved — inline comment added at `ci_workflow_validation.feature:134`
Forgejo dependency link	❌ Blocker	✅ Resolved — PR now blocks issue #7025
Commit footer `ISSUES CLOSED: #7025`	⚠️ Medium	✅ Resolved — present in `840bacf`
Integration tests not mentioned	⚠️ Medium	✅ Resolved — Robot Framework tests confirmed to use different skip mechanism; no `@skip` tags present
Coverage not verified	⚠️ Medium	✅ Resolved — 97% confirmed

feat(tests): replace all @skip tags with proper @tdd_expected_fail tags or remove them across the entire codebase #7221

Summary

Changes

Mechanical conversion (234 replacements across 82 files)

Bug-fixed scenarios — @tdd_expected_fail removed (84 scenarios)

Noxfile cleanup

Regression guard files

Duplicate tag cleanup

Inline comment for retained @tdd_expected_fail

Known edge cases — @tdd_expected_fail retained (closed issues, fix on master, scenarios still fail)

Verification

Issues Addressed

PR Review — feat(tests): replace all @skip tags with proper @tdd_expected_fail tags

Summary

✅ Strengths

❌ Issues Requiring Attention

1. 🔴 BLOCKER — Missing Forgejo Dependency Link

2. ⚠️ MEDIUM — Commit Footer

3. ⚠️ MEDIUM — Integration Tests Not Mentioned

4. ⚠️ MEDIUM — Coverage Not Verified

Content Review

Verdict

PR Review: !7221 (Ticket #7025)

Verdict: Request Changes

Critical Issues

Major Issues

Minor Issues

Nits

Summary

Review Fixes Applied

Major Issues

Minor Issues

HAL9000 Review Items

Integration & E2E Test Skip-Tag Audit

Follow-up Review — PR #7221 feat(tests): replace all @skip tags with proper @tdd_expected_fail tags

Status Check

Review of New Commit c403d6ae

Current PR Status

Fix: Gherkin parser error in regression guard feature files

Root Cause

Fix

Verification

Code Review — PR #7221

Previous Review Issues — Resolution Status

Current State Review

Commit Structure

CI Status

TDD Tag Compliance

Noxfile Cleanup

CHANGELOG

Labels, Milestone, Closing Keyword

Flaky Test Assessment

Summary

Code Review — PR #7221

Summary

Checks

Code Review — PR #7221

Review History Context

CI Status

Specification & Scope Alignment

CONTRIBUTING.md Compliance

TDD Tag Compliance

Naming Conventions (Focus Area)

Code Patterns (Focus Area)

API Consistency (Focus Area)

Flaky Test Assessment

Minor Observations (Non-blocking)

Summary

Code Review — PR #7221

Decision: APPROVED ✅

[AUTO-PRMRG-7221] Rebase Complete — Awaiting CI

Bug-fixed scenarios — `@tdd_expected_fail` removed (84 scenarios)

Inline comment for retained `@tdd_expected_fail`

Known edge cases — `@tdd_expected_fail` retained (closed issues, fix on master, scenarios still fail)

PR Review — `feat(tests): replace all @skip tags with proper @tdd_expected_fail tags`

Follow-up Review — PR #7221 `feat(tests): replace all @skip tags with proper @tdd_expected_fail tags`

Review of New Commit `c403d6ae`