Bug: show behave-parallel logs only for failed or errored chunks #8351

Closed
opened 2026-04-13 11:37:07 +00:00 by hurui200320 · 6 comments
Member

Metadata

  • Commit Message: fix(testing): print behave-parallel worker logs only for failed chunks
  • Branch: bugfix/m3-behave-parallel-failed-chunk-logs

Background and Context

nox -s unit_tests uses the in-process parallel Behave runner in scripts/run_behave_parallel.py.

In parallel mode, each worker captures stdout/stderr and returns the buffers to the parent process, and the parent currently replays logs for all chunks. This creates noisy output and makes failure diagnostics harder to spot quickly in CI and local runs.

The desired behavior is chunk-level filtering: show logs only when a worker chunk failed/errored (or crashed), and suppress logs for fully passing chunks.

Current Behavior

  • Worker chunks capture stdout/stderr with redirection.
  • Parent process prints captured stdout/stderr for every worker chunk.
  • Passing chunk logs add noise and can obscure failed/errored diagnostics.
  • Worker crash paths are not guaranteed to surface actionable diagnostics consistently.

Expected Behavior

  • Keep chunk-level capture in parallel mode.
  • Replay captured stdout/stderr only for chunks whose summary has failures/errors.
  • Also replay diagnostics when a worker crashes or when no scenarios ran in a chunk unexpectedly.
  • Suppress captured logs for fully passing chunks.
  • Preserve existing summary merge + exit semantics (including @tdd_expected_fail summary-based exit behavior and the no-scenarios safety net).

Scope

  • Target file: scripts/run_behave_parallel.py
  • Expected touchpoints:
    • _worker_run_features
    • parallel aggregation loop in main()
    • failure/no-scenarios checks used for conditional replay

Duplicate Check

Checked related open work and found no direct duplicate for this behavior change:

  • #8326 ([AUTO-INF-5] Fix behave testing configuration)
  • #8330 ([AUTO-INF-14] Reduce Flakiness in Behave Suite)
  • #8318 ([AUTO-INF-7] Improve accuracy of coverage reporting)
  • #8294 and PR #8325 (non-AssertionError guard visibility in TDD inversion)

These are adjacent test-infra topics, but none addresses conditional replay of worker stdout/stderr in run_behave_parallel.py.

Subtasks

  • Implement chunk-level conditional log replay so only failed/errored chunks print captured logs.
  • Add robust worker exception handling to surface traceback/diagnostics and mark the run as failed.
  • Ensure fully passing chunks do not print captured stdout/stderr.
  • Preserve existing summary aggregation and exit semantics.
  • Add/extend Behave coverage for pass/fail/exception logging behavior of the runner.
  • Add/extend Robot integration coverage for end-to-end runner output behavior.
  • Run and pass required quality gates:
    • nox -s lint
    • nox -s typecheck
    • nox -s unit_tests
    • nox -s integration_tests
    • nox -s coverage_report
    • nox -s benchmark (not applicable — test infrastructure changes only, no performance-sensitive application code modified)

Definition of Done

This issue is complete when:

  • All subtasks are checked off.
  • A commit is created where the first line matches the Metadata Commit Message exactly.
  • The commit is pushed on a branch matching the Metadata Branch pattern.
  • A PR to master is opened, linked, reviewed, and merged.
  • PR/issue dependency direction is set correctly in Forgejo.
  • CI is green and coverage remains >=97%.
## Metadata - **Commit Message**: `fix(testing): print behave-parallel worker logs only for failed chunks` - **Branch**: `bugfix/m3-behave-parallel-failed-chunk-logs` ## Background and Context `nox -s unit_tests` uses the in-process parallel Behave runner in `scripts/run_behave_parallel.py`. In parallel mode, each worker captures stdout/stderr and returns the buffers to the parent process, and the parent currently replays logs for all chunks. This creates noisy output and makes failure diagnostics harder to spot quickly in CI and local runs. The desired behavior is chunk-level filtering: show logs only when a worker chunk failed/errored (or crashed), and suppress logs for fully passing chunks. ## Current Behavior - Worker chunks capture stdout/stderr with redirection. - Parent process prints captured stdout/stderr for every worker chunk. - Passing chunk logs add noise and can obscure failed/errored diagnostics. - Worker crash paths are not guaranteed to surface actionable diagnostics consistently. ## Expected Behavior - Keep chunk-level capture in parallel mode. - Replay captured stdout/stderr **only** for chunks whose summary has failures/errors. - Also replay diagnostics when a worker crashes or when no scenarios ran in a chunk unexpectedly. - Suppress captured logs for fully passing chunks. - Preserve existing summary merge + exit semantics (including `@tdd_expected_fail` summary-based exit behavior and the no-scenarios safety net). ## Scope - Target file: `scripts/run_behave_parallel.py` - Expected touchpoints: - `_worker_run_features` - parallel aggregation loop in `main()` - failure/no-scenarios checks used for conditional replay ## Duplicate Check Checked related open work and found no direct duplicate for this behavior change: - #8326 (`[AUTO-INF-5] Fix behave testing configuration`) - #8330 (`[AUTO-INF-14] Reduce Flakiness in Behave Suite`) - #8318 (`[AUTO-INF-7] Improve accuracy of coverage reporting`) - #8294 and PR #8325 (non-AssertionError guard visibility in TDD inversion) These are adjacent test-infra topics, but none addresses conditional replay of worker stdout/stderr in `run_behave_parallel.py`. ## Subtasks - [x] Implement chunk-level conditional log replay so only failed/errored chunks print captured logs. - [x] Add robust worker exception handling to surface traceback/diagnostics and mark the run as failed. - [x] Ensure fully passing chunks do not print captured stdout/stderr. - [x] Preserve existing summary aggregation and exit semantics. - [x] Add/extend Behave coverage for pass/fail/exception logging behavior of the runner. - [x] Add/extend Robot integration coverage for end-to-end runner output behavior. - [x] Run and pass required quality gates: - [x] `nox -s lint` - [x] `nox -s typecheck` - [x] `nox -s unit_tests` - [x] `nox -s integration_tests` - [x] `nox -s coverage_report` - [ ] `nox -s benchmark` (not applicable — test infrastructure changes only, no performance-sensitive application code modified) ## Definition of Done This issue is complete when: - All subtasks are checked off. - A commit is created where the **first line** matches the Metadata Commit Message exactly. - The commit is pushed on a branch matching the Metadata Branch pattern. - A PR to `master` is opened, linked, reviewed, and merged. - PR/issue dependency direction is set correctly in Forgejo. - CI is green and coverage remains >=97%.
Owner

Thank you for filing this issue, @hurui200320. This has been received and is under review.

Initial Assessment:

The issue is well-formed and clearly describes the problem: the parallel Behave runner currently replays captured stdout/stderr for all worker chunks, including passing ones, creating noise that obscures failure diagnostics in CI and local runs. The desired behavior — conditional log replay only for failed/errored/crashed chunks — is a meaningful improvement to developer experience and CI signal quality.

Triage Notes:

  • The issue correctly identifies the target file (scripts/run_behave_parallel.py) and the specific touchpoints to modify.
  • The duplicate check is thorough and confirms this is not covered by adjacent open work (#8326, #8330, #8318, #8294/#8325).
  • The subtasks and Definition of Done are comprehensive.

Next Steps:

  • This issue will be reviewed for verification and milestone assignment.
  • Once verified, it will be scheduled for implementation.

If you have additional context or constraints (e.g., urgency relative to a specific release), please add them here.


Automated by CleverAgents Bot
Supervisor: Human Liaison | Agent: human-liaison-pool-supervisor

Thank you for filing this issue, @hurui200320. This has been received and is under review. **Initial Assessment:** The issue is well-formed and clearly describes the problem: the parallel Behave runner currently replays captured stdout/stderr for all worker chunks, including passing ones, creating noise that obscures failure diagnostics in CI and local runs. The desired behavior — conditional log replay only for failed/errored/crashed chunks — is a meaningful improvement to developer experience and CI signal quality. **Triage Notes:** - The issue correctly identifies the target file (`scripts/run_behave_parallel.py`) and the specific touchpoints to modify. - The duplicate check is thorough and confirms this is not covered by adjacent open work (#8326, #8330, #8318, #8294/#8325). - The subtasks and Definition of Done are comprehensive. **Next Steps:** - This issue will be reviewed for verification and milestone assignment. - Once verified, it will be scheduled for implementation. If you have additional context or constraints (e.g., urgency relative to a specific release), please add them here. --- **Automated by CleverAgents Bot** Supervisor: Human Liaison | Agent: human-liaison-pool-supervisor
Owner

Epic Linkage

This issue belongs to the v3.5.0 milestone for test infrastructure improvements.

⚠️ No specific testing infrastructure Epic was confirmed at time of linking. This issue should be linked to the appropriate v3.5.0 testing Epic once identified, or tracked under the general v3.5.0 testing backlog.

Milestone: v3.5.0 (milestone ID: 108)


Automated by CleverAgents Bot
Supervisor: Epic Planning | Agent: epic-planning-pool-supervisor

## Epic Linkage This issue belongs to the **v3.5.0 milestone** for test infrastructure improvements. > ⚠️ No specific testing infrastructure Epic was confirmed at time of linking. This issue should be linked to the appropriate v3.5.0 testing Epic once identified, or tracked under the general v3.5.0 testing backlog. **Milestone**: v3.5.0 (milestone ID: 108) --- **Automated by CleverAgents Bot** Supervisor: Epic Planning | Agent: epic-planning-pool-supervisor
Owner

Epic Linkage

This issue is a child of Epic #8083: Hierarchical Plan Decomposition & Parallel Scaling (v3.5.0).

This testing infrastructure bug affects the quality gates for v3.5.0 milestone delivery. Fixing the behave-parallel log output improves CI diagnostics for the parallel execution test suite.

Dependency direction: This issue BLOCKS Epic #8083. The Epic DEPENDS ON this issue.

Milestone: v3.5.0


Automated by CleverAgents Bot
Supervisor: Epic Planning | Agent: epic-planning-pool-supervisor

## Epic Linkage This issue is a child of Epic #8083: Hierarchical Plan Decomposition & Parallel Scaling (v3.5.0). This testing infrastructure bug affects the quality gates for v3.5.0 milestone delivery. Fixing the behave-parallel log output improves CI diagnostics for the parallel execution test suite. **Dependency direction**: This issue BLOCKS Epic #8083. The Epic DEPENDS ON this issue. **Milestone**: v3.5.0 --- **Automated by CleverAgents Bot** Supervisor: Epic Planning | Agent: epic-planning-pool-supervisor
Owner

Triage Decision: Verified — Should Have

Verified by: Project Owner Supervisor [AUTO-OWNR-1]
MoSCoW: Should Have (already set)
Priority: Critical (already set)

This human-submitted bug report is well-documented with clear reproduction steps, scope, and definition of done. Verified as a legitimate improvement to test infrastructure quality.

Rationale: Noisy test output harms developer productivity and CI diagnostics. This is a Should Have improvement that supports the >=97% coverage quality gate requirement.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

## ✅ Triage Decision: Verified — Should Have **Verified by:** Project Owner Supervisor [AUTO-OWNR-1] **MoSCoW:** Should Have (already set) **Priority:** Critical (already set) This human-submitted bug report is well-documented with clear reproduction steps, scope, and definition of done. Verified as a legitimate improvement to test infrastructure quality. **Rationale:** Noisy test output harms developer productivity and CI diagnostics. This is a Should Have improvement that supports the >=97% coverage quality gate requirement. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
hurui200320 added this to the v3.2.0 milestone 2026-04-14 04:48:09 +00:00
Author
Member

Implementation Notes

Design Decisions

Summary-based chunk filtering over runner.run() boolean:
The conditional log replay uses _chunk_has_failures(summary) and _chunk_no_scenarios_ran(summary) rather than the raw _worker_failed boolean returned by runner.run(). This is consistent with the existing exit-code logic and avoids spurious log replay for @tdd_expected_fail scenarios. The runner.run() return value tracks step failures in a local variable that cannot be updated by after_scenario hooks, so it reports True (failed) even when the TDD inversion handler has corrected the scenario status to passed. Using summary-based checks eliminates this false positive.

Worker crash handling with empty summary:
When _worker_run_features catches an unhandled exception, it writes the full traceback to stderr and returns an _empty_summary(). The parent detects the crash via _chunk_no_scenarios_ran(summary) (all counters are zero) and replays the captured stderr, surfacing the traceback. This approach reuses the existing zero-scenario detection pattern rather than introducing a separate crash flag.

Labeled output banners:
Each replayed chunk gets a --- Worker {idx} stdout/stderr (chunk failed) --- banner to make it clear which worker produced the output, aiding multi-chunk failure diagnosis.

Key Code Locations (commit 3d4418eb)

  • scripts.run_behave_parallel._chunk_has_failures — New helper for per-chunk failure detection
  • scripts.run_behave_parallel._chunk_no_scenarios_ran — New helper for per-chunk crash detection
  • scripts.run_behave_parallel._worker_run_features — Added try/except with traceback capture
  • scripts.run_behave_parallel.main — Updated aggregation loop with conditional replay

Test Coverage

  • Behave unit tests: features/behave_parallel_log_filtering.feature — 12 scenarios covering _chunk_has_failures, _chunk_no_scenarios_ran, passing chunk suppression, failed chunk replay, crash detection, mixed results, and worker exception handling
  • Robot integration tests: robot/behave_parallel_log_filtering.robot — 6 end-to-end test cases via robot/helper_behave_parallel_log_filtering.py

Quality Gate Results

Gate Status
nox -s lint All checks passed
nox -s typecheck 0 errors
nox -s unit_tests 15053 scenarios passed, 0 failed
nox -s integration_tests 1967 tests passed, 0 failed
nox -s coverage_report 97% coverage

PR

PR #9015 submitted. Dependency link (PR blocks #8351) needs to be set manually via Forgejo UI — the API endpoint returned a server error.

## Implementation Notes ### Design Decisions **Summary-based chunk filtering over `runner.run()` boolean:** The conditional log replay uses `_chunk_has_failures(summary)` and `_chunk_no_scenarios_ran(summary)` rather than the raw `_worker_failed` boolean returned by `runner.run()`. This is consistent with the existing exit-code logic and avoids spurious log replay for `@tdd_expected_fail` scenarios. The `runner.run()` return value tracks step failures in a local variable that cannot be updated by `after_scenario` hooks, so it reports `True` (failed) even when the TDD inversion handler has corrected the scenario status to passed. Using summary-based checks eliminates this false positive. **Worker crash handling with empty summary:** When `_worker_run_features` catches an unhandled exception, it writes the full traceback to stderr and returns an `_empty_summary()`. The parent detects the crash via `_chunk_no_scenarios_ran(summary)` (all counters are zero) and replays the captured stderr, surfacing the traceback. This approach reuses the existing zero-scenario detection pattern rather than introducing a separate crash flag. **Labeled output banners:** Each replayed chunk gets a `--- Worker {idx} stdout/stderr (chunk failed) ---` banner to make it clear which worker produced the output, aiding multi-chunk failure diagnosis. ### Key Code Locations (commit 3d4418eb) - `scripts.run_behave_parallel._chunk_has_failures` — New helper for per-chunk failure detection - `scripts.run_behave_parallel._chunk_no_scenarios_ran` — New helper for per-chunk crash detection - `scripts.run_behave_parallel._worker_run_features` — Added try/except with traceback capture - `scripts.run_behave_parallel.main` — Updated aggregation loop with conditional replay ### Test Coverage - **Behave unit tests:** `features/behave_parallel_log_filtering.feature` — 12 scenarios covering `_chunk_has_failures`, `_chunk_no_scenarios_ran`, passing chunk suppression, failed chunk replay, crash detection, mixed results, and worker exception handling - **Robot integration tests:** `robot/behave_parallel_log_filtering.robot` — 6 end-to-end test cases via `robot/helper_behave_parallel_log_filtering.py` ### Quality Gate Results | Gate | Status | |------|--------| | `nox -s lint` | All checks passed | | `nox -s typecheck` | 0 errors | | `nox -s unit_tests` | 15053 scenarios passed, 0 failed | | `nox -s integration_tests` | 1967 tests passed, 0 failed | | `nox -s coverage_report` | 97% coverage | ### PR PR #9015 submitted. Dependency link (PR blocks #8351) needs to be set manually via Forgejo UI — the API endpoint returned a server error.
Author
Member

Self-QA Implementation Notes (Cycles 1–3)

Automated self-QA review-and-fix loop completed. 3 review cycles ran; PR approved on cycle 3.


Cycle 1

Review findings (0C / 5M / 7m / 4n):

  • M1 (Major): Exit code regression — worker crash producing an all-zero summary, combined with other passing workers, resulted in process exit code 0 (silent CI green despite crash).
  • M2 (Major): Branch name mX instead of m3 — violates CONTRIBUTING.md convention.
  • M3 (Major): unittest.mock.patch used in Robot Framework integration tests — explicitly prohibited by CONTRIBUTING.md.
  • M4 (Major): Aggregation loop reimplemented in tests rather than calling production code — tautological tests.
  • M5 (Major): No test for TDD-inversion case (worker_failed=True but summary all-passing) — the primary design motivation was untested.
  • Minor: Duplicate _chunk_has_failures/_chunk_no_scenarios_ran functions; benchmark subtask unchecked; missing all-skipped edge case test; missing partial-crash exit code test; StringIO not closed; type: ignore annotations (pre-existing); code duplication across test domains.
  • Nits: Unused _worker_failed variable; trailing commas in print() calls; inconsistent return type annotation.

Fixes applied:

  • M1: _worker_run_features exception handler now sets crash_summary["features"]["errors"] = 1 so _has_failures() returns True when any worker crashes even if other workers passed.
  • M2: Created bugfix/m3-behave-parallel-failed-chunk-logs branch (both branches kept in sync — Forgejo API limitation prevents changing a PR's head branch after creation).
  • M3: Removed all mocking from Robot helper; crash path now triggered by passing a non-existent .feature path to _worker_run_features(), exercising real production exception handling.
  • M4: Extracted _aggregate_worker_results() as a module-level function in scripts/run_behave_parallel.py; both Behave steps and Robot helper now call the real function.
  • M5: Added TDD-inversion scenario — worker_failed=True but all-passing summary; asserts output is suppressed.
  • Minor: Aliased duplicate functions; added benchmark note to PR description; added all-skipped scenario; added partial-crash exit code scenario; removed trailing commas; unified Summary return type.

Cycle 2

Review findings (0C / 4M / 5m / 5n):

  • M1 (Major): PR still merging from mX branch (Forgejo limitation — documented).
  • M2 (Major): CHANGELOG.md not updated — explicit CONTRIBUTING.md requirement.
  • M3 (Major): No end-to-end test for _chunk_no_scenarios_ran as sole trigger for log replay (all replay-triggering aggregation tests went through _chunk_has_failures).
  • M4 (Major): Worker exception test fragile — depends on undocumented behave behavior (raising on non-existent feature path).
  • Minor: Docstring in _worker_run_features described crash as "all-zero summary" (contradicted by features.errors = 1); imports inside function body in Robot helper; missing stderr replay test for non-crash failures; manual sys.stdout/stderr swap instead of contextlib; _load_runner_module() not registered in sys.modules.
  • Nits: Commit body scenario count wrong (12 → 17); misleading scenario title; CWD requirement undocumented; WORKER CRASH assertion not precise enough.

Fixes applied:

  • M1: Both branches force-pushed to same SHA ee83b2b0; PR description updated with branch note.
  • M2: CHANGELOG.md entry added under ## [Unreleased] / ### Fixed.
  • M3: Added scenario: "Pure empty summary chunk triggers log replay via no-scenarios-ran path" — uses _empty_summary() with no features.errors so only _chunk_no_scenarios_ran fires.
  • M4: Detailed comments added in both step file and Robot helper explaining the dependency on behave's file-not-found raising behavior and what to check if the test breaks.
  • Minor: Docstring updated to accurately describe crash detection via features.errors = 1; import io moved to top level; stderr-replay scenario for non-crash failures added; contextlib.redirect_stdout/redirect_stderr used; sys.modules[spec.name] = mod registration added.
  • Nits: Commit count corrected; scenario title renamed to "produces summary with failures"; CWD requirement documented in docstring; "WORKER CRASH:" (with colon) for tighter assertion.

Cycle 3

Review findings (0C / 0M / 5m / 4n) — Approved:

  • Minor: CHANGELOG wording inaccurately calls crash summary "all-zero" (it has features.errors=1); _make_runner() output may escape capture if it fails before redirects are entered; banner format untested; three Robot tests are shallow (unit-test-level assertions, not exercising _aggregate_worker_results()); sys.modules registration before exec_module leaves broken entry on failure.
  • Nits: sys.modules comment overstates protection; no comment explaining why Exception (not BaseException) is caught; missing docstring on _passing_summary() in Robot helper; _chunk_no_scenarios_ran not negatively tested with failed/errored summary.

Verdict: Approve — Core logic correct, all major issues resolved, quality gates pass. Minor findings are recommended improvements but do not block merging.


Remaining Issues

The following minor/nit findings from Cycle 3 are recommended improvements but were not addressed in this self-QA loop (non-blocking):

  • CHANGELOG entry wording ("all-zero summary" → should say "features.errors = 1")
  • _make_runner() output may escape capture before redirect context managers are entered
  • Banner format (--- Worker N stdout (chunk failed) ---) not tested
  • Three Robot tests do not exercise _aggregate_worker_results() directly
  • sys.modules cleanup on exec_module failure
  • Minor nits: comment wording, missing docstring, additional negative test for _chunk_no_scenarios_ran

Self-QA conducted by automated agent loop — 3 review/fix cycles.

## Self-QA Implementation Notes (Cycles 1–3) Automated self-QA review-and-fix loop completed. 3 review cycles ran; PR approved on cycle 3. --- ### Cycle 1 **Review findings (0C / 5M / 7m / 4n):** - **M1 (Major):** Exit code regression — worker crash producing an all-zero summary, combined with other passing workers, resulted in process exit code 0 (silent CI green despite crash). - **M2 (Major):** Branch name `mX` instead of `m3` — violates CONTRIBUTING.md convention. - **M3 (Major):** `unittest.mock.patch` used in Robot Framework integration tests — explicitly prohibited by CONTRIBUTING.md. - **M4 (Major):** Aggregation loop reimplemented in tests rather than calling production code — tautological tests. - **M5 (Major):** No test for TDD-inversion case (`worker_failed=True` but summary all-passing) — the primary design motivation was untested. - Minor: Duplicate `_chunk_has_failures`/`_chunk_no_scenarios_ran` functions; benchmark subtask unchecked; missing all-skipped edge case test; missing partial-crash exit code test; `StringIO` not closed; `type: ignore` annotations (pre-existing); code duplication across test domains. - Nits: Unused `_worker_failed` variable; trailing commas in `print()` calls; inconsistent return type annotation. **Fixes applied:** - **M1:** `_worker_run_features` exception handler now sets `crash_summary["features"]["errors"] = 1` so `_has_failures()` returns `True` when any worker crashes even if other workers passed. - **M2:** Created `bugfix/m3-behave-parallel-failed-chunk-logs` branch (both branches kept in sync — Forgejo API limitation prevents changing a PR's head branch after creation). - **M3:** Removed all mocking from Robot helper; crash path now triggered by passing a non-existent `.feature` path to `_worker_run_features()`, exercising real production exception handling. - **M4:** Extracted `_aggregate_worker_results()` as a module-level function in `scripts/run_behave_parallel.py`; both Behave steps and Robot helper now call the real function. - **M5:** Added TDD-inversion scenario — `worker_failed=True` but all-passing summary; asserts output is suppressed. - Minor: Aliased duplicate functions; added benchmark note to PR description; added all-skipped scenario; added partial-crash exit code scenario; removed trailing commas; unified `Summary` return type. --- ### Cycle 2 **Review findings (0C / 4M / 5m / 5n):** - **M1 (Major):** PR still merging from `mX` branch (Forgejo limitation — documented). - **M2 (Major):** `CHANGELOG.md` not updated — explicit CONTRIBUTING.md requirement. - **M3 (Major):** No end-to-end test for `_chunk_no_scenarios_ran` as sole trigger for log replay (all replay-triggering aggregation tests went through `_chunk_has_failures`). - **M4 (Major):** Worker exception test fragile — depends on undocumented behave behavior (raising on non-existent feature path). - Minor: Docstring in `_worker_run_features` described crash as "all-zero summary" (contradicted by `features.errors = 1`); imports inside function body in Robot helper; missing stderr replay test for non-crash failures; manual `sys.stdout/stderr` swap instead of `contextlib`; `_load_runner_module()` not registered in `sys.modules`. - Nits: Commit body scenario count wrong (12 → 17); misleading scenario title; CWD requirement undocumented; WORKER CRASH assertion not precise enough. **Fixes applied:** - **M1:** Both branches force-pushed to same SHA `ee83b2b0`; PR description updated with branch note. - **M2:** CHANGELOG.md entry added under `## [Unreleased] / ### Fixed`. - **M3:** Added scenario: *"Pure empty summary chunk triggers log replay via no-scenarios-ran path"* — uses `_empty_summary()` with no `features.errors` so only `_chunk_no_scenarios_ran` fires. - **M4:** Detailed comments added in both step file and Robot helper explaining the dependency on behave's file-not-found raising behavior and what to check if the test breaks. - Minor: Docstring updated to accurately describe crash detection via `features.errors = 1`; `import io` moved to top level; stderr-replay scenario for non-crash failures added; `contextlib.redirect_stdout/redirect_stderr` used; `sys.modules[spec.name] = mod` registration added. - Nits: Commit count corrected; scenario title renamed to "produces summary with failures"; CWD requirement documented in docstring; `"WORKER CRASH:"` (with colon) for tighter assertion. --- ### Cycle 3 **Review findings (0C / 0M / 5m / 4n) — Approved:** - Minor: CHANGELOG wording inaccurately calls crash summary "all-zero" (it has `features.errors=1`); `_make_runner()` output may escape capture if it fails before redirects are entered; banner format untested; three Robot tests are shallow (unit-test-level assertions, not exercising `_aggregate_worker_results()`); `sys.modules` registration before `exec_module` leaves broken entry on failure. - Nits: `sys.modules` comment overstates protection; no comment explaining why `Exception` (not `BaseException`) is caught; missing docstring on `_passing_summary()` in Robot helper; `_chunk_no_scenarios_ran` not negatively tested with failed/errored summary. **Verdict: Approve** — Core logic correct, all major issues resolved, quality gates pass. Minor findings are recommended improvements but do not block merging. --- ### Remaining Issues The following minor/nit findings from Cycle 3 are recommended improvements but were not addressed in this self-QA loop (non-blocking): - CHANGELOG entry wording ("all-zero summary" → should say "features.errors = 1") - `_make_runner()` output may escape capture before redirect context managers are entered - Banner format (`--- Worker N stdout (chunk failed) ---`) not tested - Three Robot tests do not exercise `_aggregate_worker_results()` directly - `sys.modules` cleanup on `exec_module` failure - Minor nits: comment wording, missing docstring, additional negative test for `_chunk_no_scenarios_ran` --- *Self-QA conducted by automated agent loop — 3 review/fix cycles.*
Sign in to join this conversation.
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveragents-core#8351
No description provided.