test: add TDD bug-capture test for #993 — server_connect non-atomic writes #1128

Merged
brent.edwards merged 7 commits from tdd/m6-server-connect-non-atomic into master 2026-03-26 05:41:47 +00:00
Member

Summary

Add TDD bug-capture Behave tests that prove bug #993 exists: server_connect() in server.py writes three config values (server.url, server.namespace, server.tls-verify) via sequential set_value() calls with no atomicity guarantee. When a middle call fails, earlier values are already persisted, leaving the config in a half-written state.

Changes

  • New feature file: features/tdd_server_connect_atomic_writes.feature — 3 scenarios tagged @tdd_expected_fail @tdd_bug @tdd_bug_993
  • New step definitions: features/steps/tdd_server_connect_atomic_writes_steps.py — Uses a _FailingConfigService subclass that raises OSError on a configurable set_value() call to simulate disk/permission failures

Scenarios

  1. Config remains unchanged when second set_value fails — Pre-populates config, calls server_connect where set_value fails on the 2nd call (server.namespace). Asserts error was raised and all three values are rolled back. ASSERT FAILS: server.url retains the new value.
  2. Config remains unchanged when third set_value fails — Same setup, failure on 3rd call (server.tls-verify). Asserts error was raised and both server.url and server.namespace are rolled back to original values. ASSERT FAILS: both retain new values.
  3. No partial server.url persisted when namespace write raises — Starts from empty config. Asserts error was raised and no server.url persisted. ASSERT FAILS: server.url was persisted despite the subsequent failure.

All assertions fail (proving the bug exists), and @tdd_expected_fail inverts these to CI passes.

Review Fixes Applied

  • Fixed Gherkin step text contradiction: Changed Scenario 2 step from "rolled back server.namespace to its new value" to "to its original value" — matching the assertion logic and the rollback semantics (feature file + @then decorator in step definitions).
  • Moved import tomllib to top-level imports: Was previously buried inside _read_flat_config() function body, violating CONTRIBUTING.md § Import Guidelines. Now at top of file alongside other stdlib imports.
  • Added temp directory cleanup: The _cleanup handler in step_fresh_config_dir now calls shutil.rmtree() on the temporary directory to prevent /tmp/tdd993_* accumulation over CI runs.
  • Added error-assertion Then step: New @then("an error should have been raised during server_connect") step verifies context.atomic_error is not None before config-checking steps, guarding against silent exception swallowing.

Quality Gates

Gate Result
nox -s lint Pass
nox -s typecheck Pass (0 errors)
nox -s unit_tests Pass (462 features, 12233 scenarios, 46806 steps)
nox -s integration_tests ⚠️ 7 pre-existing timeout failures (unrelated to this PR)
nox -s coverage_report Pass (98%, threshold 97%)

Robot Integration Test

N/A — The non-atomic write behavior is internal to server_connect and ConfigService.set_value(). No external system integration is involved.

Closes #1097

## Summary Add TDD bug-capture Behave tests that prove bug #993 exists: `server_connect()` in `server.py` writes three config values (`server.url`, `server.namespace`, `server.tls-verify`) via sequential `set_value()` calls with no atomicity guarantee. When a middle call fails, earlier values are already persisted, leaving the config in a half-written state. ### Changes - **New feature file**: `features/tdd_server_connect_atomic_writes.feature` — 3 scenarios tagged `@tdd_expected_fail @tdd_bug @tdd_bug_993` - **New step definitions**: `features/steps/tdd_server_connect_atomic_writes_steps.py` — Uses a `_FailingConfigService` subclass that raises `OSError` on a configurable `set_value()` call to simulate disk/permission failures ### Scenarios 1. **Config remains unchanged when second set_value fails** — Pre-populates config, calls `server_connect` where `set_value` fails on the 2nd call (`server.namespace`). Asserts error was raised and all three values are rolled back. **ASSERT FAILS**: `server.url` retains the new value. 2. **Config remains unchanged when third set_value fails** — Same setup, failure on 3rd call (`server.tls-verify`). Asserts error was raised and both `server.url` and `server.namespace` are rolled back to original values. **ASSERT FAILS**: both retain new values. 3. **No partial server.url persisted when namespace write raises** — Starts from empty config. Asserts error was raised and no `server.url` persisted. **ASSERT FAILS**: `server.url` was persisted despite the subsequent failure. All assertions fail (proving the bug exists), and `@tdd_expected_fail` inverts these to CI passes. ### Review Fixes Applied - **Fixed Gherkin step text contradiction**: Changed Scenario 2 step from "rolled back server.namespace to its **new** value" to "to its **original** value" — matching the assertion logic and the rollback semantics (feature file + `@then` decorator in step definitions). - **Moved `import tomllib` to top-level imports**: Was previously buried inside `_read_flat_config()` function body, violating CONTRIBUTING.md § Import Guidelines. Now at top of file alongside other stdlib imports. - **Added temp directory cleanup**: The `_cleanup` handler in `step_fresh_config_dir` now calls `shutil.rmtree()` on the temporary directory to prevent `/tmp/tdd993_*` accumulation over CI runs. - **Added error-assertion Then step**: New `@then("an error should have been raised during server_connect")` step verifies `context.atomic_error is not None` before config-checking steps, guarding against silent exception swallowing. ### Quality Gates | Gate | Result | |------|--------| | `nox -s lint` | ✅ Pass | | `nox -s typecheck` | ✅ Pass (0 errors) | | `nox -s unit_tests` | ✅ Pass (462 features, 12233 scenarios, 46806 steps) | | `nox -s integration_tests` | ⚠️ 7 pre-existing timeout failures (unrelated to this PR) | | `nox -s coverage_report` | ✅ Pass (98%, threshold 97%) | ### Robot Integration Test N/A — The non-atomic write behavior is internal to `server_connect` and `ConfigService.set_value()`. No external system integration is involved. Closes #1097
brent.edwards added this to the v3.6.0 milestone 2026-03-23 11:52:02 +00:00
brent.edwards force-pushed tdd/m6-server-connect-non-atomic from 8ec20837c0
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 27s
CI / quality (pull_request) Successful in 3m40s
CI / security (pull_request) Successful in 4m3s
CI / lint (pull_request) Successful in 4m8s
CI / typecheck (pull_request) Successful in 4m35s
CI / integration_tests (pull_request) Successful in 7m50s
CI / e2e_tests (pull_request) Successful in 8m27s
CI / unit_tests (pull_request) Successful in 10m37s
CI / docker (pull_request) Successful in 1m2s
CI / coverage (pull_request) Successful in 9m59s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 1h29m20s
to 231ed59ad2
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 37s
CI / lint (pull_request) Successful in 5m18s
CI / quality (pull_request) Successful in 5m36s
CI / typecheck (pull_request) Successful in 5m53s
CI / security (pull_request) Successful in 5m59s
CI / integration_tests (pull_request) Successful in 8m40s
CI / unit_tests (pull_request) Successful in 8m49s
CI / docker (pull_request) Successful in 1m0s
CI / e2e_tests (pull_request) Successful in 10m19s
CI / coverage (pull_request) Successful in 11m10s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-regression (pull_request) Failing after 46m26s
2026-03-23 16:49:12 +00:00
Compare
freemo approved these changes 2026-03-24 15:28:36 +00:00
Dismissed
freemo left a comment

Review: APPROVED

TDD tags correct (@tdd_bug @tdd_bug_993 @tdd_expected_fail). Behave steps fully implemented for server_connect non-atomic writes bug.

Note

Missing Robot Framework integration tests. For consistency with other TDD PRs, consider adding a .robot file + helper script in a follow-up.

## Review: APPROVED TDD tags correct (`@tdd_bug @tdd_bug_993 @tdd_expected_fail`). Behave steps fully implemented for server_connect non-atomic writes bug. ### Note Missing Robot Framework integration tests. For consistency with other TDD PRs, consider adding a `.robot` file + helper script in a follow-up.
Merge remote-tracking branch 'origin/master' into tdd/m6-server-connect-non-atomic
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 17s
CI / lint (pull_request) Successful in 3m19s
CI / typecheck (pull_request) Successful in 3m46s
CI / quality (pull_request) Successful in 3m43s
CI / security (pull_request) Successful in 3m56s
CI / integration_tests (pull_request) Successful in 7m57s
CI / unit_tests (pull_request) Successful in 8m31s
CI / docker (pull_request) Successful in 1m5s
CI / e2e_tests (pull_request) Successful in 9m30s
CI / coverage (pull_request) Successful in 10m8s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 55m59s
3dee9441a1
Merge branch 'master' into tdd/m6-server-connect-non-atomic
Some checks failed
CI / build (pull_request) Successful in 40s
CI / lint (pull_request) Successful in 5m20s
CI / quality (pull_request) Successful in 5m33s
CI / typecheck (pull_request) Successful in 5m40s
CI / benchmark-regression (pull_request) Waiting to run
CI / security (pull_request) Successful in 5m54s
CI / integration_tests (pull_request) Successful in 8m58s
CI / unit_tests (pull_request) Successful in 9m13s
CI / docker (pull_request) Successful in 1m5s
CI / e2e_tests (pull_request) Successful in 11m41s
CI / coverage (pull_request) Failing after 20m3s
CI / benchmark-publish (pull_request) Has been skipped
CI / status-check (pull_request) Failing after 5s
2802af13d0
Merge remote-tracking branch 'origin/master' into merge-master-tmp
Some checks failed
CI / build (pull_request) Successful in 38s
CI / lint (pull_request) Successful in 5m22s
CI / typecheck (pull_request) Successful in 5m50s
CI / security (pull_request) Successful in 6m19s
CI / quality (pull_request) Successful in 6m22s
CI / unit_tests (pull_request) Successful in 9m30s
CI / integration_tests (pull_request) Failing after 9m39s
CI / benchmark-publish (pull_request) Has been skipped
CI / docker (pull_request) Successful in 1m11s
CI / e2e_tests (pull_request) Successful in 11m15s
CI / coverage (pull_request) Failing after 13m55s
CI / benchmark-regression (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
498dca8137
fix(test): increase container resolve regression timeouts to 120s
Some checks failed
CI / build (pull_request) Successful in 32s
CI / lint (pull_request) Successful in 3m22s
CI / quality (pull_request) Successful in 3m42s
CI / typecheck (pull_request) Successful in 3m55s
CI / security (pull_request) Successful in 4m2s
CI / integration_tests (pull_request) Successful in 6m42s
CI / unit_tests (pull_request) Successful in 7m4s
CI / docker (pull_request) Successful in 1m10s
CI / e2e_tests (pull_request) Successful in 9m49s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
822f999ebb
brent.edwards dismissed freemo's review 2026-03-26 01:44:09 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

Merge remote-tracking branch 'origin/master' into merge-master-tmp
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 18s
CI / lint (pull_request) Successful in 4m34s
CI / quality (pull_request) Successful in 4m54s
CI / typecheck (pull_request) Successful in 5m33s
CI / security (pull_request) Successful in 5m48s
CI / integration_tests (pull_request) Successful in 7m50s
CI / unit_tests (pull_request) Successful in 9m37s
CI / docker (pull_request) Successful in 1m11s
CI / e2e_tests (pull_request) Successful in 10m44s
CI / coverage (pull_request) Successful in 11m39s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
8ccda647a8
Merge remote-tracking branch 'origin/master' into merge-master-tmp
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 34s
CI / lint (pull_request) Successful in 5m4s
CI / typecheck (pull_request) Successful in 5m21s
CI / quality (pull_request) Successful in 5m21s
CI / security (pull_request) Successful in 5m39s
CI / integration_tests (pull_request) Successful in 10m26s
CI / unit_tests (pull_request) Successful in 10m45s
CI / docker (pull_request) Successful in 9s
CI / e2e_tests (pull_request) Successful in 13m21s
CI / coverage (pull_request) Successful in 11m48s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-regression (pull_request) Successful in 1h18m2s
52511533b5
brent.edwards deleted branch tdd/m6-server-connect-non-atomic 2026-03-26 05:41:47 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core!1128
No description provided.