UAT: agents server connect writes three config values non-atomically — partial failure leaves config in inconsistent state (tracked as TDD issue #993) #2166

Closed
opened 2026-04-03 04:38:15 +00:00 by freemo · 2 comments
Owner

Metadata

  • Branch: bugfix/m7-server-connect-atomic-writes-uat
  • Commit Message: fix(cli): make server_connect config writes atomic with rollback on partial failure
  • Milestone: v3.7.0
  • Parent Epic: #397

Background

Discovered during UAT testing. Issue #993 was opened to track this bug and is currently State/In Review on v3.6.0, but the fix has not been implemented — the TDD test in features/tdd_server_connect_atomic_writes.feature is still tagged @tdd_expected_fail, confirming the bug remains present in the codebase.

Description

The agents server connect command in src/cleveragents/cli/commands/server.py makes three sequential set_value() calls with no transaction, no try/except, and no rollback:

svc.set_value("server.url", config.server_url)
svc.set_value("server.namespace", config.namespace)
svc.set_value("server.tls-verify", config.tls_verify)

If the second or third call fails (e.g., disk full, permissions error), the config is left in a half-written state:

  • Second call fails: server.url is persisted; server.namespace and server.tls-verify retain old values.
  • Third call fails: server.url and server.namespace are persisted; server.tls-verify retains its old value.

This inconsistent state can cause confusing failures on subsequent commands that depend on all three values being coherent.
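
The half-written state can be reproduced in isolation with a small sketch. FlakyConfigService below is a hypothetical in-memory stand-in, not the project's real config service; only the set_value(key, value) call shape is taken from the snippet above, and the example values are made up.

```python
from typing import Union

ConfigValue = Union[str, bool]


class FlakyConfigService:
    """Hypothetical in-memory stand-in for the real config service.

    Raises OSError on the Nth set_value() call to simulate a full disk
    or a permissions error.
    """

    def __init__(self, initial: dict[str, ConfigValue], fail_on_call: int) -> None:
        self.values = dict(initial)
        self._fail_on_call = fail_on_call
        self._calls = 0

    def set_value(self, key: str, value: ConfigValue) -> None:
        self._calls += 1
        if self._calls == self._fail_on_call:
            raise OSError("simulated write failure")
        self.values[key] = value


old = {
    "server.url": "https://old.example",
    "server.namespace": "old-ns",
    "server.tls-verify": True,
}
svc = FlakyConfigService(old, fail_on_call=2)

try:
    # Same shape as the three sequential calls in server_connect().
    svc.set_value("server.url", "https://new.example")
    svc.set_value("server.namespace", "new-ns")
    svc.set_value("server.tls-verify", False)
except OSError:
    pass

# The new URL is now paired with the old namespace and the old TLS setting.
print(svc.values)
```

Setting fail_on_call=3 gives the other case from the list: the URL and namespace are new, and only server.tls-verify keeps its old value.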

Expected Behavior

Per the specification, all three config values (server.url, server.namespace, server.tls-verify) must be written atomically — either all succeed or all fail with rollback to the original state.
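
One way to get the all-or-nothing behavior is to snapshot the old values and restore them if any write fails. This is a minimal sketch, not the actual fix: set_values_with_rollback is a hypothetical helper name, and get_value() is an assumption (the issue only shows set_value()). It also assumes all three keys already have a value to snapshot.

```python
from typing import Protocol, Union

ConfigValue = Union[str, bool]


class ConfigService(Protocol):
    # get_value() is assumed for this sketch; the issue only shows set_value().
    def get_value(self, key: str) -> ConfigValue: ...

    def set_value(self, key: str, value: ConfigValue) -> None: ...


def set_values_with_rollback(
    svc: ConfigService, updates: dict[str, ConfigValue]
) -> None:
    """Apply every update, or restore the previous values if any write fails."""
    # Read the old values first so there is something to roll back to.
    snapshot = {key: svc.get_value(key) for key in updates}
    written: list[str] = []
    try:
        for key, value in updates.items():
            svc.set_value(key, value)
            written.append(key)
    except Exception:
        # Undo in reverse order, touching only keys that were actually written.
        for key in reversed(written):
            svc.set_value(key, snapshot[key])
        raise
```

Note that the rollback writes go through the same set_value() path, so they can fail for the same reason the original write did (for example, a full disk). A sketch like this narrows the inconsistency window but is best-effort; a single-file atomic replace gives a stronger guarantee if the storage layer allows it.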

Actual Behavior

Three sequential non-atomic writes. Partial failure leaves the configuration in an inconsistent state.

Code Location

src/cleveragents/cli/commands/server.py, server_connect() function, lines 148–150.

Related

  • Original bug report: #993 (State/In Review, v3.6.0 — fix not yet implemented)
  • TDD test (still failing): features/tdd_server_connect_atomic_writes.feature (tagged @tdd_expected_fail @tdd_issue @tdd_issue_993)

Subtasks

  • Implement atomic write for all three config values in server_connect() (use a single write_config() call or wrap in a try/except with rollback)
  • Remove @tdd_expected_fail tag from features/tdd_server_connect_atomic_writes.feature once the fix is in place
  • Add/update Behave unit test scenarios covering: (a) all-success path, (b) rollback when second write fails, (c) rollback when third write fails
  • Verify nox -e typecheck passes (all new code must be fully statically typed)
  • Verify nox -e unit_tests passes
  • Verify nox -e coverage_report shows coverage ≥ 97%
  • Update server.py inline documentation to describe the atomicity guarantee
  • Close or link resolution to original issue #993
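
For the "single write_config() call" option in the first subtask, a common pattern is to write the merged config to a temporary file and swap it in with os.replace(). The sketch below assumes the config lives in one JSON file, which the issue does not confirm; the write_config signature shown here is hypothetical and only borrows the name from the subtask.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Union

ConfigValue = Union[str, bool]


def write_config(path: Path, updates: dict[str, ConfigValue]) -> None:
    """Merge updates into a JSON config file and replace the file in one step."""
    current: dict[str, ConfigValue] = {}
    if path.exists():
        current = json.loads(path.read_text(encoding="utf-8"))
    merged = {**current, **updates}

    # The temp file sits in the same directory so os.replace() never has to
    # cross a filesystem boundary.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(merged, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Atomic on POSIX: readers see either the old file or the new one.
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file; the original config has not been modified.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

With this shape, server_connect() would pass all three keys in one updates dict, so a failure at any point before the replace leaves the existing file exactly as it was and no explicit rollback is needed.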

Definition of Done

  • All three config values (server.url, server.namespace, server.tls-verify) are written atomically — all succeed or all fail with rollback
  • @tdd_expected_fail tag removed from features/tdd_server_connect_atomic_writes.feature; test now passes
  • Rollback behavior verified by Behave unit tests for mid-sequence failure scenarios
  • No # type: ignore suppressions introduced
  • All nox stages pass
  • Coverage ≥ 97%

Automated by CleverAgents Bot
Supervisor: Acting on behalf of: UAT Testing | Agent: ca-new-issue-creator

freemo added this to the v3.7.0 milestone 2026-04-03 04:38:19 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Medium (confirmed) — Non-atomic config writes in server connect can leave config in inconsistent state. Related to existing TDD issue #993.
  • Milestone: v3.7.0 (confirmed)
  • MoSCoW: Should Have — Atomic config writes are important for data integrity. The TDD test already exists (#993) but the fix hasn't been implemented.
  • Parent Epic: #397 (confirmed correct)

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Author
Owner

Closing as duplicate of #1203. Both issues describe the same bug: agents server connect writes three config values non-atomically, leaving config in an inconsistent state on partial failure. Issue #1203 was filed earlier and is the canonical bug report with State/In Review.

Please track this work in #1203.


Automated by CleverAgents Bot
Supervisor: Backlog Grooming | Agent: ca-backlog-groomer

Blocks
#397 Epic: Server & Autonomy Infrastructure
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#2166