TDD: Write failing test for #620 — skill add does not persist across CLI invocations #1091

Closed
opened 2026-03-22 16:30:06 +00:00 by freemo · 2 comments
Owner

Metadata

  • Commit Message: test: add TDD bug-capture test for #620 — skill add regression
  • Branch: tdd/m3-skill-add-regression

Background and Context

This is the TDD counterpart to bug #620. Per the project's Test-Driven Development workflow for bugs (see CONTRIBUTING.md > Bug Fix Workflow), the first step in fixing any bug is to write a test that captures the buggy behavior. The test is tagged with @tdd_bug, @tdd_bug_620, and @tdd_expected_fail so that it passes CI while the bug is still unfixed. Once the fix is implemented in #620, the @tdd_expected_fail tag will be removed and the test will run normally.

See #620 for full bug details.

Expected Behavior

A new test exists that:

  1. Captures the exact failure described in #620.
  2. Is tagged with @tdd_bug, @tdd_bug_620, and @tdd_expected_fail.
  3. Passes CI via the expected-failure mechanism (the underlying assertion fails, confirming the bug exists, but the tag inversion causes the test to pass).
  4. Would fail CI if the bug were fixed without removing the @tdd_expected_fail tag.
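The inversion described in points 3 and 4 can be sketched as a small pure function. This is only an illustration of the logic, not the project's actual hook: the function name `effective_result` is hypothetical, and the real mechanism presumably lives in the Behave environment setup.

```python
EXPECTED_FAIL_TAG = "@tdd_expected_fail"


def effective_result(tags, assertion_passed):
    """Return True if CI should count the scenario as passed.

    Hypothetical helper: with the expected-fail tag present, a failing
    assertion counts as a pass and a passing assertion counts as a failure.
    """
    if EXPECTED_FAIL_TAG in tags:
        return not assertion_passed
    return assertion_passed


tagged = ["@tdd_bug", "@tdd_bug_620", "@tdd_expected_fail"]
untagged = ["@tdd_bug", "@tdd_bug_620"]

bug_present_tagged = effective_result(tagged, assertion_passed=False)
bug_fixed_tag_left = effective_result(tagged, assertion_passed=True)
bug_fixed_tag_removed = effective_result(untagged, assertion_passed=True)
```

Under this sketch, leaving the tag in place after the fix makes the scenario fail, which is what forces the tag to be removed alongside the fix.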

Acceptance Criteria

  • A test is written that captures the bug behavior described in #620.
  • The test is tagged with @tdd_bug, @tdd_bug_620, and @tdd_expected_fail.
  • The @tdd_expected_fail tag causes the test to pass CI (the underlying assertion fails as expected, proving the bug exists).
  • The test is specific enough that it will pass normally (without the tag) only when the bug is genuinely fixed.
  • Tag validation rules pass: @tdd_bug_620 has corresponding @tdd_bug, and @tdd_expected_fail has both.
  • A pull request is opened from the branch to master, CI passes, and the PR is merged through the normal merge process.
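The tag validation rule in the criteria above might look like the following check. It is a sketch under one reading of the rule (a numbered bug tag requires `@tdd_bug`, and `@tdd_expected_fail` requires both `@tdd_bug` and a numbered bug tag); `validate_tdd_tags` is a hypothetical name, not the project's validator.

```python
import re

# Matches numbered bug tags such as "@tdd_bug_620".
BUG_ID_TAG = re.compile(r"^@tdd_bug_\d+$")


def validate_tdd_tags(tags):
    """Return a list of rule violations for one scenario's tags (hypothetical)."""
    tag_set = set(tags)
    has_bug = "@tdd_bug" in tag_set
    has_bug_id = any(BUG_ID_TAG.match(tag) for tag in tag_set)
    errors = []
    if has_bug_id and not has_bug:
        errors.append("@tdd_bug_<N> requires @tdd_bug")
    if "@tdd_expected_fail" in tag_set and not (has_bug and has_bug_id):
        errors.append("@tdd_expected_fail requires @tdd_bug and @tdd_bug_<N>")
    return errors


ok = validate_tdd_tags(["@tdd_bug", "@tdd_bug_620", "@tdd_expected_fail"])
missing_bug = validate_tdd_tags(["@tdd_bug_620"])
orphan_expected_fail = validate_tdd_tags(["@tdd_expected_fail"])
```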

Definition of Done

This issue is complete when:

  • All subtasks below are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the test and what bug behavior it captures.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, CI passes, and the PR is merged before this issue is marked done.

Subtasks

  • Code: Analyze bug #620 to identify the exact failure condition, including the inputs, state, and code path that trigger the bug.
  • Code: Determine the appropriate test type (Behave unit test, Robot integration test, or both) and file location for the reproducing test.
  • Tests (Behave): Write a Behave scenario in features/ that captures the bug. Tag the scenario with @tdd_bug, @tdd_bug_620, and @tdd_expected_fail. The scenario must exercise the specific code path that triggers the bug and assert the correct expected behavior (which currently fails due to the bug). Name the scenario descriptively to indicate it is a bug regression test.
  • Tests (Robot): N/A — This is a unit-level persistence bug in SkillService session management. The bug manifests in the service layer (session commit mismatch), not at integration boundaries. A Robot test would duplicate the Behave test without adding new signal.
  • Docs: Add a comment in the test file explaining this test captures bug #620 and uses @tdd_expected_fail until the fix is merged.
  • Quality: Verify CI passes with the tagged test. Confirm the underlying assertion fails for the correct reason.
  • Quality: Verify tag validation rules pass.
  • Quality: Verify coverage >=97% via nox -s coverage_report. If coverage is below 97%, review the unit test coverage report at build/coverage.xml and use it to write new Behave-based unit tests that raise coverage.
  • Quality: Run nox (all default sessions) and fix any errors so that nox passes across the entire code base.
freemo added this to the v3.2.0 milestone 2026-03-22 16:30:06 +00:00
Member

Implementation Notes — TDD Bug #620 Capture Test

Root Cause Analysis

The bug is a session mismatch in SkillService's persistence path.

Code path (when using the DI container):

  1. _build_skill_service() in cleveragents.application.container creates a sessionmaker factory.
  2. This factory is passed to both SkillRepository and SkillService.
  3. SkillRepository.create() calls self._session_factory() → gets Session A, calls session.add() + session.flush().
  4. SkillService._persist_skill() calls self._commit(), which calls self._session_factory() → gets Session B (a new session), calls session.commit().
  5. Session B has no pending changes — the data flushed in Session A is never committed.
  6. When the process exits, Session A is garbage-collected and its uncommitted transaction is rolled back.
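The same failure shape can be reproduced with only the standard library. The sketch below uses two `sqlite3` connections as stand-ins for Session A and Session B; it is not the project's code, and the table name `skills` and the function names are assumptions made for illustration.

```python
import os
import sqlite3
import tempfile


def add_skill_with_mismatched_connections(db_path, name):
    """Write on connection A but commit on connection B (the buggy shape)."""
    conn_a = sqlite3.connect(db_path)  # stands in for Session A
    conn_a.execute("INSERT INTO skills (name) VALUES (?)", (name,))  # pending only

    conn_b = sqlite3.connect(db_path)  # stands in for Session B
    conn_b.commit()  # nothing to commit: B never saw the INSERT
    conn_b.close()

    conn_a.close()  # closed without commit, so the INSERT is rolled back


def list_skills(db_path):
    """Read skill names through a fresh connection, like a new CLI invocation."""
    conn = sqlite3.connect(db_path)
    try:
        return [row[0] for row in conn.execute("SELECT name FROM skills ORDER BY name")]
    finally:
        conn.close()


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "skills.db")
        setup = sqlite3.connect(db_path)
        setup.execute("CREATE TABLE skills (name TEXT PRIMARY KEY)")
        setup.commit()
        setup.close()
        add_skill_with_mismatched_connections(db_path, "local/persist-regression-test")
        return list_skills(db_path)


skills_after_add = run_demo()
```

Here `skills_after_add` comes back empty, mirroring the "Available skills: []" symptom reported for the bug.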

Why Existing Tests Pass

The existing skill_add_persist.feature tests work because they wire a shared session (a single pre-created Session object returned by every factory call). This masks the bug because flush() and commit() happen on the same session. The DI container uses standard sessionmaker behaviour where each call creates a new session.
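A toy model makes the masking effect concrete. `FakeSession` below is a hypothetical in-memory stand-in, not SQLAlchemy; it only models "pending until committed" so that the difference between a shared-object factory and a new-object-per-call factory is visible.

```python
class FakeSession:
    """Hypothetical stand-in: items stay pending until commit() is called."""

    def __init__(self, store):
        self._store = store
        self._pending = []

    def add(self, item):
        self._pending.append(item)

    def commit(self):
        self._store.extend(self._pending)
        self._pending.clear()


def add_skill(session_factory, name):
    session_factory().add(name)  # repository side: first factory call
    session_factory().commit()   # service side: second factory call


# Test-style wiring: every factory call returns the same session object.
shared_store = []
shared_session = FakeSession(shared_store)
add_skill(lambda: shared_session, "local/example")

# Container-style wiring: every factory call returns a new session object.
fresh_store = []
add_skill(lambda: FakeSession(fresh_store), "local/example")
```

With the shared session the item reaches `shared_store`; with a new session per call `fresh_store` stays empty, which is why tests wired the first way could not detect the bug.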

Test Design

Feature file: features/tdd_skill_add_persist_regression.feature
Steps file: features/steps/tdd_skill_add_persist_regression_steps.py

The test mirrors the DI container's construction pattern exactly:

  1. Creates a file-backed SQLite database
  2. Builds a SkillService the same way _build_skill_service() does (standard sessionmaker, new session per call)
  3. Adds a skill
  4. Destroys the service (simulating process exit)
  5. Creates a new SkillService from the same DB
  6. Lists skills
  7. Asserts the added skill is present

The assertion fails (confirming the bug), and @tdd_expected_fail inverts the result so CI passes.

Test Output (Confirms Bug)

Then the listed skills should include "local/persist-regression-test"
  ASSERT FAILED: Bug #620: Skill 'local/persist-regression-test' was not
  found after service recreation. Available skills: []. This confirms the
  skill add did not persist to the database across service instances.

1 feature passed, 0 failed, 0 skipped
1 scenario passed, 0 failed, 0 skipped

Tags

  • @tdd_bug — permanent filter tag
  • @tdd_bug_620 — links to bug #620
  • @tdd_expected_fail — temporary; removed when fix is implemented

Robot Test — N/A

This is a unit-level persistence bug in SkillService._commit() — a session management issue. The bug manifests entirely in the service layer, not at integration boundaries. A Robot Framework integration test would not add diagnostic value beyond what the Behave test provides.

Additional Fix: Step Ambiguity

Fixed a pre-existing AmbiguousStep error in features/steps/tdd_exec_env_resolution_precedence_steps.py, where the step pattern 'I resolve with plan_env ... plan_priority ... and project_env ... project_priority ...' was ambiguous with 'I resolve with plan_env ... and project_env ...' in execution_environment_steps.py. The longer step was renamed to 'I resolve precedence with ...' to remove the ambiguity.
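To see why two such patterns collide, the sketch below approximates placeholder matching with regular expressions. It is a rough model, not Behave's matcher, and the placeholder names and sample step text are hypothetical because the notes elide the real patterns with "...".

```python
import re


def placeholder_pattern_to_regex(pattern):
    """Approximate a '{name}' step pattern with a regex (illustration only)."""
    pieces = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        re.escape(piece) if index % 2 == 0 else f"(?P<{piece}>.+?)"
        for index, piece in enumerate(pieces)
    )
    return re.compile(regex)


def matches(pattern, text):
    return placeholder_pattern_to_regex(pattern).fullmatch(text) is not None


# Hypothetical patterns and step text, invented for this illustration.
SHORT = "I resolve with plan_env {plan_env} and project_env {project_env}"
LONG_OLD = (
    "I resolve with plan_env {plan_env} plan_priority {plan_priority} "
    "and project_env {project_env} project_priority {project_priority}"
)
LONG_NEW = LONG_OLD.replace("I resolve with", "I resolve precedence with", 1)

STEP_OLD = "I resolve with plan_env docker plan_priority 1 and project_env venv project_priority 2"
STEP_NEW = STEP_OLD.replace("I resolve with", "I resolve precedence with", 1)

ambiguous_before = matches(SHORT, STEP_OLD) and matches(LONG_OLD, STEP_OLD)
ambiguous_after = matches(SHORT, STEP_NEW) and matches(LONG_NEW, STEP_NEW)
```

In this model the shorter pattern swallows "plan_priority 1" into its first placeholder, so both patterns match the old step text; the renamed prefix stops the shorter pattern from matching at all.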

Quality Gate Results

| Gate | Result |
|------|--------|
| nox -s lint | ✅ Passed |
| nox -s typecheck | ✅ 0 errors |
| nox -s unit_tests | ✅ All scenarios passed (incl. new TDD test) |
| nox -s integration_tests | ✅ 1674 tests passed |
| nox -s e2e_tests | ⚠️ 35 LLM-dependent tests fail (same as master; 2 smoke tests pass) |
| nox -s coverage_report | ✅ 98% (threshold: 97%) |

Fix Guidance for #620

When implementing the fix, the developer should:

  1. Ensure SkillRepository and SkillService._commit() share the same session (e.g. use scoped sessions, or have the repo return/expose its session for commit).
  2. Remove the @tdd_expected_fail tag from features/tdd_skill_add_persist_regression.feature.
  3. The test should then pass normally as a permanent regression guard.
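One possible shape for step 1 is to have the service own a single unit of work and hand it to the repository. The sketch below shows that idea with a `sqlite3` connection standing in for a session; `ToySkillRepository`, `ToySkillService`, and the `skills` table are hypothetical and are not the project's classes or the fix that was eventually chosen.

```python
import os
import sqlite3
import tempfile


class ToySkillRepository:
    """Hypothetical repository that writes on a connection it is given."""

    def __init__(self, conn):
        self._conn = conn

    def create(self, name):
        self._conn.execute("INSERT INTO skills (name) VALUES (?)", (name,))

    def list_names(self):
        return [row[0] for row in self._conn.execute("SELECT name FROM skills ORDER BY name")]


class ToySkillService:
    """Hypothetical service that commits on the same connection the repo used."""

    def __init__(self, conn):
        self._conn = conn
        self._repo = ToySkillRepository(conn)

    def add(self, name):
        self._repo.create(name)
        self._conn.commit()  # same connection that holds the pending INSERT

    def list(self):
        return self._repo.list_names()


def round_trip():
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "skills.db")
        first = sqlite3.connect(db_path)
        first.execute("CREATE TABLE skills (name TEXT PRIMARY KEY)")
        first.commit()
        ToySkillService(first).add("local/persist-regression-test")
        first.close()  # simulates the first CLI invocation exiting

        second = sqlite3.connect(db_path)  # simulates a later CLI invocation
        try:
            return ToySkillService(second).list()
        finally:
            second.close()


persisted = round_trip()
```

Because the write and the commit share one connection, the skill is still present after the first connection is closed and a new one is opened.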
Member

Closing as superseded/duplicate of the already-landed skill-add persistence regression track.

Equivalent coverage and follow-up are already on master from the #980 stream:

  • features/tdd_skill_add_regression.feature
  • features/steps/tdd_skill_add_regression_steps.py
  • robot/tdd_skill_add_regression.robot
  • robot/helper_tdd_skill_add_regression.py

Relevant landed commits include:

  • a6e2bc78 (initial TDD capture)
  • 93da31e8 (skill persistence fix)
  • 1878998b (tag migration cleanup)

Reopening this branch as a new PR would duplicate already-shipped work and create conflict churn, so this issue is being closed as duplicate/superseded.

Reference
cleveragents/cleveragents-core#1091