bug(skill): skill add regression — persistence fix from PR #640 did not resolve the issue #980

Closed
opened 2026-03-16 16:04:37 +00:00 by freemo · 5 comments
Owner

Metadata

  • Commit Message: fix(skill): resolve skill add persistence regression after PR #640
  • Branch: bugfix/m3-skill-add-regression

Background and Context

Bug #620 reported that agents skill add --config <file> followed by agents skill list shows "No skills found." The root cause was identified as SkillService using in-memory OrderedDict storage with no database persistence. A fix was merged via PR #640, which wired the CLI to the database via SkillRepository.

However, @brent.edwards confirmed on Day 33 (2026-03-13) that the bug persists on commit b90f2ccc (after PR #640 merge). Running skill add followed by skill list in separate CLI invocations still shows "No skills found." This is a regression — the fix from PR #640 was either incomplete, reverted by a subsequent merge, or the wiring was broken by another change.

The original TDD tests (#637) may also be insufficient if they pass while the bug still exists — the tests may be verifying in-process behavior (same CLI invocation) rather than cross-process persistence.

Reproduction Steps

cd ~
mkdir data && cd data
uv venv && source .venv/bin/activate
uv pip install -e /app
agents init
agents skill add --config /app/examples/skills/single-tool.yaml
agents skill list    # Expected: shows the skill. Actual: "No skills found."

Expected Behavior

  1. agents skill add --config <file> persists the skill to the database
  2. agents skill list in a separate CLI invocation shows the previously added skill
  3. The TDD test accurately captures the cross-process persistence requirement

Acceptance Criteria

  • Root cause of the regression is identified (PR #640 incomplete, reverted, or wiring broken)
  • Skill add correctly persists to database across CLI invocations
  • Skill list shows skills added in previous CLI invocations
  • The TDD regression test (from the TDD counterpart issue) passes without @tdd_expected_fail
  • Tests (Behave): Cross-process skill persistence scenarios pass
  • Tests (Robot): Integration test for skill add/list persistence
  • Verify coverage >=97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors

Subtasks

  • Code: Wait for TDD issue (to be created) to be completed and its PR merged to master.
  • Code: Create branch bugfix/m3-skill-add-regression from master (which now contains the tagged TDD test).
  • Code: Investigate why PR #640 fix did not resolve the issue — check if SkillRepository wiring was reverted, if session scope prevents cross-invocation persistence, or if the write is silently failing.
  • Code: Implement the fix that resolves the regression.
  • Code: Remove the @tdd_expected_fail tag from the test(s) tagged @tdd_bug_<N>. Leave the @tdd_bug and @tdd_bug_<N> tags in place as permanent regression references.
  • Code: Verify the test now passes normally without the @tdd_expected_fail tag.
  • Docs: Document the regression root cause in the commit message.
  • Tests (Behave): Verify cross-process skill persistence (separate CLI invocations).
  • Tests (Robot): Add integration test for skill lifecycle across processes.
  • Quality: Verify the CI quality gate passes (confirms @tdd_expected_fail tag removal).
  • Quality: Verify coverage >=97% via nox -s coverage_report. If coverage is <97% then review the current unit test coverage report at build/coverage.xml and use it to write new Behave based unit tests to improve code coverage. Specifically, write Behave style unit tests that are descriptively named and specifically improve coverage on whichever file has the most uncovered lines by writing tests that will target the uncovered lines in the report. Once that is done rerun nox -s coverage_report to verify all tests pass and coverage is above >=97%. Only mark this as complete once coverage is >=97%, if not repeat this task as many times as is needed until coverage reaches >=97%.
  • Quality: Run nox (all default sessions, including benchmark), fix any errors if needed ensuring nox passes across entire code base, do not ignore any failure even if it seems unrelated to this commit, fix it.

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
## Metadata - **Commit Message**: `fix(skill): resolve skill add persistence regression after PR #640` - **Branch**: `bugfix/m3-skill-add-regression` ## Background and Context Bug #620 reported that `agents skill add --config <file>` followed by `agents skill list` shows "No skills found." The root cause was identified as SkillService using in-memory OrderedDict storage with no database persistence. A fix was merged via PR #640, which wired the CLI to the database via SkillRepository. **However, @brent.edwards confirmed on Day 33 (2026-03-13) that the bug persists on commit `b90f2ccc` (after PR #640 merge).** Running `skill add` followed by `skill list` in separate CLI invocations still shows "No skills found." This is a regression — the fix from PR #640 was either incomplete, reverted by a subsequent merge, or the wiring was broken by another change. The original TDD tests (#637) may also be insufficient if they pass while the bug still exists — the tests may be verifying in-process behavior (same CLI invocation) rather than cross-process persistence. ## Reproduction Steps ```bash cd ~ mkdir data && cd data uv venv && source .venv/bin/activate uv pip install -e /app agents init agents skill add --config /app/examples/skills/single-tool.yaml agents skill list # Expected: shows the skill. Actual: "No skills found." ``` ## Expected Behavior 1. `agents skill add --config <file>` persists the skill to the database 2. `agents skill list` in a separate CLI invocation shows the previously added skill 3. The TDD test accurately captures the cross-process persistence requirement ## Acceptance Criteria - [ ] Root cause of the regression is identified (PR #640 incomplete, reverted, or wiring broken) - [ ] Skill add correctly persists to database across CLI invocations - [ ] Skill list shows skills added in previous CLI invocations - [ ] The TDD regression test (from the TDD counterpart issue) passes without `@tdd_expected_fail` - [ ] Tests (Behave): Cross-process skill persistence scenarios pass - [ ] Tests (Robot): Integration test for skill add/list persistence - [ ] Verify coverage >=97% via `nox -s coverage_report` - [ ] Run `nox` (all default sessions), fix any errors ## Subtasks - [ ] Code: Wait for TDD issue (to be created) to be completed and its PR merged to `master`. - [ ] Code: Create branch `bugfix/m3-skill-add-regression` from `master` (which now contains the tagged TDD test). - [ ] Code: Investigate why PR #640 fix did not resolve the issue — check if SkillRepository wiring was reverted, if session scope prevents cross-invocation persistence, or if the write is silently failing. - [ ] Code: Implement the fix that resolves the regression. - [ ] Code: Remove the `@tdd_expected_fail` tag from the test(s) tagged `@tdd_bug_<N>`. Leave the `@tdd_bug` and `@tdd_bug_<N>` tags in place as permanent regression references. - [ ] Code: Verify the test now passes normally without the `@tdd_expected_fail` tag. - [ ] Docs: Document the regression root cause in the commit message. - [ ] Tests (Behave): Verify cross-process skill persistence (separate CLI invocations). - [ ] Tests (Robot): Add integration test for skill lifecycle across processes. - [ ] Quality: Verify the CI quality gate passes (confirms `@tdd_expected_fail` tag removal). - [ ] Quality: Verify coverage >=97% via `nox -s coverage_report`. If coverage is <97% then review the current unit test coverage report at `build/coverage.xml` and use it to write new Behave based unit tests to improve code coverage. Specifically, write Behave style unit tests that are descriptively named and specifically improve coverage on whichever file has the most uncovered lines by writing tests that will target the uncovered lines in the report. Once that is done rerun `nox -s coverage_report` to verify all tests pass and coverage is above >=97%. Only mark this as complete once coverage is >=97%, if not repeat this task as many times as is needed until coverage reaches >=97%. - [ ] Quality: Run `nox` (all default sessions, including benchmark), fix any errors if needed ensuring nox passes across **entire** code base, do not ignore any failure even if it seems unrelated to this commit, fix it. ## Definition of Done This issue is complete when: - All subtasks above are completed and checked off. - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation. - The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly. - The commit is submitted as a **pull request** to `master`, reviewed, and **merged** before this issue is marked done.
freemo added this to the v3.2.0 milestone 2026-03-16 16:05:06 +00:00
Author
Owner

TDD workflow initiated for this bug:

  • Created TDD issue #981 to write a tagged test that captures this regression using cross-process persistence verification.
  • Dependency: this issue is blocked by #981 (please add dependency link via UI: #980 depends on #981).
  • Once #981 is merged to master, @CoreRasurae should create branch bugfix/m3-skill-add-regression from master and implement the fix.
  • The fix must remove the @tdd_expected_fail tag from the test so it runs normally.

Assignees:

  • TDD issue #981: @hamza.khyari (infrastructure expertise, independent verification)
  • Bug fix (this issue): @CoreRasurae (Python architecture expertise)
**TDD workflow initiated for this bug:** - Created TDD issue #981 to write a tagged test that captures this regression using cross-process persistence verification. - Dependency: this issue is blocked by #981 (please add dependency link via UI: #980 depends on #981). - Once #981 is merged to `master`, @CoreRasurae should create branch `bugfix/m3-skill-add-regression` from `master` and implement the fix. - The fix must remove the `@tdd_expected_fail` tag from the test so it runs normally. **Assignees:** - TDD issue #981: @hamza.khyari (infrastructure expertise, independent verification) - Bug fix (this issue): @CoreRasurae (Python architecture expertise)
freemo self-assigned this 2026-03-22 16:31:02 +00:00
Author
Owner

Assigned to @freemo for bug fix based on developer expertise (skill system — Jeff fixed #620 partially and knows the skill add flow best). This bug and its TDD counterpart (#981) are top priority per project policy — bugs always take precedence over feature work.

Assigned to @freemo for bug fix based on developer expertise (skill system — Jeff fixed #620 partially and knows the skill add flow best). This bug and its TDD counterpart (#981) are top priority per project policy — bugs always take precedence over feature work.
Author
Owner

Planning Agent — Discussion Review

TDD pipeline status for skill add regression:

Stage Issue Assignee Status
TDD test #981 @hamza.khyari Open (v3.2.0)
Bug fix #980 (this) @freemo Blocked by #981

Assignment note: The original triage (Day 37) assigned the bug fix to @CoreRasurae, but the latest comment reassigns to @freemo citing Jeff's knowledge of the skill add flow from PR #640. I agree with the reassignment — @freemo has the deepest context on the skill persistence issue since the partial fix in #620.

Milestone concern: This is v3.2.0 — a past milestone. If v3.2.0 has already been released, this bug should be escalated to the current active milestone. If v3.2.0 has not shipped, this bug is blocking the release.

@hamza.khyari — TDD issue #981 is the gating item and is also in v3.2.0. What is its status? This is Priority/Critical and should be treated as urgent if it is blocking a milestone release.

The 5-point estimate for the bug fix is reasonable given this is a regression in cross-process persistence — likely requires investigating the session/transaction lifecycle.

## Planning Agent — Discussion Review **TDD pipeline status for skill add regression:** | Stage | Issue | Assignee | Status | |-------|-------|----------|--------| | TDD test | #981 | @hamza.khyari | Open (v3.2.0) | | Bug fix | #980 (this) | @freemo | Blocked by #981 | **Assignment note:** The original triage (Day 37) assigned the bug fix to @CoreRasurae, but the latest comment reassigns to @freemo citing Jeff's knowledge of the skill add flow from PR #640. I agree with the reassignment — @freemo has the deepest context on the skill persistence issue since the partial fix in #620. **Milestone concern:** This is v3.2.0 — a **past milestone**. If v3.2.0 has already been released, this bug should be escalated to the current active milestone. If v3.2.0 has not shipped, this bug is blocking the release. @hamza.khyari — TDD issue #981 is the gating item and is also in v3.2.0. What is its status? This is Priority/Critical and should be treated as urgent if it is blocking a milestone release. The 5-point estimate for the bug fix is reasonable given this is a regression in cross-process persistence — likely requires investigating the session/transaction lifecycle.
Author
Owner

Implementation Complete

Root Cause

Two issues in the skill persistence layer:

  1. Missing table creation: _build_skill_service() in container.py created a SkillRepository pointing at the database but did not ensure the skills/skill_items tables existed. When missing, all repository operations failed silently.

  2. Session mismatch: SkillRepository lacked auto_commit support. Each call to session_factory() returned a new session, so the flush() and commit() operated on different sessions.

Fix Applied

  1. Added targeted table creation in _build_skill_service (mirrors _build_session_service pattern)
  2. Added auto_commit parameter to SkillRepository (mirrors SessionRepository pattern)
  3. Passed auto_commit=True from container builder
  4. Removed @tdd_expected_fail from TDD test — test now passes normally

Quality Gates

  • lint, typecheck, unit_tests (12232 scenarios), integration_tests (1674 tests, 0 failures), coverage (98%)

PR: #1120
Commit: 159cb7f7

## Implementation Complete ### Root Cause Two issues in the skill persistence layer: 1. **Missing table creation**: `_build_skill_service()` in `container.py` created a `SkillRepository` pointing at the database but did not ensure the `skills`/`skill_items` tables existed. When missing, all repository operations failed silently. 2. **Session mismatch**: `SkillRepository` lacked `auto_commit` support. Each call to `session_factory()` returned a new session, so the `flush()` and `commit()` operated on different sessions. ### Fix Applied 1. Added targeted table creation in `_build_skill_service` (mirrors `_build_session_service` pattern) 2. Added `auto_commit` parameter to `SkillRepository` (mirrors `SessionRepository` pattern) 3. Passed `auto_commit=True` from container builder 4. Removed `@tdd_expected_fail` from TDD test — test now passes normally ### Quality Gates - ✅ lint, typecheck, unit_tests (12232 scenarios), integration_tests (1674 tests, 0 failures), coverage (98%) PR: #1120 Commit: 159cb7f7
Author
Owner

Action required: Create Forgejo dependency link.

This bug issue must have a Forgejo dependency link to TDD issue #981 (bug is blocked by TDD issue). The Forgejo REST API does not support creating dependency links in Forgejo 14 — this must be done via the web UI:

  1. Open this issue in the browser
  2. In the sidebar, find "Dependencies" and click "Add dependency"
  3. Search for #981 and add it as a dependency (this issue depends on #981)

This ensures the correct TDD workflow ordering: #981 (write the tagged test) must be completed before this issue (implement the fix) can begin.

**Action required: Create Forgejo dependency link.** This bug issue must have a Forgejo dependency link to TDD issue #981 (bug is blocked by TDD issue). The Forgejo REST API does not support creating dependency links in Forgejo 14 — this must be done via the web UI: 1. Open this issue in the browser 2. In the sidebar, find "Dependencies" and click "Add dependency" 3. Search for #981 and add it as a dependency (this issue depends on #981) This ensures the correct TDD workflow ordering: #981 (write the tagged test) must be completed before this issue (implement the fix) can begin.
freemo 2026-03-24 17:23:10 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#980
No description provided.