refactor(a2a): update test files for ACP to A2A rename #737

Merged
hamza.khyari merged 1 commit from refactor/m7-acp-to-a2a-tests into master 2026-03-14 01:35:34 +00:00
Member

Summary

  • Fixes 2 stale "ACP" comment references in vulture_whitelist.py (lines 633, 957) that were missed by the source rename in PR #705
  • Exhaustive audit confirmed all other ACP→A2A renames (13 test file renames, ~626 content replacements across ~27 files) were already completed by PR #705

Audit Results

All 6 acceptance criteria from Issue #689 are satisfied on origin/master:

# Criterion Status
1 All 13 test files renamed PASS
2 All imports updated (cleveragents.acp → cleveragents.a2a) PASS
3 Gherkin scenario names updated PASS
4 Robot test names updated PASS
5 JSON fixture content updated PASS
6 Zero ACP references in test files PASS

Only remaining gaps were 2 stale comments in vulture_whitelist.py, fixed in this PR.

Closes #689

## Summary - Fixes 2 stale "ACP" comment references in `vulture_whitelist.py` (lines 633, 957) that were missed by the source rename in PR #705 - Exhaustive audit confirmed all other ACP→A2A renames (13 test file renames, ~626 content replacements across ~27 files) were already completed by PR #705 ## Audit Results All 6 acceptance criteria from Issue #689 are satisfied on `origin/master`: | # | Criterion | Status | |---|---|---| | 1 | All 13 test files renamed | PASS | | 2 | All imports updated (cleveragents.acp → cleveragents.a2a) | PASS | | 3 | Gherkin scenario names updated | PASS | | 4 | Robot test names updated | PASS | | 5 | JSON fixture content updated | PASS | | 6 | Zero ACP references in test files | PASS | Only remaining gaps were 2 stale comments in `vulture_whitelist.py`, fixed in this PR. Closes #689
CoreRasurae left a comment

Code Review Report -- PR #737 / Branch refactor/m7-acp-to-a2a-tests

Scope: Full review of the ACP-to-A2A rename (PR #705 commit ec0b7631, PR #737 commit 54e52b57), cross-referenced against docs/specification.md and issue #689 acceptance criteria.

Review cycles: 3 global passes covering bugs, security, specification compliance, test flaws, test coverage gaps, test isolation, and performance.

Rename status: The ACP-to-A2A rename in source code, test files, feature files, Robot files, and JSON fixtures is complete. Zero stale Acp* class names, acp.* structlog events, or acp_version field names remain in code. The vulture whitelist comment fix in this PR is correct.


Summary Table

Category High Medium Low Total
Bugs -- 2 2 4
Specification Compliance 2 3 1 6
Test Flaws -- 1 5 6
Test Coverage Gaps 1 5 7 13
Test Isolation -- 2 1 3
Performance -- 1 1 2
Security -- -- 4 4
Total 3 14 21 38

HIGH Severity

SC-1. Error codes are strings, not JSON-RPC 2.0 integers

File: src/cleveragents/a2a/errors.py:32-39, src/cleveragents/a2a/models.py:53

The specification (specification.md:43211-43229) requires JSON-RPC 2.0 integer error codes:

-32001 Entity not found       -32005 Already exists
-32002 Auth required           -32006 Budget exceeded
-32003 Forbidden               -32007 Version mismatch
-32004 Invalid state            -32008 Plan error

The implementation uses string codes ("NOT_FOUND", "PLAN_ERROR", etc.) and A2aErrorDetail.code is typed as str. Clients dispatching on integer codes per the spec will break.

SC-2. map_domain_error missing 2 spec-required exception mappings

File: src/cleveragents/a2a/errors.py:100-139

The specification maps DuplicateEntityError to -32005 and BudgetExceededError to -32006. Neither exception type appears in map_domain_error. These will silently fall through to the generic INTERNAL_ERROR catch-all.

TC-1. map_domain_error -- 3 of 8 exception branches untested

File: src/cleveragents/a2a/errors.py:131-136

No test exercises:

  • AuthenticationError -> AUTH_ERROR
  • AuthorizationError -> FORBIDDEN
  • ConfigurationError -> CONFIGURATION_ERROR

These are production error paths with zero coverage.


MEDIUM Severity

B-1. publish() iterates subscription dict directly -- re-entrancy crash

File: src/cleveragents/a2a/events.py:56

for sub_id, callback in self._subscriptions.items():

If any callback calls subscribe_local(), unsubscribe(), or close(), the dict is mutated during iteration, raising RuntimeError: dictionary changed size during iteration. This is a single-threaded re-entrancy bug, not just a threading concern.

Fix: for sub_id, callback in list(self._subscriptions.items()):

B-2. subscribe_local() succeeds on a closed queue

File: src/cleveragents/a2a/events.py:65-72

After close() sets _is_closed = True and clears _subscriptions, calling subscribe_local() silently adds a new entry to the cleared dict. The subscription will never receive events. No guard checks _is_closed.

SC-3. Operation names don't match specification convention

File: src/cleveragents/a2a/facade.py:57-69

The specification (specification.md:43091-43116) defines standard operations as session/new, session/load, session/prompt, etc. and extensions as _cleveragents/plan/use, _cleveragents/plan/execute, etc. The implementation uses a different naming convention (session.create, plan.create, plan.execute, etc.) that doesn't align with either.

SC-4. No Agent Card model or /.well-known/agent.json discovery

The specification (specification.md:43187-43207) requires Agent Card discovery. No AgentCard model or discovery endpoint exists anywhere in the codebase.

SC-5. A2A-Version HTTP header not implemented

File: src/cleveragents/a2a/versioning.py

Version negotiation logic exists but is not wired to the HTTP transport layer. The specification (specification.md:43234) requires A2A-Version header on requests/responses.

TC-2. 5 A2aEventQueue input validation guards untested

File: src/cleveragents/a2a/events.py:48-49, 67-68, 76-77, 85-86, 111-112

Guards for publish() with non-A2aEvent, subscribe_local() with non-callable, unsubscribe() with empty string, get_events() with invalid limit, and subscribe_remote() with empty endpoint are all untested.

TC-3. _cleanup_session_devcontainers not tested via facade

File: src/cleveragents/a2a/facade.py:252-279

The session.close handler calls _cleanup_session_devcontainers() which dynamically imports CleanupService. This entire code path (deferred import, try/except, logging) is never exercised by any A2A test.

TC-4. A2aHttpTransport and A2aVersionNegotiator input guards untested

Files: src/cleveragents/a2a/transport.py:30-31, 47-48, src/cleveragents/a2a/versioning.py:31-32, 44-45

Validation guards for send() with non-A2aRequest, connect() with empty URL, negotiate() with empty string, and is_supported() with empty string have zero test coverage.

TC-5. map_domain_error catch-all paths untested

File: src/cleveragents/a2a/errors.py:120-121, 137-139

The TypeError guard for non-Exception input (line 120), the generic CleverAgentsError fallback (line 137), and the plain Exception catch-all (line 139) are untested.

TC-6. Weak error assertions -- no error code validation

File: features/a2a_facade_wiring.feature:23-26, 39-42

session.close and plan.create validation failure tests only assert status == "error" without checking the error code or message. A bug that returns the wrong error code would pass.

P-1. Unbounded _events list growth -- memory leak

File: src/cleveragents/a2a/events.py:31, 50

self._events grows without limit. No max size, TTL, or eviction policy. In a long-running process, this accumulates all events forever.

TF-1. No-op assertion -- step_service_registered always passes

File: features/steps/a2a_facade_steps.py:57-59

def step_service_registered(context: Context) -> None:
    pass  # verifies nothing

Should assert the service is actually registered in the facade's service dict.

TI-1. Benchmark queue grows unboundedly across iterations

File: benchmarks/a2a_facade_bench.py:107-118

time_publish_single and time_publish_100 append to the queue across ASV iterations without clearing. Memory bloats and get_events() becomes progressively slower, producing non-representative benchmark results.

TI-2. Benchmark subscriber accumulation

File: benchmarks/a2a_facade_bench.py:117-118

time_subscribe_local adds a new subscriber per ASV iteration. After thousands of iterations, every publish() call fans out to thousands of callbacks, completely skewing timing results.


LOW Severity

B-3. 6 facade stub handlers skip validation

File: src/cleveragents/a2a/facade.py:242, 288, 303, 313, 328, 342

When services are None, stub handlers for session.close, plan.create, plan.execute, plan.status, plan.diff, and plan.apply return success responses without validating required parameters. The service-present paths validate. This creates behavioral divergence between modes -- tests written against stubs accept empty plan_id/session_id that would fail with real services.

B-4. A2aOperationNotFoundError bypasses error response wrapping

File: src/cleveragents/a2a/facade.py:138-139

All other exceptions get wrapped in A2aResponse(status="error"). This error is re-raised directly, forcing callers to handle two different error shapes.

S-1. No length/charset constraint on operation name

File: src/cleveragents/a2a/models.py:76-81

The _operation_non_empty validator only checks non-empty. An operation string with \n characters could inject fake log lines via structlog output. Recommend: re.match(r'^[a-z][a-z0-9_.]{0,63}$', value).

S-2. params and auth fields accept arbitrary unvalidated data

File: src/cleveragents/a2a/models.py:73-74

auth: dict[str, Any] | None has no schema or key constraints. Low risk in local mode but a deserialization concern if the model handles external HTTP input.

S-3. No SSRF protection on server_url

File: src/cleveragents/a2a/server_config.py:34-42

Validates http/https scheme but accepts http://169.254.169.254/, http://localhost:6379/, etc. Design gap to fix before server mode goes live.

S-4. str(exc) in error mapping could leak sensitive info

File: src/cleveragents/a2a/errors.py:124-139

If domain exceptions include tokens/credentials in their message strings, str(exc) propagates them into A2aResponse.error.message and into logs at facade.py:155.

  • docs/adr/ADR-047-acp-standard-adoption.md should be ADR-047-a2a-standard-adoption.md
  • docs/adr/ADR-026-agent-client-protocol.md should be ADR-026-agent-to-agent-protocol.md
  • Referenced in mkdocs.yml:70 and specification.md at lines 55, 23304, 23373, 42992, 43253

TF-2. Trivially true assertions in facade steps

File: features/steps/a2a_facade_steps.py:399-401, 425-427

assert context.response is not None and assert context.error_detail is not None can never fail once the preceding step assigns them.

TF-3. Type-coercing assertion hides type mismatch

File: features/a2a_facade_wiring.feature:93, step at a2a_facade_wiring_steps.py:213-215

Uses str(actual) == value which converts True (bool) to "True" (string), masking type-correctness issues. Compare to robot/helper_a2a_facade_wiring.py:196 which correctly uses is True.

TF-4. Tautological protocol-default-body tests

File: features/steps/a2a_clients_coverage_boost_steps.py:133-137

Protocol default bodies are ... (Ellipsis) which is a no-op. Testing "no exception was thrown" is trivially guaranteed for 7 scenarios.

TF-5. Hardcoded magic number 11 for operation count

File: robot/helper_a2a_facade.py:95-96

len(ops) == 11 breaks silently if _SUPPORTED_OPERATIONS changes. Should derive from source of truth.

TF-6. Mock delete not verified with correct argument

File: features/steps/a2a_facade_wiring_steps.py:77-81

session.close test doesn't assert svc.delete.assert_called_once_with("MOCK-SESSION-001"). A bug that passes the wrong session_id would be undetected.

TC-7 through TC-13. Additional coverage gaps (low priority)

  • Plan stub-mode operations (plan.execute/status/diff/apply without service) untested
  • Registry filter parameters (namespace, type_name) untested
  • EventQueue callback error handling (failing callback not preventing delivery) untested
  • Weak stub-fallback assertions (missing plan_id, tools list checks)
  • close() double-call idempotency untested
  • Publish-after-close with active subscribers untested
  • map_domain_error with generic RuntimeError untested

TI-3. Triplicated mock infrastructure

Files: a2a_facade_coverage_boost_steps.py:34-54, a2a_facade_wiring_steps.py:42-90, robot/helper_a2a_facade_wiring.py:36-95

Three files define near-identical _MockPlanIdentity, _MockPlan, and builder functions. Divergence risk on service contract changes.

P-2. disconnect() raises in teardown contexts

File: src/cleveragents/a2a/transport.py:41-47

A2aHttpTransport.disconnect() raises A2aNotAvailableError. If called in a finally block, this masks the original exception. Consider returning silently like is_connected() returns False.


Positive Findings

  • Rename completeness: All Python, Feature, Robot, JSON, and config files use correct A2A naming. Zero stale ACP references in code.
  • CLI integration: server.py, lsp.py, and lsp/server.py all have correct A2A imports and no stale references.
  • No import cycles: The deferred CleanupService import in facade.py is safe.
  • Handler map caching: The PERF-1 fix with invalidation on register_service is correctly implemented.
  • Error hierarchy: Exception class hierarchy mirrors the project-wide CleverAgentsError pattern correctly.

Reviewed by automated code review agent. 3 global review cycles performed across all categories (bugs, security, spec compliance, test flaws, test coverage, test isolation, performance). Cross-referenced against docs/specification.md (sections 43087-43248 for A2A architecture) and issue #689 acceptance criteria.

# Code Review Report -- PR #737 / Branch `refactor/m7-acp-to-a2a-tests` **Scope:** Full review of the ACP-to-A2A rename (PR #705 commit `ec0b7631`, PR #737 commit `54e52b57`), cross-referenced against `docs/specification.md` and issue #689 acceptance criteria. **Review cycles:** 3 global passes covering bugs, security, specification compliance, test flaws, test coverage gaps, test isolation, and performance. **Rename status:** The ACP-to-A2A rename in source code, test files, feature files, Robot files, and JSON fixtures is **complete**. Zero stale `Acp*` class names, `acp.*` structlog events, or `acp_version` field names remain in code. The vulture whitelist comment fix in this PR is correct. --- ## Summary Table | Category | High | Medium | Low | Total | |---|---|---|---|---| | Bugs | -- | 2 | 2 | 4 | | Specification Compliance | 2 | 3 | 1 | 6 | | Test Flaws | -- | 1 | 5 | 6 | | Test Coverage Gaps | 1 | 5 | 7 | 13 | | Test Isolation | -- | 2 | 1 | 3 | | Performance | -- | 1 | 1 | 2 | | Security | -- | -- | 4 | 4 | | **Total** | **3** | **14** | **21** | **38** | --- ## HIGH Severity ### SC-1. Error codes are strings, not JSON-RPC 2.0 integers **File:** `src/cleveragents/a2a/errors.py:32-39`, `src/cleveragents/a2a/models.py:53` The specification (`specification.md:43211-43229`) requires JSON-RPC 2.0 **integer** error codes: ``` -32001 Entity not found -32005 Already exists -32002 Auth required -32006 Budget exceeded -32003 Forbidden -32007 Version mismatch -32004 Invalid state -32008 Plan error ``` The implementation uses **string** codes (`"NOT_FOUND"`, `"PLAN_ERROR"`, etc.) and `A2aErrorDetail.code` is typed as `str`. Clients dispatching on integer codes per the spec will break. ### SC-2. `map_domain_error` missing 2 spec-required exception mappings **File:** `src/cleveragents/a2a/errors.py:100-139` The specification maps `DuplicateEntityError` to `-32005` and `BudgetExceededError` to `-32006`. Neither exception type appears in `map_domain_error`. These will silently fall through to the generic `INTERNAL_ERROR` catch-all. ### TC-1. `map_domain_error` -- 3 of 8 exception branches untested **File:** `src/cleveragents/a2a/errors.py:131-136` No test exercises: - `AuthenticationError` -> `AUTH_ERROR` - `AuthorizationError` -> `FORBIDDEN` - `ConfigurationError` -> `CONFIGURATION_ERROR` These are production error paths with zero coverage. --- ## MEDIUM Severity ### B-1. `publish()` iterates subscription dict directly -- re-entrancy crash **File:** `src/cleveragents/a2a/events.py:56` ```python for sub_id, callback in self._subscriptions.items(): ``` If any callback calls `subscribe_local()`, `unsubscribe()`, or `close()`, the dict is mutated during iteration, raising `RuntimeError: dictionary changed size during iteration`. This is a **single-threaded re-entrancy bug**, not just a threading concern. **Fix:** `for sub_id, callback in list(self._subscriptions.items()):` ### B-2. `subscribe_local()` succeeds on a closed queue **File:** `src/cleveragents/a2a/events.py:65-72` After `close()` sets `_is_closed = True` and clears `_subscriptions`, calling `subscribe_local()` silently adds a new entry to the cleared dict. The subscription will never receive events. No guard checks `_is_closed`. ### SC-3. Operation names don't match specification convention **File:** `src/cleveragents/a2a/facade.py:57-69` The specification (`specification.md:43091-43116`) defines standard operations as `session/new`, `session/load`, `session/prompt`, etc. and extensions as `_cleveragents/plan/use`, `_cleveragents/plan/execute`, etc. The implementation uses a different naming convention (`session.create`, `plan.create`, `plan.execute`, etc.) that doesn't align with either. ### SC-4. No Agent Card model or `/.well-known/agent.json` discovery The specification (`specification.md:43187-43207`) requires Agent Card discovery. No `AgentCard` model or discovery endpoint exists anywhere in the codebase. ### SC-5. `A2A-Version` HTTP header not implemented **File:** `src/cleveragents/a2a/versioning.py` Version negotiation logic exists but is not wired to the HTTP transport layer. The specification (`specification.md:43234`) requires `A2A-Version` header on requests/responses. ### TC-2. 5 `A2aEventQueue` input validation guards untested **File:** `src/cleveragents/a2a/events.py:48-49, 67-68, 76-77, 85-86, 111-112` Guards for `publish()` with non-A2aEvent, `subscribe_local()` with non-callable, `unsubscribe()` with empty string, `get_events()` with invalid limit, and `subscribe_remote()` with empty endpoint are all untested. ### TC-3. `_cleanup_session_devcontainers` not tested via facade **File:** `src/cleveragents/a2a/facade.py:252-279` The `session.close` handler calls `_cleanup_session_devcontainers()` which dynamically imports `CleanupService`. This entire code path (deferred import, try/except, logging) is never exercised by any A2A test. ### TC-4. `A2aHttpTransport` and `A2aVersionNegotiator` input guards untested **Files:** `src/cleveragents/a2a/transport.py:30-31, 47-48`, `src/cleveragents/a2a/versioning.py:31-32, 44-45` Validation guards for `send()` with non-A2aRequest, `connect()` with empty URL, `negotiate()` with empty string, and `is_supported()` with empty string have zero test coverage. ### TC-5. `map_domain_error` catch-all paths untested **File:** `src/cleveragents/a2a/errors.py:120-121, 137-139` The `TypeError` guard for non-Exception input (line 120), the generic `CleverAgentsError` fallback (line 137), and the plain `Exception` catch-all (line 139) are untested. ### TC-6. Weak error assertions -- no error code validation **File:** `features/a2a_facade_wiring.feature:23-26, 39-42` `session.close` and `plan.create` validation failure tests only assert `status == "error"` without checking the error code or message. A bug that returns the wrong error code would pass. ### P-1. Unbounded `_events` list growth -- memory leak **File:** `src/cleveragents/a2a/events.py:31, 50` `self._events` grows without limit. No max size, TTL, or eviction policy. In a long-running process, this accumulates all events forever. ### TF-1. No-op assertion -- `step_service_registered` always passes **File:** `features/steps/a2a_facade_steps.py:57-59` ```python def step_service_registered(context: Context) -> None: pass # verifies nothing ``` Should assert the service is actually registered in the facade's service dict. ### TI-1. Benchmark queue grows unboundedly across iterations **File:** `benchmarks/a2a_facade_bench.py:107-118` `time_publish_single` and `time_publish_100` append to the queue across ASV iterations without clearing. Memory bloats and `get_events()` becomes progressively slower, producing non-representative benchmark results. ### TI-2. Benchmark subscriber accumulation **File:** `benchmarks/a2a_facade_bench.py:117-118` `time_subscribe_local` adds a new subscriber per ASV iteration. After thousands of iterations, every `publish()` call fans out to thousands of callbacks, completely skewing timing results. --- ## LOW Severity ### B-3. 6 facade stub handlers skip validation **File:** `src/cleveragents/a2a/facade.py:242, 288, 303, 313, 328, 342` When services are `None`, stub handlers for `session.close`, `plan.create`, `plan.execute`, `plan.status`, `plan.diff`, and `plan.apply` return success responses without validating required parameters. The service-present paths validate. This creates behavioral divergence between modes -- tests written against stubs accept empty `plan_id`/`session_id` that would fail with real services. ### B-4. `A2aOperationNotFoundError` bypasses error response wrapping **File:** `src/cleveragents/a2a/facade.py:138-139` All other exceptions get wrapped in `A2aResponse(status="error")`. This error is re-raised directly, forcing callers to handle two different error shapes. ### S-1. No length/charset constraint on operation name **File:** `src/cleveragents/a2a/models.py:76-81` The `_operation_non_empty` validator only checks non-empty. An operation string with `\n` characters could inject fake log lines via structlog output. Recommend: `re.match(r'^[a-z][a-z0-9_.]{0,63}$', value)`. ### S-2. `params` and `auth` fields accept arbitrary unvalidated data **File:** `src/cleveragents/a2a/models.py:73-74` `auth: dict[str, Any] | None` has no schema or key constraints. Low risk in local mode but a deserialization concern if the model handles external HTTP input. ### S-3. No SSRF protection on `server_url` **File:** `src/cleveragents/a2a/server_config.py:34-42` Validates http/https scheme but accepts `http://169.254.169.254/`, `http://localhost:6379/`, etc. Design gap to fix before server mode goes live. ### S-4. `str(exc)` in error mapping could leak sensitive info **File:** `src/cleveragents/a2a/errors.py:124-139` If domain exceptions include tokens/credentials in their message strings, `str(exc)` propagates them into `A2aResponse.error.message` and into logs at `facade.py:155`. ### SC-6. 2 stale ADR filenames (outside PR scope but related) - `docs/adr/ADR-047-acp-standard-adoption.md` should be `ADR-047-a2a-standard-adoption.md` - `docs/adr/ADR-026-agent-client-protocol.md` should be `ADR-026-agent-to-agent-protocol.md` - Referenced in `mkdocs.yml:70` and `specification.md` at lines 55, 23304, 23373, 42992, 43253 ### TF-2. Trivially true assertions in facade steps **File:** `features/steps/a2a_facade_steps.py:399-401, 425-427` `assert context.response is not None` and `assert context.error_detail is not None` can never fail once the preceding step assigns them. ### TF-3. Type-coercing assertion hides type mismatch **File:** `features/a2a_facade_wiring.feature:93`, step at `a2a_facade_wiring_steps.py:213-215` Uses `str(actual) == value` which converts `True` (bool) to `"True"` (string), masking type-correctness issues. Compare to `robot/helper_a2a_facade_wiring.py:196` which correctly uses `is True`. ### TF-4. Tautological protocol-default-body tests **File:** `features/steps/a2a_clients_coverage_boost_steps.py:133-137` Protocol default bodies are `...` (Ellipsis) which is a no-op. Testing "no exception was thrown" is trivially guaranteed for 7 scenarios. ### TF-5. Hardcoded magic number 11 for operation count **File:** `robot/helper_a2a_facade.py:95-96` `len(ops) == 11` breaks silently if `_SUPPORTED_OPERATIONS` changes. Should derive from source of truth. ### TF-6. Mock `delete` not verified with correct argument **File:** `features/steps/a2a_facade_wiring_steps.py:77-81` `session.close` test doesn't assert `svc.delete.assert_called_once_with("MOCK-SESSION-001")`. A bug that passes the wrong `session_id` would be undetected. ### TC-7 through TC-13. Additional coverage gaps (low priority) - Plan stub-mode operations (`plan.execute/status/diff/apply` without service) untested - Registry filter parameters (`namespace`, `type_name`) untested - EventQueue callback error handling (failing callback not preventing delivery) untested - Weak stub-fallback assertions (missing plan_id, tools list checks) - `close()` double-call idempotency untested - Publish-after-close with active subscribers untested - `map_domain_error` with generic `RuntimeError` untested ### TI-3. Triplicated mock infrastructure **Files:** `a2a_facade_coverage_boost_steps.py:34-54`, `a2a_facade_wiring_steps.py:42-90`, `robot/helper_a2a_facade_wiring.py:36-95` Three files define near-identical `_MockPlanIdentity`, `_MockPlan`, and builder functions. Divergence risk on service contract changes. ### P-2. `disconnect()` raises in teardown contexts **File:** `src/cleveragents/a2a/transport.py:41-47` `A2aHttpTransport.disconnect()` raises `A2aNotAvailableError`. If called in a `finally` block, this masks the original exception. Consider returning silently like `is_connected()` returns `False`. --- ## Positive Findings - **Rename completeness:** All Python, Feature, Robot, JSON, and config files use correct A2A naming. Zero stale ACP references in code. - **CLI integration:** `server.py`, `lsp.py`, and `lsp/server.py` all have correct A2A imports and no stale references. - **No import cycles:** The deferred `CleanupService` import in `facade.py` is safe. - **Handler map caching:** The PERF-1 fix with invalidation on `register_service` is correctly implemented. - **Error hierarchy:** Exception class hierarchy mirrors the project-wide `CleverAgentsError` pattern correctly. --- *Reviewed by automated code review agent. 3 global review cycles performed across all categories (bugs, security, spec compliance, test flaws, test coverage, test isolation, performance). Cross-referenced against `docs/specification.md` (sections 43087-43248 for A2A architecture) and issue #689 acceptance criteria.*
freemo added this to the v3.6.0 milestone 2026-03-12 20:22:22 +00:00
freemo left a comment

Review — PR #737: refactor(a2a): update stale ACP comments in vulture whitelist

Diff Summary

  • Files changed: 1 (vulture_whitelist.py)
  • Lines: +2 / −2

Commit Message Compliance

  • Issue #689 metadata: refactor(a2a): update test files for ACP to A2A rename
  • PR commit first line: refactor(a2a): update stale ACP comments in vulture whitelist
  • Match: Mismatch. The issue metadata prescribes refactor(a2a): update test files for ACP to A2A rename but the PR commit uses a different message. Per CONTRIBUTING.md, the commit message first line must match the issue metadata's Commit Message field exactly.

I understand the PR only covers the final 2 stale comments (the bulk of the rename was done in PR #705), so the commit message is more descriptively accurate for this PR's scope. However, the project convention requires it to match the issue metadata. Options:

  1. Amend the commit to use the prescribed message, or
  2. Update issue #689's metadata to reflect a revised commit message for this remaining cleanup PR.

Closes Keyword

  • PR body uses ISSUES CLOSED: #689 — This is not a standard Forgejo auto-close keyword. Forgejo recognizes: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved (followed by #N). Please change to Closes #689.

Label Compliance

Required Label Status
Type label Type/Task
Priority label Missing (issue has Priority/Medium)
MoSCoW label Missing (issue has MoSCoW/Must have)

Code Quality

  • The 2-line change itself is correct: replacing stale "ACP" with "A2A" in comment text at lines 633 and 957 of vulture_whitelist.py.
  • The audit table in the PR body documenting all 6 acceptance criteria is a nice touch.

Required Changes

  1. Fix commit message to match issue #689 metadata, or coordinate with the issue author to update the metadata.
  2. Replace ISSUES CLOSED: #689 with Closes #689 in the PR body.
  3. Add labels: Priority/Medium and MoSCoW/Must have.
## Review — PR #737: refactor(a2a): update stale ACP comments in vulture whitelist ### Diff Summary - **Files changed:** 1 (`vulture_whitelist.py`) - **Lines:** +2 / −2 ### Commit Message Compliance - **Issue #689 metadata:** `refactor(a2a): update test files for ACP to A2A rename` - **PR commit first line:** `refactor(a2a): update stale ACP comments in vulture whitelist` - **Match:** ❌ **Mismatch.** The issue metadata prescribes `refactor(a2a): update test files for ACP to A2A rename` but the PR commit uses a different message. Per CONTRIBUTING.md, the commit message first line must match the issue metadata's Commit Message field exactly. I understand the PR only covers the final 2 stale comments (the bulk of the rename was done in PR #705), so the commit message is more descriptively accurate for *this* PR's scope. However, the project convention requires it to match the issue metadata. Options: 1. Amend the commit to use the prescribed message, or 2. Update issue #689's metadata to reflect a revised commit message for this remaining cleanup PR. ### Closes Keyword - PR body uses `ISSUES CLOSED: #689` ❌ — This is **not** a standard Forgejo auto-close keyword. Forgejo recognizes: `close`, `closes`, `closed`, `fix`, `fixes`, `fixed`, `resolve`, `resolves`, `resolved` (followed by `#N`). Please change to `Closes #689`. ### Label Compliance | Required Label | Status | |---|---| | Type label | ✅ `Type/Task` | | Priority label | ❌ Missing (issue has `Priority/Medium`) | | MoSCoW label | ❌ Missing (issue has `MoSCoW/Must have`) | ### Code Quality - The 2-line change itself is correct: replacing stale "ACP" with "A2A" in comment text at lines 633 and 957 of `vulture_whitelist.py`. - The audit table in the PR body documenting all 6 acceptance criteria is a nice touch. ### Required Changes 1. **Fix commit message** to match issue #689 metadata, or coordinate with the issue author to update the metadata. 2. **Replace `ISSUES CLOSED: #689`** with `Closes #689` in the PR body. 3. **Add labels:** `Priority/Medium` and `MoSCoW/Must have`.
hamza.khyari force-pushed refactor/m7-acp-to-a2a-tests from 54e52b574c
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / build (pull_request) Successful in 18s
CI / quality (pull_request) Successful in 21s
CI / security (pull_request) Successful in 40s
CI / typecheck (pull_request) Successful in 1m13s
CI / unit_tests (pull_request) Successful in 3m6s
CI / docker (pull_request) Successful in 42s
CI / integration_tests (pull_request) Successful in 3m57s
CI / coverage (pull_request) Successful in 5m21s
CI / benchmark-regression (pull_request) Successful in 38m42s
to 0038b737c1
Some checks failed
CI / lint (pull_request) Successful in 14s
CI / typecheck (pull_request) Successful in 36s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 14s
CI / coverage (pull_request) Successful in 5m23s
CI / security (pull_request) Successful in 33s
CI / quality (pull_request) Successful in 15s
CI / benchmark-regression (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
2026-03-13 14:52:23 +00:00
Compare
hamza.khyari changed title from refactor(a2a): update stale ACP comments in vulture whitelist to refactor(a2a): update test files for ACP to A2A rename 2026-03-13 14:52:47 +00:00
Author
Member

Follow-up Issues from Code Review

The review by @CoreRasurae identified several pre-existing bugs and spec compliance gaps in the A2A module (src/cleveragents/a2a/). These are outside the scope of this PR (#689 — ACP→A2A test rename) and should be tracked as separate issues.

All findings below have been validated against the source code and docs/specification.md.

HIGH — Spec Compliance

1. Error codes are strings, not JSON-RPC 2.0 integers

  • errors.py:32-39 defines codes as "NOT_FOUND", "PLAN_ERROR", etc.
  • models.py:53 types A2aErrorDetail.code as str
  • Spec (specification.md:43211-43229) requires integer codes: -32001 (entity not found) through -32008 (plan error), plus reserved -32600 to -32603
  • Fix: change constants to int, change A2aErrorDetail.code: int, update map_domain_error return type to tuple[int, str]

2. map_domain_error missing spec-required exception mappings

  • Spec maps DuplicateEntityError-32005 and BudgetExceededError-32006
  • Neither exception class exists in src/cleveragents/core/exceptions.py
  • Neither mapping exists in errors.py:100-139
  • Fix: add both exception classes to core/exceptions.py, add branches in map_domain_error

3. 3 of 8 exception branches in map_domain_error untested

  • AuthenticationErrorAUTH_ERROR, AuthorizationErrorFORBIDDEN, ConfigurationErrorCONFIGURATION_ERROR have zero test coverage
  • Only 4 branches tested (in robot/helper_a2a_facade_wiring.py:216-232)

HIGH — Bugs

4. publish() re-entrancy crash

  • events.py:56: for sub_id, callback in self._subscriptions.items() iterates dict directly
  • If any callback calls subscribe_local(), unsubscribe(), or close(), raises RuntimeError: dictionary changed size during iteration
  • Fix: list(self._subscriptions.items())

5. subscribe_local() succeeds on a closed queue

  • events.py:65-72: no _is_closed guard — silently adds subscriptions that will never fire
  • publish() has the guard (line 46), subscribe_local() doesn't
  • Fix: add if self._is_closed: raise RuntimeError(...) guard

MEDIUM — Spec Compliance

6. Operation names don't match spec convention

  • facade.py:57-69 uses dot-delimited names: session.create, plan.execute, etc.
  • Spec (specification.md:23422-23433, 43107-43114) uses slash-delimited with _cleveragents/ prefix: _cleveragents/plan/use, _cleveragents/plan/execute, _cleveragents/registry/{entity}/list, etc.
  • This is a larger change with cascade across all test files

MEDIUM — Performance

7. Unbounded _events list growth

  • events.py:31,50: _events grows without limit; get_events() never removes; no max size/TTL/eviction
  • Only close() clears — in a long-running process, memory grows indefinitely
  • Fix: add max_size parameter with oldest-event eviction on overflow

LOW — Test Quality

8. No-op step assertion

  • a2a_facade_steps.py:57-59: step_service_registered body is pass — asserts nothing
  • Should verify the service is actually registered in the facade
## Follow-up Issues from Code Review The review by @CoreRasurae identified several pre-existing bugs and spec compliance gaps in the A2A module (`src/cleveragents/a2a/`). These are outside the scope of this PR (#689 — ACP→A2A test rename) and should be tracked as separate issues. All findings below have been validated against the source code and `docs/specification.md`. ### HIGH — Spec Compliance **1. Error codes are strings, not JSON-RPC 2.0 integers** - `errors.py:32-39` defines codes as `"NOT_FOUND"`, `"PLAN_ERROR"`, etc. - `models.py:53` types `A2aErrorDetail.code` as `str` - Spec (`specification.md:43211-43229`) requires integer codes: `-32001` (entity not found) through `-32008` (plan error), plus reserved `-32600` to `-32603` - Fix: change constants to `int`, change `A2aErrorDetail.code: int`, update `map_domain_error` return type to `tuple[int, str]` **2. `map_domain_error` missing spec-required exception mappings** - Spec maps `DuplicateEntityError` → `-32005` and `BudgetExceededError` → `-32006` - Neither exception class exists in `src/cleveragents/core/exceptions.py` - Neither mapping exists in `errors.py:100-139` - Fix: add both exception classes to `core/exceptions.py`, add branches in `map_domain_error` **3. 3 of 8 exception branches in `map_domain_error` untested** - `AuthenticationError` → `AUTH_ERROR`, `AuthorizationError` → `FORBIDDEN`, `ConfigurationError` → `CONFIGURATION_ERROR` have zero test coverage - Only 4 branches tested (in `robot/helper_a2a_facade_wiring.py:216-232`) ### HIGH — Bugs **4. `publish()` re-entrancy crash** - `events.py:56`: `for sub_id, callback in self._subscriptions.items()` iterates dict directly - If any callback calls `subscribe_local()`, `unsubscribe()`, or `close()`, raises `RuntimeError: dictionary changed size during iteration` - Fix: `list(self._subscriptions.items())` **5. `subscribe_local()` succeeds on a closed queue** - `events.py:65-72`: no `_is_closed` guard — silently adds subscriptions that will never fire - `publish()` has the guard (line 46), `subscribe_local()` doesn't - Fix: add `if self._is_closed: raise RuntimeError(...)` guard ### MEDIUM — Spec Compliance **6. Operation names don't match spec convention** - `facade.py:57-69` uses dot-delimited names: `session.create`, `plan.execute`, etc. - Spec (`specification.md:23422-23433, 43107-43114`) uses slash-delimited with `_cleveragents/` prefix: `_cleveragents/plan/use`, `_cleveragents/plan/execute`, `_cleveragents/registry/{entity}/list`, etc. - This is a larger change with cascade across all test files ### MEDIUM — Performance **7. Unbounded `_events` list growth** - `events.py:31,50`: `_events` grows without limit; `get_events()` never removes; no max size/TTL/eviction - Only `close()` clears — in a long-running process, memory grows indefinitely - Fix: add `max_size` parameter with oldest-event eviction on overflow ### LOW — Test Quality **8. No-op step assertion** - `a2a_facade_steps.py:57-59`: `step_service_registered` body is `pass` — asserts nothing - Should verify the service is actually registered in the facade
hamza.khyari force-pushed refactor/m7-acp-to-a2a-tests from 0038b737c1
Some checks failed
CI / lint (pull_request) Successful in 14s
CI / typecheck (pull_request) Successful in 36s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 14s
CI / coverage (pull_request) Successful in 5m23s
CI / security (pull_request) Successful in 33s
CI / quality (pull_request) Successful in 15s
CI / benchmark-regression (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
to 8b1dfe8510
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 17s
CI / e2e_tests (pull_request) Successful in 26s
CI / typecheck (pull_request) Successful in 32s
CI / security (pull_request) Successful in 32s
CI / unit_tests (pull_request) Successful in 2m21s
CI / integration_tests (pull_request) Successful in 3m2s
CI / docker (pull_request) Successful in 47s
CI / coverage (pull_request) Successful in 4m56s
CI / benchmark-regression (pull_request) Successful in 35m58s
2026-03-13 19:30:29 +00:00
Compare
brent.edwards approved these changes 2026-03-13 22:22:25 +00:00
Dismissed
brent.edwards left a comment

Code Review — PR #737: refactor(a2a): update test files for ACP to A2A rename

Reviewer: @brent.edwards
Commit reviewed: 8b1dfe85
Review cycles: 2 full passes + 4 parallel verification threads (ACP reference audit, vulture whitelist accuracy, issue #689 acceptance criteria, CONTRIBUTING.md compliance)


Verification of Previous Requested Changes (@freemo)

All three items from @freemo's REQUEST_CHANGES review (commit 54e52b57) have been addressed in the force-pushed commit 8b1dfe85:

Requested Change Status
Fix commit message to match issue #689 metadata Done — first line is now refactor(a2a): update test files for ACP to A2A rename
Replace ISSUES CLOSED: #689 with Closes #689 in PR body Done — PR body ends with Closes #689
Add Priority/Medium and MoSCoW/Must have labels Done — both labels present on the PR

Code Correctness

Diff scope: 1 file changed (vulture_whitelist.py), +2/−2 lines. Both changes are comment-only.

Change 1 (line 633): ACPA2A in facade wiring section header

Verified correct. The error codes and facade wiring symbols referenced by this section (NOT_FOUND, VALIDATION_ERROR, INVALID_STATE, PLAN_ERROR, AUTH_ERROR, FORBIDDEN, CONFIGURATION_ERROR, INTERNAL_ERROR, map_domain_error, _session_service, _plan_lifecycle_service, _tool_registry, _resource_registry_service, _event_queue, _cleanup_session_devcontainers) all live in src/cleveragents/a2a/errors.py and src/cleveragents/a2a/facade.py. All 15 whitelist entries below this comment are valid.

Change 2 (line 957): ACPA2A in LspServer facade property comment

Verified correct. src/cleveragents/lsp/server.py defines LspServer.facade as a lazy property that creates an A2aLocalFacade instance (imported from cleveragents.a2a.facade). The module docstring and property docstring both reference "A2A". The old "ACP" comment was stale.


Rename Completeness (Issue #689 Acceptance Criteria)

Exhaustive audit across features/, robot/, benchmarks/, and features/fixtures/ confirms all 6 statically verifiable acceptance criteria are satisfied:

# Criterion Result
1 All 13 test files renamed PASS — 13 a2a_* files found, zero acp_* files
2 All imports updated (cleveragents.acpcleveragents.a2a) PASS — 54 a2a import lines found, zero acp imports
3 Gherkin scenario names updated PASS — all feature files reference A2A
4 Robot test names updated PASS — all robot files reference A2A
5 JSON fixture content updated PASSa2a_facade_flows.json uses A2A naming
6 Zero ACP references in test files PASS — case-insensitive sweep returned zero matches

The bulk of the rename was completed in PR #705 (already on master). This PR correctly fixes the only 2 remaining stale comments in vulture_whitelist.py. After this PR merges, zero acp/ACP references will remain in any code or test file (only legitimate historical references in ADR docs and specification.md).


PR Metadata Compliance

Check Status
Commit first line matches issue #689 metadata PASS
PR body has Forgejo closing keyword (Closes #689) PASS
Type/Task label present PASS
Milestone v3.6.0 matches issue PASS
Branch name matches issue metadata PASS
Single commit, no merge commits PASS
Branch rebased on current master HEAD (232ef965) PASS
CONTRIBUTORS.md (hamza.khyari already listed) PASS
Conventional Changelog format PASS

Findings

P3:nit — No changelog entry

CONTRIBUTING.md PR requirement #6 requires a changelog update for every PR. This is a 2-line comment-only change with zero user-facing impact, so a changelog entry is arguably unnecessary. Noting for completeness — author's discretion.

CONTRIBUTING.md requires the PR to be configured as "blocking" issue #689 (and issue #689 to "depend on" this PR). Please confirm this dependency link is set with the correct direction.


Verdict

APPROVED. The code change is trivially correct, the ACP→A2A rename is verified complete across all test infrastructure, and all of @freemo's requested changes have been addressed. Only P3 nits remain, which are at the author's discretion per the review playbook.

Note: @freemo's REQUEST_CHANGES review from the previous commit is still active and will need to be re-approved or dismissed before merge.

Reviewed by @brent.edwards. 2 review cycles + 4 parallel verification threads (ACP reference audit, vulture whitelist symbol verification, issue #689 acceptance criteria, CONTRIBUTING.md compliance).

# Code Review — PR #737: refactor(a2a): update test files for ACP to A2A rename **Reviewer:** @brent.edwards **Commit reviewed:** `8b1dfe85` **Review cycles:** 2 full passes + 4 parallel verification threads (ACP reference audit, vulture whitelist accuracy, issue #689 acceptance criteria, CONTRIBUTING.md compliance) --- ## Verification of Previous Requested Changes (@freemo) All three items from @freemo's REQUEST_CHANGES review (commit `54e52b57`) have been addressed in the force-pushed commit `8b1dfe85`: | Requested Change | Status | |---|---| | Fix commit message to match issue #689 metadata | **Done** — first line is now `refactor(a2a): update test files for ACP to A2A rename` | | Replace `ISSUES CLOSED: #689` with `Closes #689` in PR body | **Done** — PR body ends with `Closes #689` | | Add `Priority/Medium` and `MoSCoW/Must have` labels | **Done** — both labels present on the PR | --- ## Code Correctness **Diff scope:** 1 file changed (`vulture_whitelist.py`), +2/−2 lines. Both changes are comment-only. ### Change 1 (line 633): `ACP` → `A2A` in facade wiring section header Verified correct. The error codes and facade wiring symbols referenced by this section (`NOT_FOUND`, `VALIDATION_ERROR`, `INVALID_STATE`, `PLAN_ERROR`, `AUTH_ERROR`, `FORBIDDEN`, `CONFIGURATION_ERROR`, `INTERNAL_ERROR`, `map_domain_error`, `_session_service`, `_plan_lifecycle_service`, `_tool_registry`, `_resource_registry_service`, `_event_queue`, `_cleanup_session_devcontainers`) all live in `src/cleveragents/a2a/errors.py` and `src/cleveragents/a2a/facade.py`. All 15 whitelist entries below this comment are valid. ### Change 2 (line 957): `ACP` → `A2A` in LspServer facade property comment Verified correct. `src/cleveragents/lsp/server.py` defines `LspServer.facade` as a lazy property that creates an `A2aLocalFacade` instance (imported from `cleveragents.a2a.facade`). The module docstring and property docstring both reference "A2A". The old "ACP" comment was stale. --- ## Rename Completeness (Issue #689 Acceptance Criteria) Exhaustive audit across `features/`, `robot/`, `benchmarks/`, and `features/fixtures/` confirms all 6 statically verifiable acceptance criteria are satisfied: | # | Criterion | Result | |---|---|---| | 1 | All 13 test files renamed | **PASS** — 13 `a2a_*` files found, zero `acp_*` files | | 2 | All imports updated (`cleveragents.acp` → `cleveragents.a2a`) | **PASS** — 54 `a2a` import lines found, zero `acp` imports | | 3 | Gherkin scenario names updated | **PASS** — all feature files reference A2A | | 4 | Robot test names updated | **PASS** — all robot files reference A2A | | 5 | JSON fixture content updated | **PASS** — `a2a_facade_flows.json` uses A2A naming | | 6 | Zero ACP references in test files | **PASS** — case-insensitive sweep returned zero matches | The bulk of the rename was completed in PR #705 (already on master). This PR correctly fixes the only 2 remaining stale comments in `vulture_whitelist.py`. After this PR merges, zero `acp`/`ACP` references will remain in any code or test file (only legitimate historical references in ADR docs and `specification.md`). --- ## PR Metadata Compliance | Check | Status | |---|---| | Commit first line matches issue #689 metadata | **PASS** | | PR body has Forgejo closing keyword (`Closes #689`) | **PASS** | | `Type/Task` label present | **PASS** | | Milestone `v3.6.0` matches issue | **PASS** | | Branch name matches issue metadata | **PASS** | | Single commit, no merge commits | **PASS** | | Branch rebased on current master HEAD (`232ef965`) | **PASS** | | CONTRIBUTORS.md (hamza.khyari already listed) | **PASS** | | Conventional Changelog format | **PASS** | --- ## Findings ### P3:nit — No changelog entry CONTRIBUTING.md PR requirement #6 requires a changelog update for every PR. This is a 2-line comment-only change with zero user-facing impact, so a changelog entry is arguably unnecessary. Noting for completeness — author's discretion. ### P3:nit — Verify Forgejo dependency link direction CONTRIBUTING.md requires the PR to be configured as "blocking" issue #689 (and issue #689 to "depend on" this PR). Please confirm this dependency link is set with the correct direction. --- ## Verdict **APPROVED.** The code change is trivially correct, the ACP→A2A rename is verified complete across all test infrastructure, and all of @freemo's requested changes have been addressed. Only P3 nits remain, which are at the author's discretion per the review playbook. Note: @freemo's REQUEST_CHANGES review from the previous commit is still active and will need to be re-approved or dismissed before merge. *Reviewed by @brent.edwards. 2 review cycles + 4 parallel verification threads (ACP reference audit, vulture whitelist symbol verification, issue #689 acceptance criteria, CONTRIBUTING.md compliance).*
brent.edwards left a comment

Approve

Approve
hamza.khyari force-pushed refactor/m7-acp-to-a2a-tests from 8b1dfe8510
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 17s
CI / e2e_tests (pull_request) Successful in 26s
CI / typecheck (pull_request) Successful in 32s
CI / security (pull_request) Successful in 32s
CI / unit_tests (pull_request) Successful in 2m21s
CI / integration_tests (pull_request) Successful in 3m2s
CI / docker (pull_request) Successful in 47s
CI / coverage (pull_request) Successful in 4m56s
CI / benchmark-regression (pull_request) Successful in 35m58s
to 4133c46aff
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 18s
CI / e2e_tests (pull_request) Successful in 25s
CI / typecheck (pull_request) Successful in 33s
CI / security (pull_request) Successful in 34s
CI / unit_tests (pull_request) Successful in 2m12s
CI / docker (pull_request) Successful in 8s
CI / integration_tests (pull_request) Successful in 2m37s
CI / coverage (pull_request) Successful in 5m6s
CI / benchmark-regression (pull_request) Successful in 35m40s
2026-03-14 01:29:49 +00:00
Compare
hamza.khyari scheduled this pull request to auto merge when all checks succeed 2026-03-14 01:31:05 +00:00
hamza.khyari deleted branch refactor/m7-acp-to-a2a-tests 2026-03-14 01:35:58 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core!737
No description provided.