test(e2e): add M6 autonomy acceptance suite #470

Merged
brent.edwards merged 8 commits from feature/m6-autonomy-smoke into master 2026-02-28 06:33:48 +00:00
Member

Summary

Closes #211

Adds comprehensive M6 autonomy acceptance test suites covering ACP facade flows, autonomy guardrails, and audit trail persistence.

Behave BDD Scenarios (52 scenarios)

  • ACP local facade dispatch for all 11 operations (session, plan, registry, context, event)
  • Guard enforcement (denylist, allowlist, cost budget, tool call limits, write/apply approval)
  • Automation profile built-in validation (8 profiles) and custom namespaced creation
  • Profile resolution precedence (plan > action > project > global)
  • Event queue lifecycle (publish, subscribe, unsubscribe, close, remote reject)
  • HTTP transport stub rejection in local mode (send, connect, disconnect)
  • ACP version negotiation (accept supported, reject unsupported)
  • ACP model validation (AcpRequest, AcpResponse, AcpEvent, AcpErrorDetail)
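
The precedence rule listed above (plan > action > project > global) can be sketched as a first-match lookup. This is a minimal illustration, not the project's resolver: `resolve_profile`, the `PRECEDENCE` tuple, and the profile names in the usage note are hypothetical, and only the scope ordering comes from this PR.

```python
from typing import Mapping, Optional

# Scope names in descending priority, as stated in the PR description.
PRECEDENCE = ("plan", "action", "project", "global")


def resolve_profile(configured: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return the profile set at the highest-priority scope, or None.

    Hypothetical helper: the real resolver's name and signature are not
    shown in this PR.
    """
    for scope in PRECEDENCE:
        profile = configured.get(scope)
        if profile is not None:
            return profile
    return None
```

Under this sketch, `resolve_profile({"global": "a", "project": "b"})` picks the project-level value, because `project` outranks `global`.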

Robot Integration Tests (11 tests)

  • Facade session/plan lifecycle and unknown operation error
  • Event queue publish/subscribe, transport stub, version negotiation
  • Guard denylist/budget enforcement, profile resolution
  • Fixture loading, full end-to-end flow
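
As a rough picture of what the guard scenarios exercise (denylist, allowlist, cost budget, tool call limit, and the no-guards case), here is a small sketch. Every name in it (`GuardConfig`, `check_tool_call`, the field names, the reason strings) is an assumption made for illustration; the suite's actual guard API is not shown in this PR, and the order of checks is a design choice of the sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class GuardConfig:
    """Hypothetical guard settings; field names are illustrative only."""

    denylist: FrozenSet[str] = frozenset()
    allowlist: Optional[FrozenSet[str]] = None  # None means no allowlist is set
    cost_budget: Optional[float] = None  # None means unlimited budget
    max_tool_calls: Optional[int] = None  # None means unlimited calls


def check_tool_call(
    cfg: GuardConfig, tool: str, *, spent: float, calls_made: int, cost: float
) -> Optional[str]:
    """Return a denial reason, or None when the call is allowed."""
    if tool in cfg.denylist:
        return "denylisted"
    if cfg.allowlist is not None and tool not in cfg.allowlist:
        return "not allowlisted"
    if cfg.max_tool_calls is not None and calls_made >= cfg.max_tool_calls:
        return "tool call limit reached"
    if cfg.cost_budget is not None and spent + cost > cfg.cost_budget:
        return "cost budget exceeded"
    return None
```

The denylist is checked before the allowlist here so that a tool appearing in both is still rejected; whether the real guards behave that way is not stated in the PR.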

ASV Performance Benchmarks (5 suites)

  • Facade dispatch operations
  • Guard evaluation (denylist, allowlist, budget, no-guards)
  • Profile resolution and service operations
  • Event queue publish/subscribe
  • Fixture loading overhead

Fixtures

  • features/fixtures/m6/acp_facade_flows.json — ACP operation flows
  • features/fixtures/m6/autonomy_guardrails.json — Guard configurations
  • features/fixtures/m6/automation_profiles.json — Profile assertions and precedence cases

Documentation

  • Updated docs/development/testing.md with M6 acceptance suite documentation
test(e2e): add M6 autonomy acceptance suite
All checks were successful
CI / lint (pull_request) Successful in 21s
CI / quality (pull_request) Successful in 28s
CI / security (pull_request) Successful in 31s
CI / benchmark-publish (pull_request) Has been skipped
CI / typecheck (pull_request) Successful in 53s
CI / build (pull_request) Successful in 24s
CI / integration_tests (pull_request) Successful in 4m25s
CI / unit_tests (pull_request) Successful in 23m33s
CI / docker (pull_request) Successful in 11s
CI / benchmark-regression (pull_request) Successful in 28m0s
CI / coverage (pull_request) Successful in 48m3s
348c230bc5
Add comprehensive M6 autonomy acceptance test suites covering the ACP
local-mode facade, autonomy guardrails, automation profile resolution,
event queue pub/sub, HTTP transport stub, and version negotiation.

Behave suite (52 scenarios):
- ACP facade dispatch for all 11 operations
- Guard enforcement (denylist, allowlist, budget, call limit, write/apply)
- Automation profile built-in validation and custom creation
- Profile resolution precedence (plan > action > project > global)
- Event queue lifecycle (publish, subscribe, unsubscribe, close)
- HTTP transport stub rejection in local mode
- ACP version negotiation (accept/reject)
- Model validation (AcpRequest, AcpResponse, AcpEvent, AcpErrorDetail)

Robot integration suite (11 tests):
- Facade session/plan lifecycle, unknown operation error
- Event queue publish/subscribe, transport stub, version negotiation
- Guard denylist/budget enforcement, profile resolution
- Fixture loading, full end-to-end flow

ASV benchmarks (5 suites):
- Facade dispatch, guard evaluation, profile resolution
- Event queue operations, fixture loading

Fixtures: acp_facade_flows.json, autonomy_guardrails.json,
automation_profiles.json

Closes #211
Merge branch 'master' into feature/m6-autonomy-smoke
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 13s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 18s
CI / typecheck (pull_request) Successful in 36s
CI / security (pull_request) Successful in 39s
CI / integration_tests (pull_request) Successful in 3m7s
CI / coverage (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
f90cf24617
freemo approved these changes 2026-02-27 22:22:34 +00:00
Dismissed
freemo left a comment

Review: PR #470 — test(e2e): add M6 autonomy acceptance suite

Overall Assessment: Approved

This is a solid, well-structured test PR. The M6 autonomy acceptance suite is comprehensive (52 Behave scenarios, 11 Robot tests, 5 ASV benchmark suites), well-organized, and follows the project's BDD conventions. The code quality is high and the documentation update is thorough.


CONTRIBUTING.md Compliance

| Requirement | Status | Notes |
|---|---|---|
| Detailed PR description | ✅ | Clear summary, scenario counts, fixture listing, issue reference |
| Issue reference (Closes #211) | ✅ | Present in both PR body and commit message |
| One Epic scope | ✅ | Focused on M6 autonomy acceptance testing |
| Atomic commit(s) | ✅ | Single logical unit: the entire test suite is one coherent change |
| Conventional Changelog format | ✅ | `test(e2e): add M6 autonomy acceptance suite` matches issue metadata exactly |
| Commit references ticket | ✅ | `Closes #211` in commit body |
| No build/install artifacts | ✅ | Clean |
| CONTRIBUTORS.md | ✅ | brent.edwards already listed |
| Version bump | N/A | Test-only change; no version bump needed per guidelines |
| File organization | ✅ | Features, steps, fixtures, robot, benchmarks, docs all in correct directories |
| BDD test organization | ✅ | Feature-named step file, `m6 smoke` prefix avoids AmbiguousStep conflicts |
| Changelog update | ⚠️ | No CHANGELOG.md update: CONTRIBUTING requirement #6 asks for a changelog entry per commit |
| Milestone | ⚠️ | PR has no milestone: Issue #211 is assigned to v3.5.0; the PR should match (requirement #11) |
| Type label | ⚠️ | No `Type/` label on the PR: should have a `Type/` label per requirement #12 |

Code Quality

  • Feature file (309 lines): Well-structured with clear section comments, proper Given/When/Then patterns, and comprehensive coverage of ACP facade, guardrails, profiles, events, transport, and version negotiation.
  • Step definitions (774 lines): Exceeds the 500-line guideline. This is acceptable given the breadth of scenarios (52) and the inherent verbosity of step definition files, but consider splitting into 2 files in a future PR if this grows further (e.g., m6_facade_steps.py and m6_guardrails_steps.py).
  • Robot suite (97 lines) + helper (356 lines): Clean separation of Robot keywords and Python helper subcommands. Each subcommand is self-contained with clear sentinel output.
  • ASV benchmarks (231 lines): Proper ASV class structure with setup/teardown methods. Good coverage of facade dispatch, guard evaluation, profile resolution, event queue, and fixture loading.
  • Fixtures: Well-structured JSON with descriptive names and clear purpose per file.
  • Documentation: Thorough update to docs/development/testing.md with overview, fixture table, suite breakdown, run commands, and triage tips.
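
For readers unfamiliar with the ASV class structure mentioned above: asv treats plain classes as benchmark suites, calling `setup` before and `teardown` after each benchmark, and timing methods whose names start with `time_`. The sketch below follows that convention for a fixture-loading benchmark. The class name, the temporary stand-in fixture, and its `session.create` payload are hypothetical; the PR's real benchmarks load the files under `features/fixtures/m6/`.

```python
import json
import os
import tempfile


class FixtureLoadingSuite:
    """ASV-style benchmark sketch; asv times methods named ``time_*``."""

    def setup(self):
        # Write a small stand-in fixture so the timed method measures
        # only the read and JSON parse, not fixture creation.
        self._tmpdir = tempfile.TemporaryDirectory()
        self.path = os.path.join(self._tmpdir.name, "fixture.json")
        with open(self.path, "w", encoding="utf-8") as fh:
            # Hypothetical payload, not the contents of the real fixtures.
            json.dump({"flows": [{"operation": "session.create"}]}, fh)

    def teardown(self):
        # Remove the temporary directory created in setup.
        self._tmpdir.cleanup()

    def time_load_fixture(self):
        with open(self.path, encoding="utf-8") as fh:
            json.load(fh)
```

Keeping file creation in `setup` rather than in the timed method is the main point of the structure: only the work inside `time_load_fixture` contributes to the reported timing.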

Type Safety & Error Handling

  • Type annotations used consistently throughout step definitions and helpers
  • Single # type: ignore[arg-type] used appropriately in a test that deliberately passes the wrong type
  • Error handling follows the pattern of catching specific exceptions in test assertions
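
The two patterns noted above (a targeted `# type: ignore[arg-type]` on a deliberately wrong argument, and catching one specific exception in the assertion) might look like the following. This is an illustrative sketch only: `parse_version` and the test name are hypothetical stand-ins, not code from this PR.

```python
from typing import Tuple


def parse_version(value: str) -> Tuple[int, int]:
    """Hypothetical function under test: parse a 'major.minor' string."""
    if not isinstance(value, str):
        raise TypeError(f"version must be str, got {type(value).__name__}")
    major, minor = value.split(".")
    return int(major), int(minor)


def test_rejects_non_string_version() -> None:
    try:
        # The wrong type is passed on purpose, so the type checker is
        # silenced for this one argument error only.
        parse_version(10)  # type: ignore[arg-type]
    except TypeError as exc:
        # Catch the specific exception rather than a bare ``except``.
        assert "str" in str(exc)
    else:
        raise AssertionError("expected TypeError for a non-string version")
```

Scoping the ignore to `arg-type` keeps any other type error on that line visible to the checker.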

Minor Suggestions (non-blocking)

  1. Add a CHANGELOG.md entry for this commit describing the M6 acceptance suite addition.
  2. Assign milestone v3.5.0 to this PR to match issue #211.
  3. Add a Type/ label (likely Type/Task or Type/Testing if available) to the PR.
  4. The step definitions file at 774 lines is functional but on the large side — consider splitting in a follow-up if more M6 scenarios are added.
@ -0,0 +1,774 @@
"""Step definitions for M6 autonomy acceptance smoke tests.
Owner

At 774 lines this exceeds the project's 500-line guideline. It works well as a single file today given the m6 smoke prefix grouping, but if more scenarios are added consider splitting into e.g. m6_facade_steps.py and m6_guardrails_steps.py.

brent.edwards added this to the v3.5.0 milestone 2026-02-27 22:30:29 +00:00
fix(test): address PR #470 review feedback
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / quality (pull_request) Successful in 28s
CI / security (pull_request) Successful in 31s
CI / build (pull_request) Successful in 26s
CI / typecheck (pull_request) Successful in 45s
CI / integration_tests (pull_request) Successful in 5m26s
CI / unit_tests (pull_request) Successful in 18m48s
CI / docker (pull_request) Successful in 14s
CI / benchmark-regression (pull_request) Successful in 28m19s
CI / coverage (pull_request) Successful in 1h26m48s
c129f4c3f0
- Add CHANGELOG.md entry for M6 autonomy acceptance suite
- Split m6_autonomy_acceptance_steps.py (774 lines) into
  m6_facade_steps.py (398 lines) and m6_guardrails_steps.py (399 lines)
  to comply with the project's 500-line guideline

Closes #211
brent.edwards dismissed freemo's review 2026-02-27 22:33:55 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

Author
Member

Done — split into m6_facade_steps.py (398 lines) and m6_guardrails_steps.py (399 lines), both well under the 500-line guideline. Behave auto-discovers steps from all files in steps/ so no feature file changes needed.

Author
Member

All review items addressed in commit c129f4c3:

| Item | Status |
|---|---|
| CHANGELOG.md entry | Added |
| Milestone v3.5.0 | Set via API |
| Type/Testing label | Added via API |
| Split step file (<500 lines) | `m6_facade_steps.py` (398) + `m6_guardrails_steps.py` (399) |

Lint and typecheck pass locally. CI triggered.

freemo approved these changes 2026-02-27 22:45:51 +00:00
Dismissed
brent.edwards scheduled this pull request to auto merge when all checks succeed 2026-02-27 22:46:45 +00:00
Merge branch 'master' into feature/m6-autonomy-smoke
All checks were successful
CI / lint (pull_request) Successful in 22s
CI / quality (pull_request) Successful in 23s
CI / benchmark-publish (pull_request) Has been skipped
CI / security (pull_request) Successful in 47s
CI / build (pull_request) Successful in 29s
CI / typecheck (pull_request) Successful in 58s
CI / integration_tests (pull_request) Successful in 5m17s
CI / unit_tests (pull_request) Successful in 27m17s
CI / docker (pull_request) Successful in 40s
CI / benchmark-regression (pull_request) Successful in 28m29s
CI / coverage (pull_request) Successful in 1h43m0s
158a99c543
Merge branch 'master' into feature/m6-autonomy-smoke
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / quality (pull_request) Successful in 17s
CI / build (pull_request) Successful in 23s
CI / security (pull_request) Successful in 30s
CI / typecheck (pull_request) Successful in 32s
CI / integration_tests (pull_request) Successful in 2m48s
CI / unit_tests (pull_request) Successful in 15m6s
CI / docker (pull_request) Successful in 1m1s
CI / benchmark-regression (pull_request) Successful in 20m38s
CI / coverage (pull_request) Has been cancelled
a2f0dadd3b
brent.edwards scheduled this pull request to auto merge when all checks succeed 2026-02-28 03:14:40 +00:00
Merge branch 'master' into feature/m6-autonomy-smoke
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 21s
CI / build (pull_request) Successful in 24s
CI / security (pull_request) Successful in 34s
CI / typecheck (pull_request) Successful in 1m0s
CI / integration_tests (pull_request) Successful in 4m28s
CI / unit_tests (pull_request) Successful in 12m28s
CI / docker (pull_request) Successful in 39s
CI / benchmark-regression (pull_request) Successful in 27m40s
CI / coverage (pull_request) Successful in 43m12s
3ec3cae203
Merge branch 'master' into feature/m6-autonomy-smoke
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 12s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 17s
CI / security (pull_request) Successful in 30s
CI / typecheck (pull_request) Successful in 31s
CI / integration_tests (pull_request) Successful in 2m59s
CI / unit_tests (pull_request) Successful in 11m20s
CI / docker (pull_request) Successful in 8s
CI / benchmark-regression (pull_request) Successful in 21m14s
CI / coverage (pull_request) Successful in 1h8m33s
e2129a26ba
refactor: merge from master
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / quality (pull_request) Successful in 16s
CI / build (pull_request) Successful in 23s
CI / typecheck (pull_request) Successful in 31s
CI / security (pull_request) Successful in 59s
CI / integration_tests (pull_request) Successful in 3m30s
CI / benchmark-regression (pull_request) Successful in 21m22s
CI / unit_tests (pull_request) Successful in 23m4s
CI / docker (pull_request) Successful in 16s
CI / coverage (pull_request) Successful in 43m38s
38e75342a9
This code is a merge from master.
brent.edwards dismissed freemo's review 2026-02-28 05:49:35 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

brent.edwards deleted branch feature/m6-autonomy-smoke 2026-02-28 06:33:48 +00:00
Reference
cleveragents/cleveragents-core!470