test(e2e): add M6 autonomy acceptance suite #470

2026-02-27T20:31:42Z

brent.edwards commented

2026-02-27 20:31:42 +00:00

Summary

Closes #211

Adds comprehensive M6 autonomy acceptance test suites covering ACP facade flows, autonomy guardrails, and audit trail persistence.

Behave BDD Scenarios (52 scenarios)

- ACP local facade dispatch for all 11 operations (session, plan, registry, context, event)
- Guard enforcement (denylist, allowlist, cost budget, tool call limits, write/apply approval)
- Automation profile built-in validation (8 profiles) and custom namespaced creation
- Profile resolution precedence (plan > action > project > global)
- Event queue lifecycle (publish, subscribe, unsubscribe, close, remote reject)
- HTTP transport stub rejection in local mode (send, connect, disconnect)
- ACP version negotiation (accept supported, reject unsupported)
- ACP model validation (AcpRequest, AcpResponse, AcpEvent, AcpErrorDetail)
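The precedence rule exercised by the resolution scenarios (plan > action > project > global) amounts to "the highest scope that sets a profile wins". Below is a minimal sketch of that rule. The function `resolve_profile`, the dict-of-scopes input shape, and the profile names `safe`, `ci`, and `trusted` are hypothetical illustrations, not the project's actual resolver API.

```python
from typing import Mapping, Optional

# Precedence order, highest first, as stated in the PR: plan > action > project > global.
PRECEDENCE = ("plan", "action", "project", "global")


def resolve_profile(scopes: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return the profile name set by the highest-precedence scope, or None.

    `scopes` maps a scope name to a profile name (None or missing means unset).
    Hypothetical helper for illustration; not the project's resolver.
    """
    for scope in PRECEDENCE:
        profile = scopes.get(scope)
        if profile:
            return profile
    return None


if __name__ == "__main__":
    # A plan-level profile overrides every lower scope.
    print(resolve_profile({"global": "safe", "project": "ci", "plan": "trusted"}))
    # With no plan or action profile, the project profile beats the global one.
    print(resolve_profile({"global": "safe", "project": "ci"}))
```

Run as a script, this prints `trusted` and then `ci`. A scenario table of (scopes, expected profile) pairs maps naturally onto this kind of lookup.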

Robot Integration Tests (11 tests)

- Facade session/plan lifecycle and unknown operation error
- Event queue publish/subscribe, transport stub, version negotiation
- Guard denylist/budget enforcement, profile resolution
- Fixture loading, full end-to-end flow

ASV Performance Benchmarks (5 suites)

- Facade dispatch operations
- Guard evaluation (denylist, allowlist, budget, no-guards)
- Profile resolution and service operations
- Event queue publish/subscribe
- Fixture loading overhead
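For readers unfamiliar with ASV, the guard-evaluation suite follows ASV's naming conventions: ASV calls a class's `setup` method before timing, and it times every method whose name starts with `time_`. The sketch below shows that structure only. `GuardConfig`, `evaluate_guards`, and the tool names are hypothetical stand-ins, not the project's guard model, and the budget check (deny once accumulated cost exceeds `max_cost`) is an assumed semantics.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class GuardConfig:
    """Hypothetical guard settings; not the project's actual model."""

    denylist: FrozenSet[str] = frozenset()
    allowlist: Optional[FrozenSet[str]] = None  # None means "allow anything not denied"
    max_cost: Optional[float] = None  # None means "no cost budget"


def evaluate_guards(config: GuardConfig, tool: str, cost_so_far: float) -> bool:
    """Return True if the tool call is permitted under the hypothetical config."""
    if tool in config.denylist:
        return False
    if config.allowlist is not None and tool not in config.allowlist:
        return False
    if config.max_cost is not None and cost_so_far > config.max_cost:
        return False
    return True


class GuardEvaluationSuite:
    """ASV runs `setup` first, then times each `time_*` method."""

    def setup(self) -> None:
        self.no_guards = GuardConfig()
        self.denylist = GuardConfig(denylist=frozenset({"shell.exec", "fs.delete"}))
        self.allowlist = GuardConfig(allowlist=frozenset({"fs.read", "search"}))
        self.budget = GuardConfig(max_cost=10.0)

    def time_no_guards(self) -> None:
        evaluate_guards(self.no_guards, "fs.read", 0.0)

    def time_denylist(self) -> None:
        evaluate_guards(self.denylist, "shell.exec", 0.0)

    def time_allowlist(self) -> None:
        evaluate_guards(self.allowlist, "fs.write", 0.0)

    def time_budget(self) -> None:
        evaluate_guards(self.budget, "fs.read", 12.5)
```

A benchmark file like this needs no `import asv`, because ASV discovers benchmarks by name. Keeping the `no_guards` case alongside the others gives a baseline for the cost of each guard type.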

Fixtures

- `features/fixtures/m6/acp_facade_flows.json`: ACP operation flows
- `features/fixtures/m6/autonomy_guardrails.json`: Guard configurations
- `features/fixtures/m6/automation_profiles.json`: Profile assertions and precedence cases
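All three fixtures live in one directory, so the steps, the Robot helper, and the benchmarks can share a single loader. The sketch below is a hypothetical `load_m6_fixture` helper. Only the directory path comes from the list above. The JSON schema of each file is not shown here, so the loader simply returns the parsed document.

```python
import json
from pathlib import Path
from typing import Any

# Directory taken from the fixture paths above; the loader itself is hypothetical.
FIXTURE_DIR = Path("features/fixtures/m6")


def load_m6_fixture(name: str, root: Path = FIXTURE_DIR) -> Any:
    """Load one M6 fixture by file stem, e.g. "acp_facade_flows"."""
    path = root / f"{name}.json"
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Because `root` is a parameter, tests and benchmarks can point the loader at a temporary directory instead of the repository tree.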

Documentation

- Updated `docs/development/testing.md` with M6 acceptance suite documentation


brent.edwards self-assigned this 2026-02-27 20:31:42 +00:00

brent.edwards added 1 commit 2026-02-27 20:31:42 +00:00

test(e2e): add M6 autonomy acceptance suite

CI / lint (pull_request) Successful in 21s
CI / quality (pull_request) Successful in 28s
CI / security (pull_request) Successful in 31s
CI / benchmark-publish (pull_request) Has been skipped
CI / typecheck (pull_request) Successful in 53s
CI / build (pull_request) Successful in 24s
CI / integration_tests (pull_request) Successful in 4m25s
CI / unit_tests (pull_request) Successful in 23m33s
CI / docker (pull_request) Successful in 11s
CI / benchmark-regression (pull_request) Successful in 28m0s
CI / coverage (pull_request) Successful in 48m3s

348c230bc5

Add comprehensive M6 autonomy acceptance test suites covering the ACP
local-mode facade, autonomy guardrails, automation profile resolution,
event queue pub/sub, HTTP transport stub, and version negotiation.

Behave suite (52 scenarios):
- ACP facade dispatch for all 11 operations
- Guard enforcement (denylist, allowlist, budget, call limit, write/apply)
- Automation profile built-in validation and custom creation
- Profile resolution precedence (plan > action > project > global)
- Event queue lifecycle (publish, subscribe, unsubscribe, close)
- HTTP transport stub rejection in local mode
- ACP version negotiation (accept/reject)
- Model validation (AcpRequest, AcpResponse, AcpEvent, AcpErrorDetail)

Robot integration suite (11 tests):
- Facade session/plan lifecycle, unknown operation error
- Event queue publish/subscribe, transport stub, version negotiation
- Guard denylist/budget enforcement, profile resolution
- Fixture loading, full end-to-end flow

ASV benchmarks (5 suites):
- Facade dispatch, guard evaluation, profile resolution
- Event queue operations, fixture loading

Fixtures: acp_facade_flows.json, autonomy_guardrails.json,
automation_profiles.json

Closes #211

brent.edwards added 1 commit 2026-02-27 22:14:57 +00:00

Merge branch 'master' into feature/m6-autonomy-smoke

CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 13s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 18s
CI / typecheck (pull_request) Successful in 36s
CI / security (pull_request) Successful in 39s
CI / integration_tests (pull_request) Successful in 3m7s
CI / coverage (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled

f90cf24617

freemo approved these changes 2026-02-27 22:22:34 +00:00

Dismissed

freemo left a comment

Review: PR #470 — test(e2e): add M6 autonomy acceptance suite

Overall Assessment: Approved

This is a solid, well-structured test PR. The M6 autonomy acceptance suite is comprehensive (52 Behave scenarios, 11 Robot tests, 5 ASV benchmark suites), well-organized, and follows the project's BDD conventions. The code quality is high and the documentation update is thorough.

CONTRIBUTING.md Compliance

Requirement	Status	Notes
Detailed PR description	✅	Clear summary, scenario counts, fixture listing, issue reference
Issue reference (Closes #211)	✅	Present in both PR body and commit message
One Epic scope	✅	Focused on M6 autonomy acceptance testing
Atomic commit(s)	✅	Single logical unit — entire test suite is one coherent change
Conventional Changelog format	✅	`test(e2e): add M6 autonomy acceptance suite` matches issue metadata exactly
Commit references ticket	✅	`Closes #211` in commit body
No build/install artifacts	✅	Clean
CONTRIBUTORS.md	✅	brent.edwards already listed
Version bump	N/A	Test-only change; no version bump needed per guidelines
File organization	✅	Features, steps, fixtures, robot, benchmarks, docs all in correct directories
BDD test organization	✅	Feature-named step file, `m6 smoke` prefix avoids AmbiguousStep conflicts
Changelog update	⚠️	No CHANGELOG.md update — CONTRIBUTING requirement #6 asks for a changelog entry per commit
Milestone	⚠️	PR has no milestone — Issue #211 is assigned to v3.5.0; the PR should match (requirement #11)
Type label	⚠️	No `Type/` label on the PR — should have a `Type/` label per requirement #12

Code Quality

- Feature file (309 lines): Well-structured with clear section comments, proper Given/When/Then patterns, and comprehensive coverage of ACP facade, guardrails, profiles, events, transport, and version negotiation.
- Step definitions (774 lines): Exceeds the 500-line guideline. This is acceptable given the breadth of scenarios (52) and the inherent verbosity of step definition files, but consider splitting into two files in a future PR if this grows further (e.g., `m6_facade_steps.py` and `m6_guardrails_steps.py`).
- Robot suite (97 lines) + helper (356 lines): Clean separation of Robot keywords and Python helper subcommands. Each subcommand is self-contained with clear sentinel output.
- ASV benchmarks (231 lines): Proper ASV class structure with setup/teardown methods. Good coverage of facade dispatch, guard evaluation, profile resolution, event queue, and fixture loading.
- Fixtures: Well-structured JSON with descriptive names and clear purpose per file.
- Documentation: Thorough update to `docs/development/testing.md` with overview, fixture table, suite breakdown, run commands, and triage tips.
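The "self-contained subcommand with sentinel output" pattern praised above can be sketched as follows. Each subcommand runs one check and prints a single known string that the Robot test looks for in stdout. The subcommand names, check functions, and sentinel strings here are hypothetical placeholders, not the helper's real ones.

```python
import argparse
import sys
from typing import Callable, Dict, List, Optional


def check_facade() -> str:
    # Placeholder: a real helper would dispatch ACP operations and verify results here.
    return "M6_FACADE_OK"


def check_guards() -> str:
    # Placeholder: a real helper would evaluate a guard configuration here.
    return "M6_GUARDS_OK"


# Hypothetical subcommand names mapped to self-contained check functions.
COMMANDS: Dict[str, Callable[[], str]] = {
    "facade": check_facade,
    "guards": check_guards,
}


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Hypothetical M6 Robot helper")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name, func in COMMANDS.items():
        subparsers.add_parser(name).set_defaults(func=func)
    args = parser.parse_args(argv)
    # Emit exactly one sentinel line; the Robot test asserts on it in stdout.
    print(args.func())
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

On the Robot side, output like this could be checked with the Process library's `Run Process` keyword followed by BuiltIn's `Should Contain` on the result's stdout. A failing check raises before the sentinel is printed, so a missing sentinel signals failure.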

Type Safety & Error Handling

- Type annotations used consistently throughout step definitions and helpers ✅
- Single `# type: ignore[arg-type]` used appropriately in a test that deliberately passes the wrong type ✅
- Error handling follows the pattern of catching specific exceptions in test assertions ✅
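As an illustration of the "catch the specific exception" pattern, here it is applied to the version-negotiation case. `negotiate_version`, `UnsupportedVersionError`, and the supported version `"1.0"` are hypothetical names, not the project's ACP API. The point is that the assertion expects one exception type, so an unrelated error (for example a `KeyError` from a bug) still fails the test rather than being silently accepted.

```python
from typing import FrozenSet

SUPPORTED_VERSIONS: FrozenSet[str] = frozenset({"1.0"})  # hypothetical


class UnsupportedVersionError(ValueError):
    """Hypothetical error raised when a client requests an unsupported version."""


def negotiate_version(requested: str) -> str:
    """Accept a supported version or raise the specific rejection error."""
    if requested not in SUPPORTED_VERSIONS:
        raise UnsupportedVersionError(f"unsupported ACP version: {requested}")
    return requested


def assert_rejects_unsupported(version: str) -> None:
    """Test-style check: expect the *specific* error, not a bare Exception."""
    try:
        negotiate_version(version)
    except UnsupportedVersionError as exc:
        assert version in str(exc)
    else:
        raise AssertionError(f"expected {version!r} to be rejected")


if __name__ == "__main__":
    assert negotiate_version("1.0") == "1.0"
    assert_rejects_unsupported("99.0")
    print("version negotiation checks passed")
```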

Minor Suggestions (non-blocking)

1. Add a CHANGELOG.md entry for this commit describing the M6 acceptance suite addition.
2. Assign milestone v3.5.0 to this PR to match issue #211.
3. Add a `Type/` label (likely `Type/Task` or `Type/Testing` if available) to the PR.
4. The step definitions file at 774 lines is functional but on the large side; consider splitting it in a follow-up if more M6 scenarios are added.


features/steps/m6_autonomy_acceptance_steps.py (Outdated)

@@ -0,0 +1,774 @@
"""Step definitions for M6 autonomy acceptance smoke tests.

freemo commented

2026-02-27 22:22:34 +00:00

At 774 lines, this file exceeds the project's 500-line guideline. It works well as a single file today given the `m6 smoke` prefix grouping, but if more scenarios are added, consider splitting it into, e.g., `m6_facade_steps.py` and `m6_guardrails_steps.py`.


brent.edwards added this to the v3.5.0 milestone 2026-02-27 22:30:29 +00:00

brent.edwards added the Type/Testing label 2026-02-27 22:30:31 +00:00

brent.edwards referenced this issue from a commit

2026-02-27 22:33:55 +00:00

fix(test): address PR #470 review feedback

brent.edwards added 1 commit 2026-02-27 22:33:55 +00:00

fix(test): address PR #470 review feedback

CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / quality (pull_request) Successful in 28s
CI / security (pull_request) Successful in 31s
CI / build (pull_request) Successful in 26s
CI / typecheck (pull_request) Successful in 45s
CI / integration_tests (pull_request) Successful in 5m26s
CI / unit_tests (pull_request) Successful in 18m48s
CI / docker (pull_request) Successful in 14s
CI / benchmark-regression (pull_request) Successful in 28m19s
CI / coverage (pull_request) Successful in 1h26m48s

c129f4c3f0

- Add CHANGELOG.md entry for M6 autonomy acceptance suite
- Split m6_autonomy_acceptance_steps.py (774 lines) into
  m6_facade_steps.py (398 lines) and m6_guardrails_steps.py (399 lines)
  to comply with the project's 500-line guideline

Closes #211

brent.edwards dismissed freemo's review 2026-02-27 22:33:55 +00:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

brent.edwards commented

2026-02-27 22:34:09 +00:00

Done: split into `m6_facade_steps.py` (398 lines) and `m6_guardrails_steps.py` (399 lines), both well under the 500-line guideline. Behave auto-discovers steps from all files in `steps/`, so no feature file changes are needed.


brent.edwards commented

2026-02-27 22:34:18 +00:00

All review items addressed in commit c129f4c3:

Item	Status
CHANGELOG.md entry	Added
Milestone v3.5.0	Set via API
Type/Testing label	Added via API
Split step file (<500 lines)	`m6_facade_steps.py` (398) + `m6_guardrails_steps.py` (399)

Lint and typecheck pass locally. CI triggered.


freemo approved these changes 2026-02-27 22:45:51 +00:00

Dismissed

brent.edwards scheduled this pull request to auto merge when all checks succeed 2026-02-27 22:46:45 +00:00

brent.edwards added 1 commit 2026-02-28 01:10:10 +00:00

Merge branch 'master' into feature/m6-autonomy-smoke

CI / lint (pull_request) Successful in 22s
CI / quality (pull_request) Successful in 23s
CI / benchmark-publish (pull_request) Has been skipped
CI / security (pull_request) Successful in 47s
CI / build (pull_request) Successful in 29s
CI / typecheck (pull_request) Successful in 58s
CI / integration_tests (pull_request) Successful in 5m17s
CI / unit_tests (pull_request) Successful in 27m17s
CI / docker (pull_request) Successful in 40s
CI / benchmark-regression (pull_request) Successful in 28m29s
CI / coverage (pull_request) Successful in 1h43m0s

158a99c543

brent.edwards added 1 commit 2026-02-28 03:12:42 +00:00

Merge branch 'master' into feature/m6-autonomy-smoke

CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / quality (pull_request) Successful in 17s
CI / build (pull_request) Successful in 23s
CI / security (pull_request) Successful in 30s
CI / typecheck (pull_request) Successful in 32s
CI / integration_tests (pull_request) Successful in 2m48s
CI / unit_tests (pull_request) Successful in 15m6s
CI / docker (pull_request) Successful in 1m1s
CI / benchmark-regression (pull_request) Successful in 20m38s
CI / coverage (pull_request) Has been cancelled

a2f0dadd3b

brent.edwards scheduled this pull request to auto merge when all checks succeed 2026-02-28 03:14:40 +00:00

brent.edwards added 1 commit 2026-02-28 03:34:55 +00:00

Merge branch 'master' into feature/m6-autonomy-smoke

CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 21s
CI / build (pull_request) Successful in 24s
CI / security (pull_request) Successful in 34s
CI / typecheck (pull_request) Successful in 1m0s
CI / integration_tests (pull_request) Successful in 4m28s
CI / unit_tests (pull_request) Successful in 12m28s
CI / docker (pull_request) Successful in 39s
CI / benchmark-regression (pull_request) Successful in 27m40s
CI / coverage (pull_request) Successful in 43m12s

3ec3cae203

brent.edwards added 1 commit 2026-02-28 04:28:36 +00:00

Merge branch 'master' into feature/m6-autonomy-smoke

CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 12s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 17s
CI / security (pull_request) Successful in 30s
CI / typecheck (pull_request) Successful in 31s
CI / integration_tests (pull_request) Successful in 2m59s
CI / unit_tests (pull_request) Successful in 11m20s
CI / docker (pull_request) Successful in 8s
CI / benchmark-regression (pull_request) Successful in 21m14s
CI / coverage (pull_request) Successful in 1h8m33s

e2129a26ba

brent.edwards added 1 commit 2026-02-28 05:49:35 +00:00

refactor: merge from master

CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / quality (pull_request) Successful in 16s
CI / build (pull_request) Successful in 23s
CI / typecheck (pull_request) Successful in 31s
CI / security (pull_request) Successful in 59s
CI / integration_tests (pull_request) Successful in 3m30s
CI / benchmark-regression (pull_request) Successful in 21m22s
CI / unit_tests (pull_request) Successful in 23m4s
CI / docker (pull_request) Successful in 16s
CI / coverage (pull_request) Successful in 43m38s

38e75342a9

This code is a merge from master.

brent.edwards dismissed freemo's review 2026-02-28 05:49:35 +00:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

brent.edwards merged commit ab32584f1d into master

2026-02-28 06:33:48 +00:00

brent.edwards deleted branch feature/m6-autonomy-smoke

2026-02-28 06:33:48 +00:00

brent.edwards referenced this issue from a commit

2026-02-28 06:33:50 +00:00

Merge pull request 'test(e2e): add M6 autonomy acceptance suite' (#470) from feature/m6-autonomy-smoke into master

freemo added the State/Completed label 2026-03-04 00:41:54 +00:00


2 Participants


Due Date

No due date set.

Dependencies

No dependencies set.

Reference: cleveragents/cleveragents-core#470
