test(e2e): set up E2E test infrastructure — nox session, CI job, Robot Framework @E2E tag #740

Closed
opened 2026-03-12 19:28:52 +00:00 by freemo · 1 comment
Owner

Metadata

  • Commit Message: test(e2e): set up E2E test infrastructure — nox session, CI job, Robot Framework @E2E tag
  • Branch: test/e2e-infrastructure

Background

All true end-to-end tests need dedicated infrastructure separate from the existing integration test suite. E2E tests use zero mocking, stubbing, or test doubles of any kind — they exercise the real CleverAgents CLI against real LLM API keys (Anthropic/OpenAI already available in CI). This ticket sets up the foundational infrastructure: a new nox session (nox -s e2e_tests), a dedicated CI job, and the Robot Framework tagging convention (@E2E) that all subsequent E2E test tickets depend on.

Expected Behavior

A new nox -s e2e_tests session exists that discovers and runs only Robot Framework test suites tagged with @E2E. These tests are excluded from the standard nox -s integration_tests session. A dedicated CI pipeline job runs E2E tests separately, using real Anthropic/OpenAI API keys from the CI environment. The E2E session and CI job are fully functional so that subsequent E2E test tickets can simply add .robot files and have them picked up automatically.

Acceptance Criteria

  • New e2e_tests nox session added to noxfile.py that runs Robot Framework with --include E2E tag filter
  • The e2e_tests session discovers .robot files from a dedicated directory (e.g., robot/e2e/)
  • The existing integration_tests nox session excludes @E2E-tagged tests (via --exclude E2E or directory separation)
  • A dedicated CI job is added that runs nox -s e2e_tests with real LLM API keys from CI environment variables
  • The CI E2E job is configured to run separately from the standard integration test job (not blocking regular CI)
  • A minimal smoke-test .robot file with @E2E tag is included to validate the infrastructure works
  • The smoke test exercises a basic agents --version command (no LLM call needed) to verify the E2E harness
  • nox -s e2e_tests passes when run locally with valid API keys
  • nox (all default sessions) continues to pass — E2E tests do not interfere with existing sessions
  • Coverage >=97% is maintained
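
The tag split described in the criteria above can be sketched in Python, the language of noxfile.py. This is only an illustration under stated assumptions: the helper name `robot_args` is hypothetical and the real noxfile may be organized differently. `--include` and `--exclude` are the Robot Framework command-line options named in this ticket.

```python
from typing import List

E2E_TAG = "E2E"


def robot_args(session: str, suite_dir: str) -> List[str]:
    """Return Robot Framework arguments for a given nox session name.

    Hypothetical helper for illustration; not taken from the project's noxfile.
    """
    if session == "e2e_tests":
        # Run only tests carrying the E2E tag.
        return ["--include", E2E_TAG, suite_dir]
    if session == "integration_tests":
        # Run everything except tests carrying the E2E tag.
        return ["--exclude", E2E_TAG, suite_dir]
    raise ValueError(f"unknown session: {session}")


if __name__ == "__main__":
    print(robot_args("e2e_tests", "robot/e2e/"))
    print(robot_args("integration_tests", "robot/"))
```

Because one session includes the tag and the other excludes it, a tagged test can only ever run in one of the two sessions, wherever its file lives.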

Subtasks

  • Add e2e_tests session to noxfile.py with Robot Framework --include E2E filtering
  • Create robot/e2e/ directory for E2E test suites
  • Ensure integration_tests session excludes E2E-tagged tests
  • Add dedicated CI job for nox -s e2e_tests with real API key environment variables
  • Write minimal smoke-test .robot file (robot/e2e/smoke_test.robot) with @E2E tag
  • Verify nox -s e2e_tests runs the smoke test successfully
  • Verify nox -s integration_tests does NOT run E2E-tagged tests
  • Tests (Behave): N/A (infrastructure ticket)
  • Tests (Robot): The smoke test .robot file IS this ticket's deliverable
  • Verify coverage >=97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors
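
What the smoke test verifies can be expressed as a short Python sketch: run a real command with no mocking, then check the exit code and that something was printed. The function names `run_cli` and `smoke_check` are hypothetical, and the actual deliverable is a .robot file, not Python.

```python
import subprocess
import sys
from typing import Sequence


def run_cli(command: Sequence[str], timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run a real command (no test doubles) and capture its output as text."""
    return subprocess.run(
        list(command),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # inspect the return code ourselves instead of raising
    )


def smoke_check(command: Sequence[str]) -> bool:
    """Pass when the command exits with 0 and prints something."""
    result = run_cli(command)
    printed = (result.stdout + result.stderr).strip()
    return result.returncode == 0 and bool(printed)


if __name__ == "__main__":
    # The ticket's smoke test targets `agents --version`. The Python interpreter
    # stands in here only so the sketch runs without the CleverAgents CLI installed.
    print(smoke_check([sys.executable, "--version"]))
```

Swapping the stand-in command for `["agents", "--version"]` would give the check this ticket describes, assuming the CLI is on the PATH.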

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
freemo self-assigned this 2026-03-12 19:28:58 +00:00
freemo added this to the v3.2.0 milestone 2026-03-12 19:28:58 +00:00
Author
Owner

Implementation Notes

Design Decisions

  1. Separate output directory: E2E test results go to build/reports/robot-e2e/ (not build/reports/robot/) to avoid colliding with integration test artifacts.

  2. Sequential execution via robot (not pabot): E2E tests exercise real LLM APIs that enforce rate limits and may take longer to run, so the session invokes robot directly and runs the suites one at a time. This avoids the concurrency issues that parallel pabot workers could cause with real API keys.

  3. LLM API key propagation: The e2e_tests nox session explicitly propagates ANTHROPIC_API_KEY, OPENAI_API_KEY, and GOOGLE_API_KEY from the outer environment into the session. This ensures keys are available to subprocess invocations.

  4. Graceful skip mechanism: common_e2e.resource provides a Skip If No LLM Keys keyword that tests can use as a [Setup] step. If neither ANTHROPIC_API_KEY nor OPENAI_API_KEY is set, the test is skipped rather than failing. Because the keyword is opt-in per test, the smoke test (which doesn't need keys) can always pass.

  5. --exclude E2E on integration_tests: Added to the pabot arguments in the existing integration_tests session so that if an E2E-tagged test were accidentally placed in the robot/ root, it would still be excluded from integration runs.

  6. CI job independence: The e2e_tests CI job has no needs dependencies — it runs in parallel with all other jobs. It does not block regular CI. API keys are injected via Forgejo CI secrets.

  7. Not in default sessions: e2e_tests is deliberately NOT added to nox.options.sessions because it requires real API keys that may not be present in all development environments.
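
Decisions 1 through 4 can be illustrated with a minimal Python sketch. The helper names (`collect_llm_keys`, `should_skip_llm_test`, `e2e_robot_command`) are hypothetical and not taken from the project's noxfile or resource file. The key names, the output directory, and the skip condition come from the notes above.

```python
import os
from typing import Dict, List, Mapping

# Key names as listed in the design notes.
PROPAGATED_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GOOGLE_API_KEY")
SKIP_CHECK_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def collect_llm_keys(environ: Mapping[str, str]) -> Dict[str, str]:
    """Copy only the LLM keys that are set and non-empty."""
    return {key: environ[key] for key in PROPAGATED_KEYS if environ.get(key)}


def should_skip_llm_test(environ: Mapping[str, str]) -> bool:
    """Skip when neither the Anthropic nor the OpenAI key is available."""
    return not any(environ.get(key) for key in SKIP_CHECK_KEYS)


def e2e_robot_command(
    suite_dir: str = "robot/e2e/",
    output_dir: str = "build/reports/robot-e2e/",
) -> List[str]:
    """Build a sequential `robot` invocation (not `pabot`) for E2E suites."""
    return ["robot", "--include", "E2E", "--outputdir", output_dir, suite_dir]


if __name__ == "__main__":
    print(e2e_robot_command())
    print(sorted(collect_llm_keys(os.environ)))  # key names only, never values
    print(should_skip_llm_test(os.environ))
```

Inside a nox session, the command and the collected keys could be handed to `session.run(..., env=...)`. Whether the project's session does exactly that is an assumption.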

Files Changed

  • noxfile.py: Added e2e_tests session, added --exclude E2E to integration_tests session
  • robot/e2e/common_e2e.resource: Shared E2E resource with setup/teardown, skip mechanism, CLI runner keyword, flexible assertions, temp git repo helper
  • robot/e2e/smoke_test.robot: Minimal smoke test with [Tags] E2E — exercises agents --version and agents --help
  • .forgejo/workflows/ci.yml: Added e2e_tests job with LLM API key secrets
  • CHANGELOG.md: Added entry for #740

Test Results

  • nox -s e2e_tests: 2 tests, 2 passed, 0 failed
  • nox -s integration_tests: 1491 tests, 1491 passed, 0 failed (E2E excluded)
  • nox -s unit_tests: 10674 scenarios, 0 failed
  • nox -s coverage_report: 98.2% (threshold: 97%)
  • All other default sessions (lint, format, typecheck, security_scan, dead_code, docs, build): PASS
freemo 2026-03-12 21:12:08 +00:00
Reference
cleveragents/cleveragents-core#740