TEST-INFRA: [test-architecture] Review and document ASV benchmark best practices #7860

Open
opened 2026-04-12 05:43:33 +00:00 by HAL9000 · 4 comments
Owner

Metadata

  • Branch: test/review-asv-benchmark-best-practices
  • Commit Message: test(benchmarks): review and document ASV benchmark best practices
  • Milestone: (none — backlog)
  • Parent Epic: #5407

Background and Context

The benchmarks/ directory contains ASV (airspeed velocity) performance benchmarks that have grown organically over time. Without a deliberate review against established best practices, the benchmark suite may accumulate structural debt: missing coverage of performance-critical paths, inconsistent naming conventions, and undocumented purpose or expected outcomes. This makes the suite harder to maintain, interpret, and extend.

Expected Behavior

The ASV benchmark suite in benchmarks/ follows established best practices, covers all performance-critical parts of the application, uses a consistent and clear naming convention, and documents the purpose and expected outcome of each benchmark.

Acceptance Criteria

  • All existing benchmarks in benchmarks/ are reviewed against ASV best practices
  • Benchmarks exist for all performance-critical parts of the application
  • A consistent and clear naming convention is applied to all benchmarks
  • Each benchmark is documented with its purpose and expected outcome
  • A best-practices document (or inline comments) is produced summarising the conventions adopted
  • nox -s benchmark passes without errors
  • nox passes with coverage ≥ 97%

Supporting Information

  • Sibling issues: #7845 (BDD/Gherkin best practices), #7849 (Robot Framework best practices) — same [test-architecture] category
  • Parent Epic: #5407 (Testing Infrastructure Improvements)
  • Duplicate check: search queries "ASV", "benchmark", "performance" — 0 existing issues cover a review of ASV benchmark best practices

Subtasks

  • Audit all benchmark files in benchmarks/ against ASV best practices
  • Identify performance-critical application paths lacking benchmark coverage
  • Create missing benchmarks for uncovered performance-critical paths
  • Apply a consistent naming convention across all benchmark files and classes
  • Add docstrings/comments documenting the purpose and expected outcome of each benchmark
  • Write or update a benchmarks/README.md documenting the adopted conventions
  • Run nox -s benchmark and fix any errors
  • Run nox (all default sessions), fix any errors
  • Verify coverage ≥ 97% via nox -s coverage_report

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly (test(benchmarks): review and document ASV benchmark best practices), followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly (test/review-asv-benchmark-best-practices).
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
  • All nox stages pass.
  • Coverage ≥ 97%.

Backlog note: This issue was discovered during autonomous operation
on milestone v3.2.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.


Automated by CleverAgents Bot
Supervisor: Test Infrastructure | Agent: new-issue-creator

## Metadata - **Branch**: `test/review-asv-benchmark-best-practices` - **Commit Message**: `test(benchmarks): review and document ASV benchmark best practices` - **Milestone**: *(none — backlog)* - **Parent Epic**: #5407 ## Background and Context The `benchmarks/` directory contains ASV (airspeed velocity) performance benchmarks that have grown organically over time. Without a deliberate review against established best practices, the benchmark suite may accumulate structural debt: missing coverage of performance-critical paths, inconsistent naming conventions, and undocumented purpose or expected outcomes. This makes the suite harder to maintain, interpret, and extend. ## Expected Behavior The ASV benchmark suite in `benchmarks/` follows established best practices, covers all performance-critical parts of the application, uses a consistent and clear naming convention, and documents the purpose and expected outcome of each benchmark. ## Acceptance Criteria - [ ] All existing benchmarks in `benchmarks/` are reviewed against ASV best practices - [ ] Benchmarks exist for all performance-critical parts of the application - [ ] A consistent and clear naming convention is applied to all benchmarks - [ ] Each benchmark is documented with its purpose and expected outcome - [ ] A best-practices document (or inline comments) is produced summarising the conventions adopted - [ ] `nox -s benchmark` passes without errors - [ ] `nox` passes with coverage ≥ 97% ## Supporting Information - Sibling issues: #7845 (BDD/Gherkin best practices), #7849 (Robot Framework best practices) — same `[test-architecture]` category - Parent Epic: #5407 (Testing Infrastructure Improvements) - Duplicate check: search queries "ASV", "benchmark", "performance" — 0 existing issues cover a review of ASV benchmark best practices ## Subtasks - [ ] Audit all benchmark files in `benchmarks/` against ASV best practices - [ ] Identify performance-critical application paths lacking benchmark coverage - [ ] Create missing benchmarks for uncovered performance-critical paths - [ ] Apply a consistent naming convention across all benchmark files and classes - [ ] Add docstrings/comments documenting the purpose and expected outcome of each benchmark - [ ] Write or update a `benchmarks/README.md` documenting the adopted conventions - [ ] Run `nox -s benchmark` and fix any errors - [ ] Run `nox` (all default sessions), fix any errors - [ ] Verify coverage ≥ 97% via `nox -s coverage_report` ## Definition of Done This issue is complete when: - All subtasks above are completed and checked off. - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly (`test(benchmarks): review and document ASV benchmark best practices`), followed by a blank line, then additional lines providing relevant details about the implementation. - The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly (`test/review-asv-benchmark-best-practices`). - The commit is submitted as a **pull request** to `master`, reviewed, and **merged** before this issue is marked done. - All nox stages pass. - Coverage ≥ 97%. > **Backlog note:** This issue was discovered during autonomous operation > on milestone v3.2.0. It does not block milestone completion and has been > placed in the backlog for human review and future milestone assignment. --- **Automated by CleverAgents Bot** Supervisor: Test Infrastructure | Agent: new-issue-creator
Author
Owner

Verified — Test infrastructure task: ASV benchmark best practices documentation. MoSCoW: Could-have. Priority: Low.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Test infrastructure task: ASV benchmark best practices documentation. MoSCoW: Could-have. Priority: Low. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Author
Owner

Verified — Test infrastructure task: ASV benchmark best practices documentation. MoSCoW: Could-have. Priority: Low.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Test infrastructure task: ASV benchmark best practices documentation. MoSCoW: Could-have. Priority: Low. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Author
Owner

Verified — Test infrastructure task: ASV benchmark best practices documentation. MoSCoW: Could-have. Priority: Low.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Test infrastructure task: ASV benchmark best practices documentation. MoSCoW: Could-have. Priority: Low. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Author
Owner

Code Review: fix(plugin_loader): add security validation for plugin loading

Branch: fix/v360/plugin-loader-security
Head SHA: 677204bfd4


CI Status

  • lint: PASS
  • typecheck: PASS
  • security: PASS
  • quality: PASS
  • unit_tests: FAIL (Failing after 6m41s)
  • integration_tests: PASS
  • e2e_tests: PASS
  • coverage: PENDING (blocked by unit_tests failure)
  • build: PASS

VERDICT: REQUEST CHANGES

This PR has a failing unit test that reveals a security regression in the new validate_protocol fallback logic.


Issue 1 (Critical): Security Regression in validate_protocol Fallback

File: src/cleveragents/infrastructure/plugins/loader.py

Root cause: When issubclass(klass, protocol) raises TypeError, the new code falls through to the manual structural check. If the protocol object has no non-dunder members in its __dict__ or __annotations__, then required is empty, missing is empty, and the function returns True -- incorrectly indicating the class satisfies the protocol.

Failing test: Scenario "validate_protocol raises ProtocolMismatchError when issubclass raises TypeError" in features/plugins_loader_coverage.feature

With the new code:

  1. issubclass(NeedsArgs, BadProtocol) raises TypeError (caught, pass)
  2. Fallback runs: BadProtocol has no non-dunder members
  3. required is empty, missing is empty, returns True (WRONG)
  4. Expected: ProtocolMismatchError to be raised

Security implication: An attacker who can supply a protocol object that causes issubclass to raise TypeError and has no non-dunder members would cause validate_protocol to return True for any class, bypassing the protocol conformance check.

Recommended fix: Track whether TypeError was raised and raise ProtocolMismatchError immediately in that case, rather than falling through to the structural check.


Issue 2 (Minor): Commit Not GPG-Signed

The commit is not GPG-signed. Per C.O.C., all authenticated commits must have digital signatures tied to a Yubikey.


The commit message has no body or footer. Per CONTRIBUTING.md, every commit must include an ISSUES CLOSED: #N footer.


Issue 4 (Minor): No CHANGELOG Entry

The commit modifies loader.py but does not update CHANGELOG.md.


Issue 5 (Minor): Branch Naming Convention

The branch fix/v360/plugin-loader-security does not follow the required bugfix/mN-<name> convention.


Positive Aspects

  • Module prefix allowlist is unchanged and correct.
  • getattr fallbacks for __name__ in error messages are a good improvement.
  • load_from_entry_points is unchanged and correct.
  • Security intent is sound; primary issubclass path is correct.
  • lint, typecheck, security, integration_tests, e2e_tests all pass.

Required Changes Before Merge

  1. Fix validate_protocol fallback to raise ProtocolMismatchError when issubclass raises TypeError.
  2. Sign the commit with GPG/Yubikey.
  3. Add ISSUES CLOSED: #N footer to the commit message.
  4. Add a CHANGELOG entry.
  5. Rename the branch to follow bugfix/mN-<name> convention.

Automated by CleverAgents Bot
Supervisor: Implementation | Agent: task-implementor

## Code Review: fix(plugin_loader): add security validation for plugin loading **Branch:** fix/v360/plugin-loader-security **Head SHA:** 677204bfd48b04be8046ce38647c075e58a44112 --- ### CI Status - lint: PASS - typecheck: PASS - security: PASS - quality: PASS - unit_tests: **FAIL** (Failing after 6m41s) - integration_tests: PASS - e2e_tests: PASS - coverage: PENDING (blocked by unit_tests failure) - build: PASS --- ### VERDICT: REQUEST CHANGES This PR has a failing unit test that reveals a security regression in the new `validate_protocol` fallback logic. --- ### Issue 1 (Critical): Security Regression in validate_protocol Fallback **File:** `src/cleveragents/infrastructure/plugins/loader.py` **Root cause:** When `issubclass(klass, protocol)` raises `TypeError`, the new code falls through to the manual structural check. If the protocol object has no non-dunder members in its `__dict__` or `__annotations__`, then `required` is empty, `missing` is empty, and the function returns `True` -- incorrectly indicating the class satisfies the protocol. **Failing test:** Scenario "validate_protocol raises ProtocolMismatchError when issubclass raises TypeError" in `features/plugins_loader_coverage.feature` With the new code: 1. `issubclass(NeedsArgs, BadProtocol)` raises `TypeError` (caught, pass) 2. Fallback runs: `BadProtocol` has no non-dunder members 3. `required` is empty, `missing` is empty, returns `True` (WRONG) 4. Expected: `ProtocolMismatchError` to be raised **Security implication:** An attacker who can supply a protocol object that causes `issubclass` to raise `TypeError` and has no non-dunder members would cause `validate_protocol` to return `True` for any class, bypassing the protocol conformance check. **Recommended fix:** Track whether `TypeError` was raised and raise `ProtocolMismatchError` immediately in that case, rather than falling through to the structural check. --- ### Issue 2 (Minor): Commit Not GPG-Signed The commit is not GPG-signed. Per C.O.C., all authenticated commits must have digital signatures tied to a Yubikey. --- ### Issue 3 (Minor): Missing ISSUES CLOSED Footer The commit message has no body or footer. Per CONTRIBUTING.md, every commit must include an `ISSUES CLOSED: #N` footer. --- ### Issue 4 (Minor): No CHANGELOG Entry The commit modifies `loader.py` but does not update `CHANGELOG.md`. --- ### Issue 5 (Minor): Branch Naming Convention The branch `fix/v360/plugin-loader-security` does not follow the required `bugfix/mN-<name>` convention. --- ### Positive Aspects - Module prefix allowlist is unchanged and correct. - `getattr` fallbacks for `__name__` in error messages are a good improvement. - `load_from_entry_points` is unchanged and correct. - Security intent is sound; primary `issubclass` path is correct. - lint, typecheck, security, integration_tests, e2e_tests all pass. --- ### Required Changes Before Merge 1. Fix `validate_protocol` fallback to raise `ProtocolMismatchError` when `issubclass` raises `TypeError`. 2. Sign the commit with GPG/Yubikey. 3. Add `ISSUES CLOSED: #N` footer to the commit message. 4. Add a CHANGELOG entry. 5. Rename the branch to follow `bugfix/mN-<name>` convention. --- Automated by CleverAgents Bot Supervisor: Implementation | Agent: task-implementor
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveragents-core#7860
No description provided.