feat(security): add safety profile enforcement #518

2026-03-02T23:33:08Z

CoreRasurae commented

2026-03-02 23:33:08 +00:00

Summary

Implements safety profile resolution and enforcement in the tool execution pipeline, replacing the NotImplementedError stub with working precedence logic and runtime safety checks.

Closes #345

Changes

Core Implementation

- `resolve_safety_profile()`: plan > action > project > global precedence; returns `DEFAULT_SAFETY_PROFILE` with GLOBAL provenance when all levels are `None`
- `ToolExecutionContext.safety_profile`: optional field carrying the resolved profile
- `ToolRuntime._enforce_capabilities()`: extended with:
  - Unsafe tool gating (`allow_unsafe_tools` vs `ToolCapability.unsafe`)
  - Skill category allow-list (`allowed_skill_categories` vs `tool_skill_category` metadata)
  - Checkpoint requirement OR-combined from context flag and safety profile
- `ToolSafetyViolationError`: new error class in the tool error hierarchy
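
The precedence rule above can be illustrated with a minimal sketch. This is not the code in `safety_profile.py`: the `Provenance` enum, the parameter names (`plan`, `action`, `project`, `global_`), and the `require_checkpoint` field are assumptions made for illustration; only `resolve_safety_profile`, `DEFAULT_SAFETY_PROFILE`, `allow_unsafe_tools`, and `allowed_skill_categories` are names taken from this PR.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional, Tuple


class Provenance(Enum):
    """Hypothetical marker for the level a resolved profile came from."""
    PLAN = "plan"
    ACTION = "action"
    PROJECT = "project"
    GLOBAL = "global"


@dataclass(frozen=True)
class SafetyProfile:
    """Simplified stand-in for the real model; fields are illustrative."""
    allow_unsafe_tools: bool = False
    require_checkpoint: bool = False  # assumed field name
    allowed_skill_categories: Optional[FrozenSet[str]] = None


DEFAULT_SAFETY_PROFILE = SafetyProfile()


def resolve_safety_profile(
    *,
    plan: Optional[SafetyProfile] = None,
    action: Optional[SafetyProfile] = None,
    project: Optional[SafetyProfile] = None,
    global_: Optional[SafetyProfile] = None,
) -> Tuple[SafetyProfile, Provenance]:
    """Return the first non-None profile in plan > action > project > global order."""
    candidates = (
        (plan, Provenance.PLAN),
        (action, Provenance.ACTION),
        (project, Provenance.PROJECT),
        (global_, Provenance.GLOBAL),
    )
    for profile, provenance in candidates:
        if profile is not None:
            return profile, provenance
    # Every level was None: fall back to the default with GLOBAL provenance.
    return DEFAULT_SAFETY_PROFILE, Provenance.GLOBAL
```

In this sketch a more specific level replaces a less specific one wholesale; whether the real implementation merges fields across levels is not stated in the PR body.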

Files Modified

| File | Change |
|------|--------|
| `src/cleveragents/domain/models/core/safety_profile.py` | Replace `NotImplementedError` stub with resolution logic |
| `src/cleveragents/tool/context.py` | Add `safety_profile` field to `ToolExecutionContext` |
| `src/cleveragents/tool/lifecycle.py` | Add `ToolSafetyViolationError`, extend `_enforce_capabilities()` |
| `src/cleveragents/tool/__init__.py` | Export `ToolSafetyViolationError` |

Files Added

| File | Purpose |
|------|---------|
| `features/safety_profile_enforcement.feature` | 24 BDD scenarios for enforcement |
| `features/steps/safety_profile_enforcement_steps.py` | Step definitions |
| `robot/safety_profile_enforcement.robot` | 9 Robot integration tests |
| `robot/helper_safety_profile_enforcement.py` | Robot helper |
| `benchmarks/safety_profile_bench.py` | 4 ASV benchmark suites |
| `docs/reference/safety_profiles.md` | Reference documentation |

Test Updates

- `features/safety_profile.feature`: replaced `NotImplementedError` stub test with 5 resolution precedence scenarios (30 total)
- `features/steps/safety_profile_steps.py`: added resolution step definitions

Verification

| Check | Result |
|-------|--------|
| `nox -e typecheck` | 0 errors, 0 warnings |
| `nox -e unit_tests` | 7735 scenarios, 0 failures |
| `nox -e coverage_report` | 97% overall |
| `nox -e integration_tests` | 9/9 passed |
| `nox -e benchmark` | All suites complete |

Backward Compatibility

When no SafetyProfile is set on ToolExecutionContext (the default), the new enforcement checks are skipped entirely. All existing tool execution paths are unaffected.
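
As a rough illustration of the behaviour described here and under Core Implementation, the sketch below skips all profile checks when `safety_profile` is `None` and otherwise applies unsafe-tool gating, the skill category allow-list, and the OR-combined checkpoint requirement. It is an assumption-laden stand-in, not the body of `_enforce_capabilities()`: the `ToolInfo` class, the `enforce_safety` function, the `require_checkpoint` field names, and the choice to reject tools with no category when an allow-list is set are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


class ToolRuntimeError(Exception):
    """Stand-in base class for tool runtime failures."""


class ToolSafetyViolationError(ToolRuntimeError):
    """Raised when a tool call violates the active safety profile."""


@dataclass(frozen=True)
class SafetyProfile:
    allow_unsafe_tools: bool = False
    require_checkpoint: bool = False  # assumed field name
    allowed_skill_categories: Optional[FrozenSet[str]] = None  # None = no allow-list


@dataclass(frozen=True)
class ToolInfo:
    """Hypothetical summary of the tool metadata the checks read."""
    name: str
    unsafe: bool = False
    skill_category: Optional[str] = None


@dataclass(frozen=True)
class ToolExecutionContext:
    require_checkpoint: bool = False  # assumed name for the context flag
    safety_profile: Optional[SafetyProfile] = None


def enforce_safety(tool: ToolInfo, ctx: ToolExecutionContext) -> bool:
    """Raise on a violation; return whether a checkpoint is required."""
    profile = ctx.safety_profile
    if profile is None:
        # No profile set (the default): only the pre-existing context flag applies.
        return ctx.require_checkpoint
    if tool.unsafe and not profile.allow_unsafe_tools:
        raise ToolSafetyViolationError(f"unsafe tool {tool.name!r} is not allowed")
    allowed = profile.allowed_skill_categories
    if allowed is not None and tool.skill_category not in allowed:
        raise ToolSafetyViolationError(
            f"skill category {tool.skill_category!r} is not in the allow-list"
        )
    # Checkpoint requirement is OR-combined from the context flag and the profile.
    return ctx.require_checkpoint or profile.require_checkpoint
```

With this shape, a context built without a profile behaves exactly as before the change, which is the compatibility property claimed above.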


CoreRasurae referenced this pull request

2026-03-02 23:33:23 +00:00

feat(security): add safety profile enforcement #345

CoreRasurae force-pushed feature/m7-post-safety from e09201af29 to 5e247d05e2

2026-03-03 17:12:44 +00:00


brent.edwards approved these changes 2026-03-03 19:18:24 +00:00

Dismissed

brent.edwards left a comment

Review — PR #518 `feat(security): add safety profile enforcement`

Verdict: APPROVED with comments

The implementation is well-structured. resolve_safety_profile() implements clean plan > action > project > global precedence with keyword-only args and tuple return. The _enforce_capabilities() extension adds 8 ordered checks (read-only, checkpoint, unsafe gating, skill category, sandbox, human approval, cost limits, retry limits) with a clear error hierarchy (ToolSafetyViolationError, ToolSandboxRequiredError, ToolHumanApprovalRequiredError, ToolCostLimitExceededError, ToolRetryLimitExceededError — all inheriting from ToolRuntimeError). The SafetyProfile model is frozen/immutable with field and cross-field validators. New types are properly re-exported in tool/__init__.py. The ToolExecutionContext.safety_profile field is optional (None = no enforcement), preserving backward compatibility. BDD scenarios are comprehensive with stub tool instances.
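
To make the error hierarchy point concrete, here is a small sketch of how a caller might benefit from every safety error sharing one base class. The class names are the ones listed above; the empty class bodies and the `describe_failure` helper are hypothetical and only illustrate the inheritance relationship, not the actual definitions in `tool/lifecycle.py`.

```python
from typing import Callable


class ToolRuntimeError(Exception):
    """Stand-in for the shared base class named in the review."""


class ToolSafetyViolationError(ToolRuntimeError): ...
class ToolSandboxRequiredError(ToolRuntimeError): ...
class ToolHumanApprovalRequiredError(ToolRuntimeError): ...
class ToolCostLimitExceededError(ToolRuntimeError): ...
class ToolRetryLimitExceededError(ToolRuntimeError): ...


def describe_failure(call: Callable[[], object]) -> str:
    """Run a tool call and summarise the outcome (hypothetical helper)."""
    try:
        call()
    except ToolSafetyViolationError as exc:
        # Specific handling for one subclass comes first.
        return f"safety violation: {exc}"
    except ToolRuntimeError as exc:
        # A single handler covers every other error in the hierarchy.
        return f"{type(exc).__name__}: {exc}"
    return "ok"
```

Callers that do not care which check failed can catch `ToolRuntimeError` alone, while stricter callers can single out individual subclasses.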

No P0 or P1 findings.

P2:should-fix

1. **Missing PR label**: Per `CONTRIBUTING.md`, every PR must carry exactly one `Type/` label. This PR has no labels. The linked issue #345 carries `Type/Feature`.
2. **Missing PR milestone**: Per `CONTRIBUTING.md`, the PR milestone must match the linked issue. This PR has no milestone set. Issue #345 is assigned to milestone `v3.6.0`.
3. **Commit footer uses `Refs: #345` instead of a closing keyword**: Per `CONTRIBUTING.md`, every commit must reference the issue with a closing keyword (e.g. `ISSUES CLOSED: #345`). The PR body correctly says `Closes #345`, but the commit footer only has `Refs: #345`, which will not auto-close the issue on merge.
4. **Scenario count mismatch in PR body**: The PR description claims "11 BDD scenarios" in `safety_profile_enforcement.feature`, but the file actually contains **24 scenarios** (verified by counting `Scenario:` / `Scenario Outline:` entries plus their `Examples` expansions). Please update the PR body to reflect the correct count.
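
For reproducibility, the scenario count can be approximated with a short script along these lines. This is one interpretation of the counting rule, not the tool actually used for the review: each plain `Scenario:` counts once and each `Scenario Outline:` counts once per data row of its `Examples:` tables. The `count_scenarios` function is hypothetical, and the sketch ignores edge cases such as keywords appearing inside docstrings.

```python
def count_scenarios(feature_text: str) -> int:
    """Count executed scenarios in Gherkin text, expanding Scenario Outlines."""
    total = 0
    in_examples = False   # currently inside an Examples: table
    header_seen = False   # the first table row is the column header
    for raw in feature_text.splitlines():
        line = raw.strip()
        if line.startswith("Scenario Outline:"):
            # The outline itself is counted through its Examples rows.
            in_examples = False
        elif line.startswith("Scenario:"):
            total += 1
            in_examples = False
        elif line.startswith("Examples:"):
            in_examples = True
            header_seen = False
        elif in_examples and line.startswith("|"):
            if header_seen:
                total += 1  # one expanded scenario per data row
            else:
                header_seen = True
    return total
```

Running it over the contents of `features/safety_profile_enforcement.feature` should give a number comparable to the one quoted above, provided the file follows the plain layout this sketch assumes.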

P3:nit

5. **Nox flag typo in verification table**: The PR body's verification table uses `nox -e` (not a valid nox flag) instead of `nox -s` in all entries.

P2 items should be addressed in a follow-up within 3 business days.


CoreRasurae added this to the v3.6.0 milestone 2026-03-03 20:49:28 +00:00

CoreRasurae added the Type/Feature label 2026-03-03 20:49:36 +00:00

CoreRasurae force-pushed feature/m7-post-safety from 5e247d05e2 to 57a99f2521

2026-03-03 20:54:15 +00:00


CoreRasurae force-pushed feature/m7-post-safety from 57a99f2521 to 809fcd223b

2026-03-03 22:04:11 +00:00


brent.edwards approved these changes 2026-03-03 22:11:05 +00:00