BUG-HUNT: correctness - is_sensitive_key "auth" substring causes false positives on innocent field names #7766

Open
opened 2026-04-12 03:28:20 +00:00 by HAL9000 · 3 comments
Owner

Bug Report: Correctness — is_sensitive_key Substring Match on "auth" Causes False Positives

Severity Assessment

  • Impact: Legitimate, non-sensitive fields whose names happen to contain the substring "auth" (e.g., author, authority, coauthored_by) or "token" (e.g., num_tokens, context_tokens, max_context_tokens) are silently redacted as if they were secrets. This causes data loss in logs and redacted dicts without any warning.
  • Likelihood: Medium — fields like author are common in git metadata, commit info, and API responses that the agent might log.
  • Priority: Medium

Location

  • File: src/cleveragents/shared/redaction.py
  • Function/Class: is_sensitive_key
  • Lines: 110–124

Description

The function uses any(sub in lower for sub in _SENSITIVE_SUBSTRINGS), so a key is flagged whenever any entry of _SENSITIVE_SUBSTRINGS occurs anywhere in the lowercased key. The entries "auth" and "token" are short and occur inside many innocuous field names. _FALSE_POSITIVE_KEYS provides an exact-name allowlist, but it covers only a small set of known cases and cannot anticipate every false positive.
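A minimal, self-contained reproduction of the matching expression quoted above follows. It is a sketch under stated assumptions: the contents of _SENSITIVE_SUBSTRINGS are a hypothetical reduced subset, the function name is_sensitive_key_substring is invented for illustration, and the exact-name check against _FALSE_POSITIVE_KEYS is left out.

```python
# Hypothetical subset of _SENSITIVE_SUBSTRINGS; the real set in redaction.py is larger.
_SENSITIVE_SUBSTRINGS: set[str] = {"auth", "token", "password", "secret"}


def is_sensitive_key_substring(key: str) -> bool:
    """Sketch of the current check: flag the key if any sensitive substring occurs anywhere in it."""
    lower = key.lower()
    return any(sub in lower for sub in _SENSITIVE_SUBSTRINGS)


for key in ["api_token", "author", "coauthored_by", "num_tokens", "title"]:
    print(f"{key!r:18} -> {is_sensitive_key_substring(key)}")
```

Every key that contains "auth" or "token" anywhere, including author and num_tokens, comes back True; only title, which contains none of the listed substrings, comes back False.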

Evidence

```python
# redaction.py lines 27–39
_SENSITIVE_SUBSTRINGS: set[str] = {
    ...
    "auth",   # <-- too broad: matches "author", "authority", "coauthored_by"
    "token",  # <-- too broad: matches "num_tokens", "context_tokens", etc.
    ...
}
```

Demonstration:

```python
is_sensitive_key("author")              # True: FALSE POSITIVE
is_sensitive_key("authority")           # True: FALSE POSITIVE
is_sensitive_key("coauthored_by")       # True: FALSE POSITIVE
is_sensitive_key("num_tokens")          # True: FALSE POSITIVE (not in allowlist)
is_sensitive_key("context_tokens")      # True: FALSE POSITIVE (not in allowlist)
is_sensitive_key("max_context_tokens")  # True: FALSE POSITIVE (not in allowlist)
```

Note: _FALSE_POSITIVE_KEYS currently only covers a narrow set of token-counting keys; any new token-related field not in that set will be wrongly redacted.

Expected Behavior

Only fields that are semantically related to secrets/credentials should be flagged. Fields like author, authority, and num_tokens should not be redacted.

Actual Behavior

Any field name containing "auth" or "token" as a substring is flagged as sensitive, causing data loss.

Suggested Fix

Use word-boundary or more precise matching for short substrings like "auth" and "token":

```python
import re

# Pattern-based matching: "auth" and "token" must appear as whole words,
# delimited by "_", "-", or the start/end of the key. Patterns are applied
# with .search() to the lowercased key.
_SENSITIVE_PATTERNS = [
    re.compile(r"(?:^|[_\-])auth(?:$|[_\-])"),   # auth as standalone word in key
    re.compile(r"(?:^|[_\-])token(?:$|[_\-])"),  # token as standalone word
    re.compile(r"authorization"),                # no longer covered by the auth word pattern
    re.compile(r"api[_\-]?key"),
    re.compile(r"password|passwd"),
    re.compile(r"secret"),
    re.compile(r"credential"),
    re.compile(r"private[_\-]?key"),
    re.compile(r"access[_\-]?key"),
]
```
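The suggested patterns could be wired into a replacement check roughly as follows. This is a sketch, not the project's implementation: it assumes keys are lowercased before matching (as the existing any(sub in lower ...) check implies), uses re.search so the ^ and $ anchors apply to the whole key, and includes an authorization pattern because the standalone-word auth pattern would otherwise stop matching that common header name.

```python
import re

# Pattern list repeated so this block runs on its own; matched against the lowercased key.
_SENSITIVE_PATTERNS = [
    re.compile(r"(?:^|[_\-])auth(?:$|[_\-])"),   # auth as a standalone word
    re.compile(r"(?:^|[_\-])token(?:$|[_\-])"),  # token as a standalone word
    re.compile(r"authorization"),                # e.g. the HTTP Authorization header
    re.compile(r"api[_\-]?key"),
    re.compile(r"password|passwd"),
    re.compile(r"secret"),
    re.compile(r"credential"),
    re.compile(r"private[_\-]?key"),
    re.compile(r"access[_\-]?key"),
]


def is_sensitive_key(key: str) -> bool:
    """Sketch: flag the key if any pattern matches anywhere in its lowercased form."""
    lower = key.lower()
    return any(p.search(lower) for p in _SENSITIVE_PATTERNS)


for key in ["author", "coauthored_by", "num_tokens", "max_context_tokens",
            "x-auth-token", "access_token", "Authorization", "token_count", "authToken"]:
    print(f"{key!r:22} -> {is_sensitive_key(key)}")
```

Under this sketch the reported false positives (author, coauthored_by, num_tokens, max_context_tokens) are no longer flagged, while x-auth-token, access_token, and Authorization still are. Two gaps remain: token_count is still flagged because "token" is followed by "_", and camelCase keys such as authToken are no longer flagged because only "_" and "-" count as word separators. Normalizing camelCase to snake_case before matching, or keeping _FALSE_POSITIVE_KEYS as a second filter, are options worth evaluating.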

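Alternatively, expand _FALSE_POSITIVE_KEYS to include every known legitimate key that contains "auth" or "token" (not only prefixed ones such as author, but also names like num_tokens), and document this convention so that new fields are added to the allowlist.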
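For comparison, the allowlist alternative might look like the sketch below: it keeps the substring check and consults an exact-name exemption set first. The contents of both sets are hypothetical (this report does not list the real _FALSE_POSITIVE_KEYS), and the function name is_sensitive_key_allowlist is invented for illustration. The point is the maintenance cost: any key that is not enumerated, such as prompt_tokens or co_author here, is still redacted.

```python
# Hypothetical contents; the real sets in redaction.py differ.
_SENSITIVE_SUBSTRINGS: set[str] = {"auth", "token", "password", "secret"}
_FALSE_POSITIVE_KEYS: frozenset[str] = frozenset(
    {"author", "authority", "coauthored_by", "num_tokens", "context_tokens", "max_context_tokens"}
)


def is_sensitive_key_allowlist(key: str) -> bool:
    """Sketch: exact-name exemptions first, then the existing substring check."""
    lower = key.lower()
    if lower in _FALSE_POSITIVE_KEYS:
        return False
    return any(sub in lower for sub in _SENSITIVE_SUBSTRINGS)


for key in ["author", "max_context_tokens", "prompt_tokens", "co_author", "password"]:
    print(f"{key!r:20} -> {is_sensitive_key_allowlist(key)}")
```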

Category

correctness

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD.


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: bug-hunter

HAL9000 added this to the v3.2.0 milestone 2026-04-12 03:44:47 +00:00
Author
Owner

Verified — Bug: is_sensitive_key 'auth' substring match causes false positives. MoSCoW: Should-have. Priority: Medium.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Reference
cleveragents/cleveragents-core#7766