BUG-HUNT: [security] _SECRET_PATTERNS in shared/redaction.py does not cover LangSmith, GitHub, or other modern API token formats — redact_value() fails to mask lsv2_pt_*, ghp_*, ghs_* tokens in log strings #6678

Open
opened 2026-04-09 23:17:08 +00:00 by HAL9000 · 1 comment
Owner

Bug Report: Security — Incomplete Secret Redaction Pattern Coverage

Severity Assessment

  • Impact: The redact_value() function in shared/redaction.py is the last line of defense for masking secrets that appear as raw string values in log output (not as dict keys). It only covers a narrow set of token formats. Secrets in modern formats — specifically LangSmith API keys (lsv2_pt_*), GitHub tokens (ghp_*, ghs_*, gho_*), HuggingFace tokens (hf_*), and Replicate API keys (r8_*) — are not matched by any pattern and will be logged in plaintext.
  • Likelihood: Medium — when Settings._synchronize_langsmith_environment() writes the LangSmith API key to os.environ["LANGCHAIN_API_KEY"], any subsequent diagnostic dump, error message, or log event that includes this value as a raw string will expose it unredacted.
  • Priority: High

Location

  • File: src/cleveragents/shared/redaction.py
  • Function: redact_value, _SECRET_PATTERNS
  • Lines: 61–75 (patterns), 127–144 (redact_value)

Description

The _SECRET_PATTERNS list covers only a subset of well-known token formats:

# src/cleveragents/shared/redaction.py  lines 61-75
_SECRET_PATTERNS: list[re.Pattern[str]] = [
    # OpenAI keys: sk-proj-..., sk-...
    re.compile(r"sk-(?:proj-)?[A-Za-z0-9_-]{10,}"),
    # Anthropic keys: sk-ant-api03-...
    re.compile(r"sk-ant-[A-Za-z0-9_-]{10,}"),
    # Google / Gemini API keys: AIzaSy...
    re.compile(r"AIzaSy[A-Za-z0-9_-]{30,}"),
    # Token IDs: tok_...
    re.compile(r"tok_[A-Za-z0-9]{10,}"),
    # Bearer tokens
    re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]{20,}"),
    # Generic long hex/base64 keys (40+ chars)
    re.compile(r"(?:key|KEY)-[A-Za-z0-9]{20,}"),
]

The following token formats, all actively configured in config/settings.py, are not covered:

| Provider | Token Format | Example |
|---|---|---|
| LangSmith | lsv2_pt_<hex> | lsv2_pt_abc123def456ghi789jkl012mno345 |
| GitHub | ghp_<alphanumeric> | ghp_16C7e42F292c6912E169BF1tz8A6mF |
| GitHub (service) | ghs_<alphanumeric> | ghs_servicetokenhere12345678901 |
| GitHub (OAuth) | gho_<alphanumeric> | gho_oauthtokenhere12345678901234 |
| HuggingFace | hf_<alphanumeric> | hf_qOCxjJVfBKLnHJdCqKjVfaBCDeEfGhIjK |
| Replicate | r8_<alphanumeric> | r8_HRAbAFpHFqNsDEVNJLAVjQFJxZiqBvVNi |

Cross-Module Exposure Chain

1. User configures LANGSMITH_API_KEY=lsv2_pt_abc123...

2. Settings.is_langsmith_enabled property calls 
   Settings._synchronize_langsmith_environment() which does:
   os.environ.setdefault("LANGCHAIN_API_KEY", api_key)

3. Some diagnostic path (e.g., provider_configuration_diagnostics, 
   or an error handler using stdlib logging instead of structlog)
   logs a string containing the API key value.

4. redact_value("...lsv2_pt_abc123...") is called.

5. None of the 6 _SECRET_PATTERNS match "lsv2_pt_*".

6. The LangSmith API key is logged in plaintext.

Note: is_sensitive_key("langsmith_api_key") does return True (because "api_key" is in _SENSITIVE_SUBSTRINGS), so when the secret is stored under a sensitive dict key, redact_dict() redacts it correctly. The gap is specifically when the value appears inline in a string (e.g., in an error message, an event= field, or a log context). For string values, the secrets_masking_processor relies solely on redact_value(), so a token whose format is missing from _SECRET_PATTERNS passes through untouched.
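The difference between key-based and value-based redaction can be illustrated with a minimal, self-contained sketch. The functions below are simplified stand-ins written for this illustration, not the real implementations in shared/redaction.py; the REDACTED value and the contents of SENSITIVE_SUBSTRINGS (beyond "api_key") are assumptions.

```python
import re

REDACTED = "[REDACTED]"  # assumed placeholder; the real constant's value is not shown in this issue

# Assumed subset of _SENSITIVE_SUBSTRINGS; only "api_key" is confirmed above.
SENSITIVE_SUBSTRINGS = ("api_key",)

# One of the existing value patterns, copied from the Description.
VALUE_PATTERNS = [re.compile(r"sk-(?:proj-)?[A-Za-z0-9_-]{10,}")]


def is_sensitive_key(key: str) -> bool:
    """Key-based check: does the field name look sensitive?"""
    lowered = key.lower()
    return any(s in lowered for s in SENSITIVE_SUBSTRINGS)


def redact_value(value: str) -> str:
    """Value-based check: mask substrings matching a known token format."""
    for pattern in VALUE_PATTERNS:
        value = pattern.sub(REDACTED, value)
    return value


def redact_dict(data: dict[str, str]) -> dict[str, str]:
    """Mask whole values under sensitive keys; pattern-scan everything else."""
    return {
        k: REDACTED if is_sensitive_key(k) else redact_value(v)
        for k, v in data.items()
    }


token = "lsv2_pt_abc123def456ghi789jkl012mno345"
# Stored under a sensitive key: masked by the key-based path.
as_field = redact_dict({"langsmith_api_key": token})
# Embedded in free text under a non-sensitive key: only value patterns apply.
as_text = redact_dict({"event": f"LangSmith auth failed for {token}"})
```

In this sketch the token is masked in as_field but survives in as_text, which mirrors the gap described in the note.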

Evidence

# From shared/redaction.py
def redact_value(value: str) -> str:
    result = value
    with _patterns_lock:
        patterns = list(_SECRET_PATTERNS)
    for pattern in patterns:
        result = pattern.sub(REDACTED, result)
    return result

Testing:

from cleveragents.shared.redaction import redact_value
# LangSmith key - NOT redacted:
assert redact_value("key=lsv2_pt_abc123def456ghi789jkl012mno345") == "key=lsv2_pt_abc123def456ghi789jkl012mno345"
# GitHub token - NOT redacted:
assert redact_value("token=ghp_16C7e42F292c6912E169BF1tz8A6mFdo") == "token=ghp_16C7e42F292c6912E169BF1tz8A6mFdo"
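The same gap can be checked without installing the package. The sketch below copies the six patterns quoted in the Description and the example tokens from the table; the uncovered() helper is a hypothetical name introduced only for this check.

```python
import re

# The six patterns as quoted in the Description.
PATTERNS = [
    re.compile(r"sk-(?:proj-)?[A-Za-z0-9_-]{10,}"),
    re.compile(r"sk-ant-[A-Za-z0-9_-]{10,}"),
    re.compile(r"AIzaSy[A-Za-z0-9_-]{30,}"),
    re.compile(r"tok_[A-Za-z0-9]{10,}"),
    re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]{20,}"),
    re.compile(r"(?:key|KEY)-[A-Za-z0-9]{20,}"),
]

# Example tokens from the table in the Description.
SAMPLES = {
    "LangSmith": "lsv2_pt_abc123def456ghi789jkl012mno345",
    "GitHub": "ghp_16C7e42F292c6912E169BF1tz8A6mF",
    "GitHub (service)": "ghs_servicetokenhere12345678901",
    "GitHub (OAuth)": "gho_oauthtokenhere12345678901234",
    "HuggingFace": "hf_qOCxjJVfBKLnHJdCqKjVfaBCDeEfGhIjK",
    "Replicate": "r8_HRAbAFpHFqNsDEVNJLAVjQFJxZiqBvVNi",
}


def uncovered(samples: dict[str, str], patterns: list[re.Pattern[str]]) -> list[str]:
    """Return the providers whose sample token matches none of the patterns."""
    return [
        provider
        for provider, tok in samples.items()
        if not any(p.search(tok) for p in patterns)
    ]


missing = uncovered(SAMPLES, PATTERNS)
print(missing)
```

Every provider in the table ends up in missing, since none of the existing patterns shares a prefix with these formats.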

Expected Behavior

redact_value() should mask all common API token formats including:

  • LangSmith: lsv2_pt_*, lsv2_sk_*
  • GitHub: ghp_*, ghs_*, gho_*, github_pat_*
  • HuggingFace: hf_*
  • Replicate: r8_*
  • Generic: High-entropy strings following <short-prefix>_<20+char-alphanumeric> pattern
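The generic rule in the last bullet could be approximated as below. This is only a sketch of one possible reading: the prefix length limit (2 to 8 characters) and the lookbehind boundary are assumptions, and the regex checks shape, not entropy.

```python
import re

REDACTED = "[REDACTED]"  # assumed placeholder value

# <short-prefix>_<20+ alphanumerics>. The lookbehind treats "_" as a valid
# boundary so that multi-segment prefixes such as "lsv2_pt_" still match
# from their last segment.
GENERIC = re.compile(r"(?<![A-Za-z0-9])[A-Za-z][A-Za-z0-9]{1,7}_[A-Za-z0-9]{20,}")

hf = GENERIC.sub(REDACTED, "token=hf_qOCxjJVfBKLnHJdCqKjVfaBCDeEfGhIjK")
ls = GENERIC.sub(REDACTED, "key=lsv2_pt_abc123def456ghi789jkl012mno345")

# Short identifiers are left alone...
short = GENERIC.sub(REDACTED, "max_retries exceeded")
# ...but a long snake_case identifier has the same shape as a token.
false_positive = GENERIC.sub(REDACTED, "call get_userprofileconfigurationmanager failed")
```

Because the pattern cannot tell a long identifier from a secret, a generic rule like this trades some log readability for coverage. Provider-specific patterns remain the more precise option where the prefix is known.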

Actual Behavior

redact_value() only matches OpenAI (sk-*), Anthropic (sk-ant-*), Google (AIzaSy*), tok_* token IDs, Bearer tokens, and key-*/KEY-* prefixed keys. All other token formats pass through unredacted.

Suggested Fix

Add patterns for the missing token formats:

_SECRET_PATTERNS: list[re.Pattern[str]] = [
    # ... existing patterns ...
    # LangSmith API keys
    re.compile(r"lsv2_(?:pt|sk)_[A-Za-z0-9]{10,}"),
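    # GitHub tokens: personal access (ghp_), OAuth (gho_), user-to-server (ghu_),
    # server-to-server (ghs_), refresh (ghr_)
    re.compile(r"gh[pousr]_[A-Za-z0-9]{10,}"),
    # GitHub fine-grained personal access tokens
    re.compile(r"github_pat_[A-Za-z0-9_]{10,}"),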
    # HuggingFace tokens
    re.compile(r"hf_[A-Za-z0-9]{10,}"),
    # Replicate tokens
    re.compile(r"r8_[A-Za-z0-9]{10,}"),
]

Category

security / spec-alignment

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: @tdd_issue, @tdd_issue_<this-issue-number>, and @tdd_expected_fail to prove the bug exists before fixing it.


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: bug-hunter

HAL9000 added this to the v3.2.0 milestone 2026-04-09 23:28:31 +00:00
Author
Owner

Verified — Critical security bug: secret patterns don't cover modern API token formats — tokens leak in logs. MoSCoW: Must-have. Priority: Critical.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Reference
cleveragents/cleveragents-core#6678