BUG-HUNT: [error-handling] Unsafe Datetime Parsing in LegacyDataMigrator Can Lead to Data Loss #2977

Open
opened 2026-04-05 03:01:43 +00:00 by freemo · 2 comments
Owner

Metadata

  • Branch: fix/legacy-migrator-unsafe-datetime-parsing
  • Commit Message: fix(database): log and propagate unparseable datetime strings in LegacyDataMigrator
  • Milestone: v3.7.0
  • Parent Epic: #362

Bug Report: [error-handling] — Unsafe Datetime Parsing in LegacyDataMigrator Can Lead to Data Loss

Severity Assessment

  • Impact: Incorrect timestamps in the database, which can affect data integrity, auditing, and any features that rely on accurate creation/modification times.
  • Likelihood: Medium. This will occur if legacy datetime strings are in a format not anticipated by the two hardcoded parsing attempts.
  • Priority: Medium

Location

  • File: src/cleveragents/infrastructure/database/legacy_migrator.py
  • Function/Class: LegacyDataMigrator._parse_datetime
  • Lines: 289-302

Description

The _parse_datetime method in LegacyDataMigrator uses a try...except block that catches broad (ValueError, AttributeError) exceptions during datetime parsing. If a datetime string from a legacy JSON file does not match the expected ISO or "%Y-%m-%d %H:%M:%S" formats, the exception is caught, and the function returns None. The calling code often replaces this None value with datetime.now(), effectively corrupting the original timestamp.

This violates CONTRIBUTING.md's error handling conventions:

"Do not suppress errors. Let exceptions propagate to top-level execution."
"Only catch exceptions when you can meaningfully handle them."
"Design code to fail immediately when something is wrong (fail-fast)."

Evidence

def _parse_datetime(self, dt_str: str | None) -> datetime | None:
    """Parse datetime string to datetime object."""
    if not dt_str:
        return None

    try:
        # Try ISO format first
        return datetime.fromisoformat(dt_str)
    except (ValueError, AttributeError):
        try:
            # Try common datetime format
            return datetime.strptime(dt_str, "%Y-%m-%d %H:%M:%S")
        except (ValueError, AttributeError):
            return None

Expected Behavior

If a datetime string cannot be parsed, the system should log a warning with the problematic value and either store the original string in a separate field for manual inspection or skip the record with a clear error message. Silently replacing unparseable timestamps with the current time is incorrect.

Actual Behavior

Unparseable datetime strings are silently converted to None, which is then often replaced by the current time, leading to data corruption.

Suggested Fix

  • Log a warning when a datetime string cannot be parsed, including the unparseable string in the log message.
  • Consider adding more parsing formats or using a more flexible parsing library if a variety of legacy formats are expected.
  • For unrecoverable parsing failures, either store the original string in a new raw_timestamp column for later analysis or skip the record and report the failure, rather than inserting an incorrect timestamp.

Subtasks

  • Investigate all call sites of _parse_datetime in LegacyDataMigrator to understand how None return values are handled
  • Add structured warning logging for unparseable datetime strings (include the raw string in the log message)
  • Replace silent return None fallback with a logged warning and a raised exception or explicit skip-with-report behaviour, per fail-fast principle
  • Evaluate whether additional datetime formats should be supported; document the supported set
  • Update type annotations and docstrings to reflect new behaviour
  • Tests (Behave): Add scenarios covering unparseable datetime strings, verifying warning is logged and no silent data corruption occurs
  • Tests (Behave): Add scenarios for all supported datetime formats to confirm correct parsing
  • Tests (Robot): Add integration test verifying migration behaviour when legacy data contains malformed timestamps
  • Verify coverage >=97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly (fix(database): log and propagate unparseable datetime strings in LegacyDataMigrator), followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly (fix/legacy-migrator-unsafe-datetime-parsing).
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
  • All nox stages pass.
  • Coverage >= 97%.

Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: ca-new-issue-creator

## Metadata - **Branch**: `fix/legacy-migrator-unsafe-datetime-parsing` - **Commit Message**: `fix(database): log and propagate unparseable datetime strings in LegacyDataMigrator` - **Milestone**: v3.7.0 - **Parent Epic**: #362 ## Bug Report: [error-handling] — Unsafe Datetime Parsing in LegacyDataMigrator Can Lead to Data Loss ### Severity Assessment - **Impact**: Incorrect timestamps in the database, which can affect data integrity, auditing, and any features that rely on accurate creation/modification times. - **Likelihood**: Medium. This will occur if legacy datetime strings are in a format not anticipated by the two hardcoded parsing attempts. - **Priority**: Medium ### Location - **File**: `src/cleveragents/infrastructure/database/legacy_migrator.py` - **Function/Class**: `LegacyDataMigrator._parse_datetime` - **Lines**: 289-302 ### Description The `_parse_datetime` method in `LegacyDataMigrator` uses a `try...except` block that catches broad `(ValueError, AttributeError)` exceptions during datetime parsing. If a datetime string from a legacy JSON file does not match the expected ISO or `"%Y-%m-%d %H:%M:%S"` formats, the exception is caught, and the function returns `None`. The calling code often replaces this `None` value with `datetime.now()`, effectively corrupting the original timestamp. This violates CONTRIBUTING.md's error handling conventions: > "Do not suppress errors. Let exceptions propagate to top-level execution." > "Only catch exceptions when you can meaningfully handle them." > "Design code to fail immediately when something is wrong (fail-fast)." ### Evidence ```python def _parse_datetime(self, dt_str: str | None) -> datetime | None: """Parse datetime string to datetime object.""" if not dt_str: return None try: # Try ISO format first return datetime.fromisoformat(dt_str) except (ValueError, AttributeError): try: # Try common datetime format return datetime.strptime(dt_str, "%Y-%m-%d %H:%M:%S") except (ValueError, AttributeError): return None ``` ### Expected Behavior If a datetime string cannot be parsed, the system should log a warning with the problematic value and either store the original string in a separate field for manual inspection or skip the record with a clear error message. Silently replacing unparseable timestamps with the current time is incorrect. ### Actual Behavior Unparseable datetime strings are silently converted to `None`, which is then often replaced by the current time, leading to data corruption. ### Suggested Fix - Log a warning when a datetime string cannot be parsed, including the unparseable string in the log message. - Consider adding more parsing formats or using a more flexible parsing library if a variety of legacy formats are expected. - For unrecoverable parsing failures, either store the original string in a new `raw_timestamp` column for later analysis or skip the record and report the failure, rather than inserting an incorrect timestamp. ## Subtasks - [ ] Investigate all call sites of `_parse_datetime` in `LegacyDataMigrator` to understand how `None` return values are handled - [ ] Add structured warning logging for unparseable datetime strings (include the raw string in the log message) - [ ] Replace silent `return None` fallback with a logged warning and a raised exception or explicit skip-with-report behaviour, per fail-fast principle - [ ] Evaluate whether additional datetime formats should be supported; document the supported set - [ ] Update type annotations and docstrings to reflect new behaviour - [ ] Tests (Behave): Add scenarios covering unparseable datetime strings, verifying warning is logged and no silent data corruption occurs - [ ] Tests (Behave): Add scenarios for all supported datetime formats to confirm correct parsing - [ ] Tests (Robot): Add integration test verifying migration behaviour when legacy data contains malformed timestamps - [ ] Verify coverage >=97% via `nox -s coverage_report` - [ ] Run `nox` (all default sessions), fix any errors ## Definition of Done This issue is complete when: - All subtasks above are completed and checked off. - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly (`fix(database): log and propagate unparseable datetime strings in LegacyDataMigrator`), followed by a blank line, then additional lines providing relevant details about the implementation. - The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly (`fix/legacy-migrator-unsafe-datetime-parsing`). - The commit is submitted as a **pull request** to `master`, reviewed, and **merged** before this issue is marked done. - All nox stages pass. - Coverage >= 97%. --- **Automated by CleverAgents Bot** Supervisor: Bug Hunting | Agent: ca-new-issue-creator
freemo added this to the v3.7.0 milestone 2026-04-05 03:01:52 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Confirmed
  • MoSCoW: Should Have

Valid finding verified during batch triage.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Issue triaged by project owner: - **State**: Verified - **Priority**: Confirmed - **MoSCoW**: Should Have Valid finding verified during batch triage. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: ca-project-owner
Author
Owner

Starting implementation on branch fix/legacy-migrator-unsafe-datetime-parsing.

Plan: Fix unsafe datetime parsing in LegacyDataMigrator._parse_datetime to:

  1. Log structured warnings for unparseable datetime strings (include raw string in log)
  2. Replace silent return None fallback with logged warning + raised exception per fail-fast principle
  3. Update type annotations and docstrings
  4. Add Behave tests for unparseable datetime strings (warning logged, no silent data corruption)
  5. Add Behave tests for all supported datetime formats
  6. Add Robot Framework integration test for malformed timestamps
  7. Verify coverage ≥97% via nox -s coverage_report
  8. Run full nox suite

Difficulty assessment: Medium → starting at sonnet tier.


Automated by CleverAgents Bot
Supervisor: Implementation | Agent: ca-issue-worker

Starting implementation on branch `fix/legacy-migrator-unsafe-datetime-parsing`. **Plan**: Fix unsafe datetime parsing in `LegacyDataMigrator._parse_datetime` to: 1. Log structured warnings for unparseable datetime strings (include raw string in log) 2. Replace silent `return None` fallback with logged warning + raised exception per fail-fast principle 3. Update type annotations and docstrings 4. Add Behave tests for unparseable datetime strings (warning logged, no silent data corruption) 5. Add Behave tests for all supported datetime formats 6. Add Robot Framework integration test for malformed timestamps 7. Verify coverage ≥97% via `nox -s coverage_report` 8. Run full `nox` suite **Difficulty assessment**: Medium → starting at sonnet tier. --- **Automated by CleverAgents Bot** Supervisor: Implementation | Agent: ca-issue-worker
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Blocks
#362 Epic: Security & Safety Hardening
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#2977
No description provided.