TEST-INFRA: [test-data-quality] Improve test data management and quality #1634

Open
opened 2026-04-02 23:20:29 +00:00 by freemo · 2 comments
Owner

Metadata

  • Branch: task/v3.8.0-test-data-quality-improvements
  • Commit Message: test(infra): improve test data management with generation, schema validation, and documentation
  • Milestone: v3.8.0
  • Parent Epic: (to be linked — see orphan note below)

Background and Context

Area: test-data-quality

Our current test data management separates data from tests effectively, but it relies heavily on manually curated, hardcoded values in .yaml and .json files. This approach is simple, yet it presents several risks as the project scales:

  • Maintenance Overhead: Changes to data models require manually updating numerous fixture files, which is time-consuming and error-prone.
  • Brittleness: Tests can become tightly coupled to specific hardcoded data, making them less resilient to change.
  • Limited Scope: Manually creating data for all edge cases and permutations is impractical, leading to potential gaps in test coverage.
  • Lack of Validation: There is no automated validation of test data against schemas, which can lead to subtle bugs when data and application code diverge.

This issue proposes a set of improvements to our test data strategy to address these risks and improve the overall quality and robustness of our testing infrastructure.

Expected Behavior

A robust test data management system that:

  • Generates realistic and varied test data on the fly using a library such as Faker, reducing reliance on static hardcoded fixture files.
  • Validates all test data fixtures against schemas automatically, ensuring consistency with application data models.
  • Provides refactored test suites that use generated data where appropriate, improving resilience and maintainability.
  • Documents the new test data generation and validation process in CONTRIBUTING.md so all developers can follow the new best practices.
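As a rough illustration of the first point, the sketch below generates varied but reproducible records. It uses only the standard library's random module as a stand-in for a library such as Faker, which this issue names only as an example. The UserRecord shape, the make_user and make_users helpers, and the example.test domain are all hypothetical, since the real data models are not shown here.

```python
import random
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class UserRecord:
    """Hypothetical fixture shape; the project's real data models are not shown in this issue."""
    username: str
    email: str
    age: int


def make_user(rng: random.Random) -> UserRecord:
    """Build one record from the supplied random generator."""
    # Username: 5 to 12 lowercase ASCII letters.
    username = "".join(rng.choices(string.ascii_lowercase, k=rng.randint(5, 12)))
    return UserRecord(
        username=username,
        email=f"{username}@example.test",  # placeholder domain, an assumption
        age=rng.randint(18, 90),
    )


def make_users(count: int, seed: int = 0) -> list[UserRecord]:
    """Generate `count` records; the same seed always yields the same records."""
    rng = random.Random(seed)
    return [make_user(rng) for _ in range(count)]
```

Seeding matters here: a failing test can be replayed with the same seed, so generated data stays as debuggable as a static fixture. Faker offers a comparable mechanism through Faker.seed(), should it be the library chosen.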

Subtasks

  • Integrate a data generation library (e.g., Faker) to create realistic and varied test data on the fly. This reduces reliance on static, hardcoded files and makes it easy to generate data for a wider range of scenarios.
  • Introduce schema validation for all test data fixtures. This keeps test data consistent with application data models and catches errors early in the development process.
  • Refactor existing tests to use generated data where appropriate. This is a gradual process that will ultimately make tests more robust and easier to maintain.
  • Document the new test data generation and validation process in CONTRIBUTING.md so that all developers are aware of the new best practices and can easily contribute to the test data infrastructure.
  • Tests (Behave): Add/update scenarios covering the new data generation and validation utilities.
  • Tests (Robot): Add integration tests verifying schema validation of fixture data.
  • Verify coverage >= 97% via nox -s coverage_report.
  • Run nox (all default sessions), fix any errors.
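The schema validation subtask could look roughly like the sketch below. It is a deliberately small, standard-library-only check, not a full JSON Schema implementation; a dedicated validation library would likely replace it in practice. USER_SCHEMA, validate_record, and validate_fixture_text are hypothetical names, and the fixture is assumed to be a JSON list of objects.

```python
import json

# Hypothetical schema: field name -> required Python type.
USER_SCHEMA = {"username": str, "email": str, "age": int}


def validate_record(record: dict, schema: dict) -> list[str]:
    """Return a list of problems for one record; an empty list means it is valid."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif type(record[field]) is not expected:
            # Exact type match, so that e.g. True is not accepted where an int is required.
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    # Flag fields the schema does not know about (sorted for stable output).
    for field in sorted(record.keys() - schema.keys()):
        errors.append(f"unexpected field: {field}")
    return errors


def validate_fixture_text(text: str, schema: dict) -> list[str]:
    """Validate every record in a JSON fixture that contains a list of objects."""
    problems = []
    for index, record in enumerate(json.loads(text)):
        problems.extend(f"record {index}: {err}" for err in validate_record(record, schema))
    return problems
```

Returning a list of problems rather than raising on the first one lets a single test run report every divergence between a fixture file and its schema.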

Definition of Done

  • A data generation library is integrated into the testing framework.
  • Schema validation is implemented for all test data fixtures.
  • At least one existing test suite is refactored to use generated data.
  • The new test data process is documented in CONTRIBUTING.md.
  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
  • All nox stages pass.
  • Coverage >= 97%.
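The commit message rule above (exact first line, then a blank line, then details) can be checked mechanically. The following is a minimal sketch of such a check; check_commit_message is a hypothetical helper and is not part of the project's tooling as far as this issue shows.

```python
# First line required by the Metadata section of this issue.
EXPECTED_SUBJECT = (
    "test(infra): improve test data management with generation, "
    "schema validation, and documentation"
)


def check_commit_message(message: str, expected_subject: str = EXPECTED_SUBJECT) -> list[str]:
    """Return a list of rule violations; an empty list means the message conforms."""
    lines = message.splitlines()
    problems = []
    if not lines or lines[0] != expected_subject:
        problems.append("first line does not match the Commit Message in Metadata")
    if len(lines) < 2 or lines[1] != "":
        problems.append("second line must be blank")
    if not any(line.strip() for line in lines[2:]):
        problems.append("body with implementation details is missing")
    return problems
```

For example, check_commit_message(EXPECTED_SUBJECT + "\n\nAdd seeded generators.") returns an empty list, while a message consisting of the subject line alone reports the missing blank line and body.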

⚠️ Orphan Notice: No parent Epic was provided for this issue. This issue must be manually linked to an appropriate TEST-INFRA parent Epic before work begins. Please update the Parent Epic field in Metadata above and create the dependency link via Forgejo's dependency tracking system.


Automated by CleverAgents Bot
Supervisor: Test Infrastructure | Agent: ca-new-issue-creator

freemo added this to the v3.8.0 milestone 2026-04-02 23:20:47 +00:00
Author
Owner

⚠️ Orphan Issue — Manual Action Required

This issue was created without a parent Epic. Per CONTRIBUTING.md, all regular issues must be linked to a parent Epic using Forgejo's dependency tracking system (child issue blocks parent Epic).

Action needed by a project owner or maintainer:

  1. Identify or create an appropriate TEST-INFRA parent Epic for the test-data-quality workstream.
  2. Update the Parent Epic field in the Metadata section of this issue body.
  3. Create the dependency link so that this issue (#1634) blocks the parent Epic:
     curl -s -X POST "https://git.cleverthis.com/api/v1/repos/cleveragents/cleveragents-core/issues/1634/blocks" \
       -H "Authorization: token <FORGEJO_PAT>" \
       -H "Content-Type: application/json" \
       -d '{"dependency_id": <PARENT_EPIC_NUMBER>}'

This issue will remain in State/Unverified until the parent Epic link is established.


Automated by CleverAgents Bot
Supervisor: Test Infrastructure | Agent: ca-new-issue-creator

Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Priority/Medium
  • MoSCoW: MoSCoW/Could Have. Improving test data management and quality is a nice-to-have improvement.

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Reference
cleveragents/cleveragents-core#1634