
CODE_QA evaluation harness

Runs the canonical CODE_QA pipeline (IntentRouterV2 → retrieval → evidence gate → diagnostics) over golden cases and writes artifacts for calibration.

Modes

  • Fixture (default): Uses tests/pipeline_setup/suite_01_synthetic/fixtures/code_qa_repo. No env vars required.
  • Local repo: Set CODE_QA_REPO_PATH to the directory of the repository to evaluate; optionally also set CODE_QA_PROJECT_ID.
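
As an illustration of the two modes, here is a minimal sketch of how a script could choose between them from the environment. The function name resolve_mode and the treatment of an empty CODE_QA_REPO_PATH as "unset" are assumptions made for this example; the harness's actual selection logic may differ.

```python
import os
from typing import Mapping, Optional, Tuple

# Fixture repository path as given in this README.
FIXTURE_REPO = "tests/pipeline_setup/suite_01_synthetic/fixtures/code_qa_repo"


def resolve_mode(env: Mapping[str, str]) -> Tuple[str, str, Optional[str]]:
    """Return (mode, repo_path, project_id) for an environment mapping.

    Hypothetical helper: an unset or empty CODE_QA_REPO_PATH selects
    fixture mode (an assumption, not confirmed by the harness).
    """
    repo_path = env.get("CODE_QA_REPO_PATH", "").strip()
    if not repo_path:
        return ("fixture", FIXTURE_REPO, None)
    # CODE_QA_PROJECT_ID is optional; treat empty as absent.
    project_id = env.get("CODE_QA_PROJECT_ID", "").strip() or None
    return ("local", repo_path, project_id)


if __name__ == "__main__":
    print(resolve_mode(os.environ))
```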

Run

From the project root (agent repo):

python -m tests.pipeline_setup.suite_01_synthetic.code_qa_eval.run

Requires a configured database (the same setup as the pipeline_intent_rag router_rag tests). The run writes:

  • tests/pipeline_setup/test_results/code_qa_eval/<run_id>/*.md and *.json per case
  • tests/pipeline_setup/test_results/code_qa_eval/summary_<run_id>.md batch summary
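
The output layout above can be navigated with a small helper like the following. This is a sketch only: run_artifacts is a hypothetical name, and the paths are taken from the list above and assumed to be relative to the project root.

```python
from pathlib import Path
from typing import Dict, List

# Results directory as listed above, relative to the project root.
RESULTS_ROOT = Path("tests/pipeline_setup/test_results/code_qa_eval")


def run_artifacts(run_id: str, root: Path = RESULTS_ROOT) -> Dict[str, List[Path]]:
    """Collect per-case files and the batch summary path for one run.

    Hypothetical helper; returns empty per-case lists if the run
    directory does not exist.
    """
    run_dir = root / run_id
    return {
        "case_markdown": sorted(run_dir.glob("*.md")),
        "case_json": sorted(run_dir.glob("*.json")),
        # The summary sits next to the run directory, not inside it.
        "summary": [root / f"summary_{run_id}.md"],
    }
```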

The exit code is 0 if all golden cases pass and 1 otherwise.
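
For use in CI or a wrapper script, the exit code can be checked as in the sketch below. The helper names run_eval and describe_exit are hypothetical, and treating codes other than 0 and 1 as an unexpected failure (for example a crash before the cases ran) is an assumption.

```python
import subprocess
import sys

# Command from the "Run" section, using the current interpreter.
CMD = [sys.executable, "-m", "tests.pipeline_setup.suite_01_synthetic.code_qa_eval.run"]


def describe_exit(code: int) -> str:
    """Map the harness exit code to a short human-readable status."""
    if code == 0:
        return "all golden cases passed"
    if code == 1:
        return "at least one golden case failed"
    # Assumption: any other code means the run itself did not complete.
    return f"unexpected exit code {code}"


def run_eval(project_root: str = ".") -> int:
    """Run the harness from the project root and return its exit code."""
    return subprocess.run(CMD, cwd=project_root).returncode


if __name__ == "__main__":
    code = run_eval()
    print(describe_exit(code))
    sys.exit(code)
```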

Golden cases

Edit tests/pipeline_setup/suite_01_synthetic/golden/code_qa/cases.yaml to add or change cases. See tests/pipeline_setup/suite_01_synthetic/golden/code_qa/README.md for the field format.

Tests

pytest tests/pipeline_setup/suite_01_synthetic/code_qa_eval/ -v

The fixture-mode integration test (test_run_eval_fixture_mode_structure) is skipped if the database or required dependencies are not available.