State fix
16
tests/README.md
Normal file
@@ -0,0 +1,16 @@
# Test structure

- **unit_tests/**: unit tests for the applications:
  - `agent/`: tests for the agent module (orchestrator, router, services)
  - `chat/`: tests for the chat module
  - `rag/`: tests for the RAG module (indexing, retrieval, intent router)

- **pipeline_setup/**: tests used when configuring the pipeline:
  - `pipeline_intent_rag/`: intent-router → RAG → LLM (markers `router_rag`, `full_chain`)
  - `code_qa_eval/`: CODE_QA golden eval harness
  - `golden/`, `fixtures/`, `test_results/`, `utils/`: data, run results, and pipeline launch scripts

Running:

- Unit tests: `pytest tests/unit_tests/ -v`
- Pipeline tests: `pytest tests/pipeline_setup/ -v` (see `pipeline_setup/pipeline_intent_rag/README.md` and `pipeline_setup/code_qa_eval/README.md`)
Binary file not shown.
@@ -4,5 +4,6 @@ import sys
from pathlib import Path

ROOT = Path(__file__).resolve().parents[1]
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))
SRC = ROOT / "src"
if str(SRC) not in sys.path:
    sys.path.insert(0, str(SRC))

115
tests/pipeline_setup/README.md
Normal file
@@ -0,0 +1,115 @@
# Pipeline tests (pipeline_setup)

This directory contains tests and helper modules for configuring and verifying the pipeline: intent router → RAG retrieval → LLM.

## Contents

| Directory / file | Purpose |
|------------------|---------|
| **suite_01_synthetic/** | Synthetic tests on a fixture repository and golden cases for CODE_QA. |
| **suite_02_pipeline/** | Integration suites `router_only`, `router_rag`, `full_chain`, and the CLI for running them. |
| **test_results/** | Run results: JSONL files and reports from the test pipelines, plus CODE_QA eval artifacts. |
| **utils/** | Shared low-level utilities. Currently holds the reusable RAG indexer. |

## Suite structure

- `suite_01_synthetic`: the `code_qa_repo` fixture repository, golden cases, and the evaluation harness.
- `suite_02_pipeline`: the integration scenarios `router_only`, `router_rag`, `full_chain`, and the CLI launch scripts.

## CLI for the pipeline suite

Run from the **project root (agent)**:

```bash
# From the agent directory
python -m tests.pipeline_setup.suite_02_pipeline.cli.<script> [options]
```

### 1. Indexing a repository

Creates a RAG session and indexes the given directory. The result is a `rag_session_id`, which is passed to the retrieval and full_chain pipelines.

```bash
python -m tests.pipeline_setup.suite_02_pipeline.cli.index_repo --repo-path <path_to_repo> [--project-id ID]
```

| Parameter | Required | Description |
|-----------|----------|-------------|
| `--repo-path` | yes | Path to the root of the repository to index. |
| `--project-id` | no | Project identifier for the session; defaults to the repository directory name. |

Example:

```bash
python -m tests.pipeline_setup.suite_02_pipeline.cli.index_repo --repo-path ./tests/pipeline_setup/suite_01_synthetic/fixtures/code_qa_repo --project-id code_qa_repo
# Output: rag_session_id=<uuid>
```

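The `rag_session_id` printed by `index_repo` has to be carried into the next command. A minimal sketch of extracting it from captured output, assuming the CLI prints a single `rag_session_id=<uuid>` line as in the example above. The helper names `parse_rag_session_id` and `router_rag_command` are hypothetical and not part of the suite.

```python
import re
import sys
import uuid

# Assumption: the indexing CLI prints one line of the form "rag_session_id=<uuid>".
_SESSION_LINE = re.compile(r"^rag_session_id=(\S+)\s*$", re.MULTILINE)


def parse_rag_session_id(cli_output: str) -> str:
    """Return the session id found in the CLI output, validated as a UUID."""
    match = _SESSION_LINE.search(cli_output)
    if match is None:
        raise ValueError("no rag_session_id=<uuid> line in output")
    # uuid.UUID raises ValueError on malformed input; str() normalizes the form.
    return str(uuid.UUID(match.group(1)))


def router_rag_command(rag_session_id: str) -> list[str]:
    """Build the argv for the run_router_rag CLI with an existing session."""
    return [
        sys.executable,
        "-m",
        "tests.pipeline_setup.suite_02_pipeline.cli.run_router_rag",
        "--rag-session-id",
        rag_session_id,
    ]


sample_output = "indexing...\nrag_session_id=123e4567-e89b-12d3-a456-426614174000\n"
session_id = parse_rag_session_id(sample_output)
command = router_rag_command(session_id)
```

The resulting `command` list could be handed to `subprocess.run`; validating the id up front means a malformed line fails before a pipeline run starts.
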
### 2. "Intent router only" pipeline

Chain: query classification (intent_router_v2), without RAG or LLM.

```bash
python -m tests.pipeline_setup.suite_02_pipeline.cli.run_router_only [--case-id ID ...] [--verbose] [--test-name PREFIX]
```

| Parameter | Description |
|-----------|-------------|
| `--case-id` | Run only the specified cases (may be repeated). |
| `--verbose` | Print diagnostics for each case. |
| `--test-name` | Prefix of the results file name (default: `cli_router_only`). |

### 3. "Intent router + retrieval" pipeline

Chain: intent_router_v2 → RAG retrieval. Requires prior indexing (or `--reindex-repo-path`).

```bash
# With an existing rag_session_id:
python -m tests.pipeline_setup.suite_02_pipeline.cli.run_router_rag --rag-session-id <uuid> [--case-id ID ...] [--verbose]

# Indexing before the run:
python -m tests.pipeline_setup.suite_02_pipeline.cli.run_router_rag --reindex-repo-path <path_to_repo> [--reindex-project-id ID] [--case-id ID ...]
```

| Parameter | Description |
|-----------|-------------|
| `--rag-session-id` | UUID of the RAG session (the result of indexing). |
| `--reindex-repo-path` | Index the repository before the run and use the new session. |
| `--reindex-project-id` | project_id for the new session when `--reindex-repo-path` is used. |
| `--case-id` | Run only the specified cases. |
| `--verbose` | Print diagnostics for each case. |
| `--test-name` | Prefix of the results file (default: `cli_router_rag`). |

Environment variables: `RUN_INTENT_PIPELINE_ROUTER_RAG=1` enables router_rag mode. `DATABASE_URL` is required.

### 4. "Full chain" pipeline (router + retrieval + LLM)

Chain: intent_router_v2 → RAG retrieval → LLM answer (for example, GigaChat).

```bash
# With an existing rag_session_id:
python -m tests.pipeline_setup.suite_02_pipeline.cli.run_full_chain --rag-session-id <uuid> [--case-id ID ...] [--verbose]

# Indexing before the run:
python -m tests.pipeline_setup.suite_02_pipeline.cli.run_full_chain --reindex-repo-path <path_to_repo> [--reindex-project-id ID] [--case-id ID ...]
```

The parameters are the same as for `run_router_rag`. In addition, set `RUN_INTENT_PIPELINE_FULL_CHAIN=1` and the LLM access settings (see `.env` and `suite_02_pipeline/pipeline_intent_rag/.env.test`).

## Running via pytest

- Intent router only (no RAG/LLM):
  `pytest tests/pipeline_setup/suite_02_pipeline/pipeline_intent_rag/test_intent_router_only_matrix.py -q`

- With RAG (requires `RUN_INTENT_PIPELINE_ROUTER_RAG=1` and a database):
  `RUN_INTENT_PIPELINE_ROUTER_RAG=1 pytest -m router_rag tests/pipeline_setup/suite_02_pipeline/pipeline_intent_rag/ -q`

- Full chain (requires `RUN_INTENT_PIPELINE_FULL_CHAIN=1`, a database, and an LLM):
  `RUN_INTENT_PIPELINE_FULL_CHAIN=1 pytest -m full_chain tests/pipeline_setup/suite_02_pipeline/pipeline_intent_rag/ -q`

For more details, see `suite_02_pipeline/pipeline_intent_rag/README.md` and `suite_01_synthetic/code_qa_eval/README.md`.

## Environment

- Env loading: first `suite_02_pipeline/pipeline_intent_rag/.env.test`, then the workspace `.env`.
- Retrieval and full_chain need `DATABASE_URL`; full_chain also needs the LLM settings.
1
tests/pipeline_setup/__init__.py
Normal file
@@ -0,0 +1 @@
"""Pipeline configuration tests: intent-router → RAG → LLM, CODE_QA eval harness."""
5
tests/pipeline_setup/conftest.py
Normal file
@@ -0,0 +1,5 @@
from __future__ import annotations

from tests.pipeline_setup.env_loader import load_pipeline_setup_env

load_pipeline_setup_env(start_dir=__file__)
42
tests/pipeline_setup/env_loader.py
Normal file
@@ -0,0 +1,42 @@
from __future__ import annotations

import os
from pathlib import Path

from app.modules.shared.env_loader import load_workspace_env


def load_pipeline_setup_env(start_dir: str | Path | None = None) -> list[Path]:
    base = Path(start_dir or Path.cwd()).resolve()
    loaded = load_workspace_env(start_dir=base)
    pipeline_root = _find_pipeline_setup_root(base)
    env_path = pipeline_root / ".env"
    if env_path.is_file():
        _apply_env_file(env_path)
        loaded.append(env_path)
    return loaded


def _find_pipeline_setup_root(base: Path) -> Path:
    for directory in (base, *base.parents):
        if directory.name == "pipeline_setup" and (directory / "__init__.py").is_file():
            return directory
    raise RuntimeError(f"Unable to locate tests/pipeline_setup root from: {base}")


def _apply_env_file(path: Path) -> None:
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, raw_value = line.split("=", 1)
        name = key.removeprefix("export ").strip()
        if not name:
            continue
        os.environ[name] = _normalize_value(raw_value.strip())


def _normalize_value(value: str) -> str:
    if len(value) >= 2 and value[0] == value[-1] and value[0] in {"'", '"'}:
        return value[1:-1]
    return value
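
The parsing rules in `_apply_env_file` and `_normalize_value` are easier to see on concrete input. The sketch below re-implements them as a pure function that returns a dict instead of mutating `os.environ`. `parse_env_lines` is a hypothetical name used only for illustration, and the sample values are invented.

```python
def parse_env_lines(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines with the same rules as _apply_env_file."""
    parsed: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Blank lines, comments, and lines without "=" are skipped.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, raw_value = line.split("=", 1)
        name = key.removeprefix("export ").strip()
        if not name:
            continue
        value = raw_value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in {"'", '"'}:
            value = value[1:-1]
        parsed[name] = value
    return parsed


sample = """
# comment
export DATABASE_URL="postgresql+psycopg://user:pass@localhost:5432/db"
RUN_INTENT_PIPELINE_ROUTER_RAG=1
NOT_A_PAIR
EMPTY=
"""
env = parse_env_lines(sample)
```

Under these rules inline comments are not stripped: a line such as `KEY=value # note` yields the value `value # note`.
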
63
tests/pipeline_setup/suite_01_synthetic/README.md
Normal file
@@ -0,0 +1,63 @@
# Suite 01 Synthetic

A synthetic test suite for checking CODE_QA against a test repository:
[fixtures/code_qa_repo](fixtures/code_qa_repo).

## What the suite contains

- `fixtures/`: the test repository and input data
- `golden/`: golden cases
- `code_qa_eval/`: the eval harness, which indexes the repository and runs the golden cases

## Running the tests

From the project root:

```bash
PYTHONPATH=. pytest tests/pipeline_setup/suite_01_synthetic/code_qa_eval/ -q
```

Targeted run:

```bash
PYTHONPATH=. pytest tests/pipeline_setup/suite_01_synthetic/code_qa_eval/test_eval_harness.py -q
```

## Running the eval harness

From the project root:

```bash
PYTHONPATH=. python -m tests.pipeline_setup.suite_01_synthetic.code_qa_eval.run
```

During a run the harness:

- selects the repository to index
- creates a new `rag_session_id`
- indexes the repository into RAG
- runs the golden cases through the CODE_QA pipeline

## Run parameters

The suite has no CLI flags of its own; configuration is set through environment variables:

- `CODE_QA_REPO_PATH`: path to a local repository to use instead of the fixture repository
- `CODE_QA_PROJECT_ID`: `project_id` for the RAG session being created; if unset, the repository directory name is used

Default behavior:

- if `CODE_QA_REPO_PATH` is not set, `fixtures/code_qa_repo` is used
- in fixture mode `project_id` is fixed to `code_qa_repo`
- `rag_session_id` is not passed in advance; it is created automatically during indexing

## Requirements

- a working `DATABASE_URL`
- the RAG and SQLAlchemy dependencies installed

Artifacts are written to:
[test_results/code_qa_eval](../test_results/code_qa_eval)

Harness details:
[code_qa_eval/README.md](code_qa_eval/README.md)
1
tests/pipeline_setup/suite_01_synthetic/__init__.py
Normal file
@@ -0,0 +1 @@
"""Synthetic pipeline tests built around fixture repositories and golden cases."""
@@ -0,0 +1,35 @@
# CODE_QA evaluation harness

Runs the canonical CODE_QA pipeline (IntentRouterV2 → retrieval → evidence gate → diagnostics) over golden cases and writes artifacts for calibration.

## Modes

- **Fixture (default):** Uses `tests/pipeline_setup/suite_01_synthetic/fixtures/code_qa_repo`. No env vars required.
- **Local repo:** Set `CODE_QA_REPO_PATH` to a directory; optionally `CODE_QA_PROJECT_ID`.

## Run

From the **project root** (agent repo):

```bash
python -m tests.pipeline_setup.suite_01_synthetic.code_qa_eval.run
```

Requires a configured database (same as pipeline_intent_rag router_rag tests). Outputs:

- `tests/pipeline_setup/test_results/code_qa_eval/<run_id>/*.md` and `*.json` per case
- `tests/pipeline_setup/test_results/code_qa_eval/summary_<run_id>.md` batch summary

Exit code 0 if all golden cases pass, 1 otherwise.

## Golden cases

Edit `tests/pipeline_setup/suite_01_synthetic/golden/code_qa/cases.yaml` to add or change cases. See `tests/pipeline_setup/suite_01_synthetic/golden/code_qa/README.md` for the field format.

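As a rough illustration of what one entry carries, the sketch below mirrors the defaults that `load_golden_cases` in this commit applies to a parsed case mapping. The sample values (`explain_indexer`, the query text) are invented for illustration and do not come from `cases.yaml`, and `apply_case_defaults` is a hypothetical helper.

```python
from __future__ import annotations

from typing import Any


def apply_case_defaults(raw: dict[str, Any]) -> dict[str, Any] | None:
    """Return a normalized case, or None when `id` or `query` is missing."""
    if not raw.get("id") or not raw.get("query"):
        return None
    return {
        "id": str(raw["id"]),
        "query": str(raw["query"]),
        "expected_intent": str(raw.get("expected_intent", "CODE_QA")),
        "expected_sub_intent": str(raw.get("expected_sub_intent", "EXPLAIN")),
        "expected_answer_mode": str(raw.get("expected_answer_mode", "normal")),
        "expected_target_hint": raw.get("expected_target_hint"),
        "expected_path_scope_contains": list(raw.get("expected_path_scope_contains") or []),
        "expected_symbol_candidates_contain": list(raw.get("expected_symbol_candidates_contain") or []),
        "expected_layers": list(raw.get("expected_layers") or []),
        "notes": str(raw.get("notes") or ""),
    }


# Hypothetical entry, shaped like one item of the `cases` list after YAML parsing.
case = apply_case_defaults({"id": "explain_indexer", "query": "What does the indexer do?"})
skipped = apply_case_defaults({"id": "no_query"})
```

Only `id` and `query` are mandatory under these rules; an entry missing either one is dropped silently rather than reported as an error.
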
## Tests

```bash
pytest tests/pipeline_setup/suite_01_synthetic/code_qa_eval/ -v
```

The fixture-mode integration test (`test_run_eval_fixture_mode_structure`) is skipped if the DB or dependencies are not available.
@@ -0,0 +1 @@
"""CODE_QA pipeline calibration: golden runner, diagnostics artifacts, fixture and real-repo support."""
@@ -0,0 +1,152 @@
"""Write diagnostics artifacts and batch summary for CODE_QA evaluation."""

from __future__ import annotations

import json
from datetime import datetime
from pathlib import Path

from tests.pipeline_setup.suite_01_synthetic.code_qa_eval.runner import EvalCaseResult


def dump_run_artifact(
    result: EvalCaseResult,
    out_dir: Path,
    *,
    run_id: str = "",
) -> None:
    """Write one run: markdown summary and JSON detail for manual review."""
    out_dir.mkdir(parents=True, exist_ok=True)
    prefix = result.case.id
    if run_id:
        prefix = f"{run_id}_{prefix}"

    md_path = out_dir / f"{prefix}.md"
    md_path.write_text(_run_markdown(result), encoding="utf-8")

    json_path = out_dir / f"{prefix}.json"
    json_path.write_text(
        json.dumps(_run_json(result), ensure_ascii=False, indent=2),
        encoding="utf-8",
    )


def _run_markdown(result: EvalCaseResult) -> str:
    c = result.case
    p = result.pipeline_result
    dr = p.diagnostics_report
    lines = [
        f"# {c.id}",
        "",
        "## Query",
        c.query,
        "",
        "## Expected",
        f"- intent: {c.expected_intent}, sub_intent: {c.expected_sub_intent}",
        f"- answer_mode: {c.expected_answer_mode}",
        "",
        "## Actual",
        f"- intent: {p.router_result.intent}, sub_intent: {p.router_result.query_plan.sub_intent if p.router_result.query_plan else '—'}",
        f"- answer_mode: {p.answer_mode}",
        f"- evidence_gate_passed: {p.evidence_gate_passed}",
        f"- evidence_count: {p.evidence_bundle.evidence_count}",
        "",
        "## Result",
        "PASS" if result.passed else "FAIL",
        "",
    ]
    if result.mismatches:
        lines.append("## Mismatches")
        for m in result.mismatches:
            lines.append(f"- {m}")
        lines.append("")
    lines.extend([
        "## Router",
        f"- path_scope: {list(getattr(p.router_result.retrieval_spec.filters, 'path_scope', []) or [])}",
        f"- layers: {[q.layer_id for q in (p.router_result.retrieval_spec.layer_queries or [])]}",
        "",
        "## Retrieval",
        f"- requested_layers: {p.retrieval_request.requested_layers}",
        f"- chunk_count: {len(p.retrieval_result.code_chunks)}",
        f"- layer_outcomes: {[(o.layer_id, o.hit_count) for o in p.retrieval_result.layer_outcomes]}",
        "",
        "## Evidence gate",
        f"- failure_reasons: {dr.failure_reasons if dr else []}",
        "",
        "## Timings (ms)",
        f"{p.timings_ms}",
        "",
    ])
    return "\n".join(lines)


def _run_json(result: EvalCaseResult) -> dict:
    c = result.case
    p = result.pipeline_result
    dr = p.diagnostics_report
    return {
        "case_id": c.id,
        "query": c.query,
        "expected": {
            "intent": c.expected_intent,
            "sub_intent": c.expected_sub_intent,
            "answer_mode": c.expected_answer_mode,
        },
        "actual": {
            "intent": p.router_result.intent,
            "sub_intent": p.router_result.query_plan.sub_intent if p.router_result.query_plan else None,
            "answer_mode": p.answer_mode,
            "evidence_gate_passed": p.evidence_gate_passed,
            "evidence_count": p.evidence_bundle.evidence_count,
        },
        "passed": result.passed,
        "mismatches": result.mismatches,
        "router_result": dr.router_result if dr else {},
        "retrieval_request": dr.retrieval_request if dr else {},
        "per_layer_outcome": dr.per_layer_outcome if dr else [],
        "failure_reasons": dr.failure_reasons if dr else [],
        "timings_ms": p.timings_ms,
    }


def write_batch_summary(
    results: list[EvalCaseResult],
    out_dir: Path,
    *,
    run_id: str = "",
) -> Path:
    """Write a single readable batch summary; returns path to the file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    passed = sum(1 for r in results if r.passed)
    total = len(results)
    stamp = run_id or datetime.now().strftime("%Y%m%d_%H%M%S")
    path = out_dir / f"summary_{stamp}.md"
    lines = [
        "# CODE_QA evaluation summary",
        "",
        f"**{passed}/{total}** cases passed.",
        "",
        "| Case ID | Query | Expected scenario | Actual scenario | Target | Evidence | Answer mode | Pass |",
        "|---------|-------|------------------|-----------------|--------|----------|-------------|------|",
    ]
    for r in results:
        c = r.case
        p = r.pipeline_result
        sub = p.router_result.query_plan.sub_intent if p.router_result.query_plan else "—"
        target = "—"
        if p.evidence_bundle.resolved_target:
            target = p.evidence_bundle.resolved_target[:40] + ("…" if len(p.evidence_bundle.resolved_target or "") > 40 else "")
        ev = "✓" if p.evidence_gate_passed else "✗"
        mode = p.answer_mode
        pass_mark = "✓" if r.passed else "✗"
        q_short = c.query[:40] + ("…" if len(c.query) > 40 else "")
        lines.append(
            f"| {c.id} | {q_short} | {c.expected_sub_intent} | {sub} | {target} | {ev} | {mode} | {pass_mark} |"
        )
    lines.append("")
    lines.append("## Failures")
    for r in results:
        if not r.passed and r.mismatches:
            lines.append(f"- **{r.case.id}**: {'; '.join(r.mismatches)}")
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
@@ -0,0 +1,41 @@
"""Eval harness config: fixture vs user-provided repo path, artifact output."""

from __future__ import annotations

import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True, slots=True)
class EvalConfig:
    """Configuration for CODE_QA evaluation runs."""

    repo_path: Path
    test_results_dir: Path
    golden_cases_path: Path
    project_id: str | None = None
    use_fixture: bool = True

    @classmethod
    def from_env(cls, project_root: Path | None = None) -> "EvalConfig":
        """Build config: fixture repo by default; optional CODE_QA_REPO_PATH for local calibration."""
        root = project_root or Path(__file__).resolve().parent.parent
        raw_repo = os.getenv("CODE_QA_REPO_PATH", "").strip()
        if raw_repo:
            repo_path = Path(raw_repo).expanduser().resolve()
            use_fixture = False
            project_id = os.getenv("CODE_QA_PROJECT_ID", "").strip() or repo_path.name
        else:
            repo_path = root / "fixtures" / "code_qa_repo"
            use_fixture = True
            project_id = "code_qa_repo"
        test_results_dir = root / "test_results" / "code_qa_eval"
        golden_cases_path = root / "golden" / "code_qa" / "cases.yaml"
        return cls(
            repo_path=repo_path,
            test_results_dir=test_results_dir,
            golden_cases_path=golden_cases_path,
            project_id=project_id,
            use_fixture=use_fixture,
        )
@@ -0,0 +1,51 @@
"""Load golden cases from YAML for CODE_QA evaluation."""

from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from typing import Any

import yaml


@dataclass(slots=True)
class GoldenCase:
    """Single golden case for pipeline evaluation."""

    id: str
    query: str
    expected_intent: str
    expected_sub_intent: str
    expected_answer_mode: str = "normal"
    expected_target_hint: str | None = None
    expected_path_scope_contains: list[str] = field(default_factory=list)
    expected_symbol_candidates_contain: list[str] = field(default_factory=list)
    expected_layers: list[str] = field(default_factory=list)
    notes: str = ""


def load_golden_cases(path: Path) -> list[GoldenCase]:
    """Load and parse golden cases from YAML. Returns list of GoldenCase."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        data = yaml.safe_load(f) or {}
    cases_raw = data.get("cases") or []
    out: list[GoldenCase] = []
    for c in cases_raw:
        if not isinstance(c, dict) or not c.get("id") or not c.get("query"):
            continue
        out.append(GoldenCase(
            id=str(c["id"]),
            query=str(c["query"]),
            expected_intent=str(c.get("expected_intent", "CODE_QA")),
            expected_sub_intent=str(c.get("expected_sub_intent", "EXPLAIN")),
            expected_answer_mode=str(c.get("expected_answer_mode", "normal")),
            expected_target_hint=c.get("expected_target_hint"),
            expected_path_scope_contains=list(c.get("expected_path_scope_contains") or []),
            expected_symbol_candidates_contain=list(c.get("expected_symbol_candidates_contain") or []),
            expected_layers=list(c.get("expected_layers") or []),
            notes=str(c.get("notes") or ""),
        ))
    return out
111
tests/pipeline_setup/suite_01_synthetic/code_qa_eval/run.py
Normal file
@@ -0,0 +1,111 @@
"""Entrypoint: run CODE_QA golden evaluation and write artifacts + summary."""

from __future__ import annotations

import sys
from datetime import datetime
from pathlib import Path

_agent_root = Path(__file__).resolve().parents[4]
if str(_agent_root) not in sys.path:
    sys.path.insert(0, str(_agent_root))
_src = _agent_root / "src"
if _src.exists() and str(_src) not in sys.path:
    sys.path.insert(0, str(_src))

# Load .env from project root so DATABASE_URL is available
from app.modules.shared.env_loader import load_workspace_env

from tests.pipeline_setup.suite_01_synthetic.code_qa_eval.artifacts import dump_run_artifact, write_batch_summary
from tests.pipeline_setup.suite_01_synthetic.code_qa_eval.config import EvalConfig
from tests.pipeline_setup.suite_01_synthetic.code_qa_eval.runner import run_eval


class _TeeStream:
    def __init__(self, *streams) -> None:
        self._streams = streams

    def write(self, data: str) -> int:
        for stream in self._streams:
            stream.write(data)
        return len(data)

    def flush(self) -> None:
        for stream in self._streams:
            stream.flush()


def _check_db_available() -> bool:
    """Try to connect to the database; return False if unavailable."""
    try:
        from sqlalchemy import text
        from app.modules.shared.db import get_engine
        with get_engine().connect() as conn:
            conn.execute(text("SELECT 1"))
        return True
    except Exception:
        return False


def main() -> None:
    # Workspace root (agent repo) for .env; pipeline_setup root for fixtures/golden/test_results
    workspace_root = Path(__file__).resolve().parents[3]
    pipeline_root = Path(__file__).resolve().parents[1]
    load_workspace_env(workspace_root)
    config = EvalConfig.from_env(project_root=pipeline_root)
    run_id = datetime.now().strftime("%Y%m%d_%H%M%S")
    results_run_dir = config.test_results_dir / run_id
    results_run_dir.mkdir(parents=True, exist_ok=True)
    console_log_path = results_run_dir / "console_output.txt"

    original_stdout = sys.stdout
    original_stderr = sys.stderr
    with console_log_path.open("w", encoding="utf-8") as log_file:
        sys.stdout = _TeeStream(original_stdout, log_file)
        sys.stderr = _TeeStream(original_stderr, log_file)
        try:
            _run_with_logging(config, results_run_dir, run_id)
        finally:
            sys.stdout.flush()
            sys.stderr.flush()
            sys.stdout = original_stdout
            sys.stderr = original_stderr


def _run_with_logging(config: EvalConfig, results_run_dir: Path, run_id: str) -> None:
    print(f"Console log: {results_run_dir / 'console_output.txt'}")

    if not config.repo_path.exists():
        print(f"Repo path not found: {config.repo_path}", file=sys.stderr)
        sys.exit(1)
    if not config.golden_cases_path.exists():
        print(f"Golden cases not found: {config.golden_cases_path}", file=sys.stderr)
        sys.exit(1)

    if not _check_db_available():
        print(
            "Database is not available. Evaluation requires a running PostgreSQL instance.\n"
            "Set DATABASE_URL (e.g. postgresql+psycopg://user:pass@localhost:5432/db) or start the DB (e.g. docker-compose up -d db).",
            file=sys.stderr,
        )
        sys.exit(1)

    print(f"Running evaluation: repo={config.repo_path}, fixture={config.use_fixture}")
    print(f"Results: {results_run_dir}")

    results = run_eval(config)
    for r in results:
        dump_run_artifact(r, results_run_dir, run_id=run_id)

    summary_path = write_batch_summary(results, config.test_results_dir, run_id=run_id)
    passed = sum(1 for r in results if r.passed)
    total = len(results)
    print(f"\n{passed}/{total} cases passed. Summary: {summary_path}")
    for r in results:
        if not r.passed:
            print(f"  FAIL {r.case.id}: {'; '.join(r.mismatches)}")
    sys.exit(0 if passed == total else 1)


if __name__ == "__main__":
    main()
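
How `_TeeStream` duplicates output can be seen in isolation. The sketch below copies the class and swaps the real console and log file for in-memory buffers. It uses `contextlib.redirect_stdout` instead of assigning `sys.stdout` directly, which is a substitution made for brevity and not what `main()` does.

```python
import contextlib
import io


class TeeStream:
    """Write-through wrapper that forwards every write to several streams."""

    def __init__(self, *streams) -> None:
        self._streams = streams

    def write(self, data: str) -> int:
        for stream in self._streams:
            stream.write(data)
        return len(data)

    def flush(self) -> None:
        for stream in self._streams:
            stream.flush()


console = io.StringIO()   # stands in for the original sys.stdout
log_file = io.StringIO()  # stands in for console_output.txt

# Everything printed inside the block reaches both buffers.
with contextlib.redirect_stdout(TeeStream(console, log_file)):
    print("2/3 cases passed.")
```

Because the wrapper only implements `write` and `flush`, code that expects other file attributes (for example `isatty` or `encoding`) on `sys.stdout` would need those added.
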
102
tests/pipeline_setup/suite_01_synthetic/code_qa_eval/runner.py
Normal file
@@ -0,0 +1,102 @@
"""Run golden cases through CodeQAPipelineRunner and compare to expected."""

from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path

from app.modules.rag.code_qa_pipeline import CodeQAPipelineResult, CodeQAPipelineRunner
from app.modules.rag.contracts.enums import RagLayer
from app.modules.rag.intent_router_v2 import ConversationState, IntentRouterV2, RepoContext

from tests.pipeline_setup.suite_01_synthetic.code_qa_eval.config import EvalConfig
from tests.pipeline_setup.suite_01_synthetic.code_qa_eval.golden_loader import GoldenCase, load_golden_cases


@dataclass(slots=True)
class EvalCaseResult:
    """Result of evaluating one golden case."""

    case: GoldenCase
    pipeline_result: CodeQAPipelineResult
    passed: bool
    mismatches: list[str] = field(default_factory=list)


def _repo_context() -> RepoContext:
    return RepoContext(
        languages=["python"],
        available_domains=["CODE", "DOCS"],
        available_layers=[
            RagLayer.CODE_ENTRYPOINTS,
            RagLayer.CODE_SYMBOL_CATALOG,
            RagLayer.CODE_DEPENDENCY_GRAPH,
            RagLayer.CODE_SEMANTIC_ROLES,
            RagLayer.CODE_SOURCE_CHUNKS,
            RagLayer.DOCS_MODULE_CATALOG,
            RagLayer.DOCS_FACT_INDEX,
            RagLayer.DOCS_SECTION_INDEX,
            RagLayer.DOCS_POLICY_INDEX,
        ],
    )


def run_eval(config: EvalConfig) -> list[EvalCaseResult]:
    """Index repo, run all golden cases through the pipeline, compare to expected. Returns list of EvalCaseResult."""
    from app.modules.rag.persistence.repository import RagRepository
    from tests.pipeline_setup.suite_02_pipeline.pipeline_intent_rag.helpers.rag_db_adapter import RagDbAdapter, SessionEmbeddingDimensions
    from tests.pipeline_setup.utils.rag_indexer import RagSessionIndexer

    repo = RagRepository()
    repo.ensure_tables()
    indexer = RagSessionIndexer(repo)
    rag_session_id = indexer.index_repo(config.repo_path, project_id=config.project_id)

    adapter = RagDbAdapter(repository=repo, dim_resolver=SessionEmbeddingDimensions())
    router = IntentRouterV2()
    runner = CodeQAPipelineRunner(
        router=router,
        retrieval_adapter=adapter,
        repo_context=_repo_context(),
    )

    cases = load_golden_cases(config.golden_cases_path)

    results: list[EvalCaseResult] = []
    for case in cases:
        pipeline_result = runner.run(case.query, rag_session_id, run_retrieval=True, run_hydrate=True)
        passed, mismatches = _compare(case, pipeline_result)
        results.append(EvalCaseResult(case=case, pipeline_result=pipeline_result, passed=passed, mismatches=mismatches))
    return results


def _compare(case: GoldenCase, result: CodeQAPipelineResult) -> tuple[bool, list[str]]:
    mismatches: list[str] = []
    rr = result.router_result
    sub_intent = (rr.query_plan.sub_intent if rr.query_plan else None) or ""

    if rr.intent != case.expected_intent:
        mismatches.append(f"intent: expected {case.expected_intent}, got {rr.intent}")
    if sub_intent != case.expected_sub_intent:
        mismatches.append(f"sub_intent: expected {case.expected_sub_intent}, got {sub_intent}")
    if result.answer_mode != case.expected_answer_mode:
        mismatches.append(f"answer_mode: expected {case.expected_answer_mode}, got {result.answer_mode}")

    if case.expected_path_scope_contains:
        path_scope = list(getattr(rr.retrieval_spec.filters, "path_scope", []) or [])
        for want in case.expected_path_scope_contains:
            if not any(want in p for p in path_scope):
                mismatches.append(f"path_scope should contain '{want}', got {path_scope}")
    if case.expected_symbol_candidates_contain:
        candidates = list(rr.query_plan.symbol_candidates or []) if rr.query_plan else []
        for want in case.expected_symbol_candidates_contain:
            if want not in candidates:
                mismatches.append(f"symbol_candidates should contain '{want}', got {candidates}")
    if case.expected_layers:
        layers = [str(q.layer_id) for q in (rr.retrieval_spec.layer_queries or [])]
        for want in case.expected_layers:
            if want not in layers:
                mismatches.append(f"layers should include '{want}', got {layers}")

    passed = len(mismatches) == 0
    return passed, mismatches
@@ -0,0 +1,189 @@
"""Tests for CODE_QA evaluation harness: golden loader, compare logic, fixture-mode run."""

from __future__ import annotations

from pathlib import Path

import pytest

from app.modules.rag.code_qa_pipeline import CodeQAPipelineResult
from app.modules.rag.intent_router_v2.models import (
    CodeRetrievalFilters,
    EvidencePolicy,
    IntentRouterResult,
    QueryPlan,
    RetrievalSpec,
    SymbolResolution,
)

from tests.pipeline_setup.suite_01_synthetic.code_qa_eval.config import EvalConfig
from tests.pipeline_setup.suite_01_synthetic.code_qa_eval.golden_loader import GoldenCase, load_golden_cases
from tests.pipeline_setup.suite_01_synthetic.code_qa_eval.runner import EvalCaseResult, _compare, run_eval

_TEST_ROOT = Path(__file__).resolve().parent.parent


def test_load_golden_cases_returns_list() -> None:
    path = _TEST_ROOT / "golden" / "code_qa" / "cases.yaml"
    if not path.exists():
        pytest.skip("Golden cases file not found")
    cases = load_golden_cases(path)
    assert isinstance(cases, list)
    assert len(cases) >= 1
    c = cases[0]
    assert c.id
    assert c.query
    assert c.expected_intent in ("CODE_QA", "DOCS_QA")
    assert c.expected_sub_intent in ("OPEN_FILE", "EXPLAIN", "FIND_TESTS", "FIND_ENTRYPOINTS", "GENERAL_QA")
    assert c.expected_answer_mode in ("normal", "degraded", "insufficient")


def test_compare_passed_when_all_match() -> None:
    case = GoldenCase(
        id="test",
        query="Open app/main.py",
        expected_intent="CODE_QA",
        expected_sub_intent="OPEN_FILE",
        expected_answer_mode="normal",
    )
    result = _make_pipeline_result(intent="CODE_QA", sub_intent="OPEN_FILE", answer_mode="normal")
    passed, mismatches = _compare(case, result)
    assert passed is True
    assert len(mismatches) == 0


def test_compare_fails_on_intent_mismatch() -> None:
    case = GoldenCase(
        id="test",
        query="Open app/main.py",
        expected_intent="CODE_QA",
        expected_sub_intent="OPEN_FILE",
        expected_answer_mode="normal",
    )
    result = _make_pipeline_result(intent="DOCS_QA", sub_intent="OPEN_FILE", answer_mode="normal")
    passed, mismatches = _compare(case, result)
    assert passed is False
    assert any("intent" in m for m in mismatches)


def test_compare_fails_on_answer_mode_mismatch() -> None:
    case = GoldenCase(
        id="test",
        query="Explain NonExistent",
        expected_intent="CODE_QA",
        expected_sub_intent="EXPLAIN",
        expected_answer_mode="degraded",
    )
    result = _make_pipeline_result(intent="CODE_QA", sub_intent="EXPLAIN", answer_mode="normal")
    passed, mismatches = _compare(case, result)
    assert passed is False
    assert any("answer_mode" in m for m in mismatches)


def test_compare_path_scope_contains() -> None:
    case = GoldenCase(
        id="test",
        query="Open app/main.py",
        expected_intent="CODE_QA",
        expected_sub_intent="OPEN_FILE",
        expected_path_scope_contains=["app/main.py"],
    )
    result = _make_pipeline_result(
        intent="CODE_QA",
        sub_intent="OPEN_FILE",
        path_scope=["app/main.py"],
    )
    passed, _ = _compare(case, result)
    assert passed
    case_bad = GoldenCase(
        id="test2",
        query="Open other",
        expected_intent="CODE_QA",
        expected_sub_intent="OPEN_FILE",
        expected_path_scope_contains=["app/main.py"],
    )
    result_bad = _make_pipeline_result(intent="CODE_QA", sub_intent="OPEN_FILE", path_scope=[])
    passed_bad, mismatches_bad = _compare(case_bad, result_bad)
    assert not passed_bad
    assert any("path_scope" in m for m in mismatches_bad)


def test_eval_config_fixture_mode_by_default() -> None:
    config = EvalConfig.from_env(project_root=_TEST_ROOT)
    assert config.use_fixture is True
    assert "code_qa_repo" in str(config.repo_path)
    assert config.repo_path == _TEST_ROOT / "fixtures" / "code_qa_repo"
    assert config.golden_cases_path == _TEST_ROOT / "golden" / "code_qa" / "cases.yaml"
    assert config.test_results_dir == _TEST_ROOT / "test_results" / "code_qa_eval"


def test_run_eval_fixture_mode_structure() -> None:
    """Run full eval on fixture repo; validates harness path. Skips if DB/deps unavailable."""
    config = EvalConfig.from_env(project_root=_TEST_ROOT)
    if not config.repo_path.exists():
        pytest.skip("Fixture repo not found")
    if not config.golden_cases_path.exists():
        pytest.skip("Golden cases not found")
    try:
        results = run_eval(config)
    except Exception as e:
        msg = str(e).lower()
        if (
            "connect" in msg
            or "database" in msg
            or "engine" in msg
            or isinstance(e, ModuleNotFoundError)
            or "sqlalchemy" in msg
        ):
            pytest.skip(f"DB or dependencies not available: {e}")
        raise
    assert isinstance(results, list)
    assert len(results) >= 1
    for r in results:
        assert isinstance(r, EvalCaseResult)
        assert r.case is not None
        assert r.pipeline_result is not None
        assert isinstance(r.passed, bool)
        assert isinstance(r.mismatches, list)


def _make_pipeline_result(
    *,
    intent: str = "CODE_QA",
    sub_intent: str = "EXPLAIN",
    answer_mode: str = "normal",
    path_scope: list[str] | None = None,
) -> CodeQAPipelineResult:
    from app.modules.rag.code_qa_pipeline.contracts import (
        EvidenceBundle,
        RetrievalRequest,
        RetrievalResult,
    )

    filters = CodeRetrievalFilters(path_scope=path_scope or [])
    router_result = IntentRouterResult(
        intent=intent,
        graph_id="CodeQAGraph",
        retrieval_profile="code",
        conversation_mode="START",
        query_plan=QueryPlan(raw="", normalized="", sub_intent=sub_intent),
        retrieval_spec=RetrievalSpec(filters=filters),
        symbol_resolution=SymbolResolution(),
        evidence_policy=EvidencePolicy(),
    )
    req = RetrievalRequest(rag_session_id="", query="", sub_intent=sub_intent, path_scope=path_scope or [])
    res = RetrievalResult()
    bundle = EvidenceBundle(resolved_sub_intent=sub_intent, evidence_count=1)
    return CodeQAPipelineResult(
        user_query="",
        rag_session_id="",
        router_result=router_result,
        retrieval_request=req,
        retrieval_result=res,
        evidence_bundle=bundle,
        evidence_gate_passed=(answer_mode == "normal"),
        answer_synthesis_input=None,
        diagnostics_report=None,
        answer_mode=answer_mode,
        timings_ms={},
    )
@@ -0,0 +1,17 @@
"""Entrypoint: runs the orders API. Uses src layout (package order_app under src/)."""

import sys
from pathlib import Path

# Add src to path so "order_app" is importable from repo root
_root = Path(__file__).resolve().parent
_src = _root / "src"
if str(_src) not in sys.path:
    sys.path.insert(0, str(_src))

from order_app.api.orders import create_app

app = create_app()

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
@@ -0,0 +1 @@
"""Fixture package for CODE_QA pipeline: orders domain, API, services."""
@@ -0,0 +1,3 @@
from order_app.api.orders import create_app

__all__ = ["create_app"]
@@ -0,0 +1,30 @@
"""Orders API handlers."""

from flask import Flask, request, jsonify

from order_app.services.order_service import OrderService
from order_app.repositories.order_repository import OrderRepository


def create_app() -> Flask:
    app = Flask(__name__)
    repo = OrderRepository()
    service = OrderService(repo)

    @app.route("/orders", methods=["POST"])
    def create_order():
        data = request.get_json() or {}
        order = service.create_order(
            product_id=data.get("product_id"),
            quantity=data.get("quantity", 1),
        )
        return jsonify({"order_id": order.id, "status": order.status}), 201

    @app.route("/orders/<order_id>", methods=["GET"])
    def get_order(order_id: str):
        order = service.get_order(order_id)
        if order is None:
            return jsonify({"error": "not found"}), 404
        return jsonify({"id": order.id, "status": order.status})

    return app
@@ -0,0 +1,3 @@
from order_app.domain.order import Order

__all__ = ["Order"]
@@ -0,0 +1,11 @@
"""Domain model for an order."""

import uuid


class Order:
    def __init__(self, product_id: str = "", quantity: int = 1) -> None:
        self.id = str(uuid.uuid4())
        self.product_id = product_id
        self.quantity = quantity
        self.status = "pending"
@@ -0,0 +1,3 @@
from order_app.repositories.order_repository import OrderRepository

__all__ = ["OrderRepository"]
@@ -0,0 +1,16 @@
"""Persistence for Order entities."""

from order_app.domain.order import Order


class OrderRepository:
    def __init__(self) -> None:
        # Per-instance store: a class-level dict would be shared by every repository instance.
        self._store: dict[str, Order] = {}

    def save(self, order: Order) -> Order:
        self._store[order.id] = order
        return order

    def find_by_id(self, order_id: str) -> Order | None:
        return self._store.get(order_id)
@@ -0,0 +1,3 @@
from order_app.services.order_service import OrderService

__all__ = ["OrderService"]
@@ -0,0 +1,16 @@
"""Order business logic: delegates to repository."""

from order_app.domain.order import Order
from order_app.repositories.order_repository import OrderRepository


class OrderService:
    def __init__(self, repository: OrderRepository) -> None:
        self._repo = repository

    def create_order(self, product_id: str | None = None, quantity: int = 1) -> Order:
        order = Order(product_id=product_id or "", quantity=quantity)
        return self._repo.save(order)

    def get_order(self, order_id: str) -> Order | None:
        return self._repo.find_by_id(order_id)
@@ -0,0 +1,3 @@
from order_app.utils.helpers import format_order_id

__all__ = ["format_order_id"]
@@ -0,0 +1,6 @@
"""Shared utilities for the code_qa fixture repo."""


def format_order_id(raw: str) -> str:
    """Normalize order id for display."""
    return raw.strip().lower() or "unknown"
@@ -0,0 +1,35 @@
"""Tests for OrderService. Repo uses src layout: add src to path for order_app."""

import sys
from pathlib import Path

import pytest

# Fixture repo root; parent of tests/
_repo_root = Path(__file__).resolve().parent.parent
_src = _repo_root / "src"
if str(_src) not in sys.path:
    sys.path.insert(0, str(_src))

from order_app.domain.order import Order
from order_app.services.order_service import OrderService
from order_app.repositories.order_repository import OrderRepository


def test_create_order() -> None:
    repo = OrderRepository()
    service = OrderService(repo)
    order = service.create_order(product_id="prod-1", quantity=2)
    assert isinstance(order, Order)
    assert order.product_id == "prod-1"
    assert order.quantity == 2
    assert order.status == "pending"


def test_get_order_returns_saved_order() -> None:
    repo = OrderRepository()
    service = OrderService(repo)
    created = service.create_order(product_id="p1")
    found = service.get_order(created.id)
    assert found is not None
    assert found.id == created.id
@@ -0,0 +1,15 @@
# Golden cases for CODE_QA pipeline calibration

Each case defines:
- `id`: unique case id
- `query`: user query text
- `expected_intent`: CODE_QA (or DOCS_QA for docs-only; this set is code-only)
- `expected_sub_intent`: OPEN_FILE | EXPLAIN | FIND_TESTS | FIND_ENTRYPOINTS | GENERAL_QA
- `expected_answer_mode`: normal | degraded | insufficient
- `expected_target_hint`: optional; path (for OPEN_FILE), symbol (for EXPLAIN), or test-like
- `expected_path_scope_contains`: optional; each string must be a substring of at least one entry in the router's `path_scope`
- `expected_symbol_candidates_contain`: optional; each symbol must appear in the query plan's `symbol_candidates`
- `expected_layers`: optional; list of layer ids we expect in the retrieval plan
- `notes`: optional; borderline, negative, or calibration hint

We assert routing, retrieval alignment, evidence sufficiency, and answer mode, not exact LLM wording.
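The comparison these cases feed into can be sketched in a few lines. This is a simplified, hypothetical illustration, not the harness itself: `GoldenCaseSketch` and `find_mismatches` are names made up here, and only the field names and mismatch messages are taken from the golden cases and compare logic in this commit.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class GoldenCaseSketch:
    """Hypothetical stand-in for the harness's GoldenCase."""

    id: str
    query: str
    expected_intent: str = "CODE_QA"
    expected_sub_intent: str = "GENERAL_QA"
    expected_answer_mode: str = "normal"
    expected_path_scope_contains: list[str] = field(default_factory=list)
    expected_layers: list[str] = field(default_factory=list)


def find_mismatches(
    case: GoldenCaseSketch,
    *,
    intent: str,
    sub_intent: str,
    answer_mode: str,
    path_scope: list[str],
    layers: list[str],
) -> list[str]:
    """Return human-readable mismatches; an empty list means the case passed."""
    mismatches: list[str] = []
    if intent != case.expected_intent:
        mismatches.append(f"intent: expected {case.expected_intent}, got {intent}")
    if sub_intent != case.expected_sub_intent:
        mismatches.append(f"sub_intent: expected {case.expected_sub_intent}, got {sub_intent}")
    if answer_mode != case.expected_answer_mode:
        mismatches.append(f"answer_mode: expected {case.expected_answer_mode}, got {answer_mode}")
    # Path hints match by substring, so "main.py" is satisfied by "src/main.py".
    for want in case.expected_path_scope_contains:
        if not any(want in p for p in path_scope):
            mismatches.append(f"path_scope should contain '{want}', got {path_scope}")
    # Layer ids must match exactly.
    for want in case.expected_layers:
        if want not in layers:
            mismatches.append(f"layers should include '{want}', got {layers}")
    return mismatches


case = GoldenCaseSketch(
    id="open_file_main_positive",
    query="Открой файл main.py",
    expected_sub_intent="OPEN_FILE",
    expected_path_scope_contains=["main.py"],
)
ok = find_mismatches(
    case, intent="CODE_QA", sub_intent="OPEN_FILE", answer_mode="normal",
    path_scope=["src/main.py"], layers=[],
)
bad = find_mismatches(
    case, intent="PROJECT_MISC", sub_intent="OPEN_FILE", answer_mode="normal",
    path_scope=["src/main.py"], layers=[],
)
print(ok)
print(bad)
```

Only structured routing and retrieval fields are compared, so a case can pass regardless of how the final answer is worded.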
@@ -0,0 +1,142 @@
# Golden cases for CODE_QA pipeline (fixture repo: code_qa_repo)
# Scenarios: OPEN_FILE, EXPLAIN, FIND_TESTS, FIND_ENTRYPOINTS, GENERAL_QA

cases:
  # --- OPEN_FILE ---
  - id: open_file_main_positive
    query: "Открой файл main.py"
    expected_intent: CODE_QA
    expected_sub_intent: OPEN_FILE
    expected_answer_mode: normal
    expected_target_hint: path
    expected_path_scope_contains: ["main.py"]
    notes: "Clear path; fixture has main.py (src layout)"

  - id: open_file_api_positive
    query: "Покажи src/order_app/api/orders.py"
    expected_intent: CODE_QA
    expected_sub_intent: OPEN_FILE
    expected_answer_mode: normal
    expected_path_scope_contains: ["src/order_app/api/orders.py"]
    notes: "Handler module (src layout)"

  - id: open_file_borderline
    query: "Open main"
    expected_intent: CODE_QA
    expected_sub_intent: OPEN_FILE
    expected_answer_mode: normal
    notes: "Short path hint; may resolve to main.py"

  - id: open_file_negative
    query: "Открой файл nonexistent/foo.py"
    expected_intent: CODE_QA
    expected_sub_intent: OPEN_FILE
    expected_answer_mode: degraded
    notes: "Path not in repo; evidence gate should fail or degrade"

  # --- EXPLAIN ---
  - id: explain_order_positive
    query: "Объясни класс Order"
    expected_intent: CODE_QA
    expected_sub_intent: EXPLAIN
    expected_answer_mode: normal
    expected_target_hint: symbol
    expected_symbol_candidates_contain: ["Order"]
    notes: "Domain class in fixture"

  - id: explain_order_service_positive
    query: "Как работает OrderService?"
    expected_intent: CODE_QA
    expected_sub_intent: EXPLAIN
    expected_answer_mode: normal
    expected_symbol_candidates_contain: ["OrderService"]
    notes: "Service layer"

  - id: explain_borderline
    query: "Что делает create_order?"
    expected_intent: CODE_QA
    expected_sub_intent: EXPLAIN
    expected_answer_mode: normal
    notes: "Function name; may resolve"

  - id: explain_negative
    query: "Объясни класс NonExistentClass"
    expected_intent: CODE_QA
    expected_sub_intent: EXPLAIN
    expected_answer_mode: degraded
    notes: "Symbol not in repo"

  # --- FIND_TESTS ---
  - id: find_tests_positive
    query: "Где тесты для OrderService?"
    expected_intent: CODE_QA
    expected_sub_intent: FIND_TESTS
    expected_answer_mode: normal
    expected_target_hint: test-like
    expected_symbol_candidates_contain: ["OrderService"]
    notes: "Fixture has tests/test_order_service.py"

  - id: find_tests_order_positive
    query: "Найди тесты для Order"
    expected_intent: CODE_QA
    expected_sub_intent: FIND_TESTS
    expected_answer_mode: normal
    notes: "Tests reference Order"

  - id: find_tests_borderline
    query: "Есть ли тесты на репозиторий?"
    expected_intent: CODE_QA
    expected_sub_intent: FIND_TESTS
    expected_answer_mode: normal
    notes: "Vague target"

  - id: find_tests_negative
    query: "Где тесты для NonExistent?"
    expected_intent: CODE_QA
    expected_sub_intent: FIND_TESTS
    expected_answer_mode: degraded
    notes: "Target not in repo"

  # --- FIND_ENTRYPOINTS ---
  - id: find_entrypoints_positive
    query: "Какие точки входа в приложение?"
    expected_intent: CODE_QA
    expected_sub_intent: FIND_ENTRYPOINTS
    expected_answer_mode: normal
    notes: "Fixture has main.py entrypoint (src layout)"

  - id: find_entrypoints_english
    query: "Find application entrypoints"
    expected_intent: CODE_QA
    expected_sub_intent: FIND_ENTRYPOINTS
    expected_answer_mode: normal
    notes: "English variant"

  - id: find_entrypoints_borderline
    query: "Где main?"
    expected_intent: CODE_QA
    expected_sub_intent: FIND_ENTRYPOINTS
    expected_answer_mode: normal
    notes: "Short; may route to entrypoints or OPEN_FILE"

  # --- GENERAL_QA ---
  - id: general_qa_positive
    query: "Что делает этот проект?"
    expected_intent: CODE_QA
    expected_sub_intent: GENERAL_QA
    expected_answer_mode: normal
    notes: "Broad question; bounded context"

  - id: general_qa_how
    query: "How does order creation work?"
    expected_intent: CODE_QA
    expected_sub_intent: GENERAL_QA
    expected_answer_mode: normal
    notes: "General flow question"

  - id: general_qa_borderline
    query: "Расскажи про код"
    expected_intent: CODE_QA
    expected_sub_intent: GENERAL_QA
    expected_answer_mode: normal
    notes: "Very vague; fallback to GENERAL_QA"
@@ -0,0 +1,79 @@
{
  "case_id": "explain_borderline",
  "query": "Что делает create_order?",
  "expected": {
    "intent": "CODE_QA",
    "sub_intent": "EXPLAIN",
    "answer_mode": "normal"
  },
  "actual": {
    "intent": "PROJECT_MISC",
    "sub_intent": "EXPLAIN",
    "answer_mode": "normal",
    "evidence_gate_passed": true,
    "evidence_count": 5
  },
  "passed": false,
  "mismatches": [
    "intent: expected CODE_QA, got PROJECT_MISC"
  ],
  "router_result": {
    "intent": "PROJECT_MISC",
    "graph_id": "ProjectMiscGraph",
    "conversation_mode": "START",
    "retrieval_profile": "code",
    "sub_intent": "EXPLAIN",
    "path_scope": [],
    "layers": [
      "D1_MODULE_CATALOG",
      "D3_SECTION_INDEX",
      "C1_SYMBOL_CATALOG",
      "C0_SOURCE_CHUNKS"
    ],
    "symbol_resolution_status": "pending"
  },
  "retrieval_request": {
    "rag_session_id": "b3c7ec39-40a7-40e5-8ba5-fc0e2e3cc63c",
    "query": "Что делает create_order?",
    "sub_intent": "EXPLAIN",
    "path_scope": [],
    "requested_layers": [
      "D1_MODULE_CATALOG",
      "D3_SECTION_INDEX",
      "C1_SYMBOL_CATALOG",
      "C0_SOURCE_CHUNKS"
    ]
  },
  "per_layer_outcome": [
    {
      "layer_id": "D1_MODULE_CATALOG",
      "hit_count": 0,
      "empty": true,
      "fallback_used": false
    },
    {
      "layer_id": "D3_SECTION_INDEX",
      "hit_count": 0,
      "empty": true,
      "fallback_used": false
    },
    {
      "layer_id": "C1_SYMBOL_CATALOG",
      "hit_count": 1,
      "empty": false,
      "fallback_used": false
    },
    {
      "layer_id": "C0_SOURCE_CHUNKS",
      "hit_count": 4,
      "empty": false,
      "fallback_used": false
    }
  ],
  "failure_reasons": [],
  "timings_ms": {
    "router": 0,
    "retrieval_total": 31,
    "symbol_resolution": 0
  }
}
@@ -0,0 +1,35 @@
# explain_borderline

## Query
Что делает create_order?

## Expected
- intent: CODE_QA, sub_intent: EXPLAIN
- answer_mode: normal

## Actual
- intent: PROJECT_MISC, sub_intent: EXPLAIN
- answer_mode: normal
- evidence_gate_passed: True
- evidence_count: 5

## Result
FAIL

## Mismatches
- intent: expected CODE_QA, got PROJECT_MISC

## Router
- path_scope: []
- layers: ['D1_MODULE_CATALOG', 'D3_SECTION_INDEX', 'C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS']

## Retrieval
- requested_layers: ['D1_MODULE_CATALOG', 'D3_SECTION_INDEX', 'C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS']
- chunk_count: 5
- layer_outcomes: [('D1_MODULE_CATALOG', 0), ('D3_SECTION_INDEX', 0), ('C1_SYMBOL_CATALOG', 1), ('C0_SOURCE_CHUNKS', 4)]

## Evidence gate
- failure_reasons: []

## Timings (ms)
{'router': 0, 'retrieval_total': 31, 'symbol_resolution': 0}
@@ -0,0 +1,87 @@
{
  "case_id": "explain_negative",
  "query": "Объясни класс NonExistentClass",
  "expected": {
    "intent": "CODE_QA",
    "sub_intent": "EXPLAIN",
    "answer_mode": "degraded"
  },
  "actual": {
    "intent": "CODE_QA",
    "sub_intent": "EXPLAIN",
    "answer_mode": "normal",
    "evidence_gate_passed": true,
    "evidence_count": 21
  },
  "passed": false,
  "mismatches": [
    "answer_mode: expected degraded, got normal"
  ],
  "router_result": {
    "intent": "CODE_QA",
    "graph_id": "CodeQAGraph",
    "conversation_mode": "START",
    "retrieval_profile": "code",
    "sub_intent": "EXPLAIN",
    "path_scope": [],
    "layers": [
      "C1_SYMBOL_CATALOG",
      "C0_SOURCE_CHUNKS",
      "C4_SEMANTIC_ROLES",
      "C2_DEPENDENCY_GRAPH",
      "C3_ENTRYPOINTS"
    ],
    "symbol_resolution_status": "pending"
  },
  "retrieval_request": {
    "rag_session_id": "b3c7ec39-40a7-40e5-8ba5-fc0e2e3cc63c",
    "query": "Объясни класс NonExistentClass",
    "sub_intent": "EXPLAIN",
    "path_scope": [],
    "requested_layers": [
      "C1_SYMBOL_CATALOG",
      "C0_SOURCE_CHUNKS",
      "C4_SEMANTIC_ROLES",
      "C2_DEPENDENCY_GRAPH",
      "C3_ENTRYPOINTS"
    ]
  },
  "per_layer_outcome": [
    {
      "layer_id": "C1_SYMBOL_CATALOG",
      "hit_count": 0,
      "empty": true,
      "fallback_used": false
    },
    {
      "layer_id": "C0_SOURCE_CHUNKS",
      "hit_count": 8,
      "empty": false,
      "fallback_used": false
    },
    {
      "layer_id": "C4_SEMANTIC_ROLES",
      "hit_count": 3,
      "empty": false,
      "fallback_used": false
    },
    {
      "layer_id": "C2_DEPENDENCY_GRAPH",
      "hit_count": 6,
      "empty": false,
      "fallback_used": false
    },
    {
      "layer_id": "C3_ENTRYPOINTS",
      "hit_count": 4,
      "empty": false,
      "fallback_used": false
    }
  ],
  "failure_reasons": [],
  "timings_ms": {
    "router": 0,
    "retrieval_total": 46,
    "symbol_resolution": 0
  }
}
@@ -0,0 +1,35 @@
# explain_negative

## Query
Объясни класс NonExistentClass

## Expected
- intent: CODE_QA, sub_intent: EXPLAIN
- answer_mode: degraded

## Actual
- intent: CODE_QA, sub_intent: EXPLAIN
- answer_mode: normal
- evidence_gate_passed: True
- evidence_count: 21

## Result
FAIL

## Mismatches
- answer_mode: expected degraded, got normal

## Router
- path_scope: []
- layers: ['C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS', 'C4_SEMANTIC_ROLES', 'C2_DEPENDENCY_GRAPH', 'C3_ENTRYPOINTS']

## Retrieval
- requested_layers: ['C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS', 'C4_SEMANTIC_ROLES', 'C2_DEPENDENCY_GRAPH', 'C3_ENTRYPOINTS']
- chunk_count: 21
- layer_outcomes: [('C1_SYMBOL_CATALOG', 0), ('C0_SOURCE_CHUNKS', 8), ('C4_SEMANTIC_ROLES', 3), ('C2_DEPENDENCY_GRAPH', 6), ('C3_ENTRYPOINTS', 4)]

## Evidence gate
- failure_reasons: []

## Timings (ms)
{'router': 0, 'retrieval_total': 46, 'symbol_resolution': 0}
@@ -0,0 +1,61 @@
{
  "case_id": "explain_order_positive",
  "query": "Объясни класс Order",
  "expected": {
    "intent": "CODE_QA",
    "sub_intent": "EXPLAIN",
    "answer_mode": "normal"
  },
  "actual": {
    "intent": "CODE_QA",
    "sub_intent": "EXPLAIN",
    "answer_mode": "normal",
    "evidence_gate_passed": true,
    "evidence_count": 15
  },
  "passed": true,
  "mismatches": [],
  "router_result": {
    "intent": "CODE_QA",
    "graph_id": "CodeQAGraph",
    "conversation_mode": "START",
    "retrieval_profile": "code",
    "sub_intent": "EXPLAIN",
    "path_scope": [],
    "layers": [
      "C1_SYMBOL_CATALOG",
      "C0_SOURCE_CHUNKS",
      "C4_SEMANTIC_ROLES",
      "C2_DEPENDENCY_GRAPH",
      "C3_ENTRYPOINTS"
    ],
    "symbol_resolution_status": "pending"
  },
  "retrieval_request": {
    "rag_session_id": "b3c7ec39-40a7-40e5-8ba5-fc0e2e3cc63c",
    "query": "Объясни класс Order",
    "sub_intent": "EXPLAIN",
    "path_scope": [],
    "requested_layers": [
      "C1_SYMBOL_CATALOG",
      "C0_SOURCE_CHUNKS",
      "C4_SEMANTIC_ROLES",
      "C2_DEPENDENCY_GRAPH",
      "C3_ENTRYPOINTS"
    ]
  },
  "per_layer_outcome": [
    {
      "layer_id": "C0_SOURCE_CHUNKS",
      "hit_count": 1,
      "empty": false,
      "fallback_used": false
    }
  ],
  "failure_reasons": [],
  "timings_ms": {
    "router": 0,
    "retrieval_total": 43,
    "symbol_resolution": 0
  }
}
@@ -0,0 +1,32 @@
# explain_order_positive

## Query
Объясни класс Order

## Expected
- intent: CODE_QA, sub_intent: EXPLAIN
- answer_mode: normal

## Actual
- intent: CODE_QA, sub_intent: EXPLAIN
- answer_mode: normal
- evidence_gate_passed: True
- evidence_count: 15

## Result
PASS

## Router
- path_scope: []
- layers: ['C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS', 'C4_SEMANTIC_ROLES', 'C2_DEPENDENCY_GRAPH', 'C3_ENTRYPOINTS']

## Retrieval
- requested_layers: ['C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS', 'C4_SEMANTIC_ROLES', 'C2_DEPENDENCY_GRAPH', 'C3_ENTRYPOINTS']
- chunk_count: 15
- layer_outcomes: [('C0_SOURCE_CHUNKS', 1)]

## Evidence gate
- failure_reasons: []

## Timings (ms)
{'router': 0, 'retrieval_total': 43, 'symbol_resolution': 0}
@@ -0,0 +1,61 @@
{
  "case_id": "explain_order_service_positive",
  "query": "Как работает OrderService?",
  "expected": {
    "intent": "CODE_QA",
    "sub_intent": "EXPLAIN",
    "answer_mode": "normal"
  },
  "actual": {
    "intent": "CODE_QA",
    "sub_intent": "EXPLAIN",
    "answer_mode": "normal",
    "evidence_gate_passed": true,
    "evidence_count": 15
  },
  "passed": true,
  "mismatches": [],
  "router_result": {
    "intent": "CODE_QA",
    "graph_id": "CodeQAGraph",
    "conversation_mode": "START",
    "retrieval_profile": "code",
    "sub_intent": "EXPLAIN",
    "path_scope": [],
    "layers": [
      "C1_SYMBOL_CATALOG",
      "C0_SOURCE_CHUNKS",
      "C4_SEMANTIC_ROLES",
      "C2_DEPENDENCY_GRAPH",
      "C3_ENTRYPOINTS"
    ],
    "symbol_resolution_status": "pending"
  },
  "retrieval_request": {
    "rag_session_id": "b3c7ec39-40a7-40e5-8ba5-fc0e2e3cc63c",
    "query": "Как работает OrderService?",
    "sub_intent": "EXPLAIN",
    "path_scope": [],
    "requested_layers": [
      "C1_SYMBOL_CATALOG",
      "C0_SOURCE_CHUNKS",
      "C4_SEMANTIC_ROLES",
      "C2_DEPENDENCY_GRAPH",
      "C3_ENTRYPOINTS"
    ]
  },
  "per_layer_outcome": [
    {
      "layer_id": "C0_SOURCE_CHUNKS",
      "hit_count": 1,
      "empty": false,
      "fallback_used": false
    }
  ],
  "failure_reasons": [],
  "timings_ms": {
    "router": 0,
    "retrieval_total": 44,
    "symbol_resolution": 0
  }
}
@@ -0,0 +1,32 @@
# explain_order_service_positive

## Query
Как работает OrderService?

## Expected
- intent: CODE_QA, sub_intent: EXPLAIN
- answer_mode: normal

## Actual
- intent: CODE_QA, sub_intent: EXPLAIN
- answer_mode: normal
- evidence_gate_passed: True
- evidence_count: 15

## Result
PASS

## Router
- path_scope: []
- layers: ['C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS', 'C4_SEMANTIC_ROLES', 'C2_DEPENDENCY_GRAPH', 'C3_ENTRYPOINTS']

## Retrieval
- requested_layers: ['C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS', 'C4_SEMANTIC_ROLES', 'C2_DEPENDENCY_GRAPH', 'C3_ENTRYPOINTS']
- chunk_count: 15
- layer_outcomes: [('C0_SOURCE_CHUNKS', 1)]

## Evidence gate
- failure_reasons: []

## Timings (ms)
{'router': 0, 'retrieval_total': 44, 'symbol_resolution': 0}
@@ -0,0 +1,82 @@
{
  "case_id": "find_entrypoints_borderline",
  "query": "Где main?",
  "expected": {
    "intent": "CODE_QA",
    "sub_intent": "FIND_ENTRYPOINTS",
    "answer_mode": "normal"
  },
  "actual": {
    "intent": "PROJECT_MISC",
    "sub_intent": "FIND_ENTRYPOINTS",
    "answer_mode": "degraded",
    "evidence_gate_passed": false,
    "evidence_count": 6
  },
  "passed": false,
  "mismatches": [
    "intent: expected CODE_QA, got PROJECT_MISC",
    "answer_mode: expected normal, got degraded"
  ],
  "router_result": {
    "intent": "PROJECT_MISC",
    "graph_id": "ProjectMiscGraph",
    "conversation_mode": "START",
    "retrieval_profile": "code",
    "sub_intent": "FIND_ENTRYPOINTS",
    "path_scope": [],
    "layers": [
      "D1_MODULE_CATALOG",
      "D3_SECTION_INDEX",
      "C1_SYMBOL_CATALOG",
      "C0_SOURCE_CHUNKS"
    ],
    "symbol_resolution_status": "pending"
  },
  "retrieval_request": {
    "rag_session_id": "b3c7ec39-40a7-40e5-8ba5-fc0e2e3cc63c",
    "query": "Где main?",
    "sub_intent": "FIND_ENTRYPOINTS",
    "path_scope": [],
    "requested_layers": [
      "D1_MODULE_CATALOG",
      "D3_SECTION_INDEX",
      "C1_SYMBOL_CATALOG",
      "C0_SOURCE_CHUNKS"
    ]
  },
  "per_layer_outcome": [
    {
      "layer_id": "D1_MODULE_CATALOG",
      "hit_count": 0,
      "empty": true,
      "fallback_used": true
    },
    {
      "layer_id": "D3_SECTION_INDEX",
      "hit_count": 0,
      "empty": true,
      "fallback_used": true
    },
    {
      "layer_id": "C1_SYMBOL_CATALOG",
      "hit_count": 2,
      "empty": false,
      "fallback_used": true
    },
    {
      "layer_id": "C0_SOURCE_CHUNKS",
      "hit_count": 4,
      "empty": false,
      "fallback_used": true
    }
  ],
  "failure_reasons": [
    "entrypoints_not_found"
  ],
  "timings_ms": {
    "router": 0,
    "retrieval_total": 51,
    "symbol_resolution": 0
  }
}
@@ -0,0 +1,36 @@
# find_entrypoints_borderline

## Query
Где main?

## Expected
- intent: CODE_QA, sub_intent: FIND_ENTRYPOINTS
- answer_mode: normal

## Actual
- intent: PROJECT_MISC, sub_intent: FIND_ENTRYPOINTS
- answer_mode: degraded
- evidence_gate_passed: False
- evidence_count: 6

## Result
FAIL

## Mismatches
- intent: expected CODE_QA, got PROJECT_MISC
- answer_mode: expected normal, got degraded

## Router
- path_scope: []
- layers: ['D1_MODULE_CATALOG', 'D3_SECTION_INDEX', 'C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS']

## Retrieval
- requested_layers: ['D1_MODULE_CATALOG', 'D3_SECTION_INDEX', 'C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS']
- chunk_count: 6
- layer_outcomes: [('D1_MODULE_CATALOG', 0), ('D3_SECTION_INDEX', 0), ('C1_SYMBOL_CATALOG', 2), ('C0_SOURCE_CHUNKS', 4)]

## Evidence gate
- failure_reasons: ['entrypoints_not_found']

## Timings (ms)
{'router': 0, 'retrieval_total': 51, 'symbol_resolution': 0}
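
The Mismatches lines in the `find_entrypoints_borderline` report read like a field-by-field comparison of Expected against Actual. The following is a minimal sketch of such a comparison, assuming only `intent`, `sub_intent`, and `answer_mode` are compared; the name `find_mismatches` is hypothetical and is not taken from the eval harness itself.

```python
# Fields assumed to be compared; this list is an illustration, not the harness's own config.
COMPARED_FIELDS = ("intent", "sub_intent", "answer_mode")


def find_mismatches(expected: dict, actual: dict) -> list:
    """Return one 'field: expected X, got Y' line per differing field."""
    return [
        f"{field}: expected {expected[field]}, got {actual.get(field)}"
        for field in COMPARED_FIELDS
        if field in expected and expected[field] != actual.get(field)
    ]


# Values copied from the find_entrypoints_borderline case.
expected = {
    "intent": "CODE_QA",
    "sub_intent": "FIND_ENTRYPOINTS",
    "answer_mode": "normal",
}
actual = {
    "intent": "PROJECT_MISC",
    "sub_intent": "FIND_ENTRYPOINTS",
    "answer_mode": "degraded",
    "evidence_gate_passed": False,
    "evidence_count": 6,
}

mismatches = find_mismatches(expected, actual)
passed = not mismatches
print(mismatches, passed)
```

Under this sketch a case passes only when the list is empty, which matches how `passed` and `mismatches` appear together in the recorded results.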
@@ -0,0 +1,61 @@
{
  "case_id": "find_entrypoints_english",
  "query": "Find application entrypoints",
  "expected": {
    "intent": "CODE_QA",
    "sub_intent": "FIND_ENTRYPOINTS",
    "answer_mode": "normal"
  },
  "actual": {
    "intent": "CODE_QA",
    "sub_intent": "FIND_ENTRYPOINTS",
    "answer_mode": "normal",
    "evidence_gate_passed": true,
    "evidence_count": 10
  },
  "passed": true,
  "mismatches": [],
  "router_result": {
    "intent": "CODE_QA",
    "graph_id": "CodeQAGraph",
    "conversation_mode": "START",
    "retrieval_profile": "code",
    "sub_intent": "FIND_ENTRYPOINTS",
    "path_scope": [],
    "layers": [
      "C3_ENTRYPOINTS",
      "C0_SOURCE_CHUNKS"
    ],
    "symbol_resolution_status": "pending"
  },
  "retrieval_request": {
    "rag_session_id": "b3c7ec39-40a7-40e5-8ba5-fc0e2e3cc63c",
    "query": "Find application entrypoints",
    "sub_intent": "FIND_ENTRYPOINTS",
    "path_scope": [],
    "requested_layers": [
      "C3_ENTRYPOINTS",
      "C0_SOURCE_CHUNKS"
    ]
  },
  "per_layer_outcome": [
    {
      "layer_id": "C3_ENTRYPOINTS",
      "hit_count": 4,
      "empty": false,
      "fallback_used": false
    },
    {
      "layer_id": "C0_SOURCE_CHUNKS",
      "hit_count": 6,
      "empty": false,
      "fallback_used": false
    }
  ],
  "failure_reasons": [],
  "timings_ms": {
    "router": 0,
    "retrieval_total": 19,
    "symbol_resolution": 0
  }
}
@@ -0,0 +1,32 @@
# find_entrypoints_english

## Query
Find application entrypoints

## Expected
- intent: CODE_QA, sub_intent: FIND_ENTRYPOINTS
- answer_mode: normal

## Actual
- intent: CODE_QA, sub_intent: FIND_ENTRYPOINTS
- answer_mode: normal
- evidence_gate_passed: True
- evidence_count: 10

## Result
PASS

## Router
- path_scope: []
- layers: ['C3_ENTRYPOINTS', 'C0_SOURCE_CHUNKS']

## Retrieval
- requested_layers: ['C3_ENTRYPOINTS', 'C0_SOURCE_CHUNKS']
- chunk_count: 10
- layer_outcomes: [('C3_ENTRYPOINTS', 4), ('C0_SOURCE_CHUNKS', 6)]

## Evidence gate
- failure_reasons: []

## Timings (ms)
{'router': 0, 'retrieval_total': 19, 'symbol_resolution': 0}
@@ -0,0 +1,82 @@
{
  "case_id": "find_entrypoints_positive",
  "query": "Какие точки входа в приложение?",
  "expected": {
    "intent": "CODE_QA",
    "sub_intent": "FIND_ENTRYPOINTS",
    "answer_mode": "normal"
  },
  "actual": {
    "intent": "PROJECT_MISC",
    "sub_intent": "FIND_ENTRYPOINTS",
    "answer_mode": "degraded",
    "evidence_gate_passed": false,
    "evidence_count": 8
  },
  "passed": false,
  "mismatches": [
    "intent: expected CODE_QA, got PROJECT_MISC",
    "answer_mode: expected normal, got degraded"
  ],
  "router_result": {
    "intent": "PROJECT_MISC",
    "graph_id": "ProjectMiscGraph",
    "conversation_mode": "START",
    "retrieval_profile": "code",
    "sub_intent": "FIND_ENTRYPOINTS",
    "path_scope": [],
    "layers": [
      "D1_MODULE_CATALOG",
      "D3_SECTION_INDEX",
      "C1_SYMBOL_CATALOG",
      "C0_SOURCE_CHUNKS"
    ],
    "symbol_resolution_status": "not_requested"
  },
  "retrieval_request": {
    "rag_session_id": "b3c7ec39-40a7-40e5-8ba5-fc0e2e3cc63c",
    "query": "Какие точки входа в приложение?",
    "sub_intent": "FIND_ENTRYPOINTS",
    "path_scope": [],
    "requested_layers": [
      "D1_MODULE_CATALOG",
      "D3_SECTION_INDEX",
      "C1_SYMBOL_CATALOG",
      "C0_SOURCE_CHUNKS"
    ]
  },
  "per_layer_outcome": [
    {
      "layer_id": "D1_MODULE_CATALOG",
      "hit_count": 0,
      "empty": true,
      "fallback_used": true
    },
    {
      "layer_id": "D3_SECTION_INDEX",
      "hit_count": 0,
      "empty": true,
      "fallback_used": true
    },
    {
      "layer_id": "C1_SYMBOL_CATALOG",
      "hit_count": 4,
      "empty": false,
      "fallback_used": true
    },
    {
      "layer_id": "C0_SOURCE_CHUNKS",
      "hit_count": 4,
      "empty": false,
      "fallback_used": true
    }
  ],
  "failure_reasons": [
    "entrypoints_not_found"
  ],
  "timings_ms": {
    "router": 0,
    "retrieval_total": 52,
    "symbol_resolution": 0
  }
}
@@ -0,0 +1,36 @@
# find_entrypoints_positive

## Query
Какие точки входа в приложение?

## Expected
- intent: CODE_QA, sub_intent: FIND_ENTRYPOINTS
- answer_mode: normal

## Actual
- intent: PROJECT_MISC, sub_intent: FIND_ENTRYPOINTS
- answer_mode: degraded
- evidence_gate_passed: False
- evidence_count: 8

## Result
FAIL

## Mismatches
- intent: expected CODE_QA, got PROJECT_MISC
- answer_mode: expected normal, got degraded

## Router
- path_scope: []
- layers: ['D1_MODULE_CATALOG', 'D3_SECTION_INDEX', 'C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS']

## Retrieval
- requested_layers: ['D1_MODULE_CATALOG', 'D3_SECTION_INDEX', 'C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS']
- chunk_count: 8
- layer_outcomes: [('D1_MODULE_CATALOG', 0), ('D3_SECTION_INDEX', 0), ('C1_SYMBOL_CATALOG', 4), ('C0_SOURCE_CHUNKS', 4)]

## Evidence gate
- failure_reasons: ['entrypoints_not_found']

## Timings (ms)
{'router': 0, 'retrieval_total': 52, 'symbol_resolution': 0}
@@ -0,0 +1,69 @@
{
  "case_id": "find_tests_borderline",
  "query": "Есть ли тесты на репозиторий?",
  "expected": {
    "intent": "CODE_QA",
    "sub_intent": "FIND_TESTS",
    "answer_mode": "normal"
  },
  "actual": {
    "intent": "CODE_QA",
    "sub_intent": "FIND_TESTS",
    "answer_mode": "normal",
    "evidence_gate_passed": true,
    "evidence_count": 16
  },
  "passed": true,
  "mismatches": [],
  "router_result": {
    "intent": "CODE_QA",
    "graph_id": "CodeQAGraph",
    "conversation_mode": "START",
    "retrieval_profile": "code",
    "sub_intent": "FIND_TESTS",
    "path_scope": [],
    "layers": [
      "C1_SYMBOL_CATALOG",
      "C2_DEPENDENCY_GRAPH",
      "C0_SOURCE_CHUNKS"
    ],
    "symbol_resolution_status": "not_requested"
  },
  "retrieval_request": {
    "rag_session_id": "b3c7ec39-40a7-40e5-8ba5-fc0e2e3cc63c",
    "query": "Есть ли тесты на репозиторий?",
    "sub_intent": "FIND_TESTS",
    "path_scope": [],
    "requested_layers": [
      "C1_SYMBOL_CATALOG",
      "C2_DEPENDENCY_GRAPH",
      "C0_SOURCE_CHUNKS"
    ]
  },
  "per_layer_outcome": [
    {
      "layer_id": "C1_SYMBOL_CATALOG",
      "hit_count": 8,
      "empty": false,
      "fallback_used": false
    },
    {
      "layer_id": "C2_DEPENDENCY_GRAPH",
      "hit_count": 6,
      "empty": false,
      "fallback_used": false
    },
    {
      "layer_id": "C0_SOURCE_CHUNKS",
      "hit_count": 2,
      "empty": false,
      "fallback_used": false
    }
  ],
  "failure_reasons": [],
  "timings_ms": {
    "router": 0,
    "retrieval_total": 24,
    "symbol_resolution": 0
  }
}
@@ -0,0 +1,32 @@
# find_tests_borderline

## Query
Есть ли тесты на репозиторий?

## Expected
- intent: CODE_QA, sub_intent: FIND_TESTS
- answer_mode: normal

## Actual
- intent: CODE_QA, sub_intent: FIND_TESTS
- answer_mode: normal
- evidence_gate_passed: True
- evidence_count: 16

## Result
PASS

## Router
- path_scope: []
- layers: ['C1_SYMBOL_CATALOG', 'C2_DEPENDENCY_GRAPH', 'C0_SOURCE_CHUNKS']

## Retrieval
- requested_layers: ['C1_SYMBOL_CATALOG', 'C2_DEPENDENCY_GRAPH', 'C0_SOURCE_CHUNKS']
- chunk_count: 16
- layer_outcomes: [('C1_SYMBOL_CATALOG', 8), ('C2_DEPENDENCY_GRAPH', 6), ('C0_SOURCE_CHUNKS', 2)]

## Evidence gate
- failure_reasons: []

## Timings (ms)
{'router': 0, 'retrieval_total': 24, 'symbol_resolution': 0}
@@ -0,0 +1,71 @@
{
  "case_id": "find_tests_negative",
  "query": "Где тесты для NonExistent?",
  "expected": {
    "intent": "CODE_QA",
    "sub_intent": "FIND_TESTS",
    "answer_mode": "degraded"
  },
  "actual": {
    "intent": "CODE_QA",
    "sub_intent": "FIND_TESTS",
    "answer_mode": "normal",
    "evidence_gate_passed": true,
    "evidence_count": 8
  },
  "passed": false,
  "mismatches": [
    "answer_mode: expected degraded, got normal"
  ],
  "router_result": {
    "intent": "CODE_QA",
    "graph_id": "CodeQAGraph",
    "conversation_mode": "START",
    "retrieval_profile": "code",
    "sub_intent": "FIND_TESTS",
    "path_scope": [],
    "layers": [
      "C1_SYMBOL_CATALOG",
      "C2_DEPENDENCY_GRAPH",
      "C0_SOURCE_CHUNKS"
    ],
    "symbol_resolution_status": "pending"
  },
  "retrieval_request": {
    "rag_session_id": "b3c7ec39-40a7-40e5-8ba5-fc0e2e3cc63c",
    "query": "Где тесты для NonExistent?",
    "sub_intent": "FIND_TESTS",
    "path_scope": [],
    "requested_layers": [
      "C1_SYMBOL_CATALOG",
      "C2_DEPENDENCY_GRAPH",
      "C0_SOURCE_CHUNKS"
    ]
  },
  "per_layer_outcome": [
    {
      "layer_id": "C1_SYMBOL_CATALOG",
      "hit_count": 0,
      "empty": true,
      "fallback_used": true
    },
    {
      "layer_id": "C2_DEPENDENCY_GRAPH",
      "hit_count": 6,
      "empty": false,
      "fallback_used": true
    },
    {
      "layer_id": "C0_SOURCE_CHUNKS",
      "hit_count": 2,
      "empty": false,
      "fallback_used": true
    }
  ],
  "failure_reasons": [],
  "timings_ms": {
    "router": 0,
    "retrieval_total": 32,
    "symbol_resolution": 0
  }
}
@@ -0,0 +1,35 @@
# find_tests_negative

## Query
Где тесты для NonExistent?

## Expected
- intent: CODE_QA, sub_intent: FIND_TESTS
- answer_mode: degraded

## Actual
- intent: CODE_QA, sub_intent: FIND_TESTS
- answer_mode: normal
- evidence_gate_passed: True
- evidence_count: 8

## Result
FAIL

## Mismatches
- answer_mode: expected degraded, got normal

## Router
- path_scope: []
- layers: ['C1_SYMBOL_CATALOG', 'C2_DEPENDENCY_GRAPH', 'C0_SOURCE_CHUNKS']

## Retrieval
- requested_layers: ['C1_SYMBOL_CATALOG', 'C2_DEPENDENCY_GRAPH', 'C0_SOURCE_CHUNKS']
- chunk_count: 8
- layer_outcomes: [('C1_SYMBOL_CATALOG', 0), ('C2_DEPENDENCY_GRAPH', 6), ('C0_SOURCE_CHUNKS', 2)]

## Evidence gate
- failure_reasons: []

## Timings (ms)
{'router': 0, 'retrieval_total': 32, 'symbol_resolution': 0}
@@ -0,0 +1,69 @@
{
  "case_id": "find_tests_order_positive",
  "query": "Найди тесты для Order",
  "expected": {
    "intent": "CODE_QA",
    "sub_intent": "FIND_TESTS",
    "answer_mode": "normal"
  },
  "actual": {
    "intent": "CODE_QA",
    "sub_intent": "FIND_TESTS",
    "answer_mode": "normal",
    "evidence_gate_passed": true,
    "evidence_count": 8
  },
  "passed": true,
  "mismatches": [],
  "router_result": {
    "intent": "CODE_QA",
    "graph_id": "CodeQAGraph",
    "conversation_mode": "START",
    "retrieval_profile": "code",
    "sub_intent": "FIND_TESTS",
    "path_scope": [],
    "layers": [
      "C1_SYMBOL_CATALOG",
      "C2_DEPENDENCY_GRAPH",
      "C0_SOURCE_CHUNKS"
    ],
    "symbol_resolution_status": "pending"
  },
  "retrieval_request": {
    "rag_session_id": "b3c7ec39-40a7-40e5-8ba5-fc0e2e3cc63c",
    "query": "Найди тесты для Order",
    "sub_intent": "FIND_TESTS",
    "path_scope": [],
    "requested_layers": [
      "C1_SYMBOL_CATALOG",
      "C2_DEPENDENCY_GRAPH",
      "C0_SOURCE_CHUNKS"
    ]
  },
  "per_layer_outcome": [
    {
      "layer_id": "C1_SYMBOL_CATALOG",
      "hit_count": 0,
      "empty": true,
      "fallback_used": true
    },
    {
      "layer_id": "C2_DEPENDENCY_GRAPH",
      "hit_count": 6,
      "empty": false,
      "fallback_used": true
    },
    {
      "layer_id": "C0_SOURCE_CHUNKS",
      "hit_count": 2,
      "empty": false,
      "fallback_used": true
    }
  ],
  "failure_reasons": [],
  "timings_ms": {
    "router": 0,
    "retrieval_total": 32,
    "symbol_resolution": 0
  }
}
@@ -0,0 +1,32 @@
# find_tests_order_positive

## Query
Найди тесты для Order

## Expected
- intent: CODE_QA, sub_intent: FIND_TESTS
- answer_mode: normal

## Actual
- intent: CODE_QA, sub_intent: FIND_TESTS
- answer_mode: normal
- evidence_gate_passed: True
- evidence_count: 8

## Result
PASS

## Router
- path_scope: []
- layers: ['C1_SYMBOL_CATALOG', 'C2_DEPENDENCY_GRAPH', 'C0_SOURCE_CHUNKS']

## Retrieval
- requested_layers: ['C1_SYMBOL_CATALOG', 'C2_DEPENDENCY_GRAPH', 'C0_SOURCE_CHUNKS']
- chunk_count: 8
- layer_outcomes: [('C1_SYMBOL_CATALOG', 0), ('C2_DEPENDENCY_GRAPH', 6), ('C0_SOURCE_CHUNKS', 2)]

## Evidence gate
- failure_reasons: []

## Timings (ms)
{'router': 0, 'retrieval_total': 32, 'symbol_resolution': 0}
@@ -0,0 +1,69 @@
{
  "case_id": "find_tests_positive",
  "query": "Где тесты для OrderService?",
  "expected": {
    "intent": "CODE_QA",
    "sub_intent": "FIND_TESTS",
    "answer_mode": "normal"
  },
  "actual": {
    "intent": "CODE_QA",
    "sub_intent": "FIND_TESTS",
    "answer_mode": "normal",
    "evidence_gate_passed": true,
    "evidence_count": 8
  },
  "passed": true,
  "mismatches": [],
  "router_result": {
    "intent": "CODE_QA",
    "graph_id": "CodeQAGraph",
    "conversation_mode": "START",
    "retrieval_profile": "code",
    "sub_intent": "FIND_TESTS",
    "path_scope": [],
    "layers": [
      "C1_SYMBOL_CATALOG",
      "C2_DEPENDENCY_GRAPH",
      "C0_SOURCE_CHUNKS"
    ],
    "symbol_resolution_status": "pending"
  },
  "retrieval_request": {
    "rag_session_id": "b3c7ec39-40a7-40e5-8ba5-fc0e2e3cc63c",
    "query": "Где тесты для OrderService?",
    "sub_intent": "FIND_TESTS",
    "path_scope": [],
    "requested_layers": [
      "C1_SYMBOL_CATALOG",
      "C2_DEPENDENCY_GRAPH",
      "C0_SOURCE_CHUNKS"
    ]
  },
  "per_layer_outcome": [
    {
      "layer_id": "C1_SYMBOL_CATALOG",
      "hit_count": 0,
      "empty": true,
      "fallback_used": true
    },
    {
      "layer_id": "C2_DEPENDENCY_GRAPH",
      "hit_count": 6,
      "empty": false,
      "fallback_used": true
    },
    {
      "layer_id": "C0_SOURCE_CHUNKS",
      "hit_count": 2,
      "empty": false,
      "fallback_used": true
    }
  ],
  "failure_reasons": [],
  "timings_ms": {
    "router": 0,
    "retrieval_total": 34,
    "symbol_resolution": 0
  }
}
@@ -0,0 +1,32 @@
# find_tests_positive

## Query
Где тесты для OrderService?

## Expected
- intent: CODE_QA, sub_intent: FIND_TESTS
- answer_mode: normal

## Actual
- intent: CODE_QA, sub_intent: FIND_TESTS
- answer_mode: normal
- evidence_gate_passed: True
- evidence_count: 8

## Result
PASS

## Router
- path_scope: []
- layers: ['C1_SYMBOL_CATALOG', 'C2_DEPENDENCY_GRAPH', 'C0_SOURCE_CHUNKS']

## Retrieval
- requested_layers: ['C1_SYMBOL_CATALOG', 'C2_DEPENDENCY_GRAPH', 'C0_SOURCE_CHUNKS']
- chunk_count: 8
- layer_outcomes: [('C1_SYMBOL_CATALOG', 0), ('C2_DEPENDENCY_GRAPH', 6), ('C0_SOURCE_CHUNKS', 2)]

## Evidence gate
- failure_reasons: []

## Timings (ms)
{'router': 0, 'retrieval_total': 34, 'symbol_resolution': 0}
@@ -0,0 +1,87 @@
{
  "case_id": "general_qa_borderline",
  "query": "Расскажи про код",
  "expected": {
    "intent": "CODE_QA",
    "sub_intent": "GENERAL_QA",
    "answer_mode": "normal"
  },
  "actual": {
    "intent": "CODE_QA",
    "sub_intent": "EXPLAIN",
    "answer_mode": "normal",
    "evidence_gate_passed": true,
    "evidence_count": 29
  },
  "passed": false,
  "mismatches": [
    "sub_intent: expected GENERAL_QA, got EXPLAIN"
  ],
  "router_result": {
    "intent": "CODE_QA",
    "graph_id": "CodeQAGraph",
    "conversation_mode": "START",
    "retrieval_profile": "code",
    "sub_intent": "EXPLAIN",
    "path_scope": [],
    "layers": [
      "C1_SYMBOL_CATALOG",
      "C0_SOURCE_CHUNKS",
      "C4_SEMANTIC_ROLES",
      "C2_DEPENDENCY_GRAPH",
      "C3_ENTRYPOINTS"
    ],
    "symbol_resolution_status": "not_requested"
  },
  "retrieval_request": {
    "rag_session_id": "b3c7ec39-40a7-40e5-8ba5-fc0e2e3cc63c",
    "query": "Расскажи про код",
    "sub_intent": "EXPLAIN",
    "path_scope": [],
    "requested_layers": [
      "C1_SYMBOL_CATALOG",
      "C0_SOURCE_CHUNKS",
      "C4_SEMANTIC_ROLES",
      "C2_DEPENDENCY_GRAPH",
      "C3_ENTRYPOINTS"
    ]
  },
  "per_layer_outcome": [
    {
      "layer_id": "C1_SYMBOL_CATALOG",
      "hit_count": 8,
      "empty": false,
      "fallback_used": false
    },
    {
      "layer_id": "C0_SOURCE_CHUNKS",
      "hit_count": 8,
      "empty": false,
      "fallback_used": false
    },
    {
      "layer_id": "C4_SEMANTIC_ROLES",
      "hit_count": 3,
      "empty": false,
      "fallback_used": false
    },
    {
      "layer_id": "C2_DEPENDENCY_GRAPH",
      "hit_count": 6,
      "empty": false,
      "fallback_used": false
    },
    {
      "layer_id": "C3_ENTRYPOINTS",
      "hit_count": 4,
      "empty": false,
      "fallback_used": false
    }
  ],
  "failure_reasons": [],
  "timings_ms": {
    "router": 0,
    "retrieval_total": 39,
    "symbol_resolution": 0
  }
}
@@ -0,0 +1,35 @@
# general_qa_borderline

## Query
Расскажи про код

## Expected
- intent: CODE_QA, sub_intent: GENERAL_QA
- answer_mode: normal

## Actual
- intent: CODE_QA, sub_intent: EXPLAIN
- answer_mode: normal
- evidence_gate_passed: True
- evidence_count: 29

## Result
FAIL

## Mismatches
- sub_intent: expected GENERAL_QA, got EXPLAIN

## Router
- path_scope: []
- layers: ['C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS', 'C4_SEMANTIC_ROLES', 'C2_DEPENDENCY_GRAPH', 'C3_ENTRYPOINTS']

## Retrieval
- requested_layers: ['C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS', 'C4_SEMANTIC_ROLES', 'C2_DEPENDENCY_GRAPH', 'C3_ENTRYPOINTS']
- chunk_count: 29
- layer_outcomes: [('C1_SYMBOL_CATALOG', 8), ('C0_SOURCE_CHUNKS', 8), ('C4_SEMANTIC_ROLES', 3), ('C2_DEPENDENCY_GRAPH', 6), ('C3_ENTRYPOINTS', 4)]

## Evidence gate
- failure_reasons: []

## Timings (ms)
{'router': 0, 'retrieval_total': 39, 'symbol_resolution': 0}
@@ -0,0 +1,87 @@
{
  "case_id": "general_qa_how",
  "query": "How does order creation work?",
  "expected": {
    "intent": "CODE_QA",
    "sub_intent": "GENERAL_QA",
    "answer_mode": "normal"
  },
  "actual": {
    "intent": "CODE_QA",
    "sub_intent": "ARCHITECTURE",
    "answer_mode": "normal",
    "evidence_gate_passed": true,
    "evidence_count": 20
  },
  "passed": false,
  "mismatches": [
    "sub_intent: expected GENERAL_QA, got ARCHITECTURE"
  ],
  "router_result": {
    "intent": "CODE_QA",
    "graph_id": "CodeQAGraph",
    "conversation_mode": "START",
    "retrieval_profile": "code",
    "sub_intent": "ARCHITECTURE",
    "path_scope": [],
    "layers": [
      "C4_SEMANTIC_ROLES",
      "C3_ENTRYPOINTS",
      "C2_DEPENDENCY_GRAPH",
      "C1_SYMBOL_CATALOG",
      "C0_SOURCE_CHUNKS"
    ],
    "symbol_resolution_status": "pending"
  },
  "retrieval_request": {
    "rag_session_id": "b3c7ec39-40a7-40e5-8ba5-fc0e2e3cc63c",
    "query": "How does order creation work?",
    "sub_intent": "ARCHITECTURE",
    "path_scope": [],
    "requested_layers": [
      "C4_SEMANTIC_ROLES",
      "C3_ENTRYPOINTS",
      "C2_DEPENDENCY_GRAPH",
      "C1_SYMBOL_CATALOG",
      "C0_SOURCE_CHUNKS"
    ]
  },
  "per_layer_outcome": [
    {
      "layer_id": "C4_SEMANTIC_ROLES",
      "hit_count": 3,
      "empty": false,
      "fallback_used": false
    },
    {
      "layer_id": "C3_ENTRYPOINTS",
      "hit_count": 4,
      "empty": false,
      "fallback_used": false
    },
    {
      "layer_id": "C2_DEPENDENCY_GRAPH",
      "hit_count": 8,
      "empty": false,
      "fallback_used": false
    },
    {
      "layer_id": "C1_SYMBOL_CATALOG",
      "hit_count": 1,
      "empty": false,
      "fallback_used": false
    },
    {
      "layer_id": "C0_SOURCE_CHUNKS",
      "hit_count": 4,
      "empty": false,
      "fallback_used": false
    }
  ],
  "failure_reasons": [],
  "timings_ms": {
    "router": 0,
    "retrieval_total": 42,
    "symbol_resolution": 0
  }
}
@@ -0,0 +1,35 @@
# general_qa_how

## Query
How does order creation work?

## Expected
- intent: CODE_QA, sub_intent: GENERAL_QA
- answer_mode: normal

## Actual
- intent: CODE_QA, sub_intent: ARCHITECTURE
- answer_mode: normal
- evidence_gate_passed: True
- evidence_count: 20

## Result
FAIL

## Mismatches
- sub_intent: expected GENERAL_QA, got ARCHITECTURE

## Router
- path_scope: []
- layers: ['C4_SEMANTIC_ROLES', 'C3_ENTRYPOINTS', 'C2_DEPENDENCY_GRAPH', 'C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS']

## Retrieval
- requested_layers: ['C4_SEMANTIC_ROLES', 'C3_ENTRYPOINTS', 'C2_DEPENDENCY_GRAPH', 'C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS']
- chunk_count: 20
- layer_outcomes: [('C4_SEMANTIC_ROLES', 3), ('C3_ENTRYPOINTS', 4), ('C2_DEPENDENCY_GRAPH', 8), ('C1_SYMBOL_CATALOG', 1), ('C0_SOURCE_CHUNKS', 4)]

## Evidence gate
- failure_reasons: []

## Timings (ms)
{'router': 0, 'retrieval_total': 42, 'symbol_resolution': 0}
@@ -0,0 +1,80 @@
{
  "case_id": "general_qa_positive",
  "query": "Что делает этот проект?",
  "expected": {
    "intent": "CODE_QA",
    "sub_intent": "GENERAL_QA",
    "answer_mode": "normal"
  },
  "actual": {
    "intent": "PROJECT_MISC",
    "sub_intent": "EXPLAIN",
    "answer_mode": "normal",
    "evidence_gate_passed": true,
    "evidence_count": 8
  },
  "passed": false,
  "mismatches": [
    "intent: expected CODE_QA, got PROJECT_MISC",
    "sub_intent: expected GENERAL_QA, got EXPLAIN"
  ],
  "router_result": {
    "intent": "PROJECT_MISC",
    "graph_id": "ProjectMiscGraph",
    "conversation_mode": "START",
    "retrieval_profile": "code",
    "sub_intent": "EXPLAIN",
    "path_scope": [],
    "layers": [
      "D1_MODULE_CATALOG",
      "D3_SECTION_INDEX",
      "C1_SYMBOL_CATALOG",
      "C0_SOURCE_CHUNKS"
    ],
    "symbol_resolution_status": "not_requested"
  },
  "retrieval_request": {
    "rag_session_id": "b3c7ec39-40a7-40e5-8ba5-fc0e2e3cc63c",
    "query": "Что делает этот проект?",
    "sub_intent": "EXPLAIN",
    "path_scope": [],
    "requested_layers": [
      "D1_MODULE_CATALOG",
      "D3_SECTION_INDEX",
      "C1_SYMBOL_CATALOG",
      "C0_SOURCE_CHUNKS"
    ]
  },
  "per_layer_outcome": [
    {
      "layer_id": "D1_MODULE_CATALOG",
      "hit_count": 0,
      "empty": true,
      "fallback_used": true
    },
    {
      "layer_id": "D3_SECTION_INDEX",
      "hit_count": 0,
      "empty": true,
      "fallback_used": true
    },
    {
      "layer_id": "C1_SYMBOL_CATALOG",
      "hit_count": 4,
      "empty": false,
      "fallback_used": true
    },
    {
      "layer_id": "C0_SOURCE_CHUNKS",
      "hit_count": 4,
      "empty": false,
      "fallback_used": true
    }
  ],
  "failure_reasons": [],
  "timings_ms": {
    "router": 0,
    "retrieval_total": 56,
    "symbol_resolution": 0
  }
}
@@ -0,0 +1,36 @@
# general_qa_positive

## Query
Что делает этот проект?

## Expected
- intent: CODE_QA, sub_intent: GENERAL_QA
- answer_mode: normal

## Actual
- intent: PROJECT_MISC, sub_intent: EXPLAIN
- answer_mode: normal
- evidence_gate_passed: True
- evidence_count: 8

## Result
FAIL

## Mismatches
- intent: expected CODE_QA, got PROJECT_MISC
- sub_intent: expected GENERAL_QA, got EXPLAIN

## Router
- path_scope: []
- layers: ['D1_MODULE_CATALOG', 'D3_SECTION_INDEX', 'C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS']

## Retrieval
- requested_layers: ['D1_MODULE_CATALOG', 'D3_SECTION_INDEX', 'C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS']
- chunk_count: 8
- layer_outcomes: [('D1_MODULE_CATALOG', 0), ('D3_SECTION_INDEX', 0), ('C1_SYMBOL_CATALOG', 4), ('C0_SOURCE_CHUNKS', 4)]

## Evidence gate
- failure_reasons: []

## Timings (ms)
{'router': 0, 'retrieval_total': 56, 'symbol_resolution': 0}
@@ -0,0 +1,57 @@
{
  "case_id": "open_file_api_positive",
  "query": "Покажи src/order_app/api/orders.py",
  "expected": {
    "intent": "CODE_QA",
    "sub_intent": "OPEN_FILE",
    "answer_mode": "normal"
  },
  "actual": {
    "intent": "CODE_QA",
    "sub_intent": "OPEN_FILE",
    "answer_mode": "normal",
    "evidence_gate_passed": true,
    "evidence_count": 1
  },
  "passed": true,
  "mismatches": [],
  "router_result": {
    "intent": "CODE_QA",
    "graph_id": "CodeQAGraph",
    "conversation_mode": "START",
    "retrieval_profile": "code",
    "sub_intent": "OPEN_FILE",
    "path_scope": [
      "src/order_app/api/orders.py"
    ],
    "layers": [
      "C0_SOURCE_CHUNKS"
    ],
    "symbol_resolution_status": "not_requested"
  },
  "retrieval_request": {
    "rag_session_id": "b3c7ec39-40a7-40e5-8ba5-fc0e2e3cc63c",
    "query": "Покажи src/order_app/api/orders.py",
    "sub_intent": "OPEN_FILE",
    "path_scope": [
      "src/order_app/api/orders.py"
    ],
    "requested_layers": [
      "C0_SOURCE_CHUNKS"
    ]
  },
  "per_layer_outcome": [
    {
      "layer_id": "C0_SOURCE_CHUNKS",
      "hit_count": 1,
      "empty": false,
      "fallback_used": false
    }
  ],
  "failure_reasons": [],
  "timings_ms": {
    "router": 0,
    "retrieval_total": 6,
    "symbol_resolution": 0
  }
}
@@ -0,0 +1,32 @@
# open_file_api_positive

## Query
Покажи src/order_app/api/orders.py

## Expected
- intent: CODE_QA, sub_intent: OPEN_FILE
- answer_mode: normal

## Actual
- intent: CODE_QA, sub_intent: OPEN_FILE
- answer_mode: normal
- evidence_gate_passed: True
- evidence_count: 1

## Result
PASS

## Router
- path_scope: ['src/order_app/api/orders.py']
- layers: ['C0_SOURCE_CHUNKS']

## Retrieval
- requested_layers: ['C0_SOURCE_CHUNKS']
- chunk_count: 1
- layer_outcomes: [('C0_SOURCE_CHUNKS', 1)]

## Evidence gate
- failure_reasons: []

## Timings (ms)
{'router': 0, 'retrieval_total': 6, 'symbol_resolution': 0}
@@ -0,0 +1,63 @@
{
  "case_id": "open_file_borderline",
  "query": "Open main",
  "expected": {
    "intent": "CODE_QA",
    "sub_intent": "OPEN_FILE",
    "answer_mode": "normal"
  },
  "actual": {
    "intent": "CODE_QA",
    "sub_intent": "FIND_ENTRYPOINTS",
    "answer_mode": "normal",
    "evidence_gate_passed": true,
    "evidence_count": 10
  },
  "passed": false,
  "mismatches": [
    "sub_intent: expected OPEN_FILE, got FIND_ENTRYPOINTS"
  ],
  "router_result": {
    "intent": "CODE_QA",
    "graph_id": "CodeQAGraph",
    "conversation_mode": "START",
    "retrieval_profile": "code",
    "sub_intent": "FIND_ENTRYPOINTS",
    "path_scope": [],
    "layers": [
      "C3_ENTRYPOINTS",
      "C0_SOURCE_CHUNKS"
    ],
    "symbol_resolution_status": "pending"
  },
  "retrieval_request": {
    "rag_session_id": "b3c7ec39-40a7-40e5-8ba5-fc0e2e3cc63c",
    "query": "Open main",
    "sub_intent": "FIND_ENTRYPOINTS",
    "path_scope": [],
    "requested_layers": [
      "C3_ENTRYPOINTS",
      "C0_SOURCE_CHUNKS"
    ]
  },
  "per_layer_outcome": [
    {
      "layer_id": "C3_ENTRYPOINTS",
      "hit_count": 4,
      "empty": false,
      "fallback_used": false
    },
    {
      "layer_id": "C0_SOURCE_CHUNKS",
      "hit_count": 6,
      "empty": false,
      "fallback_used": false
    }
  ],
  "failure_reasons": [],
  "timings_ms": {
    "router": 0,
    "retrieval_total": 24,
    "symbol_resolution": 0
  }
}
@@ -0,0 +1,35 @@
|
||||
# open_file_borderline
|
||||
|
||||
## Query
|
||||
Open main
|
||||
|
||||
## Expected
|
||||
- intent: CODE_QA, sub_intent: OPEN_FILE
|
||||
- answer_mode: normal
|
||||
|
||||
## Actual
|
||||
- intent: CODE_QA, sub_intent: FIND_ENTRYPOINTS
|
||||
- answer_mode: normal
|
||||
- evidence_gate_passed: True
|
||||
- evidence_count: 10
|
||||
|
||||
## Result
|
||||
FAIL
|
||||
|
||||
## Mismatches
|
||||
- sub_intent: expected OPEN_FILE, got FIND_ENTRYPOINTS
|
||||
|
||||
## Router
|
||||
- path_scope: []
|
||||
- layers: ['C3_ENTRYPOINTS', 'C0_SOURCE_CHUNKS']
|
||||
|
||||
## Retrieval
|
||||
- requested_layers: ['C3_ENTRYPOINTS', 'C0_SOURCE_CHUNKS']
|
||||
- chunk_count: 10
|
||||
- layer_outcomes: [('C3_ENTRYPOINTS', 4), ('C0_SOURCE_CHUNKS', 6)]
|
||||
|
||||
## Evidence gate
|
||||
- failure_reasons: []
|
||||
|
||||
## Timings (ms)
|
||||
{'router': 0, 'retrieval_total': 24, 'symbol_resolution': 0}
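
The `Mismatches` entry in this report has the shape of a field-by-field comparison of `expected` against `actual`. The harness code that produces it is not shown in this diff, so the following is only a minimal sketch: the helper name `find_mismatches` and the `COMPARED_FIELDS` tuple are hypothetical, and the message format is copied from the report above.

```python
from typing import Any

# Assumption: only these three fields are compared, since they are the
# only keys that appear under "expected" in the reports.
COMPARED_FIELDS = ("intent", "sub_intent", "answer_mode")


def find_mismatches(expected: dict[str, Any], actual: dict[str, Any]) -> list[str]:
    """Return one message per compared field whose actual value differs."""
    mismatches = []
    for field in COMPARED_FIELDS:
        if field not in expected:
            continue  # nothing to check for this field
        got = actual.get(field)
        if got != expected[field]:
            mismatches.append(f"{field}: expected {expected[field]}, got {got}")
    return mismatches


# Values taken from the open_file_borderline case.
expected = {"intent": "CODE_QA", "sub_intent": "OPEN_FILE", "answer_mode": "normal"}
actual = {
    "intent": "CODE_QA",
    "sub_intent": "FIND_ENTRYPOINTS",
    "answer_mode": "normal",
    "evidence_gate_passed": True,
    "evidence_count": 10,
}

mismatches = find_mismatches(expected, actual)
passed = not mismatches
print(mismatches, passed)
```

With these inputs the only difference is in `sub_intent`, which matches the single mismatch recorded for this case.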
|
||||
@@ -0,0 +1,57 @@
|
||||
{
|
||||
"case_id": "open_file_main_positive",
|
||||
"query": "Открой файл main.py",
|
||||
"expected": {
|
||||
"intent": "CODE_QA",
|
||||
"sub_intent": "OPEN_FILE",
|
||||
"answer_mode": "normal"
|
||||
},
|
||||
"actual": {
|
||||
"intent": "CODE_QA",
|
||||
"sub_intent": "OPEN_FILE",
|
||||
"answer_mode": "normal",
|
||||
"evidence_gate_passed": true,
|
||||
"evidence_count": 1
|
||||
},
|
||||
"passed": true,
|
||||
"mismatches": [],
|
||||
"router_result": {
|
||||
"intent": "CODE_QA",
|
||||
"graph_id": "CodeQAGraph",
|
||||
"conversation_mode": "START",
|
||||
"retrieval_profile": "code",
|
||||
"sub_intent": "OPEN_FILE",
|
||||
"path_scope": [
|
||||
"main.py"
|
||||
],
|
||||
"layers": [
|
||||
"C0_SOURCE_CHUNKS"
|
||||
],
|
||||
"symbol_resolution_status": "not_requested"
|
||||
},
|
||||
"retrieval_request": {
|
||||
"rag_session_id": "b3c7ec39-40a7-40e5-8ba5-fc0e2e3cc63c",
|
||||
"query": "Открой файл main.py",
|
||||
"sub_intent": "OPEN_FILE",
|
||||
"path_scope": [
|
||||
"main.py"
|
||||
],
|
||||
"requested_layers": [
|
||||
"C0_SOURCE_CHUNKS"
|
||||
]
|
||||
},
|
||||
"per_layer_outcome": [
|
||||
{
|
||||
"layer_id": "C0_SOURCE_CHUNKS",
|
||||
"hit_count": 1,
|
||||
"empty": false,
|
||||
"fallback_used": false
|
||||
}
|
||||
],
|
||||
"failure_reasons": [],
|
||||
"timings_ms": {
|
||||
"router": 0,
|
||||
"retrieval_total": 6,
|
||||
"symbol_resolution": 0
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,32 @@
|
||||
# open_file_main_positive
|
||||
|
||||
## Query
|
||||
Открой файл main.py
|
||||
|
||||
## Expected
|
||||
- intent: CODE_QA, sub_intent: OPEN_FILE
|
||||
- answer_mode: normal
|
||||
|
||||
## Actual
|
||||
- intent: CODE_QA, sub_intent: OPEN_FILE
|
||||
- answer_mode: normal
|
||||
- evidence_gate_passed: True
|
||||
- evidence_count: 1
|
||||
|
||||
## Result
|
||||
PASS
|
||||
|
||||
## Router
|
||||
- path_scope: ['main.py']
|
||||
- layers: ['C0_SOURCE_CHUNKS']
|
||||
|
||||
## Retrieval
|
||||
- requested_layers: ['C0_SOURCE_CHUNKS']
|
||||
- chunk_count: 1
|
||||
- layer_outcomes: [('C0_SOURCE_CHUNKS', 1)]
|
||||
|
||||
## Evidence gate
|
||||
- failure_reasons: []
|
||||
|
||||
## Timings (ms)
|
||||
{'router': 0, 'retrieval_total': 6, 'symbol_resolution': 0}
|
||||
@@ -0,0 +1,62 @@
|
||||
{
|
||||
"case_id": "open_file_negative",
|
||||
"query": "Открой файл nonexistent/foo.py",
|
||||
"expected": {
|
||||
"intent": "CODE_QA",
|
||||
"sub_intent": "OPEN_FILE",
|
||||
"answer_mode": "degraded"
|
||||
},
|
||||
"actual": {
|
||||
"intent": "CODE_QA",
|
||||
"sub_intent": "OPEN_FILE",
|
||||
"answer_mode": "insufficient",
|
||||
"evidence_gate_passed": false,
|
||||
"evidence_count": 0
|
||||
},
|
||||
"passed": false,
|
||||
"mismatches": [
|
||||
"answer_mode: expected degraded, got insufficient"
|
||||
],
|
||||
"router_result": {
|
||||
"intent": "CODE_QA",
|
||||
"graph_id": "CodeQAGraph",
|
||||
"conversation_mode": "START",
|
||||
"retrieval_profile": "code",
|
||||
"sub_intent": "OPEN_FILE",
|
||||
"path_scope": [
|
||||
"nonexistent/foo.py"
|
||||
],
|
||||
"layers": [
|
||||
"C0_SOURCE_CHUNKS"
|
||||
],
|
||||
"symbol_resolution_status": "not_requested"
|
||||
},
|
||||
"retrieval_request": {
|
||||
"rag_session_id": "b3c7ec39-40a7-40e5-8ba5-fc0e2e3cc63c",
|
||||
"query": "Открой файл nonexistent/foo.py",
|
||||
"sub_intent": "OPEN_FILE",
|
||||
"path_scope": [
|
||||
"nonexistent/foo.py"
|
||||
],
|
||||
"requested_layers": [
|
||||
"C0_SOURCE_CHUNKS"
|
||||
]
|
||||
},
|
||||
"per_layer_outcome": [
|
||||
{
|
||||
"layer_id": "C0_SOURCE_CHUNKS",
|
||||
"hit_count": 0,
|
||||
"empty": true,
|
||||
"fallback_used": false
|
||||
}
|
||||
],
|
||||
"failure_reasons": [
|
||||
"path_scope_empty",
|
||||
"layer_c0_empty"
|
||||
],
|
||||
"timings_ms": {
|
||||
"router": 0,
|
||||
"retrieval_total": 6,
|
||||
"symbol_resolution": 0
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,35 @@
|
||||
# open_file_negative
|
||||
|
||||
## Query
|
||||
Открой файл nonexistent/foo.py
|
||||
|
||||
## Expected
|
||||
- intent: CODE_QA, sub_intent: OPEN_FILE
|
||||
- answer_mode: degraded
|
||||
|
||||
## Actual
|
||||
- intent: CODE_QA, sub_intent: OPEN_FILE
|
||||
- answer_mode: insufficient
|
||||
- evidence_gate_passed: False
|
||||
- evidence_count: 0
|
||||
|
||||
## Result
|
||||
FAIL
|
||||
|
||||
## Mismatches
|
||||
- answer_mode: expected degraded, got insufficient
|
||||
|
||||
## Router
|
||||
- path_scope: ['nonexistent/foo.py']
|
||||
- layers: ['C0_SOURCE_CHUNKS']
|
||||
|
||||
## Retrieval
|
||||
- requested_layers: ['C0_SOURCE_CHUNKS']
|
||||
- chunk_count: 0
|
||||
- layer_outcomes: [('C0_SOURCE_CHUNKS', 0)]
|
||||
|
||||
## Evidence gate
|
||||
- failure_reasons: ['path_scope_empty', 'layer_c0_empty']
|
||||
|
||||
## Timings (ms)
|
||||
{'router': 0, 'retrieval_total': 6, 'symbol_resolution': 0}
|
||||
@@ -0,0 +1,79 @@
|
||||
{
|
||||
"case_id": "explain_borderline",
|
||||
"query": "Что делает create_order?",
|
||||
"expected": {
|
||||
"intent": "CODE_QA",
|
||||
"sub_intent": "EXPLAIN",
|
||||
"answer_mode": "normal"
|
||||
},
|
||||
"actual": {
|
||||
"intent": "PROJECT_MISC",
|
||||
"sub_intent": "EXPLAIN",
|
||||
"answer_mode": "normal",
|
||||
"evidence_gate_passed": true,
|
||||
"evidence_count": 5
|
||||
},
|
||||
"passed": false,
|
||||
"mismatches": [
|
||||
"intent: expected CODE_QA, got PROJECT_MISC"
|
||||
],
|
||||
"router_result": {
|
||||
"intent": "PROJECT_MISC",
|
||||
"graph_id": "ProjectMiscGraph",
|
||||
"conversation_mode": "START",
|
||||
"retrieval_profile": "code",
|
||||
"sub_intent": "EXPLAIN",
|
||||
"path_scope": [],
|
||||
"layers": [
|
||||
"D1_MODULE_CATALOG",
|
||||
"D3_SECTION_INDEX",
|
||||
"C1_SYMBOL_CATALOG",
|
||||
"C0_SOURCE_CHUNKS"
|
||||
],
|
||||
"symbol_resolution_status": "pending"
|
||||
},
|
||||
"retrieval_request": {
|
||||
"rag_session_id": "564f37c4-259f-4d21-be24-04b3a51e3c64",
|
||||
"query": "Что делает create_order?",
|
||||
"sub_intent": "EXPLAIN",
|
||||
"path_scope": [],
|
||||
"requested_layers": [
|
||||
"D1_MODULE_CATALOG",
|
||||
"D3_SECTION_INDEX",
|
||||
"C1_SYMBOL_CATALOG",
|
||||
"C0_SOURCE_CHUNKS"
|
||||
]
|
||||
},
|
||||
"per_layer_outcome": [
|
||||
{
|
||||
"layer_id": "D1_MODULE_CATALOG",
|
||||
"hit_count": 0,
|
||||
"empty": true,
|
||||
"fallback_used": false
|
||||
},
|
||||
{
|
||||
"layer_id": "D3_SECTION_INDEX",
|
||||
"hit_count": 0,
|
||||
"empty": true,
|
||||
"fallback_used": false
|
||||
},
|
||||
{
|
||||
"layer_id": "C1_SYMBOL_CATALOG",
|
||||
"hit_count": 1,
|
||||
"empty": false,
|
||||
"fallback_used": false
|
||||
},
|
||||
{
|
||||
"layer_id": "C0_SOURCE_CHUNKS",
|
||||
"hit_count": 4,
|
||||
"empty": false,
|
||||
"fallback_used": false
|
||||
}
|
||||
],
|
||||
"failure_reasons": [],
|
||||
"timings_ms": {
|
||||
"router": 0,
|
||||
"retrieval_total": 31,
|
||||
"symbol_resolution": 0
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,35 @@
|
||||
# explain_borderline
|
||||
|
||||
## Query
|
||||
Что делает create_order?
|
||||
|
||||
## Expected
|
||||
- intent: CODE_QA, sub_intent: EXPLAIN
|
||||
- answer_mode: normal
|
||||
|
||||
## Actual
|
||||
- intent: PROJECT_MISC, sub_intent: EXPLAIN
|
||||
- answer_mode: normal
|
||||
- evidence_gate_passed: True
|
||||
- evidence_count: 5
|
||||
|
||||
## Result
|
||||
FAIL
|
||||
|
||||
## Mismatches
|
||||
- intent: expected CODE_QA, got PROJECT_MISC
|
||||
|
||||
## Router
|
||||
- path_scope: []
|
||||
- layers: ['D1_MODULE_CATALOG', 'D3_SECTION_INDEX', 'C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS']
|
||||
|
||||
## Retrieval
|
||||
- requested_layers: ['D1_MODULE_CATALOG', 'D3_SECTION_INDEX', 'C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS']
|
||||
- chunk_count: 5
|
||||
- layer_outcomes: [('D1_MODULE_CATALOG', 0), ('D3_SECTION_INDEX', 0), ('C1_SYMBOL_CATALOG', 1), ('C0_SOURCE_CHUNKS', 4)]
|
||||
|
||||
## Evidence gate
|
||||
- failure_reasons: []
|
||||
|
||||
## Timings (ms)
|
||||
{'router': 0, 'retrieval_total': 31, 'symbol_resolution': 0}
|
||||
@@ -0,0 +1,87 @@
|
||||
{
|
||||
"case_id": "explain_negative",
|
||||
"query": "Объясни класс NonExistentClass",
|
||||
"expected": {
|
||||
"intent": "CODE_QA",
|
||||
"sub_intent": "EXPLAIN",
|
||||
"answer_mode": "degraded"
|
||||
},
|
||||
"actual": {
|
||||
"intent": "CODE_QA",
|
||||
"sub_intent": "EXPLAIN",
|
||||
"answer_mode": "normal",
|
||||
"evidence_gate_passed": true,
|
||||
"evidence_count": 22
|
||||
},
|
||||
"passed": false,
|
||||
"mismatches": [
|
||||
"answer_mode: expected degraded, got normal"
|
||||
],
|
||||
"router_result": {
|
||||
"intent": "CODE_QA",
|
||||
"graph_id": "CodeQAGraph",
|
||||
"conversation_mode": "START",
|
||||
"retrieval_profile": "code",
|
||||
"sub_intent": "EXPLAIN",
|
||||
"path_scope": [],
|
||||
"layers": [
|
||||
"C1_SYMBOL_CATALOG",
|
||||
"C0_SOURCE_CHUNKS",
|
||||
"C4_SEMANTIC_ROLES",
|
||||
"C2_DEPENDENCY_GRAPH",
|
||||
"C3_ENTRYPOINTS"
|
||||
],
|
||||
"symbol_resolution_status": "pending"
|
||||
},
|
||||
"retrieval_request": {
|
||||
"rag_session_id": "564f37c4-259f-4d21-be24-04b3a51e3c64",
|
||||
"query": "Объясни класс NonExistentClass",
|
||||
"sub_intent": "EXPLAIN",
|
||||
"path_scope": [],
|
||||
"requested_layers": [
|
||||
"C1_SYMBOL_CATALOG",
|
||||
"C0_SOURCE_CHUNKS",
|
||||
"C4_SEMANTIC_ROLES",
|
||||
"C2_DEPENDENCY_GRAPH",
|
||||
"C3_ENTRYPOINTS"
|
||||
]
|
||||
},
|
||||
"per_layer_outcome": [
|
||||
{
|
||||
"layer_id": "C1_SYMBOL_CATALOG",
|
||||
"hit_count": 1,
|
||||
"empty": false,
|
||||
"fallback_used": false
|
||||
},
|
||||
{
|
||||
"layer_id": "C0_SOURCE_CHUNKS",
|
||||
"hit_count": 8,
|
||||
"empty": false,
|
||||
"fallback_used": false
|
||||
},
|
||||
{
|
||||
"layer_id": "C4_SEMANTIC_ROLES",
|
||||
"hit_count": 3,
|
||||
"empty": false,
|
||||
"fallback_used": false
|
||||
},
|
||||
{
|
||||
"layer_id": "C2_DEPENDENCY_GRAPH",
|
||||
"hit_count": 6,
|
||||
"empty": false,
|
||||
"fallback_used": false
|
||||
},
|
||||
{
|
||||
"layer_id": "C3_ENTRYPOINTS",
|
||||
"hit_count": 4,
|
||||
"empty": false,
|
||||
"fallback_used": false
|
||||
}
|
||||
],
|
||||
"failure_reasons": [],
|
||||
"timings_ms": {
|
||||
"router": 0,
|
||||
"retrieval_total": 41,
|
||||
"symbol_resolution": 0
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,35 @@
|
||||
# explain_negative
|
||||
|
||||
## Query
|
||||
Объясни класс NonExistentClass
|
||||
|
||||
## Expected
|
||||
- intent: CODE_QA, sub_intent: EXPLAIN
|
||||
- answer_mode: degraded
|
||||
|
||||
## Actual
|
||||
- intent: CODE_QA, sub_intent: EXPLAIN
|
||||
- answer_mode: normal
|
||||
- evidence_gate_passed: True
|
||||
- evidence_count: 22
|
||||
|
||||
## Result
|
||||
FAIL
|
||||
|
||||
## Mismatches
|
||||
- answer_mode: expected degraded, got normal
|
||||
|
||||
## Router
|
||||
- path_scope: []
|
||||
- layers: ['C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS', 'C4_SEMANTIC_ROLES', 'C2_DEPENDENCY_GRAPH', 'C3_ENTRYPOINTS']
|
||||
|
||||
## Retrieval
|
||||
- requested_layers: ['C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS', 'C4_SEMANTIC_ROLES', 'C2_DEPENDENCY_GRAPH', 'C3_ENTRYPOINTS']
|
||||
- chunk_count: 22
|
||||
- layer_outcomes: [('C1_SYMBOL_CATALOG', 1), ('C0_SOURCE_CHUNKS', 8), ('C4_SEMANTIC_ROLES', 3), ('C2_DEPENDENCY_GRAPH', 6), ('C3_ENTRYPOINTS', 4)]
|
||||
|
||||
## Evidence gate
|
||||
- failure_reasons: []
|
||||
|
||||
## Timings (ms)
|
||||
{'router': 0, 'retrieval_total': 41, 'symbol_resolution': 0}
|
||||
@@ -0,0 +1,61 @@
|
||||
{
|
||||
"case_id": "explain_order_positive",
|
||||
"query": "Объясни класс Order",
|
||||
"expected": {
|
||||
"intent": "CODE_QA",
|
||||
"sub_intent": "EXPLAIN",
|
||||
"answer_mode": "normal"
|
||||
},
|
||||
"actual": {
|
||||
"intent": "CODE_QA",
|
||||
"sub_intent": "EXPLAIN",
|
||||
"answer_mode": "normal",
|
||||
"evidence_gate_passed": true,
|
||||
"evidence_count": 15
|
||||
},
|
||||
"passed": true,
|
||||
"mismatches": [],
|
||||
"router_result": {
|
||||
"intent": "CODE_QA",
|
||||
"graph_id": "CodeQAGraph",
|
||||
"conversation_mode": "START",
|
||||
"retrieval_profile": "code",
|
||||
"sub_intent": "EXPLAIN",
|
||||
"path_scope": [],
|
||||
"layers": [
|
||||
"C1_SYMBOL_CATALOG",
|
||||
"C0_SOURCE_CHUNKS",
|
||||
"C4_SEMANTIC_ROLES",
|
||||
"C2_DEPENDENCY_GRAPH",
|
||||
"C3_ENTRYPOINTS"
|
||||
],
|
||||
"symbol_resolution_status": "pending"
|
||||
},
|
||||
"retrieval_request": {
|
||||
"rag_session_id": "564f37c4-259f-4d21-be24-04b3a51e3c64",
|
||||
"query": "Объясни класс Order",
|
||||
"sub_intent": "EXPLAIN",
|
||||
"path_scope": [],
|
||||
"requested_layers": [
|
||||
"C1_SYMBOL_CATALOG",
|
||||
"C0_SOURCE_CHUNKS",
|
||||
"C4_SEMANTIC_ROLES",
|
||||
"C2_DEPENDENCY_GRAPH",
|
||||
"C3_ENTRYPOINTS"
|
||||
]
|
||||
},
|
||||
"per_layer_outcome": [
|
||||
{
|
||||
"layer_id": "C0_SOURCE_CHUNKS",
|
||||
"hit_count": 1,
|
||||
"empty": false,
|
||||
"fallback_used": false
|
||||
}
|
||||
],
|
||||
"failure_reasons": [],
|
||||
"timings_ms": {
|
||||
"router": 0,
|
||||
"retrieval_total": 45,
|
||||
"symbol_resolution": 0
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,32 @@
|
||||
# explain_order_positive
|
||||
|
||||
## Query
|
||||
Объясни класс Order
|
||||
|
||||
## Expected
|
||||
- intent: CODE_QA, sub_intent: EXPLAIN
|
||||
- answer_mode: normal
|
||||
|
||||
## Actual
|
||||
- intent: CODE_QA, sub_intent: EXPLAIN
|
||||
- answer_mode: normal
|
||||
- evidence_gate_passed: True
|
||||
- evidence_count: 15
|
||||
|
||||
## Result
|
||||
PASS
|
||||
|
||||
## Router
|
||||
- path_scope: []
|
||||
- layers: ['C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS', 'C4_SEMANTIC_ROLES', 'C2_DEPENDENCY_GRAPH', 'C3_ENTRYPOINTS']
|
||||
|
||||
## Retrieval
|
||||
- requested_layers: ['C1_SYMBOL_CATALOG', 'C0_SOURCE_CHUNKS', 'C4_SEMANTIC_ROLES', 'C2_DEPENDENCY_GRAPH', 'C3_ENTRYPOINTS']
|
||||
- chunk_count: 15
|
||||
- layer_outcomes: [('C0_SOURCE_CHUNKS', 1)]
|
||||
|
||||
## Evidence gate
|
||||
- failure_reasons: []
|
||||
|
||||
## Timings (ms)
|
||||
{'router': 0, 'retrieval_total': 45, 'symbol_resolution': 0}
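
In the other reports of this run, `evidence_count` equals the sum of `hit_count` over `per_layer_outcome` (for example 4 + 6 = 10 in `open_file_borderline`). In this case it does not: five layers are requested, only `C0_SOURCE_CHUNKS` is reported with 1 hit, and `evidence_count` is 15. Whether the two values are meant to agree is an assumption, not something this diff states. A minimal sketch of a check that would flag such cases, with the hypothetical helper name `layer_hits_vs_evidence`:

```python
def layer_hits_vs_evidence(result: dict) -> tuple[int, int]:
    """Return (sum of per-layer hit counts, reported evidence count)."""
    total_hits = sum(layer["hit_count"] for layer in result["per_layer_outcome"])
    return total_hits, result["actual"]["evidence_count"]


# Trimmed copies of two case results from this run.
explain_order_positive = {
    "actual": {"evidence_count": 15},
    "per_layer_outcome": [{"layer_id": "C0_SOURCE_CHUNKS", "hit_count": 1}],
}
explain_negative = {
    "actual": {"evidence_count": 22},
    "per_layer_outcome": [
        {"layer_id": "C1_SYMBOL_CATALOG", "hit_count": 1},
        {"layer_id": "C0_SOURCE_CHUNKS", "hit_count": 8},
        {"layer_id": "C4_SEMANTIC_ROLES", "hit_count": 3},
        {"layer_id": "C2_DEPENDENCY_GRAPH", "hit_count": 6},
        {"layer_id": "C3_ENTRYPOINTS", "hit_count": 4},
    ],
}

for name, result in [
    ("explain_order_positive", explain_order_positive),
    ("explain_negative", explain_negative),
]:
    hits, evidence = layer_hits_vs_evidence(result)
    print(name, hits, evidence, "consistent" if hits == evidence else "inconsistent")
```

If the two numbers are expected to match, a difference like this one points at `per_layer_outcome` being recorded incompletely rather than at the router.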
|
||||
Some files were not shown because too many files have changed in this diff.