Checkpoint the current state
+7
-1
@@ -1,3 +1,9 @@
.env
.venv
__pycache__
__pycache__

# Pipeline harness: per-run artifacts (md/json from tests.pipeline_setup_v3/v4)
tests/**/test_runs/**/*.md
tests/**/test_runs/**/*.json
tests/**/test_results/**/*.md
tests/**/test_results/**/*.json
@@ -0,0 +1,4 @@
# Queries
1. Which API methods does the project have
2. Which API methods exist for healthcheck
3. Where is the healthcheck documentation
@@ -2,31 +2,44 @@

## 1. Architecture

The current `V2IntentRouter` consists of the following components:
The current `V2IntentRouter` is implemented as an **LLM-first router**.
The deterministic layer does not select the route by default and is used only for:

- preprocessing
- validating the LLM response
- fallback, if the LLM did not respond or returned an invalid route

Current components:

- `router.py`
  Main entry point and orchestrator.
  Main entry point and pipeline orchestrator.

- `modules/normalizer.py`
  Normalizes the query text into `normalized_query`.

- `modules/target_terms.py`
  Extracts `target_terms`, `endpoint_paths`, `matched_aliases`, `alias_docs`.
  Extracts retrieval-oriented `target_terms`, `endpoint_paths`, `matched_aliases`, `alias_docs`.

- `modules/anchors.py`
  Extracts `anchors` and auxiliary marker signals.
  Extracts `anchors` and marker signals for fallback and downstream retrieval.

- `routers/docs_subintent_resolver.py`
  Determines `subintent`.

- `routers/deterministic.py`
  Deterministically determines `routing_domain`, `intent`, `subintent`, `confidence`, `routing_mode`, `llm_router_used`, `reason_short`.
- `routers/route_catalog.py`
  Catalog of allowed routes (`allowed_routes`).

- `routers/llm.py`
  LLM-based determination of `routing_domain`, `intent`, `subintent`, `confidence`, `reason_short`.
  Main LLM router. Receives the normalized query, `target_terms`, `anchors`, and the list of allowed routes.

- `routers/validator.py`
  Deterministic validator for enum values, the route combination, and basic `confidence` normalization.

- `routers/confidence.py`
  Post-processing of confidence after the LLM response.

- `routers/fallback.py`
  Fallback routing, used if the LLM did not respond or its response did not pass the validator.

- `routers/prompts.yml`
  Prompt for the LLM router.
  Prompt contract for the LLM router.

## 2. Contract

@@ -53,7 +66,6 @@
`V2RouteAnchors`:

- `entity_names: list[str]`
- `terms: list[str]`
- `file_names: list[str]`
- `endpoint_paths: list[str]`
- `target_doc_hints: list[str]`
@@ -78,35 +90,61 @@
- `SUMMARY`
- `FIND_FILES`

### Supported routes
### Allowed routes

- `GENERAL / GENERAL_QA / SUMMARY`
- `DOCS / DOC_EXPLAIN / SUMMARY`
- `DOCS / DOC_EXPLAIN / FIND_FILES`

## 4. Request processing flow
These routes are defined centrally in `routers/route_catalog.py`.

## 4. Current flow

Request processing pipeline:

1. `router.py` receives `user_query`.
2. `modules/normalizer.py` builds `normalized_query`.
3. `modules/target_terms.py` extracts key terms and alias-based signals.
4. `modules/anchors.py` builds `anchors` and marker signals.
3. `modules/target_terms.py` extracts:
   - `target_terms`
   - `endpoint_paths`
   - `matched_aliases`
   - `alias_docs`
4. `modules/anchors.py` builds:
   - `anchors`
   - `file_markers`
   - `architecture_markers`
   - `logic_markers`
   - `domain_markers`
   - `endpoint_markers`
5. `router.py` assembles `QueryFeatures`.
6. `routers/deterministic.py` tries to determine the route deterministically.
7. If a deterministic route is found, it is returned immediately.
8. If no deterministic route is found, `router.py` calls `routers/llm.py`.
9. If the LLM returned a valid route, a `V2RouteResult` is built with `routing_mode="llm_assisted"`.
10. If the LLM is unavailable or did not return a valid route, the fallback is used:
    `GENERAL / GENERAL_QA / SUMMARY` with `routing_mode="llm_fallback"`.
6. `routers/llm.py` is called as the **primary route selector**.
7. `routers/validator.py` checks:
   - that the values belong to the allowed enums
   - that the route combination is allowed
   - that `confidence` can be converted to `float`
8. `routers/confidence.py` adjusts confidence based on signal strength.
9. If the LLM response is valid, a `V2RouteResult` is returned with `routing_mode="llm_default"`.
10. If the LLM did not respond, returned broken JSON, or returned an invalid route, `routers/fallback.py` builds a fallback route:
    - `FIND_FILES`, if there are `file_markers`
    - `DOCS / DOC_EXPLAIN / SUMMARY`, if there are docs-oriented anchors
    - otherwise `GENERAL / GENERAL_QA / SUMMARY`
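
The LLM-first flow above can be sketched roughly as follows. This is a minimal illustration, not the project's `router.py`: the function name `route_query` and the callables passed in (`call_llm`, `validate`, `adjust_confidence`, `build_fallback`) are hypothetical stand-ins for `routers/llm.py`, `routers/validator.py`, `routers/confidence.py`, and `routers/fallback.py`.

```python
from typing import Callable, Optional


def route_query(
    features: dict,
    call_llm: Callable[[dict], Optional[dict]],
    validate: Callable[[Optional[dict]], Optional[dict]],
    adjust_confidence: Callable[[float, dict], float],
    build_fallback: Callable[[dict], dict],
) -> dict:
    """LLM-first routing: the LLM proposes, deterministic code only checks and falls back."""
    try:
        candidate = call_llm(features)
    except Exception:
        # Any LLM failure (timeout, unparsable response, ...) leads to the fallback branch.
        candidate = None

    validated = validate(candidate)
    if validated is None:
        # Fallback is reached only when the LLM answer is missing or invalid.
        return build_fallback(features)

    confidence = adjust_confidence(validated["confidence"], features)
    return {**validated, "confidence": confidence, "routing_mode": "llm_default"}
```

Passing the stages in as callables keeps the sketch self-contained; the real orchestrator presumably wires concrete objects instead.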

## 5. Components along the flow

### `router.py`

- Task
  Gather the whole routing process behind a single entry point.
  Orchestrate the full routing pipeline.

- How it works
  Sequentially calls the normalizer, target terms extractor, anchors extractor, deterministic router, and, if needed, the LLM router.
  Sequentially calls:
  - normalizer
  - target terms extractor
  - anchor extractor
  - LLM router
  - validator
  - confidence adjuster
  - fallback router

- Input
  `user_query: str`
@@ -117,7 +155,7 @@
### `modules/normalizer.py`

- Task
  Bring the query into a stable form for further analysis.
  Bring the query into a stable form for analysis.

- How it works
  Collapses extra whitespace via `" ".join(...split())`.
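
As a small illustration of the whitespace-collapsing idiom mentioned above (the function name `normalize_query` is an assumption, not taken from the source):

```python
def normalize_query(user_query: str) -> str:
    # str.split() with no arguments splits on any run of whitespace and
    # drops leading/trailing whitespace, so joining with a single space
    # collapses tabs, newlines, and repeated spaces.
    return " ".join(user_query.split())
```

Case and punctuation are left untouched by this step, which matches the trace below where `normalized_query` keeps the original capitalization and question mark.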
@@ -131,14 +169,29 @@
### `modules/target_terms.py`

- Task
  Extract key terms and retrieval signals from the query.
  Build a **clean retrieval field**, `target_terms`.

- How it works
  Uses:
  - regex for path/entity-like fragments
  - a stop-word list
  - alias rules with phrases and canonical terms
  - a heuristic for `/health`
  Uses a positive selection model and includes in `target_terms` only:
  - endpoint paths
  - identifier-like tokens
  - alias canonical terms
  - domain terms

  Excluded:
  - question words
  - intent words
  - filler/noisy words
  - marker words
  - short tokens (length `< 3`), unless they are an endpoint or alias
  - malformed path-like tokens

  Additionally:
  - lowercase
  - trim punctuation at the edges
  - dedupe
  - cap at `7` items
  - priority: endpoints → identifiers → aliases → domain terms

- Input
  `normalized_query: str`
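
The post-processing rules listed for `target_terms` (lowercase, edge trimming, dedupe, cap of 7, priority order) can be sketched as below. This assumes the candidates have already been classified into the four groups; the function name, the constant names, and the exact set of trimmed punctuation are hypothetical, and `/` is deliberately not trimmed so that endpoint paths survive.

```python
MAX_TARGET_TERMS = 7      # cap stated in the document
MIN_TOKEN_LENGTH = 3      # shorter tokens are kept only for endpoints and aliases
EDGE_PUNCTUATION = ".,;:!?\"'()[]`"  # assumption: "/" is not stripped


def build_target_terms(endpoints, identifiers, aliases, domain_terms):
    """Merge classified candidates in priority order into a clean list."""
    # (group, short tokens allowed?) in the documented priority order.
    groups = [
        (endpoints, True),
        (identifiers, False),
        (aliases, True),
        (domain_terms, False),
    ]
    result = []
    for group, allow_short in groups:
        for raw in group:
            term = raw.lower().strip(EDGE_PUNCTUATION)
            if not term:
                continue
            if len(term) < MIN_TOKEN_LENGTH and not allow_short:
                continue
            if term not in result:  # dedupe, preserving first occurrence
                result.append(term)
    return result[:MAX_TARGET_TERMS]
```

For example, `build_target_terms(["/health?"], ["HealthService", "db"], ["healthcheck"], ["Health"])` keeps the endpoint first and drops the two-character identifier.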
@@ -153,117 +206,141 @@
### `modules/anchors.py`

- Task
  Build the full set of `anchors` and doc-oriented marker signals.
  Build `anchors` and marker signals without mixing them with `target_terms`.

- How it works
  Uses:
  - regex for `entity_names` and `file_names`
  - dictionaries of marker phrases:
    - file markers
    - architecture markers
    - logic markers
    - domain markers
    - endpoint markers
  - map `endpoint -> target_doc_hint`
  - alias docs from `TargetTermsAnalysis`
  Extracts:
  - `entity_names` from PascalCase-like tokens
  - `file_names` only by strict rules:
    - `*.md`, `*.yaml`, `*.yml`, `*.json`
    - `docs/...`, `doc/...`, `documentation/...`
  - `endpoint_paths` from `TargetTermsAnalysis`
  - `target_doc_hints` from alias docs, the endpoint map, and marker signals

- Input
  - `normalized_query: str`
  - `TargetTermsAnalysis`

- Output
  `AnchorAnalysis`:
  - `anchors`
  Marker signals live separately:
  - `file_markers`
  - `architecture_markers`
  - `logic_markers`
  - `domain_markers`
  - `endpoint_markers`
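
The strict `file_names` rules above might look like the following sketch. The source mentions regex; this version uses plain suffix and prefix checks for readability, and the function name, tokenization by whitespace, and trimmed punctuation are assumptions.

```python
FILE_EXTENSIONS = (".md", ".yaml", ".yml", ".json")
DOC_PREFIXES = ("docs/", "doc/", "documentation/")
EDGE_PUNCTUATION = ",;:!?\"'()[]`"  # assumption: trailing sentence punctuation


def extract_file_names(normalized_query: str) -> list:
    """Keep only tokens that look like real files or doc paths."""
    found = []
    for token in normalized_query.split():
        candidate = token.strip(EDGE_PUNCTUATION).rstrip(".")
        lowered = candidate.lower()
        is_file = lowered.endswith(FILE_EXTENSIONS)
        is_doc_path = lowered.startswith(DOC_PREFIXES)
        # An endpoint such as "/health" matches neither rule and is skipped.
        if candidate and (is_file or is_doc_path) and candidate not in found:
            found.append(candidate)
    return found
```

This is what keeps `/health` out of `file_names`: it has no recognized extension and does not start with a documentation directory.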

### `routers/docs_subintent_resolver.py`

- Task
  Determine `subintent`.

- How it works
  Heuristic:
  - if there are `file_markers` -> `FIND_FILES`
  - if there are doc signals (`endpoint_paths`, `endpoint_markers`, `architecture_markers`, `logic_markers`, `domain_markers`, `target_doc_hints`) -> `SUMMARY`
  - otherwise `None`

- Input
  `QueryFeatures`
  - `normalized_query: str`
  - `TargetTermsAnalysis`

- Output
  `subintent: str | None`
  `AnchorAnalysis`

### `routers/deterministic.py`
### `routers/route_catalog.py`

- Task
  Determine the route deterministically, without the LLM, wherever possible.
  Keep a single source of truth for allowed routes.

- How it works
  Uses:
  - `DocsSubintentResolver`
  - a check for conflicting doc anchors
  - a list of general markers

  Rules:
  - `FIND_FILES` -> `DOCS / DOC_EXPLAIN / FIND_FILES`
  - `subintent != None` and no doc-signal conflict -> `DOCS / DOC_EXPLAIN / SUMMARY`
  - general marker -> `GENERAL / GENERAL_QA / SUMMARY`

- Input
  - `user_query: str`
  - `QueryFeatures`
  - `anchors: V2RouteAnchors`

- Output
  `V2RouteResult | None`
  Returns:
  - the `allowed_routes` list for the LLM payload
  - a check that the `routing_domain + intent + subintent` combination is allowed
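
A route catalog with the two responsibilities named above could be sketched like this. The three routes come from the document; the `Route` type and the function names are hypothetical.

```python
from typing import NamedTuple


class Route(NamedTuple):
    routing_domain: str
    intent: str
    subintent: str


# Single source of truth: the three routes listed in the document.
ALLOWED_ROUTES = (
    Route("GENERAL", "GENERAL_QA", "SUMMARY"),
    Route("DOCS", "DOC_EXPLAIN", "SUMMARY"),
    Route("DOCS", "DOC_EXPLAIN", "FIND_FILES"),
)


def allowed_routes_payload() -> list:
    """Routes as plain dicts, ready to embed in the LLM payload."""
    return [route._asdict() for route in ALLOWED_ROUTES]


def is_allowed(routing_domain: str, intent: str, subintent: str) -> bool:
    """True only for an exact combination from the catalog."""
    return Route(routing_domain, intent, subintent) in ALLOWED_ROUTES
```

Keeping both the payload and the check derived from one tuple means the prompt and the validator cannot drift apart.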

### `routers/llm.py`

- Task
  Determine the route via the LLM if deterministic routing produced no result.
  Select the route via the LLM as the primary selector.

- How it works
  Builds a JSON payload from:
  - `user_query`
  - `normalized_query`
  - `target_terms`
  - `anchors`
  - the list of allowed routes
  - `allowed_routes`

  Then:
  - calls the LLM
  - parses the JSON
  - validates the route against a whitelist
  - normalizes `confidence`
  - returns the raw candidate route without deterministic business routing

- Input
  - `user_query: str`
  - `normalized_query: str`
  - `target_terms: list[str]`
  - `anchors: dict`

- Output
  `dict | None`:
  `dict | None`

### `routers/validator.py`

- Task
  Deterministic validation of the LLM response.

- How it works
  Checks:
  - that `routing_domain`, `intent`, `subintent` are filled in
  - that the route combination is in `route_catalog`
  - that `confidence` can be converted to a number

- Input
  `dict | None`

- Output
  Validated `dict | None`
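
The three validator checks can be illustrated with a short sketch. The function name and the `allowed` parameter (a set of `(routing_domain, intent, subintent)` tuples) are assumptions made to keep the example self-contained.

```python
from typing import Optional


def validate_llm_route(candidate: Optional[dict], allowed: set) -> Optional[dict]:
    """Return a cleaned copy of the candidate, or None if any check fails."""
    if not isinstance(candidate, dict):
        return None

    # 1. All three route fields must be present and non-empty strings.
    fields = tuple(candidate.get(name) for name in ("routing_domain", "intent", "subintent"))
    if not all(isinstance(value, str) and value for value in fields):
        return None

    # 2. The combination must be in the catalog.
    if fields not in allowed:
        return None

    # 3. confidence must be convertible to a number.
    try:
        confidence = float(candidate.get("confidence"))
    except (TypeError, ValueError):
        return None

    return {**candidate, "confidence": confidence}
```

Returning `None` on any failure lets the caller treat "no answer" and "invalid answer" the same way and go to the fallback.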

### `routers/confidence.py`

- Task
  Make confidence meaningful after the LLM response.

- How it works
  Adjusts confidence:
  - `-0.1` if there are no strong anchors
  - `-0.1` if the query is short or vague
  - `+0.05` if there is an explicit signal (`file_markers`, `endpoint_paths`, `endpoint_markers`)
  - then clamps to the range `0.0..1.0`

- Input
  - `confidence: float`
  - `QueryFeatures`

- Output
  `confidence: float`
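
The adjustment arithmetic above, as a minimal sketch. How "strong anchors" and "short or vague" are detected is not specified in the document, so they are passed in as precomputed booleans; the function and parameter names are hypothetical.

```python
def adjust_confidence(
    confidence: float,
    has_strong_anchors: bool,
    is_short_or_vague: bool,
    has_explicit_signal: bool,
) -> float:
    adjusted = confidence
    if not has_strong_anchors:
        adjusted -= 0.1
    if is_short_or_vague:
        adjusted -= 0.1
    if has_explicit_signal:
        adjusted += 0.05
    # Clamp to the documented range 0.0..1.0.
    return max(0.0, min(1.0, adjusted))
```

For instance, an LLM confidence of 0.75 with no strong anchors and a short query drops to about 0.55, while 0.98 with an explicit signal is clamped to 1.0.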

### `routers/fallback.py`

- Task
  Build a deterministic fallback if the LLM result is invalid.

- How it works
  Rules:
  - there are `file_markers` → `DOCS / DOC_EXPLAIN / FIND_FILES`
  - there are docs signals (`endpoint_paths`, `target_doc_hints`, `matched_aliases`, marker groups) → `DOCS / DOC_EXPLAIN / SUMMARY`
  - otherwise → `GENERAL / GENERAL_QA / SUMMARY`

- Input
  - `user_query: str`
  - `QueryFeatures`
  - `anchors: V2RouteAnchors`
  - `llm_attempted: bool`

- Output
  `V2RouteResult`
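
The three fallback rules are ordered, so the first match wins. A minimal sketch, with a hypothetical function name and with the docs signals collapsed into a single list for brevity:

```python
def build_fallback_route(file_markers: list, docs_signals: list) -> dict:
    """Pick a fallback route from precomputed signals; first matching rule wins."""
    if file_markers:
        route = ("DOCS", "DOC_EXPLAIN", "FIND_FILES")
    elif docs_signals:
        # docs_signals stands for endpoint_paths, target_doc_hints,
        # matched_aliases and the marker groups taken together.
        route = ("DOCS", "DOC_EXPLAIN", "SUMMARY")
    else:
        route = ("GENERAL", "GENERAL_QA", "SUMMARY")
    routing_domain, intent, subintent = route
    return {"routing_domain": routing_domain, "intent": intent, "subintent": subintent}
```

The real module returns a `V2RouteResult`; a plain dict is used here only to keep the example runnable on its own.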

### `routers/prompts.yml`

- Task
  Define the response contract and confidence guidance for the LLM router.

- How it works
  Restricts the model to `allowed_routes` only and requires JSON with the fields:
  - `routing_domain`
  - `intent`
  - `subintent`
  - `confidence`
  - `reason_short`

### `routers/prompts.yml`
## 6. Key invariants

- Task
  Define a formal response contract for the LLM router.

- How it works
  Describes the allowed routes and requires returning JSON only.

- Input
  Payload from `routers/llm.py`

- Output
  Structured JSON response from the LLM
- The LLM is the default router.
- The deterministic layer does not make the primary routing decision.
- `target_terms` contain only retrieval-useful terms.
- `anchors` do not contain `terms`.
- `/health` and other endpoint paths must not end up in `file_names` unless they are a file with an extension.
- `file_names` contain only real file/doc paths.
- Fallback is used only if the LLM is unavailable or returned an invalid route.
@@ -0,0 +1,316 @@
# V2RetrievalPolicyResolver Architecture

## 1. Role of the component

`V2RetrievalPolicyResolver` is a deterministic bridge between `V2IntentRouter` and docs-RAG retrieval.

The component works on top of an already built `V2RouteResult` and does not reinterpret the user's text:

- it does not call the LLM;
- it does not change `intent` or `subintent`;
- it does not rank documents;
- it does not collect evidence.

Its job is to assemble a single `RetrievalPlan` with the fields:

- `profile`
- `layers`
- `limit`
- `filters`

## 2. Dependencies

The current implementation relies on:

- `src/app/core/agent/processes/v2/retrieval/policy_resolver.py`
- `src/app/core/agent/processes/v2/anchor_signals.py`
- `src/app/core/agent/processes/v2/models.py`
- `src/app/core/rag/contracts/enums.py`
- `src/app/core/agent/processes/v2/retrieval/v2_rag_adapter.py`
- `src/app/core/rag/retrieval/session_retriever.py`
- `src/app/core/rag/persistence/repository.py`
- `src/app/core/rag/persistence/query_repository.py`
- `src/app/core/rag/persistence/retrieval_statement_builder.py`

## 3. Input contract

The resolver uses:

- `route.intent`
- `route.subintent`
- `route.anchors.entity_names`
- `route.anchors.file_names`
- `route.anchors.endpoint_paths`
- `route.anchors.target_doc_hints`
- `route.anchors.matched_aliases`
- `route.anchors.process_domain`
- `route.anchors.process_subdomain`

In the current implementation, `route.target_terms` does not affect profile/filter branching.

## 4. Top-level branching

`resolve(route)` has three branches:

1. `GENERAL_QA` -> `general_qa_grounded_summary`
2. `FIND_FILES` -> `file_lookup`
3. otherwise -> docs summary branch

Invariants:

- `GENERAL_QA` always stays on the general profile;
- `FIND_FILES` always stays on `file_lookup`;
- the resolver always returns exactly one valid `RetrievalPlan`.
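
The three-way branching can be sketched as a small function. This is an illustration only: `resolve_profile` is a hypothetical name, and the summary profile is passed in as an argument because its selection is a separate step.

```python
def resolve_profile(intent: str, subintent: str,
                    summary_profile: str = "docs_summary_generic") -> str:
    """Top-level branch order: general first, then file lookup, then docs summary."""
    if intent == "GENERAL_QA":
        return "general_qa_grounded_summary"
    if subintent == "FIND_FILES":
        return "file_lookup"
    return summary_profile
```

Because every path returns a profile name, the function cannot fail to produce a result, which mirrors the invariant that the resolver always returns one plan.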

## 5. Internal decomposition

The current implementation is split into two helper classes.

### `_AnchorTermCollector`

Collects terms for `prefer_like_patterns`.

Sources:

- basename from `target_doc_hints`
- `endpoint_paths`
- `file_names`
- `entity_names`
- `matched_aliases`
- `process_domain`
- `process_subdomain`

All values are normalized to lower case and turned into SQL-like patterns of the form `"%term%"`.

A separate rule applies to `FIND_FILES`:

- if there are `target_doc_hints`, `prefer_like_patterns` is built only from the hint basenames;
- otherwise the general set of collected terms is used.
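
A rough sketch of the pattern building and the `FIND_FILES` special case. The function names are hypothetical, the dedupe step is an assumption, and the sketch does not escape `%` or `_` inside terms, which a real implementation may need to consider.

```python
import posixpath


def to_like_patterns(terms) -> list:
    """Lower-case each term and wrap it as a SQL LIKE pattern "%term%"."""
    patterns = []
    for term in terms:
        cleaned = term.strip().lower()
        if not cleaned:
            continue
        pattern = f"%{cleaned}%"
        if pattern not in patterns:  # assumption: duplicates are dropped
            patterns.append(pattern)
    return patterns


def find_files_like_patterns(target_doc_hints: list, collected_terms: list) -> list:
    """FIND_FILES rule: hint basenames win over the general term set."""
    if target_doc_hints:
        return to_like_patterns(posixpath.basename(hint) for hint in target_doc_hints)
    return to_like_patterns(collected_terms)
```

With a hint such as `docs/api/Health-Endpoint.md`, only `%health-endpoint.md%` is produced, so unrelated anchor terms do not dilute the file lookup.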

### `_RouteFilterBuilder`

Builds `filters` for the three branches:

- `general_filters(route)`
- `summary_filters(route)`
- `find_files_filters(route)`

It also contains path selection:

- `_summary_prefixes(route)`
- `_find_files_prefixes(route)`
- `_find_files_prefer_prefixes(route)`

## 6. Signal detection

The summary profile and some path preferences depend on `anchor_signal_types(route)`.

Signals are computed as follows:

- `FIND_FILES`
  - if `route.subintent == FIND_FILES`
- `API_ENDPOINT`
  - if there are `endpoint_paths`
  - or `target_doc_hints` / `file_names` / `matched_aliases` contain the markers `"/api/"`, `"api"`, `"endpoint"`
- `ARCHITECTURE`
  - if `target_doc_hints` / `file_names` / `matched_aliases` contain `"/architecture/"`, `"architecture"`, `"arch"`
- `LOGIC_FLOW`
  - if `target_doc_hints` / `file_names` / `matched_aliases` contain `"/logic/"`, `"logic"`, `"workflow"`, `"flow"`, `"process"`
- `DOMAIN_ENTITY`
  - if there are `entity_names`
  - or `target_doc_hints` / `file_names` / `matched_aliases` contain `"/domains/"`, `"domain"`, `"entity"`, `"component"`

Important:

- `process_domain` and `process_subdomain` currently **do not participate** in signal detection;
- they affect only filters and `prefer_like_patterns`.
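
The detection rules can be approximated with the sketch below. It assumes that "contain" means a case-insensitive substring match, which the document does not state explicitly; the anchors are represented as a plain dict, and the function signature is hypothetical.

```python
# Marker substrings per signal, as listed in the document.
MARKERS = {
    "API_ENDPOINT": ("/api/", "api", "endpoint"),
    "ARCHITECTURE": ("/architecture/", "architecture", "arch"),
    "LOGIC_FLOW": ("/logic/", "logic", "workflow", "flow", "process"),
    "DOMAIN_ENTITY": ("/domains/", "domain", "entity", "component"),
}
TEXT_FIELDS = ("target_doc_hints", "file_names", "matched_aliases")


def anchor_signal_types(subintent: str, anchors: dict) -> set:
    texts = [value.lower() for field in TEXT_FIELDS for value in anchors.get(field, [])]
    signals = set()
    if subintent == "FIND_FILES":
        signals.add("FIND_FILES")
    for signal, markers in MARKERS.items():
        # Assumption: a marker matches when it occurs anywhere in the text.
        if any(marker in text for text in texts for marker in markers):
            signals.add(signal)
    if anchors.get("endpoint_paths"):
        signals.add("API_ENDPOINT")
    if anchors.get("entity_names"):
        signals.add("DOMAIN_ENTITY")
    # process_domain / process_subdomain are intentionally ignored here.
    return signals
```

With empty anchors and a `SUMMARY` subintent the result is an empty set, which is consistent with the `"signal_types": []` value in the second runtime trace.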

## 7. Summary profile selection

The `_summary_profile(route)` method uses:

- `meaningful = anchor_signal_types(route) - {FIND_FILES}`

Rule:

- if there is not exactly one meaningful signal -> `docs_summary_generic`
- if there is exactly one:
  - `API_ENDPOINT` -> `docs_summary_api_endpoint`
  - `ARCHITECTURE` -> `docs_summary_architecture`
  - `LOGIC_FLOW` -> `docs_summary_logic_flow`
  - `DOMAIN_ENTITY` -> `docs_summary_domain_entity`

Consequences:

- an API + architecture conflict -> generic;
- API + entity -> generic;
- weak/no signals -> generic.
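
The "exactly one meaningful signal" rule as a minimal sketch; signals are modeled as plain strings and the function name drops the leading underscore of the real method.

```python
PROFILE_BY_SIGNAL = {
    "API_ENDPOINT": "docs_summary_api_endpoint",
    "ARCHITECTURE": "docs_summary_architecture",
    "LOGIC_FLOW": "docs_summary_logic_flow",
    "DOMAIN_ENTITY": "docs_summary_domain_entity",
}


def summary_profile(signals: set) -> str:
    # FIND_FILES never counts as a meaningful summary signal.
    meaningful = signals - {"FIND_FILES"}
    if len(meaningful) != 1:
        # Zero signals or conflicting signals both fall back to generic.
        return "docs_summary_generic"
    (only_signal,) = meaningful
    return PROFILE_BY_SIGNAL.get(only_signal, "docs_summary_generic")
```

Treating conflicts and missing signals identically keeps the rule simple: a specialized profile is chosen only when the evidence points one way.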

## 8. Profiles, layers, limits

### `general_qa_grounded_summary`

- condition: `route.intent == GENERAL_QA`
- layers: `[D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]`
- limit: `8`

### `file_lookup`

- condition: `route.subintent == FIND_FILES`
- layers: `[D1_DOCUMENT_CATALOG, D3_ENTITY_CATALOG]`
- limit: `12`

### `docs_summary_api_endpoint`

- layers: `[D1_DOCUMENT_CATALOG, D2_FACT_INDEX, D0_DOC_CHUNKS]`
- limit: `8`

### `docs_summary_logic_flow`

- layers: `[D4_WORKFLOW_INDEX, D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]`
- limit: `8`

### `docs_summary_domain_entity`

- layers: `[D3_ENTITY_CATALOG, D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]`
- limit: `8`

### `docs_summary_architecture`

- layers: `[D1_DOCUMENT_CATALOG, D5_RELATION_GRAPH, D0_DOC_CHUNKS]`
- limit: `8`

### `docs_summary_generic`

- layers: `[D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]`
- limit: `8`

## 9. Filters by branch

### General branch

`general_filters(route)` returns:

- `prefer_path_prefixes = ["docs/architecture/", "docs/"]`
- `prefer_like_patterns = ["%readme.md%", "%overview%"]`
- `target_doc_hints = list(route.anchors.target_doc_hints)`

This is an overview plan rather than a narrow one: there are no hard `path_prefixes` here.

### Summary branch

`summary_filters(route)` always includes:

- `target_doc_hints`
- `metadata.domain`, if `process_domain` is set
- `metadata.subdomain`, if `process_subdomain` is set
- `prefer_path_prefixes`
- `prefer_like_patterns`

Additionally:

- if there is an `API_ENDPOINT` signal, a hard `path_prefixes = ["docs/api/", "docs/"]` is added

`prefer_path_prefixes` for summary:

- API -> `["docs/api/", "docs/"]`
- ARCHITECTURE -> `["docs/architecture/", "docs/"]`
- LOGIC_FLOW -> `["docs/logic/", "docs/architecture/", "docs/"]`
- DOMAIN_ENTITY -> `["docs/domains/", "docs/", "docs/api/"]`
- empty signals -> `["docs/"]`

If there are several signals, the prefixes are merged and deduplicated with order preserved.
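
The order-preserving merge of summary prefixes can be sketched as follows. The per-signal lists come from the document; the order in which signals are visited (`SIGNAL_ORDER`) is not specified there and is an assumption, as is the function name.

```python
SUMMARY_PREFIXES = {
    "API_ENDPOINT": ["docs/api/", "docs/"],
    "ARCHITECTURE": ["docs/architecture/", "docs/"],
    "LOGIC_FLOW": ["docs/logic/", "docs/architecture/", "docs/"],
    "DOMAIN_ENTITY": ["docs/domains/", "docs/", "docs/api/"],
}
# Assumption: signals are visited in this fixed order when several are present.
SIGNAL_ORDER = ["API_ENDPOINT", "ARCHITECTURE", "LOGIC_FLOW", "DOMAIN_ENTITY"]


def summary_prefer_prefixes(signals: set) -> list:
    merged = []
    for signal in SIGNAL_ORDER:
        if signal in signals:
            merged.extend(SUMMARY_PREFIXES[signal])
    # dict.fromkeys keeps the first occurrence of each prefix, in order.
    deduped = list(dict.fromkeys(merged))
    return deduped or ["docs/"]
```

`dict.fromkeys` is a compact way to deduplicate while keeping insertion order, which a plain `set` would not guarantee.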

### FIND_FILES branch

`find_files_filters(route)` always includes:

- `target_doc_hints`
- `metadata.domain`, if `process_domain` is set
- `metadata.subdomain`, if `process_subdomain` is set
- `path_prefixes`
- `prefer_path_prefixes`
- `prefer_like_patterns`

`path_prefixes` for `FIND_FILES` are chosen by priority:

1. directories from `target_doc_hints`
2. directories from `file_names`, if the path starts with `docs/`
3. signal-based fallback:
   - API -> `["docs/api/", "docs/"]`
   - ARCHITECTURE -> `["docs/architecture/", "docs/"]`
   - LOGIC_FLOW -> `["docs/logic/", "docs/"]`
   - DOMAIN_ENTITY -> `["docs/domains/", "docs/"]`
4. default -> `["docs/"]`

`prefer_path_prefixes` for `FIND_FILES`:

- starts from `path_prefixes`
- if `process_domain` or `process_subdomain` is set, additionally appends:
  - `"docs/domains/"`
  - `"docs/logic/"`
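
A sketch of the priority chain for `FIND_FILES` prefixes. Function names are hypothetical; how the signal-based fallback behaves when several signals are present is not stated in the document, so this version simply takes the first match in a fixed order, and it deduplicates appended prefixes as an assumption.

```python
import posixpath

SIGNAL_FALLBACK = [  # assumption: first matching signal wins
    ("API_ENDPOINT", ["docs/api/", "docs/"]),
    ("ARCHITECTURE", ["docs/architecture/", "docs/"]),
    ("LOGIC_FLOW", ["docs/logic/", "docs/"]),
    ("DOMAIN_ENTITY", ["docs/domains/", "docs/"]),
]


def _directories(paths) -> list:
    """Unique parent directories with a trailing slash, in input order."""
    prefixes = []
    for path in paths:
        directory = posixpath.dirname(path)
        if directory and directory + "/" not in prefixes:
            prefixes.append(directory + "/")
    return prefixes


def find_files_path_prefixes(target_doc_hints, file_names, signals) -> list:
    from_hints = _directories(target_doc_hints)
    if from_hints:
        return from_hints
    from_files = _directories(name for name in file_names if name.startswith("docs/"))
    if from_files:
        return from_files
    for signal, prefixes in SIGNAL_FALLBACK:
        if signal in signals:
            return list(prefixes)
    return ["docs/"]


def find_files_prefer_prefixes(path_prefixes, process_domain, process_subdomain) -> list:
    prefer = list(path_prefixes)
    if process_domain or process_subdomain:
        for extra in ("docs/domains/", "docs/logic/"):
            if extra not in prefer:
                prefer.append(extra)
    return prefer
```

Each level is consulted only when the more specific one yields nothing, so an explicit doc hint always narrows the lookup more than a signal does.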

## 10. Hard and soft signals in the current implementation

In terms of the current code:

Hard-ish / narrowing filters:

- `path_prefixes`
- `metadata.domain`
- `metadata.subdomain`

Soft preferences:

- `prefer_path_prefixes`
- `prefer_like_patterns`

Separately:

- `target_doc_hints` are always stored in `RetrievalPlan.filters`, but are **not mapped directly** into `RagRepository.retrieve(...)` as a hard SQL filter.

In other words, `target_doc_hints` is currently not a direct DB filter but a downstream anchor for other pipeline steps and for the deterministic exact-doc seeding logic.

## 11. Integration with the retrieval stack

The layer after the resolver now executes the plan through `V2RagRetrievalAdapter` rather than directly in `V2Process`.

`V2RagRetrievalAdapter.fetch_rows(...)` uses the `RetrievalPlan` as follows:

- reads `filters["target_doc_hints"]` from the plan itself;
- does an exact-path seed via `retrieve_exact_files(...)`;
- for missing hints, does a substring fallback via `retrieve_chunks_by_path_substrings(...)`;
- then does the regular semantic retrieve via `RagSessionRetriever.retrieve(...)`;
- merges the exact / substring / semantic rows with a dedupe merge.

This is an important shift: the execution strategy now depends on the **`RetrievalPlan` contract** rather than on hidden route-specific logic inside `V2Process`.

`RagSessionRetriever._map_filters()` passes the following through to `RagRepository.retrieve(...)`:

- `path_prefixes`
- `exclude_path_prefixes`
- `exclude_like_patterns`
- `prefer_path_prefixes`
- `prefer_like_patterns`
- `prefer_non_tests`
- `metadata_domain` from `filters["metadata.domain"]`
- `metadata_subdomain` from `filters["metadata.subdomain"]`

`RetrievalStatementBuilder.build_retrieve(...)` adds the SQL predicates:

- `lower(metadata_json->>'domain') = :metadata_domain`
- `lower(metadata_json->>'subdomain') = :metadata_subdomain`

As a result:

- `process_domain/process_subdomain` actually participate in the retrieval query;
- `target_doc_hints` actually participate in the retrieval execution strategy at the adapter level;
- `V2RetrievalPolicyResolver` defines the plan contract, and the next step executes this contract more literally.
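
The seeding-then-semantic order described for the adapter can be sketched as below. This is not the adapter's code: the three retrieval steps are injected as plain callables, rows are dicts, and both the "missing hint" test (no exact row with that path) and the dedupe key `(layer, path, title)` are assumptions.

```python
from typing import Callable, List


def fetch_rows(
    plan: dict,
    retrieve_exact: Callable[[List[str]], List[dict]],
    retrieve_substring: Callable[[List[str]], List[dict]],
    retrieve_semantic: Callable[[dict], List[dict]],
) -> List[dict]:
    hints = list(plan.get("filters", {}).get("target_doc_hints", []))

    # 1. Exact-path seed for every hint.
    exact_rows = retrieve_exact(hints) if hints else []

    # 2. Substring fallback only for hints the exact seed did not find.
    found_paths = {row["path"] for row in exact_rows}
    missing = [hint for hint in hints if hint not in found_paths]
    substring_rows = retrieve_substring(missing) if missing else []

    # 3. Regular semantic retrieval driven by the same plan.
    semantic_rows = retrieve_semantic(plan)

    # 4. Dedupe merge: exact rows first, then substring, then semantic.
    merged, seen = [], set()
    for row in [*exact_rows, *substring_rows, *semantic_rows]:
        key = (row.get("layer"), row.get("path"), row.get("title"))
        if key not in seen:
            seen.add(key)
            merged.append(row)
    return merged
```

Putting seeded rows first means a document named by a hint cannot be pushed out by semantically similar but less relevant rows.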

## 12. Current limitations

- The logic is fully deterministic.
- `target_terms` currently do not participate in resolver branching.
- `process_domain/process_subdomain` do not affect summary profile selection.
- The API signal adds `path_prefixes` even in the generic summary if API is among the conflicting signals.
- `target_doc_hints` are not a direct SQL filter inside the regular `retrieve`, but the adapter layer uses them for exact-path / substring seeding before semantic retrieval.
@@ -0,0 +1,130 @@
# Runtime Trace: 20260407-175918-b17b76678614

- active_rag_session_id: 94851e51-1514-4a77-9570-b17b76678614

## request
```json
{
  "request_id": "req_d9dae665c88b476db700a3f7bd210370",
  "session_id": "as_da5ddd4aacd94ec5b7078dd69e06c9c6",
  "active_rag_session_id": "94851e51-1514-4a77-9570-b17b76678614",
  "process_version": "v1",
  "created_at": "2026-04-07T17:59:18.592170+00:00",
  "message": "Ты тут?"
}
```

## workflow.v1
```json
{
  "event": "workflow_started",
  "workflow_id": "v1.flow_main"
}
```

## workflow.v1
```json
{
  "event": "step_started",
  "workflow_id": "v1.flow_main",
  "step_id": "prepare_user_message",
  "input": {}
}
```

## workflow.v1
```json
{
  "event": "step_completed",
  "workflow_id": "v1.flow_main",
  "step_id": "prepare_user_message",
  "output": {
    "prepared_message_length": 7
  }
}
```

## workflow.v1
```json
{
  "event": "step_started",
  "workflow_id": "v1.flow_main",
  "step_id": "generate_answer",
  "input": {
    "prompt_name": "v1_flow_main.answer",
    "prepared_message_length": 7
  }
}
```

## workflow.v1.llm
```json
{
  "event": "request",
  "prompt_name": "v1_flow_main.answer",
  "system_prompt": "Ты полезный ассистент.\nОтветь на сообщение пользователя по существу.\nНе придумывай факты, если данных недостаточно.\nЕсли пользователь пишет по-русски, отвечай по-русски.",
  "user_prompt": "Ты тут?",
  "log_context": "agent:req_d9dae665c88b476db700a3f7bd210370"
}
```

## workflow.v1.llm
```json
{
  "event": "response",
  "text": "Да, я здесь! Чем могу помочь?"
}
```

## workflow.v1
```json
{
  "event": "step_completed",
  "workflow_id": "v1.flow_main",
  "step_id": "generate_answer",
  "output": {
    "answer_length": 29
  }
}
```

## workflow.v1
```json
{
  "event": "step_started",
  "workflow_id": "v1.flow_main",
  "step_id": "finalize_answer",
  "input": {
    "answer_length_before_strip": 29
  }
}
```

## workflow.v1
```json
{
  "event": "step_completed",
  "workflow_id": "v1.flow_main",
  "step_id": "finalize_answer",
  "output": {
    "answer_length": 29
  }
}
```

## workflow.v1
```json
{
  "event": "workflow_completed",
  "workflow_id": "v1.flow_main"
}
```

## result
```json
{
  "status": "done",
  "answer": "Да, я здесь! Чем могу помочь?",
  "completed_at": "2026-04-07T17:59:19.326182+00:00"
}
```
File diff suppressed because it is too large
@@ -0,0 +1,622 @@
# Runtime Trace: 20260407-182058-3f56c69c7290

- active_rag_session_id: c8b893cc-cb13-4493-a6d1-3f56c69c7290

## request
```json
{
  "request_id": "req_bab9c8812ac94847bb102cba68516f10",
  "session_id": "as_4fdccc9c55c549faad8f3ef379371129",
  "active_rag_session_id": "c8b893cc-cb13-4493-a6d1-3f56c69c7290",
  "process_version": "v2",
  "created_at": "2026-04-07T18:20:58.679614+00:00",
  "message": "Как работает метод health?"
}
```

## process.v2
```json
{
  "event": "intent_routed",
  "routing_domain": "DOCS",
  "intent": "DOC_EXPLAIN",
  "subintent": "SUMMARY",
  "normalized_query": "Как работает метод health?",
  "target_terms": [
    "метод",
    "health"
  ],
  "anchors": {
    "entity_names": [],
    "file_names": [],
    "endpoint_paths": [],
    "target_doc_hints": [],
    "matched_aliases": [],
    "process_domain": null,
    "process_subdomain": null,
    "signal_types": []
  },
  "confidence": 0.75,
  "routing_mode": "llm_default",
  "llm_router_used": true,
  "reason_short": "Запрос на понимание работы конкретного метода \"health\".",
  "rag_session_id": "c8b893cc-cb13-4493-a6d1-3f56c69c7290"
}
```

## process.v2.pipeline
```json
{
  "event": "router_resolved",
  "domain": "DOCS",
  "intent": "DOC_EXPLAIN",
  "subintent": "SUMMARY",
  "confidence": 0.75
}
```

## process.v2.pipeline
```json
{
  "event": "anchors_extracted",
  "signal_types": [],
  "endpoint_paths": [],
  "target_doc_hints": [],
  "matched_aliases": [],
  "target_terms": [
    "метод",
    "health"
  ]
}
```

## process.v2.pipeline
```json
{
  "event": "alias_resolution",
  "resolved_aliases": [],
  "target_doc_hints": []
}
```

## process.v2.retrieval_policy
```json
{
  "event": "retrieval_plan_resolved",
  "profile": "docs_summary_generic",
  "layers": [
    "D1_DOCUMENT_CATALOG",
    "D0_DOC_CHUNKS"
  ],
  "limit": 8,
  "filters": {
    "target_doc_hints": [],
    "prefer_path_prefixes": [
      "docs/"
    ],
    "prefer_like_patterns": []
  }
}
```

## process.v2.pipeline
```json
{
  "event": "retrieval_profile_selected",
  "profile": "docs_summary_generic",
  "layers": [
    "D1_DOCUMENT_CATALOG",
    "D0_DOC_CHUNKS"
  ],
  "filters": {
    "target_doc_hints": [],
    "prefer_path_prefixes": [
      "docs/"
    ],
    "prefer_like_patterns": []
  }
}
```

## process.v2.rag_retrieval
```json
{
  "event": "rag_rows_fetched",
  "profile": "docs_summary_generic",
  "row_count": 8,
  "rows": [
    {
      "layer": "D1_DOCUMENT_CATALOG",
      "path": "docs/architecture/telegram-notify-app-overview.md",
      "title": "Архитектура Telegram Notify App",
      "document_id": "architecture.telegram_notify_app",
      "entity_name": "",
      "summary_text": "- Purpose: сервис поднимает HTTP control plane и фоновый worker для отправки уведомлений в Telegram.\n- Entry point: `src/telegram_notify_app/main.py`.\n- Main components: `RuntimeManager`, `TelegramControlChannel`, `TelegramNotifyModule`, `TelegramNotifyWorker`, `TelegramSendService`.\n- Configuration: `config/config.yaml` или путь из `CONFIG_PATH`.\n- Related API: [`/health`](../api/health-endpoint.",
      "section_path": "",
      "content_preview": "- Purpose: сервис поднимает HTTP control plane и фоновый worker для отправки уведомлений в Telegram.\n- Entry point: `src/telegram_notify_app/main.py`.\n- Main components: `RuntimeManager`, `TelegramControlChannel`, `TelegramNotifyModule`, `TelegramNotifyWorker`, `TelegramSendService`.\n- Configuration: `config/config.yaml` или путь из `CONFIG_PATH`.\n- Related API: [`/health`](../api/health-endpoint."
    },
    {
      "layer": "D1_DOCUMENT_CATALOG",
      "path": "docs/README.md",
      "title": "Индекс технической документации test_echo_app",
      "document_id": "index.test_echo_app_docs",
      "entity_name": "",
      "summary_text": "- Purpose: точка входа в техническую документацию сервиса `test_echo_app`.\n- Scope: архитектура, HTTP API control plane, цикл отправки уведомлений, health-модель и каталог ошибок.\n- Canonical structure: `docs/architecture`, `docs/api`, `docs/logic`, `docs/domains`, `docs/errors`.\n- Primary parent doc: [Архитектура Telegram Notify App](./architecture/telegram-notify-app-overview.md).\n- Navigation: ",
      "section_path": "",
      "content_preview": "- Purpose: точка входа в техническую документацию сервиса `test_echo_app`.\n- Scope: архитектура, HTTP API control plane, цикл отправки уведомлений, health-модель и каталог ошибок.\n- Canonical structure: `docs/architecture`, `docs/api`, `docs/logic`, `docs/domains`, `docs/errors`.\n- Primary parent doc: [Архитектура Telegram Notify App](./architecture/telegram-notify-app-overview.md).\n- Navigation: "
    },
    {
      "layer": "D0_DOC_CHUNKS",
      "path": "docs/architecture/telegram-notify-app-overview.md",
      "title": "architecture.telegram_notify_app:Операторские и мониторинговые клиенты",
      "document_id": "architecture.telegram_notify_app",
      "entity_name": "",
      "summary_text": "",
      "section_path": "Архитектура Telegram Notify App > Details > Интеграции > Операторские и мониторинговые клиенты",
|
||||
"content_preview": "- target: ext.operator_and_probes\n- target_type: external_system\n- direction: inbound\n- interaction: calls\n- via: HTTP `/health`, `/actions/{action}`, `/send`\n- purpose: диагностика, lifecycle-управление и ручная отправка сообщений\n- details:\n - transport: FastAPI + UvicornThreadRunner\n - status_mapping: non-ok health -> HTTP 503"
|
||||
},
|
||||
{
|
||||
"layer": "D0_DOC_CHUNKS",
|
||||
"path": "docs/architecture/telegram-notify-app-overview.md",
|
||||
"title": "architecture.telegram_notify_app:Связанные документы",
|
||||
"document_id": "architecture.telegram_notify_app",
|
||||
"entity_name": "",
|
||||
"summary_text": "",
|
||||
"section_path": "Архитектура Telegram Notify App > Details > Связанные документы",
|
||||
"content_preview": "- [API /health](../api/health-endpoint.md)\n- [API /actions/{action}](../api/control-actions-endpoint.md)\n- [API /send](../api/send-message-endpoint.md)\n- [Логика цикла отправки уведомлений](../logic/telegram-notification-loop.md)\n- [Доменная модель runtime health](../domains/runtime-health-entity.md)"
|
||||
},
|
||||
{
|
||||
"layer": "D0_DOC_CHUNKS",
|
||||
"path": "docs/README.md",
|
||||
"title": "index.test_echo_app_docs:Навигация",
|
||||
"document_id": "index.test_echo_app_docs",
|
||||
"entity_name": "",
|
||||
"summary_text": "",
|
||||
"section_path": "Индекс технической документации test_echo_app > Details > Навигация",
|
||||
"content_preview": "- [Архитектура Telegram Notify App](./architecture/telegram-notify-app-overview.md)\n- [API /health](./api/health-endpoint.md)\n- [API /actions/{action}](./api/control-actions-endpoint.md)\n- [API /send](./api/send-message-endpoint.md)\n- [Логика цикла отправки уведомлений](./logic/telegram-notification-loop.md)\n- [Доменная модель runtime health](./domains/runtime-health-entity.md)\n- [Каталог ошибок]("
|
||||
},
|
||||
{
|
||||
"layer": "D0_DOC_CHUNKS",
|
||||
"path": "docs/architecture/telegram-notify-app-overview.md",
|
||||
"title": "architecture.telegram_notify_app:Summary",
|
||||
"document_id": "architecture.telegram_notify_app",
|
||||
"entity_name": "",
|
||||
"summary_text": "",
|
||||
"section_path": "Архитектура Telegram Notify App > Summary",
|
||||
"content_preview": "- Purpose: сервис поднимает HTTP control plane и фоновый worker для отправки уведомлений в Telegram.\n- Entry point: `src/telegram_notify_app/main.py`.\n- Main components: `RuntimeManager`, `TelegramControlChannel`, `TelegramNotifyModule`, `TelegramNotifyWorker`, `TelegramSendService`.\n- Configuration: `config/config.yaml` или путь из `CONFIG_PATH`.\n- Related API: [`/health`](../api/health-endpoint."
|
||||
},
|
||||
{
|
||||
"layer": "D0_DOC_CHUNKS",
|
||||
"path": "docs/README.md",
|
||||
"title": "index.test_echo_app_docs:Summary",
|
||||
"document_id": "index.test_echo_app_docs",
|
||||
"entity_name": "",
|
||||
"summary_text": "",
|
||||
"section_path": "Индекс технической документации test_echo_app > Summary",
|
||||
"content_preview": "- Purpose: точка входа в техническую документацию сервиса `test_echo_app`.\n- Scope: архитектура, HTTP API control plane, цикл отправки уведомлений, health-модель и каталог ошибок.\n- Canonical structure: `docs/architecture`, `docs/api`, `docs/logic`, `docs/domains`, `docs/errors`.\n- Primary parent doc: [Архитектура Telegram Notify App](./architecture/telegram-notify-app-overview.md).\n- Navigation: "
|
||||
},
|
||||
{
|
||||
"layer": "D0_DOC_CHUNKS",
|
||||
"path": "docs/architecture/telegram-notify-app-overview.md",
|
||||
"title": "architecture.telegram_notify_app:Интеграционные сценарии",
|
||||
"document_id": "architecture.telegram_notify_app",
|
||||
"entity_name": "",
|
||||
"summary_text": "",
|
||||
"section_path": "Архитектура Telegram Notify App > Details > Интеграционные сценарии",
|
||||
"content_preview": "1. При старте `main()` загружает YAML-конфиг, извлекает host, port и интервал отправки, затем собирает runtime.\n2. `RuntimeManager` регистрирует `TelegramControlChannel` для HTTP control plane.\n3. `TelegramNotifyModule` добавляет `TelegramNotifyWorker` и `TelegramSendService` в runtime.\n4. Внешний клиент вызывает endpoint'ы control plane для health-check, lifecycle-операций или ручной отправки.\n5."
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## process.v2.pipeline
|
||||
```json
|
||||
{
|
||||
"event": "candidate_generation",
|
||||
"query": "Как работает метод health?",
|
||||
"profile": "docs_summary_generic",
|
||||
"details": {
|
||||
"target_doc_hints": [],
|
||||
"candidates_before_ranking": [
|
||||
"docs/architecture/telegram-notify-app-overview.md",
|
||||
"docs/README.md",
|
||||
"docs/architecture/telegram-notify-app-overview.md",
|
||||
"docs/architecture/telegram-notify-app-overview.md",
|
||||
"docs/README.md",
|
||||
"docs/architecture/telegram-notify-app-overview.md",
|
||||
"docs/README.md",
|
||||
"docs/architecture/telegram-notify-app-overview.md"
|
||||
]
|
||||
},
|
||||
"resolved_aliases": [],
|
||||
"target_doc_hints": [],
|
||||
"candidate_docs_before_ranking": [
|
||||
{
|
||||
"layer": "D1_DOCUMENT_CATALOG",
|
||||
"path": "docs/architecture/telegram-notify-app-overview.md",
|
||||
"title": "Архитектура Telegram Notify App",
|
||||
"document_id": "architecture.telegram_notify_app",
|
||||
"entity_name": "",
|
||||
"summary_text": "- Purpose: сервис поднимает HTTP control plane и фоновый worker для отправки уведомлений в Telegram.\n- Entry point: `src/telegram_notify_app/main.py`.\n- Main components: `RuntimeManager`, `TelegramControlChannel`, `TelegramNotifyModule`, `TelegramNotifyWorker`, `TelegramSendService`.\n- Configuration: `config/config.yaml` или путь из `CONFIG_PATH`.\n- Related API: [`/health`](../api/health-endpoint.",
|
||||
"section_path": "",
|
||||
"content_preview": "- Purpose: сервис поднимает HTTP control plane и фоновый worker для отправки уведомлений в Telegram.\n- Entry point: `src/telegram_notify_app/main.py`.\n- Main components: `RuntimeManager`, `TelegramControlChannel`, `TelegramNotifyModule`, `TelegramNotifyWorker`, `TelegramSendService`.\n- Configuration: `config/config.yaml` или путь из `CONFIG_PATH`.\n- Related API: [`/health`](../api/health-endpoint."
|
||||
},
|
||||
{
|
||||
"layer": "D1_DOCUMENT_CATALOG",
|
||||
"path": "docs/README.md",
|
||||
"title": "Индекс технической документации test_echo_app",
|
||||
"document_id": "index.test_echo_app_docs",
|
||||
"entity_name": "",
|
||||
"summary_text": "- Purpose: точка входа в техническую документацию сервиса `test_echo_app`.\n- Scope: архитектура, HTTP API control plane, цикл отправки уведомлений, health-модель и каталог ошибок.\n- Canonical structure: `docs/architecture`, `docs/api`, `docs/logic`, `docs/domains`, `docs/errors`.\n- Primary parent doc: [Архитектура Telegram Notify App](./architecture/telegram-notify-app-overview.md).\n- Navigation: ",
|
||||
"section_path": "",
|
||||
"content_preview": "- Purpose: точка входа в техническую документацию сервиса `test_echo_app`.\n- Scope: архитектура, HTTP API control plane, цикл отправки уведомлений, health-модель и каталог ошибок.\n- Canonical structure: `docs/architecture`, `docs/api`, `docs/logic`, `docs/domains`, `docs/errors`.\n- Primary parent doc: [Архитектура Telegram Notify App](./architecture/telegram-notify-app-overview.md).\n- Navigation: "
|
||||
},
|
||||
{
|
||||
"layer": "D0_DOC_CHUNKS",
|
||||
"path": "docs/architecture/telegram-notify-app-overview.md",
|
||||
"title": "architecture.telegram_notify_app:Операторские и мониторинговые клиенты",
|
||||
"document_id": "architecture.telegram_notify_app",
|
||||
"entity_name": "",
|
||||
"summary_text": "",
|
||||
"section_path": "Архитектура Telegram Notify App > Details > Интеграции > Операторские и мониторинговые клиенты",
|
||||
"content_preview": "- target: ext.operator_and_probes\n- target_type: external_system\n- direction: inbound\n- interaction: calls\n- via: HTTP `/health`, `/actions/{action}`, `/send`\n- purpose: диагностика, lifecycle-управление и ручная отправка сообщений\n- details:\n - transport: FastAPI + UvicornThreadRunner\n - status_mapping: non-ok health -> HTTP 503"
|
||||
},
|
||||
{
|
||||
"layer": "D0_DOC_CHUNKS",
|
||||
"path": "docs/architecture/telegram-notify-app-overview.md",
|
||||
"title": "architecture.telegram_notify_app:Связанные документы",
|
||||
"document_id": "architecture.telegram_notify_app",
|
||||
"entity_name": "",
|
||||
"summary_text": "",
|
||||
"section_path": "Архитектура Telegram Notify App > Details > Связанные документы",
|
||||
"content_preview": "- [API /health](../api/health-endpoint.md)\n- [API /actions/{action}](../api/control-actions-endpoint.md)\n- [API /send](../api/send-message-endpoint.md)\n- [Логика цикла отправки уведомлений](../logic/telegram-notification-loop.md)\n- [Доменная модель runtime health](../domains/runtime-health-entity.md)"
|
||||
},
|
||||
{
|
||||
"layer": "D0_DOC_CHUNKS",
|
||||
"path": "docs/README.md",
|
||||
"title": "index.test_echo_app_docs:Навигация",
|
||||
"document_id": "index.test_echo_app_docs",
|
||||
"entity_name": "",
|
||||
"summary_text": "",
|
||||
"section_path": "Индекс технической документации test_echo_app > Details > Навигация",
|
||||
"content_preview": "- [Архитектура Telegram Notify App](./architecture/telegram-notify-app-overview.md)\n- [API /health](./api/health-endpoint.md)\n- [API /actions/{action}](./api/control-actions-endpoint.md)\n- [API /send](./api/send-message-endpoint.md)\n- [Логика цикла отправки уведомлений](./logic/telegram-notification-loop.md)\n- [Доменная модель runtime health](./domains/runtime-health-entity.md)\n- [Каталог ошибок]("
|
||||
},
|
||||
{
|
||||
"layer": "D0_DOC_CHUNKS",
|
||||
"path": "docs/architecture/telegram-notify-app-overview.md",
|
||||
"title": "architecture.telegram_notify_app:Summary",
|
||||
"document_id": "architecture.telegram_notify_app",
|
||||
"entity_name": "",
|
||||
"summary_text": "",
|
||||
"section_path": "Архитектура Telegram Notify App > Summary",
|
||||
"content_preview": "- Purpose: сервис поднимает HTTP control plane и фоновый worker для отправки уведомлений в Telegram.\n- Entry point: `src/telegram_notify_app/main.py`.\n- Main components: `RuntimeManager`, `TelegramControlChannel`, `TelegramNotifyModule`, `TelegramNotifyWorker`, `TelegramSendService`.\n- Configuration: `config/config.yaml` или путь из `CONFIG_PATH`.\n- Related API: [`/health`](../api/health-endpoint."
|
||||
},
|
||||
{
|
||||
"layer": "D0_DOC_CHUNKS",
|
||||
"path": "docs/README.md",
|
||||
"title": "index.test_echo_app_docs:Summary",
|
||||
"document_id": "index.test_echo_app_docs",
|
||||
"entity_name": "",
|
||||
"summary_text": "",
|
||||
"section_path": "Индекс технической документации test_echo_app > Summary",
|
||||
"content_preview": "- Purpose: точка входа в техническую документацию сервиса `test_echo_app`.\n- Scope: архитектура, HTTP API control plane, цикл отправки уведомлений, health-модель и каталог ошибок.\n- Canonical structure: `docs/architecture`, `docs/api`, `docs/logic`, `docs/domains`, `docs/errors`.\n- Primary parent doc: [Архитектура Telegram Notify App](./architecture/telegram-notify-app-overview.md).\n- Navigation: "
|
||||
},
|
||||
{
|
||||
"layer": "D0_DOC_CHUNKS",
|
||||
"path": "docs/architecture/telegram-notify-app-overview.md",
|
||||
"title": "architecture.telegram_notify_app:Интеграционные сценарии",
|
||||
"document_id": "architecture.telegram_notify_app",
|
||||
"entity_name": "",
|
||||
"summary_text": "",
|
||||
"section_path": "Архитектура Telegram Notify App > Details > Интеграционные сценарии",
|
||||
"content_preview": "1. При старте `main()` загружает YAML-конфиг, извлекает host, port и интервал отправки, затем собирает runtime.\n2. `RuntimeManager` регистрирует `TelegramControlChannel` для HTTP control plane.\n3. `TelegramNotifyModule` добавляет `TelegramNotifyWorker` и `TelegramSendService` в runtime.\n4. Внешний клиент вызывает endpoint'ы control plane для health-check, lifecycle-операций или ручной отправки.\n5."
|
||||
}
|
||||
],
|
||||
"sources": {
|
||||
"seeded": [],
|
||||
"metadata_lookup": [],
|
||||
"semantic": [
|
||||
{
|
||||
"layer": "D1_DOCUMENT_CATALOG",
|
||||
"path": "docs/architecture/telegram-notify-app-overview.md",
|
||||
"title": "Архитектура Telegram Notify App",
|
||||
"document_id": "architecture.telegram_notify_app",
|
||||
"entity_name": "",
|
||||
"summary_text": "- Purpose: сервис поднимает HTTP control plane и фоновый worker для отправки уведомлений в Telegram.\n- Entry point: `src/telegram_notify_app/main.py`.\n- Main components: `RuntimeManager`, `TelegramControlChannel`, `TelegramNotifyModule`, `TelegramNotifyWorker`, `TelegramSendService`.\n- Configuration: `config/config.yaml` или путь из `CONFIG_PATH`.\n- Related API: [`/health`](../api/health-endpoint.",
|
||||
"section_path": "",
|
||||
"content_preview": "- Purpose: сервис поднимает HTTP control plane и фоновый worker для отправки уведомлений в Telegram.\n- Entry point: `src/telegram_notify_app/main.py`.\n- Main components: `RuntimeManager`, `TelegramControlChannel`, `TelegramNotifyModule`, `TelegramNotifyWorker`, `TelegramSendService`.\n- Configuration: `config/config.yaml` или путь из `CONFIG_PATH`.\n- Related API: [`/health`](../api/health-endpoint."
|
||||
},
|
||||
{
|
||||
"layer": "D1_DOCUMENT_CATALOG",
|
||||
"path": "docs/README.md",
|
||||
"title": "Индекс технической документации test_echo_app",
|
||||
"document_id": "index.test_echo_app_docs",
|
||||
"entity_name": "",
|
||||
"summary_text": "- Purpose: точка входа в техническую документацию сервиса `test_echo_app`.\n- Scope: архитектура, HTTP API control plane, цикл отправки уведомлений, health-модель и каталог ошибок.\n- Canonical structure: `docs/architecture`, `docs/api`, `docs/logic`, `docs/domains`, `docs/errors`.\n- Primary parent doc: [Архитектура Telegram Notify App](./architecture/telegram-notify-app-overview.md).\n- Navigation: ",
|
||||
"section_path": "",
|
||||
"content_preview": "- Purpose: точка входа в техническую документацию сервиса `test_echo_app`.\n- Scope: архитектура, HTTP API control plane, цикл отправки уведомлений, health-модель и каталог ошибок.\n- Canonical structure: `docs/architecture`, `docs/api`, `docs/logic`, `docs/domains`, `docs/errors`.\n- Primary parent doc: [Архитектура Telegram Notify App](./architecture/telegram-notify-app-overview.md).\n- Navigation: "
|
||||
},
|
||||
{
|
||||
"layer": "D0_DOC_CHUNKS",
|
||||
"path": "docs/architecture/telegram-notify-app-overview.md",
|
||||
"title": "architecture.telegram_notify_app:Операторские и мониторинговые клиенты",
|
||||
"document_id": "architecture.telegram_notify_app",
|
||||
"entity_name": "",
|
||||
"summary_text": "",
|
||||
"section_path": "Архитектура Telegram Notify App > Details > Интеграции > Операторские и мониторинговые клиенты",
|
||||
"content_preview": "- target: ext.operator_and_probes\n- target_type: external_system\n- direction: inbound\n- interaction: calls\n- via: HTTP `/health`, `/actions/{action}`, `/send`\n- purpose: диагностика, lifecycle-управление и ручная отправка сообщений\n- details:\n - transport: FastAPI + UvicornThreadRunner\n - status_mapping: non-ok health -> HTTP 503"
|
||||
},
|
||||
{
|
||||
"layer": "D0_DOC_CHUNKS",
|
||||
"path": "docs/architecture/telegram-notify-app-overview.md",
|
||||
"title": "architecture.telegram_notify_app:Связанные документы",
|
||||
"document_id": "architecture.telegram_notify_app",
|
||||
"entity_name": "",
|
||||
"summary_text": "",
|
||||
"section_path": "Архитектура Telegram Notify App > Details > Связанные документы",
|
||||
"content_preview": "- [API /health](../api/health-endpoint.md)\n- [API /actions/{action}](../api/control-actions-endpoint.md)\n- [API /send](../api/send-message-endpoint.md)\n- [Логика цикла отправки уведомлений](../logic/telegram-notification-loop.md)\n- [Доменная модель runtime health](../domains/runtime-health-entity.md)"
|
||||
},
|
||||
{
|
||||
"layer": "D0_DOC_CHUNKS",
|
||||
"path": "docs/README.md",
|
||||
"title": "index.test_echo_app_docs:Навигация",
|
||||
"document_id": "index.test_echo_app_docs",
|
||||
"entity_name": "",
|
||||
"summary_text": "",
|
||||
"section_path": "Индекс технической документации test_echo_app > Details > Навигация",
|
||||
"content_preview": "- [Архитектура Telegram Notify App](./architecture/telegram-notify-app-overview.md)\n- [API /health](./api/health-endpoint.md)\n- [API /actions/{action}](./api/control-actions-endpoint.md)\n- [API /send](./api/send-message-endpoint.md)\n- [Логика цикла отправки уведомлений](./logic/telegram-notification-loop.md)\n- [Доменная модель runtime health](./domains/runtime-health-entity.md)\n- [Каталог ошибок]("
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## process.v2.pipeline
|
||||
```json
|
||||
{
|
||||
"event": "retrieval_executed",
|
||||
"query": "Как работает метод health?",
|
||||
"profile": "docs_summary_generic",
|
||||
"row_count": 8,
|
||||
"target_doc_hints": [],
|
||||
"top_results": [
|
||||
{
|
||||
"layer": "D1_DOCUMENT_CATALOG",
|
||||
"path": "docs/architecture/telegram-notify-app-overview.md",
|
||||
"title": "Архитектура Telegram Notify App",
|
||||
"document_id": "architecture.telegram_notify_app",
|
||||
"entity_name": "",
|
||||
"summary_text": "- Purpose: сервис поднимает HTTP control plane и фоновый worker для отправки уведомлений в Telegram.\n- Entry point: `src/telegram_notify_app/main.py`.\n- Main components: `RuntimeManager`, `TelegramControlChannel`, `TelegramNotifyModule`, `TelegramNotifyWorker`, `TelegramSendService`.\n- Configuration: `config/config.yaml` или путь из `CONFIG_PATH`.\n- Related API: [`/health`](../api/health-endpoint.",
|
||||
"section_path": "",
|
||||
"content_preview": "- Purpose: сервис поднимает HTTP control plane и фоновый worker для отправки уведомлений в Telegram.\n- Entry point: `src/telegram_notify_app/main.py`.\n- Main components: `RuntimeManager`, `TelegramControlChannel`, `TelegramNotifyModule`, `TelegramNotifyWorker`, `TelegramSendService`.\n- Configuration: `config/config.yaml` или путь из `CONFIG_PATH`.\n- Related API: [`/health`](../api/health-endpoint."
|
||||
},
|
||||
{
|
||||
"layer": "D1_DOCUMENT_CATALOG",
|
||||
"path": "docs/README.md",
|
||||
"title": "Индекс технической документации test_echo_app",
|
||||
"document_id": "index.test_echo_app_docs",
|
||||
"entity_name": "",
|
||||
"summary_text": "- Purpose: точка входа в техническую документацию сервиса `test_echo_app`.\n- Scope: архитектура, HTTP API control plane, цикл отправки уведомлений, health-модель и каталог ошибок.\n- Canonical structure: `docs/architecture`, `docs/api`, `docs/logic`, `docs/domains`, `docs/errors`.\n- Primary parent doc: [Архитектура Telegram Notify App](./architecture/telegram-notify-app-overview.md).\n- Navigation: ",
|
||||
"section_path": "",
|
||||
"content_preview": "- Purpose: точка входа в техническую документацию сервиса `test_echo_app`.\n- Scope: архитектура, HTTP API control plane, цикл отправки уведомлений, health-модель и каталог ошибок.\n- Canonical structure: `docs/architecture`, `docs/api`, `docs/logic`, `docs/domains`, `docs/errors`.\n- Primary parent doc: [Архитектура Telegram Notify App](./architecture/telegram-notify-app-overview.md).\n- Navigation: "
|
||||
},
|
||||
{
|
||||
"layer": "D0_DOC_CHUNKS",
|
||||
"path": "docs/architecture/telegram-notify-app-overview.md",
|
||||
"title": "architecture.telegram_notify_app:Операторские и мониторинговые клиенты",
|
||||
"document_id": "architecture.telegram_notify_app",
|
||||
"entity_name": "",
|
||||
"summary_text": "",
|
||||
"section_path": "Архитектура Telegram Notify App > Details > Интеграции > Операторские и мониторинговые клиенты",
|
||||
"content_preview": "- target: ext.operator_and_probes\n- target_type: external_system\n- direction: inbound\n- interaction: calls\n- via: HTTP `/health`, `/actions/{action}`, `/send`\n- purpose: диагностика, lifecycle-управление и ручная отправка сообщений\n- details:\n - transport: FastAPI + UvicornThreadRunner\n - status_mapping: non-ok health -> HTTP 503"
|
||||
},
|
||||
{
|
||||
"layer": "D0_DOC_CHUNKS",
|
||||
"path": "docs/architecture/telegram-notify-app-overview.md",
|
||||
"title": "architecture.telegram_notify_app:Связанные документы",
|
||||
"document_id": "architecture.telegram_notify_app",
|
||||
"entity_name": "",
|
||||
"summary_text": "",
|
||||
"section_path": "Архитектура Telegram Notify App > Details > Связанные документы",
|
||||
"content_preview": "- [API /health](../api/health-endpoint.md)\n- [API /actions/{action}](../api/control-actions-endpoint.md)\n- [API /send](../api/send-message-endpoint.md)\n- [Логика цикла отправки уведомлений](../logic/telegram-notification-loop.md)\n- [Доменная модель runtime health](../domains/runtime-health-entity.md)"
|
||||
},
|
||||
{
|
||||
"layer": "D0_DOC_CHUNKS",
|
||||
"path": "docs/README.md",
|
||||
"title": "index.test_echo_app_docs:Навигация",
|
||||
"document_id": "index.test_echo_app_docs",
|
||||
"entity_name": "",
|
||||
"summary_text": "",
|
||||
"section_path": "Индекс технической документации test_echo_app > Details > Навигация",
|
||||
"content_preview": "- [Архитектура Telegram Notify App](./architecture/telegram-notify-app-overview.md)\n- [API /health](./api/health-endpoint.md)\n- [API /actions/{action}](./api/control-actions-endpoint.md)\n- [API /send](./api/send-message-endpoint.md)\n- [Логика цикла отправки уведомлений](./logic/telegram-notification-loop.md)\n- [Доменная модель runtime health](./domains/runtime-health-entity.md)\n- [Каталог ошибок]("
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## process.v2.evidence

```json
{
  "event": "evidence_assembled",
  "mode": "summary",
  "document_count": 2,
  "documents": [
    "docs/README.md",
    "docs/architecture/telegram-notify-app-overview.md"
  ]
}
```

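The `candidate_generation` event lists eight candidate paths, while `evidence_assembled` reports only two documents. A minimal sketch of that collapse follows, assuming the evidence documents are simply the distinct row paths in sorted order. This is inferred from this single log, not from the assembler's code, and `distinct_documents` is a hypothetical helper.

```python
def distinct_documents(paths: list[str]) -> list[str]:
    """Collapse per-row paths into a sorted list of unique document paths.

    Assumption: evidence documents are the distinct row paths, sorted.
    """
    return sorted(set(paths))


# Paths copied from `candidates_before_ranking` in the log.
candidates_before_ranking = [
    "docs/architecture/telegram-notify-app-overview.md",
    "docs/README.md",
    "docs/architecture/telegram-notify-app-overview.md",
    "docs/architecture/telegram-notify-app-overview.md",
    "docs/README.md",
    "docs/architecture/telegram-notify-app-overview.md",
    "docs/README.md",
    "docs/architecture/telegram-notify-app-overview.md",
]

documents = distinct_documents(candidates_before_ranking)
print(len(documents), documents)
```

Under this assumption, all eight rows come from the index and the architecture overview; no row from `docs/api/` reaches the evidence stage in this run.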
## process.v2.pipeline

```json
{
  "event": "evidence_assembled",
  "mode": "summary",
  "primary_doc": "docs/README.md",
  "document_count": 2
}
```

## process.v2.pipeline

```json
{
  "event": "ranking_explained",
  "doc": "docs/README.md",
  "score_breakdown": {
    "semantic": 20,
    "path_match": 0,
    "filename_match": 0,
    "alias_match": 0,
    "anchor_boost": 0,
    "target_doc_boost": 0,
    "generic_penalty": 0
  },
  "score": 20,
  "match_reason": "semantic_match"
}
```

## process.v2.pipeline

```json
{
  "event": "ranking_explained",
  "doc": "docs/architecture/telegram-notify-app-overview.md",
  "score_breakdown": {
    "semantic": 20,
    "path_match": 0,
    "filename_match": 0,
    "alias_match": 0,
    "anchor_boost": 0,
    "target_doc_boost": 0,
    "generic_penalty": 0
  },
  "score": 20,
  "match_reason": "semantic_match"
}
```

## process.v2.pipeline

```json
{
  "event": "ranking_explained",
  "top_docs_after_ranking": [
    {
      "doc": "docs/README.md",
      "score": 20,
      "match_reason": "semantic_match"
    },
    {
      "doc": "docs/architecture/telegram-notify-app-overview.md",
      "score": 20,
      "match_reason": "semantic_match"
    }
  ],
  "ranking_score_breakdown": [
    {
      "doc": "docs/README.md",
      "score_breakdown": {
        "semantic": 20,
        "path_match": 0,
        "filename_match": 0,
        "alias_match": 0,
        "anchor_boost": 0,
        "target_doc_boost": 0,
        "generic_penalty": 0
      }
    },
    {
      "doc": "docs/architecture/telegram-notify-app-overview.md",
      "score_breakdown": {
        "semantic": 20,
        "path_match": 0,
        "filename_match": 0,
        "alias_match": 0,
        "anchor_boost": 0,
        "target_doc_boost": 0,
        "generic_penalty": 0
      }
    }
  ]
}
```

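Both documents score 20, so the final order is decided by the tie-break. A minimal sketch follows, using the sort key from `_rank_rows` in this commit's `DocsEvidenceAssembler` diff, `(-item["score"], item["path"])`. It assumes the total score is the plain sum of the breakdown components; `total_score` is a hypothetical helper, and the real scoring may treat `generic_penalty` differently.

```python
def total_score(breakdown: dict[str, int]) -> int:
    # Assumption: the final score is the plain sum of the components.
    return sum(breakdown.values())


# Breakdown copied from the `ranking_explained` events (identical for both docs).
breakdown = {
    "semantic": 20,
    "path_match": 0,
    "filename_match": 0,
    "alias_match": 0,
    "anchor_boost": 0,
    "target_doc_boost": 0,
    "generic_penalty": 0,
}

ranked = [
    {"path": "docs/architecture/telegram-notify-app-overview.md", "score": total_score(breakdown)},
    {"path": "docs/README.md", "score": total_score(breakdown)},
]

# Same key as in the diff: higher score first, then path in ascending order.
ranked.sort(key=lambda item: (-item["score"], item["path"]))

primary_doc = ranked[0]["path"]
print(primary_doc)
```

With equal scores the comparison falls through to the path string, and uppercase `R` sorts before lowercase `a`, so the generic index wins the tie. That is consistent with `primary_doc` being `docs/README.md` in the log.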
## process.v2.pipeline

```json
{
  "event": "evidence_gate_checked",
  "passed": true,
  "reason": "target_doc_found",
  "answer_mode": "grounded_summary"
}
```

## workflow.v2.summary

```json
{
  "event": "workflow_started",
  "workflow_id": "v2.docs_explain.summary"
}
```

## workflow.v2.summary.llm

```json
{
  "event": "request",
  "prompt_name": "v2_docs_explain.summary_answer",
  "system_prompt": "Ты объясняешь документацию только на основе найденных SUMMARY-блоков.\nИспользуй только факты из входного контекста.\nЕсли информации мало, прямо скажи об этом и не додумывай детали.\nВ конце перечисли файлы, на которые ты опирался.",
  "user_prompt": "Запрос пользователя:\nКак работает метод health?\n\nСигналы запроса:\n{\n \"entity_names\": [],\n \"file_names\": [],\n \"endpoint_paths\": [],\n \"target_doc_hints\": [],\n \"matched_aliases\": [],\n \"process_domain\": null,\n \"process_subdomain\": null,\n \"signal_types\": []\n}\n\nНайденные SUMMARY-блоки:\n\n1. path: docs/README.md\ntitle: Индекс технической документации test_echo_app\nmatch_reason: semantic_match\nsummary: - Purpose: точка входа в техническую документацию сервиса `test_echo_app`.\n- Scope: архитектура, HTTP API control plane, цикл отправки уведомлений, health-модель и каталог ошибок.\n- Canonical structure: `docs/architecture`, `docs/api`, `docs/logic`, `docs/domains`, `docs/errors`.\n- Primary parent doc: [Архитектура Telegram Notify App](./architecture/telegram-notify-app-overview.md).\n- Navigation: документы связаны через `related_docs`, `parent`/`children` и markdown-ссылки без дублирования деталей.\n\n2. path: docs/architecture/telegram-notify-app-overview.md\ntitle: Архитектура Telegram Notify App\nmatch_reason: semantic_match\nsummary: - Purpose: сервис поднимает HTTP control plane и фоновый worker для отправки уведомлений в Telegram.\n- Entry point: `src/telegram_notify_app/main.py`.\n- Main components: `RuntimeManager`, `TelegramControlChannel`, `TelegramNotifyModule`, `TelegramNotifyWorker`, `TelegramSendService`.\n- Configuration: `config/config.yaml` или путь из `CONFIG_PATH`.\n- Related API: [`/health`](../api/health-endpoint.md), [`/actions/{action}`](../api/control-actions-endpoint.md), [`/send`](../api/send-message-endpoint.md).\n- Related logic: [цикл отправки уведомлений](../logic/telegram-notification-loop.md).\n- Related domain: [runtime health](../domains/runtime-health-entity.md).",
  "log_context": "agent:req_bab9c8812ac94847bb102cba68516f10"
}
```

## workflow.v2.summary.llm

```json
{
  "event": "response",
  "text": "На основе представленного контекста невозможно предоставить подробное объяснение работы метода health. \n\nФайлы, на которые я опирался:\n1. docs/README.md\n2. docs/architecture/telegram-notify-app-overview.md"
}
```

## workflow.v2.summary

```json
{
  "event": "workflow_trace_flushed",
  "workflow_id": "v2.docs_explain.summary",
  "steps": [
    {
      "step_id": "generate_summary_answer",
      "title": "Сборка ответа по summary",
      "input": {},
      "output": {
        "answer_length": 205
      }
    }
  ]
}
```

## workflow.v2.summary

```json
{
  "event": "workflow_completed",
  "workflow_id": "v2.docs_explain.summary"
}
```

## process.v2.pipeline

```json
{
  "event": "answer_generated",
  "answer_mode": "grounded_summary",
  "answer_length": 205
}
```

## result

```json
{
  "status": "done",
  "answer": "На основе представленного контекста невозможно предоставить подробное объяснение работы метода health. \n\nФайлы, на которые я опирался:\n1. docs/README.md\n2. docs/architecture/telegram-notify-app-overview.md",
  "completed_at": "2026-04-07T18:21:01.793612+00:00"
}
```

File diff suppressed because it is too large
@@ -4,17 +4,17 @@ from app.core.agent.processes.v2.models import V2AnchorType, V2RouteAnchors, V2R
 
 
 def anchor_signal_types(route: V2RouteResult) -> set[str]:
-    hints = [str(item).strip().lower() for item in route.anchors.target_doc_hints if str(item or "").strip()]
+    texts = _signal_texts(route)
     signals: set[str] = set()
     if route.subintent == V2Subintent.FIND_FILES:
         signals.add(V2AnchorType.FIND_FILES)
-    if route.anchors.endpoint_paths or _has_hint(hints, "/api/"):
+    if route.anchors.endpoint_paths or _has_any(texts, ("/api/", "api", "endpoint")):
         signals.add(V2AnchorType.API_ENDPOINT)
-    if _has_hint(hints, "/architecture/"):
+    if _has_any(texts, ("/architecture/", "architecture", "arch")):
         signals.add(V2AnchorType.ARCHITECTURE)
-    if _has_hint(hints, "/logic/"):
+    if _has_any(texts, ("/logic/", "logic", "workflow", "flow", "process")):
         signals.add(V2AnchorType.LOGIC_FLOW)
-    if _has_hint(hints, "/domains/"):
+    if route.anchors.entity_names or _has_any(texts, ("/domains/", "domain", "entity", "component")):
         signals.add(V2AnchorType.DOMAIN_ENTITY)
     return signals
 
@@ -44,5 +44,14 @@ def anchors_have_signal(anchors: V2RouteAnchors, signal: str, *, subintent: str
|
||||
return signal in anchor_signal_types(route)
|
||||
|
||||
|
||||
def _has_hint(hints: list[str], marker: str) -> bool:
|
||||
return any(marker in hint for hint in hints)
|
||||
def _signal_texts(route: V2RouteResult) -> list[str]:
|
||||
items = [
|
||||
*route.anchors.target_doc_hints,
|
||||
*route.anchors.file_names,
|
||||
*route.anchors.matched_aliases,
|
||||
]
|
||||
return [str(item).strip().lower() for item in items if str(item or "").strip()]
|
||||
|
||||
|
||||
def _has_any(items: list[str], markers: tuple[str, ...]) -> bool:
|
||||
return any(marker in item for item in items for marker in markers)
|
||||
|
||||
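The signal detection in this hunk moves from path-only hints to loose substring checks over several anchor fields. A minimal sketch of that check, assuming hypothetical sample strings (the standalone `has_any` mirrors `_has_any` from the diff; the inputs are not from the repository):

```python
def has_any(items: list[str], markers: tuple[str, ...]) -> bool:
    """Return True if any marker occurs as a substring of any item."""
    return any(marker in item for item in items for marker in markers)


# Hypothetical lowercased signal texts, similar to what _signal_texts returns.
texts = ["docs/api/health-endpoint.md", "runtime health"]

api_signal = has_any(texts, ("/api/", "api", "endpoint"))
logic_signal = has_any(texts, ("/logic/", "logic", "workflow", "flow", "process"))

# Substring matching is loose: the marker "arch" also matches inside "search".
loose_match = has_any(["search-index.md"], ("arch",))
```

Because the match is a plain substring test, short markers such as `"api"` or `"arch"` may fire on unrelated words, which is worth keeping in mind when reading the boosts that depend on these signals.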
@@ -11,6 +11,8 @@ from app.core.rag.contracts.enums import RagLayer
|
||||
|
||||
|
||||
class DocsEvidenceAssembler:
|
||||
_API_PATH_PREFIXES = ("docs/api/", "docs/endpoints/", "docs/methods/", "api/", "endpoints/", "methods/")
|
||||
_GENERIC_DOC_MARKERS = ("readme", "overview", "index", "navigation", "related docs", "catalog")
|
||||
def assemble_summaries(self, rows: list[dict], route: V2RouteResult) -> list[RetrievedSummary]:
|
||||
items = self._rank_rows(rows, route, mode="summary")
|
||||
ranked = [
|
||||
@@ -71,10 +73,12 @@ class DocsEvidenceAssembler:
|
||||
"score": score,
|
||||
"score_breakdown": breakdown,
|
||||
"match_reason": self._match_reason(breakdown),
|
||||
"is_generic_doc": self._is_generic_doc(path, self._title(row, path), self._summary(row), row),
|
||||
}
|
||||
)
|
||||
ranked.sort(key=lambda item: (-item["score"], item["path"]))
|
||||
return self._ensure_target_docs_in_top_k(ranked, route, k=4 if mode == "find_files" else 3)
|
||||
ranked = self._ensure_target_docs_in_top_k(ranked, route, k=4 if mode == "find_files" else 3)
|
||||
return self._promote_specific_primary(ranked, route)
|
||||
|
||||
def _score_breakdown(self, row: dict, route: V2RouteResult, *, mode: str) -> dict[str, int]:
|
||||
path_raw = self._path(row)
|
||||
@@ -93,6 +97,7 @@ class DocsEvidenceAssembler:
|
||||
"alias_match": 0,
|
||||
"anchor_boost": 0,
|
||||
"target_doc_boost": 0,
|
||||
"specificity_boost": 0,
|
||||
"generic_penalty": 0,
|
||||
}
|
||||
if route.intent == "GENERAL_QA":
|
||||
@@ -100,6 +105,7 @@ class DocsEvidenceAssembler:
|
||||
hint_norm_lower = {normalize_doc_path(h).lower() for h in route.anchors.target_doc_hints if str(h or "").strip()}
|
||||
if normalize_doc_path(path_raw).lower() in hint_norm_lower:
|
||||
breakdown["target_doc_boost"] += 1000
|
||||
hint_texts = [str(hint or "").strip().lower() for hint in route.anchors.target_doc_hints if str(hint or "").strip()]
|
||||
if any(alias.lower() in " ".join([path, title, summary, entity]) for alias in route.anchors.matched_aliases):
|
||||
breakdown["alias_match"] += 500
|
||||
for token in query_tokens:
|
||||
@@ -111,10 +117,25 @@ class DocsEvidenceAssembler:
|
||||
breakdown["semantic"] += 20
|
||||
if self._compact(token) in compact_haystack:
|
||||
breakdown["alias_match"] += 250
|
||||
for hint in hint_texts:
|
||||
compact_hint = self._compact(hint)
|
||||
if compact_hint and compact_hint in compact_haystack:
|
||||
breakdown["target_doc_boost"] += 180
|
||||
elif hint and hint.strip("/") in " ".join([path, title, summary, entity]):
|
||||
breakdown["semantic"] += 70
|
||||
endpoint_text = self._summary(row).lower()
|
||||
for endpoint in route.anchors.endpoint_paths:
|
||||
normalized_endpoint = endpoint.strip().lower()
|
||||
endpoint_slug = normalized_endpoint.strip("/")
|
||||
if normalized_endpoint and normalized_endpoint in endpoint_text:
|
||||
breakdown["target_doc_boost"] += 260
|
||||
if endpoint_slug and endpoint_slug in filename:
|
||||
breakdown["filename_match"] += 200
|
||||
if any(endpoint.strip("/").lower() in filename for endpoint in route.anchors.endpoint_paths):
|
||||
breakdown["filename_match"] += 200
|
||||
signals = anchor_signal_types(route)
|
||||
breakdown["anchor_boost"] += self._anchor_boost(path, signals)
|
||||
breakdown["specificity_boost"] += self._specificity_boost(row, path, title, summary, route)
|
||||
breakdown["generic_penalty"] += self._generic_penalty(path, signals)
|
||||
if mode == "find_files":
|
||||
breakdown["path_match"] *= 3
|
||||
@@ -125,8 +146,8 @@ class DocsEvidenceAssembler:
|
||||
|
||||
def _anchor_boost(self, path: str, signals: set[str]) -> int:
|
||||
boost = 0
|
||||
if V2AnchorType.API_ENDPOINT in signals and path.startswith("docs/api/"):
|
||||
boost += 300
|
||||
if V2AnchorType.API_ENDPOINT in signals and path.startswith(self._API_PATH_PREFIXES):
|
||||
boost += 360
|
||||
if V2AnchorType.LOGIC_FLOW in signals and path.startswith("docs/logic/"):
|
||||
boost += 300
|
||||
if V2AnchorType.DOMAIN_ENTITY in signals and path.startswith("docs/domains/"):
|
||||
@@ -139,8 +160,11 @@ class DocsEvidenceAssembler:
|
||||
|
||||
def _generic_penalty(self, path: str, signals: set[str]) -> int:
|
||||
penalty = 0
|
||||
lowered = path.lower()
|
||||
if path == "docs/README.md" and V2AnchorType.ARCHITECTURE not in signals:
|
||||
penalty -= 200
|
||||
penalty -= 260
|
||||
if any(marker in lowered for marker in ("/readme", "readme.md", "/index", "/overview", "/catalog", "/navigation")):
|
||||
penalty -= 220
|
||||
if "/architecture/" in path and V2AnchorType.ARCHITECTURE not in signals and signals.intersection(
|
||||
{V2AnchorType.API_ENDPOINT, V2AnchorType.DOMAIN_ENTITY}
|
||||
):
|
||||
@@ -173,6 +197,17 @@ class DocsEvidenceAssembler:
|
||||
top.sort(key=lambda item: (-item["score"], item["path"]))
|
||||
return top + remaining
|
||||
|
||||
def _promote_specific_primary(self, ranked: list[dict], route: V2RouteResult) -> list[dict]:
|
||||
if len(ranked) < 2:
|
||||
return ranked
|
||||
first = ranked[0]
|
||||
if not first.get("is_generic_doc"):
|
||||
return ranked
|
||||
promoted = next((item for item in ranked[1:] if not item.get("is_generic_doc") and self._is_specific_candidate(item, route)), None)
|
||||
if promoted is None:
|
||||
return ranked
|
||||
return [promoted] + [item for item in ranked if item["path"] != promoted["path"]]
|
||||
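The promotion step above can be sketched in isolation. This is a simplified illustration, assuming ranked items are plain dicts and replacing `_is_specific_candidate` with a hypothetical `is_specific` callback; the sample paths and scores are invented:

```python
from typing import Callable


def promote_specific_primary(ranked: list[dict], is_specific: Callable[[dict], bool]) -> list[dict]:
    """Move the first specific, non-generic item to the top when a generic doc leads."""
    if len(ranked) < 2 or not ranked[0].get("is_generic_doc"):
        return ranked
    promoted = next(
        (item for item in ranked[1:] if not item.get("is_generic_doc") and is_specific(item)),
        None,
    )
    if promoted is None:
        return ranked
    # Keep the relative order of all other items unchanged.
    return [promoted] + [item for item in ranked if item["path"] != promoted["path"]]


# Hypothetical ranking where two generic docs outrank a specific endpoint doc.
ranked = [
    {"path": "docs/README.md", "is_generic_doc": True, "score_breakdown": {"specificity_boost": 0}},
    {"path": "docs/architecture/overview.md", "is_generic_doc": True, "score_breakdown": {"specificity_boost": 0}},
    {"path": "docs/api/health-endpoint.md", "is_generic_doc": False, "score_breakdown": {"specificity_boost": 250}},
]
result = promote_specific_primary(
    ranked, lambda item: item["score_breakdown"].get("specificity_boost", 0) >= 160
)
result_paths = [item["path"] for item in result]
```

Only the first position changes; the generic documents stay in the list, so downstream consumers still see them as secondary evidence.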
|
||||
def _match_reason(self, breakdown: dict[str, int]) -> str:
|
||||
if breakdown["target_doc_boost"] > 0:
|
||||
return "exact_path"
|
||||
@@ -189,6 +224,53 @@ class DocsEvidenceAssembler:
|
||||
section = str(metadata.get("section_path") or "").lower()
|
||||
return "summary" in section or "свод" in section or "overview" in section
|
||||
|
||||
def _specificity_boost(self, row: dict, path: str, title: str, summary: str, route: V2RouteResult) -> int:
|
||||
boost = 0
|
||||
filename = path.split("/")[-1]
|
||||
lowered_title = title.lower()
|
||||
lowered_summary = summary.lower()
|
||||
if not self._is_generic_doc(path, title, summary, row):
|
||||
boost += 90
|
||||
if path.startswith(self._API_PATH_PREFIXES):
|
||||
boost += 160
|
||||
if "endpoint" in filename or "endpoint" in lowered_title or "method" in lowered_title:
|
||||
boost += 120
|
||||
if row.get("layer") == RagLayer.DOCS_DOC_CHUNKS and not self._looks_like_navigation_chunk(row):
|
||||
boost += 80
|
||||
for token in self._query_tokens(route):
|
||||
if token and token in filename:
|
||||
boost += 90
|
||||
if token and token in lowered_title:
|
||||
boost += 70
|
||||
if token and token in lowered_summary:
|
||||
boost += 40
|
||||
return boost
|
||||
|
||||
def _is_specific_candidate(self, item: dict, route: V2RouteResult) -> bool:
|
||||
breakdown = dict(item.get("score_breakdown") or {})
|
||||
if breakdown.get("target_doc_boost", 0) > 0:
|
||||
return True
|
||||
if breakdown.get("specificity_boost", 0) >= 160:
|
||||
return True
|
||||
return V2AnchorType.API_ENDPOINT in anchor_signal_types(route) and item["path"].startswith(self._API_PATH_PREFIXES)
|
||||
|
||||
def _is_generic_doc(self, path: str, title: str, summary: str, row: dict) -> bool:
|
||||
haystack = " ".join([path.lower(), title.lower(), summary.lower()])
|
||||
if any(marker in haystack for marker in self._GENERIC_DOC_MARKERS):
|
||||
return True
|
||||
return self._looks_like_navigation_chunk(row)
|
||||
|
||||
def _looks_like_navigation_chunk(self, row: dict) -> bool:
|
||||
text = self._summary(row).lower()
|
||||
if not text:
|
||||
return False
|
||||
lines = [line.strip() for line in text.splitlines() if line.strip()]
|
||||
bullet_lines = sum(1 for line in lines if line.startswith(("- ", "* ", "1.", "2.", "3.")))
|
||||
link_lines = sum(1 for line in lines if "](" in line or line.startswith("docs/"))
|
||||
if "related docs" in text or "navigation" in text:
|
||||
return True
|
||||
return bullet_lines >= 3 or link_lines >= 3
|
||||
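A minimal sketch of the navigation-chunk heuristic above, rewritten as a standalone function that takes the chunk text directly (the original reads it via `self._summary(row)`); the sample texts are hypothetical:

```python
def looks_like_navigation_chunk(text: str) -> bool:
    """Heuristic: a chunk that is mostly bullets or links is treated as navigation."""
    lowered = text.lower()
    if not lowered:
        return False
    lines = [line.strip() for line in lowered.splitlines() if line.strip()]
    bullet_lines = sum(1 for line in lines if line.startswith(("- ", "* ", "1.", "2.", "3.")))
    link_lines = sum(1 for line in lines if "](" in line or line.startswith("docs/"))
    if "related docs" in lowered or "navigation" in lowered:
        return True
    return bullet_lines >= 3 or link_lines >= 3


# Hypothetical chunk bodies for illustration.
nav_chunk = (
    "- [Health](docs/api/health.md)\n"
    "- [Send](docs/api/send.md)\n"
    "- [Overview](docs/architecture/overview.md)"
)
prose_chunk = "The health endpoint returns the runtime status.\nIt is polled by the monitor."

nav_result = looks_like_navigation_chunk(nav_chunk)
prose_result = looks_like_navigation_chunk(prose_chunk)
```

Note that the word "navigation" anywhere in the text is enough to classify a chunk as navigation, so a prose paragraph that merely mentions navigation would also be flagged.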
|
||||
def _query_tokens(self, route: V2RouteResult) -> list[str]:
|
||||
values = list(route.target_terms) + list(route.anchors.matched_aliases)
|
||||
tokens: list[str] = []
|
||||
|
||||
@@ -8,6 +8,7 @@ class QueryFeatures:
|
||||
normalized_query: str
|
||||
target_terms: list[str]
|
||||
endpoint_paths: list[str]
|
||||
file_names: list[str]
|
||||
matched_aliases: list[str]
|
||||
target_doc_hints: list[str]
|
||||
file_markers: list[str]
|
||||
|
||||
@@ -34,10 +34,42 @@ class _MarkerScanner:
|
||||
"где описано",
|
||||
"документ с описанием",
|
||||
)
|
||||
_ARCHITECTURE_MARKERS = ("архитектура", "как устроено приложение", "как устроен сервис", "основные части системы", "из чего состоит")
|
||||
_LOGIC_MARKERS = ("цикл", "loop", "worker", "как работает отправка уведомлений", "логика отправки", "background job", "runtime loop")
|
||||
_ARCHITECTURE_MARKERS = (
|
||||
"архитектура",
|
||||
"архитектур",
|
||||
"architecture",
|
||||
"arch overview",
|
||||
"как устроено приложение",
|
||||
"как устроен сервис",
|
||||
"основные части системы",
|
||||
"из чего состоит",
|
||||
)
|
||||
_LOGIC_MARKERS = (
|
||||
"цикл",
|
||||
"loop",
|
||||
"flow",
|
||||
"workflow",
|
||||
"process",
|
||||
"worker",
|
||||
"как работает отправка уведомлений",
|
||||
"логика отправки",
|
||||
"background job",
|
||||
"runtime loop",
|
||||
)
|
||||
_DOMAIN_MARKERS = ("runtime health", "health model", "статусы здоровья", "сущность", "entity", "здоровье runtime")
|
||||
_ENDPOINT_MARKERS = ("endpoint", "метод api", "ручка", "эндпоинт")
|
||||
_ENDPOINT_MARKERS = (
|
||||
"endpoint",
|
||||
"api",
|
||||
"route",
|
||||
"method",
|
||||
"метод api",
|
||||
"метод",
|
||||
"метода",
|
||||
"ручка",
|
||||
"эндпоинт",
|
||||
"маршрут",
|
||||
"роут",
|
||||
)
|
||||
|
||||
def scan(self, lowered_query: str) -> dict[str, list[str]]:
|
||||
return {
|
||||
@@ -54,12 +86,13 @@ class _MarkerScanner:
|
||||
|
||||
class _EntityNameExtractor:
|
||||
_ENTITY_RE = re.compile(r"\b[A-Z][A-Za-z0-9_]+\b")
|
||||
_IGNORE = {"arch"}
|
||||
|
||||
def extract(self, query: str) -> list[str]:
|
||||
items: list[str] = []
|
||||
for match in self._ENTITY_RE.finditer(query):
|
||||
candidate = match.group(0).strip()
|
||||
if candidate and candidate not in items:
|
||||
if candidate and candidate.lower() not in self._IGNORE and candidate not in items:
|
||||
items.append(candidate)
|
||||
return items
|
||||
|
||||
@@ -92,33 +125,61 @@ class _FileNameExtractor:
|
||||
items.append(value)
|
||||
|
||||
|
||||
class _ProcessAnchorExtractor:
|
||||
_DOMAIN_KEYWORDS = {
|
||||
"billing": "billing",
|
||||
"notifications": "notifications",
|
||||
}
|
||||
_SUBDOMAIN_KEYWORDS = {
|
||||
"invoice": ("billing", "invoice"),
|
||||
"invoices": ("billing", "invoice"),
|
||||
"delivery_loop": ("notifications", "delivery_loop"),
|
||||
"delivery": ("notifications", "delivery_loop"),
|
||||
}
|
||||
|
||||
def extract(self, lowered_query: str) -> tuple[str | None, str | None]:
|
||||
domain = next((value for token, value in self._DOMAIN_KEYWORDS.items() if token in lowered_query), None)
|
||||
subdomain: str | None = None
|
||||
for token, mapping in self._SUBDOMAIN_KEYWORDS.items():
|
||||
if token in lowered_query:
|
||||
domain = domain or mapping[0]
|
||||
subdomain = mapping[1]
|
||||
break
|
||||
return domain, subdomain
|
||||
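The keyword lookup in `_ProcessAnchorExtractor.extract` can be tried out as a standalone function. This sketch copies the two mappings from the diff; the example queries are hypothetical:

```python
from typing import Optional

DOMAIN_KEYWORDS = {"billing": "billing", "notifications": "notifications"}
SUBDOMAIN_KEYWORDS = {
    "invoice": ("billing", "invoice"),
    "invoices": ("billing", "invoice"),
    "delivery_loop": ("notifications", "delivery_loop"),
    "delivery": ("notifications", "delivery_loop"),
}


def extract_process_anchor(lowered_query: str) -> tuple[Optional[str], Optional[str]]:
    """Return (domain, subdomain) from the first keyword hits, or None where nothing matches."""
    domain = next((value for token, value in DOMAIN_KEYWORDS.items() if token in lowered_query), None)
    subdomain = None
    for token, mapping in SUBDOMAIN_KEYWORDS.items():
        if token in lowered_query:
            # A subdomain hit fills in the domain only if no domain keyword matched.
            domain = domain or mapping[0]
            subdomain = mapping[1]
            break
    return domain, subdomain


invoice_anchor = extract_process_anchor("how are invoices generated")
delivery_anchor = extract_process_anchor("notifications delivery retry")
no_anchor = extract_process_anchor("health status")
```

Since matching is by substring and dict order, the `"invoice"` key already matches any query containing `"invoices"`, so the plural entry is effectively redundant in this sketch.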
|
||||
|
||||
class V2AnchorExtractor:
|
||||
def __init__(
|
||||
self,
|
||||
marker_scanner: _MarkerScanner | None = None,
|
||||
entity_extractor: _EntityNameExtractor | None = None,
|
||||
file_name_extractor: _FileNameExtractor | None = None,
|
||||
process_anchor_extractor: _ProcessAnchorExtractor | None = None,
|
||||
) -> None:
|
||||
self._marker_scanner = marker_scanner or _MarkerScanner()
|
||||
self._entity_extractor = entity_extractor or _EntityNameExtractor()
|
||||
self._file_name_extractor = file_name_extractor or _FileNameExtractor()
|
||||
self._process_anchor_extractor = process_anchor_extractor or _ProcessAnchorExtractor()
|
||||
|
||||
def extract(self, normalized_query: str, terms: TargetTermsAnalysis) -> AnchorAnalysis:
|
||||
markers = self._marker_scanner.scan(normalized_query.lower())
|
||||
lowered_query = normalized_query.lower()
|
||||
markers = self._marker_scanner.scan(lowered_query)
|
||||
process_domain, process_subdomain = self._process_anchor_extractor.extract(lowered_query)
|
||||
anchors = V2RouteAnchors(
|
||||
entity_names=self._entity_extractor.extract(normalized_query),
|
||||
file_names=self._file_name_extractor.extract(normalized_query),
|
||||
endpoint_paths=list(terms.endpoint_paths),
|
||||
target_doc_hints=self._target_doc_hints(
|
||||
endpoint_paths=terms.endpoint_paths,
|
||||
api_like_terms=terms.api_like_terms,
|
||||
alias_docs=terms.alias_docs,
|
||||
architecture_markers=markers["architecture_markers"],
|
||||
logic_markers=markers["logic_markers"],
|
||||
domain_markers=markers["domain_markers"],
|
||||
),
|
||||
matched_aliases=list(terms.matched_aliases),
|
||||
process_domain=None,
|
||||
process_subdomain=None,
|
||||
process_domain=process_domain,
|
||||
process_subdomain=process_subdomain,
|
||||
)
|
||||
return AnchorAnalysis(
|
||||
anchors=anchors,
|
||||
@@ -133,6 +194,7 @@ class V2AnchorExtractor:
|
||||
self,
|
||||
*,
|
||||
endpoint_paths: list[str],
|
||||
api_like_terms: list[str],
|
||||
alias_docs: list[str],
|
||||
architecture_markers: list[str],
|
||||
logic_markers: list[str],
|
||||
@@ -145,13 +207,41 @@ class V2AnchorExtractor:
|
||||
"/actions/{action}": "docs/api/control-actions-endpoint.md",
|
||||
}
|
||||
for endpoint in endpoint_paths:
|
||||
for hint in self._endpoint_hint_variants(endpoint):
|
||||
self._append_unique(hints, hint)
|
||||
hint = endpoint_map.get(endpoint)
|
||||
if hint and hint not in hints:
|
||||
hints.append(hint)
|
||||
if architecture_markers and "docs/architecture/telegram-notify-app-overview.md" not in hints:
|
||||
hints.append("docs/architecture/telegram-notify-app-overview.md")
|
||||
if logic_markers and "docs/logic/telegram-notification-loop.md" not in hints:
|
||||
hints.append("docs/logic/telegram-notification-loop.md")
|
||||
if domain_markers and "docs/domains/runtime-health-entity.md" not in hints:
|
||||
hints.append("docs/domains/runtime-health-entity.md")
|
||||
self._append_unique(hints, hint)
|
||||
for term in api_like_terms:
|
||||
for hint in self._api_like_hint_variants(term):
|
||||
self._append_unique(hints, hint)
|
||||
if architecture_markers:
|
||||
self._append_unique(hints, "docs/architecture/telegram-notify-app-overview.md")
|
||||
if logic_markers:
|
||||
self._append_unique(hints, "docs/logic/telegram-notification-loop.md")
|
||||
if domain_markers:
|
||||
self._append_unique(hints, "docs/domains/runtime-health-entity.md")
|
||||
return hints
|
||||
|
||||
def _endpoint_hint_variants(self, endpoint: str) -> list[str]:
|
||||
normalized = str(endpoint or "").strip().lower()
|
||||
if not normalized:
|
||||
return []
|
||||
slug = normalized.strip("/").replace("/", "-").replace("{", "").replace("}", "")
|
||||
leaf = next((part for part in reversed(slug.split("-")) if part and part != "id"), "")
|
||||
hints: list[str] = [normalized]
|
||||
for value in (slug, leaf):
|
||||
if not value:
|
||||
continue
|
||||
hints.extend([value, f"{value}-endpoint", f"{value} endpoint"])
|
||||
return list(dict.fromkeys(hints))
|
||||
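To see which hint strings `_endpoint_hint_variants` produces, the method can be sketched as a free function. The logic follows the diff; the two sample endpoints are illustrative assumptions:

```python
def endpoint_hint_variants(endpoint: str) -> list[str]:
    """Expand an endpoint path into slug- and leaf-based doc hint strings."""
    normalized = str(endpoint or "").strip().lower()
    if not normalized:
        return []
    # "/users/{id}" -> "users-id"
    slug = normalized.strip("/").replace("/", "-").replace("{", "").replace("}", "")
    # Last slug part that is not the generic "id" placeholder.
    leaf = next((part for part in reversed(slug.split("-")) if part and part != "id"), "")
    hints = [normalized]
    for value in (slug, leaf):
        if not value:
            continue
        hints.extend([value, f"{value}-endpoint", f"{value} endpoint"])
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(hints))


simple_hints = endpoint_hint_variants("/health")
nested_hints = endpoint_hint_variants("/users/{id}")
```

For a single-segment path the slug and leaf coincide, so deduplication leaves four hints; for nested paths both the full slug and the leaf contribute their own variants.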
|
||||
def _api_like_hint_variants(self, term: str) -> list[str]:
|
||||
normalized = str(term or "").strip().lower().lstrip("/")
|
||||
if not normalized:
|
||||
return []
|
||||
return [normalized, f"/{normalized}", f"{normalized}-endpoint", f"{normalized} endpoint"]
|
||||
|
||||
def _append_unique(self, items: list[str], value: str | None) -> None:
|
||||
normalized = str(value or "").strip()
|
||||
if normalized and normalized not in items:
|
||||
items.append(normalized)
|
||||
|
||||
@@ -8,6 +8,7 @@ from dataclasses import dataclass
|
||||
class TargetTermsAnalysis:
|
||||
target_terms: list[str]
|
||||
endpoint_paths: list[str]
|
||||
api_like_terms: list[str]
|
||||
matched_aliases: list[str]
|
||||
alias_docs: list[str]
|
||||
|
||||
@@ -26,7 +27,7 @@ class _AliasMatcher:
|
||||
_AliasRule(("control actions", "управление runtime"), "/actions/{action}", "docs/api/control-actions-endpoint.md"),
|
||||
_AliasRule(("runtime health", "здоровье runtime", "статусы здоровья"), "runtime_health", "docs/domains/runtime-health-entity.md"),
|
||||
_AliasRule(("цикл отправки уведомлений", "notification loop", "worker loop"), "telegram-notify-loop", "docs/logic/telegram-notification-loop.md"),
|
||||
_AliasRule(("архитектура приложения", "overview"), "architecture_overview", "docs/architecture/telegram-notify-app-overview.md"),
|
||||
_AliasRule(("архитектура приложения",), "architecture_overview", "docs/architecture/telegram-notify-app-overview.md"),
|
||||
_AliasRule(("архитектура",), "architecture_overview", "docs/architecture/telegram-notify-app-overview.md"),
|
||||
_AliasRule(("каталог ошибок", "errors catalog"), "errors_catalog", "docs/errors/catalog.yaml"),
|
||||
_AliasRule(("файл-индекс документации", "docs index", "индекс документации"), "docs_index", "docs/README.md"),
|
||||
@@ -51,6 +52,7 @@ class _AliasMatcher:
|
||||
class _EndpointPathExtractor:
|
||||
_PATH_RE = re.compile(r"`([^`]+)`|(/[A-Za-z0-9_./{}-]+)")
|
||||
_VALID_ENDPOINT_RE = re.compile(r"^/[a-z0-9._/-]+(?:/\{[a-z0-9_]+\})?$")
|
||||
_DOC_EXTENSIONS = (".md", ".yaml", ".yml", ".json")
|
||||
|
||||
def extract(self, query: str) -> list[str]:
|
||||
values: list[str] = []
|
||||
@@ -68,28 +70,161 @@ class _EndpointPathExtractor:
|
||||
return trimmed.lower()
|
||||
|
||||
def _is_endpoint(self, token: str) -> bool:
|
||||
return bool(token and self._VALID_ENDPOINT_RE.fullmatch(token))
|
||||
if not token or not self._VALID_ENDPOINT_RE.fullmatch(token):
|
||||
return False
|
||||
return not token.endswith(self._DOC_EXTENSIONS)
|
||||
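The updated `_is_endpoint` adds a file-extension guard on top of the regex check. A minimal sketch with the same pattern and extension tuple as in the diff, exercised on hypothetical tokens:

```python
import re

VALID_ENDPOINT_RE = re.compile(r"^/[a-z0-9._/-]+(?:/\{[a-z0-9_]+\})?$")
DOC_EXTENSIONS = (".md", ".yaml", ".yml", ".json")


def is_endpoint(token: str) -> bool:
    """Accept path-like tokens, but reject ones that look like documentation files."""
    if not token or not VALID_ENDPOINT_RE.fullmatch(token):
        return False
    return not token.endswith(DOC_EXTENSIONS)


plain_endpoint = is_endpoint("/health")
templated_endpoint = is_endpoint("/actions/{action}")
doc_path = is_endpoint("/docs/readme.md")
bare_word = is_endpoint("health")
```

The guard matters because the character class allows dots, so a path such as `/docs/readme.md` passes the regex and would otherwise be treated as an API endpoint.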
|
||||
def _append_unique(self, items: list[str], value: str) -> None:
|
||||
if value and value not in items:
|
||||
items.append(value)
|
||||
|
||||
|
||||
@dataclass(slots=True)
|
||||
class _ApiLikeAnchorAnalysis:
|
||||
endpoint_paths: list[str]
|
||||
candidate_terms: list[str]
|
||||
|
||||
|
||||
class _ApiLikeAnchorExtractor:
|
||||
_TOKEN_RE = re.compile(r"[A-Za-zА-Яа-я0-9_./{}-]+")
|
||||
_ASCII_ENDPOINT_RE = re.compile(r"^[a-z0-9]+(?:[-_][a-z0-9]+)*$")
|
||||
_API_MARKERS = {
|
||||
"api",
|
||||
"endpoint",
|
||||
"route",
|
||||
"method",
|
||||
"метод",
|
||||
"метода",
|
||||
"методу",
|
||||
"ручка",
|
||||
"ручки",
|
||||
"эндпоинт",
|
||||
"эндпоинта",
|
||||
"маршрут",
|
||||
"роут",
|
||||
}
|
||||
_EXPLAIN_MARKERS = {
|
||||
"как",
|
||||
"что",
|
||||
"делает",
|
||||
"работает",
|
||||
"объясни",
|
||||
"объяснить",
|
||||
"расскажи",
|
||||
"опиши",
|
||||
"смысл",
|
||||
}
|
||||
_NOISE_WORDS = _API_MARKERS | _EXPLAIN_MARKERS | {
|
||||
"про",
|
||||
"какой",
|
||||
"какая",
|
||||
"какие",
|
||||
"какого",
|
||||
"какую",
|
||||
"кратко",
|
||||
"нужен",
|
||||
"нужно",
|
||||
"у",
|
||||
}
|
||||
_SHORT_QUERY_TOKEN_LIMIT = 7
|
||||
|
||||
def extract(self, query: str, explicit_endpoint_paths: list[str]) -> _ApiLikeAnchorAnalysis:
|
||||
if explicit_endpoint_paths:
|
||||
return _ApiLikeAnchorAnalysis(endpoint_paths=list(explicit_endpoint_paths), candidate_terms=[])
|
||||
token_entries = self._token_entries(query)
|
||||
if not token_entries:
|
||||
return _ApiLikeAnchorAnalysis(endpoint_paths=[], candidate_terms=[])
|
||||
candidate_terms = [token for token, _start in token_entries if self._is_api_candidate(token)]
|
||||
if not candidate_terms:
|
||||
return _ApiLikeAnchorAnalysis(endpoint_paths=[], candidate_terms=[])
|
||||
if self._has_api_marker(token_entries):
|
||||
primary = self._primary_candidate(token_entries)
|
||||
endpoint_paths = [self._ensure_endpoint(primary)] if primary else []
|
||||
return _ApiLikeAnchorAnalysis(
|
||||
endpoint_paths=[path for path in endpoint_paths if path],
|
||||
candidate_terms=[primary] if primary else [],
|
||||
)
|
||||
if self._is_short_explain_query(token_entries) and len(candidate_terms) == 1:
|
||||
return _ApiLikeAnchorAnalysis(endpoint_paths=[], candidate_terms=list(candidate_terms))
|
||||
return _ApiLikeAnchorAnalysis(endpoint_paths=[], candidate_terms=[])
|
||||
|
||||
def _token_entries(self, query: str) -> list[tuple[str, int]]:
|
||||
entries: list[tuple[str, int]] = []
|
||||
for match in self._TOKEN_RE.finditer(query):
|
||||
token = str(match.group(0) or "").strip().strip("`'\"()[]!?.,:;").lower()
|
||||
if token:
|
||||
entries.append((token, match.start()))
|
||||
return entries
|
||||
|
||||
def _has_api_marker(self, token_entries: list[tuple[str, int]]) -> bool:
|
||||
return any(token in self._API_MARKERS for token, _start in token_entries)
|
||||
|
||||
def _is_short_explain_query(self, token_entries: list[tuple[str, int]]) -> bool:
|
||||
if len(token_entries) > self._SHORT_QUERY_TOKEN_LIMIT:
|
||||
return False
|
||||
return any(token in self._EXPLAIN_MARKERS for token, _start in token_entries)
|
||||
|
||||
def _primary_candidate(self, token_entries: list[tuple[str, int]]) -> str | None:
|
||||
marker_positions = [start for token, start in token_entries if token in self._API_MARKERS]
|
||||
candidates = [(token, start) for token, start in token_entries if self._is_api_candidate(token)]
|
||||
if not candidates:
|
||||
return None
|
||||
if not marker_positions:
|
||||
return candidates[-1][0]
|
||||
primary = min(
|
||||
candidates,
|
||||
key=lambda item: min(abs(item[1] - marker_pos) for marker_pos in marker_positions),
|
||||
)
|
||||
return primary[0]
|
||||
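The candidate selection above picks the token closest (by character offset) to an API marker. The following is a simplified sketch: the marker set is a small assumed subset of `_API_MARKERS`, and `is_api_candidate` omits the noise-word and file-extension filters of the original:

```python
import re
from typing import Optional

TOKEN_RE = re.compile(r"[A-Za-zА-Яа-я0-9_./{}-]+")
ASCII_ENDPOINT_RE = re.compile(r"^[a-z0-9]+(?:[-_][a-z0-9]+)*$")
API_MARKERS = {"api", "endpoint", "route", "method", "метод", "ручка"}  # assumed subset


def token_entries(query: str) -> list[tuple[str, int]]:
    """Return (token, start offset) pairs for every non-empty token."""
    entries = []
    for match in TOKEN_RE.finditer(query):
        token = match.group(0).strip("`'\"()[]!?.,:;").lower()
        if token:
            entries.append((token, match.start()))
    return entries


def is_api_candidate(token: str) -> bool:
    """Simplified: ASCII identifier-like token of length >= 3 that is not a marker."""
    if token in API_MARKERS:
        return False
    return ASCII_ENDPOINT_RE.fullmatch(token) is not None and len(token) >= 3


def primary_candidate(query: str) -> Optional[str]:
    entries = token_entries(query)
    marker_positions = [start for token, start in entries if token in API_MARKERS]
    candidates = [(token, start) for token, start in entries if is_api_candidate(token)]
    if not candidates:
        return None
    if not marker_positions:
        # Without a marker, fall back to the last candidate in the query.
        return candidates[-1][0]
    nearest = min(candidates, key=lambda item: min(abs(item[1] - pos) for pos in marker_positions))
    return nearest[0]


after_marker = primary_candidate("endpoint health and status")
before_marker = primary_candidate("status of the health endpoint")
mixed_language = primary_candidate("как работает метод health")
```

Distance is measured between token start offsets, so a long word between the marker and the intended term can shift the choice; this sketch does not attempt to correct for that.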
|
||||
def _is_api_candidate(self, token: str) -> bool:
|
||||
if (
|
||||
not token
|
||||
or token in self._NOISE_WORDS
|
||||
or token.startswith("docs/")
|
||||
or token.endswith((".md", ".yaml", ".yml", ".json"))
|
||||
):
|
||||
return False
|
||||
if token.startswith("/"):
|
||||
return True
|
||||
return self._ASCII_ENDPOINT_RE.fullmatch(token) is not None and len(token) >= 3
|
||||
|
||||
def _ensure_endpoint(self, token: str) -> str:
|
||||
return token if token.startswith("/") else f"/{token}"
|
||||
|
||||
|
||||
class _TermCollector:
|
||||
_TOKEN_RE = re.compile(r"[A-Za-zА-Яа-я0-9_./{}-]+")
|
||||
_IDENTIFIER_RE = re.compile(
|
||||
r"^(?:[a-z0-9]+(?:[_-][a-z0-9]+)+|[a-z]+[A-Z][A-Za-z0-9]+|(?:[A-Z][a-z0-9]+){2,})$"
|
||||
)
|
||||
_QUESTION_WORDS = {"что", "как", "где", "какой", "какие", "каком", "когда", "чего"}
|
||||
_INTENT_WORDS = {"объясни", "покажи", "найди", "расскажи", "дай", "опиши", "нужен"}
|
||||
_FILLER_WORDS = {"про", "там", "тут", "плз"}
|
||||
_INTENT_WORDS = {"объясни", "покажи", "найди", "расскажи", "дай", "опиши", "нужен", "show"}
|
||||
_FILLER_WORDS = {"про", "там", "тут", "плз", "pls", "for"}
|
||||
_MARKER_WORDS = {
|
||||
"файл",
|
||||
"файле",
|
||||
"file",
|
||||
"method",
|
||||
"метод",
|
||||
"метода",
|
||||
"методу",
|
||||
"route",
|
||||
"ручка",
|
||||
"ручки",
|
||||
"эндпоинт",
|
||||
"эндпоинта",
|
||||
"overview",
|
||||
"architecture",
|
||||
"arch",
|
||||
"flow",
|
||||
"process",
|
||||
"workflow",
|
||||
"док",
|
||||
"дока",
|
||||
"доках",
|
||||
"документ",
|
||||
"doc",
|
||||
"описан",
|
||||
"док-саммари",
|
||||
"summary",
|
||||
@@ -115,6 +250,7 @@ class _TermCollector:
|
||||
"service",
|
||||
"summary",
|
||||
"endpoint",
|
||||
"docs",
|
||||
}
|
||||
_MAX_TERMS = 7
|
||||
|
||||
@@ -191,19 +327,23 @@ class V2TargetTermsExtractor:
|
||||
self,
|
||||
alias_matcher: _AliasMatcher | None = None,
|
||||
endpoint_extractor: _EndpointPathExtractor | None = None,
|
||||
api_like_extractor: _ApiLikeAnchorExtractor | None = None,
|
||||
term_collector: _TermCollector | None = None,
|
||||
) -> None:
|
||||
self._alias_matcher = alias_matcher or _AliasMatcher()
|
||||
self._endpoint_extractor = endpoint_extractor or _EndpointPathExtractor()
|
||||
self._api_like_extractor = api_like_extractor or _ApiLikeAnchorExtractor()
|
||||
self._term_collector = term_collector or _TermCollector()
|
||||
|
||||
def extract(self, normalized_query: str) -> TargetTermsAnalysis:
|
||||
lowered = normalized_query.lower()
|
||||
endpoint_paths = self._endpoint_extractor.extract(normalized_query)
|
||||
api_like = self._api_like_extractor.extract(normalized_query, endpoint_paths)
|
||||
alias_terms, alias_docs, alias_hits = self._alias_matcher.match(lowered)
|
||||
return TargetTermsAnalysis(
|
||||
target_terms=self._term_collector.collect(normalized_query, alias_terms, endpoint_paths),
|
||||
endpoint_paths=endpoint_paths,
|
||||
target_terms=self._term_collector.collect(normalized_query, alias_terms, api_like.endpoint_paths),
|
||||
endpoint_paths=api_like.endpoint_paths,
|
||||
api_like_terms=api_like.candidate_terms,
|
||||
matched_aliases=alias_hits,
|
||||
alias_docs=alias_docs,
|
||||
)
|
||||
|
||||
@@ -44,6 +44,7 @@ class V2IntentRouter:
|
||||
normalized_query=normalized_query,
|
||||
target_terms=list(target_terms_analysis.target_terms),
|
||||
endpoint_paths=list(target_terms_analysis.endpoint_paths),
|
||||
file_names=list(anchor_analysis.anchors.file_names),
|
||||
matched_aliases=list(target_terms_analysis.matched_aliases),
|
||||
target_doc_hints=list(anchor_analysis.anchors.target_doc_hints),
|
||||
file_markers=list(anchor_analysis.file_markers),
|
||||
@@ -58,6 +59,7 @@ class V2IntentRouter:
|
||||
anchors=anchor_analysis.anchors,
|
||||
)
|
||||
llm_result = self._validator.validate(llm_candidate)
|
||||
llm_result = self._apply_deterministic_corrections(llm_result, features)
|
||||
if llm_result is not None:
|
||||
confidence = self._confidence_adjuster.adjust(float(llm_result["confidence"]), features)
|
||||
return V2RouteResult(
|
||||
@@ -99,3 +101,18 @@ class V2IntentRouter:
|
||||
)
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
def _apply_deterministic_corrections(self, candidate: dict | None, features: QueryFeatures) -> dict | None:
|
||||
if candidate is None:
|
||||
return None
|
||||
if candidate.get("routing_domain") == "DOCS" and self._should_force_find_files(features):
|
||||
corrected = dict(candidate)
|
||||
corrected["subintent"] = "FIND_FILES"
|
||||
return corrected
|
||||
return candidate
|
||||
|
||||
def _should_force_find_files(self, features: QueryFeatures) -> bool:
|
||||
if features.file_markers or features.file_names:
|
||||
return True
|
||||
query = features.normalized_query.lower()
|
||||
return "show doc" in query or "show file" in query or "doc for" in query
|
||||
|
||||
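The deterministic correction added to `V2IntentRouter` overrides only the subintent of an already validated LLM result. A minimal sketch under stated assumptions: `Features` is a hypothetical stand-in for `QueryFeatures` with just the fields used here, and candidates are plain dicts:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Features:
    """Hypothetical reduced version of QueryFeatures."""
    normalized_query: str
    file_markers: list[str] = field(default_factory=list)
    file_names: list[str] = field(default_factory=list)


def should_force_find_files(features: Features) -> bool:
    if features.file_markers or features.file_names:
        return True
    query = features.normalized_query.lower()
    return "show doc" in query or "show file" in query or "doc for" in query


def apply_deterministic_corrections(candidate: Optional[dict], features: Features) -> Optional[dict]:
    if candidate is None:
        return None
    if candidate.get("routing_domain") == "DOCS" and should_force_find_files(features):
        corrected = dict(candidate)  # copy, so the validated candidate is not mutated
        corrected["subintent"] = "FIND_FILES"
        return corrected
    return candidate


llm_candidate = {"routing_domain": "DOCS", "subintent": "SUMMARY"}
corrected = apply_deterministic_corrections(llm_candidate, Features("show doc for health"))
untouched = apply_deterministic_corrections(
    {"routing_domain": "CODE", "subintent": "EXPLAIN"}, Features("show doc for health")
)
```

The correction applies only when the LLM has already chosen the `DOCS` domain, which keeps the router LLM-first: the deterministic layer adjusts the subintent but never picks the domain.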
@@ -6,7 +6,7 @@ from app.core.agent.processes.v2.models import V2Subintent
|
||||
|
||||
class DocsSubintentResolver:
|
||||
def resolve(self, features: QueryFeatures) -> str | None:
|
||||
if features.file_markers:
|
||||
if features.file_markers or self._has_file_like_anchor(features):
|
||||
return V2Subintent.FIND_FILES
|
||||
if any(
|
||||
(
|
||||
@@ -20,3 +20,9 @@ class DocsSubintentResolver:
|
||||
):
|
||||
return V2Subintent.SUMMARY
|
||||
return None
|
||||
|
||||
def _has_file_like_anchor(self, features: QueryFeatures) -> bool:
|
||||
return any(
|
||||
hint.endswith((".md", ".yaml", ".yml", ".json"))
|
||||
for hint in features.target_doc_hints
|
||||
) or any(token.endswith((".md", ".yaml", ".yml", ".json")) for token in features.file_names)
|
||||
|
||||
@@ -14,7 +14,6 @@ from app.core.agent.processes.v2.retrieval.target_doc_seeding import (
|
||||
merge_row_lists,
|
||||
normalize_doc_path,
|
||||
normalized_path_set,
|
||||
path_variants_for_rag_query,
|
||||
row_path,
|
||||
seed_candidates_from_target_hints,
|
||||
)
|
||||
@@ -121,11 +120,9 @@ class V2Process(AgentProcess):
|
||||
"retrieval_profile_selected",
|
||||
{"profile": plan.profile, "layers": plan.layers, "filters": plan.filters},
|
||||
)
|
||||
seeded_rows = await self._seed_candidates_from_target_hints(rag_session_id, plan.layers, route)
|
||||
semantic_rows = await self._rag_adapter.fetch_rows(rag_session_id, route.normalized_query, plan)
|
||||
metadata_rows = self._metadata_lookup_candidates([*seeded_rows, *semantic_rows], route)
|
||||
rows = self._merge_candidate_rows(seeded_rows, metadata_rows, semantic_rows)
|
||||
rows = await self._ensure_target_hints_in_pool(rag_session_id, rows, route)
|
||||
retrieved_rows = await self._rag_adapter.fetch_rows(rag_session_id, route.normalized_query, plan)
|
||||
metadata_rows = self._metadata_lookup_candidates(retrieved_rows, route)
|
||||
rows = self._merge_candidate_rows(retrieved_rows, metadata_rows)
|
||||
rows = seed_candidates_from_target_hints(rows, route.anchors.target_doc_hints, RagRowIndex(rows))
|
||||
self._print_missing_target_hints(route, rows)
|
||||
context.trace.module("process.v2.rag_retrieval").log(
|
||||
@@ -150,9 +147,9 @@ class V2Process(AgentProcess):
|
||||
"target_doc_hints": route.anchors.target_doc_hints,
|
||||
"candidate_docs_before_ranking": [self._trace_row(row) for row in rows[:8]],
|
||||
"sources": {
|
||||
"seeded": [self._trace_row(row) for row in seeded_rows[:5]],
|
||||
"seeded": [self._trace_row(row) for row in retrieved_rows[:5] if row_path(row) in {normalize_doc_path(h) for h in route.anchors.target_doc_hints}],
|
||||
"metadata_lookup": [self._trace_row(row) for row in metadata_rows[:5]],
|
||||
"semantic": [self._trace_row(row) for row in semantic_rows[:5]],
|
||||
"semantic": [self._trace_row(row) for row in retrieved_rows[:5]],
|
||||
},
|
||||
},
|
||||
)
|
||||
@@ -262,61 +259,11 @@ class V2Process(AgentProcess):
|
||||
if not str(hint or "").strip():
|
||||
continue
|
||||
normalized = normalize_doc_path(hint)
|
||||
if not normalized.startswith("docs/") or "." not in normalized.rsplit("/", 1)[-1]:
|
||||
continue
|
||||
if normalized not in candidate_paths:
|
||||
print("ERROR: target doc missing from candidates:", normalized)
|
||||
|
||||
async def _ensure_target_hints_in_pool(self, rag_session_id: str, rows: list[dict], route) -> list[dict]:
|
||||
hints_raw = [str(item).strip() for item in route.anchors.target_doc_hints if str(item or "").strip()]
|
||||
if not hints_raw:
|
||||
return rows
|
||||
pool = normalized_path_set(rows)
|
||||
missing_hints = [h for h in hints_raw if normalize_doc_path(h) not in pool]
|
||||
if not missing_hints:
|
||||
return rows
|
||||
variant_paths: list[str] = []
|
||||
for h in missing_hints:
|
||||
variant_paths.extend(path_variants_for_rag_query(h))
|
||||
variant_paths = list(dict.fromkeys(variant_paths))
|
||||
extra_exact = await self._rag_adapter.fetch_exact_paths(rag_session_id, paths=variant_paths, layers=None)
|
||||
pool2 = normalized_path_set(extra_exact)
|
||||
still_missing = [h for h in missing_hints if normalize_doc_path(h) not in pool2]
|
||||
fallback_rows: list[dict] = []
|
||||
if still_missing:
|
||||
needles = [normalize_doc_path(h).split("/")[-1] for h in still_missing]
|
||||
needles = list(dict.fromkeys(n for n in needles if n))
|
||||
if needles:
|
||||
fallback_rows = await self._rag_adapter.fetch_chunks_by_path_substrings(
|
||||
rag_session_id,
|
||||
path_needles=needles,
|
||||
layers=None,
|
||||
)
|
||||
return merge_row_lists(rows, extra_exact, fallback_rows)
|
||||
|
||||
async def _seed_candidates_from_target_hints(self, rag_session_id: str, layers: list[str], route) -> list[dict]:
|
||||
del layers  # path-based seeding must see all layers (otherwise D0-only chunks are lost during file_lookup).
|
||||
hints_raw = [str(item).strip() for item in route.anchors.target_doc_hints if str(item or "").strip()]
|
||||
if not hints_raw:
|
||||
return []
|
||||
variant_paths: list[str] = []
|
||||
for h in hints_raw:
|
||||
variant_paths.extend(path_variants_for_rag_query(h))
|
||||
variant_paths = list(dict.fromkeys(variant_paths))
|
||||
exact_rows = await self._rag_adapter.fetch_exact_paths(rag_session_id, paths=variant_paths, layers=None)
|
||||
paths_found = normalized_path_set(exact_rows)
|
||||
missing = [h for h in hints_raw if normalize_doc_path(h) not in paths_found]
|
||||
if not missing:
|
||||
return exact_rows
|
||||
needles = [normalize_doc_path(h).split("/")[-1] for h in missing]
|
||||
needles = list(dict.fromkeys(n for n in needles if n))
|
||||
if not needles:
|
||||
return exact_rows
|
||||
fallback_rows = await self._rag_adapter.fetch_chunks_by_path_substrings(
|
||||
rag_session_id,
|
||||
path_needles=needles,
|
||||
layers=None,
|
||||
)
|
||||
return merge_row_lists(exact_rows, fallback_rows)
|
||||
|
||||
def _metadata_lookup_candidates(self, rows: list[dict], route) -> list[dict]:
|
||||
return DocsMetadataLookupIndex(rows).lookup(route)
|
||||
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
"""Intent-aware retrieval policy resolver для процесса v2."""
|
||||
"""Intent-aware retrieval policy resolver for process v2."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
@@ -8,91 +8,113 @@ from app.core.rag.contracts.enums import RagLayer
|
||||
from app.core.rag.retrieval.session_retriever import RetrievalPlan
|
||||
|
||||
|
||||
class V2RetrievalPolicyResolver:
|
||||
_SUMMARY_LAYERS = [
|
||||
RagLayer.DOCS_DOCUMENT_CATALOG,
|
||||
RagLayer.DOCS_ENTITY_CATALOG,
|
||||
RagLayer.DOCS_DOC_CHUNKS,
|
||||
]
|
||||
_GENERAL_LAYERS = [
|
||||
RagLayer.DOCS_DOCUMENT_CATALOG,
|
||||
RagLayer.DOCS_DOC_CHUNKS,
|
||||
class _AnchorTermCollector:
|
||||
def prefer_like_patterns(self, route: V2RouteResult) -> list[str]:
|
||||
terms = self._hint_basenames(route)
|
||||
terms.extend(route.anchors.endpoint_paths)
|
||||
terms.extend(route.target_terms)
|
||||
terms.extend(route.anchors.file_names)
|
||||
terms.extend(route.anchors.entity_names)
|
||||
terms.extend(route.anchors.matched_aliases)
|
||||
terms.extend(self._process_terms(route))
|
||||
return [f"%{term.lower()}%" for term in _unique_terms(terms)]
|
||||
|
||||
def find_files_patterns(self, route: V2RouteResult) -> list[str]:
|
||||
if route.anchors.target_doc_hints:
|
||||
return [f"%{name.lower()}%" for name in self._hint_basenames(route)]
|
||||
return self.prefer_like_patterns(route)
|
||||
|
||||
def api_method_patterns(self, route: V2RouteResult) -> list[str]:
|
||||
terms = self._hint_basenames(route)
|
||||
terms.extend(route.anchors.target_doc_hints)
|
||||
terms.extend(route.anchors.endpoint_paths)
|
||||
terms.extend(route.target_terms)
|
||||
patterns: list[str] = []
|
||||
for term in _unique_terms(terms):
|
||||
lowered = term.lower()
|
||||
stripped = lowered.strip("/")
|
||||
if stripped:
|
||||
patterns.append(f"%{stripped}%")
|
||||
if lowered:
|
||||
patterns.append(f"%{lowered}%")
|
||||
return _unique_terms(patterns)
|
||||
|
||||
def _hint_basenames(self, route: V2RouteResult) -> list[str]:
|
||||
return [hint.rsplit("/", 1)[-1] for hint in route.anchors.target_doc_hints if str(hint).strip()]
|
||||
|
||||
def _process_terms(self, route: V2RouteResult) -> list[str]:
|
||||
terms: list[str] = []
|
||||
if route.anchors.process_domain:
|
||||
terms.append(route.anchors.process_domain)
|
||||
if route.anchors.process_subdomain:
|
||||
terms.append(route.anchors.process_subdomain)
|
||||
return terms
|
||||
|
||||
|
||||
class _RouteFilterBuilder:
|
||||
_API_DOC_PREFIXES = [
|
||||
"docs/api/",
|
||||
"docs/endpoints/",
|
||||
"docs/methods/",
|
||||
"api/",
|
||||
"endpoints/",
|
||||
"methods/",
|
||||
]
|
||||
|
||||
def resolve(self, route: V2RouteResult) -> RetrievalPlan:
|
||||
if route.intent == V2Intent.GENERAL_QA:
|
||||
return RetrievalPlan(
|
||||
profile="general_qa_grounded_summary",
|
||||
layers=list(self._GENERAL_LAYERS),
|
||||
limit=8,
|
||||
filters=self._general_filters(route),
|
||||
)
|
||||
if route.subintent == V2Subintent.FIND_FILES:
|
||||
return RetrievalPlan(
|
||||
profile="file_lookup",
|
||||
layers=[RagLayer.DOCS_DOCUMENT_CATALOG, RagLayer.DOCS_ENTITY_CATALOG],
|
||||
limit=12,
|
||||
filters=self._find_files_filters(route),
|
||||
)
|
||||
return RetrievalPlan(
|
||||
profile=self._summary_profile(route),
|
||||
layers=list(self._SUMMARY_LAYERS),
|
||||
limit=8,
|
||||
filters=self._summary_filters(route),
|
||||
)
|
||||
def __init__(self) -> None:
|
||||
self._terms = _AnchorTermCollector()
|
||||
|
||||
def _summary_profile(self, route: V2RouteResult) -> str:
|
||||
signals = anchor_signal_types(route)
|
||||
if len(signals - {V2AnchorType.FIND_FILES}) != 1:
|
||||
return "docs_summary_generic"
|
||||
mapping = {
|
||||
V2AnchorType.API_ENDPOINT: "docs_summary_api_endpoint",
|
||||
V2AnchorType.ARCHITECTURE: "docs_summary_architecture",
|
||||
V2AnchorType.LOGIC_FLOW: "docs_summary_logic_flow",
|
||||
V2AnchorType.DOMAIN_ENTITY: "docs_summary_domain_entity",
|
||||
}
|
||||
signal = next(iter(signals - {V2AnchorType.FIND_FILES}), None)
|
||||
return mapping.get(signal, "docs_summary_generic")
|
||||
|
||||
def _general_filters(self, route: V2RouteResult) -> dict[str, object]:
|
||||
def general_filters(self, route: V2RouteResult) -> dict[str, object]:
|
||||
return {
|
||||
"prefer_path_prefixes": ["docs/architecture/", "docs/"],
|
||||
"prefer_like_patterns": ["%README.md%", "%overview%"],
|
||||
"prefer_like_patterns": ["%readme.md%", "%overview%"],
|
||||
"target_doc_hints": list(route.anchors.target_doc_hints),
|
||||
}
|
||||
|
||||
def _summary_filters(self, route: V2RouteResult) -> dict[str, object]:
|
||||
filters: dict[str, object] = {
|
||||
"prefer_path_prefixes": self._summary_prefixes(route),
|
||||
"prefer_like_patterns": self._prefer_like_patterns(route),
|
||||
"target_doc_hints": list(route.anchors.target_doc_hints),
|
||||
}
|
||||
def summary_filters(self, route: V2RouteResult) -> dict[str, object]:
|
||||
if _is_api_method_explain(route):
|
||||
return self.api_method_filters(route)
|
||||
filters = self._base_filters(route)
|
||||
filters["prefer_path_prefixes"] = self._summary_prefixes(route)
|
||||
filters["prefer_like_patterns"] = self._terms.prefer_like_patterns(route)
|
||||
if V2AnchorType.API_ENDPOINT in anchor_signal_types(route):
|
||||
filters["path_prefixes"] = ["docs/api/", "docs/architecture/", "docs/"]
|
||||
filters["path_prefixes"] = ["docs/api/", "docs/"]
|
||||
return filters
|
||||
|
||||
def _find_files_filters(self, route: V2RouteResult) -> dict[str, object]:
|
||||
def api_method_filters(self, route: V2RouteResult) -> dict[str, object]:
|
||||
filters = self._base_filters(route)
|
||||
filters["path_prefixes"] = list(self._API_DOC_PREFIXES)
|
||||
filters["prefer_path_prefixes"] = list(self._API_DOC_PREFIXES)
|
||||
filters["prefer_like_patterns"] = self._terms.api_method_patterns(route)
|
||||
return filters
|
||||
|
||||
def find_files_filters(self, route: V2RouteResult) -> dict[str, object]:
|
||||
filters = self._base_filters(route)
|
||||
prefixes = self._find_files_prefixes(route)
|
||||
if prefixes:
|
||||
filters["path_prefixes"] = prefixes
|
||||
filters["prefer_path_prefixes"] = self._find_files_prefer_prefixes(route, prefixes)
|
||||
filters["prefer_like_patterns"] = self._terms.find_files_patterns(route)
|
||||
return filters
|
||||
|
||||
def _base_filters(self, route: V2RouteResult) -> dict[str, object]:
|
||||
filters: dict[str, object] = {
|
||||
"prefer_path_prefixes": self._find_files_prefixes(route),
|
||||
"prefer_like_patterns": self._prefer_like_patterns(route),
|
||||
"target_doc_hints": list(route.anchors.target_doc_hints),
|
||||
}
|
||||
if route.anchors.target_doc_hints:
|
||||
filters["prefer_like_patterns"] = [f"%{path.split('/')[-1]}%" for path in route.anchors.target_doc_hints]
|
||||
if route.anchors.process_domain:
|
||||
filters["metadata.domain"] = route.anchors.process_domain
|
||||
if route.anchors.process_subdomain:
|
||||
filters["metadata.subdomain"] = route.anchors.process_subdomain
|
||||
return filters
|
||||
|
||||
def _prefer_like_patterns(self, route: V2RouteResult) -> list[str]:
|
||||
patterns: list[str] = []
|
||||
for path in route.anchors.target_doc_hints:
|
||||
patterns.append(f"%{path.split('/')[-1]}%")
|
||||
for endpoint in route.anchors.endpoint_paths:
|
||||
patterns.append(f"%{endpoint}%")
|
||||
return patterns
|
||||
|
||||
def _find_files_prefixes(self, route: V2RouteResult) -> list[str]:
|
||||
if route.anchors.target_doc_hints:
|
||||
prefixes = ["/".join(path.split("/")[:-1]) + "/" for path in route.anchors.target_doc_hints]
|
||||
return [prefix for prefix in prefixes if prefix]
|
||||
hint_prefixes = _prefixes_from_paths(route.anchors.target_doc_hints)
|
||||
if hint_prefixes:
|
||||
return hint_prefixes
|
||||
file_prefixes = [name for name in route.anchors.file_names if str(name).strip().startswith("docs/")]
|
||||
derived = _prefixes_from_paths(file_prefixes)
|
||||
if derived:
|
||||
return derived
|
||||
signals = anchor_signal_types(route)
|
||||
if V2AnchorType.API_ENDPOINT in signals:
|
||||
return ["docs/api/", "docs/"]
|
||||
@@ -104,6 +126,12 @@ class V2RetrievalPolicyResolver:
|
||||
return ["docs/domains/", "docs/"]
|
||||
return ["docs/"]
|
||||
|
||||
def _find_files_prefer_prefixes(self, route: V2RouteResult, prefixes: list[str]) -> list[str]:
|
||||
preferred = list(prefixes)
|
||||
if route.anchors.process_domain or route.anchors.process_subdomain:
|
||||
preferred.extend(["docs/domains/", "docs/logic/"])
|
||||
return _unique_terms(preferred or ["docs/"])
|
||||
|
||||
def _summary_prefixes(self, route: V2RouteResult) -> list[str]:
|
||||
signals = anchor_signal_types(route)
|
||||
prefixes: list[str] = []
|
||||
@@ -114,5 +142,129 @@ class V2RetrievalPolicyResolver:
|
||||
if V2AnchorType.LOGIC_FLOW in signals:
|
||||
prefixes.extend(["docs/logic/", "docs/architecture/", "docs/"])
|
||||
if V2AnchorType.DOMAIN_ENTITY in signals:
|
||||
prefixes.extend(["docs/domains/", "docs/api/", "docs/architecture/"])
|
||||
return list(dict.fromkeys(prefixes or ["docs/"]))
|
||||
prefixes.extend(["docs/domains/", "docs/", "docs/api/"])
|
||||
return _unique_terms(prefixes or ["docs/"])
|
||||
|
||||
|
||||
class V2RetrievalPolicyResolver:
|
||||
_GENERAL_LAYERS = [RagLayer.DOCS_DOCUMENT_CATALOG, RagLayer.DOCS_DOC_CHUNKS]
|
||||
_FIND_FILES_LAYERS = [RagLayer.DOCS_DOCUMENT_CATALOG, RagLayer.DOCS_ENTITY_CATALOG]
|
||||
_SUMMARY_LAYERS = {
|
||||
"docs_api_method_explain": [
|
||||
RagLayer.DOCS_DOCUMENT_CATALOG,
|
||||
RagLayer.DOCS_FACT_INDEX,
|
||||
RagLayer.DOCS_DOC_CHUNKS,
|
||||
],
|
||||
"docs_summary_api_endpoint": [
|
||||
RagLayer.DOCS_DOCUMENT_CATALOG,
|
||||
RagLayer.DOCS_FACT_INDEX,
|
||||
RagLayer.DOCS_DOC_CHUNKS,
|
||||
],
|
||||
"docs_summary_logic_flow": [
|
||||
RagLayer.DOCS_WORKFLOW_INDEX,
|
||||
RagLayer.DOCS_DOCUMENT_CATALOG,
|
||||
RagLayer.DOCS_DOC_CHUNKS,
|
||||
],
|
||||
"docs_summary_domain_entity": [
|
||||
RagLayer.DOCS_ENTITY_CATALOG,
|
||||
RagLayer.DOCS_DOCUMENT_CATALOG,
|
||||
RagLayer.DOCS_DOC_CHUNKS,
|
||||
],
|
||||
"docs_summary_architecture": [
|
||||
RagLayer.DOCS_DOCUMENT_CATALOG,
|
||||
RagLayer.DOCS_RELATION_GRAPH,
|
||||
RagLayer.DOCS_DOC_CHUNKS,
|
||||
],
|
||||
"docs_summary_generic": [
|
||||
RagLayer.DOCS_DOCUMENT_CATALOG,
|
||||
RagLayer.DOCS_DOC_CHUNKS,
|
||||
],
|
||||
}
|
||||
|
||||
def __init__(self) -> None:
|
||||
self._filters = _RouteFilterBuilder()
|
||||
|
||||
def resolve(self, route: V2RouteResult) -> RetrievalPlan:
|
||||
if route.intent == V2Intent.GENERAL_QA:
|
||||
return RetrievalPlan(
|
||||
profile="general_qa_grounded_summary",
|
||||
layers=list(self._GENERAL_LAYERS),
|
||||
limit=8,
|
||||
filters=self._filters.general_filters(route),
|
||||
)
|
||||
if route.subintent == V2Subintent.FIND_FILES:
|
||||
return RetrievalPlan(
|
||||
profile="file_lookup",
|
||||
layers=list(self._FIND_FILES_LAYERS),
|
||||
limit=12,
|
||||
filters=self._filters.find_files_filters(route),
|
||||
)
|
||||
profile = self._summary_profile(route)
|
||||
return RetrievalPlan(
|
||||
profile=profile,
|
||||
layers=list(self._SUMMARY_LAYERS[profile]),
|
||||
limit=10 if profile == "docs_api_method_explain" else 8,
|
||||
filters=self._filters.summary_filters(route),
|
||||
)
|
||||
|
||||
def _summary_profile(self, route: V2RouteResult) -> str:
|
||||
if _is_api_method_explain(route):
|
||||
return "docs_api_method_explain"
|
||||
meaningful = anchor_signal_types(route) - {V2AnchorType.FIND_FILES}
|
||||
if len(meaningful) != 1:
|
||||
return "docs_summary_generic"
|
||||
mapping = {
|
||||
V2AnchorType.API_ENDPOINT: "docs_summary_api_endpoint",
|
||||
V2AnchorType.ARCHITECTURE: "docs_summary_architecture",
|
||||
V2AnchorType.LOGIC_FLOW: "docs_summary_logic_flow",
|
||||
V2AnchorType.DOMAIN_ENTITY: "docs_summary_domain_entity",
|
||||
}
|
||||
return mapping.get(next(iter(meaningful)), "docs_summary_generic")
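The profile selection above can be read as a small decision rule: API-method explanations win first, then exactly one meaningful anchor signal selects a specialised profile, and anything else falls back to the generic one. The sketch below restates that rule in isolation. `AnchorType`, its string values, and the `api_method_explain` flag are hypothetical stand-ins for `V2AnchorType` and `_is_api_method_explain(route)`, not the project's real types.

```python
from enum import Enum


class AnchorType(Enum):
    # Hypothetical stand-in for V2AnchorType; the values are illustrative only.
    API_ENDPOINT = "api_endpoint"
    ARCHITECTURE = "architecture"
    LOGIC_FLOW = "logic_flow"
    DOMAIN_ENTITY = "domain_entity"
    FIND_FILES = "find_files"


PROFILE_BY_SIGNAL = {
    AnchorType.API_ENDPOINT: "docs_summary_api_endpoint",
    AnchorType.ARCHITECTURE: "docs_summary_architecture",
    AnchorType.LOGIC_FLOW: "docs_summary_logic_flow",
    AnchorType.DOMAIN_ENTITY: "docs_summary_domain_entity",
}


def summary_profile(signals: set[AnchorType], api_method_explain: bool = False) -> str:
    """Pick a summary profile name from the set of anchor signals."""
    if api_method_explain:
        return "docs_api_method_explain"
    # FIND_FILES is ignored when deciding which summary profile applies.
    meaningful = signals - {AnchorType.FIND_FILES}
    if len(meaningful) != 1:
        # Zero or several competing signals: stay generic.
        return "docs_summary_generic"
    return PROFILE_BY_SIGNAL.get(next(iter(meaningful)), "docs_summary_generic")


print(summary_profile({AnchorType.ARCHITECTURE, AnchorType.FIND_FILES}))
print(summary_profile({AnchorType.ARCHITECTURE, AnchorType.LOGIC_FLOW}))
```

Because the chosen profile name is later used as a key into `_SUMMARY_LAYERS`, every value this rule can return needs a matching entry in that dictionary.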
|
||||
|
||||
|
||||
def _prefixes_from_paths(paths: list[str]) -> list[str]:
|
||||
prefixes = []
|
||||
for path in paths:
|
||||
value = str(path).strip().strip("/")
|
||||
if "/" not in value:
|
||||
continue
|
||||
prefix = value.rsplit("/", 1)[0] + "/"
|
||||
if prefix:
|
||||
prefixes.append(prefix)
|
||||
return _unique_terms(prefixes)
|
||||
|
||||
|
||||
def _unique_terms(items: list[str]) -> list[str]:
|
||||
seen: set[str] = set()
|
||||
unique: list[str] = []
|
||||
for raw in items:
|
||||
value = str(raw or "").strip()
|
||||
if not value or value in seen:
|
||||
continue
|
||||
seen.add(value)
|
||||
unique.append(value)
|
||||
return unique
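A minimal, self-contained sketch of the order-preserving de-duplication that `_unique_terms` performs. The function is a standalone copy for illustration; the name `unique_terms` and the sample inputs are assumptions and not part of the commit.

```python
def unique_terms(items: list) -> list[str]:
    """Strip each item, drop empty values, and keep the first occurrence of each."""
    seen: set[str] = set()
    unique: list[str] = []
    for raw in items:
        value = str(raw or "").strip()
        if not value or value in seen:
            continue
        seen.add(value)
        unique.append(value)
    return unique


# Hypothetical inputs: whitespace variants, None, and "" collapse; order is kept.
sample = ["docs/api/", " docs/api/ ", None, "", "docs/"]
print(unique_terms(sample))
```

The comparison is case-sensitive, so values that differ only in case are both kept. Callers that lowercase after de-duplicating may therefore still emit repeated patterns.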
|
||||
|
||||
|
||||
def _is_api_method_explain(route: V2RouteResult) -> bool:
|
||||
if route.subintent != V2Subintent.SUMMARY:
|
||||
return False
|
||||
if route.anchors.endpoint_paths:
|
||||
return True
|
||||
if _has_api_like_hints(route.anchors.target_doc_hints):
|
||||
return True
|
||||
return V2AnchorType.API_ENDPOINT in anchor_signal_types(route)
|
||||
|
||||
|
||||
def _has_api_like_hints(hints: list[str]) -> bool:
|
||||
for hint in hints:
|
||||
value = str(hint or "").strip().lower()
|
||||
if not value:
|
||||
continue
|
||||
if value.startswith("/"):
|
||||
return True
|
||||
if value.startswith(("docs/api/", "docs/endpoints/", "docs/methods/")):
|
||||
return True
|
||||
if "endpoint" in value or "method" in value:
|
||||
return True
|
||||
return False
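To make the hint heuristic concrete, here is a standalone copy of the check with a few sample hints. The function name `has_api_like_hints` and all sample paths are hypothetical and chosen only to exercise each branch.

```python
def has_api_like_hints(hints: list) -> bool:
    """Return True if any hint looks like an endpoint path or an API document."""
    for hint in hints:
        value = str(hint or "").strip().lower()
        if not value:
            continue
        if value.startswith("/"):
            # A leading slash is treated as an endpoint path such as "/health".
            return True
        if value.startswith(("docs/api/", "docs/endpoints/", "docs/methods/")):
            return True
        if "endpoint" in value or "method" in value:
            # Plain substring match, so any path containing these words qualifies.
            return True
    return False


for hints in (["/health"], ["docs/architecture/overview.md"], ["docs/guides/payment-methods.md"]):
    print(hints, has_api_like_hints(hints))
```

The last sample shows that the substring branch is intentionally broad: a guide whose file name merely contains "method" is classified as API-like.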
|
||||
|
||||
@@ -1,18 +1,23 @@
|
||||
"""v2 adapter over :class:`RagSessionRetriever` for substitution in tests."""
|
||||
"""v2 adapter over :class:`RagSessionRetriever` with a plan-driven execution strategy."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from app.core.agent.processes.v2.retrieval.target_doc_seeding import (
|
||||
merge_row_lists,
|
||||
normalize_doc_path,
|
||||
path_variants_for_rag_query,
|
||||
)
|
||||
from app.core.rag.retrieval.session_retriever import RagSessionRetriever, RetrievalPlan
|
||||
|
||||
|
||||
class V2RagRetrievalAdapter:
|
||||
"""Wrapper around :class:`RagSessionRetriever` for substitution in tests."""
|
||||
|
||||
class _PlanDrivenRetrieval:
|
||||
def __init__(self, retriever: RagSessionRetriever) -> None:
|
||||
self._retriever = retriever
|
||||
|
||||
async def fetch_rows(self, rag_session_id: str, query_text: str, plan: RetrievalPlan) -> list[dict]:
|
||||
return await self._retriever.retrieve(rag_session_id, query_text, plan)
|
||||
seeded_rows = await self._seed_from_target_hints(rag_session_id, plan)
|
||||
semantic_rows = await self._retriever.retrieve(rag_session_id, query_text, plan)
|
||||
return merge_row_lists(seeded_rows, semantic_rows)
|
||||
|
||||
async def fetch_exact_paths(self, rag_session_id: str, *, paths: list[str], layers: list[str] | None = None) -> list[dict]:
|
||||
return await self._retriever.retrieve_exact_files(rag_session_id, paths=paths, layers=layers)
|
||||
@@ -31,3 +36,73 @@ class V2RagRetrievalAdapter:
|
||||
layers=layers,
|
||||
limit=limit,
|
||||
)
|
||||
|
||||
async def _seed_from_target_hints(self, rag_session_id: str, plan: RetrievalPlan) -> list[dict]:
|
||||
hints = self._target_doc_hints(plan)
|
||||
if not hints:
|
||||
return []
|
||||
exact_rows = await self._fetch_exact_rows(rag_session_id, hints)
|
||||
missing = self._missing_hints(hints, exact_rows)
|
||||
if not missing:
|
||||
return exact_rows
|
||||
fallback_rows = await self._fetch_substring_rows(rag_session_id, missing)
|
||||
return merge_row_lists(exact_rows, fallback_rows)
|
||||
|
||||
async def _fetch_exact_rows(self, rag_session_id: str, hints: list[str]) -> list[dict]:
|
||||
variant_paths: list[str] = []
|
||||
for hint in hints:
|
||||
variant_paths.extend(path_variants_for_rag_query(hint))
|
||||
unique_paths = list(dict.fromkeys(path for path in variant_paths if path))
|
||||
if not unique_paths:
|
||||
return []
|
||||
return await self._retriever.retrieve_exact_files(rag_session_id, paths=unique_paths, layers=None)
|
||||
|
||||
async def _fetch_substring_rows(self, rag_session_id: str, hints: list[str]) -> list[dict]:
|
||||
needles = [normalize_doc_path(hint).split("/")[-1] for hint in hints]
|
||||
unique_needles = list(dict.fromkeys(needle for needle in needles if needle))
|
||||
if not unique_needles:
|
||||
return []
|
||||
return await self._retriever.retrieve_chunks_by_path_substrings(
|
||||
rag_session_id,
|
||||
path_needles=unique_needles,
|
||||
layers=None,
|
||||
limit=200,
|
||||
)
|
||||
|
||||
def _target_doc_hints(self, plan: RetrievalPlan) -> list[str]:
|
||||
raw = plan.filters.get("target_doc_hints")
|
||||
if not isinstance(raw, list):
|
||||
return []
|
||||
return [str(item).strip() for item in raw if str(item or "").strip()]
|
||||
|
||||
def _missing_hints(self, hints: list[str], rows: list[dict]) -> list[str]:
|
||||
pool = {normalize_doc_path(str(row.get("path") or "")) for row in rows}
|
||||
return [hint for hint in hints if normalize_doc_path(hint) not in pool]
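The seeding flow in `_PlanDrivenRetrieval` is: fetch exact paths, work out which hints are still missing, then fall back to a file-name substring search for those. The sketch below reproduces that flow against an in-memory store. `FakeRetriever`, the simplified method signatures (no `rag_session_id`, no `layers`), and this `normalize_doc_path` are assumptions made for illustration; the real helpers live in `target_doc_seeding` and may behave differently.

```python
import asyncio


def normalize_doc_path(path: str) -> str:
    # Hypothetical normalization; the project's real helper may differ.
    return path.strip().strip("/").lower()


class FakeRetriever:
    """In-memory stand-in for RagSessionRetriever (illustration only)."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    async def retrieve_exact_files(self, paths: list[str]) -> list[dict]:
        wanted = {normalize_doc_path(p) for p in paths}
        return [r for r in self._rows if normalize_doc_path(r["path"]) in wanted]

    async def retrieve_chunks_by_path_substrings(self, path_needles: list[str]) -> list[dict]:
        needles = [n.lower() for n in path_needles]
        return [r for r in self._rows if any(n in r["path"].lower() for n in needles)]


async def seed_from_hints(retriever: FakeRetriever, hints: list[str]) -> list[dict]:
    exact = await retriever.retrieve_exact_files(hints)
    found = {normalize_doc_path(r["path"]) for r in exact}
    missing = [h for h in hints if normalize_doc_path(h) not in found]
    if not missing:
        return exact
    # Fall back to the file name only, de-duplicated in first-seen order.
    needles = list(dict.fromkeys(normalize_doc_path(h).split("/")[-1] for h in missing))
    fallback = await retriever.retrieve_chunks_by_path_substrings([n for n in needles if n])
    return exact + [r for r in fallback if r not in exact]


rows = [
    {"path": "docs/api/health.md", "content": "GET /health"},
    {"path": "docs/legacy/api/billing.md", "content": "POST /billing"},
]
hints = ["docs/api/health.md", "docs/api/billing.md"]
seeded = asyncio.run(seed_from_hints(FakeRetriever(rows), hints))
print([row["path"] for row in seeded])
```

In this run the second hint has no exact match, so the fallback finds `billing.md` under a different directory. That is the case the substring step appears designed to cover.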
|
||||
|
||||
|
||||
class V2RagRetrievalAdapter:
|
||||
"""Wrapper around :class:`RagSessionRetriever` for plan-driven retrieval and substitution in tests."""
|
||||
|
||||
def __init__(self, retriever: RagSessionRetriever) -> None:
|
||||
self._retriever = _PlanDrivenRetrieval(retriever)
|
||||
|
||||
async def fetch_rows(self, rag_session_id: str, query_text: str, plan: RetrievalPlan) -> list[dict]:
|
||||
return await self._retriever.fetch_rows(rag_session_id, query_text, plan)
|
||||
|
||||
async def fetch_exact_paths(self, rag_session_id: str, *, paths: list[str], layers: list[str] | None = None) -> list[dict]:
|
||||
return await self._retriever.fetch_exact_paths(rag_session_id, paths=paths, layers=layers)
|
||||
|
||||
async def fetch_chunks_by_path_substrings(
|
||||
self,
|
||||
rag_session_id: str,
|
||||
*,
|
||||
path_needles: list[str],
|
||||
layers: list[str] | None = None,
|
||||
limit: int = 200,
|
||||
) -> list[dict]:
|
||||
return await self._retriever.fetch_chunks_by_path_substrings(
|
||||
rag_session_id,
|
||||
path_needles=path_needles,
|
||||
layers=layers,
|
||||
limit=limit,
|
||||
)
|
||||
|
||||
@@ -1,20 +1,24 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
|
||||
import yaml
|
||||
|
||||
from app.core.rag.indexing.docs.chunkers.markdown_chunker import SectionChunk
|
||||
from app.core.rag.indexing.docs.models import IntegrationRecord
|
||||
|
||||
LOGGER = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class DocsIntegrationExtractor:
|
||||
_SECTION_TITLES = {"integrations", "интеграции"}
|
||||
|
||||
def extract(self, sections: list[SectionChunk]) -> list[IntegrationRecord]:
|
||||
def extract(self, sections: list[SectionChunk], *, path: str = "") -> list[IntegrationRecord]:
|
||||
records: list[IntegrationRecord] = []
|
||||
for section in sections:
|
||||
if not self._is_integration_section(section.section_path):
|
||||
continue
|
||||
payload = self._payload(section.content)
|
||||
payload = self._payload(section.content, path=path, section_path=section.section_path)
|
||||
target = str(payload.get("target") or "").strip()
|
||||
if not target:
|
||||
continue
|
||||
@@ -40,7 +44,7 @@ class DocsIntegrationExtractor:
|
||||
parts = [item.strip().lower() for item in section_path.split(" > ") if item.strip()]
|
||||
return any(part in self._SECTION_TITLES for part in parts[:-1]) or (parts and parts[-1] in self._SECTION_TITLES)
|
||||
|
||||
def _payload(self, text: str) -> dict:
|
||||
def _payload(self, text: str, *, path: str, section_path: str) -> dict:
|
||||
payload: dict = {}
|
||||
details_lines: list[str] = []
|
||||
collecting_details = False
|
||||
@@ -61,15 +65,27 @@ class DocsIntegrationExtractor:
|
||||
collecting_details = True
|
||||
details_lines = []
|
||||
if value:
|
||||
payload[key] = self._yaml_value(value)
|
||||
payload[key] = self._yaml_value(
|
||||
value,
|
||||
path=path,
|
||||
section_path=section_path,
|
||||
field_name=key,
|
||||
fallback="",
|
||||
)
|
||||
continue
|
||||
collecting_details = False
|
||||
payload[key] = self._yaml_value(value)
|
||||
payload[key] = self._yaml_value(
|
||||
value,
|
||||
path=path,
|
||||
section_path=section_path,
|
||||
field_name=key,
|
||||
fallback=value,
|
||||
)
|
||||
if details_lines:
|
||||
payload["details"] = self._details_payload(details_lines)
|
||||
payload["details"] = self._details_payload(details_lines, path=path, section_path=section_path)
|
||||
return payload
|
||||
|
||||
def _details_payload(self, lines: list[str]) -> dict:
|
||||
def _details_payload(self, lines: list[str], *, path: str, section_path: str) -> dict:
|
||||
normalized: list[str] = []
|
||||
for raw_line in lines:
|
||||
line = raw_line[2:] if raw_line.startswith(" ") else raw_line
|
||||
@@ -78,7 +94,13 @@ class DocsIntegrationExtractor:
|
||||
if indent == 0 and stripped.startswith("- "):
|
||||
stripped = stripped[2:]
|
||||
normalized.append((" " * indent) + stripped)
|
||||
payload = yaml.safe_load("\n".join(normalized)) or {}
|
||||
payload = self._yaml_value(
|
||||
"\n".join(normalized),
|
||||
path=path,
|
||||
section_path=section_path,
|
||||
field_name="details",
|
||||
fallback={},
|
||||
) or {}
|
||||
return payload if isinstance(payload, dict) else {}
|
||||
|
||||
def _split_key_value(self, text: str) -> tuple[str, str]:
|
||||
@@ -87,7 +109,17 @@ class DocsIntegrationExtractor:
|
||||
key, value = text.split(":", 1)
|
||||
return key.strip(), value.strip()
|
||||
|
||||
def _yaml_value(self, value: str):
|
||||
def _yaml_value(self, value: str, *, path: str, section_path: str, field_name: str, fallback):
|
||||
if not value:
|
||||
return ""
|
||||
return yaml.safe_load(value)
|
||||
try:
|
||||
return yaml.safe_load(value)
|
||||
except yaml.YAMLError as exc:
|
||||
LOGGER.warning(
|
||||
"docs integration parse warning: path=%s section=%s field=%s reason=%s",
|
||||
path or "<unknown>",
|
||||
section_path,
|
||||
field_name,
|
||||
exc.__class__.__name__,
|
||||
)
|
||||
return fallback
|
||||
|
||||
@@ -1,5 +1,8 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
from collections.abc import Callable
|
||||
|
||||
from app.core.rag.contracts import RagDocument, RagSource
|
||||
from app.core.rag.indexing.docs.chunkers.markdown_chunker import MarkdownDocChunker
|
||||
from app.core.rag.indexing.docs.classifier import DocsClassifier
|
||||
@@ -15,6 +18,8 @@ from app.core.rag.indexing.docs.relation_extractor import DocsRelationExtractor
|
||||
from app.core.rag.indexing.docs.support_layer_builder import DocsSupportLayerBuilder
|
||||
from app.core.rag.indexing.docs.workflow_extractor import DocsWorkflowExtractor
|
||||
|
||||
LOGGER = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class DocsIndexingPipeline:
|
||||
def __init__(self) -> None:
|
||||
@@ -59,7 +64,11 @@ class DocsIndexingPipeline:
|
||||
for section in sections:
|
||||
docs.append(self._builder.build_doc_chunk(source, section, parsed.frontmatter, doc_kind))
|
||||
document_id = frontmatter_view.document_id or source.path
|
||||
for fact in self._facts.extract(parsed.frontmatter, sections):
|
||||
for fact in self._safe_extract(
|
||||
extractor_name="fact_extractor",
|
||||
path=path,
|
||||
run=lambda: self._facts.extract(parsed.frontmatter, sections),
|
||||
):
|
||||
docs.append(
|
||||
self._support_builder.build_fact(
|
||||
source,
|
||||
@@ -72,13 +81,29 @@ class DocsIndexingPipeline:
|
||||
subdomain=frontmatter_view.subdomain,
|
||||
)
|
||||
)
|
||||
for entity in self._entities.extract(parsed.frontmatter):
|
||||
for entity in self._safe_extract(
|
||||
extractor_name="entity_extractor",
|
||||
path=path,
|
||||
run=lambda: self._entities.extract(parsed.frontmatter),
|
||||
):
|
||||
docs.append(self._builder.build_entity_record(source, parsed.frontmatter, entity))
|
||||
for workflow in self._workflows.extract(parsed.detail_sections):
|
||||
for workflow in self._safe_extract(
|
||||
extractor_name="workflow_extractor",
|
||||
path=path,
|
||||
run=lambda: self._workflows.extract(parsed.detail_sections),
|
||||
):
|
||||
docs.append(self._support_builder.build_workflow_record(source, parsed.frontmatter, workflow))
|
||||
for edge in self._relations.extract(parsed.frontmatter, source_id=document_id):
|
||||
for edge in self._safe_extract(
|
||||
extractor_name="relation_extractor",
|
||||
path=path,
|
||||
run=lambda: self._relations.extract(parsed.frontmatter, source_id=document_id),
|
||||
):
|
||||
docs.append(self._support_builder.build_relation_record(source, parsed.frontmatter, edge))
|
||||
for integration in self._integrations.extract(sections):
|
||||
for integration in self._safe_extract(
|
||||
extractor_name="integration_extractor",
|
||||
path=path,
|
||||
run=lambda: self._integrations.extract(sections, path=path),
|
||||
):
|
||||
docs.append(self._support_builder.build_integration_record(source, parsed.frontmatter, integration))
|
||||
return docs
|
||||
|
||||
@@ -86,3 +111,15 @@ class DocsIndexingPipeline:
|
||||
tail = path.rsplit("/", 1)[-1]
|
||||
stem = tail.rsplit(".", 1)[0]
|
||||
return stem.replace("-", " ").replace("_", " ").strip().title()
|
||||
|
||||
def _safe_extract(self, *, extractor_name: str, path: str, run: Callable[[], list]) -> list:
|
||||
try:
|
||||
return run()
|
||||
except Exception as exc:
|
||||
LOGGER.warning(
|
||||
"docs pipeline extractor warning: path=%s extractor=%s reason=%s",
|
||||
path,
|
||||
extractor_name,
|
||||
exc.__class__.__name__,
|
||||
)
|
||||
return []
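The `_safe_extract` wrapper isolates each extractor so that one failing extractor does not abort indexing of the whole document. A minimal standalone sketch of the same pattern follows; the logger name, the extractor callables, and the sample path are hypothetical.

```python
import logging
from collections.abc import Callable

LOGGER = logging.getLogger("docs_pipeline_sketch")  # hypothetical logger name


def safe_extract(*, extractor_name: str, path: str, run: Callable[[], list]) -> list:
    """Run one extractor; on any exception log a warning and return no records."""
    try:
        return run()
    except Exception as exc:
        LOGGER.warning(
            "extractor warning: path=%s extractor=%s reason=%s",
            path,
            extractor_name,
            exc.__class__.__name__,
        )
        return []


def broken_extractor() -> list:
    # Hypothetical failure, e.g. malformed frontmatter in one document.
    raise ValueError("malformed frontmatter")


records: list = []
extractors = (("fact_extractor", lambda: ["fact-1"]), ("entity_extractor", broken_extractor))
for name, run in extractors:
    records.extend(safe_extract(extractor_name=name, path="docs/api/health.md", run=run))
print(records)
```

Only the exception class name reaches the log in this pattern, so the message and traceback are not recorded. If more detail is needed, the standard `exc_info=True` argument of the logging call is one option.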
|
||||
|
||||
@@ -25,6 +25,8 @@ class RagQueryRepository:
|
||||
exclude_like_patterns: list[str] | None = None,
|
||||
prefer_path_prefixes: list[str] | None = None,
|
||||
prefer_like_patterns: list[str] | None = None,
|
||||
metadata_domain: str | None = None,
|
||||
metadata_subdomain: str | None = None,
|
||||
prefer_non_tests: bool = False,
|
||||
) -> list[dict]:
|
||||
sql, params = self._builder.build_retrieve(
|
||||
@@ -38,6 +40,8 @@ class RagQueryRepository:
|
||||
exclude_like_patterns=exclude_like_patterns,
|
||||
prefer_path_prefixes=prefer_path_prefixes,
|
||||
prefer_like_patterns=prefer_like_patterns,
|
||||
metadata_domain=metadata_domain,
|
||||
metadata_subdomain=metadata_subdomain,
|
||||
prefer_non_tests=prefer_non_tests,
|
||||
)
|
||||
with get_engine().connect() as conn:
|
||||
@@ -234,6 +238,54 @@ class RagQueryRepository:
|
||||
rows = conn.execute(stmt, params).mappings().fetchall()
|
||||
return [self._row_to_dict(row) for row in rows]
|
||||
|
||||
def retrieve_chunks_by_path_substrings(
|
||||
self,
|
||||
rag_session_id: str,
|
||||
*,
|
||||
path_needles: list[str],
|
||||
layers: list[str] | None = None,
|
||||
limit: int = 200,
|
||||
) -> list[dict]:
|
||||
normalized_needles = [str(item).strip().lower() for item in path_needles if str(item).strip()]
|
||||
if not normalized_needles:
|
||||
return []
|
||||
params: dict = {
|
||||
"sid": rag_session_id,
|
||||
"lim": max(1, int(limit)),
|
||||
}
|
||||
filters = ["rag_session_id = :sid"]
|
||||
like_parts: list[str] = []
|
||||
for idx, needle in enumerate(normalized_needles):
|
||||
key = f"needle_{idx}"
|
||||
params[key] = f"%{needle}%"
|
||||
like_parts.append(f"lower(path) LIKE :{key}")
|
||||
filters.append("(" + " OR ".join(like_parts) + ")")
|
||||
if layers:
|
||||
normalized_layers = [str(item).strip() for item in layers if str(item).strip()]
|
||||
if normalized_layers:
|
||||
params["layers"] = normalized_layers
|
||||
filters.append("layer IN :layers")
|
||||
stmt = text(
|
||||
f"""
|
||||
SELECT path, content, layer, title, metadata_json, span_start, span_end,
|
||||
0 AS lexical_rank,
|
||||
0 AS prefer_bonus,
|
||||
0 AS test_penalty,
|
||||
0 AS structural_rank,
|
||||
0 AS layer_rank,
|
||||
0 AS distance
|
||||
FROM rag_chunks
|
||||
WHERE {' AND '.join(filters)}
|
||||
ORDER BY path ASC, COALESCE(span_start, 0) ASC, COALESCE(chunk_index, 0) ASC
|
||||
LIMIT :lim
|
||||
"""
|
||||
)
|
||||
if "layers" in params:
|
||||
stmt = stmt.bindparams(bindparam("layers", expanding=True))
|
||||
with get_engine().connect() as conn:
|
||||
rows = conn.execute(stmt, params).mappings().fetchall()
|
||||
return [self._row_to_dict(row) for row in rows]
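The path filter in `retrieve_chunks_by_path_substrings` is built as an OR of `LIKE` conditions with one bound parameter per needle. The sketch below isolates just that string-building step, without SQLAlchemy or a database. The function name `build_path_like_filter` and the sample needles are assumptions.

```python
def build_path_like_filter(path_needles: list[str]) -> tuple[str, dict[str, str]]:
    """Return an OR-joined LIKE clause and the parameters bound to it."""
    needles = [str(item).strip().lower() for item in path_needles if str(item).strip()]
    params: dict[str, str] = {}
    parts: list[str] = []
    for idx, needle in enumerate(needles):
        key = f"needle_{idx}"
        # The value is bound as a parameter; only the placeholder enters the SQL text.
        params[key] = f"%{needle}%"
        parts.append(f"lower(path) LIKE :{key}")
    clause = "(" + " OR ".join(parts) + ")" if parts else ""
    return clause, params


clause, params = build_path_like_filter(["Health.md", " ", "billing.md"])
print(clause)
print(params)
```

Binding the needles keeps user-derived text out of the SQL string. Note that `%` and `_` inside a needle still act as `LIKE` wildcards, so a file name such as `health_check.md` matches slightly more loosely than a literal substring test would.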
|
||||
|
||||
def _row_to_dict(self, row) -> dict:
|
||||
data = dict(row)
|
||||
raw_metadata = data.pop("metadata_json")
|
||||
|
||||
@@ -69,6 +69,8 @@ class RagRepository:
|
||||
exclude_like_patterns: list[str] | None = None,
|
||||
prefer_path_prefixes: list[str] | None = None,
|
||||
prefer_like_patterns: list[str] | None = None,
|
||||
metadata_domain: str | None = None,
|
||||
metadata_subdomain: str | None = None,
|
||||
prefer_non_tests: bool = False,
|
||||
) -> list[dict]:
|
||||
return self._query.retrieve(
|
||||
@@ -82,6 +84,8 @@ class RagRepository:
|
||||
exclude_like_patterns=exclude_like_patterns,
|
||||
prefer_path_prefixes=prefer_path_prefixes,
|
||||
prefer_like_patterns=prefer_like_patterns,
|
||||
metadata_domain=metadata_domain,
|
||||
metadata_subdomain=metadata_subdomain,
|
||||
prefer_non_tests=prefer_non_tests,
|
||||
)
|
||||
|
||||
@@ -141,3 +145,18 @@ class RagRepository:
|
||||
layers=layers,
|
||||
limit=limit,
|
||||
)
|
||||
|
||||
def retrieve_chunks_by_path_substrings(
|
||||
self,
|
||||
rag_session_id: str,
|
||||
*,
|
||||
path_needles: list[str],
|
||||
layers: list[str] | None = None,
|
||||
limit: int = 200,
|
||||
) -> list[dict]:
|
||||
return self._query.retrieve_chunks_by_path_substrings(
|
            rag_session_id,
            path_needles=path_needles,
            layers=layers,
            limit=limit,
        )

@@ -19,6 +19,8 @@ class RetrievalStatementBuilder:
        exclude_like_patterns: list[str] | None = None,
        prefer_path_prefixes: list[str] | None = None,
        prefer_like_patterns: list[str] | None = None,
        metadata_domain: str | None = None,
        metadata_subdomain: str | None = None,
        prefer_non_tests: bool = False,
    ) -> tuple[str, dict]:
        emb = "[" + ",".join(str(x) for x in query_embedding) + "]"
@@ -29,6 +31,8 @@ class RetrievalStatementBuilder:
        self._append_prefix_group(filters, params, "path", path_prefixes)
        self._append_prefix_group(filters, params, "exclude_prefix", exclude_path_prefixes, negate=True)
        self._append_like_group(filters, params, "exclude_like", exclude_like_patterns, negate=True)
        self._append_metadata_equals(filters, params, "metadata_domain", "domain", metadata_domain)
        self._append_metadata_equals(filters, params, "metadata_subdomain", "subdomain", metadata_subdomain)
        if layers:
            filters.append("layer = ANY(:layers)")
            params["layers"] = layers
@@ -202,6 +206,20 @@ class RetrievalStatementBuilder:
        joined = " OR ".join(parts)
        filters.append(f"NOT ({joined})" if negate else f"({joined})")

    def _append_metadata_equals(
        self,
        filters: list[str],
        params: dict,
        param_key: str,
        metadata_key: str,
        value: str | None,
    ) -> None:
        normalized = str(value or "").strip().lower()
        if not normalized:
            return
        params[param_key] = normalized
        filters.append(f"lower(COALESCE({self._metadata_text(metadata_key)}, '')) = :{param_key}")

    def _test_penalty_sql(
        self,
        enabled: bool,

@@ -94,4 +94,8 @@ class RagSessionRetriever:
        for key in keys:
            if key in filters:
                out[key] = filters[key]
        if "metadata.domain" in filters:
            out["metadata_domain"] = filters["metadata.domain"]
        if "metadata.subdomain" in filters:
            out["metadata_subdomain"] = filters["metadata.subdomain"]
        return out

@@ -6,7 +6,10 @@ Differences from `v3`:

- each YAML case targets a single isolated component;
- results are written next to the suite in `cases/.../test_runs/...`;
- the first supported component is `process_v2_intent_router`.
- supported components are `process_v2_intent_router` and `process_v2_retrieval_policy_resolver`.
Also available: `process_v2_router_plus_retrieval_policy` for the linked route -> plan chain,
`process_v2_router_plus_retrieval_policy_rag` for the linked route -> plan -> rag chain,
and `process_v2_full_chain` for the full route -> plan -> rag -> evidence -> workflow LLM chain.

## Run

@@ -23,3 +26,48 @@ PYTHONPATH=. python -m tests.pipeline_setup_v4.run \
  --cases-dir tests/pipeline_setup_v4/cases/suite_02/process_v2_intent_router/router_llm_first_v3.yaml \
  --run-name llm_first_v3
```

Retrieval policy resolver suite:

```bash
PYTHONPATH=. python -m tests.pipeline_setup_v4.run \
  --cases-dir tests/pipeline_setup_v4/cases/suite_03/process_v2_retrieval_policy_resolver/cases.yaml \
  --run-name retrieval_policy_v1
```

Linked router + retrieval policy suite:

```bash
PYTHONPATH=. python3 -m tests.pipeline_setup_v4.run \
  --cases-dir tests/pipeline_setup_v4/cases/suite_04/process_v2_router_plus_retrieval_policy \
  --run-name router_plus_policy_v1
```

Inside `suite_04`, cases are split into:

- `strict_regression_cases.yaml` for contract-level invariants
- `soft_observational_cases.yaml` for LLM-sensitive boundary scenarios
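
The expectation keys used in these case files carry suffixes such as `_equals_any`, `_contains`, `_contains_any` and `_not_contains`, plus the `present` / `absent` markers. The sketch below shows one way such keys could be evaluated against an actual route or plan. It is only an illustration: `split_key` and `check` are hypothetical names, and the semantics are inferred from how the keys read, not taken from the harness code.

```python
from typing import Any

# Hypothetical operator suffixes, inferred from key names in the YAML cases.
# "_not_contains" is listed before "_contains" because both share that ending.
SUFFIXES = ("_not_contains", "_contains_any", "_equals_any", "_contains")


def split_key(key: str) -> tuple[str, str]:
    """Split 'profile_equals_any' into ('profile', '_equals_any')."""
    for suffix in SUFFIXES:
        if key.endswith(suffix):
            return key[: -len(suffix)], suffix
    return key, ""


def check(expected: dict[str, Any], actual: dict[str, Any]) -> list[str]:
    """Return a list of mismatch messages; an empty list means the case passes."""
    errors: list[str] = []
    for key, want in expected.items():
        field, op = split_key(key)
        got = actual.get(field)
        if isinstance(want, dict):
            # Nested section (for example `filters`): recurse and prefix the path.
            sub = got if isinstance(got, dict) else {}
            errors.extend(f"{field}.{err}" for err in check(want, sub))
            continue
        if want == "present":
            ok = bool(got)
        elif want == "absent":
            ok = not got
        elif op == "_equals_any":
            ok = got in want
        elif op == "_contains":
            ok = all(item in (got or []) for item in want)
        elif op == "_contains_any":
            ok = any(item in (got or []) for item in want)
        elif op == "_not_contains":
            ok = all(item not in (got or []) for item in want)
        else:
            ok = got == want
        if not ok:
            errors.append(f"{key}: expected {want!r}, got {got!r}")
    return errors


plan = {
    "profile": "file_lookup",
    "layers": ["D1_DOCUMENT_CATALOG", "D3_ENTITY_CATALOG"],
    "limit": 12,
    "filters": {"path_prefixes": ["docs/api/"]},
}
expected = {
    "profile_equals_any": ["file_lookup", "docs_summary_generic"],
    "layers_contains": ["D1_DOCUMENT_CATALOG"],
    "limit": 12,
    "filters": {"path_prefixes_contains": ["docs/api/"], "prefer_path_prefixes": "absent"},
}
print(check(expected, plan))  # prints []
```

Under this reading, a strict case pins exact values (`profile: file_lookup`), while a soft case lists acceptable alternatives through `_equals_any`, so LLM-sensitive routing differences do not fail the run.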

Quality-gate mini-pack:

```bash
PYTHONPATH=. python3 -m tests.pipeline_setup_v4.run \
  --cases-dir tests/pipeline_setup_v4/cases/suite_05/process_v2_router_plus_retrieval_policy_quality_gate/cases.yaml \
  --run-name router_plus_policy_qg_v1
```

Linked router + retrieval policy + rag suite:

```bash
PYTHONPATH=src:. DATABASE_URL='postgresql+psycopg://agent:agent@127.0.0.1:5432/agent' python3 -m tests.pipeline_setup_v4.run \
  --cases-dir tests/pipeline_setup_v4/cases/suite_06/process_v2_router_plus_retrieval_policy_rag/cases.yaml \
  --run-name router_plus_policy_rag_v1
```

Full process v2 chain with workflow LLM:

```bash
PYTHONPATH=src:. DATABASE_URL='postgresql+psycopg://agent:agent@127.0.0.1:5432/agent' python3 -m tests.pipeline_setup_v4.run \
  --cases-dir tests/pipeline_setup_v4/cases/suite_07/process_v2_full_chain/cases.yaml \
  --run-name process_v2_full_chain_v1
```

||||
+540
@@ -0,0 +1,540 @@
|
||||
defaults:
|
||||
component: process_v2_retrieval_policy_resolver
|
||||
|
||||
cases:
|
||||
- id: general-overview-grounded
|
||||
route:
|
||||
routing_domain: GENERAL
|
||||
intent: GENERAL_QA
|
||||
subintent: SUMMARY
|
||||
user_query: "Что это за сервис?"
|
||||
normalized_query: "что это за сервис"
|
||||
anchors:
|
||||
target_doc_hints: []
|
||||
endpoint_paths: []
|
||||
expected:
|
||||
plan:
|
||||
profile: general_qa_grounded_summary
|
||||
layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
|
||||
limit: 8
|
||||
filters:
|
||||
prefer_path_prefixes: [docs/architecture/, docs/]
|
||||
|
||||
- id: general-does-not-become-docs-summary
|
||||
route:
|
||||
routing_domain: GENERAL
|
||||
intent: GENERAL_QA
|
||||
subintent: SUMMARY
|
||||
user_query: "Дай общий обзор, включая /health"
|
||||
normalized_query: "дай общий обзор включая /health"
|
||||
anchors:
|
||||
endpoint_paths: ["/health"]
|
||||
target_doc_hints: ["docs/api/health-endpoint.md"]
|
||||
matched_aliases: ["api"]
|
||||
expected:
|
||||
plan:
|
||||
profile: general_qa_grounded_summary
|
||||
layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
|
||||
limit: 8
|
||||
|
||||
- id: find-files-with-target-hint
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: FIND_FILES
|
||||
user_query: "Покажи файл про health endpoint"
|
||||
normalized_query: "покажи файл про health endpoint"
|
||||
anchors:
|
||||
endpoint_paths: ["/health"]
|
||||
target_doc_hints: ["docs/api/health-endpoint.md"]
|
||||
expected:
|
||||
plan:
|
||||
profile: file_lookup
|
||||
layers: [D1_DOCUMENT_CATALOG, D3_ENTITY_CATALOG]
|
||||
limit: 12
|
||||
filters:
|
||||
target_doc_hints: ["docs/api/health-endpoint.md"]
|
||||
path_prefixes: [docs/api/]
|
||||
prefer_like_patterns: ["%health-endpoint.md%"]
|
||||
|
||||
- id: find-files-endpoint-only
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: FIND_FILES
|
||||
user_query: "Где описан /send?"
|
||||
normalized_query: "где описан /send"
|
||||
anchors:
|
||||
endpoint_paths: ["/send"]
|
||||
target_doc_hints: []
|
||||
expected:
|
||||
plan:
|
||||
profile: file_lookup
|
||||
layers: [D1_DOCUMENT_CATALOG, D3_ENTITY_CATALOG]
|
||||
limit: 12
|
||||
filters:
|
||||
path_prefixes: [docs/api/, docs/]
|
||||
prefer_like_patterns: ["%/send%"]
|
||||
|
||||
- id: find-files-entities-and-domain
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: FIND_FILES
|
||||
user_query: "В каком документе описан ManualSendWorker?"
|
||||
normalized_query: "в каком документе описан manualsendworker"
|
||||
anchors:
|
||||
entity_names: ["ManualSendWorker"]
|
||||
matched_aliases: ["manual send"]
|
||||
process_domain: "messaging"
|
||||
process_subdomain: "manual_send"
|
||||
target_doc_hints: []
|
||||
expected:
|
||||
plan:
|
||||
profile: file_lookup
|
||||
layers: [D1_DOCUMENT_CATALOG, D3_ENTITY_CATALOG]
|
||||
limit: 12
|
||||
filters:
|
||||
metadata.domain: messaging
|
||||
metadata.subdomain: manual_send
|
||||
prefer_path_prefixes: [docs/domains/, docs/, docs/logic/]
|
||||
prefer_like_patterns: ["%manualsendworker%", "%manual send%", "%messaging%", "%manual_send%"]
|
||||
|
||||
- id: docs-summary-api-endpoint-health
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: SUMMARY
|
||||
user_query: "Объясни /health"
|
||||
normalized_query: "объясни /health"
|
||||
target_terms: ["health", "/health"]
|
||||
anchors:
|
||||
endpoint_paths: ["/health"]
|
||||
target_doc_hints: ["docs/api/health-endpoint.md"]
|
||||
expected:
|
||||
plan:
|
||||
profile: docs_summary_api_endpoint
|
||||
layers: [D1_DOCUMENT_CATALOG, D2_FACT_INDEX, D0_DOC_CHUNKS]
|
||||
limit: 8
|
||||
filters:
|
||||
target_doc_hints: ["docs/api/health-endpoint.md"]
|
||||
path_prefixes: [docs/api/, docs/]
|
||||
prefer_path_prefixes: [docs/api/, docs/]
|
||||
|
||||
- id: docs-summary-architecture
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: SUMMARY
|
||||
user_query: "Как устроена архитектура сервиса?"
|
||||
normalized_query: "как устроена архитектура сервиса"
|
||||
anchors:
|
||||
file_names: ["docs/architecture/runtime-manager.md"]
|
||||
target_doc_hints: ["docs/architecture/runtime-manager.md"]
|
||||
matched_aliases: ["architecture"]
|
||||
expected:
|
||||
plan:
|
||||
profile: docs_summary_architecture
|
||||
layers: [D1_DOCUMENT_CATALOG, D5_RELATION_GRAPH, D0_DOC_CHUNKS]
|
||||
limit: 8
|
||||
filters:
|
||||
target_doc_hints: ["docs/architecture/runtime-manager.md"]
|
||||
prefer_path_prefixes: [docs/architecture/, docs/]
|
||||
|
||||
- id: docs-summary-logic-flow
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: SUMMARY
|
||||
user_query: "Опиши workflow отправки уведомлений"
|
||||
normalized_query: "опиши workflow отправки уведомлений"
|
||||
anchors:
|
||||
matched_aliases: ["workflow"]
|
||||
process_domain: "notifications"
|
||||
process_subdomain: "delivery_loop"
|
||||
target_doc_hints: []
|
||||
expected:
|
||||
plan:
|
||||
profile: docs_summary_logic_flow
|
||||
layers: [D4_WORKFLOW_INDEX, D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
|
||||
limit: 8
|
||||
filters:
|
||||
metadata.domain: notifications
|
||||
metadata.subdomain: delivery_loop
|
||||
prefer_path_prefixes: [docs/logic/, docs/architecture/, docs/]
|
||||
|
||||
- id: docs-summary-domain-entity
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: SUMMARY
|
||||
user_query: "Что такое RuntimeManager?"
|
||||
normalized_query: "что такое runtimemanager"
|
||||
anchors:
|
||||
entity_names: ["RuntimeManager"]
|
||||
process_domain: "runtime"
|
||||
expected:
|
||||
plan:
|
||||
profile: docs_summary_domain_entity
|
||||
layers: [D3_ENTITY_CATALOG, D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
|
||||
limit: 8
|
||||
filters:
|
||||
metadata.domain: runtime
|
||||
prefer_path_prefixes: [docs/domains/, docs/, docs/api/]
|
||||
prefer_like_patterns: ["%runtimemanager%", "%runtime%"]
|
||||
|
||||
- id: docs-summary-generic-weak-signals
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: SUMMARY
|
||||
user_query: "Дай краткое summary документации"
|
||||
normalized_query: "дай краткое summary документации"
|
||||
anchors:
|
||||
target_doc_hints: []
|
||||
endpoint_paths: []
|
||||
entity_names: []
|
||||
matched_aliases: []
|
||||
expected:
|
||||
plan:
|
||||
profile: docs_summary_generic
|
||||
layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
|
||||
limit: 8
|
||||
filters:
|
||||
prefer_path_prefixes: [docs/]
|
||||
|
||||
- id: docs-summary-generic-conflicting-signals
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: SUMMARY
|
||||
user_query: "Как связан /health и RuntimeManager?"
|
||||
normalized_query: "как связан /health и runtimemanager"
|
||||
anchors:
|
||||
endpoint_paths: ["/health"]
|
||||
entity_names: ["RuntimeManager"]
|
||||
expected:
|
||||
plan:
|
||||
profile: docs_summary_generic
|
||||
layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
|
||||
limit: 8
|
||||
|
||||
- id: find-files-stays-file-lookup-on-mixed-signals
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: FIND_FILES
|
||||
user_query: "Найди документ по architecture runtime manager"
|
||||
normalized_query: "найди документ по architecture runtime manager"
|
||||
anchors:
|
||||
entity_names: ["RuntimeManager"]
|
||||
matched_aliases: ["architecture"]
|
||||
file_names: ["docs/architecture/runtime-manager.md"]
|
||||
expected:
|
||||
plan:
|
||||
profile: file_lookup
|
||||
layers: [D1_DOCUMENT_CATALOG, D3_ENTITY_CATALOG]
|
||||
limit: 12
|
||||
filters:
|
||||
path_prefixes: [docs/architecture/]
|
||||
|
||||
- id: resolver-survives-partial-empty-anchors
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: SUMMARY
|
||||
user_query: "Что там по docs?"
|
||||
normalized_query: "что там по docs"
|
||||
anchors:
|
||||
entity_names: []
|
||||
file_names: [""]
|
||||
endpoint_paths: []
|
||||
target_doc_hints: []
|
||||
matched_aliases: []
|
||||
process_domain:
|
||||
process_subdomain:
|
||||
expected:
|
||||
plan:
|
||||
profile: docs_summary_generic
|
||||
layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
|
||||
limit: 8
|
||||
|
||||
- id: find-files-file-name-priority
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: FIND_FILES
|
||||
user_query: "Покажи документ manual-send"
|
||||
normalized_query: "покажи документ manual-send"
|
||||
anchors:
|
||||
file_names: ["docs/workflows/manual-send.md"]
|
||||
matched_aliases: ["manual send"]
|
||||
target_doc_hints: []
|
||||
expected:
|
||||
plan:
|
||||
profile: file_lookup
|
||||
layers: [D1_DOCUMENT_CATALOG, D3_ENTITY_CATALOG]
|
||||
limit: 12
|
||||
filters:
|
||||
path_prefixes: [docs/workflows/]
|
||||
prefer_like_patterns: ["%docs/workflows/manual-send.md%", "%manual send%"]
|
||||
|
||||
- id: conflict-api-hint-vs-workflow-metadata
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: SUMMARY
|
||||
user_query: "Опиши flow для /health в notification loop"
|
||||
normalized_query: "опиши flow для /health в notification loop"
|
||||
anchors:
|
||||
endpoint_paths: ["/health"]
|
||||
target_doc_hints: ["docs/api/health-endpoint.md"]
|
||||
matched_aliases: ["workflow"]
|
||||
process_domain: "notifications"
|
||||
process_subdomain: "delivery_loop"
|
||||
expected:
|
||||
plan:
|
||||
profile: docs_summary_generic
|
||||
layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
|
||||
limit: 8
|
||||
filters:
|
||||
target_doc_hints: ["docs/api/health-endpoint.md"]
|
||||
metadata.domain: notifications
|
||||
metadata.subdomain: delivery_loop
|
||||
path_prefixes: [docs/api/, docs/]
|
||||
|
||||
- id: conflict-file-name-vs-architecture-alias
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: SUMMARY
|
||||
user_query: "Объясни architecture для notification loop"
|
||||
normalized_query: "объясни architecture для notification loop"
|
||||
anchors:
|
||||
file_names: ["docs/logic/notification-loop.md"]
|
||||
matched_aliases: ["architecture"]
|
||||
expected:
|
||||
plan:
|
||||
profile: docs_summary_generic
|
||||
layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
|
||||
limit: 8
|
||||
filters:
|
||||
prefer_path_prefixes: [docs/architecture/, docs/, docs/logic/]
|
||||
prefer_like_patterns: ["%docs/logic/notification-loop.md%", "%architecture%"]
|
||||
|
||||
- id: conflict-hint-vs-entity-soft-signals
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: SUMMARY
|
||||
user_query: "Что делает /send и ManualSendWorker?"
|
||||
normalized_query: "что делает /send и manualsendworker"
|
||||
anchors:
|
||||
endpoint_paths: ["/send"]
|
||||
target_doc_hints: ["docs/api/send-endpoint.md"]
|
||||
entity_names: ["ManualSendWorker"]
|
||||
matched_aliases: ["manual send"]
|
||||
expected:
|
||||
plan:
|
||||
profile: docs_summary_generic
|
||||
layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
|
||||
limit: 8
|
||||
filters:
|
||||
target_doc_hints: ["docs/api/send-endpoint.md"]
|
||||
path_prefixes: [docs/api/, docs/]
|
||||
prefer_like_patterns: ["%send-endpoint.md%", "%/send%", "%manualsendworker%", "%manual send%"]
|
||||
|
||||
- id: metadata-only-find-files
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: FIND_FILES
|
||||
user_query: "Найди документы по notifications delivery loop"
|
||||
normalized_query: "найди документы по notifications delivery loop"
|
||||
anchors:
|
||||
process_domain: "notifications"
|
||||
process_subdomain: "delivery_loop"
|
||||
expected:
|
||||
plan:
|
||||
profile: file_lookup
|
||||
layers: [D1_DOCUMENT_CATALOG, D3_ENTITY_CATALOG]
|
||||
limit: 12
|
||||
filters:
|
||||
path_prefixes: [docs/]
|
||||
metadata.domain: notifications
|
||||
metadata.subdomain: delivery_loop
|
||||
prefer_path_prefixes: [docs/, docs/domains/, docs/logic/]
|
||||
|
||||
- id: metadata-only-generic-summary
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: SUMMARY
|
||||
user_query: "Дай summary по notifications delivery loop"
|
||||
normalized_query: "дай summary по notifications delivery loop"
|
||||
anchors:
|
||||
process_domain: "notifications"
|
||||
process_subdomain: "delivery_loop"
|
||||
expected:
|
||||
plan:
|
||||
profile: docs_summary_generic
|
||||
layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
|
||||
limit: 8
|
||||
filters:
|
||||
metadata.domain: notifications
|
||||
metadata.subdomain: delivery_loop
|
||||
prefer_path_prefixes: [docs/]
|
||||
prefer_like_patterns: ["%notifications%", "%delivery_loop%"]
|
||||
|
||||
- id: metadata-domain-entity-with-alias
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: SUMMARY
|
||||
user_query: "Объясни компонент billing"
|
||||
normalized_query: "объясни компонент billing"
|
||||
anchors:
|
||||
matched_aliases: ["component"]
|
||||
process_domain: "billing"
|
||||
expected:
|
||||
plan:
|
||||
profile: docs_summary_domain_entity
|
||||
layers: [D3_ENTITY_CATALOG, D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
|
||||
limit: 8
|
||||
filters:
|
||||
metadata.domain: billing
|
||||
prefer_path_prefixes: [docs/domains/, docs/, docs/api/]
|
||||
prefer_like_patterns: ["%component%", "%billing%"]
|
||||
|
||||
- id: alias-only-api
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: SUMMARY
|
||||
user_query: "Объясни api health"
|
||||
normalized_query: "объясни api health"
|
||||
anchors:
|
||||
matched_aliases: ["api endpoint"]
|
||||
expected:
|
||||
plan:
|
||||
profile: docs_summary_api_endpoint
|
||||
layers: [D1_DOCUMENT_CATALOG, D2_FACT_INDEX, D0_DOC_CHUNKS]
|
||||
limit: 8
|
||||
filters:
|
||||
path_prefixes: [docs/api/, docs/]
|
||||
prefer_like_patterns: ["%api endpoint%"]
|
||||
|
||||
- id: alias-only-architecture
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: SUMMARY
|
||||
user_query: "Расскажи про architecture"
|
||||
normalized_query: "расскажи про architecture"
|
||||
anchors:
|
||||
matched_aliases: ["architecture"]
|
||||
expected:
|
||||
plan:
|
||||
profile: docs_summary_architecture
|
||||
layers: [D1_DOCUMENT_CATALOG, D5_RELATION_GRAPH, D0_DOC_CHUNKS]
|
||||
limit: 8
|
||||
filters:
|
||||
prefer_path_prefixes: [docs/architecture/, docs/]
|
||||
prefer_like_patterns: ["%architecture%"]
|
||||
|
||||
- id: partial-only-endpoint-path
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: SUMMARY
|
||||
user_query: "Что делает /status?"
|
||||
normalized_query: "что делает /status"
|
||||
anchors:
|
||||
endpoint_paths: ["/status"]
|
||||
expected:
|
||||
plan:
|
||||
profile: docs_summary_api_endpoint
|
||||
layers: [D1_DOCUMENT_CATALOG, D2_FACT_INDEX, D0_DOC_CHUNKS]
|
||||
limit: 8
|
||||
filters:
|
||||
path_prefixes: [docs/api/, docs/]
|
||||
prefer_like_patterns: ["%/status%"]
|
||||
|
||||
- id: partial-only-target-doc-hint
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: SUMMARY
|
||||
user_query: "Объясни notification loop"
|
||||
normalized_query: "объясни notification loop"
|
||||
anchors:
|
||||
target_doc_hints: ["docs/logic/notification-loop.md"]
|
||||
expected:
|
||||
plan:
|
||||
profile: docs_summary_logic_flow
|
||||
layers: [D4_WORKFLOW_INDEX, D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
|
||||
limit: 8
|
||||
filters:
|
||||
target_doc_hints: ["docs/logic/notification-loop.md"]
|
||||
prefer_path_prefixes: [docs/logic/, docs/architecture/, docs/]
|
||||
|
||||
- id: generic-neutral-with-nonsemantic-hint
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: SUMMARY
|
||||
user_query: "Дай общий summary intro docs"
|
||||
normalized_query: "дай общий summary intro docs"
|
||||
anchors:
|
||||
target_doc_hints: ["docs/intro/overview.md"]
|
||||
expected:
|
||||
plan:
|
||||
profile: docs_summary_generic
|
||||
layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
|
||||
limit: 8
|
||||
filters:
|
||||
target_doc_hints: ["docs/intro/overview.md"]
|
||||
prefer_path_prefixes: [docs/]
|
||||
|
||||
- id: generic-neutral-weak-mixed-aliases
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: SUMMARY
|
||||
user_query: "Нужен общий summary про architecture component"
|
||||
normalized_query: "нужен общий summary про architecture component"
|
||||
anchors:
|
||||
matched_aliases: ["architecture", "component"]
|
||||
expected:
|
||||
plan:
|
||||
profile: docs_summary_generic
|
||||
layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
|
||||
limit: 8
|
||||
filters:
|
||||
prefer_path_prefixes: [docs/architecture/, docs/, docs/domains/, docs/api/]
|
||||
|
||||
- id: find-files-hard-priority-with-multiple-hints
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent: FIND_FILES
|
||||
user_query: "Найди документы по /health и runtime manager"
|
||||
normalized_query: "найди документы по /health и runtime manager"
|
||||
anchors:
|
||||
endpoint_paths: ["/health"]
|
||||
entity_names: ["RuntimeManager"]
|
||||
matched_aliases: ["architecture"]
|
||||
target_doc_hints:
|
||||
- "docs/api/health-endpoint.md"
|
||||
- "docs/architecture/runtime-manager.md"
|
||||
expected:
|
||||
plan:
|
||||
profile: file_lookup
|
||||
layers: [D1_DOCUMENT_CATALOG, D3_ENTITY_CATALOG]
|
||||
limit: 12
|
||||
filters:
|
||||
target_doc_hints:
|
||||
- "docs/api/health-endpoint.md"
|
||||
- "docs/architecture/runtime-manager.md"
|
||||
path_prefixes: [docs/api/, docs/architecture/]
|
||||
prefer_like_patterns: ["%health-endpoint.md%", "%runtime-manager.md%"]
|
||||
BIN
Binary file not shown.
BIN
Binary file not shown.
+199
@@ -0,0 +1,199 @@
|
||||
defaults:
|
||||
component: process_v2_router_plus_retrieval_policy
|
||||
|
||||
cases:
|
||||
- id: soft-architecture-summary
|
||||
query: "Как устроена архитектура приложения?"
|
||||
expected:
|
||||
route:
|
||||
routing_domain_equals_any: [DOCS, GENERAL]
|
||||
intent_equals_any: [DOC_EXPLAIN, GENERAL_QA]
|
||||
subintent: SUMMARY
|
||||
retrieval_plan:
|
||||
profile_equals_any: [docs_summary_architecture, docs_summary_generic, general_qa_grounded_summary]
|
||||
|
||||
- id: soft-process-summary
|
||||
query: "Опиши процесс отправки уведомлений"
|
||||
expected:
|
||||
route:
|
||||
routing_domain_equals_any: [DOCS, GENERAL]
|
||||
intent_equals_any: [DOC_EXPLAIN, GENERAL_QA]
|
||||
subintent: SUMMARY
|
||||
retrieval_plan:
|
||||
profile_equals_any: [docs_summary_logic_flow, docs_summary_generic, general_qa_grounded_summary]
|
||||
|
||||
- id: soft-domain-entity-summary
|
||||
query: "Что такое runtime health в документации?"
|
||||
expected:
|
||||
route:
|
||||
routing_domain_equals_any: [DOCS, GENERAL]
|
||||
intent_equals_any: [DOC_EXPLAIN, GENERAL_QA]
|
||||
subintent: SUMMARY
|
||||
retrieval_plan:
|
||||
profile_equals_any: [docs_summary_domain_entity, docs_summary_generic, general_qa_grounded_summary]
|
||||
|
||||
- id: soft-runtime-health-document
|
||||
query: "Покажи документ про runtime health"
|
||||
expected:
|
||||
route:
|
||||
routing_domain_equals_any: [DOCS, GENERAL]
|
||||
intent_equals_any: [DOC_EXPLAIN, GENERAL_QA]
|
||||
subintent_equals_any: [SUMMARY, FIND_FILES]
|
||||
retrieval_plan:
|
||||
profile_equals_any: [file_lookup, docs_summary_domain_entity, docs_summary_generic, general_qa_grounded_summary]
|
||||
|
||||
- id: soft-api-send-noisy
|
||||
query: "Нужен краткий док-саммари по api /send"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: SUMMARY
|
||||
retrieval_plan:
|
||||
profile: docs_summary_api_endpoint
|
||||
|
||||
- id: soft-general-risks-architecture
|
||||
query: "Какие риски у такого подхода в архитектуре?"
|
||||
expected:
|
||||
route:
|
||||
routing_domain_equals_any: [GENERAL, DOCS]
|
||||
intent_equals_any: [GENERAL_QA, DOC_EXPLAIN]
|
||||
subintent: SUMMARY
|
||||
retrieval_plan:
|
||||
profile_equals_any: [general_qa_grounded_summary, docs_summary_architecture, docs_summary_generic]
|
||||
|
||||
- id: soft-general-polling-webhook
|
||||
query: "Сравни polling и webhook в контексте сервиса"
|
||||
expected:
|
||||
route:
|
||||
routing_domain_equals_any: [GENERAL, DOCS]
|
||||
intent_equals_any: [GENERAL_QA, DOC_EXPLAIN]
|
||||
subintent: SUMMARY
|
||||
retrieval_plan:
|
||||
profile_equals_any: [general_qa_grounded_summary, docs_summary_generic]
|
||||
|
||||
- id: soft-conflict-entity-plus-process
|
||||
query: "Объясни entity runtime health и runtime loop"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: SUMMARY
|
||||
retrieval_plan:
|
||||
profile_equals_any: [docs_summary_domain_entity, docs_summary_generic]
|
||||
filters:
|
||||
prefer_path_prefixes_contains: [docs/domains/]
|
||||
|
||||
- id: soft-alias-handle-health
|
||||
query: "Объясни ручку /health"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: SUMMARY
|
||||
retrieval_plan:
|
||||
profile: docs_summary_api_endpoint
|
||||
|
||||
- id: soft-alias-show-doc-handle-health
|
||||
query: "Покажи документ по ручке /health"
|
||||
expected:
|
||||
route:
|
||||
routing_domain_equals_any: [DOCS, GENERAL]
|
||||
intent_equals_any: [DOC_EXPLAIN, GENERAL_QA]
|
||||
subintent_equals_any: [FIND_FILES, SUMMARY]
|
||||
retrieval_plan:
|
||||
profile_equals_any: [file_lookup, docs_summary_api_endpoint, general_qa_grounded_summary]
|
||||
|
||||
- id: soft-alias-schema-overview
|
||||
query: "Нужен обзор по архитектуре notify app"
|
||||
expected:
|
||||
route:
|
||||
routing_domain_equals_any: [DOCS, GENERAL]
|
||||
intent_equals_any: [DOC_EXPLAIN, GENERAL_QA]
|
||||
subintent: SUMMARY
|
||||
retrieval_plan:
|
||||
profile_equals_any: [docs_summary_architecture, docs_summary_generic, general_qa_grounded_summary]
|
||||
|
||||
- id: soft-alias-find-schema-file
|
||||
query: "Найди файл со схемой сервиса уведомлений"
|
||||
expected:
|
||||
route:
|
||||
routing_domain_equals_any: [DOCS, GENERAL]
|
||||
intent_equals_any: [DOC_EXPLAIN, GENERAL_QA]
|
||||
subintent_equals_any: [FIND_FILES, SUMMARY]
|
||||
retrieval_plan:
|
||||
profile_equals_any: [file_lookup, docs_summary_architecture, docs_summary_generic, general_qa_grounded_summary]
|
||||
|
||||
- id: soft-process-domain-summary
|
||||
query: "Объясни overview по billing invoice flow"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: SUMMARY
|
||||
route:
|
||||
anchors:
|
||||
process_domain: present
|
||||
process_subdomain: present
|
||||
retrieval_plan:
|
||||
profile_equals_any: [docs_summary_logic_flow, docs_summary_generic, docs_summary_architecture]
|
||||
|
||||
- id: soft-process-domain-find-files
|
||||
query: "Найди файл по billing invoice flow"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: FIND_FILES
|
||||
route:
|
||||
anchors:
|
||||
process_domain: present
|
||||
process_subdomain: present
|
||||
retrieval_plan:
|
||||
profile: file_lookup
|
||||
|
||||
- id: soft-noisy-arch-overview
|
||||
query: "arch overview по notify app"
|
||||
expected:
|
||||
route:
|
||||
routing_domain_equals_any: [DOCS, GENERAL]
|
||||
intent_equals_any: [DOC_EXPLAIN, GENERAL_QA]
|
||||
subintent: SUMMARY
|
||||
retrieval_plan:
|
||||
profile_equals_any: [docs_summary_architecture, docs_summary_generic, general_qa_grounded_summary]
|
||||
|
||||
- id: soft-noisy-file-send-endpoint
|
||||
query: "нужен файл где описан /send endpoint"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: FIND_FILES
|
||||
retrieval_plan:
|
||||
profile: file_lookup
|
||||
|
||||
- id: soft-bare-file-token-preferences
|
||||
query: "health-endpoint.md"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: FIND_FILES
|
||||
route:
|
||||
anchors:
|
||||
file_names_contains: ["health-endpoint.md"]
|
||||
retrieval_plan:
|
||||
profile: file_lookup
|
||||
|
||||
- id: soft-doc-path-preferences
|
||||
query: "docs/api/health-endpoint.md"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: FIND_FILES
|
||||
route:
|
||||
anchors:
|
||||
file_names_contains: ["docs/api/health-endpoint.md"]
|
||||
retrieval_plan:
|
||||
profile: file_lookup
|
||||
+206
@@ -0,0 +1,206 @@
|
||||
defaults:
|
||||
component: process_v2_router_plus_retrieval_policy
|
||||
|
||||
cases:
|
||||
- id: strict-general-overview
|
||||
query: "Общий обзор сервиса"
|
||||
expected:
|
||||
router:
|
||||
domain: GENERAL
|
||||
intent: GENERAL_QA
|
||||
sub_intent: SUMMARY
|
||||
route:
|
||||
anchors:
|
||||
endpoint_paths_not_contains: ["/health"]
|
||||
file_names_not_contains: ["/health"]
|
||||
retrieval_plan:
|
||||
profile: general_qa_grounded_summary
|
||||
layers_contains: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
|
||||
limit: 8
|
||||
filters:
|
||||
prefer_path_prefixes_contains: [docs/architecture/, docs/]
|
||||
path_prefixes: absent
|
||||
|
||||
- id: strict-api-summary-health
|
||||
query: "Объясни endpoint /health"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: SUMMARY
|
||||
route:
|
||||
anchors:
|
||||
endpoint_paths_contains: ["/health"]
|
||||
file_names_not_contains: ["/health"]
|
||||
retrieval_plan:
|
||||
profile: docs_summary_api_endpoint
|
||||
filters:
|
||||
path_prefixes_contains: [docs/api/]
|
||||
prefer_path_prefixes_contains: [docs/api/]
|
||||
|
||||
- id: strict-find-files-health-described
|
||||
query: "Где описан endpoint /health"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: FIND_FILES
|
||||
route:
|
||||
anchors:
|
||||
endpoint_paths_contains: ["/health"]
|
||||
file_names_not_contains: ["/health"]
|
||||
retrieval_plan:
|
||||
profile: file_lookup
|
||||
layers_contains: [D1_DOCUMENT_CATALOG, D3_ENTITY_CATALOG]
|
||||
limit: 12
|
||||
filters:
|
||||
path_prefixes_contains: [docs/api/]
|
||||
prefer_path_prefixes_contains: [docs/api/]
|
||||
|
||||
- id: strict-find-files-health-show-file
|
||||
query: "Покажи файл с описанием /health"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: FIND_FILES
|
||||
route:
|
||||
anchors:
|
||||
endpoint_paths_contains: ["/health"]
|
||||
retrieval_plan:
|
||||
profile: file_lookup
|
||||
filters:
|
||||
path_prefixes_contains: [docs/api/]
|
||||
|
||||
- id: strict-runtime-health-find-files
|
||||
query: "Где описан runtime health"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: FIND_FILES
|
||||
retrieval_plan:
|
||||
profile: file_lookup
|
||||
filters:
|
||||
path_prefixes_contains_any: [docs/domains/, docs/]
|
||||
|
||||
- id: strict-noisy-runtime-health-find-files
|
||||
query: "runtime health где описано в docs"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: FIND_FILES
|
||||
retrieval_plan:
|
||||
profile: file_lookup
|
||||
|
||||
- id: strict-doc-path-is-file-lookup
|
||||
query: "docs/api/health-endpoint.md"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: FIND_FILES
|
||||
route:
|
||||
anchors:
|
||||
file_names_contains: ["docs/api/health-endpoint.md"]
|
||||
endpoint_paths_not_contains: ["/api/health-endpoint.md"]
|
||||
retrieval_plan:
|
||||
profile: file_lookup
|
||||
filters:
|
||||
path_prefixes_contains: [docs/api/]
|
||||
|
||||
- id: strict-file-token-is-file-lookup
|
||||
query: "health-endpoint.md"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: FIND_FILES
|
||||
route:
|
||||
anchors:
|
||||
file_names_contains: ["health-endpoint.md"]
|
||||
endpoint_paths_not_contains: ["health-endpoint.md"]
|
||||
retrieval_plan:
|
||||
profile: file_lookup
|
||||
|
||||
- id: strict-noisy-english-show-doc
|
||||
query: "pls show doc for /health"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: FIND_FILES
|
||||
route:
|
||||
anchors:
|
||||
endpoint_paths_contains: ["/health"]
|
||||
file_names_not_contains: ["/health"]
|
||||
target_terms_not_contains: [pls, show, doc, for]
|
||||
retrieval_plan:
|
||||
profile: file_lookup
|
||||
filters:
|
||||
path_prefixes_contains: [docs/api/]
|
||||
|
||||
- id: strict-bare-endpoint-anchor-invariant
|
||||
query: "/health"
|
||||
expected:
|
||||
route:
|
||||
routing_domain_equals_any: [GENERAL, DOCS]
|
||||
intent_equals_any: [GENERAL_QA, DOC_EXPLAIN]
|
||||
subintent: SUMMARY
|
||||
anchors:
|
||||
endpoint_paths_contains: ["/health"]
|
||||
file_names_not_contains: ["/health"]
|
||||
|
||||
- id: strict-find-files-dominates-health-question
|
||||
query: "В каком файле описан `/health`?"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: FIND_FILES
|
||||
retrieval_plan:
|
||||
profile: file_lookup
|
||||
|
||||
- id: strict-runtime-health-summary-not-file-lookup
|
||||
query: "Что делает runtime health"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: SUMMARY
|
||||
retrieval_plan:
|
||||
profile_equals_any: [docs_summary_domain_entity, docs_summary_generic]
|
||||
|
||||
- id: strict-general-purpose
|
||||
query: "Зачем нужен этот сервис?"
|
||||
expected:
|
||||
route:
|
||||
routing_domain_equals_any: [GENERAL, DOCS]
|
||||
intent_equals_any: [GENERAL_QA, DOC_EXPLAIN]
|
||||
subintent: SUMMARY
|
||||
retrieval_plan:
|
||||
profile_equals_any: [general_qa_grounded_summary, docs_summary_generic]
|
||||
|
||||
- id: strict-conflict-summary-goes-generic
|
||||
query: "Как устроена архитектура endpoint /send"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: SUMMARY
|
||||
retrieval_plan:
|
||||
profile: docs_summary_generic
|
||||
filters:
|
||||
path_prefixes_contains: [docs/api/]
|
||||
prefer_path_prefixes_contains: [docs/api/, docs/architecture/]
|
||||
|
||||
- id: strict-find-files-dominates-mixed-signals
|
||||
query: "В каком файле описан architecture flow отправки уведомлений"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: FIND_FILES
|
||||
retrieval_plan:
|
||||
profile: file_lookup
|
||||
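The strict cases above rely on suffix-keyed assertions such as `endpoint_paths_contains`, `file_names_not_contains` and `routing_domain_equals_any`. A minimal sketch of how such keys could be evaluated against an actual payload is given below. The function name `check_suffix_assertions` and the sample payloads are illustrative assumptions, not the harness API.

```python
from collections.abc import Mapping


def check_suffix_assertions(expected: Mapping[str, object], actual: Mapping[str, object]) -> list[str]:
    """Return mismatch messages for suffix-keyed assertions (illustrative only)."""
    mismatches: list[str] = []
    for key, want in expected.items():
        wanted = list(want) if isinstance(want, (list, tuple)) else [want]
        # Test the longer suffix first: `_not_contains` also ends with `_contains`.
        if key.endswith("_not_contains"):
            field = key.removesuffix("_not_contains")
            got = list(actual.get(field) or [])
            present = [item for item in wanted if item in got]
            if present:
                mismatches.append(f"{key}: unexpected {present}")
        elif key.endswith("_contains"):
            field = key.removesuffix("_contains")
            got = list(actual.get(field) or [])
            missing = [item for item in wanted if item not in got]
            if missing:
                mismatches.append(f"{key}: missing {missing}")
        elif key.endswith("_equals_any"):
            field = key.removesuffix("_equals_any")
            if actual.get(field) not in wanted:
                mismatches.append(f"{key}: got {actual.get(field)!r}")
        elif actual.get(key) != want:
            # Keys without a known suffix fall back to plain equality.
            mismatches.append(f"{key}: expected {want!r}, got {actual.get(key)!r}")
    return mismatches
```

An empty result means every assertion held; each message names the key that failed.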
BIN
Binary file not shown.
BIN
Binary file not shown.
BIN
Binary file not shown.
+115
@@ -0,0 +1,115 @@
|
||||
defaults:
|
||||
component: process_v2_router_plus_retrieval_policy
|
||||
|
||||
cases:
|
||||
- id: qg-t01-docs-overview-architecture
|
||||
query: "Объясни overview архитектуры сервиса уведомлений"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: SUMMARY
|
||||
retrieval_plan:
|
||||
profile_one_of: [docs_summary_architecture, docs_summary_generic]
|
||||
filters:
|
||||
prefer_path_prefixes_contains: [docs/architecture/]
|
||||
|
||||
- id: qg-t02-docs-overview-flow
|
||||
query: "Дай overview по flow отправки уведомлений"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: SUMMARY
|
||||
retrieval_plan:
|
||||
profile_one_of: [docs_summary_logic_flow, docs_summary_generic]
|
||||
filters:
|
||||
prefer_path_prefixes_contains: [docs/logic/]
|
||||
|
||||
- id: qg-t03-soft-arch-overview-notify
|
||||
query: "Arch overview по notify app"
|
||||
expected:
|
||||
route:
|
||||
routing_domain_one_of: [DOCS, GENERAL]
|
||||
intent_one_of: [DOC_EXPLAIN, GENERAL_QA]
|
||||
subintent: SUMMARY
|
||||
retrieval_plan:
|
||||
profile_one_of: [docs_summary_architecture, docs_summary_generic, general_qa_grounded_summary]
|
||||
|
||||
- id: qg-t04-process-summary-filters
|
||||
query: "Объясни billing invoice process"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: SUMMARY
|
||||
route:
|
||||
anchors:
|
||||
process_domain: present
|
||||
process_subdomain: present
|
||||
retrieval_plan:
|
||||
if_anchor_present_then_filter_present:
|
||||
- anchor: anchors.process_domain
|
||||
filter: filters.metadata.domain
|
||||
- anchor: anchors.process_subdomain
|
||||
filter: filters.metadata.subdomain
|
||||
profile_one_of: [docs_summary_logic_flow, docs_summary_generic]
|
||||
|
||||
- id: qg-t05-process-find-files-filters
|
||||
query: "Найди файл по billing invoice process"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: FIND_FILES
|
||||
route:
|
||||
anchors:
|
||||
process_domain: present
|
||||
process_subdomain: present
|
||||
retrieval_plan:
|
||||
profile: file_lookup
|
||||
if_anchor_present_then_filter_present:
|
||||
- anchor: anchors.process_domain
|
||||
filter: filters.metadata.domain
|
||||
- anchor: anchors.process_subdomain
|
||||
filter: filters.metadata.subdomain
|
||||
filters:
|
||||
prefer_path_prefixes_contains_any: [docs/domains/, docs/logic/]
|
||||
|
||||
- id: qg-t06-soft-process-shaped-input
|
||||
query: "billing invoice docs"
|
||||
expected:
|
||||
route:
|
||||
routing_domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
subintent_one_of: [FIND_FILES, SUMMARY]
|
||||
retrieval_plan:
|
||||
profile_one_of: [file_lookup, docs_summary_logic_flow, docs_summary_generic]
|
||||
|
||||
- id: qg-t07-clean-target-terms-architecture
|
||||
query: "Объясни architecture overview сервиса уведомлений"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: SUMMARY
|
||||
route:
|
||||
target_terms_not_contains: ["объясни", "overview", "architecture"]
|
||||
retrieval_plan:
|
||||
profile_one_of: [docs_summary_architecture, docs_summary_generic]
|
||||
|
||||
- id: qg-t08-clean-target-terms-file-query
|
||||
query: "Найди doc for /health"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: FIND_FILES
|
||||
route:
|
||||
target_terms_contains: ["/health"]
|
||||
target_terms_not_contains: ["найди", "doc", "for"]
|
||||
anchors:
|
||||
endpoint_paths_contains: ["/health"]
|
||||
file_names_not_contains: ["/health"]
|
||||
retrieval_plan:
|
||||
profile: file_lookup
|
||||
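Cases qg-t04 and qg-t05 pair a dotted anchor path with a dotted filter path under `if_anchor_present_then_filter_present`. The sketch below shows one way to read that rule: when the anchor resolves to a non-empty value, the filter must resolve to a non-empty value too. The helper names `resolve` and `check_conditional` are hypothetical, and the lookup assumes plain nested mappings.

```python
from collections.abc import Mapping

# Values treated as "not present" in this sketch.
EMPTY = (None, "", [], {})


def resolve(value: object, dotted: str) -> object:
    """Walk a nested mapping by a dotted path; return None when a step is missing."""
    current = value
    for part in dotted.split("."):
        if not isinstance(current, Mapping) or part not in current:
            return None
        current = current[part]
    return current


def check_conditional(rules: list[dict[str, str]], route: Mapping, plan: Mapping) -> list[str]:
    """Report rules whose anchor is present while the paired filter is not."""
    failures: list[str] = []
    for idx, rule in enumerate(rules):
        if resolve(route, rule["anchor"]) in EMPTY:
            continue  # anchor absent: the rule does not apply
        if resolve(plan, rule["filter"]) in EMPTY:
            failures.append(f"conditional[{idx}]: {rule['filter']} missing")
    return failures
```

This simplified lookup does not handle keys that themselves contain a dot, such as a literal `metadata.domain` key inside `filters`.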
BIN
Binary file not shown.
+193
@@ -0,0 +1,193 @@
|
||||
defaults:
|
||||
component: process_v2_router_plus_retrieval_policy_rag
|
||||
rag_session_id: "694cd10b-3842-4579-8d53-e54ec4291eae"
|
||||
|
||||
cases:
|
||||
- id: rag-t01-architecture-summary
|
||||
query: "Объясни overview архитектуры сервиса уведомлений"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: SUMMARY
|
||||
route:
|
||||
anchors:
|
||||
target_doc_hints_contains:
|
||||
- "docs/architecture/telegram-notify-app-overview.md"
|
||||
retrieval_plan:
|
||||
profile: docs_summary_architecture
|
||||
filters:
|
||||
prefer_path_prefixes_contains:
|
||||
- "docs/architecture/"
|
||||
rag:
|
||||
paths_contains:
|
||||
- "docs/architecture/telegram-notify-app-overview.md"
|
||||
layers_contains:
|
||||
- "D5_RELATION_GRAPH"
|
||||
- "D1_DOCUMENT_CATALOG"
|
||||
|
||||
- id: rag-t02-docs-index-find-files
|
||||
query: "Найди файл-индекс документации проекта"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: FIND_FILES
|
||||
route:
|
||||
anchors:
|
||||
target_doc_hints_contains:
|
||||
- "docs/README.md"
|
||||
retrieval_plan:
|
||||
profile: file_lookup
|
||||
filters:
|
||||
path_prefixes_contains:
|
||||
- "docs/"
|
||||
rag:
|
||||
paths_contains:
|
||||
- "docs/README.md"
|
||||
layers_contains:
|
||||
- "D1_DOCUMENT_CATALOG"
|
||||
|
||||
- id: rag-t03-general-docs-overview
|
||||
query: "Что входит в документацию этого проекта?"
|
||||
expected:
|
||||
router:
|
||||
domain: GENERAL
|
||||
intent: GENERAL_QA
|
||||
sub_intent: SUMMARY
|
||||
retrieval_plan:
|
||||
profile: general_qa_grounded_summary
|
||||
rag:
|
||||
paths_contains:
|
||||
- "docs/README.md"
|
||||
- "docs/architecture/telegram-notify-app-overview.md"
|
||||
layers_contains:
|
||||
- "D1_DOCUMENT_CATALOG"
|
||||
- "D0_DOC_CHUNKS"
|
||||
|
||||
- id: rag-t04-errors-catalog-find-files
|
||||
query: "В каком файле лежит каталог ошибок?"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: FIND_FILES
|
||||
route:
|
||||
anchors:
|
||||
target_doc_hints_contains:
|
||||
- "docs/errors/catalog.yaml"
|
||||
retrieval_plan:
|
||||
profile: file_lookup
|
||||
filters:
|
||||
path_prefixes_contains:
|
||||
- "docs/errors/"
|
||||
rag:
|
||||
paths_contains:
|
||||
- "docs/errors/catalog.yaml"
|
||||
layers_contains:
|
||||
- "D1_DOCUMENT_CATALOG"
|
||||
|
||||
- id: rag-t05-errors-catalog-general
|
||||
query: "Объясни каталог ошибок"
|
||||
expected:
|
||||
router:
|
||||
domain: GENERAL
|
||||
intent: GENERAL_QA
|
||||
sub_intent: SUMMARY
|
||||
route:
|
||||
anchors:
|
||||
target_doc_hints_contains:
|
||||
- "docs/errors/catalog.yaml"
|
||||
retrieval_plan:
|
||||
profile: general_qa_grounded_summary
|
||||
rag:
|
||||
paths_contains:
|
||||
- "docs/errors/catalog.yaml"
|
||||
|
||||
- id: rag-t06-health-summary-chain
|
||||
query: "Объясни endpoint /health"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: SUMMARY
|
||||
route:
|
||||
anchors:
|
||||
endpoint_paths_contains:
|
||||
- "/health"
|
||||
file_names_not_contains:
|
||||
- "/health"
|
||||
retrieval_plan:
|
||||
profile: docs_summary_api_endpoint
|
||||
filters:
|
||||
prefer_path_prefixes_contains:
|
||||
- "docs/api/"
|
||||
rag:
|
||||
paths_contains_any:
|
||||
- "docs/README.md"
|
||||
- "docs/architecture/telegram-notify-app-overview.md"
|
||||
layers_contains:
|
||||
- "D2_FACT_INDEX"
|
||||
|
||||
- id: rag-t07-health-find-files-empty
|
||||
query: "Где описан endpoint /health"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: FIND_FILES
|
||||
route:
|
||||
anchors:
|
||||
endpoint_paths_contains:
|
||||
- "/health"
|
||||
target_doc_hints_contains:
|
||||
- "docs/api/health-endpoint.md"
|
||||
retrieval_plan:
|
||||
profile: file_lookup
|
||||
rag:
|
||||
row_count: 0
|
||||
paths: absent
|
||||
layers: absent
|
||||
|
||||
- id: rag-t08-notifications-workflow-metadata
|
||||
query: "Объясни notifications workflow"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: SUMMARY
|
||||
route:
|
||||
anchors:
|
||||
process_domain: notifications
|
||||
retrieval_plan:
|
||||
profile: docs_summary_logic_flow
|
||||
filters:
|
||||
metadata.domain: notifications
|
||||
prefer_path_prefixes_contains:
|
||||
- "docs/logic/"
|
||||
rag:
|
||||
paths_contains:
|
||||
- "docs/architecture/telegram-notify-app-overview.md"
|
||||
metadata_domains_contains:
|
||||
- "notifications"
|
||||
|
||||
- id: rag-t09-mixed-summary-generic
|
||||
query: "Как архитектурно устроен endpoint /send"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: SUMMARY
|
||||
route:
|
||||
anchors:
|
||||
endpoint_paths_contains:
|
||||
- "/send"
|
||||
retrieval_plan:
|
||||
profile: docs_summary_generic
|
||||
filters:
|
||||
prefer_path_prefixes_contains:
|
||||
- "docs/api/"
|
||||
- "docs/architecture/"
|
||||
rag:
|
||||
paths_contains:
|
||||
- "docs/architecture/telegram-notify-app-overview.md"
|
||||
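Case rag-t07 combines an exact value (`row_count: 0`) with the `absent` marker for `paths` and `layers`. A minimal sketch of how `present` and `absent` could be told apart from ordinary expected values follows. `check_presence` is a hypothetical name, and treating `None`, empty strings, empty lists and empty mappings as "not present" is an assumption of this sketch.

```python
from collections.abc import Mapping

# Values treated as "not present" in this sketch; note that 0 is not among them.
EMPTY = (None, "", [], {})


def check_presence(expected: Mapping[str, object], actual: Mapping[str, object]) -> list[str]:
    """Evaluate `present` / `absent` markers and plain equality (illustrative only)."""
    mismatches: list[str] = []
    for key, want in expected.items():
        got = actual.get(key)
        if want == "present":
            if got in EMPTY:
                mismatches.append(f"{key}: expected present, got {got!r}")
        elif want == "absent":
            if got not in EMPTY:
                mismatches.append(f"{key}: expected absent, got {got!r}")
        elif got != want:
            mismatches.append(f"{key}: expected {want!r}, got {got!r}")
    return mismatches
```

Under these assumptions a missing key and an empty list both satisfy `absent`, while `row_count: 0` is still compared as an exact value.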
BIN
Binary file not shown.
@@ -0,0 +1,180 @@
|
||||
defaults:
|
||||
component: process_v2_full_chain
|
||||
rag_session_id: "694cd10b-3842-4579-8d53-e54ec4291eae"
|
||||
|
||||
cases:
|
||||
- id: full-t01-general-docs-overview
|
||||
query: "Что входит в документацию этого проекта?"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: SUMMARY
|
||||
retrieval_plan:
|
||||
profile: docs_summary_generic
|
||||
rag:
|
||||
row_count: 0
|
||||
pipeline:
|
||||
answer_mode: insufficient_evidence
|
||||
llm:
|
||||
non_empty: true
|
||||
contains_all:
|
||||
- "не найден"
|
||||
- "документ"
|
||||
|
||||
- id: full-t02-architecture-summary
|
||||
query: "Объясни overview архитектуры сервиса уведомлений"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: SUMMARY
|
||||
route:
|
||||
anchors:
|
||||
target_doc_hints_contains:
|
||||
- "docs/architecture/telegram-notify-app-overview.md"
|
||||
retrieval_plan:
|
||||
profile: docs_summary_architecture
|
||||
rag:
|
||||
paths_contains:
|
||||
- "docs/architecture/telegram-notify-app-overview.md"
|
||||
pipeline:
|
||||
answer_mode: grounded_summary
|
||||
llm:
|
||||
non_empty: true
|
||||
contains_any:
|
||||
- ["RuntimeManager", "TelegramControlChannel"]
|
||||
- ["worker", "Telegram"]
|
||||
|
||||
- id: full-t03-runtime-health-summary
|
||||
query: "Что такое runtime health в этой документации?"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: SUMMARY
|
||||
retrieval_plan:
|
||||
profile: docs_summary_domain_entity
|
||||
rag:
|
||||
row_count: 0
|
||||
pipeline:
|
||||
answer_mode: insufficient_evidence
|
||||
llm:
|
||||
non_empty: true
|
||||
contains_all:
|
||||
- "не найден"
|
||||
- "документ"
|
||||
|
||||
- id: full-t04-logic-flow-summary
|
||||
query: "Кратко опиши цикл отправки уведомлений"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: SUMMARY
|
||||
retrieval_plan:
|
||||
profile: docs_summary_logic_flow
|
||||
rag:
|
||||
row_count: 0
|
||||
pipeline:
|
||||
answer_mode: insufficient_evidence
|
||||
llm:
|
||||
non_empty: true
|
||||
contains_all:
|
||||
- "не найден"
|
||||
- "документ"
|
||||
|
||||
- id: full-t05-errors-catalog-find-files
|
||||
query: "В каком файле лежит каталог ошибок?"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: FIND_FILES
|
||||
route:
|
||||
anchors:
|
||||
target_doc_hints_contains:
|
||||
- "docs/errors/catalog.yaml"
|
||||
retrieval_plan:
|
||||
profile: file_lookup
|
||||
rag:
|
||||
paths_contains:
|
||||
- "docs/errors/catalog.yaml"
|
||||
pipeline:
|
||||
answer_mode: deterministic
|
||||
llm:
|
||||
non_empty: true
|
||||
contains_all:
|
||||
- "docs/errors/catalog.yaml"
|
||||
|
||||
- id: full-t06-docs-index-find-files
|
||||
query: "Найди файл-индекс документации проекта"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: FIND_FILES
|
||||
route:
|
||||
anchors:
|
||||
target_doc_hints_contains:
|
||||
- "docs/README.md"
|
||||
retrieval_plan:
|
||||
profile: file_lookup
|
||||
rag:
|
||||
paths_contains:
|
||||
- "docs/README.md"
|
||||
pipeline:
|
||||
answer_mode: deterministic
|
||||
llm:
|
||||
non_empty: true
|
||||
contains_all:
|
||||
- "docs/README.md"
|
||||
|
||||
- id: full-t07-mixed-generic-summary
|
||||
query: "Как архитектурно устроен endpoint /send"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: SUMMARY
|
||||
route:
|
||||
anchors:
|
||||
endpoint_paths_contains:
|
||||
- "/send"
|
||||
retrieval_plan:
|
||||
profile: docs_summary_generic
|
||||
rag:
|
||||
paths_contains:
|
||||
- "docs/architecture/telegram-notify-app-overview.md"
|
||||
pipeline:
|
||||
answer_mode: grounded_summary
|
||||
llm:
|
||||
non_empty: true
|
||||
contains_any:
|
||||
- ["Telegram", "/send"]
|
||||
- ["архитект", "endpoint"]
|
||||
|
||||
- id: full-t08-health-boundary
|
||||
query: "Объясни endpoint /health"
|
||||
expected:
|
||||
router:
|
||||
domain: DOCS
|
||||
intent: DOC_EXPLAIN
|
||||
sub_intent: SUMMARY
|
||||
route:
|
||||
anchors:
|
||||
endpoint_paths_contains:
|
||||
- "/health"
|
||||
file_names_not_contains:
|
||||
- "/health"
|
||||
retrieval_plan:
|
||||
profile: docs_summary_api_endpoint
|
||||
rag:
|
||||
row_count: 0
|
||||
pipeline:
|
||||
answer_mode: insufficient_evidence
|
||||
llm:
|
||||
non_empty: true
|
||||
contains_all:
|
||||
- "не найден"
|
||||
- "документ"
|
||||
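The `llm` blocks above use `contains_all` with a flat token list and `contains_any` with a list of token lists. The sketch below shows one plausible reading, using case-insensitive substring matching. It assumes that each inner list of `contains_any` is a variant whose tokens must all appear; the helper that defines this behaviour is not part of this excerpt, and the function names here are hypothetical.

```python
def missing_tokens(answer: str, tokens: list[str]) -> list[str]:
    """Return the tokens that do not occur in the answer, ignoring case."""
    lowered = answer.lower()
    return [token for token in tokens if token.lower() not in lowered]


def matches_any_variant(answer: str, variants: list[list[str]]) -> bool:
    """True when at least one variant has all of its tokens in the answer.

    ASSUMPTION: a variant matches only when every token in it occurs.
    """
    return any(not missing_tokens(answer, variant) for variant in variants)
```

With this reading, `contains_all` passes when `missing_tokens` returns an empty list, and `contains_any` passes when `matches_any_variant` returns `True`.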
@@ -64,8 +64,8 @@ class ArtifactWriter:
|
||||
f"- source_file: {result.case.source_file.as_posix()}",
|
||||
f"- passed: {result.passed}",
|
||||
"",
|
||||
"## Query",
|
||||
result.case.query,
|
||||
"## Input",
|
||||
result.case.display_input,
|
||||
"",
|
||||
"## Actual",
|
||||
"```json",
|
||||
@@ -96,7 +96,7 @@ class SummaryComposer:
|
||||
]
|
||||
for item in results:
|
||||
lines.append(
|
||||
f"| {item.case.case_id} | {item.case.component} | {self._cell(item.case.query)} | "
|
||||
f"| {item.case.case_id} | {item.case.component} | {self._cell(item.case.display_input)} | "
|
||||
f"{item.actual.get('intent') or '—'} | {item.actual.get('sub_intent') or '—'} | "
|
||||
f"{'✓' if item.passed else '✗'} |"
|
||||
)
|
||||
|
||||
@@ -4,7 +4,7 @@ from pathlib import Path
|
||||
|
||||
import yaml
|
||||
|
||||
from tests.pipeline_setup_v4.core.models import CaseExpectations, RouterExpectation, V4Case
|
||||
from tests.pipeline_setup_v4.core.models import CaseExpectations, RetrievalPlanExpectation, RouterExpectation, V4Case
|
||||
|
||||
|
||||
class CaseDirectoryLoader:
|
||||
@@ -35,13 +35,28 @@ class CaseDirectoryLoader:
|
||||
case_id = str(raw.get("id") or "").strip()
|
||||
component = str(raw.get("component") or defaults.get("component") or "").strip()
|
||||
query = str(raw.get("query") or "").strip()
|
||||
if not case_id or not component or not query:
|
||||
raise ValueError(f"Invalid case in {path}: `id`, `component`, `query` are required")
|
||||
rag_session_id = str(raw.get("rag_session_id") or defaults.get("rag_session_id") or "").strip() or None
|
||||
route = dict(raw.get("route") or {})
|
||||
if not route and isinstance(defaults.get("route"), dict):
|
||||
route = dict(defaults.get("route") or {})
|
||||
if not case_id or not component:
|
||||
raise ValueError(f"Invalid case in {path}: `id` and `component` are required")
|
||||
if component in {
|
||||
"process_v2_intent_router",
|
||||
"process_v2_router_plus_retrieval_policy",
|
||||
"process_v2_router_plus_retrieval_policy_rag",
|
||||
"process_v2_full_chain",
|
||||
} and not query:
|
||||
raise ValueError(f"Invalid case in {path}: `query` is required for {component}")
|
||||
if component == "process_v2_retrieval_policy_resolver" and not route:
|
||||
raise ValueError(f"Invalid case in {path}: `route` is required for {component}")
|
||||
expected = dict(raw.get("expected") or {})
|
||||
return V4Case(
|
||||
case_id=case_id,
|
||||
component=component, # type: ignore[arg-type]
|
||||
query=query,
|
||||
rag_session_id=rag_session_id,
|
||||
route=route,
|
||||
source_file=path,
|
||||
expectations=self._to_expectations(expected),
|
||||
notes=str(raw.get("notes") or ""),
|
||||
@@ -50,10 +65,38 @@ class CaseDirectoryLoader:
|
||||
|
||||
def _to_expectations(self, raw: dict) -> CaseExpectations:
|
||||
router = dict(raw.get("router") or {})
|
||||
route = dict(raw.get("route") or {})
|
||||
retrieval_plan = dict(raw.get("retrieval_plan") or raw.get("plan") or {})
|
||||
rag = dict(raw.get("rag") or {})
|
||||
pipeline = dict(raw.get("pipeline") or {})
|
||||
llm = dict(raw.get("llm") or {})
|
||||
return CaseExpectations(
|
||||
router=RouterExpectation(
|
||||
domain=str(router.get("domain") or "").strip() or None,
|
||||
intent=str(router.get("intent") or "").strip() or None,
|
||||
sub_intent=str(router.get("sub_intent") or "").strip() or None,
|
||||
)
|
||||
),
|
||||
retrieval_plan=RetrievalPlanExpectation(
|
||||
profile=str(retrieval_plan.get("profile") or "").strip() or None,
|
||||
layers=tuple(str(item).strip() for item in retrieval_plan.get("layers") or [] if str(item).strip()),
|
||||
limit=int(retrieval_plan["limit"]) if retrieval_plan.get("limit") is not None else None,
|
||||
filters=self._plain_mapping(dict(retrieval_plan.get("filters") or {})),
|
||||
),
|
||||
route_assertions=route,
|
||||
retrieval_plan_assertions=retrieval_plan,
|
||||
rag_assertions=rag,
|
||||
pipeline_assertions=pipeline,
|
||||
llm_assertions=llm,
|
||||
)
|
||||
|
||||
def _plain_mapping(self, raw: dict[str, object]) -> dict[str, object]:
|
||||
plain: dict[str, object] = {}
|
||||
for key, value in raw.items():
|
||||
if self._is_assertion_key(key) or value in ("present", "absent"):
|
||||
continue
|
||||
plain[key] = value
|
||||
return plain
|
||||
|
||||
def _is_assertion_key(self, key: str) -> bool:
|
||||
suffixes = ("_not_contains", "_contains_any", "_contains", "_equals_any", "_one_of")
|
||||
return any(key.endswith(suffix) for suffix in suffixes)
|
||||
|
||||
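The loader above keeps only plain filter values in the typed expectation and leaves suffix-keyed entries and `present` / `absent` markers to the assertion pass. A minimal standalone sketch of that split is shown below. `split_filters` and the sample keys are illustrative assumptions; only the suffix list is taken from the code above.

```python
ASSERTION_SUFFIXES = ("_not_contains", "_contains_any", "_contains", "_equals_any", "_one_of")
MARKERS = ("present", "absent")


def split_filters(raw: dict[str, object]) -> tuple[dict[str, object], dict[str, object]]:
    """Separate plain expected values from assertion-style entries (illustrative only)."""
    plain: dict[str, object] = {}
    assertions: dict[str, object] = {}
    for key, value in raw.items():
        # Markers are compared only for string values, so lists and
        # mappings pass through as plain values.
        is_marker = isinstance(value, str) and value in MARKERS
        if key.endswith(ASSERTION_SUFFIXES) or is_marker:
            assertions[key] = value
        else:
            plain[key] = value
    return plain, assertions
```

Plain values can then be compared as an exact subset, while the assertion entries are interpreted by their suffix or marker.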
@@ -5,7 +5,13 @@ from pathlib import Path
|
||||
from typing import Literal
|
||||
|
||||
|
||||
ComponentKind = Literal["process_v2_intent_router"]
|
||||
ComponentKind = Literal[
|
||||
"process_v2_intent_router",
|
||||
"process_v2_retrieval_policy_resolver",
|
||||
"process_v2_router_plus_retrieval_policy",
|
||||
"process_v2_router_plus_retrieval_policy_rag",
|
||||
"process_v2_full_chain",
|
||||
]
|
||||
|
||||
|
||||
@dataclass(slots=True, frozen=True)
|
||||
@@ -15,21 +21,41 @@ class RouterExpectation:
|
||||
sub_intent: str | None = None
|
||||
|
||||
|
||||
@dataclass(slots=True, frozen=True)
|
||||
class RetrievalPlanExpectation:
|
||||
profile: str | None = None
|
||||
layers: tuple[str, ...] = ()
|
||||
limit: int | None = None
|
||||
filters: dict[str, object] = field(default_factory=dict)
|
||||
|
||||
|
||||
@dataclass(slots=True, frozen=True)
|
||||
class CaseExpectations:
|
||||
router: RouterExpectation = RouterExpectation()
|
||||
retrieval_plan: RetrievalPlanExpectation = field(default_factory=RetrievalPlanExpectation)
|
||||
route_assertions: dict[str, object] = field(default_factory=dict)
|
||||
retrieval_plan_assertions: dict[str, object] = field(default_factory=dict)
|
||||
rag_assertions: dict[str, object] = field(default_factory=dict)
|
||||
pipeline_assertions: dict[str, object] = field(default_factory=dict)
|
||||
llm_assertions: dict[str, object] = field(default_factory=dict)
|
||||
|
||||
|
||||
@dataclass(slots=True, frozen=True)
|
||||
class V4Case:
|
||||
case_id: str
|
||||
component: ComponentKind
|
||||
query: str
|
||||
source_file: Path
|
||||
expectations: CaseExpectations = CaseExpectations()
|
||||
query: str = ""
|
||||
rag_session_id: str | None = None
|
||||
route: dict[str, object] = field(default_factory=dict)
|
||||
expectations: CaseExpectations = field(default_factory=CaseExpectations)
|
||||
notes: str = ""
|
||||
tags: tuple[str, ...] = ()
|
||||
|
||||
@property
|
||||
def display_input(self) -> str:
|
||||
return self.query or str(self.route.get("user_query") or "") or self.case_id
|
||||
|
||||
|
||||
@dataclass(slots=True, frozen=True)
|
||||
class ExecutionPayload:
|
||||
|
||||
@@ -1,17 +1,249 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from collections.abc import Mapping, Sequence
|
||||
|
||||
from tests.pipeline_setup_v4.core.models import V4Case
|
||||
|
||||
|
||||
class CaseValidator:
|
||||
def validate(self, case: V4Case, actual: dict) -> list[str]:
|
||||
if case.component == "process_v2_intent_router":
|
||||
return self._validate_router(case, actual)
|
||||
if case.component == "process_v2_retrieval_policy_resolver":
|
||||
return self._validate_retrieval_plan(case, actual)
|
||||
if case.component == "process_v2_router_plus_retrieval_policy":
|
||||
return self._validate_router(case, actual) + self._validate_retrieval_plan(case, actual)
|
||||
if case.component == "process_v2_router_plus_retrieval_policy_rag":
|
||||
return self._validate_router(case, actual) + self._validate_retrieval_plan(case, actual) + self._validate_rag(case, actual)
|
||||
if case.component == "process_v2_full_chain":
|
||||
return (
|
||||
self._validate_router(case, actual)
|
||||
+ self._validate_retrieval_plan(case, actual)
|
||||
+ self._validate_rag(case, actual)
|
||||
+ self._validate_pipeline(case, actual)
|
||||
+ self._validate_llm(case, actual)
|
||||
)
|
||||
return [f"unsupported component for validation: {case.component}"]
|
||||
|
||||
def _validate_router(self, case: V4Case, actual: dict) -> list[str]:
|
||||
mismatches: list[str] = []
|
||||
expected = case.expectations.router
|
||||
self._check(expected.domain, actual.get("domain"), "domain", mismatches)
|
||||
self._check(expected.intent, actual.get("intent"), "intent", mismatches)
|
||||
self._check(expected.sub_intent, actual.get("sub_intent"), "sub_intent", mismatches)
|
||||
self._check_scalar(expected.domain, actual.get("domain"), "domain", mismatches)
|
||||
self._check_scalar(expected.intent, actual.get("intent"), "intent", mismatches)
|
||||
self._check_scalar(expected.sub_intent, actual.get("sub_intent"), "sub_intent", mismatches)
|
||||
route_actual = actual.get("route")
|
||||
if isinstance(route_actual, Mapping):
|
||||
self._check_assertions(case.expectations.route_assertions, route_actual, "route", mismatches)
|
||||
return mismatches
|
||||
|
||||
def _check(self, expected: str | None, actual: object, label: str, mismatches: list[str]) -> None:
|
||||
def _validate_retrieval_plan(self, case: V4Case, actual: dict) -> list[str]:
|
||||
mismatches: list[str] = []
|
||||
expected = case.expectations.retrieval_plan
|
||||
self._check_scalar(expected.profile, actual.get("profile"), "profile", mismatches)
|
||||
if expected.layers:
|
||||
self._check_scalar(list(expected.layers), actual.get("layers"), "layers", mismatches)
|
||||
self._check_scalar(expected.limit, actual.get("limit"), "limit", mismatches)
|
||||
self._check_subset(expected.filters, actual.get("filters"), "filters", mismatches)
|
||||
plan_actual = actual.get("retrieval_plan")
|
||||
if isinstance(plan_actual, Mapping):
|
||||
self._check_assertions(case.expectations.retrieval_plan_assertions, plan_actual, "retrieval_plan", mismatches)
|
||||
self._check_conditional_filter_assertions(case.expectations.retrieval_plan_assertions, actual, mismatches)
|
||||
return mismatches
|
||||
|
||||
def _validate_rag(self, case: V4Case, actual: dict) -> list[str]:
|
||||
mismatches: list[str] = []
|
||||
rag_actual = actual.get("rag")
|
||||
if isinstance(rag_actual, Mapping):
|
||||
self._check_assertions(case.expectations.rag_assertions, rag_actual, "rag", mismatches)
|
||||
elif case.expectations.rag_assertions:
|
||||
mismatches.append("rag: expected mapping, got missing")
|
||||
return mismatches
|
||||
|
||||
def _validate_pipeline(self, case: V4Case, actual: dict) -> list[str]:
|
||||
mismatches: list[str] = []
|
||||
pipeline_actual = actual.get("pipeline")
|
||||
if isinstance(pipeline_actual, Mapping):
|
||||
self._check_assertions(case.expectations.pipeline_assertions, pipeline_actual, "pipeline", mismatches)
|
||||
elif case.expectations.pipeline_assertions:
|
||||
mismatches.append("pipeline: expected mapping, got missing")
|
||||
return mismatches
|
||||
|
||||
def _validate_llm(self, case: V4Case, actual: dict) -> list[str]:
|
||||
mismatches: list[str] = []
|
||||
expected = case.expectations.llm_assertions
|
||||
if not expected:
|
||||
return mismatches
|
||||
llm_actual = actual.get("llm")
|
||||
if not isinstance(llm_actual, Mapping):
|
||||
mismatches.append("llm: expected mapping, got missing")
|
||||
return mismatches
|
||||
answer = str(llm_actual.get("answer") or "")
|
||||
lowered = answer.lower()
|
||||
if "non_empty" in expected:
|
||||
want_non_empty = bool(expected.get("non_empty"))
|
||||
if want_non_empty and not answer.strip():
|
||||
mismatches.append("llm.non_empty: expected non-empty answer")
|
||||
if not want_non_empty and answer.strip():
|
||||
mismatches.append("llm.non_empty: expected empty answer")
|
||||
if "contains_all" in expected:
|
||||
missing = [token for token in self._string_list(expected.get("contains_all")) if token.lower() not in lowered]
|
||||
if missing:
|
||||
mismatches.append(f"llm.contains_all: missing {missing}")
|
||||
if "contains_any" in expected and not self._matches_contains_any(lowered, expected.get("contains_any")):
|
||||
mismatches.append(f"llm.contains_any: no expected variant matched answer '{answer[:200]}'")
|
||||
for key, value in expected.items():
|
||||
if key in {"non_empty", "contains_all", "contains_any"}:
|
||||
continue
|
||||
if key not in llm_actual:
|
||||
mismatches.append(f"llm.{key}: missing")
|
||||
continue
|
||||
self._check_assertions(value, llm_actual.get(key), f"llm.{key}", mismatches)
|
||||
return mismatches
|
||||
|
||||
def _check_scalar(self, expected: object, actual: object, label: str, mismatches: list[str]) -> None:
|
||||
if expected is not None and expected != actual:
|
||||
mismatches.append(f"{label}: expected {expected}, got {actual}")
|
||||
|
||||
def _check_subset(self, expected: object, actual: object, label: str, mismatches: list[str]) -> None:
|
||||
if expected in (None, {}, []):
|
||||
return
|
||||
if isinstance(expected, Mapping):
|
||||
if not isinstance(actual, Mapping):
|
||||
mismatches.append(f"{label}: expected dict subset, got {actual}")
|
||||
return
|
||||
for key, value in expected.items():
|
||||
next_label = f"{label}.{key}"
|
||||
if key not in actual:
|
||||
mismatches.append(f"{next_label}: missing")
|
||||
continue
|
||||
self._check_subset(value, actual.get(key), next_label, mismatches)
|
||||
return
|
||||
if expected != actual:
|
||||
mismatches.append(f"{label}: expected {expected}, got {actual}")
|
||||
|
||||
def _check_assertions(self, expected: object, actual: object, label: str, mismatches: list[str]) -> None:
|
||||
if expected in (None, {}, []):
|
||||
return
|
||||
if not isinstance(expected, Mapping):
|
||||
self._check_scalar(expected, actual, label, mismatches)
|
||||
return
|
||||
if not isinstance(actual, Mapping):
|
||||
mismatches.append(f"{label}: expected mapping, got {actual}")
|
||||
return
|
||||
for key, value in expected.items():
|
||||
if key == "if_anchor_present_then_filter_present":
|
||||
continue
|
||||
if key.endswith("_not_contains"):
|
||||
self._assert_not_contains(actual.get(key.removesuffix("_not_contains")), value, f"{label}.{key}", mismatches)
|
||||
continue
|
||||
if key.endswith("_contains"):
|
||||
self._assert_contains(actual.get(key.removesuffix("_contains")), value, f"{label}.{key}", mismatches)
|
||||
continue
|
||||
if key.endswith("_contains_any"):
|
||||
self._assert_contains_any(actual.get(key.removesuffix("_contains_any")), value, f"{label}.{key}", mismatches)
|
||||
continue
|
||||
if key.endswith("_equals_any"):
|
||||
self._assert_equals_any(actual.get(key.removesuffix("_equals_any")), value, f"{label}.{key}", mismatches)
|
||||
continue
|
||||
if key.endswith("_one_of"):
|
||||
self._assert_equals_any(actual.get(key.removesuffix("_one_of")), value, f"{label}.{key}", mismatches)
|
||||
continue
|
||||
if value == "present":
|
||||
                self._assert_present(actual.get(key), f"{label}.{key}", mismatches)
                continue
            if value == "absent":
                self._assert_absent(actual, key, f"{label}.{key}", mismatches)
                continue
            if key not in actual:
                mismatches.append(f"{label}.{key}: missing")
                continue
            self._check_assertions(value, actual.get(key), f"{label}.{key}", mismatches)

    def _assert_contains(self, actual: object, expected: object, label: str, mismatches: list[str]) -> None:
        actual_list = self._as_list(actual)
        expected_list = self._as_list(expected)
        missing = [item for item in expected_list if item not in actual_list]
        if missing:
            mismatches.append(f"{label}: missing {missing}, got {actual_list}")

    def _assert_not_contains(self, actual: object, expected: object, label: str, mismatches: list[str]) -> None:
        actual_list = self._as_list(actual)
        expected_list = self._as_list(expected)
        present = [item for item in expected_list if item in actual_list]
        if present:
            mismatches.append(f"{label}: unexpected {present}, got {actual_list}")

    def _assert_contains_any(self, actual: object, expected: object, label: str, mismatches: list[str]) -> None:
        actual_list = self._as_list(actual)
        expected_list = self._as_list(expected)
        if not any(item in actual_list for item in expected_list):
            mismatches.append(f"{label}: expected any of {expected_list}, got {actual_list}")

    def _assert_equals_any(self, actual: object, expected: object, label: str, mismatches: list[str]) -> None:
        expected_list = self._as_list(expected)
        if actual not in expected_list:
            mismatches.append(f"{label}: expected any of {expected_list}, got {actual}")

    def _assert_present(self, actual: object, label: str, mismatches: list[str]) -> None:
        if actual is None or actual == "" or actual == [] or actual == {}:
            mismatches.append(f"{label}: expected present, got {actual}")

    def _assert_absent(self, actual: Mapping, key: str, label: str, mismatches: list[str]) -> None:
        if key in actual and actual.get(key) not in (None, "", [], {}):
            mismatches.append(f"{label}: expected absent, got {actual.get(key)}")

    def _check_conditional_filter_assertions(self, expected: object, actual: Mapping, mismatches: list[str]) -> None:
        if not isinstance(expected, Mapping):
            return
        rules = expected.get("if_anchor_present_then_filter_present")
        if not isinstance(rules, Sequence) or isinstance(rules, (str, bytes, bytearray)):
            return
        for idx, rule in enumerate(rules):
            if not isinstance(rule, Mapping):
                continue
            anchor_path = str(rule.get("anchor") or "").strip()
            filter_path = str(rule.get("filter") or "").strip()
            if not anchor_path or not filter_path:
                continue
            anchor_value = self._resolve_path(actual.get("route"), anchor_path)
            if anchor_value in (None, "", [], {}):
                continue
            filter_value = self._resolve_path(actual.get("retrieval_plan"), filter_path)
            if filter_value in (None, "", [], {}):
                mismatches.append(
                    f"conditional[{idx}]: expected {filter_path} present because {anchor_path} is present"
                )

    def _resolve_path(self, value: object, path: str) -> object:
        current = value
        parts = [item for item in path.split(".") if item]
        for idx, part in enumerate(parts):
            if not isinstance(current, Mapping):
                return None
            remainder = ".".join(parts[idx:])
            if remainder in current:
                return current.get(remainder)
            if part not in current:
                return None
            current = current.get(part)
        return current

    def _as_list(self, value: object) -> list[object]:
        if value is None:
            return []
        if isinstance(value, Sequence) and not isinstance(value, (str, bytes, bytearray)):
            return list(value)
        return [value]

    def _string_list(self, value: object) -> list[str]:
        return [str(item) for item in self._as_list(value) if str(item).strip()]

    def _matches_contains_any(self, lowered_answer: str, expected: object) -> bool:
        variants = self._as_list(expected)
        for variant in variants:
            tokens = self._string_list(variant)
            if not tokens:
                continue
            if all(token.lower() in lowered_answer for token in tokens):
                return True
        return False

@@ -0,0 +1,121 @@
"""Run full `process v2` flow in the v4 harness.

This module adapts the existing v3 `V2ProcessAdapter` so pipeline_setup_v4 can
execute the real route -> retrieval -> evidence -> workflow LLM chain without
duplicating runtime logic.
"""

from __future__ import annotations

from tests.pipeline_setup_v3.core.models import CaseExpectations, CaseInput, V3Case
from tests.pipeline_setup_v3.runtime.v2_process_adapter import V2ProcessAdapter
from tests.pipeline_setup_v4.core.models import ExecutionPayload, V4Case


class ProcessV2FullChainExecutor:
    def __init__(self) -> None:
        self._adapter = V2ProcessAdapter(workflow_llm_enabled=True)

    def execute(self, case: V4Case) -> ExecutionPayload:
        if not case.rag_session_id:
            raise ValueError(f"Case '{case.case_id}' requires rag_session_id")
        payload = self._adapter.execute(self._build_case(case), case.rag_session_id)
        route = dict(payload.details.get("router_result") or {})
        retrieval_plan = dict(payload.details.get("retrieval_plan") or {})
        rows = list(payload.details.get("rows") or [])
        rag_summary = _summarize_rows(rows)
        pipeline_steps = list(payload.details.get("pipeline_steps") or [])
        pipeline_summary = {
            "answer_mode": str(payload.actual.get("answer_mode") or ""),
            "workflow_llm_enabled": True,
            "step_count": len(pipeline_steps),
            "steps": [str(step.get("step") or "") for step in pipeline_steps if str(step.get("step") or "").strip()],
        }
        answer = str(payload.details.get("answer") or payload.actual.get("llm_answer") or "")
        actual = {
            "domain": payload.actual.get("domain"),
            "intent": payload.actual.get("intent"),
            "sub_intent": payload.actual.get("sub_intent"),
            "profile": retrieval_plan.get("profile"),
            "layers": list(retrieval_plan.get("layers") or []),
            "limit": retrieval_plan.get("limit"),
            "filters": dict(retrieval_plan.get("filters") or {}),
            "answer_mode": payload.actual.get("answer_mode"),
            "route": {
                "routing_domain": route.get("routing_domain"),
                "intent": route.get("intent"),
                "subintent": route.get("subintent"),
                "target_terms": list(route.get("target_terms") or []),
                "anchors": dict(route.get("anchors") or {}),
            },
            "retrieval_plan": {
                "profile": retrieval_plan.get("profile"),
                "layers": list(retrieval_plan.get("layers") or []),
                "limit": retrieval_plan.get("limit"),
                "filters": dict(retrieval_plan.get("filters") or {}),
            },
            "rag": rag_summary,
            "pipeline": pipeline_summary,
            "llm": {
                "answer": answer,
                "non_empty": bool(answer.strip()),
                "length": len(answer),
            },
        }
        details = {
            "query": case.query,
            "rag_session_id": case.rag_session_id,
            "route": route,
            "retrieval_plan": actual["retrieval_plan"],
            "rag": {
                **rag_summary,
                "rows": rows[:20],
            },
            "pipeline": pipeline_summary,
            "answer": answer,
            "pipeline_steps": pipeline_steps,
            "logs": list(payload.details.get("logs") or []),
            "evidence": dict(payload.details.get("evidence") or {}),
        }
        return ExecutionPayload(actual=actual, details=details)

    def _build_case(self, case: V4Case) -> V3Case:
        return V3Case(
            case_id=case.case_id,
            runner="process_v2",
            mode="full_chain",
            query=case.query,
            source_file=case.source_file,
            input=CaseInput(rag_session_id=case.rag_session_id),
            expectations=CaseExpectations(),
            notes=case.notes,
            tags=case.tags,
        )


def _summarize_rows(rows: list[dict]) -> dict[str, object]:
    paths: list[str] = []
    layers: list[str] = []
    metadata_domains: list[str] = []
    metadata_subdomains: list[str] = []
    for row in rows:
        path = str(row.get("path") or "").strip()
        layer = str(row.get("layer") or "").strip()
        metadata = dict(row.get("metadata") or {})
        domain = str(metadata.get("domain") or "").strip()
        subdomain = str(metadata.get("subdomain") or "").strip()
        if path and path not in paths:
            paths.append(path)
        if layer and layer not in layers:
            layers.append(layer)
        if domain and domain not in metadata_domains:
            metadata_domains.append(domain)
        if subdomain and subdomain not in metadata_subdomains:
            metadata_subdomains.append(subdomain)
    return {
        "row_count": len(rows),
        "paths": paths,
        "layers": layers,
        "metadata_domains": metadata_domains,
        "metadata_subdomains": metadata_subdomains,
    }
@@ -0,0 +1,51 @@
from __future__ import annotations

from dataclasses import asdict

from app.core.agent.processes.v2.models import V2RouteAnchors, V2RouteResult
from app.core.agent.processes.v2.retrieval.policy_resolver import V2RetrievalPolicyResolver
from tests.pipeline_setup_v4.core.models import ExecutionPayload, V4Case


class ProcessV2RetrievalPolicyExecutor:
    def __init__(self) -> None:
        self._resolver = V2RetrievalPolicyResolver()

    def execute(self, case: V4Case) -> ExecutionPayload:
        route = self._build_route(case.route)
        plan = self._resolver.resolve(route)
        actual = {
            "profile": plan.profile,
            "layers": list(plan.layers),
            "limit": plan.limit,
            "filters": dict(plan.filters),
        }
        details = {
            "route": asdict(route),
            "plan": actual,
        }
        return ExecutionPayload(actual=actual, details=details)

    def _build_route(self, raw: dict[str, object]) -> V2RouteResult:
        anchors_raw = dict(raw.get("anchors") or {})
        return V2RouteResult(
            routing_domain=str(raw.get("routing_domain") or ""),
            intent=str(raw.get("intent") or ""),
            subintent=str(raw.get("subintent") or ""),
            user_query=str(raw.get("user_query") or raw.get("normalized_query") or raw.get("name") or "resolver case"),
            normalized_query=str(raw.get("normalized_query") or raw.get("user_query") or "resolver case"),
            target_terms=[str(item) for item in raw.get("target_terms") or [] if str(item).strip()],
            anchors=V2RouteAnchors(
                entity_names=[str(item) for item in anchors_raw.get("entity_names") or [] if str(item).strip()],
                file_names=[str(item) for item in anchors_raw.get("file_names") or [] if str(item).strip()],
                endpoint_paths=[str(item) for item in anchors_raw.get("endpoint_paths") or [] if str(item).strip()],
                target_doc_hints=[str(item) for item in anchors_raw.get("target_doc_hints") or [] if str(item).strip()],
                matched_aliases=[str(item) for item in anchors_raw.get("matched_aliases") or [] if str(item).strip()],
                process_domain=str(anchors_raw.get("process_domain") or "").strip() or None,
                process_subdomain=str(anchors_raw.get("process_subdomain") or "").strip() or None,
            ),
            confidence=float(raw.get("confidence") or 1.0),
            routing_mode=str(raw.get("routing_mode") or "test_fixture"),
            llm_router_used=bool(raw.get("llm_router_used") or False),
            reason_short=str(raw.get("reason_short") or "fixture route"),
        )
@@ -22,13 +22,23 @@ class _KeywordLlm:
        "где находится",
        "найди файл",
        "найди файлы",
        "show doc",
        "show file",
        "doc for",
        "file with",
    )
    _DOC_MARKERS = (
        "документац",
        "endpoint",
        "эндпоинт",
        "архитект",
        "architecture",
        "overview архитектуры",
        "arch overview",
        "процесс",
        "process",
        "flow",
        "workflow",
        "сущност",
        "worker",
        "цикл отправки уведомлений",
@@ -43,6 +53,10 @@ class _KeywordLlm:
        "/health",
        "/send",
        "/actions/{action}",
        "billing invoice process",
        "billing invoice flow",
        "billing invoice docs",
        "notify app",
    )
    _GENERAL_MARKERS = (
        "что это за сервис",
@@ -67,7 +81,7 @@ class _KeywordLlm:
         return json.dumps(route, ensure_ascii=False)

     def _select(self, query: str) -> dict[str, object]:
-        if any(marker in query for marker in self._FILE_MARKERS) or ("дока" in query and "покажи" in query):
+        if any(marker in query for marker in self._FILE_MARKERS) or ("дока" in query and "покажи" in query) or ".md" in query:
             return self._route("DOCS", "DOC_EXPLAIN", "FIND_FILES", "file lookup")
         if any(marker in query for marker in self._GENERAL_MARKERS):
             return self._route("GENERAL", "GENERAL_QA", "SUMMARY", "general overview")
@@ -0,0 +1,79 @@
from __future__ import annotations

from dataclasses import asdict

from app.core.agent.processes.v2 import V2IntentRouter
from app.core.agent.processes.v2.retrieval.policy_resolver import V2RetrievalPolicyResolver
from tests.pipeline_setup_v4.core.models import ExecutionPayload, V4Case
from tests.pipeline_setup_v4.executors.process_v2_router_executor import _KeywordLlm


class ProcessV2RouterPlusPolicyExecutor:
    def __init__(self) -> None:
        self._router = V2IntentRouter(llm=_KeywordLlm(), enable_llm_disambiguation=True)
        self._resolver = V2RetrievalPolicyResolver()

    def execute(self, case: V4Case) -> ExecutionPayload:
        route = self._router.route(case.query)
        plan = self._resolver.resolve(route)
        route_dump = asdict(route)
        actual = {
            "domain": route.routing_domain,
            "intent": route.intent,
            "sub_intent": route.subintent,
            "routing_mode": route.routing_mode,
            "llm_router_used": route.llm_router_used,
            "confidence": route.confidence,
            "profile": plan.profile,
            "layers": list(plan.layers),
            "limit": plan.limit,
            "filters": dict(plan.filters),
            "route": {
                "routing_domain": route.routing_domain,
                "intent": route.intent,
                "subintent": route.subintent,
                "target_terms": list(route.target_terms),
                "anchors": route_dump.get("anchors") or {},
            },
            "retrieval_plan": {
                "profile": plan.profile,
                "layers": list(plan.layers),
                "limit": plan.limit,
                "filters": dict(plan.filters),
            },
        }
        details = {
            "query": case.query,
            "route": route_dump,
            "plan": {
                "profile": plan.profile,
                "layers": list(plan.layers),
                "limit": plan.limit,
                "filters": dict(plan.filters),
            },
            "pipeline_steps": [
                {
                    "step": "intent_router",
                    "input": {"query": case.query},
                    "output": {
                        "domain": route.routing_domain,
                        "intent": route.intent,
                        "sub_intent": route.subintent,
                        "reason_short": route.reason_short,
                        "target_terms": list(route.target_terms),
                        "anchors": route_dump.get("anchors") or {},
                    },
                },
                {
                    "step": "retrieval_policy_resolver",
                    "input": {"route": route_dump},
                    "output": {
                        "profile": plan.profile,
                        "layers": list(plan.layers),
                        "limit": plan.limit,
                        "filters": dict(plan.filters),
                    },
                },
            ],
        }
        return ExecutionPayload(actual=actual, details=details)
@@ -0,0 +1,94 @@
from __future__ import annotations

import asyncio
from dataclasses import asdict

from app.core.agent.processes.v2 import V2IntentRouter
from app.core.agent.processes.v2.retrieval.policy_resolver import V2RetrievalPolicyResolver
from app.core.agent.processes.v2.retrieval.v2_rag_adapter import V2RagRetrievalAdapter
from app.core.rag.persistence.repository import RagRepository
from app.core.rag.retrieval.session_retriever import RagSessionRetriever
from tests.pipeline_setup_v3.shared.rag_indexer import DeterministicEmbedder
from tests.pipeline_setup_v4.core.models import ExecutionPayload, V4Case
from tests.pipeline_setup_v4.executors.process_v2_router_executor import _KeywordLlm


class ProcessV2RouterPlusPolicyRagExecutor:
    def __init__(self) -> None:
        self._router = V2IntentRouter(llm=_KeywordLlm(), enable_llm_disambiguation=True)
        self._resolver = V2RetrievalPolicyResolver()
        self._adapter = V2RagRetrievalAdapter(RagSessionRetriever(RagRepository(), DeterministicEmbedder()))

    def execute(self, case: V4Case) -> ExecutionPayload:
        if not case.rag_session_id:
            raise ValueError(f"Case '{case.case_id}' requires rag_session_id")
        return asyncio.run(self._execute_async(case))

    async def _execute_async(self, case: V4Case) -> ExecutionPayload:
        route = self._router.route(case.query)
        plan = self._resolver.resolve(route)
        rows = await self._adapter.fetch_rows(case.rag_session_id or "", route.normalized_query, plan)
        route_dump = asdict(route)
        rag_summary = _summarize_rows(rows)
        actual = {
            "domain": route.routing_domain,
            "intent": route.intent,
            "sub_intent": route.subintent,
            "profile": plan.profile,
            "layers": list(plan.layers),
            "limit": plan.limit,
            "filters": dict(plan.filters),
            "route": {
                "routing_domain": route.routing_domain,
                "intent": route.intent,
                "subintent": route.subintent,
                "target_terms": list(route.target_terms),
                "anchors": route_dump.get("anchors") or {},
            },
            "retrieval_plan": {
                "profile": plan.profile,
                "layers": list(plan.layers),
                "limit": plan.limit,
                "filters": dict(plan.filters),
            },
            "rag": rag_summary,
        }
        details = {
            "query": case.query,
            "rag_session_id": case.rag_session_id,
            "route": route_dump,
            "plan": actual["retrieval_plan"],
            "rag": {
                **rag_summary,
                "rows": rows[:20],
            },
        }
        return ExecutionPayload(actual=actual, details=details)


def _summarize_rows(rows: list[dict]) -> dict[str, object]:
    paths: list[str] = []
    layers: list[str] = []
    metadata_domains: list[str] = []
    metadata_subdomains: list[str] = []
    for row in rows:
        path = str(row.get("path") or "").strip()
        layer = str(row.get("layer") or "").strip()
        metadata = dict(row.get("metadata") or {})
        domain = str(metadata.get("domain") or "").strip()
        subdomain = str(metadata.get("subdomain") or "").strip()
        if path and path not in paths:
            paths.append(path)
        if layer and layer not in layers:
            layers.append(layer)
        if domain and domain not in metadata_domains:
            metadata_domains.append(domain)
        if subdomain and subdomain not in metadata_subdomains:
            metadata_subdomains.append(subdomain)
    return {
        "row_count": len(rows),
        "paths": paths,
        "layers": layers,
        "metadata_domains": metadata_domains,
        "metadata_subdomains": metadata_subdomains,
    }
@@ -1,18 +1,56 @@
from __future__ import annotations

from tests.pipeline_setup_v4.executors.process_v2_full_chain_executor import ProcessV2FullChainExecutor
from tests.pipeline_setup_v4.executors.process_v2_retrieval_policy_executor import ProcessV2RetrievalPolicyExecutor
from tests.pipeline_setup_v4.executors.process_v2_router_plus_policy_executor import ProcessV2RouterPlusPolicyExecutor
from tests.pipeline_setup_v4.executors.process_v2_router_plus_policy_rag_executor import (
    ProcessV2RouterPlusPolicyRagExecutor,
)
from tests.pipeline_setup_v4.executors.process_v2_router_executor import ProcessV2IntentRouterExecutor


class ExecutorRegistry:
    def __init__(self) -> None:
        self._router_executor: ProcessV2IntentRouterExecutor | None = None
        self._policy_executor: ProcessV2RetrievalPolicyExecutor | None = None
        self._router_plus_policy_executor: ProcessV2RouterPlusPolicyExecutor | None = None
        self._router_plus_policy_rag_executor: ProcessV2RouterPlusPolicyRagExecutor | None = None
        self._full_chain_executor: ProcessV2FullChainExecutor | None = None

    def execute(self, component: str, case) -> object:
        if component == "process_v2_intent_router":
            return self._router().execute(case)
        if component == "process_v2_retrieval_policy_resolver":
            return self._policy().execute(case)
        if component == "process_v2_router_plus_retrieval_policy":
            return self._router_plus_policy().execute(case)
        if component == "process_v2_router_plus_retrieval_policy_rag":
            return self._router_plus_policy_rag().execute(case)
        if component == "process_v2_full_chain":
            return self._full_chain().execute(case)
        raise ValueError(f"Unsupported component: {component}")

    def _router(self) -> ProcessV2IntentRouterExecutor:
        if self._router_executor is None:
            self._router_executor = ProcessV2IntentRouterExecutor()
        return self._router_executor

    def _policy(self) -> ProcessV2RetrievalPolicyExecutor:
        if self._policy_executor is None:
            self._policy_executor = ProcessV2RetrievalPolicyExecutor()
        return self._policy_executor

    def _router_plus_policy(self) -> ProcessV2RouterPlusPolicyExecutor:
        if self._router_plus_policy_executor is None:
            self._router_plus_policy_executor = ProcessV2RouterPlusPolicyExecutor()
        return self._router_plus_policy_executor

    def _router_plus_policy_rag(self) -> ProcessV2RouterPlusPolicyRagExecutor:
        if self._router_plus_policy_rag_executor is None:
            self._router_plus_policy_rag_executor = ProcessV2RouterPlusPolicyRagExecutor()
        return self._router_plus_policy_rag_executor

    def _full_chain(self) -> ProcessV2FullChainExecutor:
        if self._full_chain_executor is None:
            self._full_chain_executor = ProcessV2FullChainExecutor()
        return self._full_chain_executor

@@ -78,3 +78,32 @@ def test_find_files_prefers_exact_path_match() -> None:

    assert files[0].path == "docs/domains/runtime-health-entity.md"
    assert files[0].match_reason in {"exact_path", "alias_match"}


def test_summary_ranking_penalizes_overview_doc_when_specific_api_doc_exists() -> None:
    rows = [
        {
            "path": "docs/overview/health-overview.md",
            "title": "Health overview",
            "content": "",
            "layer": "D1_DOCUMENT_CATALOG",
            "metadata": {"summary_text": "Navigation page with related docs.", "document_id": "docs.health_overview"},
        },
        {
            "path": "docs/api/health-endpoint.md",
            "title": "Health endpoint",
            "content": "",
            "layer": "D1_DOCUMENT_CATALOG",
            "metadata": {"summary_text": "GET /health returns runtime status.", "document_id": "api.health"},
        },
    ]
    route = _route(
        hints=["health", "/health", "health endpoint"],
        terms=["health"],
    )

    docs = DocsEvidenceAssembler().assemble_summaries(rows, route)

    assert docs[0].path == "docs/api/health-endpoint.md"
    assert docs[0].score_breakdown["specificity_boost"] > docs[1].score_breakdown["specificity_boost"]
    assert docs[1].score_breakdown["generic_penalty"] < 0

@@ -96,3 +96,38 @@ def test_router_reduces_confidence_for_short_vague_query() -> None:
    result = V2IntentRouter(llm=FakeLlm(_llm_response("GENERAL", "GENERAL_QA", "SUMMARY", confidence=0.8))).route("Что это?")

    assert result.confidence < 0.8


def test_router_routes_doc_path_to_find_files() -> None:
    result = V2IntentRouter(llm=FakeLlm(_llm_response("DOCS", "DOC_EXPLAIN", "SUMMARY"))).route("docs/api/health-endpoint.md")

    assert result.subintent == "FIND_FILES"
    assert result.anchors.file_names == ["docs/api/health-endpoint.md"]
    assert result.anchors.endpoint_paths == []


def test_router_routes_file_token_to_find_files() -> None:
    result = V2IntentRouter(llm=FakeLlm(_llm_response("DOCS", "DOC_EXPLAIN", "SUMMARY"))).route("health-endpoint.md")

    assert result.subintent == "FIND_FILES"
    assert result.anchors.file_names == ["health-endpoint.md"]
    assert result.anchors.endpoint_paths == []


def test_router_promotes_api_method_query_to_endpoint_specific_docs_summary() -> None:
    result = V2IntentRouter(llm=FakeLlm(_llm_response("DOCS", "DOC_EXPLAIN", "SUMMARY"))).route("Как работает метод health?")

    assert result.intent == "DOC_EXPLAIN"
    assert result.subintent == "SUMMARY"
    assert result.anchors.endpoint_paths == ["/health"]
    assert "docs/api/health-endpoint.md" in result.anchors.target_doc_hints


def test_router_keeps_short_api_like_token_as_strong_hint_without_explicit_path() -> None:
    result = V2IntentRouter(llm=FakeLlm(_llm_response("DOCS", "DOC_EXPLAIN", "SUMMARY"))).route("Что делает health?")

    assert result.intent == "DOC_EXPLAIN"
    assert result.subintent == "SUMMARY"
    assert result.anchors.endpoint_paths == []
    assert "health endpoint" in result.anchors.target_doc_hints
    assert "health" in result.target_terms

@@ -51,6 +51,7 @@ def test_file_names_accepts_real_doc_path() -> None:
    anchors = V2AnchorExtractor().extract("docs/api/health.md", terms).anchors

    assert anchors.file_names == ["docs/api/health.md"]
    assert anchors.endpoint_paths == []


def test_file_names_rejects_endpoint_path() -> None:
@@ -60,8 +61,63 @@ def test_file_names_rejects_endpoint_path() -> None:
    assert anchors.file_names == []


def test_target_terms_drop_noisy_english_file_words() -> None:
    analysis = V2TargetTermsExtractor().extract("pls show doc for /health")

    assert analysis.target_terms == ["/health"]


def test_doc_path_does_not_become_endpoint_path() -> None:
    analysis = V2TargetTermsExtractor().extract("docs/api/health-endpoint.md")

    assert analysis.endpoint_paths == []


def test_target_terms_drop_architecture_marker_words() -> None:
    analysis = V2TargetTermsExtractor().extract("Объясни architecture overview сервиса уведомлений")

    assert "объясни" not in analysis.target_terms
    assert "architecture" not in analysis.target_terms
    assert "overview" not in analysis.target_terms


def test_anchor_extractor_extracts_process_domain_and_subdomain() -> None:
    terms = V2TargetTermsExtractor().extract("Объясни billing invoice process")
    anchors = V2AnchorExtractor().extract("Объясни billing invoice process", terms).anchors

    assert anchors.process_domain == "billing"
    assert anchors.process_subdomain == "invoice"


def test_file_names_rejects_identifier_like_token() -> None:
    terms = V2TargetTermsExtractor().extract("telegram_notify")
    anchors = V2AnchorExtractor().extract("telegram_notify", terms).anchors

    assert anchors.file_names == []


def test_target_terms_extracts_api_like_anchor_from_method_query() -> None:
    analysis = V2TargetTermsExtractor().extract("Как работает метод health?")

    assert analysis.target_terms == ["/health", "health"]
    assert analysis.endpoint_paths == ["/health"]
    assert analysis.api_like_terms == ["health"]


def test_anchor_extractor_builds_endpoint_hints_for_short_api_like_query() -> None:
    terms = V2TargetTermsExtractor().extract("Что делает health?")
    anchors = V2AnchorExtractor().extract("Что делает health?", terms).anchors

    assert anchors.endpoint_paths == []
    assert "health" in anchors.target_doc_hints
    assert "/health" in anchors.target_doc_hints
    assert "health endpoint" in anchors.target_doc_hints


def test_anchor_extractor_keeps_templated_endpoint_for_docs_query() -> None:
    terms = V2TargetTermsExtractor().extract("Расскажи про endpoint /users/{id}")
    anchors = V2AnchorExtractor().extract("Расскажи про endpoint /users/{id}", terms).anchors

    assert anchors.endpoint_paths == ["/users/{id}"]
    assert "/users/{id}" in anchors.target_doc_hints
    assert "users endpoint" in anchors.target_doc_hints

@@ -284,3 +284,42 @@ def test_v2_process_can_disable_workflow_llm_for_general_summary() -> None:

    assert "агрегированный статус runtime" in result.answer
    assert llm.calls == []


def test_v2_process_prefers_canonical_health_doc_over_readme_for_method_query() -> None:
    llm = FakeLlm("Health explanation.")
    adapter = FakeRagAdapter(
        summary_rows=[
            {
                "path": "docs/README.md",
                "title": "README",
                "content": "",
                "layer": "D1_DOCUMENT_CATALOG",
                "metadata": {"summary_text": "General documentation index.", "document_id": "docs.readme"},
            },
            {
                "path": "docs/api/health-endpoint.md",
                "title": "Health endpoint",
                "content": "",
                "layer": "D1_DOCUMENT_CATALOG",
                "metadata": {
                    "summary_text": "GET /health returns aggregated runtime status.",
                    "document_id": "api.health",
                },
            },
        ],
        file_rows=[],
    )
    process = _v2_process(llm, adapter)
    runtime = _context("Как работает метод health?")

    result = asyncio.run(process.run(runtime))

    assert result.answer == "Health explanation."
    assert llm.calls
    assert "docs/api/health-endpoint.md" in llm.calls[0][1]
    assert "docs/README.md" not in llm.calls[0][1]
    pipeline_events = [payload for _, title, payload in runtime.trace.events if title == "retrieval_profile_selected"]
    assert pipeline_events[0]["profile"] == "docs_api_method_explain"
    evidence_events = [payload for _, title, payload in runtime.trace.events if title == "evidence_assembled"]
    assert any(event.get("primary_doc") == "docs/api/health-endpoint.md" for event in evidence_events if isinstance(event, dict))

@@ -0,0 +1,81 @@
from __future__ import annotations

import asyncio

from app.core.agent.processes.v2.retrieval.v2_rag_adapter import V2RagRetrievalAdapter
from app.core.rag.retrieval.session_retriever import RetrievalPlan


class FakeRetriever:
    def __init__(self) -> None:
        self.calls: list[tuple[str, object]] = []

    async def retrieve(self, _rag_session_id: str, _query_text: str, _plan: RetrievalPlan) -> list[dict]:
        self.calls.append(("semantic", None))
        return [
            {
                "path": "docs/api/health-endpoint.md",
                "layer": "D1_DOCUMENT_CATALOG",
                "metadata": {},
            },
            {
                "path": "docs/api/secondary.md",
                "layer": "D0_DOC_CHUNKS",
                "metadata": {},
            },
        ]

    async def retrieve_exact_files(self, _rag_session_id: str, *, paths: list[str], layers=None, limit: int = 200) -> list[dict]:
        del layers, limit
        self.calls.append(("exact", list(paths)))
        if "docs/api/health-endpoint.md" in paths:
            return [
                {
                    "path": "docs/api/health-endpoint.md",
                    "layer": "D1_DOCUMENT_CATALOG",
                    "metadata": {},
                }
            ]
        return []

    async def retrieve_chunks_by_path_substrings(
        self,
        _rag_session_id: str,
        *,
        path_needles: list[str],
        layers=None,
        limit: int = 200,
    ) -> list[dict]:
        del layers, limit
        self.calls.append(("substring", list(path_needles)))
        return []


def test_v2_rag_adapter_seeds_exact_rows_from_plan_hints() -> None:
    adapter = V2RagRetrievalAdapter(FakeRetriever())
    plan = RetrievalPlan(
        profile="docs_summary_api_endpoint",
        layers=["D1_DOCUMENT_CATALOG", "D2_FACT_INDEX", "D0_DOC_CHUNKS"],
        limit=8,
        filters={"target_doc_hints": ["docs/api/health-endpoint.md"]},
    )

    rows = asyncio.run(adapter.fetch_rows("rag-1", "explain /health", plan))

    assert rows[0]["path"] == "docs/api/health-endpoint.md"
    assert len(rows) == 2


def test_v2_rag_adapter_uses_substring_fallback_for_missing_hint() -> None:
    retriever = FakeRetriever()
    adapter = V2RagRetrievalAdapter(retriever)
    plan = RetrievalPlan(
        profile="file_lookup",
        layers=["D1_DOCUMENT_CATALOG", "D3_ENTITY_CATALOG"],
        limit=12,
        filters={"target_doc_hints": ["docs/api/missing-health-endpoint.md"]},
    )

    asyncio.run(adapter.fetch_rows("rag-1", "find file", plan))

    assert ("substring", ["missing-health-endpoint.md"]) in retriever.calls
@@ -4,46 +4,132 @@ from app.core.agent.processes.v2.models import V2Domain, V2Intent, V2RouteAnchor
from app.core.agent.processes.v2.retrieval.policy_resolver import V2RetrievalPolicyResolver


def _route(*, hints: list[str], endpoint_paths: list[str] | None = None, subintent: str = "SUMMARY", intent: str = "DOC_EXPLAIN") -> V2RouteResult:
def _route(
    *,
    intent: str = V2Intent.DOC_EXPLAIN,
    subintent: str = V2Subintent.SUMMARY,
    entity_names: list[str] | None = None,
    file_names: list[str] | None = None,
    endpoint_paths: list[str] | None = None,
    target_doc_hints: list[str] | None = None,
    matched_aliases: list[str] | None = None,
    process_domain: str | None = None,
    process_subdomain: str | None = None,
) -> V2RouteResult:
    return V2RouteResult(
        routing_domain=V2Domain.DOCS if intent == V2Intent.DOC_EXPLAIN else V2Domain.GENERAL,
        intent=intent,
        subintent=subintent,
        user_query="q",
        normalized_query="q",
        anchors=V2RouteAnchors(target_doc_hints=hints, endpoint_paths=endpoint_paths or []),
        anchors=V2RouteAnchors(
            entity_names=entity_names or [],
            file_names=file_names or [],
            endpoint_paths=endpoint_paths or [],
            target_doc_hints=target_doc_hints or [],
            matched_aliases=matched_aliases or [],
            process_domain=process_domain,
            process_subdomain=process_subdomain,
        ),
    )


def test_policy_prefers_api_docs_for_endpoint_queries() -> None:
def test_policy_maps_api_summary_to_fact_layers() -> None:
    plan = V2RetrievalPolicyResolver().resolve(
        _route(hints=["docs/api/health-endpoint.md"], endpoint_paths=["/health"])
        _route(
            endpoint_paths=["/health"],
            target_doc_hints=["docs/api/health-endpoint.md"],
        )
    )

    assert plan.profile == "docs_summary_api_endpoint"
    assert plan.filters["path_prefixes"] == ["docs/api/", "docs/architecture/", "docs/"]
    assert plan.filters["prefer_path_prefixes"][0] == "docs/api/"
    assert plan.profile == "docs_api_method_explain"
    assert plan.layers == ["D1_DOCUMENT_CATALOG", "D2_FACT_INDEX", "D0_DOC_CHUNKS"]
    assert plan.filters["path_prefixes"] == [
        "docs/api/",
        "docs/endpoints/",
        "docs/methods/",
        "api/",
        "endpoints/",
        "methods/",
    ]
    assert plan.filters["target_doc_hints"] == ["docs/api/health-endpoint.md"]


def test_policy_prefers_logic_docs_for_logic_queries() -> None:
    plan = V2RetrievalPolicyResolver().resolve(_route(hints=["docs/logic/telegram-notification-loop.md"]))
def test_policy_maps_logic_summary_to_workflow_layers_and_metadata_filters() -> None:
    plan = V2RetrievalPolicyResolver().resolve(
        _route(
            matched_aliases=["logic flow"],
            process_domain="notifications",
            process_subdomain="delivery_loop",
        )
    )

    assert plan.profile == "docs_summary_logic_flow"
    assert plan.layers == ["D4_WORKFLOW_INDEX", "D1_DOCUMENT_CATALOG", "D0_DOC_CHUNKS"]
    assert plan.filters["metadata.domain"] == "notifications"
    assert plan.filters["metadata.subdomain"] == "delivery_loop"
    assert plan.filters["prefer_path_prefixes"][0] == "docs/logic/"


def test_policy_uses_deterministic_find_files_profile() -> None:
def test_policy_maps_entity_summary_to_entity_layers() -> None:
    plan = V2RetrievalPolicyResolver().resolve(_route(entity_names=["RuntimeManager"]))

    assert plan.profile == "docs_summary_domain_entity"
    assert plan.layers == ["D3_ENTITY_CATALOG", "D1_DOCUMENT_CATALOG", "D0_DOC_CHUNKS"]
    assert "%runtimemanager%" in plan.filters["prefer_like_patterns"]


def test_policy_keeps_api_method_profile_even_with_additional_entity_signal() -> None:
    plan = V2RetrievalPolicyResolver().resolve(
        _route(hints=["docs/api/health-endpoint.md"], endpoint_paths=["/health"], subintent=V2Subintent.FIND_FILES)
        _route(
            endpoint_paths=["/health"],
            entity_names=["RuntimeManager"],
        )
    )

    assert plan.profile == "docs_api_method_explain"
    assert plan.layers == ["D1_DOCUMENT_CATALOG", "D2_FACT_INDEX", "D0_DOC_CHUNKS"]


def test_policy_uses_api_method_profile_for_endpoint_like_hints_without_explicit_path() -> None:
    plan = V2RetrievalPolicyResolver().resolve(
        _route(
            target_doc_hints=["health", "/health", "health endpoint"],
        )
    )

    assert plan.profile == "docs_api_method_explain"
    assert "%health%" in plan.filters["prefer_like_patterns"]


def test_policy_uses_hard_and_soft_filters_for_find_files() -> None:
    plan = V2RetrievalPolicyResolver().resolve(
        _route(
            subintent=V2Subintent.FIND_FILES,
            file_names=["docs/workflows/manual-send.md"],
            entity_names=["ManualSendWorker"],
            matched_aliases=["manual send"],
            process_domain="messaging",
            process_subdomain="manual_send",
        )
    )

    assert plan.profile == "file_lookup"
    assert plan.layers == ["D1_DOCUMENT_CATALOG", "D3_ENTITY_CATALOG"]
    assert "health-endpoint.md" in plan.filters["prefer_like_patterns"][0]
    assert plan.filters["path_prefixes"] == ["docs/workflows/"]
    assert plan.filters["metadata.domain"] == "messaging"
    assert "%manualsendworker%" in plan.filters["prefer_like_patterns"]


def test_policy_uses_grounded_general_profile() -> None:
    plan = V2RetrievalPolicyResolver().resolve(_route(hints=[], intent=V2Intent.GENERAL_QA))
def test_policy_keeps_general_routes_in_general_profile() -> None:
    plan = V2RetrievalPolicyResolver().resolve(
        _route(
            intent=V2Intent.GENERAL_QA,
            endpoint_paths=["/health"],
            target_doc_hints=["docs/api/health-endpoint.md"],
        )
    )

    assert plan.profile == "general_qa_grounded_summary"
    assert plan.filters["prefer_path_prefixes"][0] == "docs/architecture/"
    assert plan.layers == ["D1_DOCUMENT_CATALOG", "D0_DOC_CHUNKS"]
    assert "path_prefixes" not in plan.filters

@@ -1,4 +1,8 @@
import logging

from app.core.rag.contracts.enums import RagLayer
from app.core.rag.indexing.docs.chunkers.markdown_chunker import SectionChunk
from app.core.rag.indexing.docs.integration_extractor import DocsIntegrationExtractor
from app.core.rag.indexing.docs.pipeline import DocsIndexingPipeline


@@ -153,3 +157,150 @@ Create invoice
    assert integration_doc.metadata["target"] == "db.billing.invoices"
    assert integration_doc.metadata["target_type"] == "db"
    assert integration_doc.metadata["details"]["transaction"] == "required"


def test_docs_integration_extractor_keeps_valid_blocks() -> None:
    extractor = DocsIntegrationExtractor()
    sections = [
        SectionChunk(
            section_path="Details > Интеграции > Billing DB",
            section_title="Billing DB",
            content=(
                "- target: db.billing.invoices\n"
                "- target_type: db\n"
                "- direction: outbound\n"
                "- interaction: writes\n"
                "- via: invoice repository\n"
                "- purpose: persist created invoices\n"
                "- details:\n"
                "  - transaction: required\n"
                "  - tables:\n"
                "    - invoices\n"
                "    - invoice_items\n"
            ),
            order=0,
        )
    ]

    records = extractor.extract(sections, path="docs/billing/create_invoice.md")

    assert len(records) == 1
    assert records[0].target == "db.billing.invoices"
    assert records[0].details["transaction"] == "required"
    assert records[0].details["tables"] == ["invoices", "invoice_items"]


def test_docs_integration_extractor_soft_fails_on_markdown_like_yaml(caplog) -> None:
    extractor = DocsIntegrationExtractor()
    sections = [
        SectionChunk(
            section_path="Details > Интеграции > Runtime health provider",
            section_title="Runtime health provider",
            content=(
                "- target: runtime.health_provider\n"
                "- target_type: service\n"
                "- direction: outbound\n"
                "- interaction: depends_on\n"
                "- via: async callback `health_provider()`\n"
                "- purpose: получить агрегированный health runtime\n"
                "- details:\n"
                "  - timeout_ms: 5000\n"
                "  - response_type: `HealthPayload`\n"
            ),
            order=0,
        )
    ]

    with caplog.at_level(logging.WARNING):
        records = extractor.extract(sections, path="docs/api/health-endpoint.md")

    assert len(records) == 1
    assert records[0].target == "runtime.health_provider"
    assert records[0].via == "async callback `health_provider()`"
    assert records[0].details == {}
    assert "docs integration parse warning" in caplog.text
    assert "docs/api/health-endpoint.md" in caplog.text


def test_docs_pipeline_keeps_other_layers_when_integration_block_is_invalid(caplog) -> None:
    pipeline = DocsIndexingPipeline()
    content = """---
id: api.runtime.health
type: api_method
doc_type: api_method
name: runtime_health
title: Runtime Health API
module: runtime
domain: platform
sub_domain: observability
layer: application
status: active
related_docs: []
links:
  uses_logic:
    - logic.runtime.health
---
# Runtime Health API

## Summary

Returns current runtime health.

## Details

### Описание

Возвращает агрегированное состояние runtime.

### Сценарий

**Название:**
Read health

**Предусловия:**
- runtime is running

**Триггер:**
- client calls health endpoint

**Основной сценарий:**
1. Read current state.
2. Return payload.

### Входные параметры

| field | type | required |
| --- | --- | --- |
| verbose | boolean | no |

### Интеграции

#### Runtime health provider
- target: runtime.health_provider
- target_type: service
- direction: outbound
- interaction: depends_on
- via: async callback `health_provider()`
- purpose: получить агрегированный health runtime
- details:
  - timeout_ms: 5000
  - response_type: `HealthPayload`
"""

    with caplog.at_level(logging.WARNING):
        docs = pipeline.index_file(
            repo_id="acme/proj",
            commit_sha="abc123",
            path="docs/api/health-endpoint.md",
            content=content,
        )

    layers = {doc.layer for doc in docs}
    assert RagLayer.DOCS_DOCUMENT_CATALOG in layers
    assert RagLayer.DOCS_DOC_CHUNKS in layers
    assert RagLayer.DOCS_FACT_INDEX in layers
    assert RagLayer.DOCS_WORKFLOW_INDEX in layers
    assert RagLayer.DOCS_RELATION_GRAPH in layers
    assert RagLayer.DOCS_INTEGRATION_INDEX in layers
    assert "docs integration parse warning" in caplog.text
    assert all(doc.source.path == "docs/api/health-endpoint.md" for doc in docs)

@@ -45,6 +45,23 @@ def test_retrieve_builder_adds_prefer_bonus_sorting() -> None:
    assert params["prefer_like_0"] == "%/test\\_%.py"


def test_retrieve_builder_adds_metadata_filters() -> None:
    builder = RetrievalStatementBuilder()

    sql, params = builder.build_retrieve(
        "rag-1",
        [0.1, 0.2],
        query_text="notification flow",
        metadata_domain="notifications",
        metadata_subdomain="delivery_loop",
    )

    assert "metadata_json->>'domain'" in sql
    assert "metadata_json->>'subdomain'" in sql
    assert params["metadata_domain"] == "notifications"
    assert params["metadata_subdomain"] == "delivery_loop"


def test_lexical_builder_omits_test_filters_when_not_requested() -> None:
    builder = RetrievalStatementBuilder()
