Snapshot the current state

2026-04-07 21:41:27 +03:00
parent 7387e5cc51
commit f62fb678b8
52 changed files with 4073 additions and 316 deletions
@@ -1,3 +1,9 @@
 .env
 .venv
 __pycache__
 # Pipeline harness: per-run artifacts (md/json from tests.pipeline_setup_v3/v4)
 tests/**/test_runs/**/*.md
 tests/**/test_runs/**/*.json
 tests/**/test_results/**/*.md
 tests/**/test_results/**/*.json
@@ -0,0 +1,4 @@
 # Queries
 1. Какие методы апи есть в проекте
 2. Какие методы апи есть для healthcheck
 3. Где документация на healthcheck
@@ -2,31 +2,44 @@
 ## 1. Architecture
-The current `V2IntentRouter` consists of the following components:
+The current `V2IntentRouter` is implemented as an **LLM-first router**.
 The deterministic layer does not select a route by default and is used only for:
 - preprocessing
 - validation of the LLM response
 - fallback when the LLM did not respond or returned an invalid route
 Current components:
 - `router.py`
-  Main entry point and orchestrator.
+  Main entry point and pipeline orchestrator.
 - `modules/normalizer.py`
  Normalizes the query text into `normalized_query`.
 - `modules/target_terms.py`
-  Extracts `target_terms`, `endpoint_paths`, `matched_aliases`, `alias_docs`.
+  Extracts retrieval-oriented `target_terms`, `endpoint_paths`, `matched_aliases`, `alias_docs`.
 - `modules/anchors.py`
-  Extracts `anchors` and auxiliary marker signals.
+  Extracts `anchors` and marker signals for fallback and downstream retrieval.
- `routers/docs_subintent_resolver.py`
+- `routers/route_catalog.py`
-  Determines the `subintent`.
+  Catalog of allowed routes (`allowed_routes`).
 - `routers/deterministic.py`
  Deterministically determines `routing_domain`, `intent`, `subintent`, `confidence`, `routing_mode`, `llm_router_used`, `reason_short`.
 - `routers/llm.py`
-  LLM-based determination of `routing_domain`, `intent`, `subintent`, `confidence`, `reason_short`.
+  The primary LLM router. Receives the normalized query, `target_terms`, `anchors`, and the list of allowed routes.
 - `routers/validator.py`
  Deterministic validator for enum values, the route combination, and basic normalization of `confidence`.
 - `routers/confidence.py`
  Post-processing of confidence after the LLM response.
 - `routers/fallback.py`
  Fallback routing when the LLM did not respond or its response failed validation.
 - `routers/prompts.yml`
-  Prompt for the LLM router.
+  Prompt contract for the LLM router.
 ## 2. Contract
@@ -53,7 +66,6 @@
 `V2RouteAnchors`:
 - `entity_names: list[str]`
 - `terms: list[str]`
 - `file_names: list[str]`
 - `endpoint_paths: list[str]`
 - `target_doc_hints: list[str]`
@@ -78,35 +90,61 @@
 - `SUMMARY`
 - `FIND_FILES`
-### Supported routes
+### Allowed routes
 - `GENERAL / GENERAL_QA / SUMMARY`
 - `DOCS / DOC_EXPLAIN / SUMMARY`
 - `DOCS / DOC_EXPLAIN / FIND_FILES`
-## 4. Request processing flow
+These routes are defined centrally in `routers/route_catalog.py`.
 ## 4. Current flow
 Request processing pipeline:
 1. `router.py` accepts `user_query`.
 2. `modules/normalizer.py` builds `normalized_query`.
-3. `modules/target_terms.py` extracts key terms and alias-based signals.
+3. `modules/target_terms.py` extracts:
-4. `modules/anchors.py` builds `anchors` and marker signals.
+   - `target_terms`
   - `endpoint_paths`
   - `matched_aliases`
   - `alias_docs`
 4. `modules/anchors.py` builds:
   - `anchors`
   - `file_markers`
   - `architecture_markers`
   - `logic_markers`
   - `domain_markers`
   - `endpoint_markers`
 5. `router.py` assembles `QueryFeatures`.
-6. `routers/deterministic.py` tries to determine the route deterministically.
+6. `routers/llm.py` is called as the **primary route selector**.
-7. If a deterministic route is found, it is returned immediately.
+7. `routers/validator.py` checks:
-8. If no deterministic route is found, `router.py` calls `routers/llm.py`.
+   - that the values belong to the allowed enums
-9. If the LLM returned a valid route, a `V2RouteResult` is built with `routing_mode="llm_assisted"`.
+   - that the route combination is allowed
-10. If the LLM is unavailable or did not return a valid route, the fallback is used:
+   - that `confidence` can be cast to `float`
-    `GENERAL / GENERAL_QA / SUMMARY` with `routing_mode="llm_fallback"`.
+8. `routers/confidence.py` adjusts confidence based on signal strength.
 9. If the LLM response is valid, a `V2RouteResult` is returned with `routing_mode="llm_default"`.
 10. If the LLM did not respond, returned broken JSON, or returned an invalid route, `routers/fallback.py` builds a fallback route:
    - `FIND_FILES` if there are `file_markers`
    - `DOCS / DOC_EXPLAIN / SUMMARY` if there are docs-oriented anchors
    - otherwise `GENERAL / GENERAL_QA / SUMMARY`
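The LLM-first flow can be illustrated with a minimal sketch. It is not the project's code: `validate`, `fallback_route`, `route_query`, the plain-dict feature bag, the `docs_signals` key, and the `"fallback"` routing-mode label are all hypothetical names chosen for the example, and the confidence-adjustment step is left out for brevity. Only the three allowed route combinations and `routing_mode="llm_default"` come from the document.

```python
from typing import Callable, Optional

# Route combinations listed in the document as allowed.
ALLOWED_ROUTES = (
    ("GENERAL", "GENERAL_QA", "SUMMARY"),
    ("DOCS", "DOC_EXPLAIN", "SUMMARY"),
    ("DOCS", "DOC_EXPLAIN", "FIND_FILES"),
)


def validate(candidate: Optional[dict]) -> Optional[dict]:
    """Return the candidate with a float confidence, or None if it is unusable."""
    if not isinstance(candidate, dict):
        return None
    key = (candidate.get("routing_domain"), candidate.get("intent"), candidate.get("subintent"))
    if key not in ALLOWED_ROUTES:
        return None
    try:
        confidence = float(candidate.get("confidence"))
    except (TypeError, ValueError):
        return None
    return {**candidate, "confidence": confidence}


def fallback_route(features: dict) -> dict:
    """Deterministic fallback: file markers first, then docs signals, then general."""
    if features.get("file_markers"):
        route = ("DOCS", "DOC_EXPLAIN", "FIND_FILES")
    elif features.get("docs_signals"):
        route = ("DOCS", "DOC_EXPLAIN", "SUMMARY")
    else:
        route = ("GENERAL", "GENERAL_QA", "SUMMARY")
    domain, intent, subintent = route
    # "fallback" is a placeholder label, not a value taken from the project.
    return {"routing_domain": domain, "intent": intent, "subintent": subintent, "routing_mode": "fallback"}


def route_query(features: dict, call_llm: Callable[[dict], Optional[dict]]) -> dict:
    """LLM-first routing: the LLM chooses, deterministic code only checks and falls back."""
    try:
        candidate = call_llm(features)
    except Exception:
        candidate = None  # an LLM failure is treated like a missing answer
    valid = validate(candidate)
    if valid is None:
        return fallback_route(features)
    return {**valid, "routing_mode": "llm_default"}
```

Passing the LLM call in as a function keeps the sketch testable without a model: a stub that returns `None` or raises exercises the fallback path.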
 ## 5. Components along the flow
 ### `router.py`
 - Task
-  Gather the whole routing process behind a single entry point.
+  Orchestrate the full routing pipeline.
 - Approach
-  Sequentially calls the normalizer, target terms extractor, anchors extractor, deterministic router and, if needed, the LLM router.
+  Sequentially calls:
  - normalizer
  - target terms extractor
  - anchor extractor
  - LLM router
  - validator
  - confidence adjuster
  - fallback router
 - Input
  `user_query: str`
@@ -117,7 +155,7 @@
 ### `modules/normalizer.py`
 - Task
-  Bring the query to a stable form for further analysis.
+  Bring the query to a stable form for analysis.
 - Approach
  Collapses redundant whitespace via `" ".join(...split())`.
@@ -131,14 +169,29 @@
 ### `modules/target_terms.py`
 - Task
-  Extract key terms and retrieval signals from the query.
+  Build a **clean retrieval field**, `target_terms`.
 - Approach
-  Uses:
+  Uses a positive selection model and includes in `target_terms` only:
-  - regex for path/entity-like fragments
+  - endpoint paths
-  - a stop-word list
+  - identifier-like tokens
-  - alias rules with phrases and canonical terms
+  - alias canonical terms
-  - a heuristic for `/health`
+  - domain terms
  Excluded:
  - question words
  - intent words
  - filler/noisy words
  - marker words
  - short tokens `< 3`, unless they are an endpoint or an alias
  - broken path-like tokens
  Additionally:
  - lowercase
  - trim punctuation at the edges
  - dedupe
  - limit to `7` items
  - priority: endpoints → identifiers → aliases → domain terms
 - Input
  `normalized_query: str`
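The post-processing rules for `target_terms` (lowercase, edge trimming, dedupe, the short-token exception, the limit of `7`, and the priority order) can be sketched as follows. The function name `build_target_terms`, its four pre-classified input lists, and the exact set of trimmed punctuation characters are assumptions made for illustration; how the real module classifies tokens is not shown in this diff.

```python
EDGE_PUNCTUATION = ".,;:!?\"'()[]{}"  # assumed set; "/" is kept so endpoints survive
MAX_TERMS = 7
MIN_LENGTH = 3


def build_target_terms(endpoints, identifiers, aliases, domain_terms):
    """Merge pre-classified candidates into a clean, bounded term list."""
    # Groups are visited in priority order: endpoints, identifiers, aliases, domain terms.
    groups = (
        ("endpoint", endpoints),
        ("identifier", identifiers),
        ("alias", aliases),
        ("domain", domain_terms),
    )
    terms: list[str] = []
    for kind, values in groups:
        for raw in values:
            term = raw.strip().strip(EDGE_PUNCTUATION).lower()
            if not term or term in terms:
                continue  # skip empties and duplicates
            if len(term) < MIN_LENGTH and kind not in ("endpoint", "alias"):
                continue  # short tokens survive only as endpoints or aliases
            terms.append(term)
            if len(terms) == MAX_TERMS:
                return terms
    return terms
```

Because higher-priority groups are consumed first, the cap of seven drops domain terms before it drops endpoints.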
@@ -153,117 +206,141 @@
 ### `modules/anchors.py`
 - Task
-  Build the full set of `anchors` and doc-oriented marker signals.
+  Build `anchors` and marker signals without mixing them with `target_terms`.
 - Approach
-  Uses:
+  Extracts:
-  - regex for `entity_names` and `file_names`
+  - `entity_names` from PascalCase-like tokens
-  - dictionaries of marker phrases:
+  - `file_names` only by strict rules:
-    - file markers
+    - `*.md`, `*.yaml`, `*.yml`, `*.json`
-    - architecture markers
+    - `docs/...`, `doc/...`, `documentation/...`
-    - logic markers
+  - `endpoint_paths` from `TargetTermsAnalysis`
-    - domain markers
+  - `target_doc_hints` from alias docs, the endpoint map, and marker signals
    - endpoint markers
  - map `endpoint -> target_doc_hint`
  - alias docs from `TargetTermsAnalysis`
- Input
+  Marker signals live separately:
  - `normalized_query: str`
  - `TargetTermsAnalysis`
 - Output
  `AnchorAnalysis`:
  - `anchors`
  - `file_markers`
  - `architecture_markers`
  - `logic_markers`
  - `domain_markers`
  - `endpoint_markers`
 ### `routers/docs_subintent_resolver.py`
 - Task
  Determine the `subintent`.
 - Approach
  Heuristic:
  - if there are `file_markers` -> `FIND_FILES`
  - if there are doc signals (`endpoint_paths`, `endpoint_markers`, `architecture_markers`, `logic_markers`, `domain_markers`, `target_doc_hints`) -> `SUMMARY`
  - otherwise `None`
 - Input
-  `QueryFeatures`
+  - `normalized_query: str`
  - `TargetTermsAnalysis`
 - Output
-  `subintent: str | None`
+  `AnchorAnalysis`
-### `routers/deterministic.py`
+### `routers/route_catalog.py`
 - Task
-  Deterministically determine the route without the LLM where possible.
+  Keep a single source of truth for the allowed routes.
 - Approach
-  Uses:
+  Returns:
-  - `DocsSubintentResolver`
+  - the `allowed_routes` list for the LLM payload
-  - a check for conflicting doc anchors
+  - a check that the `routing_domain + intent + subintent` combination is allowed
  - a list of general markers
  Rules:
  - `FIND_FILES` -> `DOCS / DOC_EXPLAIN / FIND_FILES`
  - `subintent != None` and no doc-signal conflict -> `DOCS / DOC_EXPLAIN / SUMMARY`
  - general marker -> `GENERAL / GENERAL_QA / SUMMARY`
 - Input
  - `user_query: str`
  - `QueryFeatures`
  - `anchors: V2RouteAnchors`
 - Output
  `V2RouteResult | None`
 ### `routers/llm.py`
 - Task
-  Determine the route via the LLM when deterministic routing produced no result.
+  Select the route via the LLM as the primary selector.
 - Approach
  Builds a JSON payload from:
  - `user_query`
  - `normalized_query`
  - `target_terms`
  - `anchors`
-  - the list of allowed routes
+  - `allowed_routes`
  Then:
  - calls the LLM
  - parses the JSON
-  - validates the route against a whitelist
+  - returns the raw candidate route without deterministic business routing
  - normalizes `confidence`
 - Input
  - `user_query: str`
  - `normalized_query: str`
  - `target_terms: list[str]`
  - `anchors: dict`
 - Output
-  `dict | None`:
+  `dict | None`
 ### `routers/validator.py`
 - Task
  Deterministic validation of the LLM response.
 - Approach
  Checks:
  - that `routing_domain`, `intent`, `subintent` are filled in
  - that the route combination is in `route_catalog`
  - that `confidence` can be cast to a number
 - Input
  `dict | None`
 - Output
  Validated `dict | None`
 ### `routers/confidence.py`
 - Task
  Make confidence meaningful after the LLM response.
 - Approach
  Adjusts confidence:
  - `-0.1` if there are no strong anchors
  - `-0.1` if the query is short or vague
  - `+0.05` if there is an explicit signal (`file_markers`, `endpoint_paths`, `endpoint_markers`)
  - then clamps to the range `0.0..1.0`
 - Input
  - `confidence: float`
  - `QueryFeatures`
 - Output
  `confidence: float`
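The adjustment arithmetic is small enough to sketch directly. The function below is a hypothetical illustration: how the real module decides that anchors are "strong" or that a query is "short or vague" is not shown in this diff, so those decisions are passed in as booleans.

```python
def adjust_confidence(
    confidence: float,
    *,
    has_strong_anchors: bool,
    is_short_or_vague: bool,
    has_explicit_signal: bool,
) -> float:
    """Apply the documented deltas, then clamp to the range 0.0..1.0."""
    adjusted = confidence
    if not has_strong_anchors:
        adjusted -= 0.1
    if is_short_or_vague:
        adjusted -= 0.1
    if has_explicit_signal:
        adjusted += 0.05
    return max(0.0, min(1.0, adjusted))
```

For instance, a raw confidence of `0.98` with an explicit signal and strong anchors would exceed `1.0` before clamping and is returned as `1.0`.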
 ### `routers/fallback.py`
 - Task
  Build a deterministic fallback when the LLM result is invalid.
 - Approach
  Rules:
  - `file_markers` present → `DOCS / DOC_EXPLAIN / FIND_FILES`
  - docs signals present (`endpoint_paths`, `target_doc_hints`, `matched_aliases`, marker groups) → `DOCS / DOC_EXPLAIN / SUMMARY`
  - otherwise → `GENERAL / GENERAL_QA / SUMMARY`
 - Input
  - `user_query: str`
  - `QueryFeatures`
  - `anchors: V2RouteAnchors`
  - `llm_attempted: bool`
 - Output
  `V2RouteResult`
 ### `routers/prompts.yml`
 - Task
  Define the response contract and confidence guidance for the LLM router.
 - Approach
  Restricts the model to `allowed_routes` only and requires JSON with the fields:
  - `routing_domain`
  - `intent`
  - `subintent`
  - `confidence`
  - `reason_short`
-### `routers/prompts.yml`
+## 6. Key invariants
- Task
+- The LLM is the default router.
-  Define a formal response contract for the LLM router.
+- The deterministic layer does not make the primary routing decision.
-
+- `target_terms` contain only retrieval-useful terms.
- Approach
+- `anchors` do not contain `terms`.
-  Describes the allowed routes and requires returning JSON only.
+- `/health` and other endpoint paths must not end up in `file_names` unless they are a file with an extension.
-
+- `file_names` contain only real file/doc paths.
- Input
+- Fallback is used only when the LLM is unavailable or returned an invalid route.
  Payload from `routers/llm.py`
 - Output
  Structured JSON response from the LLM
@@ -0,0 +1,316 @@
 # V2RetrievalPolicyResolver Architecture
 ## 1. Role of the component
 `V2RetrievalPolicyResolver` is a deterministic bridge between `V2IntentRouter` and docs-RAG retrieval.
 The component works on top of an already built `V2RouteResult` and does not reinterpret the user text:
 - it does not call the LLM;
 - it does not change `intent` or `subintent`;
 - it does not rank documents;
 - it does not assemble evidence.
 Its job is to build a single `RetrievalPlan` with the fields:
 - `profile`
 - `layers`
 - `limit`
 - `filters`
 ## 2. Dependencies
 The current implementation relies on:
 - `src/app/core/agent/processes/v2/retrieval/policy_resolver.py`
 - `src/app/core/agent/processes/v2/anchor_signals.py`
 - `src/app/core/agent/processes/v2/models.py`
 - `src/app/core/rag/contracts/enums.py`
 - `src/app/core/agent/processes/v2/retrieval/v2_rag_adapter.py`
 - `src/app/core/rag/retrieval/session_retriever.py`
 - `src/app/core/rag/persistence/repository.py`
 - `src/app/core/rag/persistence/query_repository.py`
 - `src/app/core/rag/persistence/retrieval_statement_builder.py`
 ## 3. Input contract
 The resolver uses:
 - `route.intent`
 - `route.subintent`
 - `route.anchors.entity_names`
 - `route.anchors.file_names`
 - `route.anchors.endpoint_paths`
 - `route.anchors.target_doc_hints`
 - `route.anchors.matched_aliases`
 - `route.anchors.process_domain`
 - `route.anchors.process_subdomain`
 `route.target_terms` does not affect profile/filter branching in the current implementation.
 ## 4. Top-level branching
 `resolve(route)` has three branches:
 1. `GENERAL_QA` -> `general_qa_grounded_summary`
 2. `FIND_FILES` -> `file_lookup`
 3. otherwise -> docs summary branch
 Invariants:
 - `GENERAL_QA` always stays on the general profile;
 - `FIND_FILES` always stays on `file_lookup`;
 - the resolver always returns exactly one valid `RetrievalPlan`.
 ## 5. Internal decomposition
 The current implementation is split into two helper classes.
 ### `_AnchorTermCollector`
 Collects terms for `prefer_like_patterns`.
 Sources:
 - basename from `target_doc_hints`
 - `endpoint_paths`
 - `file_names`
 - `entity_names`
 - `matched_aliases`
 - `process_domain`
 - `process_subdomain`
 All values are normalized to lower case and turned into SQL LIKE patterns of the form `"%term%"`.
 A separate rule applies to `FIND_FILES`:
 - if `target_doc_hints` are present, `prefer_like_patterns` is built only from the hint basenames;
 - otherwise the general set of collected terms is used.
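A minimal sketch of the pattern-building rule, under stated assumptions: `like_patterns` and `find_files_patterns` are hypothetical names, deduplication of patterns is an assumption, and the sketch does not escape `%` or `_` inside terms, which a real LIKE query might need.

```python
import posixpath
from typing import Iterable


def like_patterns(terms: Iterable[str]) -> list[str]:
    """Lower-case each term and wrap it as a "%term%" pattern, keeping first occurrences."""
    patterns: list[str] = []
    for term in terms:
        value = str(term or "").strip().lower()
        if not value:
            continue
        pattern = f"%{value}%"
        if pattern not in patterns:
            patterns.append(pattern)
    return patterns


def find_files_patterns(target_doc_hints: list[str], collected_terms: list[str]) -> list[str]:
    """For FIND_FILES: hint basenames win; otherwise use the general collected terms."""
    if target_doc_hints:
        basenames = (posixpath.basename(hint.rstrip("/")) for hint in target_doc_hints)
        return like_patterns(basenames)
    return like_patterns(collected_terms)
```

With a hint such as `docs/api/Health.md`, only the basename pattern is produced and the other collected terms are ignored.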
 ### `_RouteFilterBuilder`
 Builds `filters` for the three branches:
 - `general_filters(route)`
 - `summary_filters(route)`
 - `find_files_filters(route)`
 In addition, it contains path selection:
 - `_summary_prefixes(route)`
 - `_find_files_prefixes(route)`
 - `_find_files_prefer_prefixes(route)`
 ## 6. Signal detection
 The summary profile and some path preferences depend on `anchor_signal_types(route)`.
 Signals are computed as follows:
 - `FIND_FILES`
  - if `route.subintent == FIND_FILES`
 - `API_ENDPOINT`
  - if `endpoint_paths` is non-empty
  - or `target_doc_hints` / `file_names` / `matched_aliases` contain the markers `"/api/"`, `"api"`, `"endpoint"`
 - `ARCHITECTURE`
  - if `target_doc_hints` / `file_names` / `matched_aliases` contain `"/architecture/"`, `"architecture"`, `"arch"`
 - `LOGIC_FLOW`
  - if `target_doc_hints` / `file_names` / `matched_aliases` contain `"/logic/"`, `"logic"`, `"workflow"`, `"flow"`, `"process"`
 - `DOMAIN_ENTITY`
  - if `entity_names` is non-empty
  - or `target_doc_hints` / `file_names` / `matched_aliases` contain `"/domains/"`, `"domain"`, `"entity"`, `"component"`
 Important:
 - `process_domain` and `process_subdomain` currently **do not participate** in signal detection;
 - they affect only filters and `prefer_like_patterns`.
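The detection rules can be reproduced in a self-contained sketch. The `Anchors` dataclass, the `MARKERS` table, and `signal_types` are simplified stand-ins invented for this example (plain strings instead of the project's enums), so it is an approximation of the behaviour described here rather than the actual `anchor_signal_types`.

```python
from dataclasses import dataclass, field


@dataclass
class Anchors:
    """Simplified stand-in for the anchor fields that signal detection reads."""
    entity_names: list[str] = field(default_factory=list)
    file_names: list[str] = field(default_factory=list)
    endpoint_paths: list[str] = field(default_factory=list)
    target_doc_hints: list[str] = field(default_factory=list)
    matched_aliases: list[str] = field(default_factory=list)


MARKERS = {
    "API_ENDPOINT": ("/api/", "api", "endpoint"),
    "ARCHITECTURE": ("/architecture/", "architecture", "arch"),
    "LOGIC_FLOW": ("/logic/", "logic", "workflow", "flow", "process"),
    "DOMAIN_ENTITY": ("/domains/", "domain", "entity", "component"),
}


def signal_types(anchors: Anchors, subintent: str) -> set[str]:
    """Detect signals by substring match over hints, file names, and aliases."""
    raw = (*anchors.target_doc_hints, *anchors.file_names, *anchors.matched_aliases)
    texts = [str(item).strip().lower() for item in raw if str(item or "").strip()]
    signals: set[str] = set()
    if subintent == "FIND_FILES":
        signals.add("FIND_FILES")
    for name, markers in MARKERS.items():
        if any(marker in text for text in texts for marker in markers):
            signals.add(name)
    if anchors.endpoint_paths:
        signals.add("API_ENDPOINT")
    if anchors.entity_names:
        signals.add("DOMAIN_ENTITY")
    return signals
```

Because the check is a plain substring test, short markers also match inside longer words: an alias such as `search` contains `arch` and would raise the `ARCHITECTURE` signal in this sketch.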
 ## 7. Summary profile selection
 The `_summary_profile(route)` method uses:
 - `meaningful = anchor_signal_types(route) - {FIND_FILES}`
 Rule:
 - if the number of meaningful signals is not exactly one -> `docs_summary_generic`
 - if there is exactly one:
  - `API_ENDPOINT` -> `docs_summary_api_endpoint`
  - `ARCHITECTURE` -> `docs_summary_architecture`
  - `LOGIC_FLOW` -> `docs_summary_logic_flow`
  - `DOMAIN_ENTITY` -> `docs_summary_domain_entity`
 Consequences:
 - API + architecture conflict -> generic;
 - API + entity -> generic;
 - weak/no signals -> generic.
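The "exactly one meaningful signal" rule maps onto a few lines of code. This is a sketch with plain strings in place of the project's enum members; `summary_profile` and `PROFILE_BY_SIGNAL` are names chosen for the example.

```python
PROFILE_BY_SIGNAL = {
    "API_ENDPOINT": "docs_summary_api_endpoint",
    "ARCHITECTURE": "docs_summary_architecture",
    "LOGIC_FLOW": "docs_summary_logic_flow",
    "DOMAIN_ENTITY": "docs_summary_domain_entity",
}


def summary_profile(signals: set[str]) -> str:
    """Pick a specialised profile only when exactly one meaningful signal is present."""
    meaningful = set(signals) - {"FIND_FILES"}
    if len(meaningful) != 1:
        return "docs_summary_generic"
    (only,) = meaningful
    # Unknown signal names also fall back to the generic profile.
    return PROFILE_BY_SIGNAL.get(only, "docs_summary_generic")
```

Any conflict between two signals, or the absence of signals, resolves to the generic profile, which matches the consequences listed above.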
 ## 8. Profiles, layers, limits
 ### `general_qa_grounded_summary`
 - condition: `route.intent == GENERAL_QA`
 - layers: `[D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]`
 - limit: `8`
 ### `file_lookup`
 - condition: `route.subintent == FIND_FILES`
 - layers: `[D1_DOCUMENT_CATALOG, D3_ENTITY_CATALOG]`
 - limit: `12`
 ### `docs_summary_api_endpoint`
 - layers: `[D1_DOCUMENT_CATALOG, D2_FACT_INDEX, D0_DOC_CHUNKS]`
 - limit: `8`
 ### `docs_summary_logic_flow`
 - layers: `[D4_WORKFLOW_INDEX, D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]`
 - limit: `8`
 ### `docs_summary_domain_entity`
 - layers: `[D3_ENTITY_CATALOG, D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]`
 - limit: `8`
 ### `docs_summary_architecture`
 - layers: `[D1_DOCUMENT_CATALOG, D5_RELATION_GRAPH, D0_DOC_CHUNKS]`
 - limit: `8`
 ### `docs_summary_generic`
 - layers: `[D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]`
 - limit: `8`
 ## 9. Filters by branch
 ### General branch
 `general_filters(route)` returns:
 - `prefer_path_prefixes = ["docs/architecture/", "docs/"]`
 - `prefer_like_patterns = ["%readme.md%", "%overview%"]`
 - `target_doc_hints = list(route.anchors.target_doc_hints)`
 This is an overview plan, but not a narrow one: there are no hard `path_prefixes` here.
 ### Summary branch
 `summary_filters(route)` always includes:
 - `target_doc_hints`
 - `metadata.domain` if `process_domain` is set
 - `metadata.subdomain` if `process_subdomain` is set
 - `prefer_path_prefixes`
 - `prefer_like_patterns`
 Additionally:
 - if there is an `API_ENDPOINT` signal, hard `path_prefixes = ["docs/api/", "docs/"]` is added
 `prefer_path_prefixes` for summary:
 - API -> `["docs/api/", "docs/"]`
 - ARCHITECTURE -> `["docs/architecture/", "docs/"]`
 - LOGIC_FLOW -> `["docs/logic/", "docs/architecture/", "docs/"]`
 - DOMAIN_ENTITY -> `["docs/domains/", "docs/", "docs/api/"]`
 - empty signals -> `["docs/"]`
 If there are several signals, the prefixes are merged and deduplicated, preserving order.
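An order-preserving merge of the per-signal prefix lists might look like the sketch below. The prefix lists are taken from the table above; the iteration order in `SIGNAL_ORDER` is an assumption, since the document does not say in which order signals are visited, and `summary_prefer_prefixes` is a hypothetical name.

```python
SUMMARY_PREFIXES = {
    "API_ENDPOINT": ["docs/api/", "docs/"],
    "ARCHITECTURE": ["docs/architecture/", "docs/"],
    "LOGIC_FLOW": ["docs/logic/", "docs/architecture/", "docs/"],
    "DOMAIN_ENTITY": ["docs/domains/", "docs/", "docs/api/"],
}
# Assumed visiting order; the real resolver may iterate differently.
SIGNAL_ORDER = ("API_ENDPOINT", "ARCHITECTURE", "LOGIC_FLOW", "DOMAIN_ENTITY")


def summary_prefer_prefixes(signals: set[str]) -> list[str]:
    """Concatenate the prefix lists of active signals and drop repeats, keeping order."""
    merged: list[str] = []
    for signal in SIGNAL_ORDER:
        if signal in signals:
            merged.extend(SUMMARY_PREFIXES[signal])
    if not merged:
        return ["docs/"]
    # dict.fromkeys keeps the first occurrence of each prefix in insertion order.
    return list(dict.fromkeys(merged))
```

Note that after the merge the broad `docs/` prefix can appear ahead of a more specific prefix contributed by a later signal, so the result depends on the visiting order.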
 ### FIND_FILES branch
 `find_files_filters(route)` always includes:
 - `target_doc_hints`
 - `metadata.domain` if `process_domain` is set
 - `metadata.subdomain` if `process_subdomain` is set
 - `path_prefixes`
 - `prefer_path_prefixes`
 - `prefer_like_patterns`
 `path_prefixes` for `FIND_FILES` are chosen by priority:
 1. directories from `target_doc_hints`
 2. directories from `file_names`, if the path starts with `docs/`
 3. signal-based fallback:
   - API -> `["docs/api/", "docs/"]`
   - ARCHITECTURE -> `["docs/architecture/", "docs/"]`
   - LOGIC_FLOW -> `["docs/logic/", "docs/"]`
   - DOMAIN_ENTITY -> `["docs/domains/", "docs/"]`
 4. default -> `["docs/"]`
 `prefer_path_prefixes` for `FIND_FILES`:
 - starts with `path_prefixes`
 - if `process_domain` or `process_subdomain` is set, additionally appends:
  - `"docs/domains/"`
  - `"docs/logic/"`
 ## 10. Hard and soft signals in the current implementation
 In terms of the current code:
 Hard-ish / narrowing filters:
 - `path_prefixes`
 - `metadata.domain`
 - `metadata.subdomain`
 Soft preferences:
 - `prefer_path_prefixes`
 - `prefer_like_patterns`
 Separately:
 - `target_doc_hints` are always kept in `RetrievalPlan.filters`, but are **not mapped directly** into `RagRepository.retrieve(...)` as a hard SQL filter.
 So for now `target_doc_hints` are not a direct DB filter but a downstream anchor for other pipeline steps and for the deterministic exact-doc seeding logic.
 ## 11. Integration with the retrieval stack
 The layer after the resolver now executes the plan through `V2RagRetrievalAdapter` rather than directly in `V2Process`.
 `V2RagRetrievalAdapter.fetch_rows(...)` uses `RetrievalPlan` as follows:
 - reads `filters["target_doc_hints"]` from the plan itself;
 - performs an exact-path seed via `retrieve_exact_files(...)`;
 - for missing hints, performs a substring fallback via `retrieve_chunks_by_path_substrings(...)`;
 - then runs the regular semantic retrieve via `RagSessionRetriever.retrieve(...)`;
 - merges exact / substring / semantic rows with a dedupe merge.
 This is an important shift: the execution strategy now depends on the **`RetrievalPlan` contract** rather than on hidden route-specific logic inside `V2Process`.
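The final dedupe merge of exact, substring, and semantic rows could be sketched as below. Everything here is an assumption for illustration: the adapter's real merge is not part of this diff, `merge_rows` is a hypothetical name, and identifying a row by `(path, chunk_id)` is a guess at a reasonable key.

```python
from typing import Callable, Hashable, Iterable


def merge_rows(
    exact: Iterable[dict],
    substring: Iterable[dict],
    semantic: Iterable[dict],
    key: Callable[[dict], Hashable] = lambda row: (row.get("path"), row.get("chunk_id")),
) -> list[dict]:
    """Concatenate the three sources in priority order, keeping the first row per key."""
    seen: set[Hashable] = set()
    merged: list[dict] = []
    for group in (exact, substring, semantic):
        for row in group:
            row_key = key(row)
            if row_key in seen:
                continue  # an earlier, higher-priority source already supplied this row
            seen.add(row_key)
            merged.append(row)
    return merged
```

Keeping the first occurrence means that a row found by exact path is never replaced by its semantic duplicate, which preserves the seeding priority described in the list above.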
 `RagSessionRetriever._map_filters()` passes the following to `RagRepository.retrieve(...)`:
 - `path_prefixes`
 - `exclude_path_prefixes`
 - `exclude_like_patterns`
 - `prefer_path_prefixes`
 - `prefer_like_patterns`
 - `prefer_non_tests`
 - `metadata_domain` from `filters["metadata.domain"]`
 - `metadata_subdomain` from `filters["metadata.subdomain"]`
 `RetrievalStatementBuilder.build_retrieve(...)` adds SQL predicates:
 - `lower(metadata_json->>'domain') = :metadata_domain`
 - `lower(metadata_json->>'subdomain') = :metadata_subdomain`
 As a result:
 - `process_domain/process_subdomain` actually take part in the retrieval query;
 - `target_doc_hints` actually take part in the retrieval execution strategy at the adapter level;
 - `V2RetrievalPolicyResolver` defines the plan contract, and the next step executes that contract more literally.
 ## 12. Current limitations
 - The logic is fully deterministic.
 - `target_terms` currently do not take part in resolver branching.
 - `process_domain/process_subdomain` do not affect summary profile selection.
 - The API signal adds `path_prefixes` even in the generic summary if API is among the conflicting signals.
 - `target_doc_hints` are not a direct SQL filter inside the regular `retrieve`, but the adapter level uses them for exact-path / substring seeding before semantic retrieval.
@@ -4,17 +4,17 @@ from app.core.agent.processes.v2.models import V2AnchorType, V2RouteAnchors, V2R
 def anchor_signal_types(route: V2RouteResult) -> set[str]:
-    hints = [str(item).strip().lower() for item in route.anchors.target_doc_hints if str(item or "").strip()]
+    texts = _signal_texts(route)
    signals: set[str] = set()
    if route.subintent == V2Subintent.FIND_FILES:
        signals.add(V2AnchorType.FIND_FILES)
-    if route.anchors.endpoint_paths or _has_hint(hints, "/api/"):
+    if route.anchors.endpoint_paths or _has_any(texts, ("/api/", "api", "endpoint")):
        signals.add(V2AnchorType.API_ENDPOINT)
-    if _has_hint(hints, "/architecture/"):
+    if _has_any(texts, ("/architecture/", "architecture", "arch")):
        signals.add(V2AnchorType.ARCHITECTURE)
-    if _has_hint(hints, "/logic/"):
+    if _has_any(texts, ("/logic/", "logic", "workflow", "flow", "process")):
        signals.add(V2AnchorType.LOGIC_FLOW)
-    if _has_hint(hints, "/domains/"):
+    if route.anchors.entity_names or _has_any(texts, ("/domains/", "domain", "entity", "component")):
        signals.add(V2AnchorType.DOMAIN_ENTITY)
    return signals
@@ -44,5 +44,14 @@ def anchors_have_signal(anchors: V2RouteAnchors, signal: str, *, subintent: str
    return signal in anchor_signal_types(route)
-def _has_hint(hints: list[str], marker: str) -> bool:
+def _signal_texts(route: V2RouteResult) -> list[str]:
-    return any(marker in hint for hint in hints)
+    items = [
        *route.anchors.target_doc_hints,
        *route.anchors.file_names,
        *route.anchors.matched_aliases,
    ]
    return [str(item).strip().lower() for item in items if str(item or "").strip()]
 def _has_any(items: list[str], markers: tuple[str, ...]) -> bool:
    return any(marker in item for item in items for marker in markers)
@@ -11,6 +11,8 @@ from app.core.rag.contracts.enums import RagLayer
 class DocsEvidenceAssembler:
    _API_PATH_PREFIXES = ("docs/api/", "docs/endpoints/", "docs/methods/", "api/", "endpoints/", "methods/")
    _GENERIC_DOC_MARKERS = ("readme", "overview", "index", "navigation", "related docs", "catalog")
    def assemble_summaries(self, rows: list[dict], route: V2RouteResult) -> list[RetrievedSummary]:
        items = self._rank_rows(rows, route, mode="summary")
        ranked = [
@@ -71,10 +73,12 @@ class DocsEvidenceAssembler:
                    "score": score,
                    "score_breakdown": breakdown,
                    "match_reason": self._match_reason(breakdown),
                    "is_generic_doc": self._is_generic_doc(path, self._title(row, path), self._summary(row), row),
                }
            )
        ranked.sort(key=lambda item: (-item["score"], item["path"]))
-        return self._ensure_target_docs_in_top_k(ranked, route, k=4 if mode == "find_files" else 3)
+        ranked = self._ensure_target_docs_in_top_k(ranked, route, k=4 if mode == "find_files" else 3)
        return self._promote_specific_primary(ranked, route)
    def _score_breakdown(self, row: dict, route: V2RouteResult, *, mode: str) -> dict[str, int]:
        path_raw = self._path(row)
@@ -93,6 +97,7 @@ class DocsEvidenceAssembler:
            "alias_match": 0,
            "anchor_boost": 0,
            "target_doc_boost": 0,
            "specificity_boost": 0,
            "generic_penalty": 0,
        }
        if route.intent == "GENERAL_QA":
@@ -100,6 +105,7 @@ class DocsEvidenceAssembler:
        hint_norm_lower = {normalize_doc_path(h).lower() for h in route.anchors.target_doc_hints if str(h or "").strip()}
        if normalize_doc_path(path_raw).lower() in hint_norm_lower:
            breakdown["target_doc_boost"] += 1000
        hint_texts = [str(hint or "").strip().lower() for hint in route.anchors.target_doc_hints if str(hint or "").strip()]
        if any(alias.lower() in " ".join([path, title, summary, entity]) for alias in route.anchors.matched_aliases):
            breakdown["alias_match"] += 500
        for token in query_tokens:
@@ -111,10 +117,25 @@ class DocsEvidenceAssembler:
                breakdown["semantic"] += 20
            if self._compact(token) in compact_haystack:
                breakdown["alias_match"] += 250
        for hint in hint_texts:
            compact_hint = self._compact(hint)
            if compact_hint and compact_hint in compact_haystack:
                breakdown["target_doc_boost"] += 180
            elif hint and hint.strip("/") in " ".join([path, title, summary, entity]):
                breakdown["semantic"] += 70
        endpoint_text = self._summary(row).lower()
        for endpoint in route.anchors.endpoint_paths:
            normalized_endpoint = endpoint.strip().lower()
            endpoint_slug = normalized_endpoint.strip("/")
            if normalized_endpoint and normalized_endpoint in endpoint_text:
                breakdown["target_doc_boost"] += 260
            if endpoint_slug and endpoint_slug in filename:
                breakdown["filename_match"] += 200
        if any(endpoint.strip("/").lower() in filename for endpoint in route.anchors.endpoint_paths):
            breakdown["filename_match"] += 200
        signals = anchor_signal_types(route)
        breakdown["anchor_boost"] += self._anchor_boost(path, signals)
        breakdown["specificity_boost"] += self._specificity_boost(row, path, title, summary, route)
        breakdown["generic_penalty"] += self._generic_penalty(path, signals)
        if mode == "find_files":
            breakdown["path_match"] *= 3
@@ -125,8 +146,8 @@ class DocsEvidenceAssembler:
    def _anchor_boost(self, path: str, signals: set[str]) -> int:
        boost = 0
-        if V2AnchorType.API_ENDPOINT in signals and path.startswith("docs/api/"):
+        if V2AnchorType.API_ENDPOINT in signals and path.startswith(self._API_PATH_PREFIXES):
-            boost += 300
+            boost += 360
        if V2AnchorType.LOGIC_FLOW in signals and path.startswith("docs/logic/"):
            boost += 300
        if V2AnchorType.DOMAIN_ENTITY in signals and path.startswith("docs/domains/"):
@@ -139,8 +160,11 @@ class DocsEvidenceAssembler:
    def _generic_penalty(self, path: str, signals: set[str]) -> int:
        penalty = 0
        lowered = path.lower()
        if path == "docs/README.md" and V2AnchorType.ARCHITECTURE not in signals:
-            penalty -= 200
+            penalty -= 260
        if any(marker in lowered for marker in ("/readme", "readme.md", "/index", "/overview", "/catalog", "/navigation")):
            penalty -= 220
        if "/architecture/" in path and V2AnchorType.ARCHITECTURE not in signals and signals.intersection(
            {V2AnchorType.API_ENDPOINT, V2AnchorType.DOMAIN_ENTITY}
        ):
@@ -173,6 +197,17 @@ class DocsEvidenceAssembler:
        top.sort(key=lambda item: (-item["score"], item["path"]))
        return top + remaining
    def _promote_specific_primary(self, ranked: list[dict], route: V2RouteResult) -> list[dict]:
        if len(ranked) < 2:
            return ranked
        first = ranked[0]
        if not first.get("is_generic_doc"):
            return ranked
        promoted = next((item for item in ranked[1:] if not item.get("is_generic_doc") and self._is_specific_candidate(item, route)), None)
        if promoted is None:
            return ranked
        return [promoted] + [item for item in ranked if item["path"] != promoted["path"]]
    def _match_reason(self, breakdown: dict[str, int]) -> str:
        if breakdown["target_doc_boost"] > 0:
            return "exact_path"
@@ -189,6 +224,53 @@ class DocsEvidenceAssembler:
        section = str(metadata.get("section_path") or "").lower()
        return "summary" in section or "свод" in section or "overview" in section
    def _specificity_boost(self, row: dict, path: str, title: str, summary: str, route: V2RouteResult) -> int:
        boost = 0
        filename = path.split("/")[-1]
        lowered_title = title.lower()
        lowered_summary = summary.lower()
        if not self._is_generic_doc(path, title, summary, row):
            boost += 90
        if path.startswith(self._API_PATH_PREFIXES):
            boost += 160
        if "endpoint" in filename or "endpoint" in lowered_title or "method" in lowered_title:
            boost += 120
        if row.get("layer") == RagLayer.DOCS_DOC_CHUNKS and not self._looks_like_navigation_chunk(row):
            boost += 80
        for token in self._query_tokens(route):
            if token and token in filename:
                boost += 90
            if token and token in lowered_title:
                boost += 70
            if token and token in lowered_summary:
                boost += 40
        return boost
    def _is_specific_candidate(self, item: dict, route: V2RouteResult) -> bool:
        breakdown = dict(item.get("score_breakdown") or {})
        if breakdown.get("target_doc_boost", 0) > 0:
            return True
        if breakdown.get("specificity_boost", 0) >= 160:
            return True
        return V2AnchorType.API_ENDPOINT in anchor_signal_types(route) and item["path"].startswith(self._API_PATH_PREFIXES)
    def _is_generic_doc(self, path: str, title: str, summary: str, row: dict) -> bool:
        haystack = " ".join([path.lower(), title.lower(), summary.lower()])
        if any(marker in haystack for marker in self._GENERIC_DOC_MARKERS):
            return True
        return self._looks_like_navigation_chunk(row)
    def _looks_like_navigation_chunk(self, row: dict) -> bool:
        text = self._summary(row).lower()
        if not text:
            return False
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        bullet_lines = sum(1 for line in lines if line.startswith(("- ", "* ", "1.", "2.", "3.")))
        link_lines = sum(1 for line in lines if "](" in line or line.startswith("docs/"))
        if "related docs" in text or "navigation" in text:
            return True
        return bullet_lines >= 3 or link_lines >= 3
    def _query_tokens(self, route: V2RouteResult) -> list[str]:
        values = list(route.target_terms) + list(route.anchors.matched_aliases)
        tokens: list[str] = []
@@ -8,6 +8,7 @@ class QueryFeatures:
    normalized_query: str
    target_terms: list[str]
    endpoint_paths: list[str]
    file_names: list[str]
    matched_aliases: list[str]
    target_doc_hints: list[str]
    file_markers: list[str]
@@ -34,10 +34,42 @@ class _MarkerScanner:
        "где описано",
        "документ с описанием",
    )
-    _ARCHITECTURE_MARKERS = ("архитектура", "как устроено приложение", "как устроен сервис", "основные части системы", "из чего состоит")
+    _ARCHITECTURE_MARKERS = (
-    _LOGIC_MARKERS = ("цикл", "loop", "worker", "как работает отправка уведомлений", "логика отправки", "background job", "runtime loop")
+        "архитектура",
        "архитектур",
        "architecture",
        "arch overview",
        "как устроено приложение",
        "как устроен сервис",
        "основные части системы",
        "из чего состоит",
    )
    _LOGIC_MARKERS = (
        "цикл",
        "loop",
        "flow",
        "workflow",
        "process",
        "worker",
        "как работает отправка уведомлений",
        "логика отправки",
        "background job",
        "runtime loop",
    )
    _DOMAIN_MARKERS = ("runtime health", "health model", "статусы здоровья", "сущность", "entity", "здоровье runtime")
-    _ENDPOINT_MARKERS = ("endpoint", "метод api", "ручка", "эндпоинт")
+    _ENDPOINT_MARKERS = (
        "endpoint",
        "api",
        "route",
        "method",
        "метод api",
        "метод",
        "метода",
        "ручка",
        "эндпоинт",
        "маршрут",
        "роут",
    )
    def scan(self, lowered_query: str) -> dict[str, list[str]]:
        return {
@@ -54,12 +86,13 @@ class _MarkerScanner:
 class _EntityNameExtractor:
    _ENTITY_RE = re.compile(r"\b[A-Z][A-Za-z0-9_]+\b")
    _IGNORE = {"arch"}
    def extract(self, query: str) -> list[str]:
        items: list[str] = []
        for match in self._ENTITY_RE.finditer(query):
            candidate = match.group(0).strip()
-            if candidate and candidate not in items:
+            if candidate and candidate.lower() not in self._IGNORE and candidate not in items:
                items.append(candidate)
        return items
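The `_IGNORE` check added to `_EntityNameExtractor.extract` above can be exercised in isolation. The following is a minimal standalone sketch, not the project's module: `extract_entities`, `ENTITY_RE` and `IGNORE` are hypothetical module-level stand-ins for the class attributes shown in the hunk.

```python
import re

# Stand-ins for _EntityNameExtractor._ENTITY_RE and _IGNORE (copied from the hunk).
ENTITY_RE = re.compile(r"\b[A-Z][A-Za-z0-9_]+\b")
IGNORE = {"arch"}


def extract_entities(query: str) -> list[str]:
    """Collect capitalized ASCII tokens, skipping ignored words and duplicates."""
    items: list[str] = []
    for match in ENTITY_RE.finditer(query):
        candidate = match.group(0).strip()
        # The ignore check is case-insensitive; the duplicate check is not.
        if candidate and candidate.lower() not in IGNORE and candidate not in items:
            items.append(candidate)
    return items


print(extract_entities("Arch overview for RuntimeHealth and RuntimeHealth"))
```

With this input, `Arch` is dropped by the ignore list and the repeated `RuntimeHealth` is kept once.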
@@ -92,33 +125,61 @@ class _FileNameExtractor:
            items.append(value)
 class _ProcessAnchorExtractor:
    _DOMAIN_KEYWORDS = {
        "billing": "billing",
        "notifications": "notifications",
    }
    _SUBDOMAIN_KEYWORDS = {
        "invoice": ("billing", "invoice"),
        "invoices": ("billing", "invoice"),
        "delivery_loop": ("notifications", "delivery_loop"),
        "delivery": ("notifications", "delivery_loop"),
    }
    def extract(self, lowered_query: str) -> tuple[str | None, str | None]:
        domain = next((value for token, value in self._DOMAIN_KEYWORDS.items() if token in lowered_query), None)
        subdomain: str | None = None
        for token, mapping in self._SUBDOMAIN_KEYWORDS.items():
            if token in lowered_query:
                domain = domain or mapping[0]
                subdomain = mapping[1]
                break
        return domain, subdomain
 class V2AnchorExtractor:
    def __init__(
        self,
        marker_scanner: _MarkerScanner | None = None,
        entity_extractor: _EntityNameExtractor | None = None,
        file_name_extractor: _FileNameExtractor | None = None,
        process_anchor_extractor: _ProcessAnchorExtractor | None = None,
    ) -> None:
        self._marker_scanner = marker_scanner or _MarkerScanner()
        self._entity_extractor = entity_extractor or _EntityNameExtractor()
        self._file_name_extractor = file_name_extractor or _FileNameExtractor()
        self._process_anchor_extractor = process_anchor_extractor or _ProcessAnchorExtractor()
    def extract(self, normalized_query: str, terms: TargetTermsAnalysis) -> AnchorAnalysis:
-        markers = self._marker_scanner.scan(normalized_query.lower())
+        lowered_query = normalized_query.lower()
        markers = self._marker_scanner.scan(lowered_query)
        process_domain, process_subdomain = self._process_anchor_extractor.extract(lowered_query)
        anchors = V2RouteAnchors(
            entity_names=self._entity_extractor.extract(normalized_query),
            file_names=self._file_name_extractor.extract(normalized_query),
            endpoint_paths=list(terms.endpoint_paths),
            target_doc_hints=self._target_doc_hints(
                endpoint_paths=terms.endpoint_paths,
                api_like_terms=terms.api_like_terms,
                alias_docs=terms.alias_docs,
                architecture_markers=markers["architecture_markers"],
                logic_markers=markers["logic_markers"],
                domain_markers=markers["domain_markers"],
            ),
            matched_aliases=list(terms.matched_aliases),
-            process_domain=None,
+            process_domain=process_domain,
-            process_subdomain=None,
+            process_subdomain=process_subdomain,
        )
        return AnchorAnalysis(
            anchors=anchors,
@@ -133,6 +194,7 @@ class V2AnchorExtractor:
        self,
        *,
        endpoint_paths: list[str],
        api_like_terms: list[str],
        alias_docs: list[str],
        architecture_markers: list[str],
        logic_markers: list[str],
@@ -145,13 +207,41 @@ class V2AnchorExtractor:
            "/actions/{action}": "docs/api/control-actions-endpoint.md",
        }
        for endpoint in endpoint_paths:
            for hint in self._endpoint_hint_variants(endpoint):
                self._append_unique(hints, hint)
            hint = endpoint_map.get(endpoint)
-            if hint and hint not in hints:
+            self._append_unique(hints, hint)
-                hints.append(hint)
+        for term in api_like_terms:
-        if architecture_markers and "docs/architecture/telegram-notify-app-overview.md" not in hints:
+            for hint in self._api_like_hint_variants(term):
-            hints.append("docs/architecture/telegram-notify-app-overview.md")
+                self._append_unique(hints, hint)
-        if logic_markers and "docs/logic/telegram-notification-loop.md" not in hints:
+        if architecture_markers:
-            hints.append("docs/logic/telegram-notification-loop.md")
+            self._append_unique(hints, "docs/architecture/telegram-notify-app-overview.md")
-        if domain_markers and "docs/domains/runtime-health-entity.md" not in hints:
+        if logic_markers:
-            hints.append("docs/domains/runtime-health-entity.md")
+            self._append_unique(hints, "docs/logic/telegram-notification-loop.md")
        if domain_markers:
            self._append_unique(hints, "docs/domains/runtime-health-entity.md")
        return hints
    def _endpoint_hint_variants(self, endpoint: str) -> list[str]:
        normalized = str(endpoint or "").strip().lower()
        if not normalized:
            return []
        slug = normalized.strip("/").replace("/", "-").replace("{", "").replace("}", "")
        leaf = next((part for part in reversed(slug.split("-")) if part and part != "id"), "")
        hints: list[str] = [normalized]
        for value in (slug, leaf):
            if not value:
                continue
            hints.extend([value, f"{value}-endpoint", f"{value} endpoint"])
        return list(dict.fromkeys(hints))
    def _api_like_hint_variants(self, term: str) -> list[str]:
        normalized = str(term or "").strip().lower().lstrip("/")
        if not normalized:
            return []
        return [normalized, f"/{normalized}", f"{normalized}-endpoint", f"{normalized} endpoint"]
    def _append_unique(self, items: list[str], value: str | None) -> None:
        normalized = str(value or "").strip()
        if normalized and normalized not in items:
            items.append(normalized)
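To see what `_endpoint_hint_variants` above produces for a concrete path, here is a standalone sketch of the same logic. The free function `endpoint_hint_variants` is a hypothetical copy made for illustration; the sample paths are assumptions apart from `/actions/{action}`, which appears in `endpoint_map`.

```python
def endpoint_hint_variants(endpoint: str) -> list[str]:
    """Expand an endpoint path into slug- and leaf-based doc hints."""
    normalized = str(endpoint or "").strip().lower()
    if not normalized:
        return []
    # "/actions/{action}" -> "actions-action"
    slug = normalized.strip("/").replace("/", "-").replace("{", "").replace("}", "")
    # Last slug part that is not the generic "id" placeholder.
    leaf = next((part for part in reversed(slug.split("-")) if part and part != "id"), "")
    hints: list[str] = [normalized]
    for value in (slug, leaf):
        if not value:
            continue
        hints.extend([value, f"{value}-endpoint", f"{value} endpoint"])
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(hints))


for path in ("/health", "/actions/{action}", "/users/{id}"):
    print(path, "->", endpoint_hint_variants(path))
```

Because the leaf is taken by splitting the slug on `-`, a hyphenated path segment such as `control-actions` would yield `actions` as the leaf rather than the whole segment.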
@@ -8,6 +8,7 @@ from dataclasses import dataclass
 class TargetTermsAnalysis:
    target_terms: list[str]
    endpoint_paths: list[str]
    api_like_terms: list[str]
    matched_aliases: list[str]
    alias_docs: list[str]
@@ -26,7 +27,7 @@ class _AliasMatcher:
        _AliasRule(("control actions", "управление runtime"), "/actions/{action}", "docs/api/control-actions-endpoint.md"),
        _AliasRule(("runtime health", "здоровье runtime", "статусы здоровья"), "runtime_health", "docs/domains/runtime-health-entity.md"),
        _AliasRule(("цикл отправки уведомлений", "notification loop", "worker loop"), "telegram-notify-loop", "docs/logic/telegram-notification-loop.md"),
-        _AliasRule(("архитектура приложения", "overview"), "architecture_overview", "docs/architecture/telegram-notify-app-overview.md"),
+        _AliasRule(("архитектура приложения",), "architecture_overview", "docs/architecture/telegram-notify-app-overview.md"),
        _AliasRule(("архитектура",), "architecture_overview", "docs/architecture/telegram-notify-app-overview.md"),
        _AliasRule(("каталог ошибок", "errors catalog"), "errors_catalog", "docs/errors/catalog.yaml"),
        _AliasRule(("файл-индекс документации", "docs index", "индекс документации"), "docs_index", "docs/README.md"),
@@ -51,6 +52,7 @@ class _AliasMatcher:
 class _EndpointPathExtractor:
    _PATH_RE = re.compile(r"`([^`]+)`|(/[A-Za-z0-9_./{}-]+)")
    _VALID_ENDPOINT_RE = re.compile(r"^/[a-z0-9._/-]+(?:/\{[a-z0-9_]+\})?$")
    _DOC_EXTENSIONS = (".md", ".yaml", ".yml", ".json")
    def extract(self, query: str) -> list[str]:
        values: list[str] = []
@@ -68,28 +70,161 @@ class _EndpointPathExtractor:
        return trimmed.lower()
    def _is_endpoint(self, token: str) -> bool:
-        return bool(token and self._VALID_ENDPOINT_RE.fullmatch(token))
+        if not token or not self._VALID_ENDPOINT_RE.fullmatch(token):
            return False
        return not token.endswith(self._DOC_EXTENSIONS)
    def _append_unique(self, items: list[str], value: str) -> None:
        if value and value not in items:
            items.append(value)
@dataclass(slots=True)
 class _ApiLikeAnchorAnalysis:
    endpoint_paths: list[str]
    candidate_terms: list[str]
 class _ApiLikeAnchorExtractor:
    _TOKEN_RE = re.compile(r"[A-Za-zА-Яа-я0-9_./{}-]+")
    _ASCII_ENDPOINT_RE = re.compile(r"^[a-z0-9]+(?:[-_][a-z0-9]+)*$")
    _API_MARKERS = {
        "api",
        "endpoint",
        "route",
        "method",
        "метод",
        "метода",
        "методу",
        "ручка",
        "ручки",
        "эндпоинт",
        "эндпоинта",
        "маршрут",
        "роут",
    }
    _EXPLAIN_MARKERS = {
        "как",
        "что",
        "делает",
        "работает",
        "объясни",
        "объяснить",
        "расскажи",
        "опиши",
        "смысл",
    }
    _NOISE_WORDS = _API_MARKERS | _EXPLAIN_MARKERS | {
        "про",
        "какой",
        "какая",
        "какие",
        "какого",
        "какую",
        "кратко",
        "нужен",
        "нужно",
        "у",
    }
    _SHORT_QUERY_TOKEN_LIMIT = 7
    def extract(self, query: str, explicit_endpoint_paths: list[str]) -> _ApiLikeAnchorAnalysis:
        if explicit_endpoint_paths:
            return _ApiLikeAnchorAnalysis(endpoint_paths=list(explicit_endpoint_paths), candidate_terms=[])
        token_entries = self._token_entries(query)
        if not token_entries:
            return _ApiLikeAnchorAnalysis(endpoint_paths=[], candidate_terms=[])
        candidate_terms = [token for token, _start in token_entries if self._is_api_candidate(token)]
        if not candidate_terms:
            return _ApiLikeAnchorAnalysis(endpoint_paths=[], candidate_terms=[])
        if self._has_api_marker(token_entries):
            primary = self._primary_candidate(token_entries)
            endpoint_paths = [self._ensure_endpoint(primary)] if primary else []
            return _ApiLikeAnchorAnalysis(
                endpoint_paths=[path for path in endpoint_paths if path],
                candidate_terms=[primary] if primary else [],
            )
        if self._is_short_explain_query(token_entries) and len(candidate_terms) == 1:
            return _ApiLikeAnchorAnalysis(endpoint_paths=[], candidate_terms=list(candidate_terms))
        return _ApiLikeAnchorAnalysis(endpoint_paths=[], candidate_terms=[])
    def _token_entries(self, query: str) -> list[tuple[str, int]]:
        entries: list[tuple[str, int]] = []
        for match in self._TOKEN_RE.finditer(query):
            token = str(match.group(0) or "").strip().strip("`'\"()[]!?.,:;").lower()
            if token:
                entries.append((token, match.start()))
        return entries
    def _has_api_marker(self, token_entries: list[tuple[str, int]]) -> bool:
        return any(token in self._API_MARKERS for token, _start in token_entries)
    def _is_short_explain_query(self, token_entries: list[tuple[str, int]]) -> bool:
        if len(token_entries) > self._SHORT_QUERY_TOKEN_LIMIT:
            return False
        return any(token in self._EXPLAIN_MARKERS for token, _start in token_entries)
    def _primary_candidate(self, token_entries: list[tuple[str, int]]) -> str | None:
        marker_positions = [start for token, start in token_entries if token in self._API_MARKERS]
        candidates = [(token, start) for token, start in token_entries if self._is_api_candidate(token)]
        if not candidates:
            return None
        if not marker_positions:
            return candidates[-1][0]
        primary = min(
            candidates,
            key=lambda item: min(abs(item[1] - marker_pos) for marker_pos in marker_positions),
        )
        return primary[0]
    def _is_api_candidate(self, token: str) -> bool:
        if (
            not token
            or token in self._NOISE_WORDS
            or token.startswith("docs/")
            or token.endswith((".md", ".yaml", ".yml", ".json"))
        ):
            return False
        if token.startswith("/"):
            return True
        return self._ASCII_ENDPOINT_RE.fullmatch(token) is not None and len(token) >= 3
    def _ensure_endpoint(self, token: str) -> str:
        return token if token.startswith("/") else f"/{token}"
 class _TermCollector:
    _TOKEN_RE = re.compile(r"[A-Za-zА-Яа-я0-9_./{}-]+")
    _IDENTIFIER_RE = re.compile(
        r"^(?:[a-z0-9]+(?:[_-][a-z0-9]+)+|[a-z]+[A-Z][A-Za-z0-9]+|(?:[A-Z][a-z0-9]+){2,})$"
    )
    _QUESTION_WORDS = {"что", "как", "где", "какой", "какие", "каком", "когда", "чего"}
-    _INTENT_WORDS = {"объясни", "покажи", "найди", "расскажи", "дай", "опиши", "нужен"}
+    _INTENT_WORDS = {"объясни", "покажи", "найди", "расскажи", "дай", "опиши", "нужен", "show"}
-    _FILLER_WORDS = {"про", "там", "тут", "плз"}
+    _FILLER_WORDS = {"про", "там", "тут", "плз", "pls", "for"}
    _MARKER_WORDS = {
        "файл",
        "файле",
        "file",
        "method",
        "метод",
        "метода",
        "методу",
        "route",
        "ручка",
        "ручки",
        "эндпоинт",
        "эндпоинта",
        "overview",
        "architecture",
        "arch",
        "flow",
        "process",
        "workflow",
        "док",
        "дока",
        "доках",
        "документ",
        "doc",
        "описан",
        "док-саммари",
        "summary",
@@ -115,6 +250,7 @@ class _TermCollector:
        "service",
        "summary",
        "endpoint",
        "docs",
    }
    _MAX_TERMS = 7
@@ -191,19 +327,23 @@ class V2TargetTermsExtractor:
        self,
        alias_matcher: _AliasMatcher | None = None,
        endpoint_extractor: _EndpointPathExtractor | None = None,
        api_like_extractor: _ApiLikeAnchorExtractor | None = None,
        term_collector: _TermCollector | None = None,
    ) -> None:
        self._alias_matcher = alias_matcher or _AliasMatcher()
        self._endpoint_extractor = endpoint_extractor or _EndpointPathExtractor()
        self._api_like_extractor = api_like_extractor or _ApiLikeAnchorExtractor()
        self._term_collector = term_collector or _TermCollector()
    def extract(self, normalized_query: str) -> TargetTermsAnalysis:
        lowered = normalized_query.lower()
        endpoint_paths = self._endpoint_extractor.extract(normalized_query)
        api_like = self._api_like_extractor.extract(normalized_query, endpoint_paths)
        alias_terms, alias_docs, alias_hits = self._alias_matcher.match(lowered)
        return TargetTermsAnalysis(
-            target_terms=self._term_collector.collect(normalized_query, alias_terms, endpoint_paths),
+            target_terms=self._term_collector.collect(normalized_query, alias_terms, api_like.endpoint_paths),
-            endpoint_paths=endpoint_paths,
+            endpoint_paths=api_like.endpoint_paths,
            api_like_terms=api_like.candidate_terms,
            matched_aliases=alias_hits,
            alias_docs=alias_docs,
        )
@@ -44,6 +44,7 @@ class V2IntentRouter:
            normalized_query=normalized_query,
            target_terms=list(target_terms_analysis.target_terms),
            endpoint_paths=list(target_terms_analysis.endpoint_paths),
            file_names=list(anchor_analysis.anchors.file_names),
            matched_aliases=list(target_terms_analysis.matched_aliases),
            target_doc_hints=list(anchor_analysis.anchors.target_doc_hints),
            file_markers=list(anchor_analysis.file_markers),
@@ -58,6 +59,7 @@ class V2IntentRouter:
            anchors=anchor_analysis.anchors,
        )
        llm_result = self._validator.validate(llm_candidate)
        llm_result = self._apply_deterministic_corrections(llm_result, features)
        if llm_result is not None:
            confidence = self._confidence_adjuster.adjust(float(llm_result["confidence"]), features)
            return V2RouteResult(
@@ -99,3 +101,18 @@ class V2IntentRouter:
            )
        except Exception:
            return None
    def _apply_deterministic_corrections(self, candidate: dict | None, features: QueryFeatures) -> dict | None:
        if candidate is None:
            return None
        if candidate.get("routing_domain") == "DOCS" and self._should_force_find_files(features):
            corrected = dict(candidate)
            corrected["subintent"] = "FIND_FILES"
            return corrected
        return candidate
    def _should_force_find_files(self, features: QueryFeatures) -> bool:
        if features.file_markers or features.file_names:
            return True
        query = features.normalized_query.lower()
        return "show doc" in query or "show file" in query or "doc for" in query
@@ -6,7 +6,7 @@ from app.core.agent.processes.v2.models import V2Subintent
 class DocsSubintentResolver:
    def resolve(self, features: QueryFeatures) -> str | None:
-        if features.file_markers:
+        if features.file_markers or self._has_file_like_anchor(features):
            return V2Subintent.FIND_FILES
        if any(
            (
@@ -20,3 +20,9 @@ class DocsSubintentResolver:
        ):
            return V2Subintent.SUMMARY
        return None
    def _has_file_like_anchor(self, features: QueryFeatures) -> bool:
        return any(
            hint.endswith((".md", ".yaml", ".yml", ".json"))
            for hint in features.target_doc_hints
        ) or any(token.endswith((".md", ".yaml", ".yml", ".json")) for token in features.file_names)
@@ -14,7 +14,6 @@ from app.core.agent.processes.v2.retrieval.target_doc_seeding import (
    merge_row_lists,
    normalize_doc_path,
    normalized_path_set,
    path_variants_for_rag_query,
    row_path,
    seed_candidates_from_target_hints,
 )
@@ -121,11 +120,9 @@ class V2Process(AgentProcess):
            "retrieval_profile_selected",
            {"profile": plan.profile, "layers": plan.layers, "filters": plan.filters},
        )
-        seeded_rows = await self._seed_candidates_from_target_hints(rag_session_id, plan.layers, route)
+        retrieved_rows = await self._rag_adapter.fetch_rows(rag_session_id, route.normalized_query, plan)
-        semantic_rows = await self._rag_adapter.fetch_rows(rag_session_id, route.normalized_query, plan)
+        metadata_rows = self._metadata_lookup_candidates(retrieved_rows, route)
-        metadata_rows = self._metadata_lookup_candidates([*seeded_rows, *semantic_rows], route)
+        rows = self._merge_candidate_rows(retrieved_rows, metadata_rows)
        rows = self._merge_candidate_rows(seeded_rows, metadata_rows, semantic_rows)
        rows = await self._ensure_target_hints_in_pool(rag_session_id, rows, route)
        rows = seed_candidates_from_target_hints(rows, route.anchors.target_doc_hints, RagRowIndex(rows))
        self._print_missing_target_hints(route, rows)
        context.trace.module("process.v2.rag_retrieval").log(
@@ -150,9 +147,9 @@ class V2Process(AgentProcess):
                "target_doc_hints": route.anchors.target_doc_hints,
                "candidate_docs_before_ranking": [self._trace_row(row) for row in rows[:8]],
                "sources": {
-                    "seeded": [self._trace_row(row) for row in seeded_rows[:5]],
+                    "seeded": [self._trace_row(row) for row in retrieved_rows[:5] if row_path(row) in {normalize_doc_path(h) for h in route.anchors.target_doc_hints}],
                    "metadata_lookup": [self._trace_row(row) for row in metadata_rows[:5]],
-                    "semantic": [self._trace_row(row) for row in semantic_rows[:5]],
+                    "semantic": [self._trace_row(row) for row in retrieved_rows[:5]],
                },
            },
        )
@@ -262,61 +259,11 @@ class V2Process(AgentProcess):
            if not str(hint or "").strip():
                continue
            normalized = normalize_doc_path(hint)
            if not normalized.startswith("docs/") or "." not in normalized.rsplit("/", 1)[-1]:
                continue
            if normalized not in candidate_paths:
                print("ERROR: target doc missing from candidates:", normalized)
    async def _ensure_target_hints_in_pool(self, rag_session_id: str, rows: list[dict], route) -> list[dict]:
        hints_raw = [str(item).strip() for item in route.anchors.target_doc_hints if str(item or "").strip()]
        if not hints_raw:
            return rows
        pool = normalized_path_set(rows)
        missing_hints = [h for h in hints_raw if normalize_doc_path(h) not in pool]
        if not missing_hints:
            return rows
        variant_paths: list[str] = []
        for h in missing_hints:
            variant_paths.extend(path_variants_for_rag_query(h))
        variant_paths = list(dict.fromkeys(variant_paths))
        extra_exact = await self._rag_adapter.fetch_exact_paths(rag_session_id, paths=variant_paths, layers=None)
        pool2 = normalized_path_set(extra_exact)
        still_missing = [h for h in missing_hints if normalize_doc_path(h) not in pool2]
        fallback_rows: list[dict] = []
        if still_missing:
            needles = [normalize_doc_path(h).split("/")[-1] for h in still_missing]
            needles = list(dict.fromkeys(n for n in needles if n))
            if needles:
                fallback_rows = await self._rag_adapter.fetch_chunks_by_path_substrings(
                    rag_session_id,
                    path_needles=needles,
                    layers=None,
                )
        return merge_row_lists(rows, extra_exact, fallback_rows)
    async def _seed_candidates_from_target_hints(self, rag_session_id: str, layers: list[str], route) -> list[dict]:
        del layers  # path seeding must see all layers (otherwise D0-only chunks are lost during file_lookup).
        hints_raw = [str(item).strip() for item in route.anchors.target_doc_hints if str(item or "").strip()]
        if not hints_raw:
            return []
        variant_paths: list[str] = []
        for h in hints_raw:
            variant_paths.extend(path_variants_for_rag_query(h))
        variant_paths = list(dict.fromkeys(variant_paths))
        exact_rows = await self._rag_adapter.fetch_exact_paths(rag_session_id, paths=variant_paths, layers=None)
        paths_found = normalized_path_set(exact_rows)
        missing = [h for h in hints_raw if normalize_doc_path(h) not in paths_found]
        if not missing:
            return exact_rows
        needles = [normalize_doc_path(h).split("/")[-1] for h in missing]
        needles = list(dict.fromkeys(n for n in needles if n))
        if not needles:
            return exact_rows
        fallback_rows = await self._rag_adapter.fetch_chunks_by_path_substrings(
            rag_session_id,
            path_needles=needles,
            layers=None,
        )
        return merge_row_lists(exact_rows, fallback_rows)
    def _metadata_lookup_candidates(self, rows: list[dict], route) -> list[dict]:
        return DocsMetadataLookupIndex(rows).lookup(route)
@@ -1,4 +1,4 @@
-"""Intent-aware retrieval policy resolver для процесса v2."""
+"""Intent-aware retrieval policy resolver for process v2."""
 from __future__ import annotations
@@ -8,91 +8,113 @@ from app.core.rag.contracts.enums import RagLayer
 from app.core.rag.retrieval.session_retriever import RetrievalPlan
-class V2RetrievalPolicyResolver:
+class _AnchorTermCollector:
-    _SUMMARY_LAYERS = [
+    def prefer_like_patterns(self, route: V2RouteResult) -> list[str]:
-        RagLayer.DOCS_DOCUMENT_CATALOG,
+        terms = self._hint_basenames(route)
-        RagLayer.DOCS_ENTITY_CATALOG,
+        terms.extend(route.anchors.endpoint_paths)
-        RagLayer.DOCS_DOC_CHUNKS,
+        terms.extend(route.target_terms)
-    ]
+        terms.extend(route.anchors.file_names)
-    _GENERAL_LAYERS = [
+        terms.extend(route.anchors.entity_names)
-        RagLayer.DOCS_DOCUMENT_CATALOG,
+        terms.extend(route.anchors.matched_aliases)
-        RagLayer.DOCS_DOC_CHUNKS,
+        terms.extend(self._process_terms(route))
        return [f"%{term.lower()}%" for term in _unique_terms(terms)]
    def find_files_patterns(self, route: V2RouteResult) -> list[str]:
        if route.anchors.target_doc_hints:
            return [f"%{name.lower()}%" for name in self._hint_basenames(route)]
        return self.prefer_like_patterns(route)
    def api_method_patterns(self, route: V2RouteResult) -> list[str]:
        terms = self._hint_basenames(route)
        terms.extend(route.anchors.target_doc_hints)
        terms.extend(route.anchors.endpoint_paths)
        terms.extend(route.target_terms)
        patterns: list[str] = []
        for term in _unique_terms(terms):
            lowered = term.lower()
            stripped = lowered.strip("/")
            if stripped:
                patterns.append(f"%{stripped}%")
            if lowered:
                patterns.append(f"%{lowered}%")
        return _unique_terms(patterns)
    def _hint_basenames(self, route: V2RouteResult) -> list[str]:
        return [hint.rsplit("/", 1)[-1] for hint in route.anchors.target_doc_hints if str(hint).strip()]
    def _process_terms(self, route: V2RouteResult) -> list[str]:
        terms: list[str] = []
        if route.anchors.process_domain:
            terms.append(route.anchors.process_domain)
        if route.anchors.process_subdomain:
            terms.append(route.anchors.process_subdomain)
        return terms
 class _RouteFilterBuilder:
    _API_DOC_PREFIXES = [
        "docs/api/",
        "docs/endpoints/",
        "docs/methods/",
        "api/",
        "endpoints/",
        "methods/",
    ]
-    def resolve(self, route: V2RouteResult) -> RetrievalPlan:
+    def __init__(self) -> None:
-        if route.intent == V2Intent.GENERAL_QA:
+        self._terms = _AnchorTermCollector()
            return RetrievalPlan(
                profile="general_qa_grounded_summary",
                layers=list(self._GENERAL_LAYERS),
                limit=8,
                filters=self._general_filters(route),
            )
        if route.subintent == V2Subintent.FIND_FILES:
            return RetrievalPlan(
                profile="file_lookup",
                layers=[RagLayer.DOCS_DOCUMENT_CATALOG, RagLayer.DOCS_ENTITY_CATALOG],
                limit=12,
                filters=self._find_files_filters(route),
            )
        return RetrievalPlan(
            profile=self._summary_profile(route),
            layers=list(self._SUMMARY_LAYERS),
            limit=8,
            filters=self._summary_filters(route),
        )
-    def _summary_profile(self, route: V2RouteResult) -> str:
+    def general_filters(self, route: V2RouteResult) -> dict[str, object]:
        signals = anchor_signal_types(route)
        if len(signals - {V2AnchorType.FIND_FILES}) != 1:
            return "docs_summary_generic"
        mapping = {
            V2AnchorType.API_ENDPOINT: "docs_summary_api_endpoint",
            V2AnchorType.ARCHITECTURE: "docs_summary_architecture",
            V2AnchorType.LOGIC_FLOW: "docs_summary_logic_flow",
            V2AnchorType.DOMAIN_ENTITY: "docs_summary_domain_entity",
        }
        signal = next(iter(signals - {V2AnchorType.FIND_FILES}), None)
        return mapping.get(signal, "docs_summary_generic")
    def _general_filters(self, route: V2RouteResult) -> dict[str, object]:
        return {
            "prefer_path_prefixes": ["docs/architecture/", "docs/"],
-            "prefer_like_patterns": ["%README.md%", "%overview%"],
+            "prefer_like_patterns": ["%readme.md%", "%overview%"],
            "target_doc_hints": list(route.anchors.target_doc_hints),
        }
-    def _summary_filters(self, route: V2RouteResult) -> dict[str, object]:
+    def summary_filters(self, route: V2RouteResult) -> dict[str, object]:
-        filters: dict[str, object] = {
+        if _is_api_method_explain(route):
-            "prefer_path_prefixes": self._summary_prefixes(route),
+            return self.api_method_filters(route)
-            "prefer_like_patterns": self._prefer_like_patterns(route),
+        filters = self._base_filters(route)
-            "target_doc_hints": list(route.anchors.target_doc_hints),
+        filters["prefer_path_prefixes"] = self._summary_prefixes(route)
-        }
+        filters["prefer_like_patterns"] = self._terms.prefer_like_patterns(route)
        if V2AnchorType.API_ENDPOINT in anchor_signal_types(route):
-            filters["path_prefixes"] = ["docs/api/", "docs/architecture/", "docs/"]
+            filters["path_prefixes"] = ["docs/api/", "docs/"]
        return filters
-    def _find_files_filters(self, route: V2RouteResult) -> dict[str, object]:
+    def api_method_filters(self, route: V2RouteResult) -> dict[str, object]:
        filters = self._base_filters(route)
        filters["path_prefixes"] = list(self._API_DOC_PREFIXES)
        filters["prefer_path_prefixes"] = list(self._API_DOC_PREFIXES)
        filters["prefer_like_patterns"] = self._terms.api_method_patterns(route)
        return filters
    def find_files_filters(self, route: V2RouteResult) -> dict[str, object]:
        filters = self._base_filters(route)
        prefixes = self._find_files_prefixes(route)
        if prefixes:
            filters["path_prefixes"] = prefixes
        filters["prefer_path_prefixes"] = self._find_files_prefer_prefixes(route, prefixes)
        filters["prefer_like_patterns"] = self._terms.find_files_patterns(route)
        return filters
    def _base_filters(self, route: V2RouteResult) -> dict[str, object]:
        filters: dict[str, object] = {
            "prefer_path_prefixes": self._find_files_prefixes(route),
            "prefer_like_patterns": self._prefer_like_patterns(route),
            "target_doc_hints": list(route.anchors.target_doc_hints),
        }
-        if route.anchors.target_doc_hints:
+        if route.anchors.process_domain:
-            filters["prefer_like_patterns"] = [f"%{path.split('/')[-1]}%" for path in route.anchors.target_doc_hints]
+            filters["metadata.domain"] = route.anchors.process_domain
        if route.anchors.process_subdomain:
            filters["metadata.subdomain"] = route.anchors.process_subdomain
        return filters
    def _prefer_like_patterns(self, route: V2RouteResult) -> list[str]:
        patterns: list[str] = []
        for path in route.anchors.target_doc_hints:
            patterns.append(f"%{path.split('/')[-1]}%")
        for endpoint in route.anchors.endpoint_paths:
            patterns.append(f"%{endpoint}%")
        return patterns
    def _find_files_prefixes(self, route: V2RouteResult) -> list[str]:
-        if route.anchors.target_doc_hints:
+        hint_prefixes = _prefixes_from_paths(route.anchors.target_doc_hints)
-            prefixes = ["/".join(path.split("/")[:-1]) + "/" for path in route.anchors.target_doc_hints]
+        if hint_prefixes:
-            return [prefix for prefix in prefixes if prefix]
+            return hint_prefixes
        file_prefixes = [name for name in route.anchors.file_names if str(name).strip().startswith("docs/")]
        derived = _prefixes_from_paths(file_prefixes)
        if derived:
            return derived
        signals = anchor_signal_types(route)
        if V2AnchorType.API_ENDPOINT in signals:
            return ["docs/api/", "docs/"]
@@ -104,6 +126,12 @@ class V2RetrievalPolicyResolver:
            return ["docs/domains/", "docs/"]
        return ["docs/"]
    def _find_files_prefer_prefixes(self, route: V2RouteResult, prefixes: list[str]) -> list[str]:
        preferred = list(prefixes)
        if route.anchors.process_domain or route.anchors.process_subdomain:
            preferred.extend(["docs/domains/", "docs/logic/"])
        return _unique_terms(preferred or ["docs/"])
    def _summary_prefixes(self, route: V2RouteResult) -> list[str]:
        signals = anchor_signal_types(route)
        prefixes: list[str] = []
@@ -114,5 +142,129 @@ class V2RetrievalPolicyResolver:
        if V2AnchorType.LOGIC_FLOW in signals:
            prefixes.extend(["docs/logic/", "docs/architecture/", "docs/"])
        if V2AnchorType.DOMAIN_ENTITY in signals:
-            prefixes.extend(["docs/domains/", "docs/api/", "docs/architecture/"])
+            prefixes.extend(["docs/domains/", "docs/", "docs/api/"])
-        return list(dict.fromkeys(prefixes or ["docs/"]))
+        return _unique_terms(prefixes or ["docs/"])
 class V2RetrievalPolicyResolver:
    _GENERAL_LAYERS = [RagLayer.DOCS_DOCUMENT_CATALOG, RagLayer.DOCS_DOC_CHUNKS]
    _FIND_FILES_LAYERS = [RagLayer.DOCS_DOCUMENT_CATALOG, RagLayer.DOCS_ENTITY_CATALOG]
    _SUMMARY_LAYERS = {
        "docs_api_method_explain": [
            RagLayer.DOCS_DOCUMENT_CATALOG,
            RagLayer.DOCS_FACT_INDEX,
            RagLayer.DOCS_DOC_CHUNKS,
        ],
        "docs_summary_api_endpoint": [
            RagLayer.DOCS_DOCUMENT_CATALOG,
            RagLayer.DOCS_FACT_INDEX,
            RagLayer.DOCS_DOC_CHUNKS,
        ],
        "docs_summary_logic_flow": [
            RagLayer.DOCS_WORKFLOW_INDEX,
            RagLayer.DOCS_DOCUMENT_CATALOG,
            RagLayer.DOCS_DOC_CHUNKS,
        ],
        "docs_summary_domain_entity": [
            RagLayer.DOCS_ENTITY_CATALOG,
            RagLayer.DOCS_DOCUMENT_CATALOG,
            RagLayer.DOCS_DOC_CHUNKS,
        ],
        "docs_summary_architecture": [
            RagLayer.DOCS_DOCUMENT_CATALOG,
            RagLayer.DOCS_RELATION_GRAPH,
            RagLayer.DOCS_DOC_CHUNKS,
        ],
        "docs_summary_generic": [
            RagLayer.DOCS_DOCUMENT_CATALOG,
            RagLayer.DOCS_DOC_CHUNKS,
        ],
    }
    def __init__(self) -> None:
        self._filters = _RouteFilterBuilder()
    def resolve(self, route: V2RouteResult) -> RetrievalPlan:
        if route.intent == V2Intent.GENERAL_QA:
            return RetrievalPlan(
                profile="general_qa_grounded_summary",
                layers=list(self._GENERAL_LAYERS),
                limit=8,
                filters=self._filters.general_filters(route),
            )
        if route.subintent == V2Subintent.FIND_FILES:
            return RetrievalPlan(
                profile="file_lookup",
                layers=list(self._FIND_FILES_LAYERS),
                limit=12,
                filters=self._filters.find_files_filters(route),
            )
        profile = self._summary_profile(route)
        return RetrievalPlan(
            profile=profile,
            layers=list(self._SUMMARY_LAYERS[profile]),
            limit=10 if profile == "docs_api_method_explain" else 8,
            filters=self._filters.summary_filters(route),
        )
    def _summary_profile(self, route: V2RouteResult) -> str:
        if _is_api_method_explain(route):
            return "docs_api_method_explain"
        meaningful = anchor_signal_types(route) - {V2AnchorType.FIND_FILES}
        if len(meaningful) != 1:
            return "docs_summary_generic"
        mapping = {
            V2AnchorType.API_ENDPOINT: "docs_summary_api_endpoint",
            V2AnchorType.ARCHITECTURE: "docs_summary_architecture",
            V2AnchorType.LOGIC_FLOW: "docs_summary_logic_flow",
            V2AnchorType.DOMAIN_ENTITY: "docs_summary_domain_entity",
        }
        return mapping.get(next(iter(meaningful)), "docs_summary_generic")
 def _prefixes_from_paths(paths: list[str]) -> list[str]:
    prefixes = []
    for path in paths:
        value = str(path).strip().strip("/")
        if "/" not in value:
            continue
        prefix = value.rsplit("/", 1)[0] + "/"
        if prefix:
            prefixes.append(prefix)
    return _unique_terms(prefixes)
 def _unique_terms(items: list[str]) -> list[str]:
    seen: set[str] = set()
    unique: list[str] = []
    for raw in items:
        value = str(raw or "").strip()
        if not value or value in seen:
            continue
        seen.add(value)
        unique.append(value)
    return unique
 def _is_api_method_explain(route: V2RouteResult) -> bool:
    if route.subintent != V2Subintent.SUMMARY:
        return False
    if route.anchors.endpoint_paths:
        return True
    if _has_api_like_hints(route.anchors.target_doc_hints):
        return True
    return V2AnchorType.API_ENDPOINT in anchor_signal_types(route)
 def _has_api_like_hints(hints: list[str]) -> bool:
    for hint in hints:
        value = str(hint or "").strip().lower()
        if not value:
            continue
        if value.startswith("/"):
            return True
        if value.startswith(("docs/api/", "docs/endpoints/", "docs/methods/")):
            return True
        if "endpoint" in value or "method" in value:
            return True
    return False
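
Note on prefix derivation: the sketch below replays the two module-level helpers from this file outside the class, to show what `path_prefixes` looks like for a mixed list of hints. The function bodies mirror `_prefixes_from_paths` and `_unique_terms` from the diff; the public names and the sample paths are illustrative assumptions, not part of the commit.

```python
def unique_terms(items: list[str]) -> list[str]:
    """Drop blanks and duplicates while keeping first-seen order."""
    seen: set[str] = set()
    unique: list[str] = []
    for raw in items:
        value = str(raw or "").strip()
        if not value or value in seen:
            continue
        seen.add(value)
        unique.append(value)
    return unique


def prefixes_from_paths(paths: list[str]) -> list[str]:
    """Turn document paths into their parent directory prefixes."""
    prefixes: list[str] = []
    for path in paths:
        value = str(path).strip().strip("/")
        if "/" not in value:
            continue  # a bare file name has no directory to derive
        prefixes.append(value.rsplit("/", 1)[0] + "/")
    return unique_terms(prefixes)


# Sample hints are made up for illustration.
hints = ["docs/api/health-endpoint.md", "/docs/api/send.md", "README.md", "docs/logic/flow.md"]
derived = prefixes_from_paths(hints)
```

Leading slashes are stripped before splitting, so `/docs/api/send.md` and `docs/api/health-endpoint.md` collapse into one `docs/api/` prefix, and a hint without a directory contributes nothing.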
@@ -1,18 +1,23 @@
-"""v2 adapter for :class:`RagSessionRetriever`, substitutable in tests."""
+"""v2 adapter for :class:`RagSessionRetriever` with a plan-driven execution strategy."""
 from __future__ import annotations
 from app.core.agent.processes.v2.retrieval.target_doc_seeding import (
    merge_row_lists,
    normalize_doc_path,
    path_variants_for_rag_query,
 )
 from app.core.rag.retrieval.session_retriever import RagSessionRetriever, RetrievalPlan
-class V2RagRetrievalAdapter:
+class _PlanDrivenRetrieval:
    """Wrapper around :class:`RagSessionRetriever` that can be swapped out in tests."""
    def __init__(self, retriever: RagSessionRetriever) -> None:
        self._retriever = retriever
    async def fetch_rows(self, rag_session_id: str, query_text: str, plan: RetrievalPlan) -> list[dict]:
-        return await self._retriever.retrieve(rag_session_id, query_text, plan)
+        seeded_rows = await self._seed_from_target_hints(rag_session_id, plan)
        semantic_rows = await self._retriever.retrieve(rag_session_id, query_text, plan)
        return merge_row_lists(seeded_rows, semantic_rows)
    async def fetch_exact_paths(self, rag_session_id: str, *, paths: list[str], layers: list[str] | None = None) -> list[dict]:
        return await self._retriever.retrieve_exact_files(rag_session_id, paths=paths, layers=layers)
@@ -31,3 +36,73 @@ class V2RagRetrievalAdapter:
            layers=layers,
            limit=limit,
        )
    async def _seed_from_target_hints(self, rag_session_id: str, plan: RetrievalPlan) -> list[dict]:
        hints = self._target_doc_hints(plan)
        if not hints:
            return []
        exact_rows = await self._fetch_exact_rows(rag_session_id, hints)
        missing = self._missing_hints(hints, exact_rows)
        if not missing:
            return exact_rows
        fallback_rows = await self._fetch_substring_rows(rag_session_id, missing)
        return merge_row_lists(exact_rows, fallback_rows)
    async def _fetch_exact_rows(self, rag_session_id: str, hints: list[str]) -> list[dict]:
        variant_paths: list[str] = []
        for hint in hints:
            variant_paths.extend(path_variants_for_rag_query(hint))
        unique_paths = list(dict.fromkeys(path for path in variant_paths if path))
        if not unique_paths:
            return []
        return await self._retriever.retrieve_exact_files(rag_session_id, paths=unique_paths, layers=None)
    async def _fetch_substring_rows(self, rag_session_id: str, hints: list[str]) -> list[dict]:
        needles = [normalize_doc_path(hint).split("/")[-1] for hint in hints]
        unique_needles = list(dict.fromkeys(needle for needle in needles if needle))
        if not unique_needles:
            return []
        return await self._retriever.retrieve_chunks_by_path_substrings(
            rag_session_id,
            path_needles=unique_needles,
            layers=None,
            limit=200,
        )
    def _target_doc_hints(self, plan: RetrievalPlan) -> list[str]:
        raw = plan.filters.get("target_doc_hints")
        if not isinstance(raw, list):
            return []
        return [str(item).strip() for item in raw if str(item or "").strip()]
    def _missing_hints(self, hints: list[str], rows: list[dict]) -> list[str]:
        pool = {normalize_doc_path(str(row.get("path") or "")) for row in rows}
        return [hint for hint in hints if normalize_doc_path(hint) not in pool]
 class V2RagRetrievalAdapter:
    """Wrapper around :class:`RagSessionRetriever` for plan-driven retrieval and substitution in tests."""
    def __init__(self, retriever: RagSessionRetriever) -> None:
        self._retriever = _PlanDrivenRetrieval(retriever)
    async def fetch_rows(self, rag_session_id: str, query_text: str, plan: RetrievalPlan) -> list[dict]:
        return await self._retriever.fetch_rows(rag_session_id, query_text, plan)
    async def fetch_exact_paths(self, rag_session_id: str, *, paths: list[str], layers: list[str] | None = None) -> list[dict]:
        return await self._retriever.fetch_exact_paths(rag_session_id, paths=paths, layers=layers)
    async def fetch_chunks_by_path_substrings(
        self,
        rag_session_id: str,
        *,
        path_needles: list[str],
        layers: list[str] | None = None,
        limit: int = 200,
    ) -> list[dict]:
        return await self._retriever.fetch_chunks_by_path_substrings(
            rag_session_id,
            path_needles=path_needles,
            layers=layers,
            limit=limit,
        )
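
Note on the seeding flow: `_seed_from_target_hints` looks up hinted documents by exact path first and falls back to a file-name substring search only for hints that were not found. A minimal sketch of that order of operations follows. `FakeRetriever`, `normalize` and `merge_rows` are hypothetical stand-ins for `RagSessionRetriever`, `normalize_doc_path` and `merge_row_lists`, whose real behavior is not shown in this diff, and the session id argument is omitted for brevity.

```python
import asyncio


def normalize(path: str) -> str:
    # Hypothetical stand-in for normalize_doc_path.
    return path.strip().strip("/").lower()


def merge_rows(first: list[dict], second: list[dict]) -> list[dict]:
    # Hypothetical stand-in for merge_row_lists: first-seen row wins per (path, layer).
    merged: list[dict] = []
    seen: set[tuple] = set()
    for row in [*first, *second]:
        key = (normalize(str(row.get("path") or "")), row.get("layer"))
        if key not in seen:
            seen.add(key)
            merged.append(row)
    return merged


class FakeRetriever:
    """In-memory double exposing only the two lookups the seeding step needs."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    async def retrieve_exact_files(self, paths: list[str]) -> list[dict]:
        wanted = {normalize(p) for p in paths}
        return [r for r in self._rows if normalize(r["path"]) in wanted]

    async def retrieve_by_substrings(self, needles: list[str]) -> list[dict]:
        return [r for r in self._rows if any(n.lower() in r["path"].lower() for n in needles)]


async def seed_from_hints(retriever: FakeRetriever, hints: list[str]) -> list[dict]:
    if not hints:
        return []
    exact = await retriever.retrieve_exact_files(hints)
    found = {normalize(r["path"]) for r in exact}
    missing = [h for h in hints if normalize(h) not in found]
    if not missing:
        return exact
    # Fall back to the file name only, so a moved document can still be seeded.
    needles = [normalize(h).split("/")[-1] for h in missing]
    fallback = await retriever.retrieve_by_substrings(needles)
    return merge_rows(exact, fallback)


rows = [
    {"path": "docs/api/health-endpoint.md", "layer": "D0"},
    {"path": "docs/archive/send-endpoint.md", "layer": "D0"},
]
hints = ["docs/api/health-endpoint.md", "docs/api/send-endpoint.md"]
seeded = asyncio.run(seed_from_hints(FakeRetriever(rows), hints))
```

In this sketch the second hint has no exact match, so its file name `send-endpoint.md` is used as a needle and the archived copy is seeded after the exact hit.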
@@ -1,20 +1,24 @@
 from __future__ import annotations
 import logging
 import yaml
 from app.core.rag.indexing.docs.chunkers.markdown_chunker import SectionChunk
 from app.core.rag.indexing.docs.models import IntegrationRecord
 LOGGER = logging.getLogger(__name__)
 class DocsIntegrationExtractor:
    _SECTION_TITLES = {"integrations", "интеграции"}
-    def extract(self, sections: list[SectionChunk]) -> list[IntegrationRecord]:
+    def extract(self, sections: list[SectionChunk], *, path: str = "") -> list[IntegrationRecord]:
        records: list[IntegrationRecord] = []
        for section in sections:
            if not self._is_integration_section(section.section_path):
                continue
-            payload = self._payload(section.content)
+            payload = self._payload(section.content, path=path, section_path=section.section_path)
            target = str(payload.get("target") or "").strip()
            if not target:
                continue
@@ -40,7 +44,7 @@ class DocsIntegrationExtractor:
        parts = [item.strip().lower() for item in section_path.split(" > ") if item.strip()]
        return any(part in self._SECTION_TITLES for part in parts)
-    def _payload(self, text: str) -> dict:
+    def _payload(self, text: str, *, path: str, section_path: str) -> dict:
        payload: dict = {}
        details_lines: list[str] = []
        collecting_details = False
@@ -61,15 +65,27 @@ class DocsIntegrationExtractor:
                collecting_details = True
                details_lines = []
                if value:
-                    payload[key] = self._yaml_value(value)
+                    payload[key] = self._yaml_value(
                        value,
                        path=path,
                        section_path=section_path,
                        field_name=key,
                        fallback="",
                    )
                continue
            collecting_details = False
-            payload[key] = self._yaml_value(value)
+            payload[key] = self._yaml_value(
                value,
                path=path,
                section_path=section_path,
                field_name=key,
                fallback=value,
            )
        if details_lines:
-            payload["details"] = self._details_payload(details_lines)
+            payload["details"] = self._details_payload(details_lines, path=path, section_path=section_path)
        return payload
-    def _details_payload(self, lines: list[str]) -> dict:
+    def _details_payload(self, lines: list[str], *, path: str, section_path: str) -> dict:
        normalized: list[str] = []
        for raw_line in lines:
            line = raw_line[2:] if raw_line.startswith("  ") else raw_line
@@ -78,7 +94,13 @@ class DocsIntegrationExtractor:
            if indent == 0 and stripped.startswith("- "):
                stripped = stripped[2:]
            normalized.append((" " * indent) + stripped)
-        payload = yaml.safe_load("\n".join(normalized)) or {}
+        payload = self._yaml_value(
            "\n".join(normalized),
            path=path,
            section_path=section_path,
            field_name="details",
            fallback={},
        ) or {}
        return payload if isinstance(payload, dict) else {}
    def _split_key_value(self, text: str) -> tuple[str, str]:
@@ -87,7 +109,17 @@ class DocsIntegrationExtractor:
        key, value = text.split(":", 1)
        return key.strip(), value.strip()
-    def _yaml_value(self, value: str):
+    def _yaml_value(self, value: str, *, path: str, section_path: str, field_name: str, fallback):
        if not value:
            return ""
        try:
            return yaml.safe_load(value)
        except yaml.YAMLError as exc:
            LOGGER.warning(
                "docs integration parse warning: path=%s section=%s field=%s reason=%s",
                path or "<unknown>",
                section_path,
                field_name,
                exc.__class__.__name__,
            )
            return fallback
@@ -1,5 +1,8 @@
 from __future__ import annotations
 import logging
 from collections.abc import Callable
 from app.core.rag.contracts import RagDocument, RagSource
 from app.core.rag.indexing.docs.chunkers.markdown_chunker import MarkdownDocChunker
 from app.core.rag.indexing.docs.classifier import DocsClassifier
@@ -15,6 +18,8 @@ from app.core.rag.indexing.docs.relation_extractor import DocsRelationExtractor
 from app.core.rag.indexing.docs.support_layer_builder import DocsSupportLayerBuilder
 from app.core.rag.indexing.docs.workflow_extractor import DocsWorkflowExtractor
 LOGGER = logging.getLogger(__name__)
 class DocsIndexingPipeline:
    def __init__(self) -> None:
@@ -59,7 +64,11 @@ class DocsIndexingPipeline:
        for section in sections:
            docs.append(self._builder.build_doc_chunk(source, section, parsed.frontmatter, doc_kind))
        document_id = frontmatter_view.document_id or source.path
-        for fact in self._facts.extract(parsed.frontmatter, sections):
+        for fact in self._safe_extract(
            extractor_name="fact_extractor",
            path=path,
            run=lambda: self._facts.extract(parsed.frontmatter, sections),
        ):
            docs.append(
                self._support_builder.build_fact(
                    source,
@@ -72,13 +81,29 @@ class DocsIndexingPipeline:
                    subdomain=frontmatter_view.subdomain,
                )
            )
-        for entity in self._entities.extract(parsed.frontmatter):
+        for entity in self._safe_extract(
            extractor_name="entity_extractor",
            path=path,
            run=lambda: self._entities.extract(parsed.frontmatter),
        ):
            docs.append(self._builder.build_entity_record(source, parsed.frontmatter, entity))
-        for workflow in self._workflows.extract(parsed.detail_sections):
+        for workflow in self._safe_extract(
            extractor_name="workflow_extractor",
            path=path,
            run=lambda: self._workflows.extract(parsed.detail_sections),
        ):
            docs.append(self._support_builder.build_workflow_record(source, parsed.frontmatter, workflow))
-        for edge in self._relations.extract(parsed.frontmatter, source_id=document_id):
+        for edge in self._safe_extract(
            extractor_name="relation_extractor",
            path=path,
            run=lambda: self._relations.extract(parsed.frontmatter, source_id=document_id),
        ):
            docs.append(self._support_builder.build_relation_record(source, parsed.frontmatter, edge))
-        for integration in self._integrations.extract(sections):
+        for integration in self._safe_extract(
            extractor_name="integration_extractor",
            path=path,
            run=lambda: self._integrations.extract(sections, path=path),
        ):
            docs.append(self._support_builder.build_integration_record(source, parsed.frontmatter, integration))
        return docs
@@ -86,3 +111,15 @@ class DocsIndexingPipeline:
        tail = path.rsplit("/", 1)[-1]
        stem = tail.rsplit(".", 1)[0]
        return stem.replace("-", " ").replace("_", " ").strip().title()
    def _safe_extract(self, *, extractor_name: str, path: str, run: Callable[[], list]) -> list:
        try:
            return run()
        except Exception as exc:
            LOGGER.warning(
                "docs pipeline extractor warning: path=%s extractor=%s reason=%s",
                path,
                extractor_name,
                exc.__class__.__name__,
            )
            return []
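
Note on extractor isolation: `_safe_extract` turns any extractor failure into an empty result so the remaining extractors still run for the same document. The sketch below reproduces that wrapper as a free function; the logger name, the sample path and `broken_extractor` are assumptions made for the example.

```python
import logging
from collections.abc import Callable

LOGGER = logging.getLogger("docs_pipeline_sketch")  # illustrative logger name


def safe_extract(*, extractor_name: str, path: str, run: Callable[[], list]) -> list:
    """Run one extractor; on any exception log a warning and return no records."""
    try:
        return run()
    except Exception as exc:
        LOGGER.warning(
            "docs pipeline extractor warning: path=%s extractor=%s reason=%s",
            path,
            extractor_name,
            exc.__class__.__name__,
        )
        return []


def broken_extractor() -> list:
    # Hypothetical extractor that always fails.
    raise ValueError("malformed frontmatter")


ok = safe_extract(extractor_name="fact_extractor", path="docs/a.md", run=lambda: ["fact-1"])
failed = safe_extract(extractor_name="entity_extractor", path="docs/a.md", run=broken_extractor)
```

Only the exception class name reaches the log, so the message and traceback are not recorded; passing `exc_info=True` to `LOGGER.warning` would be one way to keep them if debugging needs it.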
@@ -25,6 +25,8 @@ class RagQueryRepository:
        exclude_like_patterns: list[str] | None = None,
        prefer_path_prefixes: list[str] | None = None,
        prefer_like_patterns: list[str] | None = None,
        metadata_domain: str | None = None,
        metadata_subdomain: str | None = None,
        prefer_non_tests: bool = False,
    ) -> list[dict]:
        sql, params = self._builder.build_retrieve(
@@ -38,6 +40,8 @@ class RagQueryRepository:
            exclude_like_patterns=exclude_like_patterns,
            prefer_path_prefixes=prefer_path_prefixes,
            prefer_like_patterns=prefer_like_patterns,
            metadata_domain=metadata_domain,
            metadata_subdomain=metadata_subdomain,
            prefer_non_tests=prefer_non_tests,
        )
        with get_engine().connect() as conn:
@@ -234,6 +238,54 @@ class RagQueryRepository:
            rows = conn.execute(stmt, params).mappings().fetchall()
        return [self._row_to_dict(row) for row in rows]
    def retrieve_chunks_by_path_substrings(
        self,
        rag_session_id: str,
        *,
        path_needles: list[str],
        layers: list[str] | None = None,
        limit: int = 200,
    ) -> list[dict]:
        normalized_needles = [str(item).strip().lower() for item in path_needles if str(item).strip()]
        if not normalized_needles:
            return []
        params: dict = {
            "sid": rag_session_id,
            "lim": max(1, int(limit)),
        }
        filters = ["rag_session_id = :sid"]
        like_parts: list[str] = []
        for idx, needle in enumerate(normalized_needles):
            key = f"needle_{idx}"
            params[key] = f"%{needle}%"
            like_parts.append(f"lower(path) LIKE :{key}")
        filters.append("(" + " OR ".join(like_parts) + ")")
        if layers:
            normalized_layers = [str(item).strip() for item in layers if str(item).strip()]
            if normalized_layers:
                params["layers"] = normalized_layers
                filters.append("layer IN :layers")
        stmt = text(
            f"""
            SELECT path, content, layer, title, metadata_json, span_start, span_end,
                   0 AS lexical_rank,
                   0 AS prefer_bonus,
                   0 AS test_penalty,
                   0 AS structural_rank,
                   0 AS layer_rank,
                   0 AS distance
            FROM rag_chunks
            WHERE {' AND '.join(filters)}
            ORDER BY path ASC, COALESCE(span_start, 0) ASC, COALESCE(chunk_index, 0) ASC
            LIMIT :lim
            """
        )
        if "layers" in params:
            stmt = stmt.bindparams(bindparam("layers", expanding=True))
        with get_engine().connect() as conn:
            rows = conn.execute(stmt, params).mappings().fetchall()
        return [self._row_to_dict(row) for row in rows]
    def _row_to_dict(self, row) -> dict:
        data = dict(row)
        raw_metadata = data.pop("metadata_json")
@@ -69,6 +69,8 @@ class RagRepository:
        exclude_like_patterns: list[str] | None = None,
        prefer_path_prefixes: list[str] | None = None,
        prefer_like_patterns: list[str] | None = None,
        metadata_domain: str | None = None,
        metadata_subdomain: str | None = None,
        prefer_non_tests: bool = False,
    ) -> list[dict]:
        return self._query.retrieve(
@@ -82,6 +84,8 @@ class RagRepository:
            exclude_like_patterns=exclude_like_patterns,
            prefer_path_prefixes=prefer_path_prefixes,
            prefer_like_patterns=prefer_like_patterns,
            metadata_domain=metadata_domain,
            metadata_subdomain=metadata_subdomain,
            prefer_non_tests=prefer_non_tests,
        )
@@ -141,3 +145,18 @@ class RagRepository:
            layers=layers,
            limit=limit,
        )
    def retrieve_chunks_by_path_substrings(
        self,
        rag_session_id: str,
        *,
        path_needles: list[str],
        layers: list[str] | None = None,
        limit: int = 200,
    ) -> list[dict]:
        return self._query.retrieve_chunks_by_path_substrings(
            rag_session_id,
            path_needles=path_needles,
            layers=layers,
            limit=limit,
        )
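
Note on the substring filter: `retrieve_chunks_by_path_substrings` builds one bound parameter per needle and joins the `LIKE` checks with `OR`. The sketch below isolates that string-building step without a database; the function name and the sample needles are assumptions, and the clause is returned instead of being appended to a filter list.

```python
def build_path_needle_filter(path_needles: list[str]) -> tuple[str, dict[str, str]]:
    """Return an OR-joined LIKE clause and its bound parameters."""
    needles = [str(item).strip().lower() for item in path_needles if str(item).strip()]
    params: dict[str, str] = {}
    parts: list[str] = []
    for idx, needle in enumerate(needles):
        key = f"needle_{idx}"
        params[key] = f"%{needle}%"  # value is bound, never interpolated into SQL
        parts.append(f"lower(path) LIKE :{key}")
    clause = "(" + " OR ".join(parts) + ")" if parts else ""
    return clause, params


# Sample needles are made up; the blank one is dropped during normalization.
clause, params = build_path_needle_filter(["Health-Endpoint.md", "  ", "send"])
```

Because the needles are passed to `LIKE` unescaped, a `_` or `%` inside a needle acts as a wildcard, so a needle such as `manual_send` would also match `manual-send`.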
@@ -19,6 +19,8 @@ class RetrievalStatementBuilder:
        exclude_like_patterns: list[str] | None = None,
        prefer_path_prefixes: list[str] | None = None,
        prefer_like_patterns: list[str] | None = None,
        metadata_domain: str | None = None,
        metadata_subdomain: str | None = None,
        prefer_non_tests: bool = False,
    ) -> tuple[str, dict]:
        emb = "[" + ",".join(str(x) for x in query_embedding) + "]"
@@ -29,6 +31,8 @@ class RetrievalStatementBuilder:
        self._append_prefix_group(filters, params, "path", path_prefixes)
        self._append_prefix_group(filters, params, "exclude_prefix", exclude_path_prefixes, negate=True)
        self._append_like_group(filters, params, "exclude_like", exclude_like_patterns, negate=True)
        self._append_metadata_equals(filters, params, "metadata_domain", "domain", metadata_domain)
        self._append_metadata_equals(filters, params, "metadata_subdomain", "subdomain", metadata_subdomain)
        if layers:
            filters.append("layer = ANY(:layers)")
            params["layers"] = layers
@@ -202,6 +206,20 @@ class RetrievalStatementBuilder:
        joined = " OR ".join(parts)
        filters.append(f"NOT ({joined})" if negate else f"({joined})")
    def _append_metadata_equals(
        self,
        filters: list[str],
        params: dict,
        param_key: str,
        metadata_key: str,
        value: str | None,
    ) -> None:
        normalized = str(value or "").strip().lower()
        if not normalized:
            return
        params[param_key] = normalized
        filters.append(f"lower(COALESCE({self._metadata_text(metadata_key)}, '')) = :{param_key}")
    def _test_penalty_sql(
        self,
        enabled: bool,
@@ -94,4 +94,8 @@ class RagSessionRetriever:
        for key in keys:
            if key in filters:
                out[key] = filters[key]
        if "metadata.domain" in filters:
            out["metadata_domain"] = filters["metadata.domain"]
        if "metadata.subdomain" in filters:
            out["metadata_subdomain"] = filters["metadata.subdomain"]
        return out
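
Note on filter key mapping: the plan carries `metadata.domain` and `metadata.subdomain` as dotted keys, while the repository expects `metadata_domain` and `metadata_subdomain` keyword arguments. A minimal sketch of that translation is below. `PASS_THROUGH_KEYS` is an assumed subset, since the real key list lies outside this hunk, and the function name is hypothetical.

```python
# Assumed subset of keys copied as-is; the real list is not visible in this hunk.
PASS_THROUGH_KEYS = ("path_prefixes", "prefer_path_prefixes", "prefer_like_patterns")


def to_repository_kwargs(filters: dict[str, object]) -> dict[str, object]:
    """Translate plan filters into keyword arguments for the repository call."""
    out: dict[str, object] = {key: filters[key] for key in PASS_THROUGH_KEYS if key in filters}
    if "metadata.domain" in filters:
        out["metadata_domain"] = filters["metadata.domain"]
    if "metadata.subdomain" in filters:
        out["metadata_subdomain"] = filters["metadata.subdomain"]
    return out


kwargs = to_repository_kwargs(
    {
        "prefer_path_prefixes": ["docs/domains/"],
        "metadata.domain": "messaging",
        "target_doc_hints": ["docs/a.md"],  # not in the assumed pass-through list
    }
)
```

In this sketch, keys outside the pass-through list are dropped, which is consistent with the adapter reading `target_doc_hints` from `plan.filters` itself.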
@@ -6,7 +6,10 @@ Differences from `v3`:
 - each YAML case targets a single isolated component;
 - results are written next to the suite in `cases/.../test_runs/...`;
- the first supported component is `process_v2_intent_router`.
+- supported components are `process_v2_intent_router` and `process_v2_retrieval_policy_resolver`.
  Also available: `process_v2_router_plus_retrieval_policy` for the linked route -> plan chain,
  `process_v2_router_plus_retrieval_policy_rag` for the linked route -> plan -> rag chain,
  and `process_v2_full_chain` for the full route -> plan -> rag -> evidence -> workflow LLM chain.
 ## Run
@@ -23,3 +26,48 @@ PYTHONPATH=. python -m tests.pipeline_setup_v4.run \
  --cases-dir tests/pipeline_setup_v4/cases/suite_02/process_v2_intent_router/router_llm_first_v3.yaml \
  --run-name llm_first_v3
 ```
 Retrieval policy resolver suite:
 ```bash
 PYTHONPATH=. python -m tests.pipeline_setup_v4.run \
  --cases-dir tests/pipeline_setup_v4/cases/suite_03/process_v2_retrieval_policy_resolver/cases.yaml \
  --run-name retrieval_policy_v1
 ```
 Linked router + retrieval policy suite:
 ```bash
 PYTHONPATH=. python3 -m tests.pipeline_setup_v4.run \
  --cases-dir tests/pipeline_setup_v4/cases/suite_04/process_v2_router_plus_retrieval_policy \
  --run-name router_plus_policy_v1
 ```
 Inside `suite_04`, cases are split into:
 - `strict_regression_cases.yaml` for contract-level invariants
 - `soft_observational_cases.yaml` for LLM-sensitive boundary scenarios
 Quality-gate mini-pack:
 ```bash
 PYTHONPATH=. python3 -m tests.pipeline_setup_v4.run \
  --cases-dir tests/pipeline_setup_v4/cases/suite_05/process_v2_router_plus_retrieval_policy_quality_gate/cases.yaml \
  --run-name router_plus_policy_qg_v1
 ```
 Linked router + retrieval policy + rag suite:
 ```bash
 PYTHONPATH=src:. DATABASE_URL='postgresql+psycopg://agent:agent@127.0.0.1:5432/agent' python3 -m tests.pipeline_setup_v4.run \
  --cases-dir tests/pipeline_setup_v4/cases/suite_06/process_v2_router_plus_retrieval_policy_rag/cases.yaml \
  --run-name router_plus_policy_rag_v1
 ```
 Full process v2 chain with workflow LLM:
 ```bash
 PYTHONPATH=src:. DATABASE_URL='postgresql+psycopg://agent:agent@127.0.0.1:5432/agent' python3 -m tests.pipeline_setup_v4.run \
  --cases-dir tests/pipeline_setup_v4/cases/suite_07/process_v2_full_chain/cases.yaml \
  --run-name process_v2_full_chain_v1
 ```
@@ -0,0 +1,540 @@
 defaults:
  component: process_v2_retrieval_policy_resolver
 cases:
  - id: general-overview-grounded
    route:
      routing_domain: GENERAL
      intent: GENERAL_QA
      subintent: SUMMARY
      user_query: "Что это за сервис?"
      normalized_query: "что это за сервис"
      anchors:
        target_doc_hints: []
        endpoint_paths: []
    expected:
      plan:
        profile: general_qa_grounded_summary
        layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
        limit: 8
        filters:
          prefer_path_prefixes: [docs/architecture/, docs/]
  - id: general-does-not-become-docs-summary
    route:
      routing_domain: GENERAL
      intent: GENERAL_QA
      subintent: SUMMARY
      user_query: "Дай общий обзор, включая /health"
      normalized_query: "дай общий обзор включая /health"
      anchors:
        endpoint_paths: ["/health"]
        target_doc_hints: ["docs/api/health-endpoint.md"]
        matched_aliases: ["api"]
    expected:
      plan:
        profile: general_qa_grounded_summary
        layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
        limit: 8
  - id: find-files-with-target-hint
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: FIND_FILES
      user_query: "Покажи файл про health endpoint"
      normalized_query: "покажи файл про health endpoint"
      anchors:
        endpoint_paths: ["/health"]
        target_doc_hints: ["docs/api/health-endpoint.md"]
    expected:
      plan:
        profile: file_lookup
        layers: [D1_DOCUMENT_CATALOG, D3_ENTITY_CATALOG]
        limit: 12
        filters:
          target_doc_hints: ["docs/api/health-endpoint.md"]
          path_prefixes: [docs/api/]
          prefer_like_patterns: ["%health-endpoint.md%"]
  - id: find-files-endpoint-only
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: FIND_FILES
      user_query: "Где описан /send?"
      normalized_query: "где описан /send"
      anchors:
        endpoint_paths: ["/send"]
        target_doc_hints: []
    expected:
      plan:
        profile: file_lookup
        layers: [D1_DOCUMENT_CATALOG, D3_ENTITY_CATALOG]
        limit: 12
        filters:
          path_prefixes: [docs/api/, docs/]
          prefer_like_patterns: ["%/send%"]
  - id: find-files-entities-and-domain
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: FIND_FILES
      user_query: "В каком документе описан ManualSendWorker?"
      normalized_query: "в каком документе описан manualsendworker"
      anchors:
        entity_names: ["ManualSendWorker"]
        matched_aliases: ["manual send"]
        process_domain: "messaging"
        process_subdomain: "manual_send"
        target_doc_hints: []
    expected:
      plan:
        profile: file_lookup
        layers: [D1_DOCUMENT_CATALOG, D3_ENTITY_CATALOG]
        limit: 12
        filters:
          metadata.domain: messaging
          metadata.subdomain: manual_send
          prefer_path_prefixes: [docs/domains/, docs/, docs/logic/]
          prefer_like_patterns: ["%manualsendworker%", "%manual send%", "%messaging%", "%manual_send%"]
  - id: docs-summary-api-endpoint-health
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: SUMMARY
      user_query: "Объясни /health"
      normalized_query: "объясни /health"
      target_terms: ["health", "/health"]
      anchors:
        endpoint_paths: ["/health"]
        target_doc_hints: ["docs/api/health-endpoint.md"]
    expected:
      plan:
        profile: docs_summary_api_endpoint
        layers: [D1_DOCUMENT_CATALOG, D2_FACT_INDEX, D0_DOC_CHUNKS]
        limit: 8
        filters:
          target_doc_hints: ["docs/api/health-endpoint.md"]
          path_prefixes: [docs/api/, docs/]
          prefer_path_prefixes: [docs/api/, docs/]
  - id: docs-summary-architecture
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: SUMMARY
      user_query: "Как устроена архитектура сервиса?"
      normalized_query: "как устроена архитектура сервиса"
      anchors:
        file_names: ["docs/architecture/runtime-manager.md"]
        target_doc_hints: ["docs/architecture/runtime-manager.md"]
        matched_aliases: ["architecture"]
    expected:
      plan:
        profile: docs_summary_architecture
        layers: [D1_DOCUMENT_CATALOG, D5_RELATION_GRAPH, D0_DOC_CHUNKS]
        limit: 8
        filters:
          target_doc_hints: ["docs/architecture/runtime-manager.md"]
          prefer_path_prefixes: [docs/architecture/, docs/]
  - id: docs-summary-logic-flow
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: SUMMARY
      user_query: "Опиши workflow отправки уведомлений"
      normalized_query: "опиши workflow отправки уведомлений"
      anchors:
        matched_aliases: ["workflow"]
        process_domain: "notifications"
        process_subdomain: "delivery_loop"
        target_doc_hints: []
    expected:
      plan:
        profile: docs_summary_logic_flow
        layers: [D4_WORKFLOW_INDEX, D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
        limit: 8
        filters:
          metadata.domain: notifications
          metadata.subdomain: delivery_loop
          prefer_path_prefixes: [docs/logic/, docs/architecture/, docs/]
  - id: docs-summary-domain-entity
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: SUMMARY
      user_query: "Что такое RuntimeManager?"
      normalized_query: "что такое runtimemanager"
      anchors:
        entity_names: ["RuntimeManager"]
        process_domain: "runtime"
    expected:
      plan:
        profile: docs_summary_domain_entity
        layers: [D3_ENTITY_CATALOG, D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
        limit: 8
        filters:
          metadata.domain: runtime
          prefer_path_prefixes: [docs/domains/, docs/, docs/api/]
          prefer_like_patterns: ["%runtimemanager%", "%runtime%"]
  - id: docs-summary-generic-weak-signals
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: SUMMARY
      user_query: "Дай краткое summary документации"
      normalized_query: "дай краткое summary документации"
      anchors:
        target_doc_hints: []
        endpoint_paths: []
        entity_names: []
        matched_aliases: []
    expected:
      plan:
        profile: docs_summary_generic
        layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
        limit: 8
        filters:
          prefer_path_prefixes: [docs/]
  - id: docs-summary-generic-conflicting-signals
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: SUMMARY
      user_query: "Как связан /health и RuntimeManager?"
      normalized_query: "как связан /health и runtimemanager"
      anchors:
        endpoint_paths: ["/health"]
        entity_names: ["RuntimeManager"]
    expected:
      plan:
        profile: docs_summary_generic
        layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
        limit: 8
  - id: find-files-stays-file-lookup-on-mixed-signals
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: FIND_FILES
      user_query: "Найди документ по architecture runtime manager"
      normalized_query: "найди документ по architecture runtime manager"
      anchors:
        entity_names: ["RuntimeManager"]
        matched_aliases: ["architecture"]
        file_names: ["docs/architecture/runtime-manager.md"]
    expected:
      plan:
        profile: file_lookup
        layers: [D1_DOCUMENT_CATALOG, D3_ENTITY_CATALOG]
        limit: 12
        filters:
          path_prefixes: [docs/architecture/]
  - id: resolver-survives-partial-empty-anchors
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: SUMMARY
      user_query: "Что там по docs?"
      normalized_query: "что там по docs"
      anchors:
        entity_names: []
        file_names: [""]
        endpoint_paths: []
        target_doc_hints: []
        matched_aliases: []
        process_domain:
        process_subdomain:
    expected:
      plan:
        profile: docs_summary_generic
        layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
        limit: 8
  - id: find-files-file-name-priority
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: FIND_FILES
      user_query: "Покажи документ manual-send"
      normalized_query: "покажи документ manual-send"
      anchors:
        file_names: ["docs/workflows/manual-send.md"]
        matched_aliases: ["manual send"]
        target_doc_hints: []
    expected:
      plan:
        profile: file_lookup
        layers: [D1_DOCUMENT_CATALOG, D3_ENTITY_CATALOG]
        limit: 12
        filters:
          path_prefixes: [docs/workflows/]
          prefer_like_patterns: ["%docs/workflows/manual-send.md%", "%manual send%"]
  - id: conflict-api-hint-vs-workflow-metadata
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: SUMMARY
      user_query: "Опиши flow для /health в notification loop"
      normalized_query: "опиши flow для /health в notification loop"
      anchors:
        endpoint_paths: ["/health"]
        target_doc_hints: ["docs/api/health-endpoint.md"]
        matched_aliases: ["workflow"]
        process_domain: "notifications"
        process_subdomain: "delivery_loop"
    expected:
      plan:
        profile: docs_summary_generic
        layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
        limit: 8
        filters:
          target_doc_hints: ["docs/api/health-endpoint.md"]
          metadata.domain: notifications
          metadata.subdomain: delivery_loop
          path_prefixes: [docs/api/, docs/]
  - id: conflict-file-name-vs-architecture-alias
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: SUMMARY
      user_query: "Объясни architecture для notification loop"
      normalized_query: "объясни architecture для notification loop"
      anchors:
        file_names: ["docs/logic/notification-loop.md"]
        matched_aliases: ["architecture"]
    expected:
      plan:
        profile: docs_summary_generic
        layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
        limit: 8
        filters:
          prefer_path_prefixes: [docs/architecture/, docs/, docs/logic/]
          prefer_like_patterns: ["%docs/logic/notification-loop.md%", "%architecture%"]
  - id: conflict-hint-vs-entity-soft-signals
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: SUMMARY
      user_query: "Что делает /send и ManualSendWorker?"
      normalized_query: "что делает /send и manualsendworker"
      anchors:
        endpoint_paths: ["/send"]
        target_doc_hints: ["docs/api/send-endpoint.md"]
        entity_names: ["ManualSendWorker"]
        matched_aliases: ["manual send"]
    expected:
      plan:
        profile: docs_summary_generic
        layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
        limit: 8
        filters:
          target_doc_hints: ["docs/api/send-endpoint.md"]
          path_prefixes: [docs/api/, docs/]
          prefer_like_patterns: ["%send-endpoint.md%", "%/send%", "%manualsendworker%", "%manual send%"]
  - id: metadata-only-find-files
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: FIND_FILES
      user_query: "Найди документы по notifications delivery loop"
      normalized_query: "найди документы по notifications delivery loop"
      anchors:
        process_domain: "notifications"
        process_subdomain: "delivery_loop"
    expected:
      plan:
        profile: file_lookup
        layers: [D1_DOCUMENT_CATALOG, D3_ENTITY_CATALOG]
        limit: 12
        filters:
          path_prefixes: [docs/]
          metadata.domain: notifications
          metadata.subdomain: delivery_loop
          prefer_path_prefixes: [docs/, docs/domains/, docs/logic/]
  - id: metadata-only-generic-summary
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: SUMMARY
      user_query: "Дай summary по notifications delivery loop"
      normalized_query: "дай summary по notifications delivery loop"
      anchors:
        process_domain: "notifications"
        process_subdomain: "delivery_loop"
    expected:
      plan:
        profile: docs_summary_generic
        layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
        limit: 8
        filters:
          metadata.domain: notifications
          metadata.subdomain: delivery_loop
          prefer_path_prefixes: [docs/]
          prefer_like_patterns: ["%notifications%", "%delivery_loop%"]
  - id: metadata-domain-entity-with-alias
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: SUMMARY
      user_query: "Объясни компонент billing"
      normalized_query: "объясни компонент billing"
      anchors:
        matched_aliases: ["component"]
        process_domain: "billing"
    expected:
      plan:
        profile: docs_summary_domain_entity
        layers: [D3_ENTITY_CATALOG, D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
        limit: 8
        filters:
          metadata.domain: billing
          prefer_path_prefixes: [docs/domains/, docs/, docs/api/]
          prefer_like_patterns: ["%component%", "%billing%"]
  - id: alias-only-api
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: SUMMARY
      user_query: "Объясни api health"
      normalized_query: "объясни api health"
      anchors:
        matched_aliases: ["api endpoint"]
    expected:
      plan:
        profile: docs_summary_api_endpoint
        layers: [D1_DOCUMENT_CATALOG, D2_FACT_INDEX, D0_DOC_CHUNKS]
        limit: 8
        filters:
          path_prefixes: [docs/api/, docs/]
          prefer_like_patterns: ["%api endpoint%"]
  - id: alias-only-architecture
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: SUMMARY
      user_query: "Расскажи про architecture"
      normalized_query: "расскажи про architecture"
      anchors:
        matched_aliases: ["architecture"]
    expected:
      plan:
        profile: docs_summary_architecture
        layers: [D1_DOCUMENT_CATALOG, D5_RELATION_GRAPH, D0_DOC_CHUNKS]
        limit: 8
        filters:
          prefer_path_prefixes: [docs/architecture/, docs/]
          prefer_like_patterns: ["%architecture%"]
  - id: partial-only-endpoint-path
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: SUMMARY
      user_query: "Что делает /status?"
      normalized_query: "что делает /status"
      anchors:
        endpoint_paths: ["/status"]
    expected:
      plan:
        profile: docs_summary_api_endpoint
        layers: [D1_DOCUMENT_CATALOG, D2_FACT_INDEX, D0_DOC_CHUNKS]
        limit: 8
        filters:
          path_prefixes: [docs/api/, docs/]
          prefer_like_patterns: ["%/status%"]
  - id: partial-only-target-doc-hint
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: SUMMARY
      user_query: "Объясни notification loop"
      normalized_query: "объясни notification loop"
      anchors:
        target_doc_hints: ["docs/logic/notification-loop.md"]
    expected:
      plan:
        profile: docs_summary_logic_flow
        layers: [D4_WORKFLOW_INDEX, D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
        limit: 8
        filters:
          target_doc_hints: ["docs/logic/notification-loop.md"]
          prefer_path_prefixes: [docs/logic/, docs/architecture/, docs/]
  - id: generic-neutral-with-nonsemantic-hint
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: SUMMARY
      user_query: "Дай общий summary intro docs"
      normalized_query: "дай общий summary intro docs"
      anchors:
        target_doc_hints: ["docs/intro/overview.md"]
    expected:
      plan:
        profile: docs_summary_generic
        layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
        limit: 8
        filters:
          target_doc_hints: ["docs/intro/overview.md"]
          prefer_path_prefixes: [docs/]
  - id: generic-neutral-weak-mixed-aliases
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: SUMMARY
      user_query: "Нужен общий summary про architecture component"
      normalized_query: "нужен общий summary про architecture component"
      anchors:
        matched_aliases: ["architecture", "component"]
    expected:
      plan:
        profile: docs_summary_generic
        layers: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
        limit: 8
        filters:
          prefer_path_prefixes: [docs/architecture/, docs/, docs/domains/, docs/api/]
  - id: find-files-hard-priority-with-multiple-hints
    route:
      routing_domain: DOCS
      intent: DOC_EXPLAIN
      subintent: FIND_FILES
      user_query: "Найди документы по /health и runtime manager"
      normalized_query: "найди документы по /health и runtime manager"
      anchors:
        endpoint_paths: ["/health"]
        entity_names: ["RuntimeManager"]
        matched_aliases: ["architecture"]
        target_doc_hints:
          - "docs/api/health-endpoint.md"
          - "docs/architecture/runtime-manager.md"
    expected:
      plan:
        profile: file_lookup
        layers: [D1_DOCUMENT_CATALOG, D3_ENTITY_CATALOG]
        limit: 12
        filters:
          target_doc_hints:
            - "docs/api/health-endpoint.md"
            - "docs/architecture/runtime-manager.md"
          path_prefixes: [docs/api/, docs/architecture/]
          prefer_like_patterns: ["%health-endpoint.md%", "%runtime-manager.md%"]
@@ -0,0 +1,199 @@
 defaults:
  component: process_v2_router_plus_retrieval_policy
 cases:
  - id: soft-architecture-summary
    query: "Как устроена архитектура приложения?"
    expected:
      route:
        routing_domain_equals_any: [DOCS, GENERAL]
        intent_equals_any: [DOC_EXPLAIN, GENERAL_QA]
        subintent: SUMMARY
      retrieval_plan:
        profile_equals_any: [docs_summary_architecture, docs_summary_generic, general_qa_grounded_summary]
  - id: soft-process-summary
    query: "Опиши процесс отправки уведомлений"
    expected:
      route:
        routing_domain_equals_any: [DOCS, GENERAL]
        intent_equals_any: [DOC_EXPLAIN, GENERAL_QA]
        subintent: SUMMARY
      retrieval_plan:
        profile_equals_any: [docs_summary_logic_flow, docs_summary_generic, general_qa_grounded_summary]
  - id: soft-domain-entity-summary
    query: "Что такое runtime health в документации?"
    expected:
      route:
        routing_domain_equals_any: [DOCS, GENERAL]
        intent_equals_any: [DOC_EXPLAIN, GENERAL_QA]
        subintent: SUMMARY
      retrieval_plan:
        profile_equals_any: [docs_summary_domain_entity, docs_summary_generic, general_qa_grounded_summary]
  - id: soft-runtime-health-document
    query: "Покажи документ про runtime health"
    expected:
      route:
        routing_domain_equals_any: [DOCS, GENERAL]
        intent_equals_any: [DOC_EXPLAIN, GENERAL_QA]
        subintent_equals_any: [SUMMARY, FIND_FILES]
      retrieval_plan:
        profile_equals_any: [file_lookup, docs_summary_domain_entity, docs_summary_generic, general_qa_grounded_summary]
  - id: soft-api-send-noisy
    query: "Нужен краткий док-саммари по api /send"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: SUMMARY
      retrieval_plan:
        profile: docs_summary_api_endpoint
  - id: soft-general-risks-architecture
    query: "Какие риски у такого подхода в архитектуре?"
    expected:
      route:
        routing_domain_equals_any: [GENERAL, DOCS]
        intent_equals_any: [GENERAL_QA, DOC_EXPLAIN]
        subintent: SUMMARY
      retrieval_plan:
        profile_equals_any: [general_qa_grounded_summary, docs_summary_architecture, docs_summary_generic]
  - id: soft-general-polling-webhook
    query: "Сравни polling и webhook в контексте сервиса"
    expected:
      route:
        routing_domain_equals_any: [GENERAL, DOCS]
        intent_equals_any: [GENERAL_QA, DOC_EXPLAIN]
        subintent: SUMMARY
      retrieval_plan:
        profile_equals_any: [general_qa_grounded_summary, docs_summary_generic]
  - id: soft-conflict-entity-plus-process
    query: "Объясни entity runtime health и runtime loop"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: SUMMARY
      retrieval_plan:
        profile_equals_any: [docs_summary_domain_entity, docs_summary_generic]
        filters:
          prefer_path_prefixes_contains: [docs/domains/]
  - id: soft-alias-handle-health
    query: "Объясни ручку /health"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: SUMMARY
      retrieval_plan:
        profile: docs_summary_api_endpoint
  - id: soft-alias-show-doc-handle-health
    query: "Покажи документ по ручке /health"
    expected:
      route:
        routing_domain_equals_any: [DOCS, GENERAL]
        intent_equals_any: [DOC_EXPLAIN, GENERAL_QA]
        subintent_equals_any: [FIND_FILES, SUMMARY]
      retrieval_plan:
        profile_equals_any: [file_lookup, docs_summary_api_endpoint, general_qa_grounded_summary]
  - id: soft-alias-schema-overview
    query: "Нужен обзор по архитектуре notify app"
    expected:
      route:
        routing_domain_equals_any: [DOCS, GENERAL]
        intent_equals_any: [DOC_EXPLAIN, GENERAL_QA]
        subintent: SUMMARY
      retrieval_plan:
        profile_equals_any: [docs_summary_architecture, docs_summary_generic, general_qa_grounded_summary]
  - id: soft-alias-find-schema-file
    query: "Найди файл со схемой сервиса уведомлений"
    expected:
      route:
        routing_domain_equals_any: [DOCS, GENERAL]
        intent_equals_any: [DOC_EXPLAIN, GENERAL_QA]
        subintent_equals_any: [FIND_FILES, SUMMARY]
      retrieval_plan:
        profile_equals_any: [file_lookup, docs_summary_architecture, docs_summary_generic, general_qa_grounded_summary]
  - id: soft-process-domain-summary
    query: "Объясни overview по billing invoice flow"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: SUMMARY
      route:
        anchors:
          process_domain: present
          process_subdomain: present
      retrieval_plan:
        profile_equals_any: [docs_summary_logic_flow, docs_summary_generic, docs_summary_architecture]
  - id: soft-process-domain-find-files
    query: "Найди файл по billing invoice flow"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: FIND_FILES
      route:
        anchors:
          process_domain: present
          process_subdomain: present
      retrieval_plan:
        profile: file_lookup
  - id: soft-noisy-arch-overview
    query: "arch overview по notify app"
    expected:
      route:
        routing_domain_equals_any: [DOCS, GENERAL]
        intent_equals_any: [DOC_EXPLAIN, GENERAL_QA]
        subintent: SUMMARY
      retrieval_plan:
        profile_equals_any: [docs_summary_architecture, docs_summary_generic, general_qa_grounded_summary]
  - id: soft-noisy-file-send-endpoint
    query: "нужен файл где описан /send endpoint"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: FIND_FILES
      retrieval_plan:
        profile: file_lookup
  - id: soft-bare-file-token-preferences
    query: "health-endpoint.md"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: FIND_FILES
      route:
        anchors:
          file_names_contains: ["health-endpoint.md"]
      retrieval_plan:
        profile: file_lookup
  - id: soft-doc-path-preferences
    query: "docs/api/health-endpoint.md"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: FIND_FILES
      route:
        anchors:
          file_names_contains: ["docs/api/health-endpoint.md"]
      retrieval_plan:
        profile: file_lookup
@@ -0,0 +1,206 @@
 defaults:
  component: process_v2_router_plus_retrieval_policy
 cases:
  - id: strict-general-overview
    query: "Общий обзор сервиса"
    expected:
      router:
        domain: GENERAL
        intent: GENERAL_QA
        sub_intent: SUMMARY
      route:
        anchors:
          endpoint_paths_not_contains: ["/health"]
          file_names_not_contains: ["/health"]
      retrieval_plan:
        profile: general_qa_grounded_summary
        layers_contains: [D1_DOCUMENT_CATALOG, D0_DOC_CHUNKS]
        limit: 8
        filters:
          prefer_path_prefixes_contains: [docs/architecture/, docs/]
          path_prefixes: absent
  - id: strict-api-summary-health
    query: "Объясни endpoint /health"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: SUMMARY
      route:
        anchors:
          endpoint_paths_contains: ["/health"]
          file_names_not_contains: ["/health"]
      retrieval_plan:
        profile: docs_summary_api_endpoint
        filters:
          path_prefixes_contains: [docs/api/]
          prefer_path_prefixes_contains: [docs/api/]
  - id: strict-find-files-health-described
    query: "Где описан endpoint /health"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: FIND_FILES
      route:
        anchors:
          endpoint_paths_contains: ["/health"]
          file_names_not_contains: ["/health"]
      retrieval_plan:
        profile: file_lookup
        layers_contains: [D1_DOCUMENT_CATALOG, D3_ENTITY_CATALOG]
        limit: 12
        filters:
          path_prefixes_contains: [docs/api/]
          prefer_path_prefixes_contains: [docs/api/]
  - id: strict-find-files-health-show-file
    query: "Покажи файл с описанием /health"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: FIND_FILES
      route:
        anchors:
          endpoint_paths_contains: ["/health"]
      retrieval_plan:
        profile: file_lookup
        filters:
          path_prefixes_contains: [docs/api/]
  - id: strict-runtime-health-find-files
    query: "Где описан runtime health"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: FIND_FILES
      retrieval_plan:
        profile: file_lookup
        filters:
          path_prefixes_contains_any: [docs/domains/, docs/]
  - id: strict-noisy-runtime-health-find-files
    query: "runtime health где описано в docs"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: FIND_FILES
      retrieval_plan:
        profile: file_lookup
  - id: strict-doc-path-is-file-lookup
    query: "docs/api/health-endpoint.md"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: FIND_FILES
      route:
        anchors:
          file_names_contains: ["docs/api/health-endpoint.md"]
          endpoint_paths_not_contains: ["/api/health-endpoint.md"]
      retrieval_plan:
        profile: file_lookup
        filters:
          path_prefixes_contains: [docs/api/]
  - id: strict-file-token-is-file-lookup
    query: "health-endpoint.md"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: FIND_FILES
      route:
        anchors:
          file_names_contains: ["health-endpoint.md"]
          endpoint_paths_not_contains: ["health-endpoint.md"]
      retrieval_plan:
        profile: file_lookup
  - id: strict-noisy-english-show-doc
    query: "pls show doc for /health"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: FIND_FILES
      route:
        anchors:
          endpoint_paths_contains: ["/health"]
          file_names_not_contains: ["/health"]
        target_terms_not_contains: [pls, show, doc, for]
      retrieval_plan:
        profile: file_lookup
        filters:
          path_prefixes_contains: [docs/api/]
  - id: strict-bare-endpoint-anchor-invariant
    query: "/health"
    expected:
      route:
        routing_domain_equals_any: [GENERAL, DOCS]
        intent_equals_any: [GENERAL_QA, DOC_EXPLAIN]
        subintent: SUMMARY
        anchors:
          endpoint_paths_contains: ["/health"]
          file_names_not_contains: ["/health"]
  - id: strict-find-files-dominates-health-question
    query: "В каком файле описан `/health`?"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: FIND_FILES
      retrieval_plan:
        profile: file_lookup
  - id: strict-runtime-health-summary-not-file-lookup
    query: "Что делает runtime health"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: SUMMARY
      retrieval_plan:
        profile_equals_any: [docs_summary_domain_entity, docs_summary_generic]
  - id: strict-general-purpose
    query: "Зачем нужен этот сервис?"
    expected:
      route:
        routing_domain_equals_any: [GENERAL, DOCS]
        intent_equals_any: [GENERAL_QA, DOC_EXPLAIN]
        subintent: SUMMARY
      retrieval_plan:
        profile_equals_any: [general_qa_grounded_summary, docs_summary_generic]
  - id: strict-conflict-summary-goes-generic
    query: "Как устроена архитектура endpoint /send"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: SUMMARY
      retrieval_plan:
        profile: docs_summary_generic
        filters:
          path_prefixes_contains: [docs/api/]
          prefer_path_prefixes_contains: [docs/api/, docs/architecture/]
  - id: strict-find-files-dominates-mixed-signals
    query: "В каком файле описан architecture flow отправки уведомлений"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: FIND_FILES
      retrieval_plan:
        profile: file_lookup
@@ -0,0 +1,115 @@
 defaults:
  component: process_v2_router_plus_retrieval_policy
 cases:
  - id: qg-t01-docs-overview-architecture
    query: "Объясни overview архитектуры сервиса уведомлений"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: SUMMARY
      retrieval_plan:
        profile_one_of: [docs_summary_architecture, docs_summary_generic]
        filters:
          prefer_path_prefixes_contains: [docs/architecture/]
  - id: qg-t02-docs-overview-flow
    query: "Дай overview по flow отправки уведомлений"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: SUMMARY
      retrieval_plan:
        profile_one_of: [docs_summary_logic_flow, docs_summary_generic]
        filters:
          prefer_path_prefixes_contains: [docs/logic/]
  - id: qg-t03-soft-arch-overview-notify
    query: "Arch overview по notify app"
    expected:
      route:
        routing_domain_one_of: [DOCS, GENERAL]
        intent_one_of: [DOC_EXPLAIN, GENERAL_QA]
        subintent: SUMMARY
      retrieval_plan:
        profile_one_of: [docs_summary_architecture, docs_summary_generic, general_qa_grounded_summary]
  - id: qg-t04-process-summary-filters
    query: "Объясни billing invoice process"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: SUMMARY
      route:
        anchors:
          process_domain: present
          process_subdomain: present
      retrieval_plan:
        if_anchor_present_then_filter_present:
          - anchor: anchors.process_domain
            filter: filters.metadata.domain
          - anchor: anchors.process_subdomain
            filter: filters.metadata.subdomain
        profile_one_of: [docs_summary_logic_flow, docs_summary_generic]
  - id: qg-t05-process-find-files-filters
    query: "Найди файл по billing invoice process"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: FIND_FILES
      route:
        anchors:
          process_domain: present
          process_subdomain: present
      retrieval_plan:
        profile: file_lookup
        if_anchor_present_then_filter_present:
          - anchor: anchors.process_domain
            filter: filters.metadata.domain
          - anchor: anchors.process_subdomain
            filter: filters.metadata.subdomain
        filters:
          prefer_path_prefixes_contains_any: [docs/domains/, docs/logic/]
  - id: qg-t06-soft-process-shaped-input
    query: "billing invoice docs"
    expected:
      route:
        routing_domain: DOCS
        intent: DOC_EXPLAIN
        subintent_one_of: [FIND_FILES, SUMMARY]
      retrieval_plan:
        profile_one_of: [file_lookup, docs_summary_logic_flow, docs_summary_generic]
  - id: qg-t07-clean-target-terms-architecture
    query: "Объясни architecture overview сервиса уведомлений"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: SUMMARY
      route:
        target_terms_not_contains: ["объясни", "overview", "architecture"]
      retrieval_plan:
        profile_one_of: [docs_summary_architecture, docs_summary_generic]
  - id: qg-t08-clean-target-terms-file-query
    query: "Найди doc for /health"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: FIND_FILES
      route:
        target_terms_contains: ["/health"]
        target_terms_not_contains: ["найди", "doc", "for"]
        anchors:
          endpoint_paths_contains: ["/health"]
          file_names_not_contains: ["/health"]
      retrieval_plan:
        profile: file_lookup
@@ -0,0 +1,193 @@
 defaults:
  component: process_v2_router_plus_retrieval_policy_rag
  rag_session_id: "694cd10b-3842-4579-8d53-e54ec4291eae"
 cases:
  - id: rag-t01-architecture-summary
    query: "Объясни overview архитектуры сервиса уведомлений"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: SUMMARY
      route:
        anchors:
          target_doc_hints_contains:
            - "docs/architecture/telegram-notify-app-overview.md"
      retrieval_plan:
        profile: docs_summary_architecture
        filters:
          prefer_path_prefixes_contains:
            - "docs/architecture/"
      rag:
        paths_contains:
          - "docs/architecture/telegram-notify-app-overview.md"
        layers_contains:
          - "D5_RELATION_GRAPH"
          - "D1_DOCUMENT_CATALOG"
  - id: rag-t02-docs-index-find-files
    query: "Найди файл-индекс документации проекта"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: FIND_FILES
      route:
        anchors:
          target_doc_hints_contains:
            - "docs/README.md"
      retrieval_plan:
        profile: file_lookup
        filters:
          path_prefixes_contains:
            - "docs/"
      rag:
        paths_contains:
          - "docs/README.md"
        layers_contains:
          - "D1_DOCUMENT_CATALOG"
  - id: rag-t03-general-docs-overview
    query: "Что входит в документацию этого проекта?"
    expected:
      router:
        domain: GENERAL
        intent: GENERAL_QA
        sub_intent: SUMMARY
      retrieval_plan:
        profile: general_qa_grounded_summary
      rag:
        paths_contains:
          - "docs/README.md"
          - "docs/architecture/telegram-notify-app-overview.md"
        layers_contains:
          - "D1_DOCUMENT_CATALOG"
          - "D0_DOC_CHUNKS"
  - id: rag-t04-errors-catalog-find-files
    query: "В каком файле лежит каталог ошибок?"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: FIND_FILES
      route:
        anchors:
          target_doc_hints_contains:
            - "docs/errors/catalog.yaml"
      retrieval_plan:
        profile: file_lookup
        filters:
          path_prefixes_contains:
            - "docs/errors/"
      rag:
        paths_contains:
          - "docs/errors/catalog.yaml"
        layers_contains:
          - "D1_DOCUMENT_CATALOG"
  - id: rag-t05-errors-catalog-general
    query: "Объясни каталог ошибок"
    expected:
      router:
        domain: GENERAL
        intent: GENERAL_QA
        sub_intent: SUMMARY
      route:
        anchors:
          target_doc_hints_contains:
            - "docs/errors/catalog.yaml"
      retrieval_plan:
        profile: general_qa_grounded_summary
      rag:
        paths_contains:
          - "docs/errors/catalog.yaml"
  - id: rag-t06-health-summary-chain
    query: "Объясни endpoint /health"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: SUMMARY
      route:
        anchors:
          endpoint_paths_contains:
            - "/health"
          file_names_not_contains:
            - "/health"
      retrieval_plan:
        profile: docs_summary_api_endpoint
        filters:
          prefer_path_prefixes_contains:
            - "docs/api/"
      rag:
        paths_contains_any:
          - "docs/README.md"
          - "docs/architecture/telegram-notify-app-overview.md"
        layers_contains:
          - "D2_FACT_INDEX"
  - id: rag-t07-health-find-files-empty
    query: "Где описан endpoint /health"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: FIND_FILES
      route:
        anchors:
          endpoint_paths_contains:
            - "/health"
          target_doc_hints_contains:
            - "docs/api/health-endpoint.md"
      retrieval_plan:
        profile: file_lookup
      rag:
        row_count: 0
        paths: absent
        layers: absent
  - id: rag-t08-notifications-workflow-metadata
    query: "Объясни notifications workflow"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: SUMMARY
      route:
        anchors:
          process_domain: notifications
      retrieval_plan:
        profile: docs_summary_logic_flow
        filters:
          metadata.domain: notifications
          prefer_path_prefixes_contains:
            - "docs/logic/"
      rag:
        paths_contains:
          - "docs/architecture/telegram-notify-app-overview.md"
        metadata_domains_contains:
          - "notifications"
  - id: rag-t09-mixed-summary-generic
    query: "Как архитектурно устроен endpoint /send"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: SUMMARY
      route:
        anchors:
          endpoint_paths_contains:
            - "/send"
      retrieval_plan:
        profile: docs_summary_generic
        filters:
          prefer_path_prefixes_contains:
            - "docs/api/"
            - "docs/architecture/"
      rag:
        paths_contains:
          - "docs/architecture/telegram-notify-app-overview.md"
@@ -0,0 +1,180 @@
 defaults:
  component: process_v2_full_chain
  rag_session_id: "694cd10b-3842-4579-8d53-e54ec4291eae"
 cases:
  - id: full-t01-general-docs-overview
    query: "Что входит в документацию этого проекта?"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: SUMMARY
      retrieval_plan:
        profile: docs_summary_generic
      rag:
        row_count: 0
      pipeline:
        answer_mode: insufficient_evidence
      llm:
        non_empty: true
        contains_all:
          - "не найден"
          - "документ"
  - id: full-t02-architecture-summary
    query: "Объясни overview архитектуры сервиса уведомлений"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: SUMMARY
      route:
        anchors:
          target_doc_hints_contains:
            - "docs/architecture/telegram-notify-app-overview.md"
      retrieval_plan:
        profile: docs_summary_architecture
      rag:
        paths_contains:
          - "docs/architecture/telegram-notify-app-overview.md"
      pipeline:
        answer_mode: grounded_summary
      llm:
        non_empty: true
        contains_any:
          - ["RuntimeManager", "TelegramControlChannel"]
          - ["worker", "Telegram"]
  - id: full-t03-runtime-health-summary
    query: "Что такое runtime health в этой документации?"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: SUMMARY
      retrieval_plan:
        profile: docs_summary_domain_entity
      rag:
        row_count: 0
      pipeline:
        answer_mode: insufficient_evidence
      llm:
        non_empty: true
        contains_all:
          - "не найден"
          - "документ"
  - id: full-t04-logic-flow-summary
    query: "Кратко опиши цикл отправки уведомлений"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: SUMMARY
      retrieval_plan:
        profile: docs_summary_logic_flow
      rag:
        row_count: 0
      pipeline:
        answer_mode: insufficient_evidence
      llm:
        non_empty: true
        contains_all:
          - "не найден"
          - "документ"
  - id: full-t05-errors-catalog-find-files
    query: "В каком файле лежит каталог ошибок?"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: FIND_FILES
      route:
        anchors:
          target_doc_hints_contains:
            - "docs/errors/catalog.yaml"
      retrieval_plan:
        profile: file_lookup
      rag:
        paths_contains:
          - "docs/errors/catalog.yaml"
      pipeline:
        answer_mode: deterministic
      llm:
        non_empty: true
        contains_all:
          - "docs/errors/catalog.yaml"
  - id: full-t06-docs-index-find-files
    query: "Найди файл-индекс документации проекта"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: FIND_FILES
      route:
        anchors:
          target_doc_hints_contains:
            - "docs/README.md"
      retrieval_plan:
        profile: file_lookup
      rag:
        paths_contains:
          - "docs/README.md"
      pipeline:
        answer_mode: deterministic
      llm:
        non_empty: true
        contains_all:
          - "docs/README.md"
  - id: full-t07-mixed-generic-summary
    query: "Как архитектурно устроен endpoint /send"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: SUMMARY
      route:
        anchors:
          endpoint_paths_contains:
            - "/send"
      retrieval_plan:
        profile: docs_summary_generic
      rag:
        paths_contains:
          - "docs/architecture/telegram-notify-app-overview.md"
      pipeline:
        answer_mode: grounded_summary
      llm:
        non_empty: true
        contains_any:
          - ["Telegram", "/send"]
          - ["архитект", "endpoint"]
  - id: full-t08-health-boundary
    query: "Объясни endpoint /health"
    expected:
      router:
        domain: DOCS
        intent: DOC_EXPLAIN
        sub_intent: SUMMARY
      route:
        anchors:
          endpoint_paths_contains:
            - "/health"
          file_names_not_contains:
            - "/health"
      retrieval_plan:
        profile: docs_summary_api_endpoint
      rag:
        row_count: 0
      pipeline:
        answer_mode: insufficient_evidence
      llm:
        non_empty: true
        contains_all:
          - "не найден"
          - "документ"
@@ -64,8 +64,8 @@ class ArtifactWriter:
            f"- source_file: {result.case.source_file.as_posix()}",
            f"- passed: {result.passed}",
            "",
-            "## Query",
+            "## Input",
-            result.case.query,
+            result.case.display_input,
            "",
            "## Actual",
            "```json",
@@ -96,7 +96,7 @@ class SummaryComposer:
        ]
        for item in results:
            lines.append(
-                f"| {item.case.case_id} | {item.case.component} | {self._cell(item.case.query)} | "
+                f"| {item.case.case_id} | {item.case.component} | {self._cell(item.case.display_input)} | "
                f"{item.actual.get('intent') or '—'} | {item.actual.get('sub_intent') or '—'} | "
                f"{'✓' if item.passed else '✗'} |"
            )
@@ -4,7 +4,7 @@ from pathlib import Path
 import yaml
-from tests.pipeline_setup_v4.core.models import CaseExpectations, RouterExpectation, V4Case
+from tests.pipeline_setup_v4.core.models import CaseExpectations, RetrievalPlanExpectation, RouterExpectation, V4Case
 class CaseDirectoryLoader:
@@ -35,13 +35,28 @@ class CaseDirectoryLoader:
        case_id = str(raw.get("id") or "").strip()
        component = str(raw.get("component") or defaults.get("component") or "").strip()
        query = str(raw.get("query") or "").strip()
-        if not case_id or not component or not query:
+        rag_session_id = str(raw.get("rag_session_id") or defaults.get("rag_session_id") or "").strip() or None
-            raise ValueError(f"Invalid case in {path}: `id`, `component`, `query` are required")
+        route = dict(raw.get("route") or {})
        if not route and isinstance(defaults.get("route"), dict):
            route = dict(defaults.get("route") or {})
        if not case_id or not component:
            raise ValueError(f"Invalid case in {path}: `id` and `component` are required")
        if component in {
            "process_v2_intent_router",
            "process_v2_router_plus_retrieval_policy",
            "process_v2_router_plus_retrieval_policy_rag",
            "process_v2_full_chain",
        } and not query:
            raise ValueError(f"Invalid case in {path}: `query` is required for {component}")
        if component == "process_v2_retrieval_policy_resolver" and not route:
            raise ValueError(f"Invalid case in {path}: `route` is required for {component}")
        expected = dict(raw.get("expected") or {})
        return V4Case(
            case_id=case_id,
            component=component,  # type: ignore[arg-type]
            query=query,
            rag_session_id=rag_session_id,
            route=route,
            source_file=path,
            expectations=self._to_expectations(expected),
            notes=str(raw.get("notes") or ""),
@@ -50,10 +65,38 @@ class CaseDirectoryLoader:
    def _to_expectations(self, raw: dict) -> CaseExpectations:
        router = dict(raw.get("router") or {})
        route = dict(raw.get("route") or {})
        retrieval_plan = dict(raw.get("retrieval_plan") or raw.get("plan") or {})
        rag = dict(raw.get("rag") or {})
        pipeline = dict(raw.get("pipeline") or {})
        llm = dict(raw.get("llm") or {})
        return CaseExpectations(
            router=RouterExpectation(
                domain=str(router.get("domain") or "").strip() or None,
                intent=str(router.get("intent") or "").strip() or None,
                sub_intent=str(router.get("sub_intent") or "").strip() or None,
            ),
            retrieval_plan=RetrievalPlanExpectation(
                profile=str(retrieval_plan.get("profile") or "").strip() or None,
                layers=tuple(str(item).strip() for item in retrieval_plan.get("layers") or [] if str(item).strip()),
                limit=int(retrieval_plan["limit"]) if retrieval_plan.get("limit") is not None else None,
                filters=self._plain_mapping(dict(retrieval_plan.get("filters") or {})),
            ),
            route_assertions=route,
            retrieval_plan_assertions=retrieval_plan,
            rag_assertions=rag,
            pipeline_assertions=pipeline,
            llm_assertions=llm,
        )
-        )
+
    def _plain_mapping(self, raw: dict[str, object]) -> dict[str, object]:
        plain: dict[str, object] = {}
        for key, value in raw.items():
            if self._is_assertion_key(key) or value in ("present", "absent"):  # tuple, not set: list/dict values are unhashable
                continue
            plain[key] = value
        return plain
    def _is_assertion_key(self, key: str) -> bool:
        suffixes = ("_not_contains", "_contains_any", "_contains", "_equals_any", "_one_of")
        return any(key.endswith(suffix) for suffix in suffixes)
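The loader above separates plain expected filter values from assertion-style entries. A minimal standalone sketch of that split follows; `split_filters`, `ASSERTION_SUFFIXES`, and `PRESENCE_MARKERS` are hypothetical names for illustration only, and the sample filter keys `path_prefixes` and `language` are made up. It assumes YAML values may be lists or mappings, so the presence check is guarded with `isinstance` rather than relying on set membership.

```python
from collections.abc import Mapping

# Hypothetical constants; the suffix list mirrors the one used by the loader.
ASSERTION_SUFFIXES = ("_not_contains", "_contains_any", "_contains", "_equals_any", "_one_of")
PRESENCE_MARKERS = ("present", "absent")


def split_filters(raw: Mapping[str, object]) -> tuple[dict[str, object], dict[str, object]]:
    """Split a filters mapping into plain expected values and assertion entries."""
    plain: dict[str, object] = {}
    assertions: dict[str, object] = {}
    for key, value in raw.items():
        # Only strings can be presence markers; lists and dicts are never hashed here.
        is_presence = isinstance(value, str) and value in PRESENCE_MARKERS
        if key.endswith(ASSERTION_SUFFIXES) or is_presence:
            assertions[key] = value
        else:
            plain[key] = value
    return plain, assertions


plain, assertions = split_filters(
    {
        "metadata.domain": "notifications",
        "prefer_path_prefixes_contains": ["docs/logic/"],
        "path_prefixes": ["docs/"],
        "language": "absent",
    }
)
```

In this sketch a list-valued key without an assertion suffix (`path_prefixes`) stays in the plain mapping and would be compared by equality, while suffixed keys and `present`/`absent` markers are routed to the assertion checks.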
@@ -5,7 +5,13 @@ from pathlib import Path
 from typing import Literal
-ComponentKind = Literal["process_v2_intent_router"]
+ComponentKind = Literal[
    "process_v2_intent_router",
    "process_v2_retrieval_policy_resolver",
    "process_v2_router_plus_retrieval_policy",
    "process_v2_router_plus_retrieval_policy_rag",
    "process_v2_full_chain",
 ]
@dataclass(slots=True, frozen=True)
@@ -15,21 +21,41 @@ class RouterExpectation:
    sub_intent: str | None = None
@dataclass(slots=True, frozen=True)
 class RetrievalPlanExpectation:
    profile: str | None = None
    layers: tuple[str, ...] = ()
    limit: int | None = None
    filters: dict[str, object] = field(default_factory=dict)
@dataclass(slots=True, frozen=True)
 class CaseExpectations:
    router: RouterExpectation = RouterExpectation()
    retrieval_plan: RetrievalPlanExpectation = field(default_factory=RetrievalPlanExpectation)
    route_assertions: dict[str, object] = field(default_factory=dict)
    retrieval_plan_assertions: dict[str, object] = field(default_factory=dict)
    rag_assertions: dict[str, object] = field(default_factory=dict)
    pipeline_assertions: dict[str, object] = field(default_factory=dict)
    llm_assertions: dict[str, object] = field(default_factory=dict)
@dataclass(slots=True, frozen=True)
 class V4Case:
    case_id: str
    component: ComponentKind
    query: str
    source_file: Path
-    expectations: CaseExpectations = CaseExpectations()
+    query: str = ""
    rag_session_id: str | None = None
    route: dict[str, object] = field(default_factory=dict)
    expectations: CaseExpectations = field(default_factory=CaseExpectations)
    notes: str = ""
    tags: tuple[str, ...] = ()
    @property
    def display_input(self) -> str:
        return self.query or str(self.route.get("user_query") or "") or self.case_id
@dataclass(slots=True, frozen=True)
 class ExecutionPayload:
@@ -1,17 +1,249 @@
 from __future__ import annotations
 from collections.abc import Mapping, Sequence
 from tests.pipeline_setup_v4.core.models import V4Case
 class CaseValidator:
    def validate(self, case: V4Case, actual: dict) -> list[str]:
        if case.component == "process_v2_intent_router":
            return self._validate_router(case, actual)
        if case.component == "process_v2_retrieval_policy_resolver":
            return self._validate_retrieval_plan(case, actual)
        if case.component == "process_v2_router_plus_retrieval_policy":
            return self._validate_router(case, actual) + self._validate_retrieval_plan(case, actual)
        if case.component == "process_v2_router_plus_retrieval_policy_rag":
            return self._validate_router(case, actual) + self._validate_retrieval_plan(case, actual) + self._validate_rag(case, actual)
        if case.component == "process_v2_full_chain":
            return (
                self._validate_router(case, actual)
                + self._validate_retrieval_plan(case, actual)
                + self._validate_rag(case, actual)
                + self._validate_pipeline(case, actual)
                + self._validate_llm(case, actual)
            )
        return [f"unsupported component for validation: {case.component}"]
    def _validate_router(self, case: V4Case, actual: dict) -> list[str]:
        mismatches: list[str] = []
        expected = case.expectations.router
-        self._check(expected.domain, actual.get("domain"), "domain", mismatches)
+        self._check_scalar(expected.domain, actual.get("domain"), "domain", mismatches)
-        self._check(expected.intent, actual.get("intent"), "intent", mismatches)
+        self._check_scalar(expected.intent, actual.get("intent"), "intent", mismatches)
-        self._check(expected.sub_intent, actual.get("sub_intent"), "sub_intent", mismatches)
+        self._check_scalar(expected.sub_intent, actual.get("sub_intent"), "sub_intent", mismatches)
        route_actual = actual.get("route")
        if isinstance(route_actual, Mapping):
            self._check_assertions(case.expectations.route_assertions, route_actual, "route", mismatches)
        return mismatches
-    def _check(self, expected: str | None, actual: object, label: str, mismatches: list[str]) -> None:
+    def _validate_retrieval_plan(self, case: V4Case, actual: dict) -> list[str]:
        mismatches: list[str] = []
        expected = case.expectations.retrieval_plan
        self._check_scalar(expected.profile, actual.get("profile"), "profile", mismatches)
        if expected.layers:
            self._check_scalar(list(expected.layers), actual.get("layers"), "layers", mismatches)
        self._check_scalar(expected.limit, actual.get("limit"), "limit", mismatches)
        self._check_subset(expected.filters, actual.get("filters"), "filters", mismatches)
        plan_actual = actual.get("retrieval_plan")
        if isinstance(plan_actual, Mapping):
            self._check_assertions(case.expectations.retrieval_plan_assertions, plan_actual, "retrieval_plan", mismatches)
            self._check_conditional_filter_assertions(case.expectations.retrieval_plan_assertions, actual, mismatches)
        return mismatches
    def _validate_rag(self, case: V4Case, actual: dict) -> list[str]:
        mismatches: list[str] = []
        rag_actual = actual.get("rag")
        if isinstance(rag_actual, Mapping):
            self._check_assertions(case.expectations.rag_assertions, rag_actual, "rag", mismatches)
        elif case.expectations.rag_assertions:
            mismatches.append("rag: expected mapping, got missing")
        return mismatches
    def _validate_pipeline(self, case: V4Case, actual: dict) -> list[str]:
        mismatches: list[str] = []
        pipeline_actual = actual.get("pipeline")
        if isinstance(pipeline_actual, Mapping):
            self._check_assertions(case.expectations.pipeline_assertions, pipeline_actual, "pipeline", mismatches)
        elif case.expectations.pipeline_assertions:
            mismatches.append("pipeline: expected mapping, got missing")
        return mismatches
    def _validate_llm(self, case: V4Case, actual: dict) -> list[str]:
        mismatches: list[str] = []
        expected = case.expectations.llm_assertions
        if not expected:
            return mismatches
        llm_actual = actual.get("llm")
        if not isinstance(llm_actual, Mapping):
            mismatches.append("llm: expected mapping, got missing")
            return mismatches
        answer = str(llm_actual.get("answer") or "")
        lowered = answer.lower()
        if "non_empty" in expected:
            want_non_empty = bool(expected.get("non_empty"))
            if want_non_empty and not answer.strip():
                mismatches.append("llm.non_empty: expected non-empty answer")
            if not want_non_empty and answer.strip():
                mismatches.append("llm.non_empty: expected empty answer")
        if "contains_all" in expected:
            missing = [token for token in self._string_list(expected.get("contains_all")) if token.lower() not in lowered]
            if missing:
                mismatches.append(f"llm.contains_all: missing {missing}")
        if "contains_any" in expected and not self._matches_contains_any(lowered, expected.get("contains_any")):
            mismatches.append(f"llm.contains_any: no expected variant matched answer '{answer[:200]}'")
        for key, value in expected.items():
            if key in {"non_empty", "contains_all", "contains_any"}:
                continue
            if key not in llm_actual:
                mismatches.append(f"llm.{key}: missing")
                continue
            self._check_assertions(value, llm_actual.get(key), f"llm.{key}", mismatches)
        return mismatches
    def _check_scalar(self, expected: object, actual: object, label: str, mismatches: list[str]) -> None:
        if expected is not None and expected != actual:
            mismatches.append(f"{label}: expected {expected}, got {actual}")
    def _check_subset(self, expected: object, actual: object, label: str, mismatches: list[str]) -> None:
        if expected in (None, {}, []):
            return
        if isinstance(expected, Mapping):
            if not isinstance(actual, Mapping):
                mismatches.append(f"{label}: expected dict subset, got {actual}")
                return
            for key, value in expected.items():
                next_label = f"{label}.{key}"
                if key not in actual:
                    mismatches.append(f"{next_label}: missing")
                    continue
                self._check_subset(value, actual.get(key), next_label, mismatches)
            return
        if expected != actual:
            mismatches.append(f"{label}: expected {expected}, got {actual}")
    def _check_assertions(self, expected: object, actual: object, label: str, mismatches: list[str]) -> None:
        if expected in (None, {}, []):
            return
        if not isinstance(expected, Mapping):
            self._check_scalar(expected, actual, label, mismatches)
            return
        if not isinstance(actual, Mapping):
            mismatches.append(f"{label}: expected mapping, got {actual}")
            return
        for key, value in expected.items():
            if key == "if_anchor_present_then_filter_present":
                continue
            if key.endswith("_not_contains"):
                self._assert_not_contains(actual.get(key.removesuffix("_not_contains")), value, f"{label}.{key}", mismatches)
                continue
            if key.endswith("_contains"):
                self._assert_contains(actual.get(key.removesuffix("_contains")), value, f"{label}.{key}", mismatches)
                continue
            if key.endswith("_contains_any"):
                self._assert_contains_any(actual.get(key.removesuffix("_contains_any")), value, f"{label}.{key}", mismatches)
                continue
            if key.endswith("_equals_any"):
                self._assert_equals_any(actual.get(key.removesuffix("_equals_any")), value, f"{label}.{key}", mismatches)
                continue
            if key.endswith("_one_of"):
                self._assert_equals_any(actual.get(key.removesuffix("_one_of")), value, f"{label}.{key}", mismatches)
                continue
            if value == "present":
                self._assert_present(actual.get(key), f"{label}.{key}", mismatches)
                continue
            if value == "absent":
                self._assert_absent(actual, key, f"{label}.{key}", mismatches)
                continue
            if key not in actual:
                mismatches.append(f"{label}.{key}: missing")
                continue
            self._check_assertions(value, actual.get(key), f"{label}.{key}", mismatches)
    def _assert_contains(self, actual: object, expected: object, label: str, mismatches: list[str]) -> None:
        actual_list = self._as_list(actual)
        expected_list = self._as_list(expected)
        missing = [item for item in expected_list if item not in actual_list]
        if missing:
            mismatches.append(f"{label}: missing {missing}, got {actual_list}")
    def _assert_not_contains(self, actual: object, expected: object, label: str, mismatches: list[str]) -> None:
        actual_list = self._as_list(actual)
        expected_list = self._as_list(expected)
        present = [item for item in expected_list if item in actual_list]
        if present:
            mismatches.append(f"{label}: unexpected {present}, got {actual_list}")
    def _assert_contains_any(self, actual: object, expected: object, label: str, mismatches: list[str]) -> None:
        actual_list = self._as_list(actual)
        expected_list = self._as_list(expected)
        if not any(item in actual_list for item in expected_list):
            mismatches.append(f"{label}: expected any of {expected_list}, got {actual_list}")
    def _assert_equals_any(self, actual: object, expected: object, label: str, mismatches: list[str]) -> None:
        expected_list = self._as_list(expected)
        if actual not in expected_list:
            mismatches.append(f"{label}: expected any of {expected_list}, got {actual}")
    def _assert_present(self, actual: object, label: str, mismatches: list[str]) -> None:
        if actual is None or actual == "" or actual == [] or actual == {}:
            mismatches.append(f"{label}: expected present, got {actual}")
    def _assert_absent(self, actual: Mapping, key: str, label: str, mismatches: list[str]) -> None:
        if key in actual and actual.get(key) not in (None, "", [], {}):
            mismatches.append(f"{label}: expected absent, got {actual.get(key)}")
    def _check_conditional_filter_assertions(self, expected: object, actual: Mapping, mismatches: list[str]) -> None:
        if not isinstance(expected, Mapping):
            return
        rules = expected.get("if_anchor_present_then_filter_present")
        if not isinstance(rules, Sequence) or isinstance(rules, (str, bytes, bytearray)):
            return
        for idx, rule in enumerate(rules):
            if not isinstance(rule, Mapping):
                continue
            anchor_path = str(rule.get("anchor") or "").strip()
            filter_path = str(rule.get("filter") or "").strip()
            if not anchor_path or not filter_path:
                continue
            anchor_value = self._resolve_path(actual.get("route"), anchor_path)
            if anchor_value in (None, "", [], {}):
                continue
            filter_value = self._resolve_path(actual.get("retrieval_plan"), filter_path)
            if filter_value in (None, "", [], {}):
                mismatches.append(
                    f"conditional[{idx}]: expected {filter_path} present because {anchor_path} is present"
                )
    def _resolve_path(self, value: object, path: str) -> object:
        current = value
        parts = [item for item in path.split(".") if item]
        for idx, part in enumerate(parts):
            if not isinstance(current, Mapping):
                return None
            remainder = ".".join(parts[idx:])
            if remainder in current:
                return current.get(remainder)
            if part not in current:
                return None
            current = current.get(part)
        return current
    def _as_list(self, value: object) -> list[object]:
        if value is None:
            return []
        if isinstance(value, Sequence) and not isinstance(value, (str, bytes, bytearray)):
            return list(value)
        return [value]
    def _string_list(self, value: object) -> list[str]:
        return [str(item) for item in self._as_list(value) if str(item).strip()]
    def _matches_contains_any(self, lowered_answer: str, expected: object) -> bool:
        variants = self._as_list(expected)
        for variant in variants:
            tokens = self._string_list(variant)
            if not tokens:
                continue
            if all(token.lower() in lowered_answer for token in tokens):
                return True
        return False
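The `contains_any` check on LLM answers treats each entry as a variant: a variant matches only when all of its tokens occur in the answer, and the assertion passes when at least one variant matches. The sketch below restates that rule as standalone functions. `as_list` and `matches_variants` are hypothetical names, not part of the harness, and the sample answers are invented.

```python
from collections.abc import Sequence


def as_list(value: object) -> list[object]:
    """Wrap scalars in a list, treat None as empty, and leave strings unsplit."""
    if value is None:
        return []
    if isinstance(value, Sequence) and not isinstance(value, (str, bytes, bytearray)):
        return list(value)
    return [value]


def matches_variants(answer: str, variants: object) -> bool:
    """Return True when every token of at least one variant occurs in the answer."""
    lowered = answer.lower()
    for variant in as_list(variants):
        tokens = [str(token) for token in as_list(variant) if str(token).strip()]
        # Empty variants never match; comparison is case-insensitive substring search.
        if tokens and all(token.lower() in lowered for token in tokens):
            return True
    return False


variants = [["RuntimeManager", "TelegramControlChannel"], ["worker", "Telegram"]]
hit = matches_variants("The worker polls Telegram for updates.", variants)
miss = matches_variants("RuntimeManager starts the service.", variants)
```

Here `hit` is true because the second variant is fully covered, while `miss` is false because neither variant has all of its tokens in the answer. Substring matching also explains why stems such as "архитект" can be used as tokens in the cases above.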
@@ -0,0 +1,121 @@
 """Run full `process v2` flow in the v4 harness.
 This module adapts the existing v3 `V2ProcessAdapter` so pipeline_setup_v4 can
 execute the real route -> retrieval -> evidence -> workflow LLM chain without
 duplicating runtime logic.
 """
 from __future__ import annotations
 from tests.pipeline_setup_v3.core.models import CaseExpectations, CaseInput, V3Case
 from tests.pipeline_setup_v3.runtime.v2_process_adapter import V2ProcessAdapter
 from tests.pipeline_setup_v4.core.models import ExecutionPayload, V4Case
 class ProcessV2FullChainExecutor:
    def __init__(self) -> None:
        self._adapter = V2ProcessAdapter(workflow_llm_enabled=True)
    def execute(self, case: V4Case) -> ExecutionPayload:
        if not case.rag_session_id:
            raise ValueError(f"Case '{case.case_id}' requires rag_session_id")
        payload = self._adapter.execute(self._build_case(case), case.rag_session_id)
        route = dict(payload.details.get("router_result") or {})
        retrieval_plan = dict(payload.details.get("retrieval_plan") or {})
        rows = list(payload.details.get("rows") or [])
        rag_summary = _summarize_rows(rows)
        pipeline_steps = list(payload.details.get("pipeline_steps") or [])
        pipeline_summary = {
            "answer_mode": str(payload.actual.get("answer_mode") or ""),
            "workflow_llm_enabled": True,
            "step_count": len(pipeline_steps),
            "steps": [str(step.get("step") or "") for step in pipeline_steps if str(step.get("step") or "").strip()],
        }
        answer = str(payload.details.get("answer") or payload.actual.get("llm_answer") or "")
        actual = {
            "domain": payload.actual.get("domain"),
            "intent": payload.actual.get("intent"),
            "sub_intent": payload.actual.get("sub_intent"),
            "profile": retrieval_plan.get("profile"),
            "layers": list(retrieval_plan.get("layers") or []),
            "limit": retrieval_plan.get("limit"),
            "filters": dict(retrieval_plan.get("filters") or {}),
            "answer_mode": payload.actual.get("answer_mode"),
            "route": {
                "routing_domain": route.get("routing_domain"),
                "intent": route.get("intent"),
                "subintent": route.get("subintent"),
                "target_terms": list(route.get("target_terms") or []),
                "anchors": dict(route.get("anchors") or {}),
            },
            "retrieval_plan": {
                "profile": retrieval_plan.get("profile"),
                "layers": list(retrieval_plan.get("layers") or []),
                "limit": retrieval_plan.get("limit"),
                "filters": dict(retrieval_plan.get("filters") or {}),
            },
            "rag": rag_summary,
            "pipeline": pipeline_summary,
            "llm": {
                "answer": answer,
                "non_empty": bool(answer.strip()),
                "length": len(answer),
            },
        }
        details = {
            "query": case.query,
            "rag_session_id": case.rag_session_id,
            "route": route,
            "retrieval_plan": actual["retrieval_plan"],
            "rag": {
                **rag_summary,
                "rows": rows[:20],
            },
            "pipeline": pipeline_summary,
            "answer": answer,
            "pipeline_steps": pipeline_steps,
            "logs": list(payload.details.get("logs") or []),
            "evidence": dict(payload.details.get("evidence") or {}),
        }
        return ExecutionPayload(actual=actual, details=details)
    def _build_case(self, case: V4Case) -> V3Case:
        return V3Case(
            case_id=case.case_id,
            runner="process_v2",
            mode="full_chain",
            query=case.query,
            source_file=case.source_file,
            input=CaseInput(rag_session_id=case.rag_session_id),
            expectations=CaseExpectations(),
            notes=case.notes,
            tags=case.tags,
        )
 def _summarize_rows(rows: list[dict]) -> dict[str, object]:
    paths: list[str] = []
    layers: list[str] = []
    metadata_domains: list[str] = []
    metadata_subdomains: list[str] = []
    for row in rows:
        path = str(row.get("path") or "").strip()
        layer = str(row.get("layer") or "").strip()
        metadata = dict(row.get("metadata") or {})
        domain = str(metadata.get("domain") or "").strip()
        subdomain = str(metadata.get("subdomain") or "").strip()
        if path and path not in paths:
            paths.append(path)
        if layer and layer not in layers:
            layers.append(layer)
        if domain and domain not in metadata_domains:
            metadata_domains.append(domain)
        if subdomain and subdomain not in metadata_subdomains:
            metadata_subdomains.append(subdomain)
    return {
        "row_count": len(rows),
        "paths": paths,
        "layers": layers,
        "metadata_domains": metadata_domains,
        "metadata_subdomains": metadata_subdomains,
    }
@@ -0,0 +1,51 @@
 from __future__ import annotations
 from dataclasses import asdict
 from app.core.agent.processes.v2.models import V2RouteAnchors, V2RouteResult
 from app.core.agent.processes.v2.retrieval.policy_resolver import V2RetrievalPolicyResolver
 from tests.pipeline_setup_v4.core.models import ExecutionPayload, V4Case
 class ProcessV2RetrievalPolicyExecutor:
    def __init__(self) -> None:
        self._resolver = V2RetrievalPolicyResolver()
    def execute(self, case: V4Case) -> ExecutionPayload:
        route = self._build_route(case.route)
        plan = self._resolver.resolve(route)
        actual = {
            "profile": plan.profile,
            "layers": list(plan.layers),
            "limit": plan.limit,
            "filters": dict(plan.filters),
        }
        details = {
            "route": asdict(route),
            "plan": actual,
        }
        return ExecutionPayload(actual=actual, details=details)
    def _build_route(self, raw: dict[str, object]) -> V2RouteResult:
        anchors_raw = dict(raw.get("anchors") or {})
        return V2RouteResult(
            routing_domain=str(raw.get("routing_domain") or ""),
            intent=str(raw.get("intent") or ""),
            subintent=str(raw.get("subintent") or ""),
            user_query=str(raw.get("user_query") or raw.get("normalized_query") or raw.get("name") or "resolver case"),
            normalized_query=str(raw.get("normalized_query") or raw.get("user_query") or "resolver case"),
            target_terms=[str(item) for item in raw.get("target_terms") or [] if str(item).strip()],
            anchors=V2RouteAnchors(
                entity_names=[str(item) for item in anchors_raw.get("entity_names") or [] if str(item).strip()],
                file_names=[str(item) for item in anchors_raw.get("file_names") or [] if str(item).strip()],
                endpoint_paths=[str(item) for item in anchors_raw.get("endpoint_paths") or [] if str(item).strip()],
                target_doc_hints=[str(item) for item in anchors_raw.get("target_doc_hints") or [] if str(item).strip()],
                matched_aliases=[str(item) for item in anchors_raw.get("matched_aliases") or [] if str(item).strip()],
                process_domain=str(anchors_raw.get("process_domain") or "").strip() or None,
                process_subdomain=str(anchors_raw.get("process_subdomain") or "").strip() or None,
            ),
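            # Keep an explicit 0.0 confidence; only a missing value falls back to 1.0.
            confidence=float(raw["confidence"]) if raw.get("confidence") is not None else 1.0,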
            routing_mode=str(raw.get("routing_mode") or "test_fixture"),
            llm_router_used=bool(raw.get("llm_router_used") or False),
            reason_short=str(raw.get("reason_short") or "fixture route"),
        )
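A minimal sketch, not part of this commit: the list comprehension repeated for `target_terms` and each anchor field in `_build_route` could be shared through one helper. The name `clean_str_list` and the sample `anchors_raw` dict are hypothetical; the filtering rule is copied from the inline comprehension above (items are stringified, and whitespace-only items are dropped without stripping the ones that are kept).

```python
from typing import Iterable, Optional


def clean_str_list(values: Optional[Iterable[object]]) -> list[str]:
    """Stringify items and drop those that are empty or whitespace-only.

    Mirrors the inline comprehension: items that are kept are not stripped.
    """
    return [str(item) for item in (values or []) if str(item).strip()]


# Hypothetical fixture fragment, shaped like the `anchors` dict of a case.
anchors_raw = {"entity_names": ["RuntimeManager", "  ", ""], "file_names": None}
entity_names = clean_str_list(anchors_raw.get("entity_names"))
file_names = clean_str_list(anchors_raw.get("file_names"))
```

With such a helper each anchor field would shrink to a single call, which makes a change to the filtering rule a one-line edit.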
@@ -22,13 +22,23 @@ class _KeywordLlm:
        "где находится",
        "найди файл",
        "найди файлы",
        "show doc",
        "show file",
        "doc for",
        "file with",
    )
    _DOC_MARKERS = (
        "документац",
        "endpoint",
        "эндпоинт",
        "архитект",
        "architecture",
        "overview архитектуры",
        "arch overview",
        "процесс",
        "process",
        "flow",
        "workflow",
        "сущност",
        "worker",
        "цикл отправки уведомлений",
@@ -43,6 +53,10 @@ class _KeywordLlm:
        "/health",
        "/send",
        "/actions/{action}",
        "billing invoice process",
        "billing invoice flow",
        "billing invoice docs",
        "notify app",
    )
    _GENERAL_MARKERS = (
        "что это за сервис",
@@ -67,7 +81,7 @@ class _KeywordLlm:
        return json.dumps(route, ensure_ascii=False)
    def _select(self, query: str) -> dict[str, object]:
-        if any(marker in query for marker in self._FILE_MARKERS) or ("дока" in query and "покажи" in query):
+        if any(marker in query for marker in self._FILE_MARKERS) or ("дока" in query and "покажи" in query) or ".md" in query:
            return self._route("DOCS", "DOC_EXPLAIN", "FIND_FILES", "file lookup")
        if any(marker in query for marker in self._GENERAL_MARKERS):
            return self._route("GENERAL", "GENERAL_QA", "SUMMARY", "general overview")
@@ -0,0 +1,79 @@
 from __future__ import annotations
 from dataclasses import asdict
 from app.core.agent.processes.v2 import V2IntentRouter
 from app.core.agent.processes.v2.retrieval.policy_resolver import V2RetrievalPolicyResolver
 from tests.pipeline_setup_v4.core.models import ExecutionPayload, V4Case
 from tests.pipeline_setup_v4.executors.process_v2_router_executor import _KeywordLlm
 class ProcessV2RouterPlusPolicyExecutor:
    def __init__(self) -> None:
        self._router = V2IntentRouter(llm=_KeywordLlm(), enable_llm_disambiguation=True)
        self._resolver = V2RetrievalPolicyResolver()
    def execute(self, case: V4Case) -> ExecutionPayload:
        route = self._router.route(case.query)
        plan = self._resolver.resolve(route)
        route_dump = asdict(route)
        actual = {
            "domain": route.routing_domain,
            "intent": route.intent,
            "sub_intent": route.subintent,
            "routing_mode": route.routing_mode,
            "llm_router_used": route.llm_router_used,
            "confidence": route.confidence,
            "profile": plan.profile,
            "layers": list(plan.layers),
            "limit": plan.limit,
            "filters": dict(plan.filters),
            "route": {
                "routing_domain": route.routing_domain,
                "intent": route.intent,
                "subintent": route.subintent,
                "target_terms": list(route.target_terms),
                "anchors": route_dump.get("anchors") or {},
            },
            "retrieval_plan": {
                "profile": plan.profile,
                "layers": list(plan.layers),
                "limit": plan.limit,
                "filters": dict(plan.filters),
            },
        }
        details = {
            "query": case.query,
            "route": route_dump,
            "plan": {
                "profile": plan.profile,
                "layers": list(plan.layers),
                "limit": plan.limit,
                "filters": dict(plan.filters),
            },
            "pipeline_steps": [
                {
                    "step": "intent_router",
                    "input": {"query": case.query},
                    "output": {
                        "domain": route.routing_domain,
                        "intent": route.intent,
                        "sub_intent": route.subintent,
                        "reason_short": route.reason_short,
                        "target_terms": list(route.target_terms),
                        "anchors": route_dump.get("anchors") or {},
                    },
                },
                {
                    "step": "retrieval_policy_resolver",
                    "input": {"route": route_dump},
                    "output": {
                        "profile": plan.profile,
                        "layers": list(plan.layers),
                        "limit": plan.limit,
                        "filters": dict(plan.filters),
                    },
                },
            ],
        }
        return ExecutionPayload(actual=actual, details=details)
@@ -0,0 +1,94 @@
 from __future__ import annotations
 import asyncio
 from dataclasses import asdict
 from app.core.agent.processes.v2 import V2IntentRouter
 from app.core.agent.processes.v2.retrieval.policy_resolver import V2RetrievalPolicyResolver
 from app.core.agent.processes.v2.retrieval.v2_rag_adapter import V2RagRetrievalAdapter
 from app.core.rag.persistence.repository import RagRepository
 from app.core.rag.retrieval.session_retriever import RagSessionRetriever
 from tests.pipeline_setup_v3.shared.rag_indexer import DeterministicEmbedder
 from tests.pipeline_setup_v4.core.models import ExecutionPayload, V4Case
 from tests.pipeline_setup_v4.executors.process_v2_router_executor import _KeywordLlm
 class ProcessV2RouterPlusPolicyRagExecutor:
    def __init__(self) -> None:
        self._router = V2IntentRouter(llm=_KeywordLlm(), enable_llm_disambiguation=True)
        self._resolver = V2RetrievalPolicyResolver()
        self._adapter = V2RagRetrievalAdapter(RagSessionRetriever(RagRepository(), DeterministicEmbedder()))
    def execute(self, case: V4Case) -> ExecutionPayload:
        if not case.rag_session_id:
            raise ValueError(f"Case '{case.case_id}' requires rag_session_id")
        return asyncio.run(self._execute_async(case))
    async def _execute_async(self, case: V4Case) -> ExecutionPayload:
        route = self._router.route(case.query)
        plan = self._resolver.resolve(route)
        rows = await self._adapter.fetch_rows(case.rag_session_id or "", route.normalized_query, plan)
        route_dump = asdict(route)
        rag_summary = _summarize_rows(rows)
        actual = {
            "domain": route.routing_domain,
            "intent": route.intent,
            "sub_intent": route.subintent,
            "profile": plan.profile,
            "layers": list(plan.layers),
            "limit": plan.limit,
            "filters": dict(plan.filters),
            "route": {
                "routing_domain": route.routing_domain,
                "intent": route.intent,
                "subintent": route.subintent,
                "target_terms": list(route.target_terms),
                "anchors": route_dump.get("anchors") or {},
            },
            "retrieval_plan": {
                "profile": plan.profile,
                "layers": list(plan.layers),
                "limit": plan.limit,
                "filters": dict(plan.filters),
            },
            "rag": rag_summary,
        }
        details = {
            "query": case.query,
            "rag_session_id": case.rag_session_id,
            "route": route_dump,
            "plan": actual["retrieval_plan"],
            "rag": {
                **rag_summary,
                "rows": rows[:20],
            },
        }
        return ExecutionPayload(actual=actual, details=details)
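 def _summarize_rows(rows: list[dict]) -> dict[str, object]:
    """Collect the row count and the unique paths, layers, and metadata domains/subdomains in first-seen order."""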
    paths: list[str] = []
    layers: list[str] = []
    metadata_domains: list[str] = []
    metadata_subdomains: list[str] = []
    for row in rows:
        path = str(row.get("path") or "").strip()
        layer = str(row.get("layer") or "").strip()
        metadata = dict(row.get("metadata") or {})
        domain = str(metadata.get("domain") or "").strip()
        subdomain = str(metadata.get("subdomain") or "").strip()
        if path and path not in paths:
            paths.append(path)
        if layer and layer not in layers:
            layers.append(layer)
        if domain and domain not in metadata_domains:
            metadata_domains.append(domain)
        if subdomain and subdomain not in metadata_subdomains:
            metadata_subdomains.append(subdomain)
    return {
        "row_count": len(rows),
        "paths": paths,
        "layers": layers,
        "metadata_domains": metadata_domains,
        "metadata_subdomains": metadata_subdomains,
    }
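A hedged alternative sketch of the same summary, assuming only the row shape used above (`path`, `layer`, and an optional `metadata` dict). It relies on `dict.fromkeys` for order-preserving de-duplication instead of four parallel `if x not in xs` checks. The names `unique_in_order` and `summarize_rows` are illustrative and do not exist in the repository.

```python
from typing import Iterable


def unique_in_order(values: Iterable[object]) -> list[str]:
    """Drop blanks and duplicates while keeping first-seen order."""
    cleaned = (str(value or "").strip() for value in values)
    # dict preserves insertion order, so fromkeys de-duplicates stably.
    return list(dict.fromkeys(item for item in cleaned if item))


def summarize_rows(rows: list[dict]) -> dict[str, object]:
    """Summarize retrieved rows the same way as the helper above."""
    metadata = [dict(row.get("metadata") or {}) for row in rows]
    return {
        "row_count": len(rows),
        "paths": unique_in_order(row.get("path") for row in rows),
        "layers": unique_in_order(row.get("layer") for row in rows),
        "metadata_domains": unique_in_order(meta.get("domain") for meta in metadata),
        "metadata_subdomains": unique_in_order(meta.get("subdomain") for meta in metadata),
    }


# Sample rows are made up for illustration.
sample_rows = [
    {"path": "docs/api/health-endpoint.md", "layer": "D1_DOCUMENT_CATALOG", "metadata": {"domain": "platform"}},
    {"path": "docs/api/health-endpoint.md", "layer": "D0_DOC_CHUNKS", "metadata": None},
]
summary = summarize_rows(sample_rows)
```

Since the same function body appears in two executors in this commit, a shared module-level helper of this kind would also remove the duplication.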
@@ -1,18 +1,56 @@
 from __future__ import annotations
 from tests.pipeline_setup_v4.executors.process_v2_full_chain_executor import ProcessV2FullChainExecutor
 from tests.pipeline_setup_v4.executors.process_v2_retrieval_policy_executor import ProcessV2RetrievalPolicyExecutor
 from tests.pipeline_setup_v4.executors.process_v2_router_plus_policy_executor import ProcessV2RouterPlusPolicyExecutor
 from tests.pipeline_setup_v4.executors.process_v2_router_plus_policy_rag_executor import (
    ProcessV2RouterPlusPolicyRagExecutor,
 )
 from tests.pipeline_setup_v4.executors.process_v2_router_executor import ProcessV2IntentRouterExecutor
 class ExecutorRegistry:
    def __init__(self) -> None:
        self._router_executor: ProcessV2IntentRouterExecutor | None = None
        self._policy_executor: ProcessV2RetrievalPolicyExecutor | None = None
        self._router_plus_policy_executor: ProcessV2RouterPlusPolicyExecutor | None = None
        self._router_plus_policy_rag_executor: ProcessV2RouterPlusPolicyRagExecutor | None = None
        self._full_chain_executor: ProcessV2FullChainExecutor | None = None
    def execute(self, component: str, case) -> object:
        if component == "process_v2_intent_router":
            return self._router().execute(case)
        if component == "process_v2_retrieval_policy_resolver":
            return self._policy().execute(case)
        if component == "process_v2_router_plus_retrieval_policy":
            return self._router_plus_policy().execute(case)
        if component == "process_v2_router_plus_retrieval_policy_rag":
            return self._router_plus_policy_rag().execute(case)
        if component == "process_v2_full_chain":
            return self._full_chain().execute(case)
        raise ValueError(f"Unsupported component: {component}")
    def _router(self) -> ProcessV2IntentRouterExecutor:
        if self._router_executor is None:
            self._router_executor = ProcessV2IntentRouterExecutor()
        return self._router_executor
    def _policy(self) -> ProcessV2RetrievalPolicyExecutor:
        if self._policy_executor is None:
            self._policy_executor = ProcessV2RetrievalPolicyExecutor()
        return self._policy_executor
    def _router_plus_policy(self) -> ProcessV2RouterPlusPolicyExecutor:
        if self._router_plus_policy_executor is None:
            self._router_plus_policy_executor = ProcessV2RouterPlusPolicyExecutor()
        return self._router_plus_policy_executor
    def _router_plus_policy_rag(self) -> ProcessV2RouterPlusPolicyRagExecutor:
        if self._router_plus_policy_rag_executor is None:
            self._router_plus_policy_rag_executor = ProcessV2RouterPlusPolicyRagExecutor()
        return self._router_plus_policy_rag_executor
    def _full_chain(self) -> ProcessV2FullChainExecutor:
        if self._full_chain_executor is None:
            self._full_chain_executor = ProcessV2FullChainExecutor()
        return self._full_chain_executor
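A sketch of a table-driven variant of the lazy registry above, under the assumption that every executor has a no-argument constructor (as they do in this diff). `LazyRegistry` and its `get` method are hypothetical names; the error message is taken from `ExecutorRegistry.execute`.

```python
from typing import Callable


class LazyRegistry:
    """Create each component on first use and reuse it afterwards."""

    def __init__(self, factories: dict[str, Callable[[], object]]) -> None:
        self._factories = dict(factories)
        self._instances: dict[str, object] = {}

    def get(self, component: str) -> object:
        if component not in self._factories:
            raise ValueError(f"Unsupported component: {component}")
        if component not in self._instances:
            # First request: build the instance and cache it.
            self._instances[component] = self._factories[component]()
        return self._instances[component]
```

Adding a component then means adding one entry to the factory dict, for example `{"process_v2_intent_router": ProcessV2IntentRouterExecutor}`, rather than a new attribute, a new branch, and a new private accessor.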
@@ -78,3 +78,32 @@ def test_find_files_prefers_exact_path_match() -> None:
    assert files[0].path == "docs/domains/runtime-health-entity.md"
    assert files[0].match_reason in {"exact_path", "alias_match"}
 def test_summary_ranking_penalizes_overview_doc_when_specific_api_doc_exists() -> None:
    rows = [
        {
            "path": "docs/overview/health-overview.md",
            "title": "Health overview",
            "content": "",
            "layer": "D1_DOCUMENT_CATALOG",
            "metadata": {"summary_text": "Navigation page with related docs.", "document_id": "docs.health_overview"},
        },
        {
            "path": "docs/api/health-endpoint.md",
            "title": "Health endpoint",
            "content": "",
            "layer": "D1_DOCUMENT_CATALOG",
            "metadata": {"summary_text": "GET /health returns runtime status.", "document_id": "api.health"},
        },
    ]
    route = _route(
        hints=["health", "/health", "health endpoint"],
        terms=["health"],
    )
    docs = DocsEvidenceAssembler().assemble_summaries(rows, route)
    assert docs[0].path == "docs/api/health-endpoint.md"
    assert docs[0].score_breakdown["specificity_boost"] > docs[1].score_breakdown["specificity_boost"]
    assert docs[1].score_breakdown["generic_penalty"] < 0
@@ -96,3 +96,38 @@ def test_router_reduces_confidence_for_short_vague_query() -> None:
    result = V2IntentRouter(llm=FakeLlm(_llm_response("GENERAL", "GENERAL_QA", "SUMMARY", confidence=0.8))).route("Что это?")
    assert result.confidence < 0.8
 def test_router_routes_doc_path_to_find_files() -> None:
    result = V2IntentRouter(llm=FakeLlm(_llm_response("DOCS", "DOC_EXPLAIN", "SUMMARY"))).route("docs/api/health-endpoint.md")
    assert result.subintent == "FIND_FILES"
    assert result.anchors.file_names == ["docs/api/health-endpoint.md"]
    assert result.anchors.endpoint_paths == []
 def test_router_routes_file_token_to_find_files() -> None:
    result = V2IntentRouter(llm=FakeLlm(_llm_response("DOCS", "DOC_EXPLAIN", "SUMMARY"))).route("health-endpoint.md")
    assert result.subintent == "FIND_FILES"
    assert result.anchors.file_names == ["health-endpoint.md"]
    assert result.anchors.endpoint_paths == []
 def test_router_promotes_api_method_query_to_endpoint_specific_docs_summary() -> None:
    result = V2IntentRouter(llm=FakeLlm(_llm_response("DOCS", "DOC_EXPLAIN", "SUMMARY"))).route("Как работает метод health?")
    assert result.intent == "DOC_EXPLAIN"
    assert result.subintent == "SUMMARY"
    assert result.anchors.endpoint_paths == ["/health"]
    assert "docs/api/health-endpoint.md" in result.anchors.target_doc_hints
 def test_router_keeps_short_api_like_token_as_strong_hint_without_explicit_path() -> None:
    result = V2IntentRouter(llm=FakeLlm(_llm_response("DOCS", "DOC_EXPLAIN", "SUMMARY"))).route("Что делает health?")
    assert result.intent == "DOC_EXPLAIN"
    assert result.subintent == "SUMMARY"
    assert result.anchors.endpoint_paths == []
    assert "health endpoint" in result.anchors.target_doc_hints
    assert "health" in result.target_terms
@@ -51,6 +51,7 @@ def test_file_names_accepts_real_doc_path() -> None:
    anchors = V2AnchorExtractor().extract("docs/api/health.md", terms).anchors
    assert anchors.file_names == ["docs/api/health.md"]
    assert anchors.endpoint_paths == []
 def test_file_names_rejects_endpoint_path() -> None:
@@ -60,8 +61,63 @@ def test_file_names_rejects_endpoint_path() -> None:
    assert anchors.file_names == []
 def test_target_terms_drop_noisy_english_file_words() -> None:
    analysis = V2TargetTermsExtractor().extract("pls show doc for /health")
    assert analysis.target_terms == ["/health"]
 def test_doc_path_does_not_become_endpoint_path() -> None:
    analysis = V2TargetTermsExtractor().extract("docs/api/health-endpoint.md")
    assert analysis.endpoint_paths == []
 def test_target_terms_drop_architecture_marker_words() -> None:
    analysis = V2TargetTermsExtractor().extract("Объясни architecture overview сервиса уведомлений")
    assert "объясни" not in analysis.target_terms
    assert "architecture" not in analysis.target_terms
    assert "overview" not in analysis.target_terms
 def test_anchor_extractor_extracts_process_domain_and_subdomain() -> None:
    terms = V2TargetTermsExtractor().extract("Объясни billing invoice process")
    anchors = V2AnchorExtractor().extract("Объясни billing invoice process", terms).anchors
    assert anchors.process_domain == "billing"
    assert anchors.process_subdomain == "invoice"
 def test_file_names_rejects_identifier_like_token() -> None:
    terms = V2TargetTermsExtractor().extract("telegram_notify")
    anchors = V2AnchorExtractor().extract("telegram_notify", terms).anchors
    assert anchors.file_names == []
 def test_target_terms_extracts_api_like_anchor_from_method_query() -> None:
    analysis = V2TargetTermsExtractor().extract("Как работает метод health?")
    assert analysis.target_terms == ["/health", "health"]
    assert analysis.endpoint_paths == ["/health"]
    assert analysis.api_like_terms == ["health"]
 def test_anchor_extractor_builds_endpoint_hints_for_short_api_like_query() -> None:
    terms = V2TargetTermsExtractor().extract("Что делает health?")
    anchors = V2AnchorExtractor().extract("Что делает health?", terms).anchors
    assert anchors.endpoint_paths == []
    assert "health" in anchors.target_doc_hints
    assert "/health" in anchors.target_doc_hints
    assert "health endpoint" in anchors.target_doc_hints
 def test_anchor_extractor_keeps_templated_endpoint_for_docs_query() -> None:
    terms = V2TargetTermsExtractor().extract("Расскажи про endpoint /users/{id}")
    anchors = V2AnchorExtractor().extract("Расскажи про endpoint /users/{id}", terms).anchors
    assert anchors.endpoint_paths == ["/users/{id}"]
    assert "/users/{id}" in anchors.target_doc_hints
    assert "users endpoint" in anchors.target_doc_hints
@@ -284,3 +284,42 @@ def test_v2_process_can_disable_workflow_llm_for_general_summary() -> None:
    assert "агрегированный статус runtime" in result.answer
    assert llm.calls == []
 def test_v2_process_prefers_canonical_health_doc_over_readme_for_method_query() -> None:
    llm = FakeLlm("Health explanation.")
    adapter = FakeRagAdapter(
        summary_rows=[
            {
                "path": "docs/README.md",
                "title": "README",
                "content": "",
                "layer": "D1_DOCUMENT_CATALOG",
                "metadata": {"summary_text": "General documentation index.", "document_id": "docs.readme"},
            },
            {
                "path": "docs/api/health-endpoint.md",
                "title": "Health endpoint",
                "content": "",
                "layer": "D1_DOCUMENT_CATALOG",
                "metadata": {
                    "summary_text": "GET /health returns aggregated runtime status.",
                    "document_id": "api.health",
                },
            },
        ],
        file_rows=[],
    )
    process = _v2_process(llm, adapter)
    runtime = _context("Как работает метод health?")
    result = asyncio.run(process.run(runtime))
    assert result.answer == "Health explanation."
    assert llm.calls
    assert "docs/api/health-endpoint.md" in llm.calls[0][1]
    assert "docs/README.md" not in llm.calls[0][1]
    pipeline_events = [payload for _, title, payload in runtime.trace.events if title == "retrieval_profile_selected"]
    assert pipeline_events[0]["profile"] == "docs_api_method_explain"
    evidence_events = [payload for _, title, payload in runtime.trace.events if title == "evidence_assembled"]
    assert any(event.get("primary_doc") == "docs/api/health-endpoint.md" for event in evidence_events if isinstance(event, dict))
@@ -0,0 +1,81 @@
 from __future__ import annotations
 import asyncio
 from app.core.agent.processes.v2.retrieval.v2_rag_adapter import V2RagRetrievalAdapter
 from app.core.rag.retrieval.session_retriever import RetrievalPlan
 class FakeRetriever:
    def __init__(self) -> None:
        self.calls: list[tuple[str, object]] = []
    async def retrieve(self, _rag_session_id: str, _query_text: str, _plan: RetrievalPlan) -> list[dict]:
        self.calls.append(("semantic", None))
        return [
            {
                "path": "docs/api/health-endpoint.md",
                "layer": "D1_DOCUMENT_CATALOG",
                "metadata": {},
            },
            {
                "path": "docs/api/secondary.md",
                "layer": "D0_DOC_CHUNKS",
                "metadata": {},
            },
        ]
    async def retrieve_exact_files(self, _rag_session_id: str, *, paths: list[str], layers=None, limit: int = 200) -> list[dict]:
        del layers, limit
        self.calls.append(("exact", list(paths)))
        if "docs/api/health-endpoint.md" in paths:
            return [
                {
                    "path": "docs/api/health-endpoint.md",
                    "layer": "D1_DOCUMENT_CATALOG",
                    "metadata": {},
                }
            ]
        return []
    async def retrieve_chunks_by_path_substrings(
        self,
        _rag_session_id: str,
        *,
        path_needles: list[str],
        layers=None,
        limit: int = 200,
    ) -> list[dict]:
        del layers, limit
        self.calls.append(("substring", list(path_needles)))
        return []
 def test_v2_rag_adapter_seeds_exact_rows_from_plan_hints() -> None:
    adapter = V2RagRetrievalAdapter(FakeRetriever())
    plan = RetrievalPlan(
        profile="docs_summary_api_endpoint",
        layers=["D1_DOCUMENT_CATALOG", "D2_FACT_INDEX", "D0_DOC_CHUNKS"],
        limit=8,
        filters={"target_doc_hints": ["docs/api/health-endpoint.md"]},
    )
    rows = asyncio.run(adapter.fetch_rows("rag-1", "explain /health", plan))
    assert rows[0]["path"] == "docs/api/health-endpoint.md"
    assert len(rows) == 2
 def test_v2_rag_adapter_uses_substring_fallback_for_missing_hint() -> None:
    retriever = FakeRetriever()
    adapter = V2RagRetrievalAdapter(retriever)
    plan = RetrievalPlan(
        profile="file_lookup",
        layers=["D1_DOCUMENT_CATALOG", "D3_ENTITY_CATALOG"],
        limit=12,
        filters={"target_doc_hints": ["docs/api/missing-health-endpoint.md"]},
    )
    asyncio.run(adapter.fetch_rows("rag-1", "find file", plan))
    assert ("substring", ["missing-health-endpoint.md"]) in retriever.calls
@@ -4,46 +4,132 @@ from app.core.agent.processes.v2.models import V2Domain, V2Intent, V2RouteAnchor
 from app.core.agent.processes.v2.retrieval.policy_resolver import V2RetrievalPolicyResolver
-def _route(*, hints: list[str], endpoint_paths: list[str] | None = None, subintent: str = "SUMMARY", intent: str = "DOC_EXPLAIN") -> V2RouteResult:
+def _route(
    *,
    intent: str = V2Intent.DOC_EXPLAIN,
    subintent: str = V2Subintent.SUMMARY,
    entity_names: list[str] | None = None,
    file_names: list[str] | None = None,
    endpoint_paths: list[str] | None = None,
    target_doc_hints: list[str] | None = None,
    matched_aliases: list[str] | None = None,
    process_domain: str | None = None,
    process_subdomain: str | None = None,
 ) -> V2RouteResult:
    return V2RouteResult(
        routing_domain=V2Domain.DOCS if intent == V2Intent.DOC_EXPLAIN else V2Domain.GENERAL,
        intent=intent,
        subintent=subintent,
        user_query="q",
        normalized_query="q",
-        anchors=V2RouteAnchors(target_doc_hints=hints, endpoint_paths=endpoint_paths or []),
+        anchors=V2RouteAnchors(
            entity_names=entity_names or [],
            file_names=file_names or [],
            endpoint_paths=endpoint_paths or [],
            target_doc_hints=target_doc_hints or [],
            matched_aliases=matched_aliases or [],
            process_domain=process_domain,
            process_subdomain=process_subdomain,
        ),
    )
-def test_policy_prefers_api_docs_for_endpoint_queries() -> None:
+def test_policy_maps_api_summary_to_fact_layers() -> None:
    plan = V2RetrievalPolicyResolver().resolve(
-        _route(hints=["docs/api/health-endpoint.md"], endpoint_paths=["/health"])
+        _route(
            endpoint_paths=["/health"],
            target_doc_hints=["docs/api/health-endpoint.md"],
        )
    )
-    assert plan.profile == "docs_summary_api_endpoint"
+    assert plan.profile == "docs_api_method_explain"
-    assert plan.filters["path_prefixes"] == ["docs/api/", "docs/architecture/", "docs/"]
+    assert plan.layers == ["D1_DOCUMENT_CATALOG", "D2_FACT_INDEX", "D0_DOC_CHUNKS"]
-    assert plan.filters["prefer_path_prefixes"][0] == "docs/api/"
+    assert plan.filters["path_prefixes"] == [
        "docs/api/",
        "docs/endpoints/",
        "docs/methods/",
        "api/",
        "endpoints/",
        "methods/",
    ]
    assert plan.filters["target_doc_hints"] == ["docs/api/health-endpoint.md"]
-def test_policy_prefers_logic_docs_for_logic_queries() -> None:
+def test_policy_maps_logic_summary_to_workflow_layers_and_metadata_filters() -> None:
-    plan = V2RetrievalPolicyResolver().resolve(_route(hints=["docs/logic/telegram-notification-loop.md"]))
+    plan = V2RetrievalPolicyResolver().resolve(
        _route(
            matched_aliases=["logic flow"],
            process_domain="notifications",
            process_subdomain="delivery_loop",
        )
    )
    assert plan.profile == "docs_summary_logic_flow"
    assert plan.layers == ["D4_WORKFLOW_INDEX", "D1_DOCUMENT_CATALOG", "D0_DOC_CHUNKS"]
    assert plan.filters["metadata.domain"] == "notifications"
    assert plan.filters["metadata.subdomain"] == "delivery_loop"
    assert plan.filters["prefer_path_prefixes"][0] == "docs/logic/"
-def test_policy_uses_deterministic_find_files_profile() -> None:
+def test_policy_maps_entity_summary_to_entity_layers() -> None:
    plan = V2RetrievalPolicyResolver().resolve(_route(entity_names=["RuntimeManager"]))
    assert plan.profile == "docs_summary_domain_entity"
    assert plan.layers == ["D3_ENTITY_CATALOG", "D1_DOCUMENT_CATALOG", "D0_DOC_CHUNKS"]
    assert "%runtimemanager%" in plan.filters["prefer_like_patterns"]
 def test_policy_keeps_api_method_profile_even_with_additional_entity_signal() -> None:
    plan = V2RetrievalPolicyResolver().resolve(
-        _route(hints=["docs/api/health-endpoint.md"], endpoint_paths=["/health"], subintent=V2Subintent.FIND_FILES)
+        _route(
            endpoint_paths=["/health"],
            entity_names=["RuntimeManager"],
        )
    )
    assert plan.profile == "docs_api_method_explain"
    assert plan.layers == ["D1_DOCUMENT_CATALOG", "D2_FACT_INDEX", "D0_DOC_CHUNKS"]
 def test_policy_uses_api_method_profile_for_endpoint_like_hints_without_explicit_path() -> None:
    plan = V2RetrievalPolicyResolver().resolve(
        _route(
            target_doc_hints=["health", "/health", "health endpoint"],
        )
    )
    assert plan.profile == "docs_api_method_explain"
    assert "%health%" in plan.filters["prefer_like_patterns"]
 def test_policy_uses_hard_and_soft_filters_for_find_files() -> None:
    plan = V2RetrievalPolicyResolver().resolve(
        _route(
            subintent=V2Subintent.FIND_FILES,
            file_names=["docs/workflows/manual-send.md"],
            entity_names=["ManualSendWorker"],
            matched_aliases=["manual send"],
            process_domain="messaging",
            process_subdomain="manual_send",
        )
    )
    assert plan.profile == "file_lookup"
    assert plan.layers == ["D1_DOCUMENT_CATALOG", "D3_ENTITY_CATALOG"]
-    assert "health-endpoint.md" in plan.filters["prefer_like_patterns"][0]
+    assert plan.filters["path_prefixes"] == ["docs/workflows/"]
    assert plan.filters["metadata.domain"] == "messaging"
    assert "%manualsendworker%" in plan.filters["prefer_like_patterns"]
-def test_policy_uses_grounded_general_profile() -> None:
+def test_policy_keeps_general_routes_in_general_profile() -> None:
-    plan = V2RetrievalPolicyResolver().resolve(_route(hints=[], intent=V2Intent.GENERAL_QA))
+    plan = V2RetrievalPolicyResolver().resolve(
        _route(
            intent=V2Intent.GENERAL_QA,
            endpoint_paths=["/health"],
            target_doc_hints=["docs/api/health-endpoint.md"],
        )
    )
    assert plan.profile == "general_qa_grounded_summary"
-    assert plan.filters["prefer_path_prefixes"][0] == "docs/architecture/"
+    assert plan.layers == ["D1_DOCUMENT_CATALOG", "D0_DOC_CHUNKS"]
    assert "path_prefixes" not in plan.filters
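The tests above expect values such as `"%runtimemanager%"` and `"%health%"` in `prefer_like_patterns`. The resolver's code is not shown in this section, so the following is only a guess at one way such patterns could be produced: lowercase each term and wrap it in SQL `LIKE` wildcards. `like_pattern` and `like_patterns` are hypothetical helpers, and the real implementation may normalize or escape terms differently.

```python
def like_pattern(term: str) -> str:
    """Lowercase a term and wrap it in SQL LIKE wildcards."""
    return f"%{term.strip().lower()}%"


def like_patterns(terms: list[str]) -> list[str]:
    """Build de-duplicated patterns, skipping blank terms, in input order."""
    patterns: list[str] = []
    for term in terms:
        if not term.strip():
            continue
        pattern = like_pattern(term)
        if pattern not in patterns:
            patterns.append(pattern)
    return patterns


entity_patterns = like_patterns(["RuntimeManager", "ManualSendWorker"])
hint_patterns = like_patterns(["health", "/health", "health endpoint", "Health"])
```

Note that this sketch does not escape `%` or `_` inside a term; a production version would need to if terms can contain those characters.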
@@ -1,4 +1,8 @@
 import logging
 from app.core.rag.contracts.enums import RagLayer
 from app.core.rag.indexing.docs.chunkers.markdown_chunker import SectionChunk
 from app.core.rag.indexing.docs.integration_extractor import DocsIntegrationExtractor
 from app.core.rag.indexing.docs.pipeline import DocsIndexingPipeline
@@ -153,3 +157,150 @@ Create invoice
    assert integration_doc.metadata["target"] == "db.billing.invoices"
    assert integration_doc.metadata["target_type"] == "db"
    assert integration_doc.metadata["details"]["transaction"] == "required"
 def test_docs_integration_extractor_keeps_valid_blocks() -> None:
    extractor = DocsIntegrationExtractor()
    sections = [
        SectionChunk(
            section_path="Details > Интеграции > Billing DB",
            section_title="Billing DB",
            content=(
                "- target: db.billing.invoices\n"
                "- target_type: db\n"
                "- direction: outbound\n"
                "- interaction: writes\n"
                "- via: invoice repository\n"
                "- purpose: persist created invoices\n"
                "- details:\n"
                "  - transaction: required\n"
                "  - tables:\n"
                "      - invoices\n"
                "      - invoice_items\n"
            ),
            order=0,
        )
    ]
    records = extractor.extract(sections, path="docs/billing/create_invoice.md")
    assert len(records) == 1
    assert records[0].target == "db.billing.invoices"
    assert records[0].details["transaction"] == "required"
    assert records[0].details["tables"] == ["invoices", "invoice_items"]
 def test_docs_integration_extractor_soft_fails_on_markdown_like_yaml(caplog) -> None:
    extractor = DocsIntegrationExtractor()
    sections = [
        SectionChunk(
            section_path="Details > Интеграции > Runtime health provider",
            section_title="Runtime health provider",
            content=(
                "- target: runtime.health_provider\n"
                "- target_type: service\n"
                "- direction: outbound\n"
                "- interaction: depends_on\n"
                "- via: async callback `health_provider()`\n"
                "- purpose: получить агрегированный health runtime\n"
                "- details:\n"
                "  - timeout_ms: 5000\n"
                "  - response_type: `HealthPayload`\n"
            ),
            order=0,
        )
    ]
    with caplog.at_level(logging.WARNING):
        records = extractor.extract(sections, path="docs/api/health-endpoint.md")
    assert len(records) == 1
    assert records[0].target == "runtime.health_provider"
    assert records[0].via == "async callback `health_provider()`"
    assert records[0].details == {}
    assert "docs integration parse warning" in caplog.text
    assert "docs/api/health-endpoint.md" in caplog.text
 def test_docs_pipeline_keeps_other_layers_when_integration_block_is_invalid(caplog) -> None:
    pipeline = DocsIndexingPipeline()
    content = """---
 id: api.runtime.health
 type: api_method
 doc_type: api_method
 name: runtime_health
 title: Runtime Health API
 module: runtime
 domain: platform
 sub_domain: observability
 layer: application
 status: active
 related_docs: []
 links:
  uses_logic:
    - logic.runtime.health
 ---
 # Runtime Health API
 ## Summary
 Returns current runtime health.
 ## Details
 ### Описание
 Возвращает агрегированное состояние runtime.
 ### Сценарий
 **Название:**
 Read health
 **Предусловия:**
 - runtime is running
 **Триггер:**
 - client calls health endpoint
 **Основной сценарий:**
 1. Read current state.
 2. Return payload.
 ### Входные параметры
 | field | type | required |
 | --- | --- | --- |
 | verbose | boolean | no |
 ### Интеграции
 #### Runtime health provider
 - target: runtime.health_provider
 - target_type: service
 - direction: outbound
 - interaction: depends_on
 - via: async callback `health_provider()`
 - purpose: получить агрегированный health runtime
 - details:
  - timeout_ms: 5000
  - response_type: `HealthPayload`
 """
    with caplog.at_level(logging.WARNING):
        docs = pipeline.index_file(
            repo_id="acme/proj",
            commit_sha="abc123",
            path="docs/api/health-endpoint.md",
            content=content,
        )
    layers = {doc.layer for doc in docs}
    assert RagLayer.DOCS_DOCUMENT_CATALOG in layers
    assert RagLayer.DOCS_DOC_CHUNKS in layers
    assert RagLayer.DOCS_FACT_INDEX in layers
    assert RagLayer.DOCS_WORKFLOW_INDEX in layers
    assert RagLayer.DOCS_RELATION_GRAPH in layers
    assert RagLayer.DOCS_INTEGRATION_INDEX in layers
    assert "docs integration parse warning" in caplog.text
    assert all(doc.source.path == "docs/api/health-endpoint.md" for doc in docs)
@@ -45,6 +45,23 @@ def test_retrieve_builder_adds_prefer_bonus_sorting() -> None:
    assert params["prefer_like_0"] == "%/test\\_%.py"
 def test_retrieve_builder_adds_metadata_filters() -> None:
    builder = RetrievalStatementBuilder()
    sql, params = builder.build_retrieve(
        "rag-1",
        [0.1, 0.2],
        query_text="notification flow",
        metadata_domain="notifications",
        metadata_subdomain="delivery_loop",
    )
    assert "metadata_json->>'domain'" in sql
    assert "metadata_json->>'subdomain'" in sql
    assert params["metadata_domain"] == "notifications"
    assert params["metadata_subdomain"] == "delivery_loop"
 def test_lexical_builder_omits_test_filters_when_not_requested() -> None:
    builder = RetrievalStatementBuilder()