# LLM Inventory

## Provider and SDK

- Provider in code: GigaChat / Sber
- Local SDK style: custom thin HTTP client over `requests`
- Core files:
  - `app/modules/shared/gigachat/client.py`
  - `app/modules/shared/gigachat/settings.py`
  - `app/modules/shared/gigachat/token_provider.py`
  - `app/modules/agent/llm/service.py`

There is no OpenAI SDK, Azure SDK, or local model runtime in the current implementation.

## Configuration

Model and endpoint configuration is read from the environment in `GigaChatSettings.from_env()`:

- `GIGACHAT_AUTH_URL`
  - default: `https://ngw.devices.sberbank.ru:9443/api/v2/oauth`
- `GIGACHAT_API_URL`
  - default: `https://gigachat.devices.sberbank.ru/api/v1`
- `GIGACHAT_SCOPE`
  - default: `GIGACHAT_API_PERS`
- `GIGACHAT_TOKEN`
  - required for auth
- `GIGACHAT_SSL_VERIFY`
  - default: `true`
- `GIGACHAT_MODEL`
  - default: `GigaChat`
- `GIGACHAT_EMBEDDING_MODEL`
  - default: `Embeddings`
- `AGENT_PROMPTS_DIR`
  - optional prompt directory override

PostgreSQL config for retrieval storage is separate:

- `DATABASE_URL`
  - default: `postgresql+psycopg://agent:agent@db:5432/agent`
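As a sketch of how these variables map onto a settings object (field names and the dataclass shape are assumptions; only the variable names and defaults above come from the code):

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GigaChatSettings:
    auth_url: str
    api_url: str
    scope: str
    token: str
    ssl_verify: bool
    model: str
    embedding_model: str

    @classmethod
    def from_env(cls) -> "GigaChatSettings":
        # Defaults mirror the documented values; GIGACHAT_TOKEN has no
        # default and is required for auth.
        return cls(
            auth_url=os.getenv("GIGACHAT_AUTH_URL", "https://ngw.devices.sberbank.ru:9443/api/v2/oauth"),
            api_url=os.getenv("GIGACHAT_API_URL", "https://gigachat.devices.sberbank.ru/api/v1"),
            scope=os.getenv("GIGACHAT_SCOPE", "GIGACHAT_API_PERS"),
            token=os.environ["GIGACHAT_TOKEN"],
            ssl_verify=os.getenv("GIGACHAT_SSL_VERIFY", "true").lower() == "true",
            model=os.getenv("GIGACHAT_MODEL", "GigaChat"),
            embedding_model=os.getenv("GIGACHAT_EMBEDDING_MODEL", "Embeddings"),
        )
```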

## Default models

- Chat/completions model default: `GigaChat`
- Embedding model default: `Embeddings`

## Completion payload

Observed payload sent by `GigaChatClient.complete(...)`:

```json
{
  "model": "GigaChat",
  "messages": [
    {"role": "system", "content": "<prompt template text>"},
    {"role": "user", "content": "<runtime user input>"}
  ]
}
```

Endpoint:

- `POST {GIGACHAT_API_URL}/chat/completions`

Observed response handling:

- reads `choices[0].message.content`
- if there are no choices, returns an empty string
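A minimal sketch of this call, assuming a bearer-token header and the documented timeout and TLS flag (function names here are illustrative, not the real client's API):

```python
import requests


def build_completion_payload(model: str, system_prompt: str, user_input: str) -> dict:
    # Mirrors the observed request body exactly.
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input},
        ],
    }


def complete(api_url: str, access_token: str, payload: dict, ssl_verify: bool = True) -> str:
    resp = requests.post(
        f"{api_url}/chat/completions",
        headers={"Authorization": f"Bearer {access_token}"},
        json=payload,
        timeout=90,         # documented completion timeout
        verify=ssl_verify,  # TLS verification flag from settings
    )
    resp.raise_for_status()
    choices = resp.json().get("choices", [])
    # Observed handling: empty string when the response has no choices.
    return choices[0]["message"]["content"] if choices else ""
```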

## Embeddings payload

Observed payload sent by `GigaChatClient.embed(...)`:

```json
{
  "model": "Embeddings",
  "input": [
    "<text1>",
    "<text2>"
  ]
}
```

Endpoint:

- `POST {GIGACHAT_API_URL}/embeddings`

Observed response handling:

- expects a `data` list
- maps each `item.embedding` to `list[float]`
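The request body and the response mapping can be sketched as two small helpers (hypothetical names; the real logic sits inside `GigaChatClient.embed(...)`):

```python
def build_embeddings_payload(model: str, texts: list[str]) -> dict:
    # Mirrors the observed request body for POST {GIGACHAT_API_URL}/embeddings.
    return {"model": model, "input": list(texts)}


def parse_embeddings_response(body: dict) -> list[list[float]]:
    # Observed handling: expects a "data" list and maps each
    # item's "embedding" field to list[float].
    return [[float(x) for x in item["embedding"]] for item in body["data"]]
```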

## Parameters

### Explicitly implemented

- `model`
- `messages`
- `input`
- HTTP timeout:
  - completions: `90s`
  - embeddings: `90s`
  - auth: `30s`
- TLS verification flag:
  - `verify=settings.ssl_verify`

### Not implemented in payload

- `temperature`
- `top_p`
- `max_tokens`
- `response_format`
- tools/function calling
- streaming
- seed
- stop sequences

`ASSUMPTION:` the service uses provider defaults for sampling and output length because these fields are not sent in the request payload.

## Context and budget limits

There is no centralized token budget manager in the current code.

Observed practical limits instead:

- prompt file text is loaded as-is from disk
- user input is passed as-is
- RAG context shaping happens outside the LLM client
- docs indexing summary truncation:
  - docs module catalog summary: `4000` chars
  - docs policy text: `4000` chars
- project QA source bundle caps:
  - top `12` RAG items
  - top `10` file candidates
- logging truncation only:
  - LLM input/output log entries capped at `1500` chars

`ASSUMPTION:` there is no explicit max-context enforcement before chat completion requests. The current system relies on upstream graph logic to keep inputs small enough.
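The character caps above amount to plain string slicing. A hypothetical helper (not taken from the code; the marker suffix is an illustration) shows the shape of such a cap:

```python
def truncate(text: str, limit: int) -> str:
    # Hypothetical sketch of the character caps described above
    # (e.g. 4000 chars for docs summaries, 1500 for log entries).
    if len(text) <= limit:
        return text
    return text[:limit] + "…[truncated]"
```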

## Retry, backoff, timeout

### Timeouts

- auth: `30s`
- chat completion: `90s`
- embeddings: `90s`

### Retry

- A generic async retry wrapper exists in `app/modules/shared/retry_executor.py`
- It retries only:
  - `TimeoutError`
  - `ConnectionError`
  - `OSError`
- Retry constants:
  - `MAX_RETRIES = 5`
  - backoff: `0.1 * attempt` seconds

### Important current limitation

- `GigaChatClient` raises `GigaChatError` on HTTP and request failures.
- `RetryExecutor` does not catch `GigaChatError`.
- Result: LLM and embeddings calls are effectively not retried by this generic retry helper unless errors are converted upstream.
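A minimal sketch of the described retry behaviour, assuming the documented constants (the function name is illustrative; the real wrapper is `RetryExecutor`). Because only the three built-in exception types are caught, a `GigaChatError` would propagate on the first attempt, which is exactly the limitation noted above:

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")

MAX_RETRIES = 5  # documented constant


async def run_with_retry(fn: Callable[[], Awaitable[T]]) -> T:
    # Retries only TimeoutError / ConnectionError / OSError, with
    # linear backoff of 0.1 * attempt seconds between attempts.
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            return await fn()
        except (TimeoutError, ConnectionError, OSError):
            if attempt == MAX_RETRIES:
                raise
            await asyncio.sleep(0.1 * attempt)
    raise AssertionError("unreachable")
```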

## Prompt formation

Prompt loading is handled by `PromptLoader`:

- base dir: `app/modules/agent/prompts`
- override: `AGENT_PROMPTS_DIR`
- file naming convention: `<prompt_name>.txt`

Prompt composition model today:

- system prompt:
  - full contents of the selected prompt file
- user prompt:
  - raw runtime input string passed by the caller
- no separate developer prompt layer in the application payload

If a prompt file is missing:

- fallback system prompt: `You are a helpful assistant.`
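The loading rules above can be sketched as follows (a sketch only; the real loader's method names and error handling may differ):

```python
import os
from pathlib import Path

FALLBACK_PROMPT = "You are a helpful assistant."  # documented fallback


class PromptLoader:
    def __init__(self, base_dir: str = "app/modules/agent/prompts") -> None:
        # AGENT_PROMPTS_DIR overrides the default prompt directory.
        self.base_dir = Path(os.getenv("AGENT_PROMPTS_DIR", base_dir))

    def load(self, prompt_name: str) -> str:
        # Naming convention: <prompt_name>.txt, read as-is from disk.
        path = self.base_dir / f"{prompt_name}.txt"
        try:
            return path.read_text(encoding="utf-8")
        except FileNotFoundError:
            return FALLBACK_PROMPT
```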

## Prompt templates present

- `router_intent`
- `general_answer`
- `project_answer`
- `docs_detect`
- `docs_strategy`
- `docs_plan_sections`
- `docs_generation`
- `docs_self_check`
- `docs_execution_summary`
- `project_edits_plan`
- `project_edits_hunks`
- `project_edits_self_check`

## Key LLM call entrypoints

### Composition roots

- `app/modules/agent/module.py`
  - builds `GigaChatSettings`
  - builds `GigaChatTokenProvider`
  - builds `GigaChatClient`
  - builds `PromptLoader`
  - builds `AgentLlmService`
- `app/modules/rag_session/module.py`
  - builds the same provider stack for the embeddings used by RAG

### Main abstraction

- `AgentLlmService.generate(prompt_name, user_input, log_context=None)`

### Current generate callsites

- `app/modules/agent/engine/router/intent_classifier.py`
  - `router_intent`
- `app/modules/agent/engine/graphs/base_graph.py`
  - `general_answer`
- `app/modules/agent/engine/graphs/project_qa_graph.py`
  - `project_answer`
- `app/modules/agent/engine/graphs/docs_graph_logic.py`
  - `docs_detect`
  - `docs_strategy`
  - `docs_plan_sections`
  - `docs_generation`
  - `docs_self_check`
  - `docs_execution_summary`-like usage via the summary step
- `app/modules/agent/engine/graphs/project_edits_logic.py`
  - `project_edits_plan`
  - `project_edits_self_check`
  - `project_edits_hunks`

## Logging and observability

`AgentLlmService` logs:

- input:
  - `graph llm input: context=... prompt=... user_input=...`
- output:
  - `graph llm output: context=... prompt=... output=...`

Log truncation:

- `1500` chars

RAG retrieval is logged separately in `RagService`, but without embedding vectors.

## Integration with retrieval

There are two distinct GigaChat usages:

1. Chat/completion path for agent reasoning and generation
2. Embedding path for RAG indexing and retrieval

The embedding adapter is `GigaChatEmbedder`, used by:

- `app/modules/rag/services/rag_service.py`

## Notable limitations

- Single provider coupling: chat and embeddings both depend on GigaChat-specific endpoints.
- No model routing by scenario.
- No tool/function calling.
- No centralized prompt token budgeting.
- No explicit retry for `GigaChatError`.
- No streaming completions.
- No structured response mode beyond prompt conventions and downstream parsing.