
LLM Inventory

Provider and SDK

  • Provider in code: GigaChat / Sber
  • Local SDK style: custom thin HTTP client over requests
  • Core files:
    • app/modules/shared/gigachat/client.py
    • app/modules/shared/gigachat/settings.py
    • app/modules/shared/gigachat/token_provider.py
    • app/modules/agent/llm/service.py

There is no OpenAI SDK, Azure SDK, or local model runtime in the current implementation.

Configuration

Model and endpoint configuration is read from the environment in GigaChatSettings.from_env():

  • GIGACHAT_AUTH_URL
    • default: https://ngw.devices.sberbank.ru:9443/api/v2/oauth
  • GIGACHAT_API_URL
    • default: https://gigachat.devices.sberbank.ru/api/v1
  • GIGACHAT_SCOPE
    • default: GIGACHAT_API_PERS
  • GIGACHAT_TOKEN
    • required for auth
  • GIGACHAT_SSL_VERIFY
    • default: true
  • GIGACHAT_MODEL
    • default: GigaChat
  • GIGACHAT_EMBEDDING_MODEL
    • default: Embeddings
  • AGENT_PROMPTS_DIR
    • optional prompt directory override

PostgreSQL config for retrieval storage is separate:

  • DATABASE_URL
    • default: postgresql+psycopg://agent:agent@db:5432/agent
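The environment-driven configuration above can be sketched as a settings object. This is a hypothetical reconstruction of what GigaChatSettings.from_env() likely looks like: the dataclass shape and field names are assumptions, while the env var names and defaults come from the inventory.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GigaChatSettings:
    """Hypothetical sketch of the settings object; field names are illustrative."""
    auth_url: str
    api_url: str
    scope: str
    token: str
    ssl_verify: bool
    model: str
    embedding_model: str

    @classmethod
    def from_env(cls) -> "GigaChatSettings":
        return cls(
            auth_url=os.getenv(
                "GIGACHAT_AUTH_URL",
                "https://ngw.devices.sberbank.ru:9443/api/v2/oauth",
            ),
            api_url=os.getenv(
                "GIGACHAT_API_URL",
                "https://gigachat.devices.sberbank.ru/api/v1",
            ),
            scope=os.getenv("GIGACHAT_SCOPE", "GIGACHAT_API_PERS"),
            # Required: raises KeyError when unset, matching "required for auth".
            token=os.environ["GIGACHAT_TOKEN"],
            ssl_verify=os.getenv("GIGACHAT_SSL_VERIFY", "true").lower() == "true",
            model=os.getenv("GIGACHAT_MODEL", "GigaChat"),
            embedding_model=os.getenv("GIGACHAT_EMBEDDING_MODEL", "Embeddings"),
        )
```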

Default models

  • Chat/completions model default: GigaChat
  • Embedding model default: Embeddings

Completion payload

Observed payload sent by GigaChatClient.complete(...):

{
  "model": "GigaChat",
  "messages": [
    {"role": "system", "content": "<prompt template text>"},
    {"role": "user", "content": "<runtime user input>"}
  ]
}

Endpoint:

  • POST {GIGACHAT_API_URL}/chat/completions

Observed response handling:

  • reads choices[0].message.content
  • if no choices: returns empty string
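The payload shape and response handling above can be sketched as a pair of pure helpers plus a thin POST wrapper. Function names, the error class, and the header layout are assumptions; the payload fields, endpoint, 90s timeout, verify flag, and empty-choices fallback come from the inventory.

```python
from typing import Any


class GigaChatError(RuntimeError):
    """Stand-in for the client's error type, raised on HTTP/request failures."""


def build_completion_payload(model: str, system_prompt: str, user_input: str) -> dict[str, Any]:
    # Exactly the observed payload shape: model plus system/user messages only.
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input},
        ],
    }


def parse_completion_response(data: dict[str, Any]) -> str:
    # Observed handling: choices[0].message.content, or "" when choices is empty.
    choices = data.get("choices") or []
    if not choices:
        return ""
    return choices[0]["message"]["content"]


def complete(api_url: str, access_token: str, payload: dict[str, Any], ssl_verify: bool = True) -> str:
    # Imported lazily so the pure helpers above stay dependency-free in this sketch.
    import requests

    try:
        resp = requests.post(
            f"{api_url}/chat/completions",
            json=payload,
            headers={"Authorization": f"Bearer {access_token}"},
            timeout=90,  # completions timeout from the inventory
            verify=ssl_verify,
        )
        resp.raise_for_status()
    except requests.RequestException as exc:
        raise GigaChatError(str(exc)) from exc
    return parse_completion_response(resp.json())
```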

Embeddings payload

Observed payload sent by GigaChatClient.embed(...):

{
  "model": "Embeddings",
  "input": [
    "<text1>",
    "<text2>"
  ]
}

Endpoint:

  • POST {GIGACHAT_API_URL}/embeddings

Observed response handling:

  • expects data list
  • maps each item.embedding to list[float]
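The embeddings request and response handling can be sketched the same way. Helper names are hypothetical; the payload fields and the data/embedding mapping come from the inventory.

```python
from typing import Any


def build_embeddings_payload(model: str, texts: list[str]) -> dict[str, Any]:
    # Observed payload shape: model plus a list of input strings.
    return {"model": model, "input": list(texts)}


def parse_embeddings_response(data: dict[str, Any]) -> list[list[float]]:
    # Observed handling: expects a "data" list and maps each item's
    # "embedding" field to list[float], preserving order.
    return [[float(x) for x in item["embedding"]] for item in data["data"]]
```

The request itself is a POST to {GIGACHAT_API_URL}/embeddings with the same 90s timeout and verify flag as the completion call.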

Parameters

Explicitly implemented

  • model
  • messages
  • input
  • HTTP timeout:
    • completions: 90s
    • embeddings: 90s
    • auth: 30s
  • TLS verification flag:
    • verify=settings.ssl_verify

Not implemented in payload

  • temperature
  • top_p
  • max_tokens
  • response_format
  • tools/function calling
  • streaming
  • seed
  • stop sequences

ASSUMPTION: the service uses provider defaults for sampling and output length because these fields are not sent in the request payload.

Context and budget limits

There is no centralized token budget manager in the current code.

Observed practical limits instead:

  • prompt file text is loaded as-is from disk
  • user input is passed as-is
  • RAG context shaping happens outside the LLM client
  • docs indexing summary truncation:
    • docs module catalog summary: 4000 chars
    • docs policy text: 4000 chars
  • project QA source bundle caps:
    • top 12 rag items
    • top 10 file candidates
  • logging truncation only:
    • LLM input/output log lines are capped at 1500 chars

ASSUMPTION: there is no explicit max-context enforcement before chat completion requests. The current system relies on upstream graph logic to keep inputs small enough.
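The scattered character caps above could be collected into one place. The constant names below are hypothetical; only the values come from the inventory, and the helper shows that all of these are plain character caps, not token-aware budgeting.

```python
# Hypothetical constant names; values are the caps listed above.
DOCS_CATALOG_SUMMARY_MAX = 4000   # docs module catalog summary
DOCS_POLICY_TEXT_MAX = 4000       # docs policy text
QA_RAG_TOP_ITEMS = 12             # top RAG items in the project QA bundle
QA_FILE_CANDIDATES_TOP = 10       # top file candidates
LOG_TRUNCATE_CHARS = 1500         # log-only truncation


def truncate(text: str, limit: int) -> str:
    """Hard character cap; no token counting happens anywhere in the code."""
    return text if len(text) <= limit else text[:limit]
```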

Retry, backoff, timeout

Timeouts

  • auth: 30s
  • chat completion: 90s
  • embeddings: 90s

Retry

  • Generic async retry wrapper exists in app/modules/shared/retry_executor.py
  • It retries only:
    • TimeoutError
    • ConnectionError
    • OSError
  • Retry constants:
    • MAX_RETRIES = 5
    • backoff: 0.1 * attempt seconds

Important current limitation

  • GigaChatClient raises GigaChatError on HTTP and request failures.
  • RetryExecutor does not catch GigaChatError.
  • Result: LLM and embeddings calls are effectively not retried by this generic retry helper unless errors are converted upstream.
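The retry behavior and its gap can be sketched as follows. The wrapper's name and signature are assumptions about retry_executor.py; the retried exception types, MAX_RETRIES = 5, and the 0.1 * attempt backoff come from the inventory, and the GigaChatError stand-in demonstrates why provider errors bypass the retry loop.

```python
import asyncio

MAX_RETRIES = 5
# Only these are retried; note ConnectionError is itself a subclass of OSError.
RETRYABLE = (TimeoutError, ConnectionError, OSError)


class GigaChatError(RuntimeError):
    """Stand-in for the client's error type; NOT in RETRYABLE, so never retried."""


async def run_with_retry(fn, *args, **kwargs):
    """Hypothetical sketch of the generic async retry wrapper."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            return await fn(*args, **kwargs)
        except RETRYABLE:
            if attempt == MAX_RETRIES:
                raise
            await asyncio.sleep(0.1 * attempt)  # linear backoff: 0.1 * attempt seconds
```

A transient ConnectionError is retried up to five times, but a GigaChatError raised on the first attempt propagates immediately, which is the limitation described above.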

Prompt formation

Prompt loading is handled by PromptLoader:

  • base dir: app/modules/agent/prompts
  • override: AGENT_PROMPTS_DIR
  • file naming convention: <prompt_name>.txt

Prompt composition model today:

  • system prompt:
    • full contents of selected prompt file
  • user prompt:
    • raw runtime input string passed by the caller
  • no separate developer prompt layer in the application payload

If a prompt file is missing:

  • fallback system prompt: You are a helpful assistant.

Prompt templates present

  • router_intent
  • general_answer
  • project_answer
  • docs_detect
  • docs_strategy
  • docs_plan_sections
  • docs_generation
  • docs_self_check
  • docs_execution_summary
  • project_edits_plan
  • project_edits_hunks
  • project_edits_self_check

Key LLM call entrypoints

Composition roots

  • app/modules/agent/module.py
    • builds GigaChatSettings
    • builds GigaChatTokenProvider
    • builds GigaChatClient
    • builds PromptLoader
    • builds AgentLlmService
  • app/modules/rag_session/module.py
    • builds the same provider stack for embeddings used by RAG

Main abstraction

  • AgentLlmService.generate(prompt_name, user_input, log_context=None)

Current generate callsites

  • app/modules/agent/engine/router/intent_classifier.py
    • router_intent
  • app/modules/agent/engine/graphs/base_graph.py
    • general_answer
  • app/modules/agent/engine/graphs/project_qa_graph.py
    • project_answer
  • app/modules/agent/engine/graphs/docs_graph_logic.py
    • docs_detect
    • docs_strategy
    • docs_plan_sections
    • docs_generation
    • docs_self_check
    • docs_execution_summary-style usage via the summary step
  • app/modules/agent/engine/graphs/project_edits_logic.py
    • project_edits_plan
    • project_edits_self_check
    • project_edits_hunks

Logging and observability

AgentLlmService logs:

  • input:
    • graph llm input: context=... prompt=... user_input=...
  • output:
    • graph llm output: context=... prompt=... output=...

Log truncation:

  • 1500 chars

RAG retrieval is logged separately in RagService, but without embedding vectors.

Integration with retrieval

There are two distinct GigaChat usages:

  1. Chat/completion path for agent reasoning and generation
  2. Embedding path for RAG indexing and retrieval

The embedding adapter is GigaChatEmbedder, used by:

  • app/modules/rag/services/rag_service.py

Notable limitations

  • Single provider coupling: chat and embeddings both depend on GigaChat-specific endpoints.
  • No model routing by scenario.
  • No tool/function calling.
  • No centralized prompt token budgeting.
  • No explicit retry for GigaChatError.
  • No streaming completions.
  • No structured response mode beyond prompt conventions and downstream parsing.