RAG Agent (Postgres)

Custom RAG agent that indexes text files from a git repository into Postgres and answers queries using retrieval + LLM generation. Commits are tied to stories; indexing and retrieval can be scoped by story.

Quick start

  1. (Optional) Run Postgres and the app via Docker (clone the repo first):
    • git clone git@git.lesha.spb.ru:alex/RagAgent.git && cd RagAgent
    • docker compose up -d: starts Postgres and the RAG app on a shared network, rag_net; the app reaches the database at host postgres.
    • On first start (empty DB), scripts in docker/postgres-init/ run automatically (extension + tables). To disable, comment out the init volume in docker-compose.yml.
    • Default DSN inside the app: postgresql://rag:rag_secret@postgres:5432/rag. Override the connection settings with the POSTGRES_* variables, and set RAG_REPO_PATH to the path of your knowledge-base repo, which is mounted into the app container.
    • Run commands: docker compose run --rm app index --story my-branch, docker compose run --rm app ask "Question?".
  2. Configure environment variables:
    • RAG_REPO_PATH — path to git repo with text files
    • RAG_DB_DSN — Postgres DSN (e.g. postgresql://rag:rag_secret@localhost:5432/rag)
    • RAG_EMBEDDINGS_DIM — embedding vector dimension (e.g. 1536)
  3. Create DB schema (only if not using Docker, or if init was disabled):
    • python scripts/create_db.py (or psql "$RAG_DB_DSN" -f scripts/schema.sql)
  4. Index files for a story (e.g. branch name as story slug):
    • rag-agent index --story my-branch --changed --base-ref HEAD~1 --head-ref HEAD
  5. Ask a question (optionally scoped to a story):
    • rag-agent ask "What is covered?"
    • rag-agent ask "What is covered?" --story my-branch
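
Step 4 indexes only what changed between two refs. How rag-agent computes that set is not shown in this README; one plausible approach, sketched here purely as an assumption, is to ask git for the changed paths and treat deletions separately from additions and modifications. The helper names diff_command and parse_name_status are hypothetical and not part of the project.

```python
from typing import List, Tuple


def diff_command(base_ref: str, head_ref: str) -> List[str]:
    """Build a git command that lists paths changed between two refs."""
    # --name-status prints one "<status>\t<path>" line per changed file;
    # --no-renames reports a rename as a delete plus an add.
    return ["git", "diff", "--name-status", "--no-renames", base_ref, head_ref]


def parse_name_status(output: str) -> Tuple[List[str], List[str]]:
    """Split `git diff --name-status` output into (to_index, to_remove)."""
    to_index: List[str] = []
    to_remove: List[str] = []
    for line in output.splitlines():
        if not line.strip():
            continue
        status, _, path = line.partition("\t")
        if status == "D":
            # Deleted files should be dropped from the index.
            to_remove.append(path)
        else:
            # Added, modified, or type-changed files are (re)indexed.
            to_index.append(path)
    return to_index, to_remove
```

The command's output can be captured with subprocess.run(diff_command("HEAD~1", "HEAD"), capture_output=True, text=True, check=True).stdout and passed to parse_name_status.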

Git hook (index on commit)

Install the post-commit hook so changed files are indexed after each commit:

cp scripts/post-commit .git/hooks/post-commit && chmod +x .git/hooks/post-commit

The story for a commit is resolved in this order: the RAG_STORY environment variable, the .rag-story file in the repo root (a single line containing the slug), then the current branch name.
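
The lookup order can be illustrated with a short Python sketch. This is not the code in scripts/post-commit (which is not shown here); resolve_story and current_branch are hypothetical names, and treating an empty RAG_STORY or an empty .rag-story as "not set" is an assumption.

```python
import os
import subprocess
from pathlib import Path
from typing import Mapping, Optional


def current_branch(repo_root: Path) -> Optional[str]:
    """Return the checked-out branch name, or None if git cannot report one."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
            cwd=repo_root, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    # Note: in detached HEAD state git prints the literal string "HEAD".
    return result.stdout.strip() or None


def resolve_story(repo_root: Path, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Resolve the story slug: env var, then .rag-story, then branch name."""
    # 1. An explicit RAG_STORY environment variable wins.
    slug = env.get("RAG_STORY", "").strip()
    if slug:
        return slug
    # 2. Otherwise use the first line of .rag-story in the repo root.
    story_file = repo_root / ".rag-story"
    if story_file.is_file():
        lines = story_file.read_text(encoding="utf-8").splitlines()
        if lines and lines[0].strip():
            return lines[0].strip()
    # 3. Fall back to the current branch name.
    return current_branch(repo_root)
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.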

DB structure

  • stories — story slug (e.g. branch name); documents and chunks are tied to a story.
  • documents — path + version per story; unique (story_id, path).
  • chunks — text chunks with embeddings (pgvector); updated when documents are re-indexed.

Scripts: scripts/create_db.py (Python, uses ensure_schema and RAG_* env), scripts/schema.sql (raw SQL).
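
To make the relationship between stories and chunks concrete, here is a minimal in-memory sketch of story-scoped retrieval. It assumes chunks are ranked by cosine distance (the metric pgvector exposes through its <=> operator); whether the project uses that metric is not stated here, and the Chunk fields and the top_chunks function are hypothetical, not the actual schema or API.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Chunk:
    story: str              # slug of the owning story
    path: str               # document path within that story
    text: str               # chunk text
    embedding: List[float]  # vector stored alongside the text


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 1.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def top_chunks(
    chunks: Sequence[Chunk],
    query: Sequence[float],
    k: int = 3,
    story: Optional[str] = None,
) -> List[Chunk]:
    """Return the k chunks nearest to the query, optionally within one story."""
    candidates = [c for c in chunks if story is None or c.story == story]
    return sorted(candidates, key=lambda c: cosine_distance(c.embedding, query))[:k]
```

With story=None every chunk is a candidate, which mirrors running ask without --story.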

Embeddings (GigaChat)

If GIGACHAT_CREDENTIALS is set (e.g. in .env for local runs), embeddings are requested from the GigaChat API; otherwise the stub client is used. Optional variables: GIGACHAT_EMBEDDINGS_MODEL (default: Embeddings) and GIGACHAT_VERIFY_SSL (true/false). Make sure RAG_EMBEDDINGS_DIM matches the model's output dimension (see the GigaChat docs).
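
The selection rule and the dimension check can be sketched as follows. The project's real stub client is not shown in this README, so stub_embedding is a hypothetical stand-in that only produces deterministic vectors of the right length; embeddings_backend and check_dim are likewise illustrative names.

```python
import hashlib
import os
from typing import List, Mapping


def embeddings_backend(env: Mapping[str, str] = os.environ) -> str:
    """Pick the backend: GigaChat when credentials are set, else the stub."""
    return "gigachat" if env.get("GIGACHAT_CREDENTIALS") else "stub"


def stub_embedding(text: str, dim: int) -> List[float]:
    """Deterministic placeholder vector of length dim (no semantic meaning)."""
    values: List[float] = []
    counter = 0
    while len(values) < dim:
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        # Map each byte (0..255) to a float in [-1.0, 1.0].
        values.extend(b / 127.5 - 1.0 for b in digest)
        counter += 1
    return values[:dim]


def check_dim(vector: List[float], env: Mapping[str, str] = os.environ) -> None:
    """Raise ValueError if the vector length disagrees with RAG_EMBEDDINGS_DIM."""
    expected = int(env["RAG_EMBEDDINGS_DIM"])
    if len(vector) != expected:
        raise ValueError(f"embedding has {len(vector)} dims, expected {expected}")
```

Checking the length before writing a vector surfaces a RAG_EMBEDDINGS_DIM mismatch early, rather than as an insert error from the vector column.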

Notes

  • The LLM client is still a stub; replace it in src/rag_agent/agent/pipeline.py to get real answers.
  • This project requires Postgres with the pgvector extension.