git hook and saving changes in the context of a story
57
README.md
@@ -1,8 +1,7 @@
# RAG Agent (Postgres)

Custom RAG agent that indexes text files from a git repository into Postgres
and answers queries using retrieval + LLM generation. Commits are tied to
**stories**; indexing and retrieval can be scoped by story.
and answers queries using retrieval + LLM generation. **Changes are always in the context of a Story**: the unit of work is the story, not individual commits. The agent indexes **all changes from all commits** in the story range (`base_ref..head_ref`); per-commit indexing is not used.

## Quick start

@@ -18,12 +17,29 @@ and answers queries using retrieval + LLM generation. Commits are tied to
   - `RAG_EMBEDDINGS_DIM` — embedding vector dimension (e.g. `1536`)
3. Create DB schema (only if not using Docker, or if init was disabled):
   - `python scripts/create_db.py` (or `psql "$RAG_DB_DSN" -f scripts/schema.sql`)
4. Index files for a story (e.g. branch name as story slug):
   - `rag-agent index --story my-branch --changed --base-ref HEAD~1 --head-ref HEAD`
4. Index files for a story (e.g. branch name as story slug). Use the **full story range** so all commits in the story are included:
   - `rag-agent index --story my-branch --changed --base-ref main --head-ref HEAD`
   - Or `--base-ref auto` to use merge-base(default-branch, head-ref) as the start of the story.
5. Ask a question (optionally scoped to a story):
   - `rag-agent ask "What is covered?"`
   - `rag-agent ask "What is covered?" --story my-branch`

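The `--base-ref auto` option in step 4 amounts to asking git for the merge base of the default branch and the head ref. A minimal sketch of that resolution, assuming the default branch is named `main`; the helper name `resolve_base_ref` and its injectable `run` parameter are illustrative and not taken from the project's code:

```python
import subprocess
from typing import Callable, Sequence


def _git_stdout(args: Sequence[str]) -> str:
    """Run a git command and return its stdout (raises on a non-zero exit)."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def resolve_base_ref(
    base_ref: str,
    head_ref: str = "HEAD",
    default_branch: str = "main",  # assumption: name of the default branch
    run: Callable[[Sequence[str]], str] = _git_stdout,
) -> str:
    """Return the ref to use as the start of the story range."""
    if base_ref != "auto":
        # An explicit base ref is passed through unchanged.
        return base_ref
    # `git merge-base A B` prints the best common ancestor of A and B.
    return run(["merge-base", default_branch, head_ref]).strip()
```

Passing `run` as a parameter keeps the function testable without a real repository; in normal use the default runner shells out to `git` in the current working directory.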
## Webhook: index on push to remote

When the app runs as a service in Docker, it can start a **webhook server** so that each push to the remote repository triggers a pull and incremental indexing.

1. Start the stack with the webhook server (default in Docker):
   - `docker compose up -d` — the app runs `rag-agent serve` and listens on port 8000.
   - The repo is mounted at `RAG_REPO_PATH` (e.g. `/data`) as **writable**, so the container can run `git fetch` + `git merge --ff-only` to pull changes.
2. Clone the knowledge-base repo into the mounted directory (once), e.g. on the host: `git clone <url> ./data` so that `./data` is the worktree (or set `RAG_REPO_PATH` to that path and mount it).
3. In GitHub (or GitLab) add a **Webhook**:
   - URL: `http://<your-server>:8000/webhook` (use HTTPS in production and put a reverse proxy in front).
   - Content type: `application/json`.
   - Secret: set a shared secret and export `WEBHOOK_SECRET` in the app environment (Docker: in `docker-compose.yml` or `.env`). If `WEBHOOK_SECRET` is empty, the signature is not checked.
4. On each push to a branch, the server receives the webhook, pulls that branch into the worktree, and runs `rag-agent index --story <branch> --changed --base-ref <old_head> --head-ref <new_head>` so only changed files are re-indexed.

Health check: `GET http://<host>:8000/health` → `ok`. The port is configurable via `WEBHOOK_PORT` (default 8000) in docker-compose.

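Steps 3 and 4 of the webhook flow can be sketched as follows. This assumes a GitHub-style push webhook: the signature arrives in the `X-Hub-Signature-256` header as `sha256=<hex HMAC-SHA256 of the raw body>`, and the JSON payload carries `ref`, `before`, and `after`. The function names are hypothetical and do not claim to match the project's actual handler.

```python
import hashlib
import hmac
import json
from typing import List, Optional


def signature_ok(secret: str, body: bytes, header: Optional[str]) -> bool:
    """Check a GitHub-style `sha256=<hexdigest>` signature over the raw body."""
    if not secret:
        # Mirrors the README: an empty WEBHOOK_SECRET disables the check.
        return True
    if not header or not header.startswith("sha256="):
        return False
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    given = header[len("sha256="):]
    # compare_digest runs in constant time; bytes avoid non-ASCII errors.
    return hmac.compare_digest(expected.encode(), given.encode())


def index_command(body: bytes) -> Optional[List[str]]:
    """Build the index command for a branch push, or None for other refs."""
    payload = json.loads(body)
    ref = payload.get("ref", "")
    if not ref.startswith("refs/heads/"):
        return None  # tags and other refs are ignored in this sketch
    branch = ref[len("refs/heads/"):]
    return [
        "rag-agent", "index", "--story", branch, "--changed",
        "--base-ref", payload["before"], "--head-ref", payload["after"],
    ]
```

GitHub reports a newly created branch with an all-zero `before` value, which is not a usable base ref, so a real handler would need a fallback for that case (for example the merge base with the default branch).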
## Git hook (index on commit)

Install the post-commit hook so changed files are indexed after each commit:
@@ -34,11 +50,40 @@ cp scripts/post-commit .git/hooks/post-commit && chmod +x .git/hooks/post-commit

The story for the commit is taken from (in order): the `RAG_STORY` environment variable, the `.rag-story` file in the repo root (a single line containing the slug), or the current branch name.

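The lookup order above can be expressed as a small resolver. This is a sketch of the described precedence, not the hook's actual code; `resolve_story` and `current_branch` are illustrative names, and the branch lookup assumes `git rev-parse --abbrev-ref HEAD` is an acceptable way to read the current branch.

```python
import os
import subprocess
from pathlib import Path
from typing import Callable, Mapping


def current_branch(repo: Path) -> str:
    """Return the checked-out branch name of the repository at `repo`."""
    out = subprocess.run(
        ["git", "-C", str(repo), "rev-parse", "--abbrev-ref", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def resolve_story(
    repo: Path,
    env: Mapping[str, str] = os.environ,
    branch_of: Callable[[Path], str] = current_branch,
) -> str:
    """Pick the story slug: RAG_STORY, then .rag-story, then the branch."""
    story = env.get("RAG_STORY", "").strip()
    if story:
        return story
    story_file = repo / ".rag-story"
    if story_file.is_file():
        lines = story_file.read_text(encoding="utf-8").splitlines()
        # Only the first line is used as the slug; an empty file falls through.
        if lines and lines[0].strip():
            return lines[0].strip()
    return branch_of(repo)
```

Treating an empty `RAG_STORY` or an empty `.rag-story` as "not set" is a design choice of this sketch; the README does not say how those cases are handled.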
## Git hook (server-side)

Use `scripts/post-receive` in the **bare repo** on the server so that pushes trigger indexing.

1. On the server, create a **non-bare clone** (worktree) that the hook will update and use for indexing, e.g. `git clone /path/to/repo.git /var/rag-worktree/repo`.
2. In the bare repo, install the hook: `cp /path/to/RagAgent/scripts/post-receive /path/to/repo.git/hooks/post-receive && chmod +x .../post-receive`.
3. Set env for the hook (e.g. in the hook or via systemd/sshd): `RAG_REPO_PATH=/var/rag-worktree/repo`, `RAG_DB_DSN=...`, `RAG_EMBEDDINGS_DIM=...`. Optionally set `RAG_AGENT_VENV` (path to a venv with `rag-agent`) or `RAG_AGENT_SRC` + `RAG_AGENT_PYTHON` for `python -m rag_agent.cli`.
4. On each push the hook updates the worktree to the new commit, then runs `rag-agent index --changed --base-ref main --head-ref <newrev> --story <branch>` so the story contains **all commits** on the branch (from main to the new revision).

The story is taken from the ref name (e.g. `refs/heads/main` → `main`).

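Git feeds a `post-receive` hook one line per updated ref on standard input, in the form `<old-value> <new-value> <ref-name>`. The sketch below turns those lines into the index command from step 4. It is an illustration in Python rather than the contents of `scripts/post-receive`; the function name and the decision to skip tags and branch deletions are assumptions.

```python
from typing import Iterable, Iterator, List


def index_commands(
    stdin_lines: Iterable[str], base_ref: str = "main"
) -> Iterator[List[str]]:
    """Yield one `rag-agent index` command per pushed branch."""
    for line in stdin_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # not a well-formed "<old> <new> <ref>" line
        _oldrev, newrev, refname = parts
        if not refname.startswith("refs/heads/"):
            continue  # tags and other refs are not stories in this sketch
        if set(newrev) == {"0"}:
            continue  # an all-zero new value means the branch was deleted
        story = refname[len("refs/heads/"):]
        yield [
            "rag-agent", "index", "--changed",
            "--base-ref", base_ref, "--head-ref", newrev,
            "--story", story,
        ]
```

In a real hook the lines would come from `sys.stdin`, and each command would run with the worktree from `RAG_REPO_PATH` as its working directory after that worktree has been updated.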
## DB structure

- **stories** — story slug (e.g. branch name); documents and chunks are tied to a story.
- **stories** — story slug (e.g. branch name); documents and chunks are tied to a story. Optional: `indexed_base_ref`, `indexed_head_ref`, `indexed_at` record the git range that was indexed (all commits in that range belong to the story).
- **documents** — path + version per story; unique `(story_id, path)`.
- **chunks** — text chunks with embeddings (pgvector); updated when documents are re-indexed.
- **chunks** — text chunks with embeddings (pgvector), plus:
  - `start_line`, `end_line` — position in the source file (for requirements/use-case files).
  - `change_type` — `added` | `modified` | `unchanged` (relative to base ref when indexing with `--changed`).
  - `previous_content` — for `modified` chunks, the content before the change (for test-case generation).

Indexing is **always per-story**: `base_ref..head_ref` defines the set of commits that belong to the story. Use `--base-ref main` (or `auto`) and `--head-ref HEAD` so the story contains all commits on the branch, not a single commit. When you run `index --changed`, the base ref is compared to head; each chunk is marked as added, modified, or unchanged.

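To make the `change_type` and `previous_content` fields concrete, here is a deliberately naive sketch of how chunks of one file at the head ref could be labelled against the same file at the base ref. The README does not say how the agent pairs old and new chunks; matching by position in the file is purely an assumption of this example, and `ChunkChange` and `classify_chunks` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ChunkChange:
    content: str
    change_type: str  # "added" | "modified" | "unchanged"
    previous_content: Optional[str] = None


def classify_chunks(base: Sequence[str], head: Sequence[str]) -> List[ChunkChange]:
    """Label each head chunk relative to the base chunk at the same index."""
    result: List[ChunkChange] = []
    for i, content in enumerate(head):
        if i >= len(base):
            # No counterpart in the base version of the file.
            result.append(ChunkChange(content, "added"))
        elif base[i] == content:
            result.append(ChunkChange(content, "unchanged"))
        else:
            # Keep the old text so the before/after pair can be inspected.
            result.append(ChunkChange(content, "modified", previous_content=base[i]))
    return result
```

Positional matching mislabels everything after an inserted chunk as `modified`; a content-aware alignment (for example via `difflib.SequenceMatcher`) would be more robust, at the cost of more code.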
### What changed in a story (for test cases)

To get only the chunks that were added or modified in a story (e.g. to generate test cases for the changed part):

```python
from rag_agent.index import fetch_changed_chunks

# conn: an open DB connection; story_id: id of the story row (both assumed to exist)
changed = fetch_changed_chunks(conn, story_id)
for r in changed:
    # r.path, r.content, r.change_type, r.start_line, r.end_line, r.previous_content
    ...
```

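One way to use the records from the loop above is to render each one as a short before/after description that can be pasted into a test-case prompt. The sketch relies only on the attribute names listed in the comment; `describe_change` is a hypothetical helper, and the example record is made-up data standing in for a real row.

```python
from types import SimpleNamespace


def describe_change(r) -> str:
    """Format one changed chunk as text for a test-case prompt."""
    header = f"{r.path}:{r.start_line}-{r.end_line} [{r.change_type}]"
    if r.change_type == "modified" and r.previous_content is not None:
        # Show both versions so the reader can see what changed.
        return (
            f"{header}\n--- before\n{r.previous_content}"
            f"\n--- after\n{r.content}"
        )
    return f"{header}\n{r.content}"


# Made-up record with the same attributes as a fetched row.
example = SimpleNamespace(
    path="docs/login.md",
    start_line=10,
    end_line=12,
    change_type="modified",
    content="Lock the account after 3 failed attempts.",
    previous_content="Lock the account after 5 failed attempts.",
)
print(describe_change(example))
```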
Scripts: `scripts/create_db.py` (Python, uses `ensure_schema` and `RAG_*` env), `scripts/schema.sql` (raw SQL).