# rag-proxy

A RAG (Retrieval-Augmented Generation) proxy server that lets you chat with your documents.
## Usage
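```bash
agent-cli rag-proxy [OPTIONS]
```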
## Description
Enables "Chat with your Data" by running a local proxy server:

- Start the server, pointing to your documents folder and LLM
- The server watches the folder and indexes documents into a ChromaDB vector store
- Point any OpenAI-compatible client to this server's URL
- When you ask a question, the server retrieves relevant chunks and adds them to the prompt (sketched below)
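For intuition, here is a minimal sketch of that retrieve-and-augment step using the `chromadb` library directly. It is illustrative only, not the server's actual internals; the collection name `docs` and the prompt template are assumptions.

```python
# Illustrative sketch of retrieve-and-augment; not the proxy's real code.
import chromadb

client = chromadb.PersistentClient(path="./rag_db")  # matches the --chroma-path default
collection = client.get_or_create_collection("docs")  # collection name is an assumption

def augment(question: str, limit: int = 3) -> str:
    """Prepend the top `limit` matching chunks to the user's question."""
    results = collection.query(query_texts=[question], n_results=limit)
    context = "\n\n".join(results["documents"][0])
    return f"Context from your documents:\n{context}\n\nQuestion: {question}"
```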
## Installation
Requires the `rag` extra:
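```bash
pip install "agent-cli[rag]"
```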
## Examples

```bash
# With local LLM (Ollama)
agent-cli rag-proxy \
  --docs-folder ~/Documents/Notes \
  --openai-base-url http://localhost:11434/v1 \
  --port 8000

# With OpenAI
agent-cli rag-proxy \
  --docs-folder ~/Documents/Notes \
  --openai-api-key sk-... \
  --port 8000

# Use with agent-cli chat
agent-cli chat --openai-base-url http://localhost:8000/v1 --llm-provider openai
```
## Options

### RAG Configuration

| Option | Description | Default |
|---|---|---|
| `--docs-folder PATH` | Folder to watch for documents | `./rag_docs` |
| `--chroma-path PATH` | ChromaDB persistence directory | `./rag_db` |
| `--limit N` | Number of chunks to retrieve per query | `3` |
| `--rag-tools` / `--no-rag-tools` | Allow agent to fetch full documents | `true` |
### LLM Configuration

| Option | Description |
|---|---|
| `--openai-base-url` | OpenAI-compatible API URL (e.g., Ollama) |
| `--openai-api-key` | OpenAI API key |
| `--embedding-model` | Model for embeddings |
### Server Configuration

| Option | Description | Default |
|---|---|---|
| `--host` | Host/IP to bind to | `0.0.0.0` |
| `--port` | Port to bind to | `8000` |
### General Options

| Option | Description | Default |
|---|---|---|
| `--log-level` | Logging level | `INFO` |
| `--config PATH` | Path to a TOML configuration file | - |
| `--print-args` | Print resolved arguments including config values | `false` |
## Supported Document Types

Text files (loaded directly):

`.txt`, `.md`, `.json`, `.py`, `.js`, `.ts`, `.yaml`, `.yml`, `.rs`, `.go`, `.c`, `.cpp`, `.h`, `.sh`, `.toml`, `.rst`, `.ini`, `.cfg`

Rich documents (converted via MarkItDown):

`.pdf`, `.docx`, `.pptx`, `.xlsx`, `.html`, `.htm`, `.csv`, `.xml`
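To see what that conversion step produces, you can run MarkItDown yourself. A minimal sketch (the file path is a placeholder):

```python
# Convert a rich document to Markdown text, the step the indexer
# performs via MarkItDown for PDFs, Office files, HTML, etc.
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("report.pdf")  # placeholder path
print(result.text_content[:500])  # plain Markdown, ready for chunking and embedding
```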
## Architecture

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Your Client   │────▶│    RAG Proxy    │────▶│   LLM Backend   │
│  (chat, curl)   │◀────│      :8000      │◀────│ (Ollama/OpenAI) │
└─────────────────┘     └────────┬────────┘     └─────────────────┘
                                 │
                        ┌────────▼────────┐
                        │    ChromaDB     │
                        │  (Vector Store) │
                        └────────┬────────┘
                                 │
                        ┌────────▼────────┐
                        │   docs-folder   │
                        │  (Your Files)   │
                        └─────────────────┘
```
## Usage with Other Clients

Any OpenAI-compatible client can use the RAG proxy:

```bash
# curl
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "What do my notes say about X?"}]}'
```

```python
# Python (openai library)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[{"role": "user", "content": "Summarize my project notes"}],
)
print(response.choices[0].message.content)
```
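Streaming works the same way it does against any other OpenAI-compatible endpoint. A short sketch, assuming the proxy passes the backend's streamed chunks through:

```python
# Continues the client from the example above; assumes the proxy
# forwards streamed responses from the backend.
stream = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[{"role": "user", "content": "What do my notes say about X?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks (e.g., the final one) carry no content
        print(delta, end="", flush=True)
```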
## Tips

- The server automatically re-indexes when files change
- Use `--limit` to control how many document chunks are retrieved
- Enable `--rag-tools` for the agent to request full documents when snippets aren't enough