memory
Long-term memory system for conversations, with subcommands for managing stored memories.
Commands
- memory proxy - Long-term memory chat proxy server
- memory add - Add memories directly without LLM extraction
memory proxy
A middleware server that gives any OpenAI-compatible app long-term memory.
Usage
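A minimal sketch of the invocation, assuming the usual [OPTIONS] placeholder syntax; all flags are described below:

# Start the proxy with the options described below
agent-cli memory proxy [OPTIONS]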
Description
Acts as a proxy between your chat client and an LLM provider:
- Intercepts chat requests
- Retrieves relevant memories from a local vector database
- Injects memories into the system prompt
- Forwards the augmented request to the LLM
- Extracts new facts from the conversation and stores them
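Because the proxy exposes the same OpenAI-compatible interface it forwards to, existing clients only need to point at the proxy's address. A minimal sketch of a proxied request, assuming the default port 8100 and a placeholder model name (some backends also require an Authorization header):

# Send a normal chat completion request to the proxy instead of the LLM provider
curl http://localhost:8100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "What do I usually drink?"}]}'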
Key Features
- Simple Markdown Files: Memories stored as human-readable Markdown
- Automatic Version Control: Built-in Git integration
- Lightweight & Local: Runs entirely on your machine
- Proxy Middleware: Works with any OpenAI-compatible endpoint
Installation
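Assuming the package is published on PyPI under the same name, a standard installation looks like:

# Assumption: agent-cli is available on PyPI
pip install agent-cli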
Examples
# With local LLM (Ollama) - uses default embedding model
agent-cli memory proxy \
--memory-path ./memory_db \
--openai-base-url http://localhost:11434/v1
# With local Ollama embedding model (requires: ollama pull nomic-embed-text)
agent-cli memory proxy \
--memory-path ./memory_db \
--openai-base-url http://localhost:11434/v1 \
--embedding-model nomic-embed-text
# Use with agent-cli chat
agent-cli chat --openai-base-url http://localhost:8100/v1 --llm-provider openai
Options
Memory Configuration
| Option | Default | Description |
|---|---|---|
| --memory-path | ./memory_db | Directory for memory storage. Contains entries/ (Markdown files) and chroma/ (vector index). Created automatically if it doesn't exist. |
| --default-top-k | 5 | Number of relevant memories to inject into each request. Higher values provide more context but increase token usage. |
| --max-entries | 500 | Maximum entries per conversation before the oldest are evicted. Summaries are preserved separately. |
| --mmr-lambda | 0.7 | MMR lambda (0.0-1.0): higher favors relevance, lower favors diversity. |
| --recency-weight | 0.2 | Weight for recency vs. semantic relevance (0.0-1.0). At 0.2: 20% recency, 80% semantic similarity. |
| --score-threshold | 0.35 | Minimum semantic relevance (0.0-1.0). Memories scoring below this threshold are discarded to reduce noise. |
| --summarization/--no-summarization | true | Extract facts and generate summaries after each turn using the LLM. Disable to store only raw conversation turns. |
| --git-versioning/--no-git-versioning | true | Auto-commit memory changes to git. Initializes a repo in --memory-path if needed. Provides a full history of memory evolution. |
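As an example, retrieval can be tuned toward fewer but more confident matches by combining the flags above; the values below are illustrative, not recommended defaults:

# Inject at most 3 memories and require a stronger semantic match
agent-cli memory proxy \
  --memory-path ./memory_db \
  --default-top-k 3 \
  --score-threshold 0.5 \
  --recency-weight 0.3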
LLM: OpenAI-compatible
| Option | Default | Description |
|---|---|---|
| --openai-base-url | - | Custom base URL for OpenAI-compatible API (e.g., for llama-server: http://localhost:8080/v1). |
| --openai-api-key | - | Your OpenAI API key. Can also be set with the OPENAI_API_KEY environment variable. |
LLM Configuration
| Option | Default | Description |
|---|---|---|
| --embedding-base-url | - | Base URL for the embedding API. Falls back to --openai-base-url if not set. Useful when using different providers for chat vs. embeddings. |
| --embedding-model | text-embedding-3-small | Embedding model to use for vectorization. |
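A sketch of splitting chat and embeddings across providers, using the documented flags; the endpoints and model are illustrative:

# Chat via a local llama-server, embeddings via the hosted OpenAI API
agent-cli memory proxy \
  --openai-base-url http://localhost:8080/v1 \
  --embedding-base-url https://api.openai.com/v1 \
  --embedding-model text-embedding-3-small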
Server Configuration
| Option | Default | Description |
|---|---|---|
| --host | 0.0.0.0 | Host/IP to bind the API server to. |
| --port | 8100 | Port to bind to. |
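For example, to keep the proxy reachable only from the local machine:

# Bind to loopback only, on the default port
agent-cli memory proxy --host 127.0.0.1 --port 8100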
General Options
| Option | Default | Description |
|---|---|---|
| --log-level | info | Set logging level. |
| --config | - | Path to a TOML configuration file. |
| --print-args | false | Print the command line arguments, including variables taken from the configuration file. |
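To see which values a configuration file actually supplies, --print-args can be combined with --config; the path below is hypothetical:

# Print the effective arguments, including values loaded from the TOML file
agent-cli memory proxy --config ~/.config/agent-cli/config.toml --print-args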
memory add
Add memories directly to the store without LLM extraction.
Usage
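A minimal sketch of the invocation, assuming the usual placeholder syntax; memories can be passed as positional arguments or read via --file:

# Add one or more memories, given as arguments or via --file
agent-cli memory add [OPTIONS] [MEMORIES]...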
Description
Useful for bulk imports or seeding memories. The memory proxy file watcher will auto-index new files.
Examples
# Add single memories as arguments
agent-cli memory add "User likes coffee" "User lives in Amsterdam"
# Read from JSON file
agent-cli memory add -f memories.json
# Read from stdin (plain text, one per line)
echo "User prefers dark mode" | agent-cli memory add -f -
# Read JSON from stdin
echo '["Fact one", "Fact two"]' | agent-cli memory add -f -
# Specify conversation ID
agent-cli memory add -c work "Project deadline is Friday"
Options
Options
| Option | Default | Description |
|---|---|---|
| --file, -f | - | Read memories from a file. Use '-' for stdin. Supports a JSON array, a JSON object with a 'memories' key, or plain text (one per line). |
| --conversation-id, -c | default | Conversation namespace for these memories. Memories are retrieved per-conversation unless shared globally. |
| --memory-path | ./memory_db | Directory for memory storage (same as memory proxy --memory-path). |
| --git-versioning/--no-git-versioning | true | Auto-commit changes to git for version history. |
General Options
| Option | Default | Description |
|---|---|---|
| --quiet, -q | false | Suppress console output from rich. |
| --config | - | Path to a TOML configuration file. |
| --print-args | false | Print the command line arguments, including variables taken from the configuration file. |
File Format
Supports:
- JSON array: ["fact 1", "fact 2"]
- JSON object with memories key: {"memories": ["fact 1", "fact 2"]}
- Plain text (one fact per line)
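A sketch of a bulk import using the JSON-object form; the file name and facts are placeholders:

# Write a JSON object with a "memories" key, then import it
cat > memories.json << 'EOF'
{"memories": ["User likes coffee", "Project deadline is Friday"]}
EOF
agent-cli memory add -f memories.json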
Architecture
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Your Client │────▶│ Memory Proxy │────▶│ LLM Backend │
│ │◀────│ :8100 │◀────│ │
└─────────────────┘ └────────┬────────┘ └─────────────────┘
│
┌────────────┼────────────┐
│ │ │
┌────────▼───┐ ┌─────▼─────┐ ┌───▼───────┐
│ ChromaDB │ │ Markdown │ │ Git │
│ (Vector) │ │ (Files) │ │ (Version) │
└────────────┘ └───────────┘ └───────────┘
Memory Files
Stored as Markdown under {memory-path}/entries/<conversation_id>/:
entries/
<conversation_id>/
facts/
<timestamp>__<uuid>.md
turns/
user/<timestamp>__<uuid>.md
assistant/<timestamp>__<uuid>.md
summaries/
summary.md
See Memory System Architecture for the full schema and metadata format.
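Since git versioning commits every change inside --memory-path, the history can be inspected with plain git; the path below assumes the default ./memory_db location:

# Review how stored memories evolved over time
git -C ./memory_db log --oneline -- entries/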
Related
- Configuration - Config file keys for memory proxy defaults
- rag-proxy - Document RAG proxy server (contrast with memory)
- RAG System Architecture - How RAG indexing and retrieval works