memory

Long-term memory system for conversations, with subcommands for managing stored memories.

Commands


memory proxy

A middleware server that gives any OpenAI-compatible app long-term memory.

Usage

agent-cli memory proxy [OPTIONS]

Description

Acts as a proxy between your chat client and an LLM provider:

  1. Intercepts chat requests
  2. Retrieves relevant memories from a local vector database
  3. Injects memories into the system prompt
  4. Forwards the augmented request to the LLM
  5. Extracts new facts from the conversation and stores them
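
For example, the proxy serves the standard OpenAI chat completions endpoint on its default port (8100), so an ordinary request works unchanged. This is only a sketch: the model name is a placeholder for whatever your backend actually serves.

# Hypothetical request; swap in a model your backend provides
curl http://localhost:8100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1", "messages": [{"role": "user", "content": "Where do I live?"}]}'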

Key Features

  • Simple Markdown Files: Memories stored as human-readable Markdown
  • Automatic Version Control: Built-in Git integration
  • Lightweight & Local: Runs entirely on your machine
  • Proxy Middleware: Works with any OpenAI-compatible endpoint
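
Because memory changes are committed to git (see --git-versioning below), the store can be inspected with ordinary git tooling. A minimal sketch, assuming the default --memory-path of ./memory_db:

# Browse how memories have evolved over time
git -C ./memory_db log --oneline -- entries/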

Installation

pip install "agent-cli[memory]"
# or from repo
uv sync --extra memory

Examples

# With local LLM (Ollama) - uses default embedding model
agent-cli memory proxy \
  --memory-path ./memory_db \
  --openai-base-url http://localhost:11434/v1

# With local Ollama embedding model (requires: ollama pull nomic-embed-text)
agent-cli memory proxy \
  --memory-path ./memory_db \
  --openai-base-url http://localhost:11434/v1 \
  --embedding-model nomic-embed-text

# Use with agent-cli chat
agent-cli chat --openai-base-url http://localhost:8100/v1 --llm-provider openai

Options

Memory Configuration

| Option | Default | Description |
| --- | --- | --- |
| --memory-path | ./memory_db | Directory for memory storage. Contains entries/ (Markdown files) and chroma/ (vector index). Created automatically if it doesn't exist. |
| --default-top-k | 5 | Number of relevant memories to inject into each request. Higher values provide more context but increase token usage. |
| --max-entries | 500 | Maximum entries per conversation before the oldest are evicted. Summaries are preserved separately. |
| --mmr-lambda | 0.7 | MMR lambda (0-1): higher favors relevance, lower favors diversity. |
| --recency-weight | 0.2 | Weight for recency vs. semantic relevance (0.0-1.0). At 0.2: 20% recency, 80% semantic similarity. |
| --score-threshold | 0.35 | Minimum semantic relevance (0.0-1.0). Memories scoring below this threshold are discarded to reduce noise. |
| --summarization/--no-summarization | true | Extract facts and generate summaries after each turn using the LLM. Disable to store only raw conversation turns. |
| --git-versioning/--no-git-versioning | true | Auto-commit memory changes to git. Initializes a repository in --memory-path if needed, providing a full history of how memories evolve. |
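
As an illustrative sketch (values chosen arbitrarily), several of these retrieval settings can be combined in a single invocation, e.g. to return fewer but more relevant memories and skip git commits:

# Tighter retrieval: 3 memories per request, higher relevance cutoff, no auto-commits
agent-cli memory proxy \
  --memory-path ./memory_db \
  --default-top-k 3 \
  --score-threshold 0.5 \
  --recency-weight 0.3 \
  --no-git-versioning

With --recency-weight 0.3, ranking weighs roughly 30% recency against 70% semantic similarity, per the description above.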

LLM: OpenAI-compatible

| Option | Default | Description |
| --- | --- | --- |
| --openai-base-url | - | Custom base URL for OpenAI-compatible API (e.g., for llama-server: http://localhost:8080/v1). |
| --openai-api-key | - | Your OpenAI API key. Can also be set with the OPENAI_API_KEY environment variable. |

LLM Configuration

| Option | Default | Description |
| --- | --- | --- |
| --embedding-base-url | - | Base URL for embedding API. Falls back to --openai-base-url if not set. Useful when using different providers for chat vs embeddings. |
| --embedding-model | text-embedding-3-small | Embedding model to use for vectorization. |
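
For instance, chat can be served by a local llama-server while embeddings come from OpenAI. A sketch only; the URLs are placeholders, and the hosted embedding endpoint presumably needs OPENAI_API_KEY (or --openai-api-key) to be set:

# Chat via local llama-server, embeddings via OpenAI's hosted API
agent-cli memory proxy \
  --openai-base-url http://localhost:8080/v1 \
  --embedding-base-url https://api.openai.com/v1 \
  --embedding-model text-embedding-3-small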

Server Configuration

| Option | Default | Description |
| --- | --- | --- |
| --host | 0.0.0.0 | Host/IP address to bind the proxy server to. |
| --port | 8100 | Port to bind the proxy server to. |

General Options

| Option | Default | Description |
| --- | --- | --- |
| --log-level | info | Set logging level. |
| --config | - | Path to a TOML configuration file. |
| --print-args | false | Print the command line arguments, including variables taken from the configuration file. |

memory add

Add memories directly to the store without LLM extraction.

Usage

agent-cli memory add [MEMORIES]... [OPTIONS]

Description

Useful for bulk imports or seeding memories. The memory proxy file watcher will auto-index new files.

Examples

# Add single memories as arguments
agent-cli memory add "User likes coffee" "User lives in Amsterdam"

# Read from JSON file
agent-cli memory add -f memories.json

# Read from stdin (plain text, one per line)
echo "User prefers dark mode" | agent-cli memory add -f -

# Read JSON from stdin
echo '["Fact one", "Fact two"]' | agent-cli memory add -f -

# Specify conversation ID
agent-cli memory add -c work "Project deadline is Friday"

Options

| Option | Default | Description |
| --- | --- | --- |
| --file, -f | - | Read memories from file. Use '-' for stdin. Supports JSON array, JSON object with 'memories' key, or plain text (one per line). |
| --conversation-id, -c | default | Conversation namespace for these memories. Memories are retrieved per-conversation unless shared globally. |
| --memory-path | ./memory_db | Directory for memory storage (same as memory proxy --memory-path). |
| --git-versioning/--no-git-versioning | true | Auto-commit changes to git for version history. |

General Options

| Option | Default | Description |
| --- | --- | --- |
| --quiet, -q | false | Suppress console output from rich. |
| --config | - | Path to a TOML configuration file. |
| --print-args | false | Print the command line arguments, including variables taken from the configuration file. |

File Format

Supports:

  • JSON array: ["fact 1", "fact 2"]
  • JSON object with memories key: {"memories": ["fact 1", "fact 2"]}
  • Plain text (one fact per line)
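
A minimal end-to-end sketch of the JSON-object form, using only flags documented above (the file name and conversation ID are arbitrary):

# Write a JSON file with a "memories" key, then import it into the "personal" conversation
cat > memories.json << 'EOF'
{"memories": ["User likes coffee", "User lives in Amsterdam"]}
EOF
agent-cli memory add -f memories.json -c personal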


Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Your Client   │────▶│  Memory Proxy   │────▶│   LLM Backend   │
│                 │◀────│  :8100          │◀────│                 │
└─────────────────┘     └────────┬────────┘     └─────────────────┘
                    ┌────────────┼────────────┐
                    │            │            │
           ┌────────▼───┐  ┌─────▼─────┐  ┌───▼───────┐
           │  ChromaDB  │  │ Markdown  │  │    Git    │
           │  (Vector)  │  │  (Files)  │  │ (Version) │
           └────────────┘  └───────────┘  └───────────┘
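
Any client that speaks the OpenAI chat API can sit on the left of this diagram. As a sketch, clients built on the official OpenAI SDKs typically honor the standard environment variables below, so routing them through the proxy often needs no code changes (whether a given app reads these variables is an assumption to verify):

# Route an OpenAI-SDK-based client through the memory proxy
export OPENAI_BASE_URL="http://localhost:8100/v1"
export OPENAI_API_KEY="placeholder"  # some clients require a non-empty key even for local backends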

Memory Files

Stored as Markdown under {memory-path}/entries/<conversation_id>/:

entries/
  <conversation_id>/
    facts/
      <timestamp>__<uuid>.md
    turns/
      user/<timestamp>__<uuid>.md
      assistant/<timestamp>__<uuid>.md
    summaries/
      summary.md

See Memory System Architecture for the full schema and metadata format.