memory

Long-term memory system for conversations, with subcommands for managing stored memories.

Commands


memory proxy

A middleware server that gives any OpenAI-compatible app long-term memory.

Usage

agent-cli memory proxy [OPTIONS]

Description

Acts as a proxy between your chat client and an LLM provider:

  1. Intercepts chat requests
  2. Retrieves relevant memories from a local vector database
  3. Injects memories into the system prompt
  4. Forwards the augmented request to the LLM
  5. Extracts new facts from the conversation and stores them
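
For example, the proxy serves the standard OpenAI chat completions endpoint on its default port (8100), so an ordinary request works unchanged. This is only a sketch: the model name is a placeholder for whatever your backend actually serves.

# Hypothetical request; swap in a model your backend provides
curl http://localhost:8100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1", "messages": [{"role": "user", "content": "Where do I live?"}]}'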

Key Features

  • Simple Markdown Files: Memories stored as human-readable Markdown
  • Automatic Version Control: Built-in Git integration
  • Lightweight & Local: Runs entirely on your machine
  • Proxy Middleware: Works with any OpenAI-compatible endpoint
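
Because memory changes are committed to git (see --git-versioning below), the store can be inspected with ordinary git tooling. A minimal sketch, assuming the default --memory-path of ./memory_db:

# Browse how memories have evolved over time
git -C ./memory_db log --oneline -- entries/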

Installation

pip install "agent-cli[memory]"
# or from repo
uv sync --extra memory

Examples

# With local LLM (Ollama) - uses default embedding model
agent-cli memory proxy \
  --memory-path ./memory_db \
  --openai-base-url http://localhost:11434/v1

# With local Ollama embedding model (requires: ollama pull nomic-embed-text)
agent-cli memory proxy \
  --memory-path ./memory_db \
  --openai-base-url http://localhost:11434/v1 \
  --embedding-model nomic-embed-text

# Use with agent-cli chat
agent-cli chat --openai-base-url http://localhost:8100/v1 --llm-provider openai

Options

Memory Configuration

| Option | Default | Description |
| --- | --- | --- |
| --memory-path | ./memory_db | Directory for memory storage. Contains entries/ (Markdown files) and chroma/ (vector index). Created automatically if it doesn't exist. |
| --default-top-k | 5 | Number of relevant memories to inject into each request. Higher values provide more context but increase token usage. |
| --max-entries | 500 | Maximum entries per conversation before the oldest are evicted. Summaries are preserved separately. |
| --mmr-lambda | 0.7 | MMR lambda (0-1): higher favors relevance, lower favors diversity. |
| --recency-weight | 0.2 | Weight for recency vs. semantic relevance (0.0-1.0). At 0.2: 20% recency, 80% semantic similarity. |
| --score-threshold | 0.35 | Minimum semantic relevance (0.0-1.0). Memories scoring below this threshold are discarded to reduce noise. |
| --summarization/--no-summarization | true | Extract facts and generate summaries after each turn using the LLM. Disable to store only raw conversation turns. |
| --git-versioning/--no-git-versioning | true | Auto-commit memory changes to git. Initializes a repository in --memory-path if needed, providing a full history of how memories evolve. |
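
As an illustrative sketch (values chosen arbitrarily), several of these retrieval settings can be combined in a single invocation, e.g. to return fewer but more relevant memories and skip git commits:

# Tighter retrieval: 3 memories per request, higher relevance cutoff, no auto-commits
agent-cli memory proxy \
  --memory-path ./memory_db \
  --default-top-k 3 \
  --score-threshold 0.5 \
  --recency-weight 0.3 \
  --no-git-versioning

With --recency-weight 0.3, ranking weighs roughly 30% recency against 70% semantic similarity, per the description above.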

LLM: OpenAI-compatible

| Option | Default | Description |
| --- | --- | --- |
| --openai-base-url | - | Custom base URL for OpenAI-compatible API (e.g., for llama-server: http://localhost:8080/v1). |
| --openai-api-key | - | Your OpenAI API key. Can also be set with the OPENAI_API_KEY environment variable. |

LLM Configuration

| Option | Default | Description |
| --- | --- | --- |
| --embedding-base-url | - | Base URL for embedding API. Falls back to --openai-base-url if not set. Useful when using different providers for chat vs embeddings. |
| --embedding-model | text-embedding-3-small | Embedding model to use for vectorization. |
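
For instance, chat can be served by a local llama-server while embeddings come from OpenAI. A sketch only; the URLs are placeholders, and the hosted embedding endpoint presumably needs OPENAI_API_KEY (or --openai-api-key) to be set:

# Chat via local llama-server, embeddings via OpenAI's hosted API
agent-cli memory proxy \
  --openai-base-url http://localhost:8080/v1 \
  --embedding-base-url https://api.openai.com/v1 \
  --embedding-model text-embedding-3-small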

Server Configuration

| Option | Default | Description |
| --- | --- | --- |
| --host | 0.0.0.0 | Host/IP address to bind the proxy server to. |
| --port | 8100 | Port to bind the proxy server to. |

General Options

| Option | Default | Description |
| --- | --- | --- |
| --log-level | info | Set logging level. |
| --config | - | Path to a TOML configuration file. |
| --print-args | false | Print the command line arguments, including variables taken from the configuration file. |

memory add

Add memories directly to the store without LLM extraction.

Usage

agent-cli memory add [MEMORIES]... [OPTIONS]

Description

Useful for bulk imports or seeding memories. The memory proxy file watcher will auto-index new files.

Examples

# Add single memories as arguments
agent-cli memory add "User likes coffee" "User lives in Amsterdam"

# Read from JSON file
agent-cli memory add -f memories.json

# Read from stdin (plain text, one per line)
echo "User prefers dark mode" | agent-cli memory add -f -

# Read JSON from stdin
echo '["Fact one", "Fact two"]' | agent-cli memory add -f -

# Specify conversation ID
agent-cli memory add -c work "Project deadline is Friday"

Options

| Option | Default | Description |
| --- | --- | --- |
| --file, -f | - | Read memories from file. Use '-' for stdin. Supports JSON array, JSON object with 'memories' key, or plain text (one per line). |
| --conversation-id, -c | default | Conversation namespace for these memories. Memories are retrieved per-conversation unless shared globally. |
| --memory-path | ./memory_db | Directory for memory storage (same as memory proxy --memory-path). |
| --git-versioning/--no-git-versioning | true | Auto-commit changes to git for version history. |

General Options

| Option | Default | Description |
| --- | --- | --- |
| --quiet, -q | false | Suppress console output from rich. |
| --config | - | Path to a TOML configuration file. |
| --print-args | false | Print the command line arguments, including variables taken from the configuration file. |

File Format

Supports:

  • JSON array: ["fact 1", "fact 2"]
  • JSON object with memories key: {"memories": ["fact 1", "fact 2"]}
  • Plain text (one fact per line)
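
A minimal end-to-end sketch of the JSON-object form, using only flags documented above (the file name and conversation ID are arbitrary):

# Write a JSON file with a "memories" key, then import it into the "personal" conversation
cat > memories.json << 'EOF'
{"memories": ["User likes coffee", "User lives in Amsterdam"]}
EOF
agent-cli memory add -f memories.json -c personal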


Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Your Client   │────▶│  Memory Proxy   │────▶│   LLM Backend   │
│                 │◀────│  :8100          │◀────│                 │
└─────────────────┘     └────────┬────────┘     └─────────────────┘
                    ┌────────────┼────────────┐
                    │            │            │
           ┌────────▼───┐  ┌─────▼─────┐  ┌───▼───────┐
           │  ChromaDB  │  │ Markdown  │  │    Git    │
           │  (Vector)  │  │  (Files)  │  │ (Version) │
           └────────────┘  └───────────┘  └───────────┘
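
Any client that speaks the OpenAI chat API can sit on the left of this diagram. As a sketch, clients built on the official OpenAI SDKs typically honor the standard environment variables below, so routing them through the proxy often needs no code changes (whether a given app reads these variables is an assumption to verify):

# Route an OpenAI-SDK-based client through the memory proxy
export OPENAI_BASE_URL="http://localhost:8100/v1"
export OPENAI_API_KEY="placeholder"  # some clients require a non-empty key even for local backends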

Memory Files

Stored as Markdown under {memory-path}/entries/<conversation_id>/:

entries/
  <conversation_id>/
    facts/
      <timestamp>__<uuid>.md
    turns/
      user/<timestamp>__<uuid>.md
      assistant/<timestamp>__<uuid>.md
    summaries/
      summary.md

See Memory System Architecture for the full schema and metadata format.