
Docker Installation

Universal Docker setup that works on any platform with Docker support.

Warning: Important Limitations

  • macOS: Docker does not support GPU acceleration. For up to 10x better performance, use the macOS native setup.
  • Linux: GPU acceleration requires the NVIDIA Container Toolkit.

Prerequisites

  • Docker and Docker Compose installed
  • At least 8GB RAM available for Docker
  • 10GB free disk space
  • For GPU: NVIDIA Container Toolkit (installation guide)

Quick Start

  1. Start all services with GPU acceleration:

     docker compose -f docker/docker-compose.yml --profile cuda up

     Or, for CPU-only:

     docker compose -f docker/docker-compose.yml --profile cpu up

  2. Check that the services are running:

     docker compose -f docker/docker-compose.yml logs

  3. Install agent-cli:

     uv tool install agent-cli -p 3.13
     # or: pip install agent-cli

  4. Test the setup:

     agent-cli autocorrect "this has an eror"
     (The misspelling is intentional; autocorrect should fix it.)
    
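With the stack up, you can sanity-check the endpoints before relying on agent-cli. A minimal sketch, assuming the default ports and locally running services (`/api/tags` is Ollama's standard model-listing route; the `nc` checks are plain TCP probes, not part of the project):

```shell
# Ollama: list installed models (requires the ollama container to be running)
curl -s http://localhost:11434/api/tags

# Whisper HTTP API and transcription proxy: confirm the ports accept connections
nc -z localhost 10301 && echo "whisper: up"
nc -z localhost 61337 && echo "transcribe-proxy: up"
```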

Services Overview

The Docker setup provides:

Service           Image                         Port         Purpose
whisper           agent-cli-whisper (custom)    10300/10301  Speech-to-text (Faster Whisper)
tts               agent-cli-tts (custom)        10200/10201  Text-to-speech (Kokoro/Piper)
transcribe-proxy  agent-cli-transcribe-proxy    61337        ASR proxy for iOS/external apps
rag-proxy         agent-cli-rag-proxy           8000         Document-aware chat (RAG)
memory-proxy      agent-cli-memory-proxy        8100         Long-term memory chat
ollama            ollama/ollama                 11434        LLM server
openwakeword      rhasspy/wyoming-openwakeword  10400        Wake word detection

Configuration

Environment Variables

# Whisper ASR
WHISPER_MODEL=large-v3      # Model: tiny, base, small, medium, large-v3
WHISPER_TTL=300             # Seconds before unloading idle model

# TTS
TTS_MODEL=kokoro            # CUDA: kokoro; CPU: en_US-lessac-medium
TTS_BACKEND=kokoro          # Backend: kokoro (GPU), piper (CPU)
TTS_TTL=300                 # Seconds before unloading idle model

# Transcription Proxy
PROXY_PORT=61337            # Port for transcription proxy
ASR_PROVIDER=wyoming        # ASR provider: wyoming, openai, gemini
ASR_WYOMING_IP=whisper      # Wyoming server hostname (container name in compose)
ASR_WYOMING_PORT=10300      # Wyoming server port
LLM_PROVIDER=ollama         # LLM provider: ollama, openai, gemini
LLM_OLLAMA_MODEL=gemma3:4b  # Ollama model name
LLM_OLLAMA_HOST=http://ollama:11434  # Ollama server URL (container name)
LLM_OPENAI_MODEL=gpt-4.1-nano  # OpenAI model (if using openai provider)
OPENAI_API_KEY=sk-...       # OpenAI API key (if using openai provider)

# RAG Proxy
RAG_PORT=8000               # Port for RAG proxy
RAG_LIMIT=3                 # Number of document chunks per query
RAG_ENABLE_TOOLS=true       # Enable read_full_document tool
EMBEDDING_MODEL=text-embedding-3-small  # Embedding model for RAG/memory

# Memory Proxy
MEMORY_PORT=8100            # Port for memory proxy
MEMORY_TOP_K=5              # Number of memories per query
MEMORY_MAX_ENTRIES=500      # Max entries per conversation before eviction
MEMORY_SUMMARIZATION=true   # Enable fact extraction from conversations
MEMORY_GIT_VERSIONING=true  # Enable git versioning for memory changes
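These variables can be exported inline or collected in an env file passed to Compose. A minimal sketch for a CPU-only machine (the file name and the specific values are illustrative choices, not requirements; defaults are listed above):

```shell
# .env: example overrides for a CPU-only setup (illustrative values)
WHISPER_MODEL=small               # smaller ASR model for CPU
TTS_BACKEND=piper                 # piper is the CPU backend
TTS_MODEL=en_US-lessac-medium     # a Piper voice
LLM_OLLAMA_MODEL=gemma3:4b        # model served by the ollama container
```

Load it with: docker compose --env-file .env -f docker/docker-compose.yml --profile cpu up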

GPU Support

The CUDA profile automatically enables GPU for Whisper and TTS. For Ollama GPU support, edit the compose file and uncomment the deploy section under the ollama service.
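For reference, the commented-out deploy section is typically the standard Compose GPU reservation shown below; the exact block in docker/docker-compose.yml may differ, so treat this as a sketch:

```yaml
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```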

Managing Services

# Start services in background
docker compose -f docker/docker-compose.yml --profile cuda up -d

# Stop services
docker compose -f docker/docker-compose.yml --profile cuda down

# View logs
docker compose -f docker/docker-compose.yml logs -f

# Rebuild from source
docker compose -f docker/docker-compose.yml --profile cuda up --build

Data Persistence

Services store data in Docker volumes:

  • agent-cli-whisper-cache - Whisper models
  • agent-cli-tts-cache - TTS models and voices
  • agent-cli-ollama-data - Ollama models
  • agent-cli-openwakeword-data - Wake word models
  • agent-cli-rag-docs - Documents to index for RAG
  • agent-cli-rag-db - RAG vector database (ChromaDB)
  • agent-cli-rag-cache - RAG embedding models
  • agent-cli-memory-data - Memory entries and vector index
  • agent-cli-memory-cache - Memory embedding models
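Because state lives in named volumes, stopping the stack with docker compose down (without -v) preserves models and data. To inspect or back up a volume, a sketch (the alpine image and the archive name are arbitrary choices, not part of the project):

```shell
# List the agent-cli volumes
docker volume ls --filter name=agent-cli

# Archive the Ollama model volume into the current directory
docker run --rm \
  -v agent-cli-ollama-data:/data:ro \
  -v "$PWD":/backup \
  alpine tar czf /backup/ollama-data.tar.gz -C /data .
```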

Ports Reference

Port   Service              Protocol
8000   RAG Proxy            HTTP API
8100   Memory Proxy         HTTP API
10200  TTS                  Wyoming
10201  TTS                  HTTP API
10300  Whisper              Wyoming
10301  Whisper              HTTP API
10400  OpenWakeWord         Wyoming
11434  Ollama               HTTP API
61337  Transcription Proxy  HTTP API

Alternative: Native Installation

For better performance, consider platform-specific native installation: