# Docker Installation
Universal Docker setup that works on any platform with Docker support.
> **Warning: Important Limitations**
>
> - **macOS**: Docker does not support GPU acceleration. For 10x better performance, use the macOS native setup.
> - **Linux**: GPU acceleration requires the NVIDIA Container Toolkit.
## Prerequisites
- Docker and Docker Compose installed
- At least 8GB RAM available for Docker
- 10GB free disk space
- For GPU: NVIDIA Container Toolkit (installation guide)
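You can verify these prerequisites up front; the GPU check below follows NVIDIA's Container Toolkit documentation (the toolkit injects `nvidia-smi` into the container):

```bash
# Verify Docker and the Compose plugin
docker --version
docker compose version

# GPU only: confirm the NVIDIA runtime is wired up
docker run --rm --gpus all ubuntu nvidia-smi
```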
## Quick Start
1. Start all services with GPU acceleration (or CPU-only).
2. Check that the services are running.
3. Install agent-cli.
4. Test the setup.

The commands for each step are sketched below.
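A minimal command sketch for these steps. The compose file path and `cuda` profile match the Managing Services section below; the `cpu` profile name, the PyPI package name, and the final smoke-test command are assumptions to verify against your checkout.

```bash
# 1. Start all services with GPU acceleration (CUDA profile)
docker compose -f docker/docker-compose.yml --profile cuda up -d

#    ...or CPU-only (profile name assumed; check the compose file)
docker compose -f docker/docker-compose.yml --profile cpu up -d

# 2. Check that the services are running
docker compose -f docker/docker-compose.yml ps

# 3. Install agent-cli (assumes the package is published on PyPI)
pip install agent-cli

# 4. Smoke-test the install (generic; see the CLI's own help for real commands)
agent-cli --help
```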
## Services Overview
The Docker setup provides:
| Service | Image | Port | Purpose |
|---|---|---|---|
| whisper | agent-cli-whisper (custom) | 10300/10301 | Speech-to-text (Faster Whisper) |
| tts | agent-cli-tts (custom) | 10200/10201 | Text-to-speech (Kokoro/Piper) |
| transcribe-proxy | agent-cli-transcribe-proxy | 61337 | ASR proxy for iOS/external apps |
| rag-proxy | agent-cli-rag-proxy | 8000 | Document-aware chat (RAG) |
| memory-proxy | agent-cli-memory-proxy | 8100 | Long-term memory chat |
| ollama | ollama/ollama | 11434 | LLM server |
| openwakeword | rhasspy/wyoming-openwakeword | 10400 | Wake word detection |
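Once the stack is up, each HTTP service is reachable on its mapped port. For example, Ollama's documented `/api/tags` endpoint lists installed models, and models can be pulled inside the running container:

```bash
# Verify the Ollama API is up (documented Ollama endpoint)
curl -s http://localhost:11434/api/tags

# Pull the default chat model referenced in the configuration below
docker compose -f docker/docker-compose.yml exec ollama ollama pull gemma3:4b
```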
## Configuration

### Environment Variables
```bash
# Whisper ASR
WHISPER_MODEL=large-v3                # Model: tiny, base, small, medium, large-v3
WHISPER_TTL=300                       # Seconds before unloading idle model

# TTS
TTS_MODEL=kokoro                      # CUDA: kokoro; CPU: en_US-lessac-medium
TTS_BACKEND=kokoro                    # Backend: kokoro (GPU), piper (CPU)
TTS_TTL=300                           # Seconds before unloading idle model

# Transcription Proxy
PROXY_PORT=61337                      # Port for transcription proxy
ASR_PROVIDER=wyoming                  # ASR provider: wyoming, openai, gemini
ASR_WYOMING_IP=whisper                # Wyoming server hostname (container name in compose)
ASR_WYOMING_PORT=10300                # Wyoming server port
LLM_PROVIDER=ollama                   # LLM provider: ollama, openai, gemini
LLM_OLLAMA_MODEL=gemma3:4b            # Ollama model name
LLM_OLLAMA_HOST=http://ollama:11434   # Ollama server URL (container name)
LLM_OPENAI_MODEL=gpt-4.1-nano         # OpenAI model (if using openai provider)
OPENAI_API_KEY=sk-...                 # OpenAI API key (if using openai provider)

# RAG Proxy
RAG_PORT=8000                         # Port for RAG proxy
RAG_LIMIT=3                           # Number of document chunks per query
RAG_ENABLE_TOOLS=true                 # Enable read_full_document tool
EMBEDDING_MODEL=text-embedding-3-small  # Embedding model for RAG/memory

# Memory Proxy
MEMORY_PORT=8100                      # Port for memory proxy
MEMORY_TOP_K=5                        # Number of memories per query
MEMORY_MAX_ENTRIES=500                # Max entries per conversation before eviction
MEMORY_SUMMARIZATION=true             # Enable fact extraction from conversations
MEMORY_GIT_VERSIONING=true            # Enable git versioning for memory changes
```
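If the compose file interpolates these variables (an assumption; look for `${VAR:-default}` syntax in it), you can override the defaults with an env file using Compose's standard `--env-file` flag:

```bash
# Write example overrides (the file path is a convention, not required)
cat > docker/.env <<'EOF'
WHISPER_MODEL=small
TTS_BACKEND=piper
EOF

docker compose --env-file docker/.env \
  -f docker/docker-compose.yml --profile cuda up -d
```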
### GPU Support
The CUDA profile automatically enables GPU for Whisper and TTS. For Ollama GPU support, edit the compose file and uncomment the deploy section under the ollama service.
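For reference, a typical Compose `deploy` section for NVIDIA GPUs looks like the sketch below; the actual commented block in the project's compose file may differ, so prefer uncommenting that one:

```yaml
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```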
## Managing Services
```bash
# Start services in background
docker compose -f docker/docker-compose.yml --profile cuda up -d

# Stop services
docker compose -f docker/docker-compose.yml --profile cuda down

# View logs
docker compose -f docker/docker-compose.yml logs -f

# Rebuild from source
docker compose -f docker/docker-compose.yml --profile cuda up --build
```
## Data Persistence
Services store data in Docker volumes:
- `agent-cli-whisper-cache` - Whisper models
- `agent-cli-tts-cache` - TTS models and voices
- `agent-cli-ollama-data` - Ollama models
- `agent-cli-openwakeword-data` - Wake word models
- `agent-cli-rag-docs` - Documents to index for RAG
- `agent-cli-rag-db` - RAG vector database (ChromaDB)
- `agent-cli-rag-cache` - RAG embedding models
- `agent-cli-memory-data` - Memory entries and vector index
- `agent-cli-memory-cache` - Memory embedding models
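Named volumes survive `docker compose down` (add `-v` to remove them too). Two common operations, using the generic helper-container pattern; the flat `/docs` layout inside the RAG volume is an assumption:

```bash
# List the project's named volumes
docker volume ls --filter name=agent-cli

# Copy a document into the RAG docs volume for indexing
docker run --rm \
  -v agent-cli-rag-docs:/docs \
  -v "$PWD":/host \
  alpine cp /host/manual.pdf /docs/
```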
## Ports Reference
| Port | Service | Protocol |
|---|---|---|
| 8000 | RAG Proxy | HTTP API |
| 8100 | Memory Proxy | HTTP API |
| 10200 | TTS | Wyoming |
| 10201 | TTS | HTTP API |
| 10300 | Whisper | Wyoming |
| 10301 | Whisper | HTTP API |
| 10400 | OpenWakeWord | Wyoming |
| 11434 | Ollama | HTTP API |
| 61337 | Transcription Proxy | HTTP API |
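To confirm the published ports are reachable from the host (assumes default localhost port mappings and the `nc` utility):

```bash
for port in 8000 8100 10200 10201 10300 10301 10400 11434 61337; do
  nc -z localhost "$port" && echo "port $port: open" || echo "port $port: closed"
done
```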
## Alternative: Native Installation
For better performance, consider platform-specific native installation:
- macOS Native Setup - Metal GPU acceleration
- Linux Native Setup - NVIDIA GPU acceleration