Architecture

How Agent CLI works under the hood.

System Overview

Agent CLI is built around a modular service architecture: each AI capability (speech recognition, language model, speech synthesis) is provided by an interchangeable backend.

For usage and flags, see Commands Reference and Configuration.

┌────────────────────────────────────────────────────────────────┐
│                          Agent CLI                             │
│  ┌──────────┐ ┌───────────┐ ┌──────┐ ┌───────────┐ ┌────────┐  │
│  │transcribe│ │voice-edit │ │ chat │ │ assistant │ │  ...   │  │
│  └────┬─────┘ └─────┬─────┘ └──┬───┘ └─────┬─────┘ └────────┘  │
└───────┼─────────────┼──────────┼───────────┼───────────────────┘
        │             │          │           │
        ▼             ▼          ▼           ▼
┌────────────────────────────────────────────────────────────────┐
│                     Provider Abstraction                       │
│     ┌─────────────┐   ┌─────────────┐   ┌─────────────┐        │
│     │ ASR Provider│   │ LLM Provider│   │ TTS Provider│        │
│     └──────┬──────┘   └──────┬──────┘   └──────┬──────┘        │
└────────────┼─────────────────┼─────────────────┼───────────────┘
             │                 │                 │
      ┌──────┼──────┐    ┌─────┼─────┐    ┌───────┼───────┐
      ▼      ▼      ▼    ▼     ▼     ▼    ▼   ▼   ▼   ▼   ▼
 ┌───────┐┌──────┐┌──────┐┌──────┐┌──────┐┌─────┐┌──────┐┌──────┐┌──────┐
 │Wyoming││OpenAI││Gemini││Ollama││OpenAI││Piper││OpenAI││Kokoro││Gemini│
 │Whisper││Whispr││ ASR  ││      ││Gemini││     ││ TTS  ││      ││ TTS  │
 └───────┘└──────┘└──────┘└──────┘└──────┘└─────┘└──────┘└──────┘└──────┘

Provider System

Each AI capability (ASR, LLM, TTS) has multiple backend providers:

ASR (Automatic Speech Recognition)

Provider	Implementation	Compute	Latency
`wyoming`	Wyoming Whisper (faster-whisper/MLX)	CUDA/Metal	Low
`openai`	OpenAI-compatible Whisper API	Cloud	Medium
`gemini`	Google Gemini API	Cloud	Medium

LLM (Large Language Model)

Provider	Implementation	Compute	Privacy
`ollama`	Ollama (local)	CUDA/Metal	Full
`openai`	OpenAI-compatible API	Cloud	Partial
`gemini`	Google Gemini API	Cloud	Partial

TTS (Text-to-Speech)

Provider	Implementation	Quality	Speed
`wyoming`	Wyoming Piper	Good	Fast
`openai`	OpenAI-compatible TTS	Excellent	Medium
`kokoro`	Kokoro TTS	Good	Fast
`gemini`	Google Gemini TTS	Good	Medium
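
The provider tables above suggest a lookup from a provider name to an interchangeable backend. The following is a minimal sketch of that idea, not Agent CLI's actual code: `ASRProvider`, `EchoASR`, `ASR_REGISTRY`, and `get_asr_provider` are hypothetical names chosen for illustration, and only the provider keys (`wyoming`, `openai`, `gemini`) come from the ASR table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class ASRProvider(Protocol):
    """Hypothetical interface: anything that turns WAV bytes into text."""

    def transcribe(self, wav_bytes: bytes) -> str: ...


@dataclass
class EchoASR:
    """Stand-in backend used only for this illustration."""

    label: str

    def transcribe(self, wav_bytes: bytes) -> str:
        # A real backend would call Wyoming, OpenAI, or Gemini here.
        return f"[{self.label}] {len(wav_bytes)} bytes"


# Hypothetical registry keyed by the provider names from the ASR table.
ASR_REGISTRY: Dict[str, Callable[[], ASRProvider]] = {
    "wyoming": lambda: EchoASR("wyoming"),
    "openai": lambda: EchoASR("openai"),
    "gemini": lambda: EchoASR("gemini"),
}


def get_asr_provider(name: str) -> ASRProvider:
    """Return the backend registered under `name`, or fail with the known names."""
    try:
        factory = ASR_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(ASR_REGISTRY))
        raise ValueError(f"unknown ASR provider {name!r}; expected one of: {known}") from None
    return factory()
```

Because callers depend only on the `transcribe` method, swapping a local backend for a cloud one becomes a configuration change rather than a code change.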

Wyoming Protocol

Agent CLI uses the Wyoming Protocol for local AI services. Wyoming provides a simple TCP-based protocol for:

Speech-to-text (ASR)
Text-to-speech (TTS)
Wake word detection
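
To make the wire format concrete, here is a minimal sketch of Wyoming event framing as described in the Wyoming project's README: one line of JSON carrying a `type`, optional `data`, and optional `data_length` / `payload_length` fields, followed by that many raw bytes. The helpers `encode_event` and `decode_event` are hypothetical and written against the standard library only; the official `wyoming` Python package provides its own event classes, which should be preferred in real code.

```python
from __future__ import annotations

import io
import json
from typing import BinaryIO


def encode_event(event_type: str, data: dict | None = None, payload: bytes = b"") -> bytes:
    """Serialize one event: a JSON header line, then the raw payload bytes."""
    header: dict = {"type": event_type}
    if data:
        header["data"] = data
    if payload:
        header["payload_length"] = len(payload)
    return json.dumps(header).encode("utf-8") + b"\n" + payload


def decode_event(stream: BinaryIO) -> tuple[str, dict, bytes]:
    """Read one event from a binary stream and return (type, data, payload)."""
    line = stream.readline()
    if not line:
        raise EOFError("stream closed before an event header arrived")
    header = json.loads(line)
    data = dict(header.get("data") or {})
    # Extra JSON data may follow the header line when data_length is set.
    data_length = header.get("data_length") or 0
    if data_length:
        data.update(json.loads(stream.read(data_length)))
    payload_length = header.get("payload_length") or 0
    payload = stream.read(payload_length) if payload_length else b""
    return header["type"], data, payload
```

For example, an `audio-chunk` event would carry the PCM format in `data` and the samples in the payload: `encode_event("audio-chunk", {"rate": 16000, "width": 2, "channels": 1}, pcm_bytes)`.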

Default Ports

Service	Port	Protocol
Whisper (ASR)	10300	Wyoming
Piper (TTS)	10200	Wyoming
OpenWakeWord	10400	Wyoming
Ollama (LLM)	11434	HTTP
RAG Proxy	8000	HTTP
Memory Proxy	8100	HTTP
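
When a command cannot reach a service, a quick TCP probe of the default ports can narrow down which one is not running. This is a sketch using only the standard library; `DEFAULT_PORTS`, `is_listening`, and `service_status` are hypothetical helper names, and the port numbers are copied from the table above.

```python
import socket
from typing import Dict

# Port numbers copied from the Default Ports table; the keys are illustrative.
DEFAULT_PORTS: Dict[str, int] = {
    "whisper": 10300,
    "piper": 10200,
    "openwakeword": 10400,
    "ollama": 11434,
    "rag-proxy": 8000,
    "memory-proxy": 8100,
}


def is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def service_status(host: str = "127.0.0.1") -> Dict[str, bool]:
    """Probe every default port on `host` and report which ones accept connections."""
    return {name: is_listening(host, port) for name, port in DEFAULT_PORTS.items()}
```

A successful connection only shows that something is listening on the port; it does not confirm that the listener speaks Wyoming or HTTP.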

Audio Pipeline

┌───────────┐    ┌───────────┐    ┌───────────┐    ┌───────────┐
│ Microphone│───▶│sounddevice│───▶│    WAV    │───▶│  Wyoming  │
│           │    │  capture  │    │   buffer  │    │    ASR    │
└───────────┘    └───────────┘    └───────────┘    └─────┬─────┘
                                                         │
                                                         ▼
┌───────────┐    ┌───────────┐    ┌───────────┐    ┌───────────┐
│  Speakers │◀───│sounddevice│◀───│    WAV    │◀───│  Wyoming  │
│           │    │  playback │    │   buffer  │    │    TTS    │
└───────────┘    └───────────┘    └───────────┘    └───────────┘
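
The "WAV buffer" stage in the diagram can be sketched with the standard library alone. The example below wraps raw PCM samples in an in-memory WAV container; the 16 kHz, 16-bit, mono defaults are assumptions typical for speech models rather than values stated in this document, and `pcm_to_wav` and `wav_duration_seconds` are hypothetical helper names. The capture and playback stages are omitted because they need an audio device.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, rate: int = 16000, channels: int = 1, width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container held entirely in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(width)  # bytes per sample: 2 means 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Compute the clip length from the frame count and frame rate in the header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Keeping the buffer in memory avoids temporary files between capture and the ASR request, which matters for short, latency-sensitive dictation clips.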

Configuration Loading

Configuration is loaded from multiple sources with the following precedence:

Command-line arguments (highest priority)
Environment variables (OPENAI_API_KEY, etc.)
Config file (./agent-cli-config.toml or ~/.config/agent-cli/config.toml)
Default values (lowest priority)
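
The precedence order above can be sketched as a layered lookup in which the first source that defines a key wins. This is an illustration only: `resolve_config` and the setting names in the usage note are hypothetical, and the sketch assumes each source has already been parsed into a plain dictionary.

```python
from collections import ChainMap
from typing import Any, Dict, Mapping


def _present(values: Mapping[str, Any]) -> Dict[str, Any]:
    """Drop unset (None) entries so they do not mask lower-priority sources."""
    return {key: value for key, value in values.items() if value is not None}


def resolve_config(
    cli: Mapping[str, Any],
    env: Mapping[str, Any],
    config_file: Mapping[str, Any],
    defaults: Mapping[str, Any],
) -> Dict[str, Any]:
    """Merge the four sources; ChainMap returns the first mapping that has a key."""
    layers = ChainMap(_present(cli), _present(env), _present(config_file), dict(defaults))
    return dict(layers)
```

For instance, if the config file sets a hypothetical `llm_provider = "ollama"` and the matching command-line flag is left unset, the file value is used; passing the flag overrides it without editing the file.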

Process Management

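Commands that run as background processes track themselves with PID files in ~/.cache/agent-cli/. Configuration, chat history, and saved audio live separately in ~/.config/agent-cli/: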

~/.cache/agent-cli/
├── assistant.pid
├── chat.pid
├── speak.pid
├── transcribe.pid
├── transcribe-live.pid
└── voice-edit.pid

~/.config/agent-cli/
├── config.toml              # Configuration
├── audio/                   # Saved recordings (transcribe-live)
├── history/                 # Chat history
├── transcriptions/          # Saved WAV files
└── transcriptions.jsonl     # Transcription log
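
The PID file approach can be sketched as follows. This is a hedged illustration of the general technique rather than Agent CLI's implementation: `write_pid` and `read_running_pid` are hypothetical names, and the liveness check with `os.kill(pid, 0)` is POSIX-only (on Windows, signal 0 has a different meaning).

```python
from __future__ import annotations

import os
from pathlib import Path


def write_pid(pid_file: Path) -> None:
    """Record the current process ID, creating the directory if needed."""
    pid_file.parent.mkdir(parents=True, exist_ok=True)
    pid_file.write_text(str(os.getpid()))


def read_running_pid(pid_file: Path) -> int | None:
    """Return the recorded PID if that process is still alive, else None."""
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None
    try:
        os.kill(pid, 0)  # POSIX: signal 0 checks existence without signalling
    except ProcessLookupError:
        pid_file.unlink(missing_ok=True)  # stale file left by a dead process
        return None
    except PermissionError:
        return pid  # the process exists but belongs to another user
    return pid
```

A toggle-style command could call `read_running_pid(Path.home() / ".cache/agent-cli/transcribe.pid")` to decide whether to start a new recording or stop the one already running.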

Memory System

See Memory System Architecture for details on the long-term memory implementation. For usage, see the memory command.

RAG System

See RAG System Architecture for details on the document retrieval system. For usage, see the rag-proxy command.

Dependencies

Agent CLI uses a modular dependency structure. The base package is lightweight, with features installed as optional extras.

Core Dependencies

Always installed:

typer - CLI framework
pydantic - Data validation
rich - Terminal formatting
pyperclip - Clipboard access
httpx - HTTP client

Provider Extras

Install with agent-cli install-extras <name> or pip install "agent-cli[<name>]":

Extra	Purpose	Key Packages
`audio`	Voice features	sounddevice, wyoming, numpy
`llm`	AI processing	pydantic-ai-slim (OpenAI, Gemini)

Feature Extras

Extra	Purpose	Key Packages
`vad`	Voice activity detection	onnxruntime
`rag`	Document chat	chromadb, markitdown
`memory`	Long-term memory	chromadb
`server`	Local ASR/TTS servers	fastapi
`faster-whisper`	Whisper (CUDA/CPU)	faster-whisper
`mlx-whisper`	Whisper (Apple Silicon)	mlx-whisper

See install-extras for the full list and installation instructions.
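
With optional extras, a feature typically checks that its packages are importable before it starts, so a missing extra produces an install hint instead of a raw ImportError. The sketch below shows one way to do that with the standard library; `EXTRA_MODULES`, `missing_modules`, and `require_extra` are hypothetical names, and the module lists are taken from the tables above (only a subset is shown).

```python
from importlib.util import find_spec
from typing import Dict, List, Mapping, Optional

# Import names for some of the extras listed in the tables above.
EXTRA_MODULES: Dict[str, List[str]] = {
    "audio": ["sounddevice", "wyoming", "numpy"],
    "vad": ["onnxruntime"],
    "rag": ["chromadb", "markitdown"],
    "memory": ["chromadb"],
    "server": ["fastapi"],
}


def missing_modules(extra: str, table: Optional[Mapping[str, List[str]]] = None) -> List[str]:
    """List the modules of `extra` that cannot be found, without importing them."""
    modules = (table or EXTRA_MODULES)[extra]
    return [name for name in modules if find_spec(name) is None]


def require_extra(extra: str, table: Optional[Mapping[str, List[str]]] = None) -> None:
    """Raise a readable error with the install command when an extra is incomplete."""
    missing = missing_modules(extra, table)
    if missing:
        raise RuntimeError(
            f"extra {extra!r} is missing {', '.join(missing)}; "
            f"install it with: agent-cli install-extras {extra}"
        )
```

Using `find_spec` instead of importing keeps the check cheap, since heavy packages are located but not loaded until the feature actually runs.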

Platform Support

Platform	Status	Notes
macOS (Apple Silicon)	Full	Metal GPU acceleration
macOS (Intel)	Full	CPU-only
Linux (x86_64)	Full	NVIDIA GPU support
Linux (ARM)	Partial	CPU-only
Windows (WSL2)	Full	Via WSL2
Windows (Native)	Experimental	Limited testing