server

Run ASR (Automatic Speech Recognition) and TTS (Text-to-Speech) servers locally with GPU acceleration.

Usage

agent-cli server [COMMAND] [OPTIONS]

Available Servers

Server            Description                                                                     Default Port
whisper           Local Whisper ASR server with GPU acceleration and TTL-based memory management  10301 (HTTP), 10300 (Wyoming)
tts               Local TTS server with Kokoro (GPU) or Piper (CPU) backends                      10201 (HTTP), 10200 (Wyoming)
transcribe-proxy  Proxy server that forwards to configured ASR providers                          61337

Quick Start

Whisper (ASR):

pip install "agent-cli[faster-whisper]"
agent-cli server whisper

The server runs at http://localhost:10301 with an OpenAI-compatible API.
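Because the API is OpenAI-compatible, any HTTP client can talk to it. A minimal stdlib-only sketch, assuming the server mirrors OpenAI's /v1/audio/transcriptions route and form-field names (check /docs on the running server for the exact schema):

```python
# Sketch: send an audio file to the local whisper server.
# The route and field names ("model", "file") are assumptions based on
# "OpenAI-compatible" -- verify them against /docs on your server.
import json
import urllib.request

WHISPER_URL = "http://localhost:10301/v1/audio/transcriptions"


def build_multipart(model: str, filename: str, audio: bytes,
                    boundary: str = "agent-cli-example") -> tuple[bytes, str]:
    """Encode the model name and audio bytes as multipart/form-data."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n{model}\r\n'
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + audio + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, model: str = "whisper-1") -> str:
    """POST the file and return the transcribed text."""
    with open(path, "rb") as f:
        body, content_type = build_multipart(model, path, f.read())
    req = urllib.request.Request(
        WHISPER_URL, data=body, headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]
```

The official openai Python client works the same way: point its base_url at http://localhost:10301/v1 and call the audio transcription method as usual.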

TTS:

pip install "agent-cli[kokoro]"
agent-cli server tts --backend kokoro

The server runs at http://localhost:10201 with an OpenAI-compatible API.
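Synthesis works the same way over HTTP. A stdlib-only sketch, assuming the server mirrors OpenAI's /v1/audio/speech route; the model and voice names used here are placeholders, not values confirmed by this page:

```python
# Sketch: request speech from the local TTS server.
# Route shape follows OpenAI's speech endpoint; "kokoro" and "af_sky"
# are assumed example values -- see /docs on the running server.
import json
import urllib.request

TTS_URL = "http://localhost:10201/v1/audio/speech"


def build_speech_payload(text: str, voice: str = "af_sky",
                         model: str = "kokoro") -> bytes:
    """JSON body in the shape of OpenAI's speech endpoint."""
    return json.dumps({"model": model, "voice": voice, "input": text}).encode()


def speak(text: str, out_path: str = "out.wav") -> None:
    """POST the text and write the returned audio bytes to disk."""
    req = urllib.request.Request(
        TTS_URL,
        data=build_speech_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```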

Transcription proxy:

pip install "agent-cli[server]"
agent-cli server transcribe-proxy

The proxy runs at http://localhost:61337, forwarding to your configured ASR provider.

Why These Servers?

While Faster Whisper, Piper, and Kokoro are all available as standalone servers, agent-cli's implementations offer unique advantages:

  1. Dual-protocol from one server - Both OpenAI-compatible API and Wyoming protocol run from the same instance. Use the same server for Home Assistant voice pipelines AND your scripts/apps.

  2. TTL-based memory management - Like LlamaSwap, models load on-demand and automatically unload after idle periods. Run voice services 24/7 without permanently consuming RAM/VRAM - memory is freed when you're not actively using speech features.

  3. Multi-platform acceleration - Whisper automatically uses the optimal backend: MLX Whisper on Apple Silicon for native Metal acceleration, Faster Whisper on Linux/CUDA for GPU acceleration.

  4. Unified configuration - Consistent CLI interface, environment variables, and Docker setup across all services.
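The TTL pattern in point 2 can be sketched in a few lines. This is illustrative only: the class and method names are invented for the example and are not agent-cli's actual implementation.

```python
# Sketch of TTL-based memory management: load a model on demand and
# free it after an idle period, as described above. Names are invented.
import threading
import time


class TTLModel:
    """Lazily load a model; release it after `ttl` idle seconds."""

    def __init__(self, loader, ttl: float = 300.0):  # default mirrors the 5-minute TTL
        self._loader = loader      # callable that loads the model into RAM/VRAM
        self._ttl = ttl
        self._model = None
        self._last_used = 0.0
        self._lock = threading.Lock()

    def get(self):
        """Return the model, loading it on first use; refresh the idle clock."""
        with self._lock:
            if self._model is None:
                self._model = self._loader()
            self._last_used = time.monotonic()
            return self._model

    def maybe_unload(self) -> None:
        """Called periodically (e.g. by a background timer thread)."""
        with self._lock:
            if self._model is not None and time.monotonic() - self._last_used > self._ttl:
                self._model = None  # drop the reference so RAM/VRAM can be freed
```

A background thread calling maybe_unload() every few seconds is enough to keep a 24/7 service from pinning memory while idle.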

Common Features

All servers share these capabilities:

  • OpenAI-compatible APIs - Drop-in replacement for OpenAI's audio APIs
  • Wyoming protocol - Integration with Home Assistant voice services
  • TTL-based memory management - Models unload after idle periods (default: 5 minutes), freeing RAM/VRAM
  • Health endpoints - Monitor server status at /health
  • Interactive docs - Explore APIs at /docs
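The /health endpoint makes scripted startup easy. A small polling helper; the path comes from the list above, but treating any 200 response as "ready" is an assumption about the endpoint's behavior:

```python
# Sketch: block until a server's /health endpoint answers, so scripts
# can start the server and wait before sending audio.
import time
import urllib.error
import urllib.request


def health_url(base: str) -> str:
    """Join a server base URL with the /health path."""
    return base.rstrip("/") + "/health"


def wait_until_healthy(base: str, timeout: float = 30.0) -> bool:
    """Poll GET /health until it returns 200 or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(health_url(base), timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry
        time.sleep(0.5)
    return False
```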

Choosing the Right Server

Use Case                             Recommended
Local GPU-accelerated transcription  whisper
High-quality GPU TTS                 tts --backend kokoro
CPU-friendly TTS                     tts --backend piper
Home Assistant voice integration     whisper + tts (both support the Wyoming protocol)
iOS Shortcuts integration            transcribe-proxy
Forwarding to cloud providers        transcribe-proxy
Privacy-focused (no cloud)           whisper + tts
Memory-constrained system            smaller whisper models or tts --backend piper (CPU-only); all servers support TTL unloading

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                     agent-cli Commands                       │
│  (transcribe, speak, chat, voice-edit, assistant, etc.)     │
└─────────────────────────────┬───────────────────────────────┘
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│  Whisper Server │  │   TTS Server    │  │ Transcription   │
│  (ASR)          │  │                 │  │ Proxy           │
│  Port: 10301    │  │  Port: 10201    │  │ Port: 61337     │
│  Wyoming: 10300 │  │  Wyoming: 10200 │  │                 │
└─────────────────┘  └─────────────────┘  └────────┬────────┘
                              ┌─────────────────────┼─────────────────────┐
                              ▼                     ▼                     ▼
                        ┌──────────┐          ┌──────────┐          ┌──────────┐
                        │  Wyoming │          │  OpenAI  │          │  Gemini  │
                        │  (Local) │          │  (Cloud) │          │  (Cloud) │
                        └──────────┘          └──────────┘          └──────────┘