# Server
Run ASR (Automatic Speech Recognition) and TTS (Text-to-Speech) servers locally with GPU acceleration.
## Usage
### Available Servers
| Server | Description | Default Port |
|---|---|---|
| whisper | Local Whisper ASR server with GPU acceleration and TTL-based memory management | 10301 (HTTP), 10300 (Wyoming) |
| tts | Local TTS server with Kokoro (GPU) or Piper (CPU) backends | 10201 (HTTP), 10200 (Wyoming) |
| transcribe-proxy | Proxy server that forwards to configured ASR providers | 61337 |
### Quick Start
The Whisper ASR server runs at http://localhost:10301 with an OpenAI-compatible API.
The TTS server runs at http://localhost:10201 with an OpenAI-compatible API.
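For a quick smoke test, the sketch below calls both servers over plain HTTP. It assumes the standard OpenAI audio routes (`/v1/audio/transcriptions` and `/v1/audio/speech`); the file names, model names, and voice are placeholders, so substitute whatever your setup actually provides.

```python
"""Minimal client sketch for the local ASR and TTS servers.

Assumes the servers mirror OpenAI's audio routes on the ports above.
File names, model names, and the voice are placeholders.
"""
import requests

ASR_URL = "http://localhost:10301/v1/audio/transcriptions"
TTS_URL = "http://localhost:10201/v1/audio/speech"

# Transcribe a local recording with the Whisper server.
with open("sample.wav", "rb") as audio:
    resp = requests.post(ASR_URL, files={"file": audio}, data={"model": "whisper-1"})
resp.raise_for_status()
print(resp.json()["text"])

# Synthesize speech with the TTS server and save the audio.
resp = requests.post(
    TTS_URL,
    json={
        "model": "tts-1",  # placeholder; use the model your backend exposes
        "input": "Hello from the local TTS server.",
        "voice": "default",  # placeholder voice name
        "response_format": "wav",
    },
)
resp.raise_for_status()
with open("output.wav", "wb") as out:
    out.write(resp.content)
```

Because the API shape follows OpenAI's audio endpoints, any OpenAI-compatible client should work the same way once pointed at these base URLs.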
## Why These Servers?
While Faster Whisper, Piper, and Kokoro are all available as standalone servers, agent-cli's implementations offer unique advantages:
- Dual-protocol from one server - Both the OpenAI-compatible API and the Wyoming protocol run from the same instance. Use the same server for Home Assistant voice pipelines *and* your own scripts and apps (see the handshake sketch after this list).
- TTL-based memory management - Like LlamaSwap, models load on demand and automatically unload after an idle period. Run voice services 24/7 without permanently consuming RAM/VRAM - memory is freed when you're not actively using speech features.
- Multi-platform acceleration - Whisper automatically picks the optimal backend: MLX Whisper on Apple Silicon for native Metal acceleration, and Faster Whisper on Linux/CUDA for GPU acceleration.
- Unified configuration - Consistent CLI interface, environment variables, and Docker setup across all services.
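To illustrate the Wyoming side of the dual-protocol point, here is a small sketch of a `describe` handshake against the Whisper server's Wyoming port. It assumes the usual Wyoming wire format (a JSON header line, optionally followed by a JSON data block) and the default ports from the table above; treat it as an illustration rather than a reference client.

```python
"""Sketch: ask a Wyoming endpoint to describe itself.

Assumes the Wyoming wire format (JSON header line, optional JSON data block)
and the default Wyoming ports above (10300 for ASR, 10200 for TTS).
"""
import json
import socket


def describe(host: str, port: int) -> dict:
    """Send a `describe` event and return the server's info data."""
    with socket.create_connection((host, port), timeout=5) as sock:
        # A describe event carries no data or payload, so a bare header line suffices.
        sock.sendall(b'{"type": "describe"}\n')
        reader = sock.makefile("rb")
        header = json.loads(reader.readline())
        # Larger events ship their data as a separate JSON block after the header.
        data_length = header.get("data_length")
        if data_length:
            return json.loads(reader.read(data_length))
        return header.get("data", {})


if __name__ == "__main__":
    # The same Whisper server instance also serves the HTTP API on port 10301.
    print(json.dumps(describe("localhost", 10300), indent=2))
```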
## Common Features
All servers share these capabilities:
- OpenAI-compatible APIs - Drop-in replacement for OpenAI's audio APIs
- Wyoming protocol - Integration with Home Assistant voice services
- TTL-based memory management - Models unload after idle periods (default: 5 minutes), freeing RAM/VRAM
- Health endpoints - Monitor server status at /health (see the check sketched after this list)
- Interactive docs - Explore APIs at /docs
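A small health-check sketch, assuming each server exposes `GET /health` on its HTTP port as noted above; since the response body isn't specified here, it only inspects the status code.

```python
"""Poll the /health endpoint of each local server."""
import requests

SERVERS = {"whisper": 10301, "tts": 10201, "transcribe-proxy": 61337}

for name, port in SERVERS.items():
    try:
        resp = requests.get(f"http://localhost:{port}/health", timeout=2)
        status = "ok" if resp.ok else f"HTTP {resp.status_code}"
    except requests.RequestException:
        status = "unreachable"
    print(f"{name}: {status}")
```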
## Choosing the Right Server
| Use Case | Recommended |
|---|---|
| Local GPU-accelerated transcription | whisper |
| High-quality GPU TTS | tts --backend kokoro |
| CPU-friendly TTS | tts --backend piper |
| Home Assistant voice integration | whisper + tts (both have Wyoming protocol) |
| iOS Shortcuts integration | transcribe-proxy |
| Forwarding to cloud providers | transcribe-proxy (see the example below) |
| Privacy-focused (no cloud) | whisper + tts |
| Memory-constrained system | Both servers support TTL unloading; use smaller whisper models or tts --backend piper (CPU-only) |
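For the transcribe-proxy rows, the sketch below posts a recording to the proxy's HTTP port. It assumes the proxy shares the OpenAI-compatible transcription route listed under Common Features; the route, field names, and model value are assumptions, and the proxy forwards the request to whichever ASR provider you have configured.

```python
"""Sketch: transcribe a recording through the transcription proxy.

Route, field names, and model value are assumptions; the proxy forwards the
request to the configured ASR provider (Wyoming, OpenAI, or Gemini).
"""
import requests

PROXY_URL = "http://localhost:61337/v1/audio/transcriptions"

# e.g. a voice memo shared from an iOS Shortcut
with open("memo.m4a", "rb") as audio:
    resp = requests.post(PROXY_URL, files={"file": audio}, data={"model": "whisper-1"})
resp.raise_for_status()
print(resp.json()["text"])
```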
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────┐
│                     agent-cli Commands                      │
│   (transcribe, speak, chat, voice-edit, assistant, etc.)    │
└─────────────────────────────┬───────────────────────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│ Whisper Server  │  │ TTS Server      │  │ Transcription   │
│ (ASR)           │  │                 │  │ Proxy           │
│ Port: 10301     │  │ Port: 10201     │  │ Port: 61337     │
│ Wyoming: 10300  │  │ Wyoming: 10200  │  │                 │
└─────────────────┘  └─────────────────┘  └────────┬────────┘
                                                   │
                             ┌─────────────────────┼─────────────────────┐
                             ▼                     ▼                     ▼
                        ┌──────────┐          ┌──────────┐          ┌──────────┐
                        │ Wyoming  │          │ OpenAI   │          │ Gemini   │
                        │ (Local)  │          │ (Cloud)  │          │ (Cloud)  │
                        └──────────┘          └──────────┘          └──────────┘
```