# transcribe-proxy
Run a proxy server that forwards transcription requests to your configured ASR provider (Wyoming, OpenAI, or Gemini).
This is useful for integrating agent-cli's transcription capabilities into other applications, such as iOS Shortcuts.
## Usage
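```
agent-cli server transcribe-proxy [OPTIONS]
```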
### Examples
```bash
# Run on default port
agent-cli server transcribe-proxy

# Custom port
agent-cli server transcribe-proxy --port 8080

# Development mode with auto-reload
agent-cli server transcribe-proxy --reload
```
## Options

### Options
| Option | Default | Description |
|---|---|---|
| `--host` | `0.0.0.0` | Network interface to bind. Use `0.0.0.0` for all interfaces |
| `--port`, `-p` | `61337` | Port for the HTTP API |
| `--reload` | `false` | Auto-reload on code changes (development only) |
### General Options
| Option | Default | Description |
|---|---|---|
| `--log-level` | `info` | Set the logging level |
## API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/transcribe` | POST | Transcribe an audio file |
| `/health` | GET | Health check |
| `/docs` | GET | Interactive API documentation |
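A quick way to verify the proxy is up (assuming it runs locally on the default port):

```bash
curl http://localhost:61337/health
```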
### POST /transcribe
Transcribe an audio file with optional LLM post-processing.
**Request Parameters (multipart/form-data):**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `audio` | file | required | Audio file (`wav`, `mp3`, `m4a`, `ogg`, `flac`, `aac`, `webm`) |
| `cleanup` | boolean | `true` | Whether to apply LLM post-processing to clean up the transcript |
| `extra_instructions` | string | - | Additional instructions for the LLM cleanup (appended to any config-file instructions) |
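For example, a complete request with curl (the host, port, and `recording.wav` filename are illustrative; `extra_instructions` is optional):

```bash
curl -X POST http://localhost:61337/transcribe \
  -F "audio=@recording.wav" \
  -F "extra_instructions=Format any numbers as digits."
```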
**Disabling LLM Post-Processing:**

If you only need the raw transcription without LLM cleanup (e.g., for simple note-taking or environments without LLM infrastructure), pass `cleanup=false`:
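```bash
# Assumes the proxy runs locally on the default port
curl -X POST http://localhost:61337/transcribe \
  -F "audio=@recording.wav" \
  -F "cleanup=false"
```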
This skips the LLM step entirely, reducing latency and removing the LLM dependency for that request.
**Response:**

```json
{
  "raw_transcript": "the original transcription",
  "cleaned_transcript": "The cleaned transcription.",
  "success": true,
  "error": null
}
```
| Field | Type | Description |
|---|---|---|
| `raw_transcript` | string | The raw transcription from the ASR provider |
| `cleaned_transcript` | string or null | The LLM-cleaned transcript, or `null` if `cleanup=false` |
| `success` | boolean | Whether the transcription succeeded |
| `error` | string or null | Error message if something went wrong |
## How It Works
The transcription proxy acts as a bridge between client applications and your configured ASR provider:
```
┌─────────────────┐     ┌─────────────────────┐     ┌─────────────────┐
│  iOS Shortcut   │────▶│   Transcription     │────▶│  ASR Provider   │
│  or other app   │     │   Proxy (:61337)    │     │  (configured)   │
└─────────────────┘     └─────────────────────┘     └─────────────────┘
                                   │
       ┌───────────────────────────┼───────────────────────────┐
       ▼                           ▼                           ▼
  ┌──────────┐                ┌──────────┐                ┌──────────┐
  │ Wyoming  │                │  OpenAI  │                │  Gemini  │
  │ (Local)  │                │ (Cloud)  │                │ (Cloud)  │
  └──────────┘                └──────────┘                └──────────┘
```
The proxy reads your agent-cli configuration to determine which ASR provider to use, then forwards transcription requests accordingly.
## Use Cases
### iOS Shortcuts Integration
The proxy provides a simple HTTP endpoint that iOS Shortcuts can call to transcribe audio. See the iOS Shortcut Guide for setup instructions.
### Custom Applications
Any application that can make HTTP requests can use the proxy to access transcription services without needing to implement provider-specific logic.
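For example, a small shell pipeline could transcribe a file and extract just the cleaned text from the JSON response (a sketch assuming `jq` is installed and the proxy runs locally on the default port):

```bash
# Post an audio file and print only the cleaned transcript
curl -s -X POST http://localhost:61337/transcribe \
  -F "audio=@meeting.m4a" \
  | jq -r '.cleaned_transcript'
```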
## Installation

Requires the `server` extra:
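For example, with pip (assuming you are installing agent-cli from PyPI):

```bash
pip install "agent-cli[server]"
```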
## Docker
Run using the published container image:
```bash
# Run standalone
docker run -p 61337:61337 ghcr.io/basnijholt/agent-cli-transcribe-proxy:latest

# Or use docker-compose (included in both cuda and cpu profiles)
docker compose -f docker/docker-compose.yml --profile cpu up transcribe-proxy
```
## Environment Variables
Configure the proxy using environment variables (priority: env var > config file > default):
| Variable | Default | Description |
|---|---|---|
| `ASR_PROVIDER` | `wyoming` | ASR provider: `wyoming`, `openai`, `gemini` |
| `ASR_WYOMING_IP` | `localhost` | Wyoming ASR server hostname/IP |
| `ASR_WYOMING_PORT` | `10300` | Wyoming ASR server port |
| `ASR_OPENAI_MODEL` | `whisper-1` | OpenAI ASR model name |
| `ASR_OPENAI_BASE_URL` | - | Custom OpenAI-compatible ASR endpoint |
| `ASR_OPENAI_PROMPT` | - | Custom prompt to guide transcription |
| `ASR_GEMINI_MODEL` | `gemini-3-flash-preview` | Gemini ASR model name |
| `LLM_PROVIDER` | `ollama` | LLM provider: `ollama`, `openai`, `gemini` |
| `LLM_OLLAMA_MODEL` | `gemma3:4b` | Ollama model name |
| `LLM_OLLAMA_HOST` | `http://localhost:11434` | Ollama server URL |
| `LLM_OPENAI_MODEL` | `gpt-5-mini` | OpenAI model name |
| `LLM_GEMINI_MODEL` | `gemini-3-flash-preview` | Gemini model name |
| `TTS_PROVIDER` | `wyoming` | TTS provider: `wyoming`, `openai`, `kokoro`, `gemini` |
| `LOG_LEVEL` | `info` | Logging level: `debug`, `info`, `warning`, `error` |
| `OPENAI_API_KEY` | - | OpenAI API key |
| `OPENAI_BASE_URL` | - | Custom OpenAI-compatible API base URL |
| `GEMINI_API_KEY` | - | Gemini API key |
Example connecting the proxy to a Wyoming ASR service running in Docker Compose (e.g., the `agent-cli-whisper` container):

```bash
docker run -p 61337:61337 \
  -e ASR_WYOMING_IP=agent-cli-whisper \
  -e ASR_WYOMING_PORT=10300 \
  ghcr.io/basnijholt/agent-cli-transcribe-proxy:latest
```
## Custom Config File
To use a custom config, mount it as a volume:
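For example, a sketch assuming the container reads its configuration from `/root/.config/agent-cli/config.toml` (adjust both paths to match your setup):

```bash
docker run -p 61337:61337 \
  -v ~/.config/agent-cli/config.toml:/root/.config/agent-cli/config.toml:ro \
  ghcr.io/basnijholt/agent-cli-transcribe-proxy:latest
```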