transcribe-proxy

Run a proxy server that forwards transcription requests to your configured ASR provider (Wyoming, OpenAI, or Gemini).

This is useful for integrating agent-cli's transcription capabilities into other applications, such as iOS Shortcuts.

Usage

agent-cli server transcribe-proxy [OPTIONS]

Examples

# Run on default port
agent-cli server transcribe-proxy

# Custom port
agent-cli server transcribe-proxy --port 8080

# Development mode with auto-reload
agent-cli server transcribe-proxy --reload

Options

| Option | Default | Description |
| --- | --- | --- |
| `--host` | `0.0.0.0` | Network interface to bind. Use `0.0.0.0` to listen on all interfaces |
| `--port`, `-p` | `61337` | Port for the HTTP API |
| `--reload` | `false` | Auto-reload on code changes (development only) |

General Options

| Option | Default | Description |
| --- | --- | --- |
| `--log-level` | `info` | Set the logging level |

API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| `/transcribe` | POST | Transcribe an audio file |
| `/health` | GET | Health check |
| `/docs` | GET | Interactive API documentation |

POST /transcribe

Transcribe an audio file with optional LLM post-processing.

Request Parameters (multipart/form-data):

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `audio` | file | required | Audio file (wav, mp3, m4a, ogg, flac, aac, webm) |
| `cleanup` | boolean | `true` | Whether to apply LLM post-processing to clean up the transcript |
| `extra_instructions` | string | - | Additional instructions for the LLM cleanup (appended to any config-file instructions) |

Disabling LLM Post-Processing:

If you only need raw transcription without LLM cleanup (e.g., for simple note-taking or environments without LLM infrastructure), pass cleanup=false:

curl -X POST http://localhost:61337/transcribe \
  -F "audio=@recording.wav" \
  -F "cleanup=false"

This skips the LLM step entirely, reducing latency and removing the LLM dependency for that request.

Response:

{
  "raw_transcript": "the original transcription",
  "cleaned_transcript": "The cleaned transcription.",
  "success": true,
  "error": null
}

| Field | Type | Description |
| --- | --- | --- |
| `raw_transcript` | string | The raw transcription from the ASR provider |
| `cleaned_transcript` | string or null | The LLM-cleaned transcript, or null if `cleanup=false` |
| `success` | boolean | Whether the transcription succeeded |
| `error` | string or null | Error message if something went wrong |
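
A caller usually just wants a single string back. Here is a minimal sketch of unpacking the response shown above, preferring the cleaned transcript and falling back to the raw one when cleanup was disabled (the `best_transcript` helper name is ours, not part of agent-cli):

```python
import json

def best_transcript(response_body: str) -> str:
    """Pick the most useful transcript from a /transcribe response.

    Prefers the LLM-cleaned text; falls back to the raw ASR output
    when cleanup was disabled (cleaned_transcript is null).
    """
    data = json.loads(response_body)
    if not data["success"]:
        raise RuntimeError(data["error"] or "transcription failed")
    return data["cleaned_transcript"] or data["raw_transcript"]

# Sample response body from the docs above:
body = (
    '{"raw_transcript": "the original transcription",'
    ' "cleaned_transcript": "The cleaned transcription.",'
    ' "success": true, "error": null}'
)
print(best_transcript(body))  # → The cleaned transcription.
```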

How It Works

The transcription proxy acts as a bridge between client applications and your configured ASR provider:

┌─────────────────┐     ┌─────────────────────┐     ┌─────────────────┐
│  iOS Shortcut   │────▶│  Transcription      │────▶│  ASR Provider   │
│  or other app   │     │  Proxy (:61337)     │     │  (configured)   │
└─────────────────┘     └─────────────────────┘     └─────────────────┘
                                                             │
                                 ┌───────────────────────────┼───────────────────────────┐
                                 ▼                           ▼                           ▼
                            ┌──────────┐                ┌──────────┐                ┌──────────┐
                            │ Wyoming  │                │  OpenAI  │                │  Gemini  │
                            │ (Local)  │                │  (Cloud) │                │  (Cloud) │
                            └──────────┘                └──────────┘                └──────────┘

The proxy reads your agent-cli configuration to determine which ASR provider to use, then forwards transcription requests accordingly.

Use Cases

iOS Shortcuts Integration

The proxy provides a simple HTTP endpoint that iOS Shortcuts can call to transcribe audio. See the iOS Shortcut Guide for setup instructions.

Custom Applications

Any application that can make HTTP requests can use the proxy to access transcription services without needing to implement provider-specific logic.

Installation

Requires the server extra:

pip install "agent-cli[server]"
# or
uv sync --extra server

Docker

Run using the published container image:

# Run standalone
docker run -p 61337:61337 ghcr.io/basnijholt/agent-cli-transcribe-proxy:latest

# Or use docker-compose (included in both cuda and cpu profiles)
docker compose -f docker/docker-compose.yml --profile cpu up transcribe-proxy

Environment Variables

Configure the proxy using environment variables (priority: env var > config file > default):

| Variable | Default | Description |
| --- | --- | --- |
| `ASR_PROVIDER` | `wyoming` | ASR provider: `wyoming`, `openai`, `gemini` |
| `ASR_WYOMING_IP` | `localhost` | Wyoming ASR server hostname/IP |
| `ASR_WYOMING_PORT` | `10300` | Wyoming ASR server port |
| `ASR_OPENAI_MODEL` | `whisper-1` | OpenAI ASR model name |
| `ASR_OPENAI_BASE_URL` | - | Custom OpenAI-compatible ASR endpoint |
| `ASR_OPENAI_PROMPT` | - | Custom prompt to guide transcription |
| `ASR_GEMINI_MODEL` | `gemini-3-flash-preview` | Gemini ASR model name |
| `LLM_PROVIDER` | `ollama` | LLM provider: `ollama`, `openai`, `gemini` |
| `LLM_OLLAMA_MODEL` | `gemma3:4b` | Ollama model name |
| `LLM_OLLAMA_HOST` | `http://localhost:11434` | Ollama server URL |
| `LLM_OPENAI_MODEL` | `gpt-5-mini` | OpenAI model name |
| `LLM_GEMINI_MODEL` | `gemini-3-flash-preview` | Gemini model name |
| `TTS_PROVIDER` | `wyoming` | TTS provider: `wyoming`, `openai`, `kokoro`, `gemini` |
| `LOG_LEVEL` | `info` | Logging level: `debug`, `info`, `warning`, `error` |
| `OPENAI_API_KEY` | - | OpenAI API key |
| `OPENAI_BASE_URL` | - | Custom OpenAI-compatible API base URL |
| `GEMINI_API_KEY` | - | Gemini API key |
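
The stated precedence (environment variable > config file > default) can be sketched as follows; `resolve_setting` is an illustrative helper, not agent-cli's actual internal API:

```python
import os

def resolve_setting(env_name: str, config: dict, config_key: str, default: str) -> str:
    """Resolve one setting with the documented precedence:
    environment variable > config file > built-in default."""
    env_value = os.environ.get(env_name)
    if env_value is not None:
        return env_value
    return config.get(config_key, default)

# With no env var set and an empty config, the built-in default wins:
os.environ.pop("ASR_PROVIDER", None)
print(resolve_setting("ASR_PROVIDER", {}, "asr_provider", "wyoming"))  # → wyoming
```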

Example pointing the proxy at a Wyoming ASR container (e.g. the agent-cli-whisper service from the Docker Compose setup):

docker run -p 61337:61337 \
  -e ASR_WYOMING_IP=agent-cli-whisper \
  -e ASR_WYOMING_PORT=10300 \
  ghcr.io/basnijholt/agent-cli-transcribe-proxy:latest

Custom Config File

To use a custom config, mount it as a volume:

docker run -p 61337:61337 \
  -v ./config.toml:/home/transcribe/.config/agent-cli/config.toml:ro \
  ghcr.io/basnijholt/agent-cli-transcribe-proxy:latest