transcribe-proxy

Run a proxy server that forwards transcription requests to your configured ASR provider (Wyoming, OpenAI, or Gemini).

This is useful for integrating agent-cli's transcription capabilities into other applications, such as iOS Shortcuts.

Usage

agent-cli server transcribe-proxy [OPTIONS]

Examples

# Run on default port
agent-cli server transcribe-proxy

# Custom port
agent-cli server transcribe-proxy --port 8080

# Development mode with auto-reload
agent-cli server transcribe-proxy --reload

Options

| Option | Default | Description |
| --- | --- | --- |
| `--host` | `0.0.0.0` | Network interface to bind. Use `0.0.0.0` to listen on all interfaces |
| `--port`, `-p` | `61337` | Port for the HTTP API |
| `--reload` | `false` | Auto-reload on code changes (development only) |

General Options

| Option | Default | Description |
| --- | --- | --- |
| `--log-level` | `info` | Set the logging level |

API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| `/transcribe` | POST | Transcribe an audio file |
| `/health` | GET | Health check |
| `/docs` | GET | Interactive API documentation |

POST /transcribe

Transcribe an audio file with optional LLM post-processing.

Request Parameters (multipart/form-data):

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `audio` | file | required | Audio file (wav, mp3, m4a, ogg, flac, aac, webm) |
| `cleanup` | boolean | `true` | Whether to apply LLM post-processing to clean up the transcript |
| `extra_instructions` | string | - | Additional instructions for the LLM cleanup (appended to any config-file instructions) |

Disabling LLM Post-Processing:

If you only need raw transcription without LLM cleanup (e.g., for simple note-taking or environments without LLM infrastructure), pass cleanup=false:

curl -X POST http://localhost:61337/transcribe \
  -F "audio=@recording.wav" \
  -F "cleanup=false"

This skips the LLM step entirely, reducing latency and removing the LLM dependency for that request.

Response:

{
  "raw_transcript": "the original transcription",
  "cleaned_transcript": "The cleaned transcription.",
  "success": true,
  "error": null
}

| Field | Type | Description |
| --- | --- | --- |
| `raw_transcript` | string | The raw transcription from the ASR provider |
| `cleaned_transcript` | string or null | The LLM-cleaned transcript, or null if `cleanup=false` |
| `success` | boolean | Whether the transcription succeeded |
| `error` | string or null | Error message if something went wrong |
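
A caller usually just wants a single string back. Here is a minimal sketch of unpacking the response shown above, preferring the cleaned transcript and falling back to the raw one when cleanup was disabled (the `best_transcript` helper name is ours, not part of agent-cli):

```python
import json

def best_transcript(response_body: str) -> str:
    """Pick the most useful transcript from a /transcribe response.

    Prefers the LLM-cleaned text; falls back to the raw ASR output
    when cleanup was disabled (cleaned_transcript is null).
    """
    data = json.loads(response_body)
    if not data["success"]:
        raise RuntimeError(data["error"] or "transcription failed")
    return data["cleaned_transcript"] or data["raw_transcript"]

# Sample response body from the docs above:
body = (
    '{"raw_transcript": "the original transcription",'
    ' "cleaned_transcript": "The cleaned transcription.",'
    ' "success": true, "error": null}'
)
print(best_transcript(body))  # → The cleaned transcription.
```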

How It Works

The transcription proxy acts as a bridge between client applications and your configured ASR provider:

┌─────────────────┐     ┌─────────────────────┐     ┌─────────────────┐
│  iOS Shortcut   │────▶│  Transcription      │────▶│  ASR Provider   │
│  or other app   │     │  Proxy (:61337)     │     │  (configured)   │
└─────────────────┘     └─────────────────────┘     └─────────────────┘
                                                             │
                                 ┌───────────────────────────┼───────────────────────────┐
                                 ▼                           ▼                           ▼
                            ┌──────────┐                ┌──────────┐                ┌──────────┐
                            │ Wyoming  │                │  OpenAI  │                │  Gemini  │
                            │ (Local)  │                │  (Cloud) │                │  (Cloud) │
                            └──────────┘                └──────────┘                └──────────┘

The proxy reads your agent-cli configuration to determine which ASR provider to use, then forwards transcription requests accordingly.

Use Cases

iOS Shortcuts Integration

The proxy provides a simple HTTP endpoint that iOS Shortcuts can call to transcribe audio. See the iOS Shortcut Guide for setup instructions.

Custom Applications

Any application that can make HTTP requests can use the proxy to access transcription services without needing to implement provider-specific logic.

Installation

Requires the server extra:

pip install "agent-cli[server]"
# or
uv sync --extra server

Docker

Run using the published container image:

# Run standalone
docker run -p 61337:61337 ghcr.io/basnijholt/agent-cli-transcribe-proxy:latest

# Or use docker-compose (included in both cuda and cpu profiles)
docker compose -f docker/docker-compose.yml --profile cpu up transcribe-proxy

Environment Variables

Configure the proxy using environment variables (priority: env var > config file > default):

| Variable | Default | Description |
| --- | --- | --- |
| `ASR_PROVIDER` | `wyoming` | ASR provider: `wyoming`, `openai`, `gemini` |
| `ASR_WYOMING_IP` | `localhost` | Wyoming ASR server hostname/IP |
| `ASR_WYOMING_PORT` | `10300` | Wyoming ASR server port |
| `ASR_OPENAI_MODEL` | `whisper-1` | OpenAI ASR model name |
| `ASR_OPENAI_BASE_URL` | - | Custom OpenAI-compatible ASR endpoint |
| `ASR_OPENAI_PROMPT` | - | Custom prompt to guide transcription |
| `ASR_GEMINI_MODEL` | `gemini-3-flash-preview` | Gemini ASR model name |
| `LLM_PROVIDER` | `ollama` | LLM provider: `ollama`, `openai`, `gemini` |
| `LLM_OLLAMA_MODEL` | `gemma3:4b` | Ollama model name |
| `LLM_OLLAMA_HOST` | `http://localhost:11434` | Ollama server URL |
| `LLM_OPENAI_MODEL` | `gpt-5-mini` | OpenAI model name |
| `LLM_GEMINI_MODEL` | `gemini-3-flash-preview` | Gemini model name |
| `TTS_PROVIDER` | `wyoming` | TTS provider: `wyoming`, `openai`, `kokoro`, `gemini` |
| `LOG_LEVEL` | `info` | Logging level: `debug`, `info`, `warning`, `error` |
| `OPENAI_API_KEY` | - | OpenAI API key |
| `OPENAI_BASE_URL` | - | Custom OpenAI-compatible API base URL |
| `GEMINI_API_KEY` | - | Gemini API key |
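
The stated precedence (environment variable > config file > default) can be sketched as follows; `resolve_setting` is an illustrative helper, not agent-cli's actual internal API:

```python
import os

def resolve_setting(env_name: str, config: dict, config_key: str, default: str) -> str:
    """Resolve one setting with the documented precedence:
    environment variable > config file > built-in default."""
    env_value = os.environ.get(env_name)
    if env_value is not None:
        return env_value
    return config.get(config_key, default)

# With no env var set and an empty config, the built-in default wins:
os.environ.pop("ASR_PROVIDER", None)
print(resolve_setting("ASR_PROVIDER", {}, "asr_provider", "wyoming"))  # → wyoming
```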

Example pointing the proxy at a Wyoming ASR container (e.g. the agent-cli-whisper service from the Docker Compose setup):

docker run -p 61337:61337 \
  -e ASR_WYOMING_IP=agent-cli-whisper \
  -e ASR_WYOMING_PORT=10300 \
  ghcr.io/basnijholt/agent-cli-transcribe-proxy:latest

Custom Config File

To use a custom config, mount it as a volume:

docker run -p 61337:61337 \
  -v ./config.toml:/home/transcribe/.config/agent-cli/config.toml:ro \
  ghcr.io/basnijholt/agent-cli-transcribe-proxy:latest