transcribe

Transcribe audio from your microphone to text.

Usage

agent-cli transcribe [OPTIONS]

Description

This command:

  1. Starts listening to your microphone immediately
  2. Records your speech
  3. When you press Ctrl+C, stops recording and finalizes the transcription (Wyoming streams audio live while you speak; OpenAI uploads the recording after you stop)
  4. Copies the transcribed text to your clipboard
  5. Optionally uses an LLM to clean up the transcript (enabled with --llm)

Examples

# Basic transcription
agent-cli transcribe --input-device-index 1

# With LLM cleanup
agent-cli transcribe --input-device-index 1 --llm

# List available audio devices
agent-cli transcribe --list-devices

# Transcribe from a saved file (supports wav, mp3, m4a, ogg, flac, aac, webm)
agent-cli transcribe --from-file recording.wav

# Transcribe an MP3 file with OpenAI
agent-cli transcribe --from-file podcast.mp3 --asr-provider openai

# Transcribe an M4A voice memo with Gemini
agent-cli transcribe --from-file voice_memo.m4a --asr-provider gemini

# Re-transcribe the most recent recording
agent-cli transcribe --last-recording 1

Supported Audio Formats

The audio formats accepted by --from-file depend on the ASR provider:

| Provider | Supported Formats |
|----------|-------------------|
| OpenAI | mp3, mp4, mpeg, mpga, m4a, wav, webm |
| Gemini | wav, mp3, aiff, aac, ogg, flac, m4a |
| Wyoming | Any format (converted via ffmpeg) |

Note

For non-WAV formats with the Wyoming provider, ffmpeg must be installed on your system.
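
The ffmpeg requirement can be checked before a file is handed to the Wyoming provider. The following is a minimal sketch, not part of agent-cli: the helper name needs_conversion and the file name are illustrative assumptions, and the extension test is only a rough stand-in for however agent-cli itself detects WAV input.

```shell
# Hypothetical helper (not part of agent-cli): succeeds (exit status 0) when the
# file extension is not .wav, i.e. when Wyoming would need ffmpeg to convert it.
needs_conversion() {
  case "${1##*.}" in
    wav|WAV) return 1 ;;
    *)       return 0 ;;
  esac
}

file="voice_memo.m4a"  # placeholder file name

# Only complain when conversion is needed AND ffmpeg is not on the PATH.
if needs_conversion "$file" && ! command -v ffmpeg >/dev/null 2>&1; then
  echo "ffmpeg not found: install it before using --from-file $file with Wyoming" >&2
else
  echo "ready: agent-cli transcribe --from-file $file --asr-provider wyoming"
fi
```

The script only prints the command it would run, so it can be tried without a Wyoming server available.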

Options

LLM Configuration

| Option | Default | Description |
|--------|---------|-------------|
| --extra-instructions | - | Extra instructions appended to the LLM cleanup prompt (requires --llm). |
| --llm/--no-llm | false | Clean up transcript with LLM: fix errors, add punctuation, remove filler words. Uses --extra-instructions if set (via CLI or config file). |

Audio Recovery

| Option | Default | Description |
|--------|---------|-------------|
| --from-file | - | Transcribe from audio file instead of microphone. Supports wav, mp3, m4a, ogg, flac, aac, webm. Requires ffmpeg for non-WAV formats with Wyoming. |
| --last-recording | 0 | Re-transcribe a saved recording (1=most recent, 2=second-to-last, etc). Useful after connection failures or to retry with different options. |
| --save-recording/--no-save-recording | true | Save recordings to ~/.cache/agent-cli/ for --last-recording recovery. |
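
As a rough illustration of the "retry with different options" use case, the hypothetical wrapper below re-runs the most recent saved recording against a chosen ASR provider. The function name retry_last and the AGENT_CLI override variable are assumptions made for this sketch; only the flags themselves come from this page.

```shell
# Hypothetical wrapper (not part of agent-cli): re-transcribe the most recent
# saved recording with the given ASR provider (default: wyoming).
# AGENT_CLI is an assumed override so the binary can be swapped out,
# e.g. AGENT_CLI=echo for a dry run that only prints the arguments.
retry_last() {
  provider="${1:-wyoming}"
  # Drop the provider argument so "$@" holds only the extra flags.
  if [ "$#" -gt 0 ]; then shift; fi
  "${AGENT_CLI:-agent-cli}" transcribe --last-recording 1 --asr-provider "$provider" "$@"
}
```

For example, `retry_last openai --llm` would retry the last recording with OpenAI and LLM cleanup, while `AGENT_CLI=echo retry_last gemini` prints the arguments without running anything.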

Provider Selection

| Option | Default | Description |
|--------|---------|-------------|
| --asr-provider | wyoming | The ASR provider to use ('wyoming', 'openai', 'gemini'). |
| --llm-provider | ollama | The LLM provider to use ('ollama', 'openai', 'gemini'). |

Audio Input

| Option | Default | Description |
|--------|---------|-------------|
| --input-device-index | - | Audio input device index (see --list-devices). Uses system default if omitted. |
| --input-device-name | - | Select input device by name substring (e.g., MacBook or USB). |
| --list-devices | false | List available audio devices with their indices and exit. |

Audio Input: Wyoming

| Option | Default | Description |
|--------|---------|-------------|
| --asr-wyoming-ip | localhost | Wyoming ASR server IP address. |
| --asr-wyoming-port | 10300 | Wyoming ASR server port. |

Audio Input: OpenAI-compatible

| Option | Default | Description |
|--------|---------|-------------|
| --asr-openai-model | whisper-1 | The OpenAI model to use for ASR (transcription). |
| --asr-openai-base-url | - | Custom base URL for OpenAI-compatible ASR API (e.g., for custom Whisper server: http://localhost:9898). |
| --asr-openai-prompt | - | Custom prompt to guide transcription (optional). |

Audio Input: Gemini

| Option | Default | Description |
|--------|---------|-------------|
| --asr-gemini-model | gemini-3-flash-preview | The Gemini model to use for ASR (transcription). |

LLM: Ollama

| Option | Default | Description |
|--------|---------|-------------|
| --llm-ollama-model | gemma3:4b | The Ollama model to use. |
| --llm-ollama-host | http://localhost:11434 | The Ollama server host. |

LLM: OpenAI-compatible

| Option | Default | Description |
|--------|---------|-------------|
| --llm-openai-model | gpt-5-mini | The OpenAI model to use for LLM tasks. |
| --openai-api-key | - | Your OpenAI API key. Can also be set with the OPENAI_API_KEY environment variable. |
| --openai-base-url | - | Custom base URL for OpenAI-compatible API (e.g., for llama-server: http://localhost:8080/v1). |

LLM: Gemini

| Option | Default | Description |
|--------|---------|-------------|
| --llm-gemini-model | gemini-3-flash-preview | The Gemini model to use for LLM tasks. |
| --gemini-api-key | - | Your Gemini API key. Can also be set with the GEMINI_API_KEY environment variable. |
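
Because the OpenAI and Gemini keys can be supplied through OPENAI_API_KEY and GEMINI_API_KEY, a wrapper script might confirm the relevant variable is set before starting a recording. The sketch below is a hypothetical preflight check, not an agent-cli feature: has_api_key is an invented name, and it assumes a key is always needed for the openai and gemini providers, which may not hold for a local OpenAI-compatible server reached via --openai-base-url.

```shell
# Hypothetical preflight check (not part of agent-cli): returns 0 when the
# environment variable that the chosen provider reads its key from is non-empty.
has_api_key() {
  case "$1" in
    openai) [ -n "${OPENAI_API_KEY:-}" ] ;;
    gemini) [ -n "${GEMINI_API_KEY:-}" ] ;;
    # No API key option is documented for these providers, so treat them as ready.
    ollama|wyoming) return 0 ;;
    *) echo "unknown provider: $1" >&2; return 2 ;;
  esac
}
```

A script could then guard the call, for instance `has_api_key gemini && agent-cli transcribe --asr-provider gemini`.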

Process Management

| Option | Default | Description |
|--------|---------|-------------|
| --stop | false | Stop any running instance of this command. |
| --status | false | Check if an instance is currently running. |
| --toggle | false | Start if not running, stop if running. Ideal for hotkey binding. |

General Options

| Option | Default | Description |
|--------|---------|-------------|
| --clipboard/--no-clipboard | true | Copy result to clipboard. |
| --log-level | warning | Set logging level. |
| --log-file | - | Path to a file to write logs to. |
| --quiet, -q | false | Suppress console output from rich. |
| --json | false | Output result as JSON (implies --quiet and --no-clipboard). |
| --config | - | Path to a TOML configuration file. |
| --print-args | false | Print the command line arguments, including variables taken from the configuration file. |
| --transcription-log | - | Append transcripts to JSONL file (timestamp, hostname, model, raw/processed text). Recent entries provide context for LLM cleanup. |

Workflow Integration

Toggle Recording Hotkey

The --toggle flag is designed for hotkey integration:

# First press: starts recording
agent-cli transcribe --toggle --input-device-index 1

# Second press: stops recording and transcribes
agent-cli transcribe --toggle

macOS Hotkey (skhd)

cmd + shift + r : /path/to/agent-cli transcribe --toggle --input-device-index 1

Transcription Log

Log all transcriptions with timestamps:

agent-cli transcribe --transcription-log ~/.config/agent-cli/transcriptions.log
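
Since the log is written as JSONL (one JSON object per line), recent entries can be inspected with standard tools. The helper below is a minimal sketch with an invented name, recent_transcripts. It prints whole lines rather than individual fields because the exact JSON key names are not specified on this page.

```shell
# Hypothetical helper (not part of agent-cli): print the last N entries
# (default 5) of a JSONL transcription log, one JSON object per line.
recent_transcripts() {
  log_file="$1"
  count="${2:-5}"
  if [ ! -f "$log_file" ]; then
    echo "no transcription log at $log_file" >&2
    return 1
  fi
  tail -n "$count" "$log_file"
}
```

For example, `recent_transcripts ~/.config/agent-cli/transcriptions.log 3` would show the three most recent entries.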

Tips

  • Use --list-devices to find your microphone's index
  • Enable --llm for cleaner output with proper punctuation
  • Use --last-recording 1 to re-transcribe the most recent recording with different settings