transcribe-live

Continuous live transcription with voice activity detection (VAD).

Usage

agent-cli transcribe-live [OPTIONS]

Description

Runs continuously, listening to your microphone and automatically segmenting speech using voice activity detection:

  1. Starts listening immediately
  2. Detects when you start and stop speaking
  3. Automatically transcribes each speech segment
  4. Logs results with timestamps
  5. Optionally saves audio as MP3 files

Press Ctrl+C to stop.

Segments shorter than 0.3s are discarded even if --min-segment is set lower. Saving MP3 files requires FFmpeg; if it's not available, audio saving is disabled with a warning.
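
If you rely on MP3 saving, you can check up front that FFmpeg is on your PATH; this is a plain shell check, not part of agent-cli itself:

# Verify FFmpeg is available for MP3 output
command -v ffmpeg >/dev/null && echo "ffmpeg found" || echo "ffmpeg missing: audio saving will be disabled"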

Installation

Requires the vad extra:

uv tool install "agent-cli[vad]" -p 3.13
# or
pip install "agent-cli[vad]"

Examples

# Basic daemon
agent-cli transcribe-live

# With custom role
agent-cli transcribe-live --role meeting

# With LLM cleanup
agent-cli transcribe-live --llm

# Custom silence threshold
agent-cli transcribe-live --silence-threshold 1.5

Options

Options

Option Default Description
--role, -r user Label for log entries. Use to distinguish speakers or contexts in logs.
--silence-threshold, -s 1.0 Seconds of silence after speech to finalize a segment. Increase for slower speakers.
--min-segment, -m 0.25 Minimum seconds of speech required before a segment is processed. Filters brief sounds.
--vad-threshold 0.3 Silero VAD confidence threshold (0.0-1.0). Higher values require clearer speech; lower values are more sensitive to quiet/distant voices.
--save-audio/--no-save-audio true Save each speech segment as MP3. Requires ffmpeg to be installed.
--audio-dir - Base directory for MP3 files. Files are organized by date: YYYY/MM/DD/HHMMSS_mmm.mp3. Default: ~/.config/agent-cli/audio.
--transcription-log, -t - JSONL file for transcript logging (one JSON object per line with timestamp, role, raw/processed text, audio path). Default: ~/.config/agent-cli/transcriptions.jsonl.
--clipboard/--no-clipboard false Copy each completed transcription to clipboard (overwrites previous). Useful with --llm to get cleaned text.
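
For example, to keep a meeting's recordings and transcript log in a dedicated location (the paths below are illustrative):

# Custom audio directory and transcript log for one project
agent-cli transcribe-live --role meeting --audio-dir ~/recordings/meetings --transcription-log ~/recordings/meetings/transcripts.jsonl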

Provider Selection

Option Default Description
--asr-provider wyoming The ASR provider to use ('wyoming', 'openai', 'gemini').
--llm-provider ollama The LLM provider to use ('ollama', 'openai', 'gemini').
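
Providers can be mixed; a sketch, assuming the relevant API keys are already configured, that transcribes with OpenAI and cleans up with Gemini:

# OpenAI for ASR, Gemini for LLM cleanup
agent-cli transcribe-live --asr-provider openai --llm --llm-provider gemini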

Audio Input

Option Default Description
--input-device-index - Audio input device index (see --list-devices). Uses system default if omitted.
--input-device-name - Select input device by name substring (e.g., MacBook or USB).
--list-devices false List available audio devices with their indices and exit.
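
To pick a specific microphone, list the available devices first, then select one by index or by a name substring (the substring below is just an example):

# Find available input devices
agent-cli transcribe-live --list-devices

# Select a device by name substring
agent-cli transcribe-live --input-device-name "USB"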

Audio Input: Wyoming

Option Default Description
--asr-wyoming-ip localhost Wyoming ASR server IP address.
--asr-wyoming-port 10300 Wyoming ASR server port.
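
If your Wyoming ASR server runs on another machine, point the command at it (the IP address shown is illustrative):

# Use a Wyoming ASR server on the local network
agent-cli transcribe-live --asr-wyoming-ip 192.168.1.50 --asr-wyoming-port 10300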

Audio Input: OpenAI-compatible

Option Default Description
--asr-openai-model whisper-1 The OpenAI model to use for ASR (transcription).
--asr-openai-base-url - Custom base URL for OpenAI-compatible ASR API (e.g., for custom Whisper server: http://localhost:9898).
--asr-openai-prompt - Custom prompt to guide transcription (optional).
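
For a self-hosted, OpenAI-compatible Whisper server, set the base URL and optionally a prompt to bias the transcription (URL and prompt text are illustrative):

# Point ASR at a local OpenAI-compatible Whisper server
agent-cli transcribe-live --asr-provider openai --asr-openai-base-url http://localhost:9898 --asr-openai-prompt "Transcript of a software engineering meeting."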

Audio Input: Gemini

Option Default Description
--asr-gemini-model gemini-3-flash-preview The Gemini model to use for ASR (transcription).

LLM: Ollama

Option Default Description
--llm-ollama-model gemma3:4b The Ollama model to use.
--llm-ollama-host http://localhost:11434 The Ollama server host.
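
To clean up transcripts with a different local Ollama model (the model tag below is an example; any model pulled into Ollama works):

# LLM cleanup with a specific Ollama model
agent-cli transcribe-live --llm --llm-ollama-model llama3.2:3b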

LLM: OpenAI-compatible

Option Default Description
--llm-openai-model gpt-5-mini The OpenAI model to use for LLM tasks.
--openai-api-key - Your OpenAI API key. Can also be set with the OPENAI_API_KEY environment variable.
--openai-base-url - Custom base URL for OpenAI-compatible API (e.g., for llama-server: http://localhost:8080/v1).
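
A sketch for routing LLM cleanup to a local llama-server, or any other OpenAI-compatible endpoint; the model name is a placeholder for whatever your server exposes:

# LLM cleanup via a local OpenAI-compatible server
agent-cli transcribe-live --llm --llm-provider openai --openai-base-url http://localhost:8080/v1 --llm-openai-model my-local-model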

LLM: Gemini

Option Default Description
--llm-gemini-model gemini-3-flash-preview The Gemini model to use for LLM tasks.
--gemini-api-key - Your Gemini API key. Can also be set with the GEMINI_API_KEY environment variable.
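
To run both transcription and cleanup through Gemini, export your API key and select the Gemini providers (a sketch; how you manage the key is up to you):

# Gemini for both ASR and LLM cleanup
export GEMINI_API_KEY="..."
agent-cli transcribe-live --asr-provider gemini --llm --llm-provider gemini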

LLM Configuration

Option Default Description
--llm/--no-llm false Clean up transcript with LLM: fix errors, add punctuation, remove filler words. Uses --extra-instructions if set (via CLI or config file).
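
For example, combined with the --extra-instructions option referenced above to steer the cleanup (the instruction text is illustrative):

# LLM cleanup with additional guidance
agent-cli transcribe-live --llm --extra-instructions "Keep code identifiers and acronyms exactly as spoken."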

Process Management

Option Default Description
--stop false Stop any running instance of this command.
--status false Check if an instance is currently running.
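
Typical lifecycle when the command runs in the background:

# Check whether an instance is already running
agent-cli transcribe-live --status

# Stop a running instance
agent-cli transcribe-live --stop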

General Options

Option Default Description
--log-level warning Set logging level.
--log-file - Path to a file to write logs to.
--quiet, -q false Suppress console output from rich.
--config - Path to a TOML configuration file.
--print-args false Print the command line arguments, including variables taken from the configuration file.
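
To confirm which settings a run will actually use, combining CLI flags with values from a config file, print the resolved arguments (the config path is illustrative):

# Show effective settings from CLI flags and the config file
agent-cli transcribe-live --config ~/.config/agent-cli/config.toml --print-args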

Output Files

Transcription Log

JSON Lines format at ~/.config/agent-cli/transcriptions.jsonl:

{"timestamp": "2024-01-15T10:30:45+00:00", "hostname": "my-host", "role": "user", "model": "wyoming", "raw_output": "Hello world", "processed_output": null, "audio_file": "/path/to/audio.mp3", "duration_seconds": 1.23}

processed_output is null when --llm is disabled.
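
Because the log is one JSON object per line, standard tools such as jq can query it; for example, to pull out the text of each segment (assumes jq is installed):

# Extract raw transcriptions from the log
jq -r '.raw_output' ~/.config/agent-cli/transcriptions.jsonl

# Only LLM-cleaned text, skipping entries where --llm was off
jq -r 'select(.processed_output != null) | .processed_output' ~/.config/agent-cli/transcriptions.jsonl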

Audio Files

Organized by date at ~/.config/agent-cli/audio/YYYY/MM/DD/*.mp3
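
Given the date-based layout, today's recordings can be listed with a short shell snippet:

# List today's MP3 segments
ls ~/.config/agent-cli/audio/$(date +%Y/%m/%d)/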

Use Cases

Meeting Notes

agent-cli transcribe-live --role meeting --silence-threshold 2.0

Personal Notes

agent-cli transcribe-live --role notes --llm

Background Logging

agent-cli transcribe-live --no-clipboard &