transcribe-live

Continuous live transcription with voice activity detection (VAD).

Usage

agent-cli transcribe-live [OPTIONS]

Description

Runs continuously, listening to your microphone and automatically segmenting speech using voice activity detection:

  1. Starts listening immediately
  2. Detects when you start and stop speaking
  3. Automatically transcribes each speech segment
  4. Logs results with timestamps
  5. Optionally saves audio as MP3 files

Press Ctrl+C to stop.

Segments shorter than 0.3s are discarded even if --min-segment is set lower. Saving MP3 files requires FFmpeg; if it's not available, audio saving is disabled with a warning.
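
If you rely on MP3 saving, you can check up front that FFmpeg is on your PATH; this is a plain shell check, not part of agent-cli itself:

# Verify FFmpeg is available for MP3 output
command -v ffmpeg >/dev/null && echo "ffmpeg found" || echo "ffmpeg missing: audio saving will be disabled"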

Installation

Requires the vad extra:

uv tool install "agent-cli[vad]" -p 3.13
# or
pip install "agent-cli[vad]"

Examples

# Basic daemon
agent-cli transcribe-live

# With custom role
agent-cli transcribe-live --role meeting

# With LLM cleanup
agent-cli transcribe-live --llm

# Custom silence threshold
agent-cli transcribe-live --silence-threshold 1.5

Options

Options

Option Default Description
--role, -r user Label for log entries. Use to distinguish speakers or contexts in logs.
--silence-threshold, -s 1.0 Seconds of silence after speech to finalize a segment. Increase for slower speakers.
--min-segment, -m 0.25 Minimum seconds of speech required before a segment is processed. Filters brief sounds.
--vad-threshold 0.3 Silero VAD confidence threshold (0.0-1.0). Higher values require clearer speech; lower values are more sensitive to quiet/distant voices.
--save-audio/--no-save-audio true Save each speech segment as MP3. Requires ffmpeg to be installed.
--audio-dir - Base directory for MP3 files. Files are organized by date: YYYY/MM/DD/HHMMSS_mmm.mp3. Default: ~/.config/agent-cli/audio.
--transcription-log, -t - JSONL file for transcript logging (one JSON object per line with timestamp, role, raw/processed text, audio path). Default: ~/.config/agent-cli/transcriptions.jsonl.
--clipboard/--no-clipboard false Copy each completed transcription to clipboard (overwrites previous). Useful with --llm to get cleaned text.
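
For example, to keep a meeting's recordings and transcript log in a dedicated location (the paths below are illustrative):

# Custom audio directory and transcript log for one project
agent-cli transcribe-live --role meeting --audio-dir ~/recordings/meetings --transcription-log ~/recordings/meetings/transcripts.jsonl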

Provider Selection

Option Default Description
--asr-provider wyoming The ASR provider to use ('wyoming', 'openai', 'gemini').
--llm-provider ollama The LLM provider to use ('ollama', 'openai', 'gemini').
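
Providers can be mixed; a sketch, assuming the relevant API keys are already configured, that transcribes with OpenAI and cleans up with Gemini:

# OpenAI for ASR, Gemini for LLM cleanup
agent-cli transcribe-live --asr-provider openai --llm --llm-provider gemini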

Audio Input

Option Default Description
--input-device-index - Audio input device index (see --list-devices). Uses system default if omitted.
--input-device-name - Select input device by name substring (e.g., MacBook or USB).
--list-devices false List available audio devices with their indices and exit.
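
To pick a specific microphone, list the available devices first, then select one by index or by a name substring (the substring below is just an example):

# Find available input devices
agent-cli transcribe-live --list-devices

# Select a device by name substring
agent-cli transcribe-live --input-device-name "USB"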

Audio Input: Wyoming

Option Default Description
--asr-wyoming-ip localhost Wyoming ASR server IP address.
--asr-wyoming-port 10300 Wyoming ASR server port.
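
If your Wyoming ASR server runs on another machine, point the command at it (the IP address shown is illustrative):

# Use a Wyoming ASR server on the local network
agent-cli transcribe-live --asr-wyoming-ip 192.168.1.50 --asr-wyoming-port 10300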

Audio Input: OpenAI-compatible

Option Default Description
--asr-openai-model whisper-1 The OpenAI model to use for ASR (transcription).
--asr-openai-base-url - Custom base URL for OpenAI-compatible ASR API (e.g., for custom Whisper server: http://localhost:9898).
--asr-openai-prompt - Custom prompt to guide transcription (optional).
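
For a self-hosted, OpenAI-compatible Whisper server, set the base URL and optionally a prompt to bias the transcription (URL and prompt text are illustrative):

# Point ASR at a local OpenAI-compatible Whisper server
agent-cli transcribe-live --asr-provider openai --asr-openai-base-url http://localhost:9898 --asr-openai-prompt "Transcript of a software engineering meeting."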

Audio Input: Gemini

Option Default Description
--asr-gemini-model gemini-3-flash-preview The Gemini model to use for ASR (transcription).

LLM: Ollama

Option Default Description
--llm-ollama-model gemma3:4b The Ollama model to use.
--llm-ollama-host http://localhost:11434 The Ollama server host.
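
To clean up transcripts with a different local Ollama model (the model tag below is an example; any model pulled into Ollama works):

# LLM cleanup with a specific Ollama model
agent-cli transcribe-live --llm --llm-ollama-model llama3.2:3b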

LLM: OpenAI-compatible

Option Default Description
--llm-openai-model gpt-5-mini The OpenAI model to use for LLM tasks.
--openai-api-key - Your OpenAI API key. Can also be set with the OPENAI_API_KEY environment variable.
--openai-base-url - Custom base URL for OpenAI-compatible API (e.g., for llama-server: http://localhost:8080/v1).
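
A sketch for routing LLM cleanup to a local llama-server, or any other OpenAI-compatible endpoint; the model name is a placeholder for whatever your server exposes:

# LLM cleanup via a local OpenAI-compatible server
agent-cli transcribe-live --llm --llm-provider openai --openai-base-url http://localhost:8080/v1 --llm-openai-model my-local-model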

LLM: Gemini

Option Default Description
--llm-gemini-model gemini-3-flash-preview The Gemini model to use for LLM tasks.
--gemini-api-key - Your Gemini API key. Can also be set with the GEMINI_API_KEY environment variable.
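
To run both transcription and cleanup through Gemini, export your API key and select the Gemini providers (a sketch; how you manage the key is up to you):

# Gemini for both ASR and LLM cleanup
export GEMINI_API_KEY="..."
agent-cli transcribe-live --asr-provider gemini --llm --llm-provider gemini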

LLM Configuration

Option Default Description
--llm/--no-llm false Clean up transcript with LLM: fix errors, add punctuation, remove filler words. Uses --extra-instructions if set (via CLI or config file).
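
For example, combined with the --extra-instructions option referenced above to steer the cleanup (the instruction text is illustrative):

# LLM cleanup with additional guidance
agent-cli transcribe-live --llm --extra-instructions "Keep code identifiers and acronyms exactly as spoken."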

Process Management

Option Default Description
--stop false Stop any running instance of this command.
--status false Check if an instance is currently running.
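
Typical lifecycle when the command runs in the background:

# Check whether an instance is already running
agent-cli transcribe-live --status

# Stop a running instance
agent-cli transcribe-live --stop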

General Options

Option Default Description
--log-level warning Set logging level.
--log-file - Path to a file to write logs to.
--quiet, -q false Suppress console output from rich.
--config - Path to a TOML configuration file.
--print-args false Print the command line arguments, including variables taken from the configuration file.
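
To confirm which settings a run will actually use, combining CLI flags with values from a config file, print the resolved arguments (the config path is illustrative):

# Show effective settings from CLI flags and the config file
agent-cli transcribe-live --config ~/.config/agent-cli/config.toml --print-args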

Output Files

Transcription Log

JSON Lines format at ~/.config/agent-cli/transcriptions.jsonl:

{"timestamp": "2024-01-15T10:30:45+00:00", "hostname": "my-host", "role": "user", "model": "wyoming", "raw_output": "Hello world", "processed_output": null, "audio_file": "/path/to/audio.mp3", "duration_seconds": 1.23}

processed_output is null when --llm is disabled.
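
Because the log is one JSON object per line, standard tools such as jq can query it; for example, to pull out the text of each segment (assumes jq is installed):

# Extract raw transcriptions from the log
jq -r '.raw_output' ~/.config/agent-cli/transcriptions.jsonl

# Only LLM-cleaned text, skipping entries where --llm was off
jq -r 'select(.processed_output != null) | .processed_output' ~/.config/agent-cli/transcriptions.jsonl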

Audio Files

Organized by date at ~/.config/agent-cli/audio/YYYY/MM/DD/*.mp3
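
Given the date-based layout, today's recordings can be listed with a short shell snippet:

# List today's MP3 segments
ls ~/.config/agent-cli/audio/$(date +%Y/%m/%d)/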

Use Cases

Meeting Notes

agent-cli transcribe-live --role meeting --silence-threshold 2.0

Personal Notes

agent-cli transcribe-live --role notes --llm

Background Logging

agent-cli transcribe-live --no-clipboard &