# transcribe-daemon
A continuous background transcription service with voice activity detection (VAD).
## Usage

```bash
agent-cli transcribe-daemon [OPTIONS]
```
## Description

Runs as a daemon, listening to your microphone and automatically segmenting speech using voice activity detection:

- Starts listening immediately
- Detects when you start and stop speaking
- Automatically transcribes each speech segment
- Logs results with timestamps
- Optionally saves audio as MP3 files

Press Ctrl+C to stop the daemon.
## Installation

Requires the `vad` extra:

```bash
uv tool install "agent-cli[vad]"
# or
pip install "agent-cli[vad]"
```
## Examples

```bash
# Basic daemon
agent-cli transcribe-daemon

# With custom role
agent-cli transcribe-daemon --role meeting

# With LLM cleanup
agent-cli transcribe-daemon --llm

# Custom silence threshold
agent-cli transcribe-daemon --silence-threshold 1.5
```
## Options

### VAD Configuration

| Option | Description | Default |
| --- | --- | --- |
| `-r`, `--role` | Role name for logging (e.g., 'meeting', 'notes') | `user` |
| `-s`, `--silence-threshold` | Seconds of silence to end a segment | `1.0` |
| `-m`, `--min-segment` | Minimum speech duration in seconds | `0.25` |
| `--vad-threshold` | Speech detection threshold (0.0-1.0) | `0.3` |
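Broadly, `--vad-threshold` controls how confident the detector must be before treating audio as speech, `--min-segment` discards very short blips, and `--silence-threshold` decides when a segment ends. The values below are only illustrative starting points for a noisier room, not project recommendations:

```bash
# Illustrative tuning for a noisy environment (values are assumptions, adjust to taste)
agent-cli transcribe-daemon \
  --vad-threshold 0.5 \
  --min-segment 0.5 \
  --silence-threshold 2.0
```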
### Audio Storage

| Option | Description | Default |
| --- | --- | --- |
| `--save-audio` / `--no-save-audio` | Save audio segments as MP3 | `true` |
| `--audio-dir PATH` | Directory for MP3 files | `~/.config/agent-cli/audio` |
| `-t`, `--transcription-log PATH` | JSON Lines log file | `~/.config/agent-cli/transcriptions.jsonl` |
| `--clipboard` / `--no-clipboard` | Copy each transcription to clipboard | `false` |
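For example, to keep recordings and the log alongside a project instead of in the default locations (the paths here are placeholders, not special directories):

```bash
# Store MP3 segments and the JSONL log in a project folder (paths are illustrative)
agent-cli transcribe-daemon \
  --audio-dir ~/projects/interviews/audio \
  --transcription-log ~/projects/interviews/transcriptions.jsonl \
  --no-clipboard
```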
### Provider Selection

| Option | Description | Default |
| --- | --- | --- |
| `--asr-provider` | ASR provider: `wyoming`, `openai` | `wyoming` |
| `--llm-provider` | LLM provider: `ollama`, `openai`, `gemini` | `ollama` |
### Audio Input

| Option | Description |
| --- | --- |
| `--input-device-index` | Index of audio input device |
| `--input-device-name` | Input device name keywords |
| `--list-devices` | List available devices |
### ASR (Wyoming, local)

| Option | Description | Default |
| --- | --- | --- |
| `--asr-wyoming-ip` | Wyoming ASR server IP | `localhost` |
| `--asr-wyoming-port` | Wyoming ASR server port | `10300` |
### ASR (OpenAI)

| Option | Description | Default |
| --- | --- | --- |
| `--asr-openai-model` | OpenAI ASR model | `whisper-1` |
| `--asr-openai-base-url` | Custom Whisper server URL | - |
| `--asr-openai-prompt` | Custom prompt to guide transcription | - |
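Switching the transcription backend only requires the provider flag plus matching credentials. The sketch below assumes the OpenAI ASR backend reads the same `OPENAI_API_KEY` environment variable listed under the LLM options; the key value is a placeholder.

```bash
# Use the OpenAI ASR backend instead of a local Wyoming server
export OPENAI_API_KEY="your-key-here"   # placeholder value
agent-cli transcribe-daemon --asr-provider openai --asr-openai-model whisper-1
```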
### LLM (Ollama, local)

| Option | Description | Default |
| --- | --- | --- |
| `--llm-ollama-model` | Ollama model to use | `gemma3:4b` |
| `--llm-ollama-host` | Ollama server URL | `http://localhost:11434` |
### LLM (OpenAI)

| Option | Description | Default |
| --- | --- | --- |
| `--llm-openai-model` | OpenAI model to use | `gpt-5-mini` |
| `--openai-api-key` | OpenAI API key (or set `OPENAI_API_KEY`) | - |
| `--openai-base-url` | Custom OpenAI-compatible API URL (or set `OPENAI_BASE_URL`) | - |
### LLM (Gemini)

| Option | Description | Default |
| --- | --- | --- |
| `--llm-gemini-model` | Gemini model to use | `gemini-2.5-flash` |
| `--gemini-api-key` | Gemini API key (or set `GEMINI_API_KEY`) | - |
### LLM Cleanup

| Option | Description | Default |
| --- | --- | --- |
| `--llm` / `--no-llm` | Use LLM to process transcript | `false` |
### Process Management

| Option | Description |
| --- | --- |
| `--stop` | Stop the running daemon |
| `--status` | Check if the daemon is running |
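A typical lifecycle from a second terminal, using only the flags documented above:

```bash
# Start the daemon (here in the background of the current shell)
agent-cli transcribe-daemon &

# Later: check whether a daemon is already running
agent-cli transcribe-daemon --status

# Stop it when finished
agent-cli transcribe-daemon --stop
```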
### General Options

| Option | Description | Default |
| --- | --- | --- |
| `--log-level` | Set logging level | `WARNING` |
| `--log-file PATH` | Path to a file to write logs to | - |
| `--quiet`, `-q` | Suppress console output | `false` |
| `--config PATH` | Path to a TOML configuration file | - |
| `--print-args` | Print resolved arguments including config values | `false` |
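`--print-args` is handy for checking how a TOML file passed with `--config` is merged with command-line flags; the config path below is just an example location.

```bash
# Print the resolved options (flags merged with config values)
agent-cli transcribe-daemon --config ~/.config/agent-cli/config.toml --print-args
```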
## Output Files

### Transcription Log

JSON Lines format at `~/.config/agent-cli/transcriptions.jsonl`:

```json
{"timestamp": "2024-01-15T10:30:45", "role": "user", "text": "Hello world", "audio_file": "..."}
```
### Audio Files

Organized by date at `~/.config/agent-cli/audio/YYYY/MM/DD/*.mp3`.
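Since the layout is date-based, today's recordings can be listed directly (assuming the default `--audio-dir`):

```bash
# List today's MP3 segments under the default audio directory
ls "$HOME/.config/agent-cli/audio/$(date +%Y/%m/%d)"/*.mp3
```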
## Use Cases

### Meeting Notes

```bash
agent-cli transcribe-daemon --role meeting --silence-threshold 2.0
```

### Personal Notes

```bash
agent-cli transcribe-daemon --role notes --llm
```

### Background Logging

```bash
agent-cli transcribe-daemon --no-clipboard &
```