voice-edit

A voice-powered clipboard assistant that edits text based on spoken commands.

Usage

agent-cli voice-edit [OPTIONS]

Description

This command is designed for a hotkey-driven workflow that acts on text you have already copied:

Copy a block of text to your clipboard (e.g., an email draft)
Press a hotkey to start the agent, which begins listening
Speak a command: "Make this more formal" or "Summarize the key points"
Press the hotkey again to stop recording
The agent transcribes your command and sends it, together with the clipboard text, to the LLM
The result is copied back to your clipboard
If --tts is enabled, the agent also speaks the result

Examples

# Run in foreground
agent-cli voice-edit --input-device-index 1

# Run in background (for hotkey integration)
agent-cli voice-edit --input-device-index 1 &

# With text-to-speech response
agent-cli voice-edit --tts

# Check status
agent-cli voice-edit --status

# Stop background process
agent-cli voice-edit --stop

Options

Provider Selection

Option	Default	Description
`--asr-provider`	`wyoming`	The ASR provider to use ('wyoming', 'openai', 'gemini').
`--llm-provider`	`ollama`	The LLM provider to use ('ollama', 'openai', 'gemini').
`--tts-provider`	`wyoming`	The TTS provider to use ('wyoming', 'openai', 'kokoro', 'gemini').
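
Because the three provider flags are separate, the ASR, LLM, and TTS backends can presumably be chosen independently. This page does not say so explicitly, so treat the combination below as an assumption. The sketch wraps one such mix in a shell function; `voice_edit_mixed` and the `AGENT_CLI` override variable are hypothetical helpers, not part of agent-cli, while every flag comes from the tables on this page.

```shell
# Hypothetical override: lets you swap the binary (e.g., `echo` for a dry run).
AGENT_CLI="${AGENT_CLI:-agent-cli}"

# Local Wyoming ASR, OpenAI LLM, Kokoro TTS.
# Assumes providers can be mixed freely; extra arguments are passed through.
voice_edit_mixed() {
    "$AGENT_CLI" voice-edit \
        --asr-provider wyoming \
        --llm-provider openai \
        --tts --tts-provider kokoro \
        "$@"
}

# Usage:
#   voice_edit_mixed --input-device-index 1
```

Setting `AGENT_CLI=echo` before calling the function prints the assembled command line instead of running it, which is a cheap way to check the flags.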

Audio Input

Option	Default	Description
`--input-device-index`	-	Audio input device index (see `--list-devices`). Uses system default if omitted.
`--input-device-name`	-	Select input device by name substring (e.g., `MacBook` or `USB`).
`--list-devices`	`false`	List available audio devices with their indices and exit.
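
Device indices may differ between machines or sessions, so selecting by name substring can be more convenient in scripts. A minimal sketch using only the flags listed above; `list_audio_devices`, `start_with_device`, and `AGENT_CLI` are hypothetical names, and the substring you pass is whatever appears in your own device list.

```shell
# Hypothetical override for the binary name or path.
AGENT_CLI="${AGENT_CLI:-agent-cli}"

# Print the available devices (run this first to find a name or index).
list_audio_devices() {
    "$AGENT_CLI" voice-edit --list-devices
}

# Start voice-edit using $1 as the input device name substring.
start_with_device() {
    "$AGENT_CLI" voice-edit --input-device-name "$1"
}

# Usage:
#   list_audio_devices
#   start_with_device MacBook
```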

Audio Input: Wyoming

Option	Default	Description
`--asr-wyoming-ip`	`localhost`	Wyoming ASR server IP address.
`--asr-wyoming-port`	`10300`	Wyoming ASR server port.

Audio Input: OpenAI-compatible

Option	Default	Description
`--asr-openai-model`	`whisper-1`	The OpenAI model to use for ASR (transcription).

Audio Input: Gemini

Option	Default	Description
`--asr-gemini-model`	`gemini-3-flash-preview`	The Gemini model to use for ASR (transcription).

LLM: Ollama

Option	Default	Description
`--llm-ollama-model`	`gemma3:4b`	The Ollama model to use.
`--llm-ollama-host`	`http://localhost:11434`	The Ollama server host.

LLM: OpenAI-compatible

Option	Default	Description
`--llm-openai-model`	`gpt-5-mini`	The OpenAI model to use for LLM tasks.
`--openai-api-key`	-	Your OpenAI API key. Can also be set with the OPENAI_API_KEY environment variable.
`--openai-base-url`	-	Custom base URL for OpenAI-compatible API (e.g., for llama-server: http://localhost:8080/v1).
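
To point the LLM step at a local OpenAI-compatible server, the options above can be combined roughly as follows. This is only a sketch: it reuses the llama-server URL from the table as an example endpoint, and `voice_edit_local_llm`, `AGENT_CLI`, and `BASE_URL` are hypothetical names. Whether your server needs an API key or a particular `--llm-openai-model` value depends on that server.

```shell
# Hypothetical override for the binary name or path.
AGENT_CLI="${AGENT_CLI:-agent-cli}"
# Example endpoint taken from the option description; adjust for your server.
BASE_URL="${BASE_URL:-http://localhost:8080/v1}"

# Use the OpenAI provider for the LLM step, but send requests to BASE_URL.
voice_edit_local_llm() {
    "$AGENT_CLI" voice-edit \
        --llm-provider openai \
        --openai-base-url "$BASE_URL" \
        "$@"
}

# Usage:
#   voice_edit_local_llm --input-device-index 1
```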

LLM: Gemini

Option	Default	Description
`--llm-gemini-model`	`gemini-3-flash-preview`	The Gemini model to use for LLM tasks.
`--gemini-api-key`	-	Your Gemini API key. Can also be set with the GEMINI_API_KEY environment variable.

Audio Output

Option	Default	Description
`--tts/--no-tts`	`false`	Enable text-to-speech for responses.
`--output-device-index`	-	Audio output device index (see `--list-devices` for available devices).
`--output-device-name`	-	Partial match on device name (e.g., 'speakers', 'headphones').
`--tts-speed`	`1.0`	Speech speed multiplier (1.0 = normal, 2.0 = twice as fast, 0.5 = half speed).

Audio Output: Wyoming

Option	Default	Description
`--tts-wyoming-ip`	`localhost`	Wyoming TTS server IP address.
`--tts-wyoming-port`	`10200`	Wyoming TTS server port.
`--tts-wyoming-voice`	-	Voice name to use for Wyoming TTS (e.g., 'en_US-lessac-medium').
`--tts-wyoming-language`	-	Language for Wyoming TTS (e.g., 'en_US').
`--tts-wyoming-speaker`	-	Speaker name for Wyoming TTS voice.

Audio Output: OpenAI-compatible

Option	Default	Description
`--tts-openai-model`	`tts-1`	The OpenAI model to use for TTS.
`--tts-openai-voice`	`alloy`	Voice for OpenAI TTS (alloy, echo, fable, onyx, nova, shimmer).
`--tts-openai-base-url`	-	Custom base URL for OpenAI-compatible TTS API (e.g., http://localhost:8000/v1 for a proxy).

Audio Output: Kokoro

Option	Default	Description
`--tts-kokoro-model`	`kokoro`	The Kokoro model to use for TTS.
`--tts-kokoro-voice`	`af_sky`	The voice to use for Kokoro TTS.
`--tts-kokoro-host`	`http://localhost:8880/v1`	The base URL for the Kokoro API.

Audio Output: Gemini

Option	Default	Description
`--tts-gemini-model`	`gemini-2.5-flash-preview-tts`	The Gemini model to use for TTS.
`--tts-gemini-voice`	`Kore`	The voice to use for Gemini TTS (e.g., 'Kore', 'Puck', 'Charon', 'Fenrir').

Process Management

Option	Default	Description
`--stop`	`false`	Stop any running instance of this command.
`--status`	`false`	Check if an instance is currently running.
`--toggle`	`false`	Start if not running, stop if running. Ideal for hotkey binding.

General Options

Option	Default	Description
`--save-file`	-	Save audio to WAV file instead of playing through speakers.
`--clipboard/--no-clipboard`	`true`	Copy result to clipboard.
`--log-level`	`warning`	Set logging level.
`--log-file`	-	Path to a file to write logs to.
`--quiet, -q`	`false`	Suppress console output from rich.
`--json`	`false`	Output result as JSON (implies `--quiet` and `--no-clipboard`).
`--config`	-	Path to a TOML configuration file.
`--print-args`	`false`	Print the command line arguments, including variables taken from the configuration file.
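
When settings come from both a configuration file and the command line, `--print-args` can help confirm what the command will actually use. The sketch below takes the configuration file path as its first argument; the file's keys are not documented on this page, so none are shown. `show_effective_args` and `AGENT_CLI` are hypothetical names, and `./voice-edit.toml` in the usage line is a placeholder path.

```shell
# Hypothetical override for the binary name or path.
AGENT_CLI="${AGENT_CLI:-agent-cli}"

# Print the arguments voice-edit resolves from a config file plus extra flags.
# $1 is the path to the TOML file; remaining arguments are passed through.
show_effective_args() {
    config_path="$1"
    shift
    "$AGENT_CLI" voice-edit --config "$config_path" --print-args "$@"
}

# Usage:
#   show_effective_args ./voice-edit.toml --tts
```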

Hotkey Integration

macOS (skhd)

# Toggle voice-edit with Cmd+Shift+V
cmd + shift + v : /path/to/agent-cli voice-edit --toggle --input-device-index 1

Linux (Hyprland)

bind = SUPER SHIFT, V, exec, agent-cli voice-edit --toggle --input-device-index 1
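
Both bindings run the same command, so one option is to move it into a small wrapper script and point each hotkey daemon at that script. A sketch under assumptions: the function name, the `AGENT_CLI` and `VOICE_EDIT_LOG` variables, and the log path are hypothetical; the flags are the ones documented above.

```shell
#!/bin/sh
# Set AGENT_CLI to an absolute path if the hotkey daemon's PATH lacks agent-cli.
AGENT_CLI="${AGENT_CLI:-agent-cli}"
# Hypothetical log location; any writable path works.
VOICE_EDIT_LOG="${VOICE_EDIT_LOG:-/tmp/voice-edit.log}"

# Start voice-edit if it is not running, stop it if it is, and log to a file.
toggle_voice_edit() {
    "$AGENT_CLI" voice-edit --toggle \
        --input-device-index 1 \
        --log-file "$VOICE_EDIT_LOG"
}

# When saving this as a script, uncomment the call below:
# toggle_voice_edit
```

Keeping the flags in one place means a change of device or log path only has to be made once, rather than in each hotkey configuration.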

Example Commands

Once activated, you can give commands like:

"Make this more formal"
"Summarize the key points"
"Fix the grammar"
"Translate to Spanish"
"Make it shorter"
"Add bullet points"
"Rewrite for a technical audience"

Workflow Example

Copy an email draft:

hey can u help me with the project tmrw?

Press the hotkey and speak: "Make this professional"
Press the hotkey again to stop recording

Paste the result:

Hello,

Would you be available to assist me with the project tomorrow?

Best regards