speak

Convert text to speech using a local or remote TTS engine.

Usage

agent-cli speak [TEXT]

Description

A straightforward text-to-speech utility that:

1. Takes text from a command-line argument or, if none is given, from your clipboard
2. Sends the text to a TTS server
3. Plays the generated audio through your speakers
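
Because the text is taken from a command-line argument, the contents of a file can be passed with ordinary shell command substitution. The sketch below is one way to do that; `speak_file` and the `AGENT_CLI` override variable are hypothetical names introduced here for illustration (they are not part of agent-cli), and very long files may exceed the operating system's argument-length limit.

```shell
# Hypothetical helper: speak the contents of a text file.
# AGENT_CLI is an illustrative override (e.g. AGENT_CLI=echo for a dry run);
# it falls back to the real agent-cli binary when unset.
speak_file() {
    if [ ! -r "$1" ]; then
        echo "speak_file: cannot read $1" >&2
        return 1
    fi
    # Command substitution turns the file contents into the TEXT argument
    # (trailing newlines are stripped by the shell).
    "${AGENT_CLI:-agent-cli}" speak "$(cat "$1")"
}
```

Calling `speak_file notes.txt` would then behave like `agent-cli speak` with the file's text as its argument.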

Examples

# Speak from argument
agent-cli speak "Hello, world!"

# Speak from clipboard
agent-cli speak

# Save to file instead of playing
agent-cli speak "Hello" --save-file hello.wav

# List audio output devices
agent-cli speak --list-devices

Options

Provider Selection

Option	Default	Description
`--tts-provider`	`wyoming`	The TTS provider to use ('wyoming', 'openai', 'kokoro', 'gemini').

Audio Output

Option	Default	Description
`--output-device-index`	-	Audio output device index (see `--list-devices` for available devices).
`--output-device-name`	-	Select the output device by partial match on its name (e.g., 'speakers', 'headphones').
`--tts-speed`	`1.0`	Speech speed multiplier (1.0 = normal, 2.0 = twice as fast, 0.5 = half speed).
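
As a rough worked example of the speed multiplier: assuming playback time scales inversely with `--tts-speed` (an assumption made here; individual engines may deviate slightly), the hypothetical helper `playback_seconds` below estimates how long a clip would take at a given speed.

```shell
# Hypothetical helper: estimate playback time for a given --tts-speed.
# Assumes duration = normal_seconds / speed, which is an approximation.
playback_seconds() {
    awk -v secs="$1" -v speed="$2" 'BEGIN {
        if (speed <= 0) {
            print "speed must be positive" > "/dev/stderr"
            exit 1
        }
        printf "%.1f\n", secs / speed
    }'
}
```

Under this assumption, a clip that takes 60 seconds at normal speed would take about 40 seconds with `--tts-speed 1.5` and about 120 seconds with `--tts-speed 0.5`.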

Audio Output: Wyoming

Option	Default	Description
`--tts-wyoming-ip`	`localhost`	Wyoming TTS server IP address.
`--tts-wyoming-port`	`10200`	Wyoming TTS server port.
`--tts-wyoming-voice`	-	Voice name to use for Wyoming TTS (e.g., 'en_US-lessac-medium').
`--tts-wyoming-language`	-	Language for Wyoming TTS (e.g., 'en_US').
`--tts-wyoming-speaker`	-	Speaker name for Wyoming TTS voice.
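
The Wyoming options above can be combined to target a server running on another machine. The following is a minimal sketch of such a wrapper; `speak_wyoming`, `WYOMING_HOST`, `WYOMING_PORT`, `WYOMING_VOICE`, and `AGENT_CLI` are illustrative names, not part of agent-cli. The flags and the host and port defaults come from the table, while the fallback voice is only the example voice from the table, not a documented default.

```shell
# Hypothetical wrapper: speak through a Wyoming server whose address is
# taken from illustrative environment variables.
speak_wyoming() {
    "${AGENT_CLI:-agent-cli}" speak "$1" \
        --tts-provider wyoming \
        --tts-wyoming-ip "${WYOMING_HOST:-localhost}" \
        --tts-wyoming-port "${WYOMING_PORT:-10200}" \
        --tts-wyoming-voice "${WYOMING_VOICE:-en_US-lessac-medium}"
}
```

Setting `WYOMING_HOST` to the address of the remote machine before calling `speak_wyoming "Hello"` would send the request there instead of to `localhost`.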

Audio Output: OpenAI-compatible

Option	Default	Description
`--tts-openai-model`	`tts-1`	The OpenAI model to use for TTS.
`--tts-openai-voice`	`alloy`	Voice for OpenAI TTS (alloy, echo, fable, onyx, nova, shimmer).
`--tts-openai-base-url`	-	Custom base URL for OpenAI-compatible TTS API (e.g., http://localhost:8000/v1 for a proxy).

Audio Output: Kokoro

Option	Default	Description
`--tts-kokoro-model`	`kokoro`	The Kokoro model to use for TTS.
`--tts-kokoro-voice`	`af_sky`	The voice to use for Kokoro TTS.
`--tts-kokoro-host`	`http://localhost:8880/v1`	The base URL for the Kokoro API.

Audio Output: Gemini

Option	Default	Description
`--tts-gemini-model`	`gemini-2.5-flash-preview-tts`	The Gemini model to use for TTS.
`--tts-gemini-voice`	`Kore`	The voice to use for Gemini TTS (e.g., 'Kore', 'Puck', 'Charon', 'Fenrir').

LLM: Gemini

Option	Default	Description
`--gemini-api-key`	-	Your Gemini API key. Can also be set with the GEMINI_API_KEY environment variable.
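
To show how the provider-specific options fit together, here is a sketch of a small dispatcher that maps a provider name to the matching flags. `speak_with` and `AGENT_CLI` are hypothetical names; the voices are examples taken from this page, and the up-front `GEMINI_API_KEY` check is an assumption made for the sketch rather than documented agent-cli behavior.

```shell
# Hypothetical dispatcher: speak_with <provider> <text>
speak_with() {
    case "$1" in
        wyoming)
            "${AGENT_CLI:-agent-cli}" speak "$2" --tts-provider wyoming
            ;;
        openai)
            "${AGENT_CLI:-agent-cli}" speak "$2" --tts-provider openai \
                --tts-openai-model tts-1 --tts-openai-voice nova
            ;;
        kokoro)
            "${AGENT_CLI:-agent-cli}" speak "$2" --tts-provider kokoro \
                --tts-kokoro-voice af_bella
            ;;
        gemini)
            # Assumption: fail early when no key is available in the environment.
            if [ -z "${GEMINI_API_KEY:-}" ]; then
                echo "speak_with: GEMINI_API_KEY is not set" >&2
                return 1
            fi
            "${AGENT_CLI:-agent-cli}" speak "$2" --tts-provider gemini \
                --tts-gemini-voice Puck
            ;;
        *)
            echo "speak_with: unknown provider: $1" >&2
            return 2
            ;;
    esac
}
```

For instance, `speak_with kokoro "Hello"` would run `agent-cli speak "Hello" --tts-provider kokoro --tts-kokoro-voice af_bella`.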

Audio Input

Option	Default	Description
`--list-devices`	`false`	List available audio devices with their indices and exit.

General Options

Option	Default	Description
`--save-file`	-	Save audio to a WAV file instead of playing it through speakers.
`--log-level`	`warning`	Set logging level.
`--log-file`	-	Path to a file to write logs to.
`--quiet, -q`	`false`	Suppress console output from rich.
`--json`	`false`	Output result as JSON (implies `--quiet` and `--no-clipboard`).
`--config`	-	Path to a TOML configuration file.
`--print-args`	`false`	Print the command line arguments, including variables taken from the configuration file.

Process Management

Option	Default	Description
`--stop`	`false`	Stop any running instance of this command.
`--status`	`false`	Check if an instance is currently running.
`--toggle`	`false`	Start if not running, stop if running. Ideal for hotkey binding.
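
For hotkey binding, the process-management flags can be wrapped in small helpers like the ones sketched below. `speak_toggle`, `speak_stop`, and `AGENT_CLI` are hypothetical names, and how a function or script is bound to a key depends on your desktop environment or hotkey daemon.

```shell
# Hypothetical hotkey helpers built on the documented flags.
speak_toggle() {
    # With no TEXT argument the clipboard is used; --toggle starts a run
    # if none is active and stops the running one otherwise.
    "${AGENT_CLI:-agent-cli}" speak --toggle --quiet
}

speak_stop() {
    # Stop any running instance of the speak command.
    "${AGENT_CLI:-agent-cli}" speak --stop
}
```

Binding `speak_toggle` to a single key would give a press-to-start, press-again-to-stop workflow.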

Available Voices

Wyoming (Piper)

Find available voices:

# Check the Piper documentation, or run with debug logging
agent-cli speak --log-level DEBUG "test"

Common voices:

en_US-lessac-medium - US English, natural
en_GB-alan-medium - British English
de_DE-thorsten-medium - German

OpenAI

alloy, echo, fable, onyx, nova, shimmer

Kokoro

af_sky, af_bella, am_adam, and more

Gemini

Kore (default), Puck, Charon, Fenrir

Use Cases

Read Clipboard Aloud

agent-cli speak

Speed Up Audio

agent-cli speak "Long text here" --tts-speed 1.5

Save for Later

agent-cli speak "Important reminder" --save-file reminder.wav
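
When saving several clips, a timestamped file name avoids overwriting earlier ones. The sketch below assumes a hypothetical helper `save_reminder` and the illustrative `AGENT_CLI` override; only `--save-file` comes from agent-cli itself.

```shell
# Hypothetical helper: save_reminder <text> [directory]
# Writes the speech to <directory>/reminder-YYYYmmdd-HHMMSS.wav.
save_reminder() {
    reminder_path="${2:-.}/reminder-$(date +%Y%m%d-%H%M%S).wav"
    "${AGENT_CLI:-agent-cli}" speak "$1" --save-file "$reminder_path" || return 1
    # Print the path so callers can pick up the generated file.
    printf '%s\n' "$reminder_path"
}
```

Running `save_reminder "Call the dentist" ~/reminders` would print the path of the WAV file it asked agent-cli to write.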