speak

Convert text to speech using a local or remote TTS engine.

Usage

agent-cli speak [TEXT]

Description

A straightforward text-to-speech utility:

  1. Takes text from a command-line argument or your clipboard
  2. Sends the text to a TTS server
  3. Plays the generated audio through your speakers
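
Because the text arrives as a command-line argument, the contents of a file can be passed in through command substitution. The following is a minimal sketch for a POSIX shell, not part of agent-cli: `speak_file` and the `AGENT_CLI` override variable are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper: read a text file and pass its contents as the TEXT argument.
# AGENT_CLI is an assumed override (not an agent-cli feature) so the command can
# be swapped out, e.g. AGENT_CLI=echo for a dry run.
speak_file() {
    file=$1
    if [ ! -r "$file" ]; then
        echo "cannot read: $file" >&2
        return 1
    fi
    # Command substitution strips trailing newlines from the file contents.
    "${AGENT_CLI:-agent-cli}" speak "$(cat "$file")"
}
```

Calling `speak_file notes.txt` would then speak the file's contents; setting `AGENT_CLI=echo` first prints the command arguments instead of running them.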

Examples

# Speak from argument
agent-cli speak "Hello, world!"

# Speak from clipboard
agent-cli speak

# Save to file instead of playing
agent-cli speak "Hello" --save-file hello.wav

# List audio output devices
agent-cli speak --list-devices

Options

Provider Selection

| Option | Default | Description |
|--------|---------|-------------|
| --tts-provider | wyoming | The TTS provider to use ('wyoming', 'openai', 'kokoro', 'gemini'). |

Audio Output

| Option | Default | Description |
|--------|---------|-------------|
| --output-device-index | - | Audio output device index (see --list-devices for available devices). |
| --output-device-name | - | Partial match on device name (e.g., 'speakers', 'headphones'). |
| --tts-speed | 1.0 | Speech speed multiplier (1.0 = normal, 2.0 = twice as fast, 0.5 = half speed). |
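
The output options above can be combined in one call. This sketch assumes a POSIX shell; `speak_on` and the `AGENT_CLI` override are hypothetical names used only for illustration, while the flags are the ones listed in the table.

```shell
# Hypothetical helper: speak text on a named output device at a chosen speed.
# AGENT_CLI is an assumed override (not an agent-cli feature) for dry runs.
speak_on() {
    device=$1   # partial device name, e.g. "headphones"
    speed=$2    # speed multiplier, e.g. 1.5
    text=$3
    "${AGENT_CLI:-agent-cli}" speak "$text" \
        --output-device-name "$device" \
        --tts-speed "$speed"
}
```

A typical session would run `agent-cli speak --list-devices` first to see the device names, then something like `speak_on headphones 1.5 "Testing"`.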

Audio Output: Wyoming

| Option | Default | Description |
|--------|---------|-------------|
| --tts-wyoming-ip | localhost | Wyoming TTS server IP address. |
| --tts-wyoming-port | 10200 | Wyoming TTS server port. |
| --tts-wyoming-voice | - | Voice name to use for Wyoming TTS (e.g., 'en_US-lessac-medium'). |
| --tts-wyoming-language | - | Language for Wyoming TTS (e.g., 'en_US'). |
| --tts-wyoming-speaker | - | Speaker name for Wyoming TTS voice. |
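
To illustrate how these options fit together when the Wyoming server runs on another machine, here is a minimal sketch for a POSIX shell. `speak_wyoming` and the `AGENT_CLI` override are hypothetical names, and the address in the usage note is a placeholder.

```shell
# Hypothetical helper: send text to a Wyoming TTS server on a given host and port.
# AGENT_CLI is an assumed override (not an agent-cli feature) for dry runs.
speak_wyoming() {
    host=$1    # server address (placeholder in the usage note)
    port=$2    # server port; the documented default is 10200
    voice=$3   # voice name, e.g. en_US-lessac-medium
    text=$4
    "${AGENT_CLI:-agent-cli}" speak "$text" \
        --tts-provider wyoming \
        --tts-wyoming-ip "$host" \
        --tts-wyoming-port "$port" \
        --tts-wyoming-voice "$voice"
}
```

For example: `speak_wyoming 192.168.1.50 10200 en_US-lessac-medium "Hello"`.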

Audio Output: OpenAI-compatible

| Option | Default | Description |
|--------|---------|-------------|
| --tts-openai-model | tts-1 | The OpenAI model to use for TTS. |
| --tts-openai-voice | alloy | Voice for OpenAI TTS (alloy, echo, fable, onyx, nova, shimmer). |
| --tts-openai-base-url | - | Custom base URL for OpenAI-compatible TTS API (e.g., http://localhost:8000/v1 for a proxy). |
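
Pointing the OpenAI provider at a self-hosted or proxied endpoint might look like the sketch below. It assumes a POSIX shell; `speak_openai_compat` and the `AGENT_CLI` override are hypothetical names, and the endpoint is expected to accept the chosen voice name.

```shell
# Hypothetical helper: use the OpenAI provider against a custom base URL.
# AGENT_CLI is an assumed override (not an agent-cli feature) for dry runs.
speak_openai_compat() {
    base_url=$1   # e.g. http://localhost:8000/v1
    voice=$2      # e.g. nova
    text=$3
    "${AGENT_CLI:-agent-cli}" speak "$text" \
        --tts-provider openai \
        --tts-openai-base-url "$base_url" \
        --tts-openai-voice "$voice"
}
```

For example: `speak_openai_compat http://localhost:8000/v1 nova "Hello"`.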

Audio Output: Kokoro

| Option | Default | Description |
|--------|---------|-------------|
| --tts-kokoro-model | kokoro | The Kokoro model to use for TTS. |
| --tts-kokoro-voice | af_sky | The voice to use for Kokoro TTS. |
| --tts-kokoro-host | http://localhost:8880/v1 | The base URL for the Kokoro API. |

Audio Output: Gemini

| Option | Default | Description |
|--------|---------|-------------|
| --tts-gemini-model | gemini-2.5-flash-preview-tts | The Gemini model to use for TTS. |
| --tts-gemini-voice | Kore | The voice to use for Gemini TTS (e.g., 'Kore', 'Puck', 'Charon', 'Fenrir'). |

LLM: Gemini

| Option | Default | Description |
|--------|---------|-------------|
| --gemini-api-key | - | Your Gemini API key. Can also be set with the GEMINI_API_KEY environment variable. |
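
Supplying the key through the environment keeps it out of the command line and shell history. The sketch below assumes a POSIX shell and that `GEMINI_API_KEY` has been exported beforehand; `speak_gemini` and the `AGENT_CLI` override are hypothetical names used for illustration.

```shell
# Hypothetical helper: speak with the Gemini provider, relying on the
# GEMINI_API_KEY environment variable instead of --gemini-api-key.
# AGENT_CLI is an assumed override (not an agent-cli feature) for dry runs.
speak_gemini() {
    voice=$1   # e.g. Kore, Puck, Charon, Fenrir
    text=$2
    if [ -z "${GEMINI_API_KEY:-}" ]; then
        echo "GEMINI_API_KEY is not set" >&2
        return 1
    fi
    "${AGENT_CLI:-agent-cli}" speak "$text" \
        --tts-provider gemini \
        --tts-gemini-voice "$voice"
}
```

For example, after `export GEMINI_API_KEY=...` in the shell profile: `speak_gemini Puck "Hello"`.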

Audio Input

| Option | Default | Description |
|--------|---------|-------------|
| --list-devices | false | List available audio devices with their indices and exit. |

General Options

| Option | Default | Description |
|--------|---------|-------------|
| --save-file | - | Save audio to WAV file instead of playing through speakers. |
| --log-level | warning | Set logging level. |
| --log-file | - | Path to a file to write logs to. |
| --quiet, -q | false | Suppress console output from rich. |
| --json | false | Output result as JSON (implies --quiet and --no-clipboard). |
| --config | - | Path to a TOML configuration file. |
| --print-args | false | Print the command line arguments, including variables taken from the configuration file. |

Process Management

| Option | Default | Description |
|--------|---------|-------------|
| --stop | false | Stop any running instance of this command. |
| --status | false | Check if an instance is currently running. |
| --toggle | false | Start if not running, stop if running. Ideal for hotkey binding. |
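
A hotkey binding usually points at a small script or shell function. The sketch below is one possible shape for a POSIX shell: with no TEXT argument the command reads the clipboard, so pressing the hotkey once starts playback and pressing it again stops it. `toggle_speak` and the `AGENT_CLI` override are hypothetical names, and how the function is bound to a key depends on the desktop environment.

```shell
# Hypothetical hotkey target: toggles reading the clipboard aloud.
# AGENT_CLI is an assumed override (not an agent-cli feature) for dry runs.
toggle_speak() {
    speed=${1:-1.0}   # optional speed multiplier; 1.0 is the documented default
    # --quiet suppresses console output, which a hotkey has no terminal to show.
    "${AGENT_CLI:-agent-cli}" speak --toggle --quiet --tts-speed "$speed"
}
```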

Available Voices

Wyoming (Piper)

Find available voices:

# Check Piper documentation or run with verbose logging
agent-cli speak --log-level DEBUG "test"

Common voices:

  • en_US-lessac-medium - US English, natural
  • en_GB-alan-medium - British English
  • de_DE-thorsten-medium - German

OpenAI

  • alloy, echo, fable, onyx, nova, shimmer

Kokoro

  • af_sky, af_bella, am_adam, and more

Gemini

  • Kore (default), Puck, Charon, Fenrir

Use Cases

Read Clipboard Aloud

agent-cli speak

Speed Up Audio

agent-cli speak "Long text here" --tts-speed 1.5

Save for Later

agent-cli speak "Important reminder" --save-file reminder.wav
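
Saving can also be scripted for several pieces of text at once. The following sketch, assuming a POSIX shell, writes one WAV file per non-empty line of an input file. `save_lines`, the `line-N.wav` naming scheme, and the `AGENT_CLI` override are hypothetical choices made for illustration.

```shell
# Hypothetical helper: synthesize each non-empty line of a text file to its own WAV file.
# AGENT_CLI is an assumed override (not an agent-cli feature) for dry runs.
save_lines() {
    input=$1    # text file, one utterance per line
    outdir=$2   # directory for the generated WAV files
    mkdir -p "$outdir" || return 1
    n=0
    # The "|| [ -n ... ]" part also handles a final line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue
        n=$((n + 1))
        # Redirect stdin so the command cannot consume the rest of the input file.
        "${AGENT_CLI:-agent-cli}" speak "$line" \
            --save-file "$outdir/line-$n.wav" </dev/null
    done < "$input"
}
```

For example, `save_lines reminders.txt ./audio` would produce `./audio/line-1.wav`, `./audio/line-2.wav`, and so on.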