assistant

A hands-free voice assistant that activates on a wake word.

Usage

agent-cli assistant [OPTIONS]

Description

The agent listens continuously for a wake word (e.g., "OK Nabu") and then works through the following cycle:

1. Run the command; it starts listening for the wake word.
2. Say the wake word to start recording.
3. Speak your command or question.
4. Say the wake word again to stop recording.
5. The agent transcribes the recording, sends the text to the LLM, and speaks the response if `--tts` is enabled.
6. The agent immediately returns to listening for the wake word.

Examples

# Start with default wake word
agent-cli assistant --input-device-index 1

# With custom wake word
agent-cli assistant --wake-word "hey_jarvis" --input-device-index 1

# With TTS responses
agent-cli assistant --tts --input-device-index 1

# Custom wake word server
agent-cli assistant --wake-server-ip 192.168.1.100 --wake-server-port 10400

Options

Provider Selection

Option	Default	Description
`--asr-provider`	`wyoming`	The ASR provider to use ('wyoming', 'openai', 'gemini').
`--llm-provider`	`ollama`	The LLM provider to use ('ollama', 'openai', 'gemini').
`--tts-provider`	`wyoming`	The TTS provider to use ('wyoming', 'openai', 'kokoro', 'gemini').

Wake Word

Option	Default	Description
`--wake-server-ip`	`localhost`	Wyoming wake word server IP (requires wyoming-openwakeword or similar).
`--wake-server-port`	`10400`	Wyoming wake word server port.
`--wake-word`	`ok_nabu`	Wake word to detect. Common options: `ok_nabu`, `hey_jarvis`, `alexa`. Must match a model loaded in your wake word server.

Audio Input

Option	Default	Description
`--input-device-index`	-	Audio input device index (see `--list-devices`). Uses system default if omitted.
`--input-device-name`	-	Select input device by name substring (e.g., `MacBook` or `USB`).
`--list-devices`	`false`	List available audio devices with their indices and exit.
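
The device options above combine into a short workflow: list the devices once, then start the assistant on the one you want. The sketch below wraps that in two shell functions. The function names and the `AGENT_CLI` override variable are hypothetical helpers for illustration (the override lets you substitute `echo` for a dry run); only the `agent-cli` flags come from the tables above.

```shell
# AGENT_CLI is a hypothetical override, not an agent-cli feature.
# Set AGENT_CLI=echo to print the command instead of running it.
AGENT_CLI="${AGENT_CLI:-agent-cli}"

# Print the available audio devices with their indices, then exit.
list_audio_devices() {
    "$AGENT_CLI" assistant --list-devices
}

# Start the assistant on a chosen input device.
# A purely numeric argument is treated as a device index,
# anything else as a name substring; no argument uses the system default.
start_on_device() {
    case "$1" in
        '')        "$AGENT_CLI" assistant ;;
        *[!0-9]*)  "$AGENT_CLI" assistant --input-device-name "$1" ;;
        *)         "$AGENT_CLI" assistant --input-device-index "$1" ;;
    esac
}
```

For example, `start_on_device 1` selects by index and `start_on_device USB` selects by name substring.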

Audio Input: Wyoming

Option	Default	Description
`--asr-wyoming-ip`	`localhost`	Wyoming ASR server IP address.
`--asr-wyoming-port`	`10300`	Wyoming ASR server port.

Audio Input: OpenAI-compatible

Option	Default	Description
`--asr-openai-model`	`whisper-1`	The OpenAI model to use for ASR (transcription).

Audio Input: Gemini

Option	Default	Description
`--asr-gemini-model`	`gemini-3-flash-preview`	The Gemini model to use for ASR (transcription).

LLM: Ollama

Option	Default	Description
`--llm-ollama-model`	`gemma3:4b`	The Ollama model to use.
`--llm-ollama-host`	`http://localhost:11434`	The Ollama server host.

LLM: OpenAI-compatible

Option	Default	Description
`--llm-openai-model`	`gpt-5-mini`	The OpenAI model to use for LLM tasks.
`--openai-api-key`	-	Your OpenAI API key. Can also be set with the OPENAI_API_KEY environment variable.
`--openai-base-url`	-	Custom base URL for OpenAI-compatible API (e.g., for llama-server: http://localhost:8080/v1).
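
As a rough illustration of how these options fit together, the sketch below shows two ways to start the assistant with the OpenAI provider: against the hosted API with a key taken from the environment, and against a local OpenAI-compatible server such as llama-server. The function names and the `AGENT_CLI` override variable are hypothetical; the check for an unset key is a convenience added here, not documented `agent-cli` behavior.

```shell
# AGENT_CLI is a hypothetical override; set AGENT_CLI=echo for a dry run.
AGENT_CLI="${AGENT_CLI:-agent-cli}"

# Hosted API: rely on OPENAI_API_KEY from the environment
# rather than passing --openai-api-key on the command line.
start_with_openai() {
    if [ -z "${OPENAI_API_KEY:-}" ]; then
        echo "OPENAI_API_KEY is not set" >&2
        return 1
    fi
    "$AGENT_CLI" assistant --llm-provider openai \
        --llm-openai-model "${1:-gpt-5-mini}"
}

# Local OpenAI-compatible server: the default URL is the llama-server
# example from the table above; pass another base URL as the first argument.
start_with_local_server() {
    "$AGENT_CLI" assistant --llm-provider openai \
        --openai-base-url "${1:-http://localhost:8080/v1}"
}
```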

LLM: Gemini

Option	Default	Description
`--llm-gemini-model`	`gemini-3-flash-preview`	The Gemini model to use for LLM tasks.
`--gemini-api-key`	-	Your Gemini API key. Can also be set with the GEMINI_API_KEY environment variable.

Audio Output

Option	Default	Description
`--tts/--no-tts`	`false`	Enable text-to-speech for responses.
`--output-device-index`	-	Audio output device index (see `--list-devices` for available devices).
`--output-device-name`	-	Select output device by name substring (e.g., 'speakers', 'headphones').
`--tts-speed`	`1.0`	Speech speed multiplier (1.0 = normal, 2.0 = twice as fast, 0.5 = half speed).

Audio Output: Wyoming

Option	Default	Description
`--tts-wyoming-ip`	`localhost`	Wyoming TTS server IP address.
`--tts-wyoming-port`	`10200`	Wyoming TTS server port.
`--tts-wyoming-voice`	-	Voice name to use for Wyoming TTS (e.g., 'en_US-lessac-medium').
`--tts-wyoming-language`	-	Language for Wyoming TTS (e.g., 'en_US').
`--tts-wyoming-speaker`	-	Speaker name for Wyoming TTS voice.

Audio Output: OpenAI-compatible

Option	Default	Description
`--tts-openai-model`	`tts-1`	The OpenAI model to use for TTS.
`--tts-openai-voice`	`alloy`	Voice for OpenAI TTS (alloy, echo, fable, onyx, nova, shimmer).
`--tts-openai-base-url`	-	Custom base URL for OpenAI-compatible TTS API (e.g., http://localhost:8000/v1 for a proxy).

Audio Output: Kokoro

Option	Default	Description
`--tts-kokoro-model`	`kokoro`	The Kokoro model to use for TTS.
`--tts-kokoro-voice`	`af_sky`	The voice to use for Kokoro TTS.
`--tts-kokoro-host`	`http://localhost:8880/v1`	The base URL for the Kokoro API.

Audio Output: Gemini

Option	Default	Description
`--tts-gemini-model`	`gemini-2.5-flash-preview-tts`	The Gemini model to use for TTS.
`--tts-gemini-voice`	`Kore`	The voice to use for Gemini TTS (e.g., 'Kore', 'Puck', 'Charon', 'Fenrir').

Process Management

Option	Default	Description
`--stop`	`false`	Stop any running instance of this command.
`--status`	`false`	Check if an instance is currently running.
`--toggle`	`false`	Start if not running, stop if running. Ideal for hotkey binding.
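
Because the assistant runs until stopped, these flags are most useful from small wrapper commands. The following is a minimal sketch of such wrappers, with `--toggle` being the one to bind to a hotkey. The function names and the `AGENT_CLI` override variable are hypothetical, and forwarding extra options alongside `--toggle` is an assumption about how you might want to start the instance.

```shell
# AGENT_CLI is a hypothetical override; set AGENT_CLI=echo for a dry run.
AGENT_CLI="${AGENT_CLI:-agent-cli}"

# Start the assistant if it is not running, stop it if it is.
# Any extra arguments (e.g. --tts) are forwarded unchanged.
toggle_assistant() {
    "$AGENT_CLI" assistant --toggle "$@"
}

# Report whether an instance is currently running.
assistant_status() {
    "$AGENT_CLI" assistant --status
}

# Stop any running instance.
stop_assistant() {
    "$AGENT_CLI" assistant --stop
}
```

A hotkey tool can then call `toggle_assistant --tts` (or the equivalent one-line command) on each key press.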

General Options

Option	Default	Description
`--save-file`	-	Save audio to a WAV file instead of playing it through the speakers.
`--clipboard/--no-clipboard`	`true`	Copy result to clipboard.
`--log-level`	`warning`	Set logging level.
`--log-file`	-	Path to a file to write logs to.
`--quiet, -q`	`false`	Suppress console output from rich.
`--config`	-	Path to a TOML configuration file.
`--print-args`	`false`	Print the command line arguments, including variables taken from the configuration file.

Available Wake Words

Available wake words depend on which models you preload. The provided scripts preload ok_nabu by default.

Common models include:

ok_nabu (default in provided scripts)
hey_jarvis
alexa

Add more models via --preload-model when starting OpenWakeWord. Custom wake words can be trained and added to the OpenWakeWord server.
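
To make the preloading step concrete, here is a hedged sketch of a helper that starts a wake word server with several models. It assumes the server executable is named `wyoming-openwakeword`, that it accepts `--uri`, and that `--preload-model` may be repeated once per model; check these against your installation. The function name and the `OWW_BIN` override variable are hypothetical.

```shell
# OWW_BIN is a hypothetical override for the server executable (assumed name).
# Set OWW_BIN=echo to print the command instead of running it.
OWW_BIN="${OWW_BIN:-wyoming-openwakeword}"

# Usage: start_wake_server PORT MODEL [MODEL...]
start_wake_server() {
    port="$1"
    shift
    n=$#
    # Rewrite the argument list so each model name
    # is preceded by its own --preload-model flag.
    while [ "$n" -gt 0 ]; do
        model="$1"
        shift
        set -- "$@" --preload-model "$model"
        n=$((n - 1))
    done
    "$OWW_BIN" --uri "tcp://0.0.0.0:$port" "$@"
}
```

After `start_wake_server 10400 ok_nabu hey_jarvis`, either model name can be passed to `agent-cli assistant --wake-word`, with `--wake-server-port 10400` pointing at the same server.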

Interaction Flow

┌─────────────────────────────────────────┐
│         Listening for wake word         │
│              "ok_nabu"                  │
└───────────────────┬─────────────────────┘
                    │ Wake word detected
                    ▼
┌─────────────────────────────────────────┐
│            Recording speech             │
│         (speak your question)           │
└───────────────────┬─────────────────────┘
                    │ Wake word again
                    ▼
┌─────────────────────────────────────────┐
│     Transcribe → LLM → TTS (if enabled) │
└───────────────────┬─────────────────────┘
                    │
                    ▼
              Back to listening
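
The diagram above can also be read as a three-state loop. The sketch below models it as a transition function; it is only an illustration of the flow, not the agent's implementation, and the state and event names are labels invented here.

```shell
# next_state STATE EVENT prints the state that follows.
# State and event names are illustrative, not agent-cli identifiers.
next_state() {
    case "$1:$2" in
        listening:wake_word)  echo recording ;;    # first wake word starts recording
        recording:wake_word)  echo processing ;;   # second wake word stops recording
        processing:done)      echo listening ;;    # transcribe, LLM, optional TTS finished
        *)                    echo "$1" ;;         # any other event leaves the state unchanged
    esac
}

# Replay one full cycle, printing the state after each event.
state=listening
for event in wake_word wake_word done; do
    state=$(next_state "$state" "$event")
    echo "$state"
done
```

The loop prints `recording`, `processing`, and `listening`, one per line, matching one pass through the diagram.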

Tips

Speak clearly after the wake word is detected
Wait for the TTS response to finish before saying the wake word again
Use --tts for a more natural conversation experience