assistant

A hands-free voice assistant that activates on a wake word.

Usage

agent-cli assistant [OPTIONS]

Description

This agent listens continuously for a wake word (e.g., "OK Nabu"). A typical session looks like this:

  1. Run the command; it starts listening for the wake word
  2. Say the wake word to start recording
  3. Speak your command or question
  4. Say the wake word again to stop recording
  5. The agent transcribes the recording, sends the text to the LLM, and speaks the response if --tts is enabled
  6. The agent immediately returns to listening for the wake word

Examples

# Start with default wake word
agent-cli assistant --input-device-index 1

# With custom wake word
agent-cli assistant --wake-word "hey_jarvis" --input-device-index 1

# With TTS responses
agent-cli assistant --tts --input-device-index 1

# Custom wake word server
agent-cli assistant --wake-server-ip 192.168.1.100 --wake-server-port 10400
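
When the assistant is launched from a script rather than typed by hand, the same flags can be assembled programmatically. The following is a minimal sketch, not part of agent-cli: `build_assistant_command` is a hypothetical helper, and it emits only flags documented on this page.

```python
import shlex


def build_assistant_command(
    wake_word=None,
    input_device_index=None,
    tts=False,
    wake_server_ip=None,
    wake_server_port=None,
):
    """Assemble an `agent-cli assistant` argument list (hypothetical helper).

    Unset values are omitted so the CLI falls back to its own defaults.
    """
    cmd = ["agent-cli", "assistant"]
    if wake_word is not None:
        cmd += ["--wake-word", wake_word]
    if input_device_index is not None:
        cmd += ["--input-device-index", str(input_device_index)]
    if tts:
        cmd.append("--tts")
    if wake_server_ip is not None:
        cmd += ["--wake-server-ip", wake_server_ip]
    if wake_server_port is not None:
        cmd += ["--wake-server-port", str(wake_server_port)]
    return cmd


cmd = build_assistant_command(wake_word="hey_jarvis", input_device_index=1, tts=True)
# Print the command as a single shell-quoted line for inspection.
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with device names or paths.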

Options

Provider Selection

| Option | Default | Description |
|---|---|---|
| --asr-provider | wyoming | The ASR provider to use ('wyoming', 'openai', 'gemini'). |
| --llm-provider | ollama | The LLM provider to use ('ollama', 'openai', 'gemini'). |
| --tts-provider | wyoming | The TTS provider to use ('wyoming', 'openai', 'kokoro', 'gemini'). |

Wake Word

| Option | Default | Description |
|---|---|---|
| --wake-server-ip | localhost | Wyoming wake word server IP (requires wyoming-openwakeword or similar). |
| --wake-server-port | 10400 | Wyoming wake word server port. |
| --wake-word | ok_nabu | Wake word to detect. Common options: ok_nabu, hey_jarvis, alexa. Must match a model loaded in your wake word server. |
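
If the assistant never reacts to the wake word, a first step is to confirm that something is listening on the configured server address. The sketch below only tests TCP reachability; it does not speak the Wyoming protocol or verify that the requested wake word model is loaded. The function name `wake_server_reachable` is hypothetical, and its defaults mirror the table above.

```python
import socket


def wake_server_reachable(host: str = "localhost", port: int = 10400, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


print(wake_server_reachable("localhost", 10400))
```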

Audio Input

| Option | Default | Description |
|---|---|---|
| --input-device-index | - | Audio input device index (see --list-devices). Uses system default if omitted. |
| --input-device-name | - | Select input device by name substring (e.g., MacBook or USB). |
| --list-devices | false | List available audio devices with their indices and exit. |

Audio Input: Wyoming

| Option | Default | Description |
|---|---|---|
| --asr-wyoming-ip | localhost | Wyoming ASR server IP address. |
| --asr-wyoming-port | 10300 | Wyoming ASR server port. |

Audio Input: OpenAI-compatible

| Option | Default | Description |
|---|---|---|
| --asr-openai-model | whisper-1 | The OpenAI model to use for ASR (transcription). |

Audio Input: Gemini

| Option | Default | Description |
|---|---|---|
| --asr-gemini-model | gemini-3-flash-preview | The Gemini model to use for ASR (transcription). |

LLM: Ollama

| Option | Default | Description |
|---|---|---|
| --llm-ollama-model | gemma3:4b | The Ollama model to use. |
| --llm-ollama-host | http://localhost:11434 | The Ollama server host. |

LLM: OpenAI-compatible

| Option | Default | Description |
|---|---|---|
| --llm-openai-model | gpt-5-mini | The OpenAI model to use for LLM tasks. |
| --openai-api-key | - | Your OpenAI API key. Can also be set with the OPENAI_API_KEY environment variable. |
| --openai-base-url | - | Custom base URL for OpenAI-compatible API (e.g., for llama-server: http://localhost:8080/v1). |

LLM: Gemini

| Option | Default | Description |
|---|---|---|
| --llm-gemini-model | gemini-3-flash-preview | The Gemini model to use for LLM tasks. |
| --gemini-api-key | - | Your Gemini API key. Can also be set with the GEMINI_API_KEY environment variable. |

Audio Output

| Option | Default | Description |
|---|---|---|
| --tts/--no-tts | false | Enable text-to-speech for responses. |
| --output-device-index | - | Audio output device index (see --list-devices for available devices). |
| --output-device-name | - | Partial match on device name (e.g., 'speakers', 'headphones'). |
| --tts-speed | 1.0 | Speech speed multiplier (1.0 = normal, 2.0 = twice as fast, 0.5 = half speed). |
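
As a quick worked example of the --tts-speed multiplier: assuming the multiplier scales playback time linearly, which the description implies but individual TTS providers may not honor exactly, a response that takes 10 seconds at speed 1.0 takes about 5 seconds at 2.0 and about 20 seconds at 0.5. The helper name `playback_seconds` is hypothetical.

```python
def playback_seconds(normal_seconds: float, tts_speed: float = 1.0) -> float:
    """Estimate playback time for audio that lasts `normal_seconds` at speed 1.0.

    Assumes duration is inversely proportional to the speed multiplier.
    """
    if tts_speed <= 0:
        raise ValueError("tts_speed must be positive")
    return normal_seconds / tts_speed


print(playback_seconds(10, 2.0))  # 5.0
print(playback_seconds(10, 0.5))  # 20.0
```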

Audio Output: Wyoming

| Option | Default | Description |
|---|---|---|
| --tts-wyoming-ip | localhost | Wyoming TTS server IP address. |
| --tts-wyoming-port | 10200 | Wyoming TTS server port. |
| --tts-wyoming-voice | - | Voice name to use for Wyoming TTS (e.g., 'en_US-lessac-medium'). |
| --tts-wyoming-language | - | Language for Wyoming TTS (e.g., 'en_US'). |
| --tts-wyoming-speaker | - | Speaker name for Wyoming TTS voice. |

Audio Output: OpenAI-compatible

| Option | Default | Description |
|---|---|---|
| --tts-openai-model | tts-1 | The OpenAI model to use for TTS. |
| --tts-openai-voice | alloy | Voice for OpenAI TTS (alloy, echo, fable, onyx, nova, shimmer). |
| --tts-openai-base-url | - | Custom base URL for OpenAI-compatible TTS API (e.g., http://localhost:8000/v1 for a proxy). |

Audio Output: Kokoro

| Option | Default | Description |
|---|---|---|
| --tts-kokoro-model | kokoro | The Kokoro model to use for TTS. |
| --tts-kokoro-voice | af_sky | The voice to use for Kokoro TTS. |
| --tts-kokoro-host | http://localhost:8880/v1 | The base URL for the Kokoro API. |

Audio Output: Gemini

| Option | Default | Description |
|---|---|---|
| --tts-gemini-model | gemini-2.5-flash-preview-tts | The Gemini model to use for TTS. |
| --tts-gemini-voice | Kore | The voice to use for Gemini TTS (e.g., 'Kore', 'Puck', 'Charon', 'Fenrir'). |

Process Management

| Option | Default | Description |
|---|---|---|
| --stop | false | Stop any running instance of this command. |
| --status | false | Check if an instance is currently running. |
| --toggle | false | Start if not running, stop if running. Ideal for hotkey binding. |

General Options

| Option | Default | Description |
|---|---|---|
| --save-file | - | Save audio to a WAV file instead of playing it through the speakers. |
| --clipboard/--no-clipboard | true | Copy result to clipboard. |
| --log-level | warning | Set logging level. |
| --log-file | - | Path to a file to write logs to. |
| --quiet, -q | false | Suppress console output from rich. |
| --config | - | Path to a TOML configuration file. |
| --print-args | false | Print the command line arguments, including variables taken from the configuration file. |

Available Wake Words

Available wake words depend on which models your wake word server has preloaded. The provided scripts preload ok_nabu by default.

Common models include:

  • ok_nabu (default in provided scripts)
  • hey_jarvis
  • alexa

Add more models via --preload-model when starting OpenWakeWord. Custom wake words can be trained and added to the OpenWakeWord server.

Interaction Flow

┌─────────────────────────────────────────┐
│         Listening for wake word         │
│              "ok_nabu"                  │
└───────────────────┬─────────────────────┘
                    │ Wake word detected
┌─────────────────────────────────────────┐
│            Recording speech             │
│         (speak your question)           │
└───────────────────┬─────────────────────┘
                    │ Wake word again
┌─────────────────────────────────────────┐
│     Transcribe → LLM → TTS (if enabled) │
└───────────────────┬─────────────────────┘
              Back to listening
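
The flow above can be read as a three-state loop. The sketch below models it as a small state machine for illustration only; it is not agent-cli's implementation. The `State` enum, the event names `"wake_word"` and `"response_done"`, and the choice to ignore a wake word while a response is being processed are all assumptions.

```python
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()   # waiting for the wake word
    RECORDING = auto()   # capturing the spoken request
    PROCESSING = auto()  # transcribe -> LLM -> TTS (if enabled)


# Allowed transitions, keyed by (current state, event). Hypothetical event names.
TRANSITIONS = {
    (State.LISTENING, "wake_word"): State.RECORDING,
    (State.RECORDING, "wake_word"): State.PROCESSING,
    (State.PROCESSING, "response_done"): State.LISTENING,
}


def next_state(state: State, event: str) -> State:
    """Return the state that follows `state` on `event`; unknown pairs keep the state."""
    return TRANSITIONS.get((state, event), state)


def run(events):
    """Replay a sequence of events from LISTENING and return every state visited."""
    state = State.LISTENING
    visited = [state]
    for event in events:
        state = next_state(state, event)
        visited.append(state)
    return visited


for s in run(["wake_word", "wake_word", "response_done"]):
    print(s.name)
```

One full interaction visits LISTENING, RECORDING, PROCESSING, and LISTENING again, matching the diagram.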

Tips

  • Speak clearly after the wake word is detected
  • Wait for the TTS response to finish before saying the wake word again
  • Use --tts for a more natural conversation experience