Getting Started
This guide walks you through installing Agent CLI and setting up your first voice-powered workflow.
Prerequisites
Before you begin, ensure you have:
- uv (recommended) or Python 3.11+
- A microphone for voice features
- Speakers for text-to-speech features
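The software prerequisites above can be checked from a terminal. The following is a minimal sketch, not part of Agent CLI itself: the helper names `version_ge` and `check_prereqs` are hypothetical, and the script assumes a POSIX shell. It does not test the microphone or speakers, since audio hardware detection differs by platform.

```shell
# version_ge A B: succeed when dotted version A is >= B (major.minor only).
version_ge() {
  a_major=${1%%.*}; a_rest=${1#*.}; a_minor=${a_rest%%.*}
  b_major=${2%%.*}; b_rest=${2#*.}; b_minor=${b_rest%%.*}
  if [ "$a_major" -gt "$b_major" ]; then
    return 0
  fi
  [ "$a_major" -eq "$b_major" ] && [ "$a_minor" -ge "$b_minor" ]
}

# check_prereqs: succeed when uv or a Python 3.11+ interpreter is available.
check_prereqs() {
  if command -v uv >/dev/null 2>&1; then
    echo "uv found: OK"
    return 0
  fi
  if command -v python3 >/dev/null 2>&1; then
    py_version=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    if version_ge "$py_version" 3.11; then
      echo "python3 $py_version found: OK"
      return 0
    fi
    echo "python3 $py_version is older than 3.11"
    return 1
  fi
  echo "neither uv nor python3 found"
  return 1
}

check_prereqs || echo "Install uv or Python 3.11+ before continuing."
```

The version comparison is done on integers rather than strings so that 3.9 is correctly treated as older than 3.11.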
Installation
Option 1: CLI Tool Only
If you already have AI services set up or plan to use cloud services (OpenAI/Gemini):
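As a hedged sketch of what a CLI-only install might look like: the snippet below only prints a suggested command and installs nothing. It assumes the package is published under the name `agent-cli`, which this guide does not confirm, and the helper `install_cmd` is hypothetical. It prefers uv when available, following the prerequisites, and falls back to pip otherwise.

```shell
# install_cmd: print a suggested install command without running it.
# ASSUMPTION: the package name "agent-cli" is not confirmed by this guide.
install_cmd() {
  if command -v uv >/dev/null 2>&1; then
    # uv installs command-line tools into an isolated environment
    echo "uv tool install agent-cli"
  else
    # fallback for systems with Python 3.11+ but no uv
    echo "python3 -m pip install agent-cli"
  fi
}

echo "Suggested install command: $(install_cmd)"
```

Review the printed command against the platform-specific guides before running it.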
Option 2: Full Local Setup
For a complete local setup with all AI services:
Verify Installation
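One simple way to confirm the install, sketched here with a hypothetical helper `on_path`, is to check that the `agent-cli` executable is reachable from your shell. This only tests that the command resolves; it does not exercise any AI services.

```shell
# on_path NAME: succeed when NAME resolves to an executable, builtin, or function.
on_path() {
  command -v "$1" >/dev/null 2>&1
}

if on_path agent-cli; then
  echo "agent-cli found at: $(command -v agent-cli)"
else
  echo "agent-cli is not on PATH; reopen your terminal or revisit the install step."
fi
```

If the command is found, running `agent-cli --help` (assumed to be supported, as it is by most command-line tools) should then list the available commands.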
Test Your Setup
Test Autocorrect
Test Transcription
# List available microphones
agent-cli transcribe --list-devices
# Start transcribing from device 1; substitute an index from the list above (press Ctrl+C to stop)
agent-cli transcribe --input-device-index 1
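Because the device index varies between machines, a small wrapper can guard against a mistyped value. The sketch below uses a hypothetical helper `transcribe_cmd`, which validates the index and prints the command rather than running it. The `--input-device-index` flag is the one shown above; everything else is illustrative.

```shell
# transcribe_cmd INDEX: print the transcribe command for a validated device index.
transcribe_cmd() {
  case "$1" in
    ''|*[!0-9]*)
      # reject empty input and anything containing a non-digit
      echo "device index must be a non-negative integer" >&2
      return 1
      ;;
  esac
  echo "agent-cli transcribe --input-device-index $1"
}

transcribe_cmd 1
```

To run the command instead of printing it, replace the final `echo` with the command itself.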
Test Text-to-Speech
Platform-Specific Guides
For detailed installation instructions, see the platform-specific guides:
| Platform | Guide | Notes |
|---|---|---|
| macOS | macOS Setup | Full Metal GPU acceleration |
| Linux | Linux Setup | NVIDIA GPU support |
| NixOS | NixOS Setup | Declarative configuration |
| Windows | Windows Setup | WSL2 recommended |
| Docker | Docker Setup | Cross-platform |
First Workflow: Voice Transcription
Here's a typical workflow for using voice transcription:
1. Copy some text you want to respond to (e.g., an email)
2. Press your hotkey (Cmd+Shift+R on macOS) to start recording
3. Speak your response naturally
4. Press the hotkey again to stop recording
5. Paste the transcribed text wherever you need it
What's Next?
- Configuration - Customize settings and defaults
- Commands Reference - Explore all available commands
- System Integration - Set up system-wide hotkeys