Windows Installation Guide

Warning

Community Testing Needed! This Windows setup has not been tested on real Windows hardware yet. The scripts are direct translations of the working Linux/macOS scripts.

If you try this and it works (or doesn't), please open an issue to let us know! Pull requests with improvements are very welcome.

agent-cli is designed to run natively on Windows, with no WSL required. All services (Ollama, Whisper, Piper) run directly on Windows.

Prerequisites

Windows 10/11
8GB+ RAM (16GB+ recommended for GPU acceleration)
10GB free disk space

For GPU Acceleration (Optional)

NVIDIA GPU (GTX 1060+ or RTX series recommended)
NVIDIA drivers installed
CUDA 12 and cuDNN 9 (see faster-whisper GPU docs)

Quick Start (Cloud Providers)

The fastest way to get started - no local services needed:

# Install uv (Python package manager)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# Install agent-cli
uv tool install agent-cli -p 3.13

# Use with cloud providers (requires API keys)
$env:OPENAI_API_KEY = "sk-..."
agent-cli transcribe --asr-provider openai --llm-provider openai

Full Local Setup (Recommended)

Use this for a fully local setup that needs no internet connection once the models have been downloaded.

Script-Based Installation (Recommended)

Clone the repository:

git clone https://github.com/basnijholt/agent-cli.git
cd agent-cli

Run the setup script:

powershell -ExecutionPolicy Bypass -File scripts/setup-windows.ps1

Start all services:

powershell -ExecutionPolicy Bypass -File scripts/start-all-services-windows.ps1

Test the setup:

agent-cli transcribe

Manual Installation

If you prefer manual setup:

Install uv:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Install Ollama:

Download the installer from ollama.com and run it. Then pull a model:

ollama pull gemma3:4b

Install agent-cli:

uv tool install agent-cli -p 3.13

Run services individually:

# Terminal 1: Ollama (may already be running as a service)
ollama serve

# Terminal 2: Whisper
agent-cli server whisper

# Terminal 3: Piper
agent-cli server tts --backend piper
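If juggling three terminals is inconvenient, the same commands can be launched from one small Python script. This is only a sketch: the commands are copied from the terminals above, while the names `SERVICE_COMMANDS` and `start_services` are hypothetical and not part of agent-cli. It assumes `ollama` and `agent-cli` are on your PATH.

```python
import subprocess

# Commands copied from the three terminals above.
SERVICE_COMMANDS = {
    "ollama": ["ollama", "serve"],
    "whisper": ["agent-cli", "server", "whisper"],
    "piper": ["agent-cli", "server", "tts", "--backend", "piper"],
}


def start_services(commands=SERVICE_COMMANDS, popen=subprocess.Popen):
    """Start each service as a separate process and return the handles by name."""
    # CREATE_NEW_CONSOLE exists only on Windows; fall back to 0 (no flags) elsewhere.
    flags = getattr(subprocess, "CREATE_NEW_CONSOLE", 0)
    return {name: popen(cmd, creationflags=flags) for name, cmd in commands.items()}
```

Calling `start_services()` opens one console window per service on Windows. If Ollama is already running as a background service, its `ollama serve` process will likely exit because the port is taken; in that case, remove the `"ollama"` entry.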

Services Overview

Service	Port	GPU Support	Description
Ollama	11434	✅ CUDA	LLM inference
Whisper	10300	✅ CUDA	Speech-to-text (ASR)
Piper	10200	N/A	Text-to-speech (TTS)

GPU Acceleration

The scripts automatically detect an NVIDIA GPU and choose the Whisper model accordingly:

With GPU (CUDA): large-v3 model for best accuracy
Without GPU: tiny model for faster CPU inference
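The selection rule above can be sketched in a few lines of Python. How the setup scripts actually detect the GPU is not shown in this guide, so the check below (looking for `nvidia-smi` on PATH) is an assumption, and both function names are hypothetical.

```python
import shutil


def has_nvidia_gpu(which=shutil.which):
    """Heuristic (an assumption): nvidia-smi on PATH means an NVIDIA GPU is usable."""
    return which("nvidia-smi") is not None


def choose_whisper_model(gpu_available):
    """Apply the rule above: large-v3 with a GPU, tiny without."""
    return "large-v3" if gpu_available else "tiny"
```

For example, `choose_whisper_model(has_nvidia_gpu())` gives the model name this rule would pick on the current machine. Note that having `nvidia-smi` installed does not guarantee that CUDA and cuDNN are set up correctly.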

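To verify that the GPU is being used, run the following while a transcription is in progress and check that GPU memory usage increases: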

nvidia-smi

Global Hotkeys with AutoHotkey

Use AutoHotkey v2 for global keyboard shortcuts.

Create a file named agent-cli.ahk:

#Requires AutoHotkey v2.0
Persistent

; Win+Shift+W - Toggle transcription
#+w::{
    ; Write the current status to a temp file so the script can read it
    statusFile := A_Temp . "\agent-cli-status.txt"
    cmd := Format('{1} /C agent-cli transcribe --status > "{2}" 2>&1', A_ComSpec, statusFile)
    RunWait(cmd, , "Hide")
    status := FileRead(statusFile)
    ; --toggle both starts and stops transcription; only the notification differs
    message := InStr(status, "not running") ? "Starting transcription..." : "Stopping transcription..."
    TrayTip(message, "agent-cli", 1)
    Run("agent-cli transcribe --toggle", , "Hide")
}

; Win+Shift+A - Autocorrect clipboard
#+a::{
    TrayTip("Autocorrecting...", "agent-cli", 1)
    Run("agent-cli autocorrect", , "Hide")
}

; Win+Shift+E - Voice edit selection
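#+e::{
    ; Clear the clipboard first so ClipWait only succeeds on the new copy
    A_Clipboard := ""
    Send("^c")
    if !ClipWait(1) {
        TrayTip("No text selected", "agent-cli", 1)
        return
    }
    TrayTip("Voice editing...", "agent-cli", 1)
    Run("agent-cli voice-edit", , "Hide")
}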

Double-click the script to run it.

Tip

To run at startup: Press Win+R, type shell:startup, and place a shortcut to your .ahk file there.

Troubleshooting

Audio device not found

Run agent-cli transcribe --list-devices to list the available audio devices, then pass your microphone's index with --input-device-index.

Wyoming server connection refused

Ensure the services are running:

# Check if ports are in use
netstat -an | findstr "10300 10200 11434"
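The netstat output can be noisy, because findstr matches those numbers anywhere in a line. As an alternative, here is a minimal Python sketch that tries to connect to each port directly. The port numbers come from the Services Overview table; the function names are hypothetical, and the sketch assumes the services listen on 127.0.0.1.

```python
import socket

# Ports taken from the Services Overview table.
SERVICE_PORTS = {"Ollama": 11434, "Whisper": 10300, "Piper": 10200}


def port_is_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0


def check_services(ports=SERVICE_PORTS):
    """Map each service name to True (reachable) or False (not reachable)."""
    return {name: port_is_open(port) for name, port in ports.items()}
```

Printing `check_services()` shows at a glance which service is down. An open port only shows that something is listening there, not that the listener is the expected service.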

GPU not being used

Verify NVIDIA drivers: nvidia-smi
Check that CUDA 12 and cuDNN 9 are installed (see Prerequisites)
Set device explicitly: $env:WHISPER_DEVICE = "cuda"

Ollama not responding

Check if Ollama is running:

ollama list

If the command fails, start Ollama with ollama serve or launch it from the Start Menu.
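The same check can be scripted against Ollama's HTTP API, which is handy in a health check. The sketch below queries the `/api/tags` endpoint, which lists the locally available models. It assumes the default port 11434 from the Services Overview table; `ollama_models` is a hypothetical helper name.

```python
import json
import urllib.error
import urllib.request


def ollama_models(base_url="http://localhost:11434", timeout=2.0):
    """Return the names of locally available models, or None if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON
        return None
    return [model.get("name") for model in data.get("models", [])]
```

A return value of `None` means Ollama is not responding. An empty list means it is running but no model has been pulled yet, in which case `ollama pull gemma3:4b` is the next step.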