Talk to your Mac.
It sees, and does the thing.

A native macOS assistant that lives next to your cursor. Press a hotkey, speak, and it reads your screen, clicks and types for you, and answers back in voice — powered by the OpenAI Realtime API.

macOS 14+ · Apple Silicon · your own API key

[Image: the Cursor Voice orb resting inside a chat input field, with a glowing aurora ring around the mouse pointer.]

What it does

Voice in. Action out.

No window to switch to, no copy-paste. The orb appears at your cursor, listens, and acts on whatever's in front of you.

Real-time voice

Streams your voice to the OpenAI Realtime API and talks back. Interrupt it mid-sentence — it stops cleanly and listens.

Sees your screen

Captures the display so it can answer questions about what you're looking at, and verifies its own actions with a fresh screenshot after each one.

Drives the Mac

Clicks buttons by name through the Accessibility tree, types, scrolls, and runs AppleScript and shell commands. Mouse simulation is only a last resort.

Reads text on screen

When a target isn't in the Accessibility tree, it OCRs the screen with the Vision framework and clicks the text directly.

Live web access

Searches the web and reads pages for current information instead of guessing from stale training data.

Remembers

Keeps durable facts — your preferred apps, project paths, common tasks — in local memory across sessions.
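
To make "local memory across sessions" concrete, here is a minimal sketch in Python. It assumes a plain JSON file as the store; the class name, file name, and key names are hypothetical and are not taken from the app's source.

```python
import json
import tempfile
from pathlib import Path


class LocalMemory:
    """Tiny key-value store persisted as a JSON file (hypothetical design)."""

    def __init__(self, path: Path):
        self.path = path
        # Load existing facts if the file is already there.
        self.facts = json.loads(path.read_text()) if path.exists() else {}

    def remember(self, key: str, value: str) -> None:
        self.facts[key] = value
        # Write the whole store back so it survives a restart.
        self.path.write_text(json.dumps(self.facts, indent=2))

    def recall(self, key: str, default=None):
        return self.facts.get(key, default)


# A second instance stands in for a new session reading the same file.
store_path = Path(tempfile.mkdtemp()) / "memory.json"
LocalMemory(store_path).remember("preferred_browser", "Safari")
print(LocalMemory(store_path).recall("preferred_browser"))  # Safari
```

Rewriting the whole file on every change is fine for a handful of facts; a larger store would likely want something like SQLite instead.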

How it works

From hotkey to done.

1 · Summon

Press the hotkey (default ⌃⌥/) or say the wake word. The orb materializes right at your cursor.

2 · Speak

Ask for anything: "what's this error", "open my downloads", "search for X and summarize it". It captures the screen if it needs context.

3 · It acts

It picks the most reliable path (an Accessibility click, OCR, AppleScript, or a keyboard shortcut), and a halo follows the cursor while it works.

4 · It verifies

After each action it takes a fresh screenshot and checks that the result actually happened before moving on. Then it tells you it's done.

Why it lands the click

It hits the right thing,
almost every time.

Most assistants guess pixel coordinates from a screenshot and often miss. Cursor Voice doesn't guess: it targets real UI elements and falls back through layers until something lands.

1 · Accessibility tree

Reads the app's real controls and fires the button's own action directly — no mouse simulation, no coordinate math. Pixel-perfect on native macOS UI.

2 · On-screen text

If a target isn't in the Accessibility tree, it reads the screen with the Vision framework and clicks the text. That covers web pages, Electron apps, anything with a label.

3 · Verify & retry

After every action it takes a fresh screenshot and checks the result actually happened. If the UI didn't change, it re-locates and tries again instead of plowing ahead.
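
The three layers above can be sketched as a small fallback loop. This is an illustration in Python, not the app's code: the function names are hypothetical, the strategies are fakes, and "did the UI change" is reduced to comparing two raw captures, which is far cruder than checking for the expected result.

```python
from typing import Callable, List, Optional, Tuple

# A strategy takes a target label and returns True if it believes it clicked.
Strategy = Callable[[str], bool]


def click_with_fallback(
    target: str,
    strategies: List[Tuple[str, Strategy]],
    capture: Callable[[], bytes],
    max_attempts: int = 2,
) -> Optional[str]:
    """Return the name of the strategy that visibly changed the UI, else None."""
    for _ in range(max_attempts):
        for name, attempt in strategies:
            before = capture()        # snapshot before acting
            if not attempt(target):   # this layer could not find the target
                continue
            if capture() != before:   # fresh snapshot: did anything change?
                return name
            # UI unchanged: try the next layer instead of plowing ahead.
    return None


# Fake screen for the demo: only the OCR click actually changes it.
screen = {"state": b"login page"}


def ax_click(target: str) -> bool:
    return False  # pretend the control is missing from the Accessibility tree


def ocr_click(target: str) -> bool:
    screen["state"] = b"dashboard"
    return True


used = click_with_fallback(
    "Sign in",
    [("accessibility", ax_click), ("ocr", ocr_click)],
    capture=lambda: screen["state"],
)
print(used)  # ocr
```

Ordering the strategies from most to least reliable keeps the cheap, exact path first and leaves the guessier ones for when it fails.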

Under the hood

The tools it can reach for.

The model chooses the most direct path for each task. Clicking real UI is preferred over guessing pixels.

click_element (AXPress, no mouse sim)
click_text (Vision OCR)
see_screen (native capture)
batch_actions (multi-step in one call)
web_search + fetch_url
run_applescript
run_shell (guarded)
type_text · hotkey · scroll
open_url
remember · recall
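
As a rough picture of how a tool list like this can be wired up, here is a sketch of a dispatcher in Python. Only the tool names come from the list above. The `{"name": ..., "args": ...}` call shape, the stub handlers, and the deny list are assumptions made for illustration; the real guard on run_shell is not documented here and is surely more careful than a token check.

```python
import shlex
from typing import Any, Callable, Dict

# Hypothetical deny list, for illustration only.
BLOCKED_COMMANDS = {"rm", "sudo", "dd", "mkfs", "shutdown"}


def run_shell(command: str) -> str:
    # Naive guard: refuse if any token is a deny-listed program name.
    if any(tok in BLOCKED_COMMANDS for tok in shlex.split(command)):
        return f"blocked: {command}"
    return f"would run: {command}"  # a real handler would execute it here


def click_element(name: str) -> str:
    return f"pressed {name}"  # stand-in for an AXPress on the named control


def type_text(text: str) -> str:
    return f"typed {text}"  # stand-in for synthesized keystrokes


TOOLS: Dict[str, Callable[..., str]] = {
    "run_shell": run_shell,
    "click_element": click_element,
    "type_text": type_text,
}


def dispatch(call: Dict[str, Any]) -> str:
    """Route one tool call of the assumed form {"name": ..., "args": {...}}."""
    if call["name"] == "batch_actions":
        # Run the nested steps in order, one result per line.
        return "\n".join(dispatch(step) for step in call["args"]["steps"])
    handler = TOOLS.get(call["name"])
    if handler is None:
        return f"unknown tool: {call['name']}"
    return handler(**call["args"])


result = dispatch({
    "name": "batch_actions",
    "args": {"steps": [
        {"name": "click_element", "args": {"name": "Search"}},
        {"name": "type_text", "args": {"text": "release notes"}},
        {"name": "run_shell", "args": {"command": "rm -rf ~/Downloads"}},
    ]},
})
print(result)
```

Batching several steps into one call saves a model round trip per step. The trade-off is that the model does not see an intermediate result until the whole batch returns.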

Install

Up and running in a minute.

The app is ad-hoc signed (no paid Apple Developer ID), so the installer and the Homebrew cask strip the Gatekeeper quarantine attribute for you.

~/cursor-voice — install
$ curl -fsSL https://raw.githubusercontent.com/cursorvoice/cursor-voice/main/install.sh | bash
→ downloads the latest release, drops it in /Applications, launches it
$ brew tap cursorvoice/cursor-voice
$ brew install --cask cursor-voice

Download the DMG

Prefer to drag it in yourself? Grab the latest release DMG and right-click → Open on first launch.

Auto-updates

Once it's installed, new versions land through the in-app updater — a banner in Settings, one click to install and relaunch.

Permissions

It asks for Microphone, Screen Recording, and Accessibility — each unlocks one capability. Grant them, then relaunch.

Keeping it real

This is an early beta.

It works, and it's genuinely useful — but it's a project in active development, not a finished product. Here's the honest state of things so you know what you're getting.

Clicking is good, not perfect. Accessibility-based clicks and OCR land most of the time. Unlabeled icons and pure-canvas UIs can still trip it up.

Apple Silicon + macOS 14 or later only. No Intel build, no iOS, no Windows. Built and tested on one machine so far.

It controls your computer. Shell and AppleScript run with your full permissions (destructive commands are blocked). Run it on machines you trust.

Not notarized. No paid Developer ID yet, so Gatekeeper warns on first launch. The installer works around it; you can read every line of the source.

You bring the API key. Everything runs against your own OpenAI Realtime API usage. There's no hosted backend and nothing is billed by this project.

Feedback wanted. Bugs, rough edges, and ideas are exactly what this stage is for. Open an issue on GitHub.