I have XLRS, dyslexia, and ADHD.

I built Rift because every voice tool I tried fought how my brain works. This one doesn’t.

Voice to text. Text to voice. Entirely on your Mac.

Free research preview · Apple Silicon · macOS 14+

Why I built Rift

XLRS

A genetic eye condition that makes screen reading hard. I listen for hours a day — I wanted voices that don’t add their own fatigue.

Dyslexia

Text fights my brain. I needed to speak instead of type and listen instead of read — without my voice leaving my Mac.

ADHD

My thinking doesn’t follow a straight line. Most voice tools cut you off after two seconds of silence. Rift waits until you stop.

The features I built for myself turn out to help everyone.

What is XLRS?

X-linked retinoschisis is a rare genetic condition that affects the retina’s layers and central vision. Glasses can’t correct it. Symptoms vary; prolonged screen reading often causes extra strain and fatigue.

Who Rift is for

The same choices that help me help anyone who wants patient dictation, natural speech, and privacy.

Dyslexia

If you think better out loud than on paper, Rift turns speech into text without fighting you — and reads it back when you need to hear what you wrote.

ADHD

If your brain takes detours, Rift doesn’t punish pauses or restarts. Live paste keeps the loop tight enough to follow.

Low vision

If reading the screen is tiring, Rift reads to you — fast first word, adjustable speed, pause anywhere — with voices made for long listens.

Motor differences

Hold-to-talk, global shortcuts, and no forced cutoff mean fewer precise key presses and no penalty for hesitation.

Writers & thinkers

If you think by talking, Rift captures it privately — on your Mac, under your control.

01

Voice to Text

Speak naturally. A local model cleans and merges as you go — not just raw transcription.

Listening...
2:34 and counting

You decide
when you're done.

My thoughts don’t follow a timer. ADHD means I pause mid-sentence to find the right word — other tools treated that pause as “done.”

No auto-endpointing

Speak. Pause. Think. Rift waits.
Other apps cut you off after 2 seconds of silence.

Others

"The quick brown—"

Cut off after pause

Rift

"The quick brown fox jumps over the lazy dog."

You press stop when ready
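
To make the contrast concrete, here is a minimal Python sketch of the two stopping policies. It is an illustration, not Rift's code: the function names, the 100 ms frame size, and the per-frame `voiced` flags are all assumptions. The 2-second limit is the figure quoted above for other tools.

```python
from typing import Optional, Sequence

FRAME_MS = 100  # assumed frame length; not a Rift setting


def timeout_end(voiced: Sequence[bool], limit_ms: int = 2000) -> Optional[int]:
    """Frame index at which a silence-timeout policy ends the session, or None."""
    silent_frames = 0
    for i, is_voiced in enumerate(voiced):
        # Any speech resets the silence counter; silence accumulates it.
        silent_frames = 0 if is_voiced else silent_frames + 1
        if silent_frames * FRAME_MS >= limit_ms:
            return i
    return None


def manual_end(voiced: Sequence[bool], stop_frame: int) -> int:
    """Manual policy: silence is ignored; the session ends when the user stops it."""
    return stop_frame


# 0.5 s of speech, a 3 s thinking pause, then 0.5 s more speech.
session = [True] * 5 + [False] * 30 + [True] * 5
print(timeout_end(session))      # ends inside the pause, before the thought is finished
print(manual_end(session, 39))   # ends only at the frame where stop was pressed
```

With the timeout policy, the second half of the sentence is never captured. With the manual policy, the pause costs nothing.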

250ms

First-word capture

Your first word is never lost.
A 250ms lead-in buffer starts recording before you even finish pressing the button.
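
One common way to build a lead-in buffer is a small ring buffer that always holds the most recent audio, so the moment before the button press is already in memory. The sketch below assumes 16 kHz audio delivered in 10 ms chunks; the class and method names are hypothetical, and this is not Rift's implementation.

```python
from collections import deque

SAMPLE_RATE = 16_000   # assumed sample rate in Hz
CHUNK_SAMPLES = 160    # assumed chunk size (10 ms at 16 kHz)
PRE_ROLL_MS = 250      # lead-in length quoted above


class PreRollRecorder:
    """Keeps the last PRE_ROLL_MS of audio so a recording can start in the past."""

    def __init__(self) -> None:
        chunk_ms = 1000 * CHUNK_SAMPLES // SAMPLE_RATE          # 10 ms per chunk
        self._pre_roll = deque(maxlen=PRE_ROLL_MS // chunk_ms)  # holds 25 chunks
        self._recording = None

    def feed(self, chunk) -> None:
        """Called for every microphone chunk, whether or not we are recording."""
        if self._recording is None:
            self._pre_roll.append(chunk)   # oldest chunk is dropped automatically
        else:
            self._recording.append(chunk)

    def start(self) -> None:
        """Button pressed: the recording begins with the buffered lead-in."""
        self._recording = list(self._pre_roll)
        self._pre_roll.clear()

    def stop(self) -> list:
        """Return everything captured, lead-in included, and go back to idle."""
        chunks, self._recording = self._recording or [], None
        return chunks
```

Because `feed` runs continuously, `start` never has to wait for the audio device to warm up.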

Buffered

Button pressed

Recording

"Hel—" is already captured

25s

Rolling context window

The model considers the last 25 seconds of audio.
It understands context, not just isolated words.

[Timeline from -30s to now; the context window covers the last 25 seconds]

Live paste

Text appears in your app as you speak.
Real-time streaming with final reconciliation when you stop.

The quick brown fox jumps over the lazy dog.
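
Final reconciliation can be pictured as a small diff: keep the part of the pasted text that still matches, delete the rest, and type the corrected tail. The Python sketch below shows that idea under stated assumptions. `reconcile` and `apply_edit` are hypothetical names, and how Rift actually edits text in the target app is not described here.

```python
import os


def reconcile(pasted: str, final: str) -> tuple[int, str]:
    """Return (characters_to_delete, text_to_type) that turns `pasted` into `final`."""
    # os.path.commonprefix compares character by character, so it works on any strings.
    keep = len(os.path.commonprefix([pasted, final]))
    return len(pasted) - keep, final[keep:]


def apply_edit(pasted: str, delete: int, typed: str) -> str:
    """Simulate the edit: remove `delete` trailing characters, then append `typed`."""
    return pasted[: len(pasted) - delete] + typed


delete, typed = reconcile("The quick brown fix", "The quick brown fox jumps")
print(delete, repr(typed))
```

Only the changed tail is touched, which keeps the cursor from jumping around in text that was already correct.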

Auto-fix

Hallucination detection

If the first transcription guess is wrong, Rift detects it and auto-replaces.
No manual cleanup. No re-recording.

> The whether weather is nice today
detecting... fixed

Real-time

Streaming transcription

Audio is processed in chunks as you speak.
No waiting for you to finish.

Audio
Text
The quick brown fox jumps

And the smaller touches

Things you stop noticing — because they just work.

  • Silence polish. A few seconds of quiet, and Rift quietly cleans what you already pasted.
  • Polish modes. Verbatim keeps your words. Clean fixes obvious issues. Professional tightens tone.
  • Audio cues. Soft tones mark start and stop — confirmation without looking.
  • Toggle or hold-to-talk. Pick what fits your hands. Optional auto-send after paste for chat apps.

02

Text to Voice

Select text. Multiple engines. Natural speech — including code.

First word in
150 milliseconds.

I can’t always read the screen for long stretches. When audio is how I read, the first syllable can’t arrive late.

150ms

First-word latency

You hear the first word before the sentence finishes generating.
No loading spinners. No waiting.

Hello, world
"Hello..."
150ms to first sound

Seamless

Clause-level streaming

The next clause is synthesized while the current one plays.
No gaps. No stutters. Continuous audio.

Playing: "The quick brown fox..."
Buffered: "jumps over the lazy dog."
Generating: "The end."
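
The playing / buffered / generating pipeline can be sketched as a one-clause lookahead. In the Python below, `synthesize` stands in for a TTS engine call and is an assumption, as are the function names and the punctuation-based clause splitter. It shows the pattern, not Rift's code.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator


def split_clauses(text: str) -> list[str]:
    """Split after clause-ending punctuation, keeping the punctuation with its clause."""
    parts = re.split(r"(?<=[.!?;:,])\s+", text.strip())
    return [p for p in parts if p]


def stream(text: str, synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield audio clause by clause while the next clause renders in the background."""
    clauses = split_clauses(text)
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(synthesize, clauses[0]) if clauses else None
        for upcoming in clauses[1:]:
            audio = pending.result()
            pending = pool.submit(synthesize, upcoming)  # start the next clause first
            yield audio                                  # caller plays this meanwhile
        if pending is not None:
            yield pending.result()


print(split_clauses("The quick brown fox, jumps over the lazy dog. The end."))
```

Short clauses are what make a fast first word possible: the engine only has to finish a few words before playback can begin.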

20ms

Audio poll rate

The audio buffer is checked every 20 milliseconds.
Imperceptible latency between chunks.

50 checks per second

Pause anywhere

Tap to pause mid-syllable. Tap again to resume from the exact position.
Your place is never lost.

Tap to pause

0.5× – 2×

Playback speed

Speed up for skimming. Slow down for comprehension.
Adjust in real-time without restarting.

And the smaller touches

The parts that make it usable for hours.

  • Code Talk. In Cursor, VS Code, Terminal, and docs, Rift speaks technical text naturally — overflow-x: hidden becomes “overflow-x set to hidden.”
  • Engines & voices. Kokoro (stable) and Chatterbox variants. 14+ voices. Download extras from the tray when you need them.
  • Global shortcuts. ⌃1 read selection. ⌃2 dictate. ⌃3 show, hide, or pause. One keystroke away.
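
As a rough picture of what a Code Talk transform does, the Python sketch below handles just the CSS example above with a single regular expression. Rift may use a language model for this step, so the rule-based approach here is a stand-in, and `speakable` is a hypothetical name.

```python
import re

# Matches one CSS-style declaration such as "overflow-x: hidden" or "margin: 0 auto;".
_CSS_DECL = re.compile(r"^\s*([a-z-]+)\s*:\s*([^;]+?)\s*;?\s*$")


def speakable(line: str) -> str:
    """Rewrite a CSS declaration into a phrase that reads naturally aloud."""
    match = _CSS_DECL.match(line)
    if match is None:
        return line  # not a declaration: leave the text untouched
    prop, value = match.groups()
    return f"{prop} set to {value}"


print(speakable("overflow-x: hidden"))
```

A single pattern like this will misfire on ordinary prose that happens to look like `word: value`, which is one reason a context-aware transform is attractive for real use.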

How it works

Two pipelines. Local speech models. A local language model for merge, correction, and polish. Zero cloud for your voice and text.

01 Voice to Text
Ctrl + 2

Start dictation

1

Capture

Core Audio streams from your microphone with a 250ms lead-in buffer. Your first word is never lost.

2

Process

Parakeet runs on the GPU via MLX. 25 seconds of rolling context. Real-time streaming.

3

Paste

Text appears at your cursor as you speak. Final reconciliation when you stop. On-device Gemma 4 polishes your text — see Intelligence.

02 Text to Voice
Ctrl + 1

Speak selected text

1

Select

Highlight text in any app or copy to clipboard. Rift reads whatever you give it.

2

Synthesize

Kokoro or Chatterbox generates audio clause-by-clause. First word in 150ms. Next sentence ready before current ends. Code Talk may run an LLM transform first in developer contexts.

3

Play

Audio streams to system output. Pause anywhere, resume from exact position. 0.5× to 2× speed.

Space: Pause / Resume
Esc: Stop

Four phases of local intelligence

Rift runs local language models (Gemma 4 + Qwen3, via MLX) next to Parakeet and TTS. Not just transcription — understanding and cleanup, on your Mac.

  1. Merge — New words fold into what came before. Fewer duplicates and jumps as the recognizer updates.
  2. Correct — Grammar, punctuation, and light formatting in real time. Numbers and phrasing stay intentional.
  3. Extract — When the model revises earlier audio, only genuinely new words are appended.
  4. Polish — On pause or stop (and silence polish), fillers can be trimmed, lists formatted, sentences smoothed — per your polish mode.

A fast Qwen3 0.6B tier handles real-time phases; a deeper Gemma 4 E4B tier powers polish and Code Talk transforms. All on-device.
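
The Merge and Extract phases can be illustrated without a language model. The Python sketch below swaps in plain word-overlap matching: it finds the longest run of words that ends the committed text and begins the new hypothesis, then appends only what follows. The function names are hypothetical, and Rift's model-based phases will handle cases this simple rule cannot, such as reworded overlaps.

```python
def extract_new_words(committed: list[str], hypothesis: list[str]) -> list[str]:
    """Return the words in `hypothesis` that are not already at the end of `committed`."""
    longest = min(len(committed), len(hypothesis))
    for k in range(longest, 0, -1):
        # Does the last k committed words equal the first k hypothesis words?
        if committed[-k:] == hypothesis[:k]:
            return hypothesis[k:]
    return hypothesis  # no overlap found: everything is new


def merge(committed: list[str], hypothesis: list[str]) -> list[str]:
    """Fold a new recognizer hypothesis into the committed text without duplicates."""
    return committed + extract_new_words(committed, hypothesis)


committed = "the quick brown fox".split()
hypothesis = "brown fox jumps over".split()
print(" ".join(merge(committed, hypothesis)))
```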

Privacy.
That's Rift.

Your voice never leaves your Mac. Ever. When assistive tech is how you read and write, that isn’t abstract — it’s dignity.

100% on-device processing
No cloud. No servers.
No accounts required
Fully open source

Zero file I/O

Audio is synthesized directly to memory. Nothing is written to disk. Nothing persists after you close the app.

See the patience in action

A simplified replay: streaming text, a long pause, then an auto-fix.

Demo transcript

Recording starts → text streams in → a 3s pause (other tools might have ended) → speech resumes → a wrong word auto-corrects.

Performance

Tested on real hardware. Real workloads.

M1 MacBook Air

  • Voice→Text: 0.8× realtime
  • Text→Voice: 1.2× realtime
  • Memory: 1.8 GB

M4 Mac Studio

  • Voice→Text: 2.1× realtime
  • Text→Voice: 3.4× realtime
  • Memory: 2.1 GB

How Rift compares

Feature comparison: Rift vs Whisper.cpp vs macOS Dictation

| Feature | Rift | Whisper.cpp | macOS Dictation |
| --- | --- | --- | --- |
| On-device | Yes | Yes | Partial |
| No auto-cutoff | Yes | No | No |
| Live paste | Yes | No | Yes |
| First-word buffer | 250ms | None | None |
| Local LLM polish | Yes | No | No |
| TTS included | Yes | No | Basic |
| TTS latency (first word) | ~150ms | N/A | ~500ms |
| Voice & text privacy | 100% local | 100% local | Cloud fallback |

Requirements

  • macOS: Sonoma 14.0+
  • Chip: Apple Silicon
  • RAM: 8GB minimum
  • Disk: ~2GB

The visual metaphor

Nothing escapes.

Your data goes in — and stays in.

The Passage

The Singularity

Your Mac is the center of gravity. Voice in, text out, text in, voice out — all here. No servers. No cloud.

The Event Horizon

Once your words enter Rift, they never leave your machine. No telemetry. No uploads. No exceptions.

How the visualization works

Raymarching

Volumetric rendering via signed distance functions. The sphere-traced shader runs 128 iterations per pixel to simulate photon paths.
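
For readers new to sphere tracing, here is the core loop in Python rather than shader code. It marches a straight ray against a single sphere, so it leaves out the light bending that the real visualization adds. The 128-step cap comes from the text above; `EPSILON`, `MAX_DIST`, and the function names are assumptions.

```python
import math

MAX_STEPS = 128    # iteration cap quoted above
EPSILON = 1e-4     # assumed hit tolerance
MAX_DIST = 100.0   # assumed distance at which the ray gives up


def sphere_sdf(p, radius=1.0):
    """Signed distance from point p to a sphere centred at the origin."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius


def sphere_trace(origin, direction, sdf=sphere_sdf):
    """Distance along the ray to the first surface hit, or None on a miss."""
    t = 0.0
    for _ in range(MAX_STEPS):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < EPSILON:
            return t      # close enough to the surface to count as a hit
        t += dist         # safe step: nothing can be closer than `dist`
        if t > MAX_DIST:
            return None
    return None


print(sphere_trace((0.0, 0.0, -3.0), (0.0, 0.0, 1.0)))
```

Each step advances by the distance to the nearest surface, which is why a fixed budget of iterations per pixel is usually enough.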

Schwarzschild geodesics

Light follows the curved spacetime geometry of a non-rotating black hole. The photon sphere appears as a bright ring at 1.5× the event horizon radius.
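
The 1.5× figure follows from the standard Schwarzschild result, where $M$ is the black hole's mass, $G$ the gravitational constant, and $c$ the speed of light:

```latex
r_s = \frac{2GM}{c^{2}}, \qquad
r_{\mathrm{ph}} = \frac{3GM}{c^{2}} = \frac{3}{2}\, r_s
```

Here $r_s$ is the event horizon (Schwarzschild) radius and $r_{\mathrm{ph}}$ is the radius of the photon sphere, where light can travel in a circular orbit.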

Keplerian disk

Accretion disk particles orbit according to Kepler's laws. Inner particles orbit faster, creating the characteristic spiral structure.
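
In Newtonian terms, a particle on a circular orbit of radius $r$ around a mass $M$ has the following speed, angular velocity, and period. Whether the shader uses exactly these expressions is an assumption.

```latex
v(r) = \sqrt{\frac{GM}{r}}, \qquad
\omega(r) = \frac{v(r)}{r} = \sqrt{\frac{GM}{r^{3}}}, \qquad
T(r) = \frac{2\pi}{\omega(r)} \;\Rightarrow\; T^{2} \propto r^{3}
```

Because $\omega$ falls off as $r^{-3/2}$, inner particles complete their orbits sooner than outer ones, and that difference shears the disk into a spiral.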

ACES tonemapping

Film-industry-standard color grading compresses the HDR luminance into displayable range while preserving the fiery accretion glow.
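
A widely used approximation of the ACES curve is Krzysztof Narkowicz's fitted formula, shown below in Python for a single colour channel. Whether the visualization uses this particular fit is an assumption, and `aces_film` is just an illustrative name.

```python
def aces_film(x: float) -> float:
    """Narkowicz's ACES filmic fit: maps HDR luminance (x >= 0) into [0, 1]."""
    a, b, c, d, e = 2.51, 0.03, 2.43, 0.59, 0.14
    y = (x * (a * x + b)) / (x * (c * x + d) + e)
    return min(max(y, 0.0), 1.0)  # clamp to the displayable range


for luminance in (0.0, 0.5, 1.0, 4.0, 100.0):
    print(luminance, round(aces_film(luminance), 3))
```

The curve rises steeply for dark values and flattens for bright ones, so very bright regions roll off smoothly toward white instead of clipping abruptly.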

Visualization based on Singularity by MisterPrada

Frequently asked

Does it work offline?

Yes — all voice, text, and language work runs on your Mac. The only network use is optional update checks and the first-run model download. After that, Rift works offline.

Is my voice stored anywhere?

Never. Audio lives in memory and is discarded right away. Nothing is written to disk. Nothing is sent anywhere.

What languages are supported?

English today. The Parakeet model supports more languages — I’m working on enabling them.

What voices ship with it?

Kokoro includes several built-in voices. Chatterbox variants — including MLX fast paths — add more, downloadable from the app. Voice cloning isn’t available.

Does it run on Intel Macs?

No. Rift needs Apple Silicon (M1+) for the MLX framework.

Why is the first run slow?

First launch caches the models (~2GB). After that, launches are instant.

Is Rift open source?

Yes — MIT licensed. Source on GitHub.

How do I install it?

Download the DMG, drag Rift to Applications, launch. Apple Silicon (M1+) required. If macOS shows a security warning, the install guide has the one-line fix.

The Technology

Built different.

Four pillars — speech in, intelligence in the middle, speech out — all on Apple Silicon. No cloud for your content.

Rift

Your voice. Your Mac. Nothing else.

Download for macOS

Apple Silicon (M1+) · macOS 14+ · English