Alpha Preview

Rift

Speak, it types. Select, it reads. All on your Mac.

Download · Learn more

Your voice. Your pace.

Ideas don't arrive in perfect sentences.

They pause. They revise. They find their way.

Rift is built for how people actually think —

patient, precise, and ready when you are.

01

Voice to Text

Speak naturally. Rift transcribes.


Listening...

2:34 and counting

You decide
when you're done.

No auto-endpointing

Speak. Pause. Think. Rift waits.
Other apps cut you off after 2 seconds of silence.

Others

"The quick brown—"

Cut off after pause

Rift

"The quick brown fox jumps over the lazy dog."

You press stop when ready

250ms

First-word capture

Your first word is never lost.
A 250ms lead-in buffer starts recording before you even finish pressing the button.
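One common way to build a lead-in buffer like this is a small ring buffer that is filled continuously, so the audio from just before the button press is already on hand. The sketch below illustrates that general technique and is not Rift's code; `PreRollBuffer`, its method names, and the 16 kHz sample rate are assumptions made for the example.

```python
from collections import deque


class PreRollBuffer:
    """Keeps only the most recent `lead_in_ms` of audio samples (hypothetical helper)."""

    def __init__(self, sample_rate=16_000, lead_in_ms=250):
        # 250 ms at an assumed 16 kHz is 4000 samples; older samples fall off the left.
        self.capacity = sample_rate * lead_in_ms // 1000
        self._ring = deque(maxlen=self.capacity)

    def feed(self, samples):
        """Called continuously with microphone samples, even before recording starts."""
        self._ring.extend(samples)

    def start_recording(self):
        """On button press, the recording begins with the buffered lead-in."""
        return list(self._ring)


buf = PreRollBuffer()
buf.feed(range(10_000))            # 10,000 samples arrive before the press
recording = buf.start_recording()  # only the newest 4000 are kept
```

Because the buffer has a fixed capacity, memory use stays constant no matter how long the app idles.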

Buffered

Button pressed

Recording

"Hel—" is already captured

25s

Rolling context window

The model considers the last 25 seconds of audio.
It understands context, not just isolated words.

Timeline: -30s to 0s (now), with the context window covering the most recent 25 seconds.

Live paste

Text appears in your app as you speak.
Real-time streaming with final reconciliation when you stop.

The quick brown fox jumps over the lazy dog.
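Final reconciliation can be pictured as a minimal edit against what has already been pasted: keep the shared prefix, delete the rest, type the corrected tail. This is a sketch of that idea under the assumption that the target app accepts backspaces and keystrokes; `reconcile` is a hypothetical helper, not Rift's implementation.

```python
import os


def reconcile(pasted: str, final: str):
    """Return (backspaces, text_to_type) that turns `pasted` into `final`."""
    # commonprefix compares character by character, which is what we want here.
    keep = len(os.path.commonprefix([pasted, final]))
    return len(pasted) - keep, final[keep:]


# The streamed guess ended in "fax"; the final transcript says "fox jumps."
backspaces, typed = reconcile("The quick brown fax", "The quick brown fox jumps.")
```

Here `backspaces` is 2 and `typed` is `"ox jumps."`, so only the tail of the text is touched.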

Auto-fix

Hallucination detection

If the first transcription guess is wrong, Rift detects it and auto-replaces.
No manual cleanup. No re-recording.

> The whether weather is nice today

detecting... fixed

Real-time

Streaming transcription

Audio is processed in chunks as you speak.
No waiting for you to finish.

Audio

↓

Text

The quick brown fox jumps

Formats as you speak

Numbers, dates, emails — typed the way you'd type them.
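Formatting rules of this kind can be approximated with a few string transforms. The sketch below is a deliberately naive illustration, not Rift's formatter: `format_email` and `format_phone` are hypothetical names, and each assumes the whole phrase is already known to be an email address or a seven-digit number.

```python
import re

DIGITS = {"zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
          "four": "4", "five": "5", "six": "6", "seven": "7",
          "eight": "8", "nine": "9"}


def format_email(spoken: str) -> str:
    """Replace the first spoken 'at' with '@' and every spoken 'dot' with '.'."""
    text = re.sub(r"\s+at\s+", "@", spoken.strip(), count=1)
    return re.sub(r"\s+dot\s+", ".", text)


def format_phone(spoken: str) -> str:
    """Seven spoken digits become 'NNN-NNNN'; anything else is returned unchanged."""
    words = spoken.lower().split()
    if len(words) != 7 or any(w not in DIGITS for w in words):
        return spoken
    digits = "".join(DIGITS[w] for w in words)
    return f"{digits[:3]}-{digits[3:]}"
```

A production formatter also has to decide when a phrase is an email or a number in the first place, which is the harder part and is left out here.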

"twenty twenty five" → 2025

"five five five one two three four" → 555-1234

"john at gmail dot com" → john@gmail.com

Removes the ums

Filler words disappear. Your meaning stays.

So um I was thinking we should like schedule a meeting you know.
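A rule-based baseline for filler removal looks like the sketch below. It is an assumption-laden illustration rather than Rift's method (the page attributes cleanup to a local language model): the filler list is hypothetical, and a plain pattern cannot tell the filler "like" from the verb "like".

```python
import re

# Hypothetical filler list; real speech needs context to disambiguate these.
FILLERS = re.compile(r"\b(?:um|uh|you know|like)\b,?\s*", re.IGNORECASE)


def remove_fillers(text: str) -> str:
    cleaned = FILLERS.sub("", text)
    cleaned = re.sub(r"\s+([.,!?])", r"\1", cleaned)  # no space before punctuation
    return re.sub(r"\s{2,}", " ", cleaned).strip()    # collapse leftover double spaces


result = remove_fillers(
    "So um I was thinking we should like schedule a meeting you know."
)
```

The weakness of this baseline ("I like coffee" would lose its verb) is one reason a context-aware model is a better fit for the job.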

Your style, automatically

Choose once. Every transcription adapts.

"So um I was thinking we should like schedule a meeting for next tuesday"

Keep every word: So um I was thinking we should like schedule a meeting for next Tuesday.

Remove fillers: I was thinking we should schedule a meeting for next Tuesday.

Business writing: I recommend scheduling a meeting for next Tuesday.

Polishes while you pause

5 seconds of silence. Your text transforms.

so um basically I was thinking like we should probably you know schedule a meeting for next tuesday and um also we need to discuss the budget I think its important

I was thinking we should schedule a meeting for next Tuesday. We also need to discuss the budget — it's important.

5s

Waiting... Polished ✓

Lists, just by speaking

First, second, third — becomes 1, 2, 3.

"Number one eggs. Number two milk. Number three bread."

→

1. Eggs.
2. Milk.
3. Bread.
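The spoken-list example can be reproduced with a small pattern match. This is a sketch under the assumption that every item is introduced by "Number <word>" and ends with a period; `spoken_list` and the `ORDINALS` table are hypothetical.

```python
import re

ORDINALS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def spoken_list(text: str):
    """Turn 'Number one eggs. Number two milk.' into numbered list lines."""
    items = re.findall(r"number (\w+) ([^.]+)\.", text, flags=re.IGNORECASE)
    # capitalize() lowercases the rest of the item, which is fine for short entries.
    return [f"{ORDINALS[word.lower()]}. {item.strip().capitalize()}."
            for word, item in items if word.lower() in ORDINALS]


lines = spoken_list("Number one eggs. Number two milk. Number three bread.")
```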

02

Text to Voice

Select text. Hear it spoken.


First word in
150 milliseconds.

150ms

First-word latency

You hear the first word before the sentence finishes generating.
No loading spinners. No waiting.

Hello, world

"Hello..."

150ms to first sound

Seamless

Clause-level streaming

The next sentence is synthesized while the current one plays.
No gaps. No stutters. Continuous audio.

Playing: "The quick brown fox..."

Buffered: "jumps over the lazy dog."

Generating: "The end."
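The playing / buffered / generating stages above amount to a one-step lookahead pipeline. The sketch below shows that pattern with a placeholder in place of a real TTS call; `split_clauses`, `fake_synthesize`, and `stream` are hypothetical names, and the punctuation-based clause splitting is an assumption about how clause boundaries might be found.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def split_clauses(text: str):
    """Split after clause-ending punctuation, keeping the punctuation."""
    return [c.strip() for c in re.split(r"(?<=[.,;:!?])\s+", text) if c.strip()]


def fake_synthesize(clause: str) -> bytes:
    # Placeholder for a TTS call; returns dummy "audio" bytes.
    return clause.encode("utf-8")


def stream(text: str, play):
    """Synthesize clause N+1 in the background while clause N plays."""
    clauses = split_clauses(text)
    if not clauses:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fake_synthesize, clauses[0])
        for nxt in clauses[1:] + [None]:
            audio = pending.result()
            if nxt is not None:
                pending = pool.submit(fake_synthesize, nxt)  # prefetch next clause
            play(audio)                                      # blocks while playing


played = []
stream("The quick brown fox, jumps over the lazy dog. The end.", played.append)
```

As long as synthesis of a clause finishes before the previous clause stops playing, playback never has to wait.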

20ms

Audio poll rate

The audio buffer is checked every 20 milliseconds.
Imperceptible latency between chunks.

0ms 200ms

50 checks per second

Pause anywhere

Tap to pause mid-syllable. Tap again to resume from the exact position.
Your place is never lost.

Tap to pause

0.5× – 2×

Playback speed

Speed up for skimming. Slow down for comprehension.
Adjust in real-time without restarting.

0.5× 1.0× 2×

How it works

Two pipelines. Local intelligence. Everything on your Mac.

01 Voice to Text

Ctrl + 2

Start dictation

1

Capture

Core Audio streams from your microphone with a 250ms lead-in buffer. Your first word is never lost.

2

Understand

Speech becomes polished text. Recognition and local AI work together — cleaning, correcting, punctuating — imperceptibly fast.

3

Paste

Text appears at your cursor. Word-by-word during speech. Final reconciliation when you stop.

02 Text to Voice

Ctrl + 1

Speak selected text

1

Select

Highlight text in any app or copy to clipboard. Rift reads whatever you give it.

2

Synthesize

Kokoro generates audio clause-by-clause. First word in 150ms. Next sentence ready before current ends.

3

Play

Audio streams to system output. Pause anywhere, resume from exact position. 0.5× to 2× speed.

Adaptive intelligence

Three models. Instant decisions.

Fast: 0.6B params · Real-time merge, rolling correction · <100ms · Always loaded

Quality: 1.7B params · Final polish, silence polish · <1000ms · Loaded on demand

Deep: 4B params · Background cleanup, complex rewriting · <5000ms · During silence
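The three tiers can be read as a routing table from task to model. The sketch below encodes the figures listed above; the routing function itself is an assumption about how such a choice could be made, and `Tier` and `pick_tier` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    params: str
    budget_ms: int      # upper latency bound quoted for the tier
    tasks: frozenset


TIERS = [
    Tier("Fast", "0.6B", 100, frozenset({"real-time merge", "rolling correction"})),
    Tier("Quality", "1.7B", 1000, frozenset({"final polish", "silence polish"})),
    Tier("Deep", "4B", 5000, frozenset({"background cleanup", "complex rewriting"})),
]


def pick_tier(task: str) -> Tier:
    """Return the tier whose task list contains `task`."""
    for tier in TIERS:
        if task in tier.tasks:
            return tier
    raise ValueError(f"unknown task: {task}")
```

For example, `pick_tier("final polish").name` is `"Quality"`.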

Freeze-free dictation

Commit early. Commit often.

350ms pause

8s force

pause

The quick brown...

fox jumps over...

the lazy dog.
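One reading of the labels above (an interpretation of the diagram, not something the page states outright) is that pending text is committed after a 350ms pause and force-committed once a segment has run for 8 seconds without one. A sketch of that policy, with `commit_points` and the word-timing input format as assumptions:

```python
def commit_points(word_times, pause_s=0.35, force_s=8.0):
    """Return indices of words after which the pending text would be committed.

    word_times: list of (start, end) times in seconds, one pair per word.
    """
    commits = []
    segment_start = word_times[0][0] if word_times else 0.0
    for i, (start, end) in enumerate(word_times):
        is_last = i == len(word_times) - 1
        # Silence before the next word; the end of input counts as a pause.
        gap = float("inf") if is_last else word_times[i + 1][0] - end
        if gap >= pause_s or end - segment_start >= force_s:
            commits.append(i)
            if not is_last:
                segment_start = word_times[i + 1][0]
    return commits


# A 0.5 s pause after the second word triggers a commit there.
pauses = commit_points([(0.0, 0.3), (0.4, 0.7), (1.2, 1.5), (1.6, 1.9)])
```

Committing in small segments keeps the amount of text that can still change short, which is what keeps the live output from stalling.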

Privacy.
That's Rift.

Your voice never leaves your Mac. Ever.

100% on-device processing

No cloud. No servers.

No accounts required

Fully open source

AI enhancement runs locally too

Zero file I/O

Audio is synthesized directly to memory. Nothing is written to disk. Nothing persists after you close the app.

Performance

Tested on real hardware. Real workloads.

M1 MacBook Air: Voice→Text 0.8× realtime · Text→Voice 1.2× realtime · Memory 1.8 GB

M3 MacBook Pro: Voice→Text 1.5× realtime · Text→Voice 2.4× realtime · Memory 2.0 GB

M4 Mac Studio: Voice→Text 2.1× realtime · Text→Voice 3.4× realtime · Memory 2.1 GB
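To turn these factors into wall-clock time, the sketch below assumes that "N× realtime" means N seconds of audio handled per second of compute. The page does not define the unit, so treat that reading as an assumption; it is at least consistent with the newer chips scoring higher. `processing_seconds` is a hypothetical helper.

```python
def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock time to process audio, assuming factor = audio seconds per compute second."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


m1 = processing_seconds(60, 0.8)   # one minute of speech at the M1 Voice→Text figure
m4 = processing_seconds(60, 2.1)   # the same minute at the M4 Voice→Text figure
```

Under that reading, a minute of speech takes about 75 seconds on the M1 figure and under 30 seconds on the M4 figure.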

How Rift compares

Rift

Whisper.cpp

macOS Dictation

On-device

✓

Partial

No auto-cutoff

✓

✗

Live paste

✓

✗

✓

First-word buffer

250ms

None

TTS included

✓

✗

Basic

TTS latency

150ms

N/A

~500ms

Privacy

100%

Cloud fallback

AI Polish

3 modes

✗

Smart Formatting

Auto

✗

Limited

Requirements

macOS: Sonoma 14.0+
Chip: Apple Silicon
RAM: 8GB minimum
Disk: ~2GB

The Passage

Nothing escapes.
This is your interface.

Press a shortcut. A black hole appears. Speak into it — your words become text. Select text and summon it again — the words become voice. When it's done, the black hole closes. Everything happens on your Mac. Nothing left behind.

The Singularity

Your Mac is the center of gravity. All processing happens here — voice recognition, text synthesis, everything. No servers. No cloud. One machine.

The Accretion Disk

Your voice flows in like matter spiraling toward the event horizon. It gets captured, processed, transformed. The warm glow is energy being released as computation.


The Event Horizon

The point of no return — but in a good way. Once your words enter Rift, they never leave your machine. No telemetry, no uploads, no exceptions.

Gravitational Lensing

Just as light bends around a black hole, your voice bends into text. Text bends into voice. Transformation through the most powerful force — local compute.

How the visualization works

Raymarching

Volumetric rendering via signed distance functions. The sphere-traced shader calculates 128 iterations per pixel to simulate photon paths.
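Sphere tracing steps along a ray by the distance the signed distance function reports, since nothing can be closer than that. The sketch below shows the basic loop on the CPU with a 128-step cap; it marches straight rays against a single sphere, whereas the actual shader bends light paths, so this illustrates the technique only. All names here are hypothetical.

```python
import math


def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from point p to the sphere surface (negative inside)."""
    return math.dist(p, center) - radius


def sphere_trace(origin, direction, sdf=sphere_sdf, max_steps=128, eps=1e-4, far=100.0):
    """March along a unit-length ray; return the hit distance, or None on a miss."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:
            return t
        t += dist          # safe step: no surface is closer than `dist`
        if t > far:
            break
    return None


hit = sphere_trace((0.0, 0.0, -3.0), (0.0, 0.0, 1.0))   # ray aimed at the sphere
miss = sphere_trace((0.0, 0.0, -3.0), (0.0, 1.0, 0.0))  # ray aimed away from it
```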

Schwarzschild geodesics

Light follows the curved spacetime geometry of a non-rotating black hole. The photon sphere appears as a bright ring at 1.5× the event horizon radius.
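For reference, these are the standard Schwarzschild radii behind that 1.5× figure, for a black hole of mass M. How the shader uses them is not stated on the page. The last quantity, the critical impact parameter, is the apparent radius at which a distant observer sees the edge of the shadow, which is larger than the photon sphere itself.

```latex
r_s = \frac{2GM}{c^2}, \qquad
r_{\text{ph}} = \frac{3GM}{c^2} = \tfrac{3}{2}\, r_s, \qquad
b_c = \frac{3\sqrt{3}}{2}\, r_s \approx 2.6\, r_s
```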

Keplerian disk

Accretion disk particles orbit according to Kepler's laws. Inner particles orbit faster, creating the characteristic spiral structure.
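For circular orbits, Kepler's third law gives speed and period directly from the radius. The sketch below uses units where GM = 1, which is an arbitrary choice for the example; the function names are hypothetical and nothing here is taken from the shader.

```python
import math


def orbital_speed(radius, gm=1.0):
    """Circular orbital speed: v = sqrt(GM / r)."""
    return math.sqrt(gm / radius)


def orbital_period(radius, gm=1.0):
    """Kepler's third law for a circular orbit: T = 2*pi*sqrt(r^3 / GM)."""
    return 2.0 * math.pi * math.sqrt(radius ** 3 / gm)


inner, outer = 1.0, 4.0
speed_ratio = orbital_speed(inner) / orbital_speed(outer)     # inner particle is faster
period_ratio = orbital_period(outer) / orbital_period(inner)  # outer orbit takes longer
```

A particle four times farther out moves at half the speed and takes eight times as long per orbit, and that differential rotation is what shears the disk into a spiral.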

ACES tonemapping

Film-industry-standard color grading compresses the HDR luminance into displayable range while preserving the fiery accretion glow.
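A common stand-in for the full ACES pipeline in real-time shaders is Krzysztof Narkowicz's curve fit, shown below in Python. Whether this visualization uses that fit or another ACES variant is not stated, so the constants are illustrative of the technique rather than a description of the shader.

```python
def aces_fit(x: float) -> float:
    """Narkowicz's ACES filmic approximation, clamped to the displayable range [0, 1]."""
    a, b, c, d, e = 2.51, 0.03, 2.43, 0.59, 0.14
    y = (x * (a * x + b)) / (x * (c * x + d) + e)
    return min(max(y, 0.0), 1.0)


dark, mid, bright = aces_fit(0.0), aces_fit(1.0), aces_fit(100.0)
```

Very bright inputs saturate at 1.0 while midtones keep their contrast, which is how an HDR glow survives on a standard display.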

Visualization based on Singularity by MisterPrada

Frequently asked

Does it work offline?

Yes. After the one-time model download on first launch, Rift does not need an internet connection. All processing happens locally on your Mac using the MLX framework.

What languages are supported?

Currently English only. The underlying Parakeet model supports multiple languages, and we're working on enabling them in future updates.

Can I use my own voice for text-to-speech?

Not yet. Rift uses the Kokoro model's built-in voices. Custom voice cloning may be added in the future.

Is my voice data stored anywhere?

Never. Audio is processed in memory and discarded immediately. Nothing is written to disk or sent anywhere.

Why is the first run slow?

On first launch, Rift downloads and caches the ML models (~2GB). Later launches skip the download and start from the cached models.

Does it work on Intel Macs?

No. Rift requires Apple Silicon (M1 or later) for the MLX machine learning framework.

Is Rift open source?

Yes. The full source code is available on GitHub under the MIT license.

The Technology

Built different.

Four technologies working together. All running locally on Apple Silicon. No cloud, no network latency, no compromises.

The Foundation

MLX

Apple's machine learning framework. Runs entirely on your Mac's GPU and CPU, using unified memory.

Apple Silicon · On-device · Open source

Voice to Text

Parakeet

NVIDIA's state-of-the-art speech recognition, optimized for Apple Silicon.

0.6B params · TDT arch · ~800MB

Text to Voice

Kokoro

Neural text-to-speech with natural-sounding voices. Real-time synthesis.

82M params · Memory output · ~1.2GB

Language Intelligence

Qwen3

Local LLM that cleans, corrects, and polishes. Three models adapt to the moment.

0.6B–4B params · MLX optimized · 50ms–5s

Rift

Your voice. Your Mac. Nothing else.

Download for macOS

Free · Open Source · View on GitHub
