Vision

We’re building a voice OS

SkipType is a dictation app today, but dictation is the foothold, not the destination. The deeper bet is that voice — not the keyboard, the mouse, or the touchscreen — becomes the primary way people instruct their computer. We started with the part of that future that already works: turning thought into written text.

Why now

People have been promising voice as the next interface for fifteen years. Siri arrived in 2011, Alexa in 2014, and dictation has shipped in every modern OS for longer than that. None of it replaced the keyboard. What’s changed in the last two years isn’t transcription, which has been good enough for a while. It’s two new capabilities on top of it. Models have become genuinely agentic: they can plan multi-step work, call tools, and recover when something goes wrong. And they can now use a computer the way a person does: looking at the screen, clicking, typing, driving apps that were never built to take instructions from a model. Those two are what turn a spoken instruction into a finished task instead of a printed sentence, and what make a voice OS plausible now in a way it wasn’t five years ago.

Typing is a workaround

Most of us think faster than we type, and the act of pushing thought through a keyboard slows the thinking down. We pause to spell, we backspace, we tighten sentences in our heads before our fingers ever touch the keys. Hours every day are spent translating fully formed ideas into the slowest medium we own.

Voice is the natural input

We’ve spoken for a million years and typed for a hundred and fifty. Speech is faster than typing for almost everyone. The reason it never replaced the keyboard isn’t speed — it’s output quality. Raw transcription reads like a transcript: ums, false starts, mid-sentence corrections, sentences that wander. Nobody wants to send that.

Polished writing is the natural output

What people actually want is to talk and have well-formed writing appear: cleaned up, properly punctuated, with the fillers and backtracking removed, the way you’d have written it if you’d taken the time. SkipType does that step for you, so the gap between thought and finished sentence collapses to a single tap.

From dictation to a voice OS

Dictation is the first layer because writing is where typing slows you down the most. But typing isn’t the only thing the keyboard and the mouse are doing. They’re also how you open the doc Sarah sent yesterday, rename three folders, jump to line 240 in auth.ts, summarize a pull request and post it in #eng, move the deck into the Q3 folder, fire up an agent to draft a customer reply. Every one of those is a sentence you could speak in less time than it takes to find the menu.

The vision is that those interactions collapse, one by one, into speaking — and that the OS itself starts listening for intent instead of keystrokes and clicks. Not a chatbot in a window. Not Alexa pretending to be useful. A layer underneath whatever app you’re focused on, that hears what you mean and routes it to whatever needs to happen.

Dictation is the wedge because it’s the highest-frequency, lowest-stakes case. Once you’re speaking instead of typing your email reply, speaking instead of clicking through your menu is a small step. The capability we’re building is the same either way: turn imprecise speech into the thing you actually wanted. For dictation that thing is a polished sentence. For the layers above it, that thing is an action — a file moved, a doc opened, a message sent.

What we believe

We’re a small team that thinks the keyboard-and-mouse era is going to end the same way the command line did — not all at once, not for everyone, but progressively, as a better way of telling a computer what you want shows up and proves itself one task at a time. SkipType is our first proof.