We built an app to solve a small annoyance: too many articles and documents to read, not enough time to sit and read them. We wanted to press play on anything – a web page, a PDF, whatever was on the clipboard – and listen the way you’d listen to a podcast, with the words lighting up on screen so you never lose your place.
Our founder tried a handful of existing apps for this. Most had too many issues, and paying over 100 EUR a year for something this simple didn’t sit right. So we built our own version, called Play Text, and released it free to everyone. You can download it from the Play Text page.
It took two evenings to build, entirely with Claude Code running Opus 4.8. This post covers what it does and the engineering decisions behind it.
What Play Text does
Play Text turns text into audio with live, word-by-word highlighting. Paste text, open a document (PDF, DOC, TXT, Markdown, RTF, HTML), play a web link, or share something to it from any app. It pulls out the readable content and starts speaking, with a pink pill sweeping across each word as it’s spoken.
On the Mac it runs as a lightweight menu-bar app with an always-on-top floating player. Two features ended up in our daily use:
- A Safari extension with two modes. “Read the article” auto-detects the page’s main content. “Pick a section” lets you hover over the page, see any text block get outlined, click it, and play only that part. The highlight pill follows the narration right on the live page.
- A global keyboard shortcut. Select text in any app – Mail, Slack, a PDF – press the shortcut, and the floating player opens with the full transcript and starts reading.
On iPhone, everything funnels into one transcription panel. The share sheet puts Play Text one tap away in every app, audio keeps playing in the background with lock-screen controls, and settings sync between devices over iCloud.
One Swift core, two platforms
macOS and iOS share a single core called PlayTextCore. It handles text segmentation, audio playback, and the alignment math that maps playback time to character positions. The platform apps are thin shells around it. The same engine highlights words in the Mac’s floating player, on a live Safari page, and in the iOS panel.
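The alignment step can be sketched as a binary search. This is a minimal illustration that assumes the speech engine reports a start time for each word, together with the word's character offset in the source text. `WordTiming` and `characterIndex(at:in:)` are hypothetical names, not PlayTextCore's actual API.

```swift
/// Hypothetical timing record: where a word starts in the source text
/// and when playback reaches it.
struct WordTiming {
    let start: Int      // character offset of the word's first character
    let time: Double    // playback time in seconds at which the word begins
}

/// Returns the character offset of the word being spoken at `playbackTime`,
/// or nil before the first word starts. Assumes `timings` is sorted by time.
func characterIndex(at playbackTime: Double, in timings: [WordTiming]) -> Int? {
    var low = 0
    var high = timings.count - 1
    var found: Int? = nil
    // Binary search for the last word whose start time is <= playbackTime.
    while low <= high {
        let mid = (low + high) / 2
        if timings[mid].time <= playbackTime {
            found = mid
            low = mid + 1
        } else {
            high = mid - 1
        }
    }
    guard let i = found else { return nil }
    return timings[i].start
}
```

A playback timer could call this a few times per second. The resulting integer is the kind of value the app pushes to the Safari page.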
The Safari integration could easily have turned into a mess of duplicated logic. Instead, the native app stays the single audio and highlight engine for every source. The extension scrapes the exact text and hands it to the app. The app pushes back one integer a few times per second: the global character index it is currently speaking. The page expands that index into a word and draws the pill. Because the page gave the app the exact text it scraped, the indices always line up. No fuzzy matching, no drift.
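On the page side, “expanding the index into a word” can be as simple as scanning outward to the nearest whitespace. The real extension does this in JavaScript inside Safari. The Swift sketch below mirrors the idea under two assumptions: words are whitespace-delimited, and indices count UTF-16 code units. The second assumption matters because JavaScript string indices are UTF-16 code units, while Swift’s `String.count` counts grapheme clusters, so an emoji would make the two sides disagree if they counted differently. `wordRange(containing:in:)` is a hypothetical name.

```swift
/// True if a UTF-16 code unit is whitespace. Lone surrogate halves
/// cannot form a Unicode.Scalar and are treated as word characters.
func isWordSeparator(_ unit: UInt16) -> Bool {
    guard let scalar = Unicode.Scalar(unit) else { return false }
    return scalar.properties.isWhitespace
}

/// Expands a global character index (in UTF-16 code units) into the
/// half-open range of the word containing it. Returns nil if the index
/// is out of bounds or lands on whitespace.
func wordRange(containing index: Int, in text: String) -> Range<Int>? {
    let units = Array(text.utf16)
    guard index >= 0, index < units.count, !isWordSeparator(units[index]) else {
        return nil
    }
    var start = index
    while start > 0 && !isWordSeparator(units[start - 1]) { start -= 1 }
    var end = index
    while end < units.count && !isWordSeparator(units[end]) { end += 1 }
    return start..<end
}
```

For example, index 8 in `"Hello brave world"` expands to `6..<11`, the range of “brave”. In `"Hi 👋 there"` the wave emoji takes two UTF-16 units, so “there” starts at index 6.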
Voices that run on the device
The default narrator is Apple’s on-device voice. No key, works offline. For higher fidelity we bundled Kokoro-82M, an open-weight neural voice that runs fully on-device through MLX on Apple silicon. The model ships inside the app, roughly 327 MB of it, which is why the download is on the larger side. The trade-off is natural-sounding speech with no API key, no cloud, and no per-character bill. ElevenLabs is available as an opt-in cloud engine for anyone who wants it, but nothing requires it.
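The rule that “nothing requires” the cloud engine can be written as a small resolution function. In this sketch, on-device voices remain the fallback whenever the cloud engine can’t be used. It is a hypothetical illustration: the type names, the fallback order, and the `kokoroAvailable` flag are assumptions, not Play Text’s actual settings model.

```swift
/// Hypothetical engine identifiers.
enum VoiceEngine: Equatable {
    case appleOnDevice  // system voice: no key, works offline
    case kokoro         // bundled Kokoro-82M running on-device via MLX
    case elevenLabs     // opt-in cloud engine
}

/// Hypothetical user settings.
struct VoicePreferences {
    var preferred: VoiceEngine = .appleOnDevice
    var elevenLabsAPIKey: String? = nil
    var kokoroAvailable: Bool = true  // assumption: false if the model fails to load
}

/// Picks the engine to use right now. The cloud engine is used only when the
/// user chose it, supplied a key, and is online. Otherwise it falls back to
/// an on-device voice.
func resolveEngine(_ prefs: VoicePreferences, isOnline: Bool) -> VoiceEngine {
    switch prefs.preferred {
    case .elevenLabs:
        if let key = prefs.elevenLabsAPIKey, !key.isEmpty, isOnline {
            return .elevenLabs
        }
        return prefs.kokoroAvailable ? .kokoro : .appleOnDevice
    case .kokoro:
        return prefs.kokoroAvailable ? .kokoro : .appleOnDevice
    case .appleOnDevice:
        return .appleOnDevice
    }
}
```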
Web pages and documents often arrive with leftover navigation, cookie banners, and other boilerplate. An optional cleanup pass uses Apple’s on-device foundation model to strip that noise before narration. One subtlety: Safari pages always use verbatim scraping instead. If the AI rewrote the text, the character indices would no longer match the live page and the highlight pill would drift.
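The Safari exception comes down to a single guard. In the sketch below, the `cleanup` closure stands in for the on-device foundation-model pass. `TextSource` and `narrationText(_:from:cleanupEnabled:cleanup:)` are hypothetical names.

```swift
/// Hypothetical list of places text can come from.
enum TextSource {
    case safariPage  // scraped from a live page; highlight indices must match it
    case document
    case webLink
    case clipboard
}

/// Chooses the exact text to narrate. `cleanup` is a stand-in for the
/// optional on-device boilerplate-stripping pass.
func narrationText(_ raw: String,
                   from source: TextSource,
                   cleanupEnabled: Bool,
                   cleanup: (String) -> String) -> String {
    // Live Safari pages are always narrated verbatim. Any rewrite would shift
    // the character indices the page uses to position the highlight pill.
    if source == .safariPage || !cleanupEnabled {
        return raw
    }
    return cleanup(raw)
}
```

To see why the guard matters, suppose cleanup removed a leading `"Accept cookies. "`, which is 16 characters. Every index the app reported would then point 16 characters before the word actually being spoken on the page.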
Why iOS works differently
On the Mac, the app can push live word positions into a Safari page. iOS offers no such bridge – an app can’t drive Safari’s page content. Rather than fake it, the iPhone app uses a single in-app reader. Wherever the text comes from (clipboard, link, file, share sheet), one panel renders it and highlights along. Same engine, honest platform design.
Shipping it
The iOS app went through App Store review and is live there. The Mac app is a free direct download from the Play Text page. It requires Apple silicon, since the Kokoro engine runs on MLX.
The privacy story is the simple kind. Voices are generated on your device, your text never leaves it, and there’s no telemetry. That isn’t a setting you toggle – there is simply no server for your reading to go to. The one exception is the optional ElevenLabs integration, which you can read about in the privacy policy. No accounts, no subscriptions, no ads, no tracking, no paywalls.
We made Play Text for ourselves. Since we had it anyway, it made sense to give it to anyone who might find it useful. You can grab it for iPhone or Mac from the Play Text page. And if you’re reading this in Safari on a Mac with the extension installed, click the toolbar button and have this post read to you instead.