Lars Baunwall (larsbaunwall on GitHub) has released promptme-ai, an open-source, browser-based teleprompter that inverts the conventional fixed-speed scrolling model. Rather than forcing the speaker to match a preset pace, the tool uses on-device automatic speech recognition to track the speaker's position in a script in real time, pausing when the speaker pauses and recovering when they skip or ad-lib. The entire pipeline runs inside the browser tab with no server dependency: microphone audio is captured at 16 kHz via an AudioWorklet, passed through Silero VAD to skip silent frames, and transcribed by Moonshine Tiny, a compact ONNX model from Useful Sensors. Transformers.js, Hugging Face's JavaScript inference library, executes the model via WebGPU where available and falls back to WASM otherwise. The model weights, roughly 100 MB, are cached after the first load, enabling offline use.
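The front end of such a pipeline can be sketched in a few lines. The version below is a simplified illustration, not promptme-ai's code: the real project uses an AudioWorklet processor and the Silero VAD neural model, so the naive averaging resampler and RMS energy gate here are stand-ins, and all function names and thresholds are assumptions.

```javascript
// Illustrative audio front end. promptme-ai uses an AudioWorklet and Silero VAD;
// the naive averaging resampler and RMS energy gate below are simplified
// stand-ins, and all names and threshold values are assumptions.

// Reduce a Float32 frame to 16 kHz by averaging fixed-size groups of samples.
// Assumes the input rate is an integer multiple of 16 kHz (e.g. 48 kHz).
function downsampleTo16k(frame, inputRate = 48000) {
  const ratio = inputRate / 16000;
  const out = new Float32Array(Math.floor(frame.length / ratio));
  for (let i = 0; i < out.length; i++) {
    let sum = 0;
    for (let j = 0; j < ratio; j++) sum += frame[i * ratio + j];
    out[i] = sum / ratio; // crude low-pass; real pipelines use proper resampling
  }
  return out;
}

// Energy gate standing in for VAD: frames whose RMS falls below the threshold
// are treated as silence and never reach the ASR model.
function isSpeechFrame(samples, threshold = 0.01) {
  let sumSq = 0;
  for (const s of samples) sumSq += s * s;
  return Math.sqrt(sumSq / samples.length) >= threshold;
}
```

Only frames that pass the gate would be buffered and handed on to the ASR stage, which keeps the model from burning cycles transcribing silence.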

The more technically interesting challenge, as Baunwall noted in the accompanying Hacker News thread, was not speech recognition but script alignment: reliably mapping noisy, batch-delivered ASR output back to a position in structured text. Moonshine emits results in chunks of roughly 600 ms rather than word by word, and spoken language introduces homophones, filler words, mispronunciations, and repeated phrases. The solution combines several layers: an inverted token index that generates candidate windows via hash lookup rather than full-script scans; banded word-level Levenshtein distance running in O(n·k) time with pre-allocated Int16Array buffers; Double Metaphone phonetic normalization that collapses homophones such as "right," "write," and "rite" to the same key; and a locality penalty that biases the tracker toward the reader's current confirmed position to prevent jumps to distant repeated phrases. Between ASR updates, the highlight is speculatively advanced at roughly 85% of the measured words-per-minute rate and capped at three words ahead, producing smooth visual movement rather than visible lurches.
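Three of these layers can be sketched compactly. The code below is an illustration under simplified assumptions, not the project's implementation: the band handling, penalty weight, and function names are all invented for exposition, and the phonetic-normalization and inverted-index layers are omitted.

```javascript
// Illustrative sketch of three alignment layers. All names, weights, and
// interfaces are assumptions for exposition, not promptme-ai's actual code.

// Banded word-level Levenshtein distance: only cells within `band` of the
// diagonal are computed, giving O(n * k) time with two reusable Int16Array rows.
function bandedLevenshtein(a, b, band) {
  const n = a.length, m = b.length;
  const BIG = band + 1;                       // sentinel: "outside the band"
  if (Math.abs(n - m) > band) return BIG;
  const width = 2 * band + 1;                 // cell (i, j) lives at d = j - i + band
  let prev = new Int16Array(width);
  let curr = new Int16Array(width);
  for (let d = 0; d < width; d++) {
    const j = d - band;                       // row i = 0: dp[0][j] = j
    prev[d] = j >= 0 && j <= m ? j : BIG;
  }
  for (let i = 1; i <= n; i++) {
    for (let d = 0; d < width; d++) {
      const j = i + d - band;
      if (j < 0 || j > m) { curr[d] = BIG; continue; }
      if (j === 0) { curr[d] = Math.min(i, BIG); continue; }
      const sub = prev[d] + (a[i - 1] === b[j - 1] ? 0 : 1); // dp[i-1][j-1]
      const del = (d + 1 < width ? prev[d + 1] : BIG) + 1;   // dp[i-1][j]
      const ins = (d - 1 >= 0 ? curr[d - 1] : BIG) + 1;      // dp[i][j-1]
      curr[d] = Math.min(sub, del, ins, BIG);
    }
    [prev, curr] = [curr, prev];
  }
  return Math.min(prev[m - n + band], BIG);
}

// Locality penalty: bias candidate windows toward the reader's confirmed
// position so a distant repeat of the same phrase scores worse.
function scoreCandidate(editDistance, candidateStart, confirmedPos, weight = 0.05) {
  return editDistance + weight * Math.abs(candidateStart - confirmedPos);
}

// Speculative advance between ASR updates: move the highlight at ~85% of the
// measured words-per-minute rate, capped at three words ahead.
function speculativeAdvance(wpm, elapsedMs, cap = 3) {
  return Math.min(Math.floor(wpm * 0.85 * (elapsedMs / 60000)), cap);
}
```

With a band of two, comparing `"the quick brown fox"` against `"the quick brawn fox"` word by word yields a distance of 1, while a candidate window 200 words from the confirmed position picks up a penalty of 10 under the illustrative weight, so the nearby match wins.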

Running real-time ASR entirely in a browser tab was, until recently, impractical for anything beyond toy demos. The combination of Moonshine Tiny, Transformers.js, and WebGPU changes that calculus. Moonshine was developed by Useful Sensors with an emphasis on low-latency <a href="/news/2026-03-14-opentoys-open-source-ai-toy-platform-esp32-voice-cloning">on-device deployment</a> across hardware from Raspberry Pi to wearables; the Base variant has shown strong word-error-rate results on Useful Sensors' own benchmarks, though how Tiny specifically compares to Whisper Large V3 varies by dataset and should be treated as directional rather than definitive. What is clear is that the model is small enough — variants from 26 MB upward — to cache in a browser and fast enough to keep pace with live speech. Baunwall invited discussion from others who have tackled real-time transcript-to-document alignment, noting the same techniques could serve live captioning, karaoke engines, and voice-guided reading tools. The repository is at github.com/larsbaunwall/promptme-ai and a live demo is hosted via GitHub Pages.