TurboQuant-WASM brings Google's TurboQuant vector quantization algorithm to WebAssembly, compressing embedding vectors by roughly 6x in the browser. Based on the ICLR 2026 paper, it reduces vectors to about 4.5 bits per dimension while preserving inner products well enough that similarity search can run directly on the compressed data.

The TypeScript API is small: encode() compresses a vector, decode() reconstructs an approximation of it, and dot() computes similarity without decompressing. A dotBatch() call is reported to run 83x faster than looping through vectors individually. The live demos cover Wikipedia passage search, Unsplash image similarity, and 3D Gaussian Splatting (3DGS) compression. That last one matters because 3DGS files get big fast.

There are tradeoffs. A Hacker News commenter who wrapped TurboQuant in a SQLite extension found that plain 32-bit floats were still faster on a CPU without GPU acceleration. The WASM build requires relaxed SIMD, which means Chrome 114+, Firefox 128+, Safari 18+, or Node 20+. The library is built for bandwidth-constrained situations where shrinking payload size matters more than raw compute speed.
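To make "similarity without decompressing" concrete, here is a minimal sketch of the general idea using a toy 4-bit uniform scalar quantizer. This is not the TurboQuant algorithm and not the library's API: the names `toyEncode`, `toyDecode`, `toyDot`, and the `ToyCompressed` type are hypothetical, chosen only for illustration. The point is that the dot product can be computed from the packed codes plus two per-vector constants, without ever materializing the decoded vector.

```typescript
// Toy 4-bit uniform scalar quantizer. NOT TurboQuant; all names are hypothetical.
interface ToyCompressed {
  codes: Uint8Array; // two 4-bit codes packed per byte
  dim: number;       // original vector length
  min: number;       // smallest value in the original vector
  scale: number;     // step size between adjacent quantization levels
}

function toyEncode(v: Float32Array): ToyCompressed {
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < v.length; i++) {
    if (v[i] < min) min = v[i];
    if (v[i] > max) max = v[i];
  }
  // 16 levels (0..15); fall back to scale 1 for constant vectors.
  const scale = max > min ? (max - min) / 15 : 1;
  const codes = new Uint8Array(Math.ceil(v.length / 2));
  for (let i = 0; i < v.length; i++) {
    const q = Math.min(15, Math.max(0, Math.round((v[i] - min) / scale)));
    // Even indices go in the low nibble, odd indices in the high nibble.
    codes[i >> 1] |= (i & 1) ? q << 4 : q;
  }
  return { codes, dim: v.length, min, scale };
}

// Extract the 4-bit code for element i.
function codeAt(c: ToyCompressed, i: number): number {
  return (c.codes[i >> 1] >> ((i & 1) * 4)) & 0x0f;
}

// Lossy reconstruction: each element becomes min + scale * code.
function toyDecode(c: ToyCompressed): Float32Array {
  const out = new Float32Array(c.dim);
  for (let i = 0; i < c.dim; i++) out[i] = c.min + c.scale * codeAt(c, i);
  return out;
}

// Dot product of a float query against a compressed vector.
// Uses sum(q_i * (min + scale * code_i)) = min * sum(q_i) + scale * sum(q_i * code_i),
// so no decoded array is allocated.
function toyDot(query: Float32Array, c: ToyCompressed): number {
  let sumQ = 0;
  let sumQCode = 0;
  for (let i = 0; i < c.dim; i++) {
    sumQ += query[i];
    sumQCode += query[i] * codeAt(c, i);
  }
  return c.min * sumQ + c.scale * sumQCode;
}
```

A real scheme such as TurboQuant is considerably more sophisticated than this, but the usage pattern is the same: encode once, ship the small payload, and score queries against the codes directly.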
TurboQuant-WASM: 6x vector compression in the browser
TurboQuant-WASM is an experimental WebAssembly implementation of Google's TurboQuant vector quantization algorithm for browsers and Node.js. Based on the ICLR 2026 paper, it provides ~6x compression (~4.5 bits/dimension) while preserving inner products, enabling browser-based vector search, image similarity, and 3D Gaussian Splatting compression. The implementation uses relaxed SIMD instructions and provides a TypeScript API.