Google just announced TorchTPU. PyTorch running natively on TPUs. This is Google admitting what everyone already knew: PyTorch won the framework wars. After a decade of pushing TensorFlow and more recently JAX, Google is building first-class support for the framework researchers actually use. PyTorch accounts for over 80% of papers at top ML conferences. Google had no choice but to adapt.

TorchTPU uses PyTorch's "PrivateUse1" interface to create a native experience without wrappers or subclasses. Developers change their device initialization to "tpu" and run existing code. Three eager modes are available: Debug (slow, but useful for tracking down errors), Strict (asynchronous, mirroring default PyTorch semantics), and Fused Eager (fuses operations on the fly, for reported gains of 50–100% or more). For peak performance, torch.compile uses XLA as its backend compiler, mapping PyTorch operators directly to StableHLO.
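The promise is that the only change is the device string. A minimal sketch of that workflow, with the actual PyTorch calls shown in comments; the `select_device` helper is hypothetical, and the "tpu" registration via PrivateUse1 is assumed from the announcement rather than a documented API:

```python
# Hypothetical sketch of the "just change the device string" workflow.
# Assumes TorchTPU registers "tpu" through PyTorch's PrivateUse1 backend
# slot, as described above; nothing here is the actual TorchTPU API.

def select_device(preferred="tpu", available=("cpu",)):
    """Pick the preferred accelerator if present, else fall back to CPU."""
    return preferred if preferred in available else "cpu"

# In real PyTorch code the swap would look like:
#   device = torch.device("tpu")          # was: torch.device("cuda")
#   model = model.to(device)
#   x = torch.randn(8, 128, device=device)
#   out = torch.compile(model)(x)         # XLA backend lowers to StableHLO
# Everything downstream (forward, backward, optimizer step) is unchanged.

device = select_device("tpu", available=("cpu", "tpu"))
```

The point of the sketch is the shape of the migration: one line changes, and the rest of the training loop is untouched.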

This isn't Google's first attempt at PyTorch on TPU. The existing PyTorch/XLA integration has been painful. One Hacker News user described "undocumented behavior and bugs" that would cause training to hang silently. TorchTPU exists to fix that mess. It supports distributed training patterns including DDP, FSDPv2, and DTensor out of the box. The 2026 roadmap promises a public GitHub repository, Helion DSL integration, and first-class dynamic shapes support.
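If DDP really works out of the box, existing multi-GPU scripts should port with the same one-line device swap. A sketch of device-agnostic setup, assuming that holds; PyTorch's real built-in process-group backends are "nccl" (CUDA) and "gloo" (CPU), while the "tpu" backend name below is purely an assumption for illustration:

```python
# Sketch: device-agnostic distributed setup. The claim above is that
# existing DDP code keeps working once the device string changes; the
# "tpu" process-group backend name is hypothetical.

PG_BACKENDS = {"cuda": "nccl", "cpu": "gloo", "tpu": "tpu"}

def backend_for(device_type):
    """Map a device type to its collective-communication backend."""
    try:
        return PG_BACKENDS[device_type]
    except KeyError:
        raise ValueError(f"no process-group backend for {device_type!r}")

# Real DDP code would then be unchanged apart from the device:
#   dist.init_process_group(backend=backend_for(device.type))
#   model = DistributedDataParallel(model.to(device))
```

Whether FSDPv2 and DTensor sharding configs carry over this cleanly is exactly what the public repository promised for 2026 will let people verify.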

Google is moving from a full-stack ecosystem play to a focus on hardware excellence: let developers pick their own frameworks. It's the NVIDIA playbook. Cloud giants have poured hundreds of billions into infrastructure, building out the compute that makes TPU acceleration matter. But CUDA earned developer trust over years. TorchTPU has to do the same, starting from a hole dug by PyTorch/XLA's rough history.