On Wednesday OpenAI unveiled Jalapeño, its first piece of custom silicon: an inference-only accelerator designed with Broadcom.

The headline figure is the calendar. VentureBeat reports the chip went from design to tape-out in about nine months, a pace OpenAI partly credits to using its own models inside the design loop. The chip is built strictly for inference, the act of running trained models; heavier training workloads will still run on Nvidia hardware. Broadcom supplies the Tomahawk networking, Celestica builds the boards and racks, and first prototype deployments are slated for late 2026, with plans to scale toward gigawatt data centres with Microsoft.

Even a small cut in per-token inference cost matters more to OpenAI's books than raw speed, and the chip is the clearest sign yet that the company wants out from under Nvidia's margin. Whether a first-generation ASIC can actually beat a mature GPU on cost is the part nobody has shown.