Dmitri Lerko spent ten hours on a British Airways flight from London to Las Vegas running local LLMs on a MacBook Pro M5 Max. No wifi. No cloud APIs. Just Gemma 4 31B and Qwen 4.6 36B through LM Studio. He built a billing analytics tool for loveholidays cloud spend using DuckDB (a sketch of the idea appears below), processed roughly 4M tokens across smaller engineering tasks, and walked away with hard data on what local inference actually costs.

The numbers are brutal. Sustained inference drew 70-80W, draining the battery at 1% per minute even while plugged into the seat's advertised 70W outlet. At those wattages, the chassis gets uncomfortably hot fast. Throughput degraded noticeably past 100k-token contexts, and a handful of prompts sent models into infinite loops that needed manual intervention. Lerko built two custom monitoring tools mid-flight, powermonitor and lmstats, to track power consumption and token throughput; sketches of both ideas also follow below.

Here's what hurts. After landing, Lerko discovered that using an iPhone cable instead of a MacBook cable had throttled power delivery from 94W to 60W: a 36% loss from nothing more than cable choice. This is the kind of thing nobody warns you about because almost nobody has tried it yet.

Aircraft power infrastructure makes the whole exercise harder. Economy seat outlets typically provide 15-75W and cut power entirely, rather than throttling, when you exceed the limit. Newer planes like the Boeing 787 and Airbus A350 have better systems, but they were designed for ordinary laptop use. Nobody planned for sustained AI workloads at 30,000 feet.

Lerko's workaround, keeping one problem per session and writing long plans to markdown for re-ingestion, is more discipline than optimization. But that discipline teaches you something. Steve Turner noted on LinkedIn that running local, where cost is physically visible, makes you more critical of what you ask cloud models. He's right. When your battery is dying, you think harder about every prompt.

Local inference is viable for tight-scope coding and exploratory work. For large-context reasoning and heavy tasks, you still need the cloud. But watching someone push 4M tokens through a laptop on a plane tells you the gap is closing.
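The article doesn't show Lerko's billing tool, but the shape of a DuckDB spend-analytics script is easy to sketch. Everything below is hypothetical: the Parquet path and the service, cost, and usage_date columns are assumed for illustration, not taken from the source.

```python
# Minimal sketch of a DuckDB billing-analytics query, assuming a hypothetical
# cloud billing export in Parquet with service, cost, and usage_date columns.
import duckdb

con = duckdb.connect()  # in-memory database; no server, works fine offline

# Top ten services by total spend over the last 30 days.
rows = con.execute("""
    SELECT service, ROUND(SUM(cost), 2) AS total_cost
    FROM read_parquet('billing_export/*.parquet')
    WHERE usage_date >= current_date - INTERVAL 30 DAY
    GROUP BY service
    ORDER BY total_cost DESC
    LIMIT 10
""").fetchall()

for service, total_cost in rows:
    print(f"{service:40s} {total_cost:>12.2f}")
```

The appeal for a no-wifi flight is that DuckDB queries local files directly, so the whole analysis loop runs without any network access.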
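powermonitor isn't published in the source either, but on macOS the core idea can be approximated by polling the battery's voltage and current. This is a sketch, assuming the `ioreg -rn AppleSmartBattery` node reports Voltage in mV and Amperage in mA (negative while discharging); field semantics vary across hardware.

```python
# Sketch in the spirit of powermonitor: poll the macOS battery node and
# print instantaneous watts. Assumes ioreg field semantics described above.
import re
import subprocess
import time

def battery_sample() -> tuple[float, float]:
    """Return (volts, amps) parsed from the AppleSmartBattery ioreg node."""
    out = subprocess.run(
        ["ioreg", "-rn", "AppleSmartBattery"],
        capture_output=True, text=True, check=True,
    ).stdout
    values = {}
    for key in ("Voltage", "Amperage"):
        m = re.search(rf'"{key}" = (-?\d+)', out)
        values[key] = int(m.group(1)) if m else 0
    amperage = values["Amperage"]
    if amperage > 2**62:      # some firmwares report negative current as
        amperage -= 2**64     # unsigned 64-bit; undo the two's complement
    return values["Voltage"] / 1000.0, amperage / 1000.0

while True:
    volts, amps = battery_sample()
    watts = volts * amps      # negative = discharging, positive = charging
    print(f"{time.strftime('%H:%M:%S')}  {watts:+7.1f} W")
    time.sleep(5)
```

A loop like this is how you'd catch the cable problem in the air: if the charging wattage plateaus at 60W on a 94W-capable supply, something upstream is throttling you.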
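lmstats can be approximated the same way. A minimal sketch, assuming LM Studio's OpenAI-compatible local server on its default port (1234) and an OpenAI-style usage block in the response; both are standard LM Studio behavior, but treat the details as assumptions.

```python
# Sketch in the spirit of lmstats: time one completion against LM Studio's
# local server and report generated tokens per second.
import time
import requests

URL = "http://localhost:1234/v1/chat/completions"

def tokens_per_second(prompt: str, max_tokens: int = 256) -> float:
    """Run one completion and return completion tokens / wall-clock seconds."""
    start = time.monotonic()
    resp = requests.post(URL, json={
        "model": "local-model",  # LM Studio uses the currently loaded model
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }, timeout=600)
    resp.raise_for_status()
    elapsed = time.monotonic() - start
    return resp.json()["usage"]["completion_tokens"] / elapsed

print(f"{tokens_per_second('Summarise DuckDB in one paragraph.'):.1f} tok/s")
```

Note the measurement includes prompt-processing time; a streaming request that separates time-to-first-token from generation would make the past-100k-context degradation Lerko saw much easier to pin down.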
4M Tokens on a Plane: What Broke Running Local LLMs at 30,000 Feet
A technical walkthrough of running local LLM inference (Gemma 4 31B and Qwen 4.6 36B via LM Studio) on a MacBook Pro M5 Max during a 10-hour transatlantic flight. The author built a billing analytics tool using DuckDB and processed ~4M tokens for engineering tasks, documenting hardware constraints including power drain (~1% battery/minute), thermal issues, and context window limitations. Custom CLI tools (powermonitor, lmstats) were created to monitor performance and power consumption.