Russell Harper got GPT-4o playing an 8-bit retro game on the Commander X16, and the approach is refreshingly practical. Instead of feeding raw pixels to the LLM and hoping it learns to see, Harper built what he calls "smart senses": structured text data that tells the AI exactly what's happening in the game world. Touch sensors handle collision detection. EMF detection provides opponent positions. The AI skips screen decoding entirely and just reasons about state.
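To make the idea concrete, here is a minimal sketch of what a "smart senses" observation might look like. Harper's actual schema isn't published in this writeup, so every field name below (`touch`, `emf`, the direction keys) is an illustrative assumption; the point is that the model receives compact structured state rather than pixels.

```python
import json

def build_senses(player, opponent, walls):
    """Assemble a text-only observation for the LLM prompt.

    Hypothetical layout, not Harper's real schema: `touch` flags which
    adjacent cells are blocked (collision detection), `emf` gives the
    opponent's position relative to the player.
    """
    return {
        "touch": {
            d: (player[0] + dx, player[1] + dy) in walls
            for d, (dx, dy) in {"up": (0, -1), "down": (0, 1),
                                "left": (-1, 0), "right": (1, 0)}.items()
        },
        "emf": {
            "dx": opponent[0] - player[0],
            "dy": opponent[1] - player[1],
        },
    }

senses = build_senses(player=(4, 4), opponent=(7, 2), walls={(4, 3)})
print(json.dumps(senses))
```

A payload like this is a few dozen tokens, versus thousands for an image, and the model never has to guess what a sprite is.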

The architecture is straightforward. A PHP layer sits between the ChatGPT API and the x16-emulator, connected via a new feature Harper calls VIA2-socket (currently a pull request under review). The game runs turn-based to accommodate API latency, and Harper added persistent notes so the LLM can carry strategies across sessions. He chose gpt-4o for its reasoning chops and affordable per-call pricing.
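The turn loop with persistent notes can be sketched roughly as follows. Harper's middleware is PHP and calls gpt-4o over the real API; this Python version stubs out the model call (`ask_llm` and the `notes.json` filename are assumptions for illustration) but shows the same shape: one API round-trip per turn, with lessons appended to a notes file that survives between games.

```python
import json
import pathlib

NOTES_FILE = pathlib.Path("notes.json")  # persistent strategy memory across sessions

def load_notes():
    return json.loads(NOTES_FILE.read_text()) if NOTES_FILE.exists() else []

def save_notes(notes):
    NOTES_FILE.write_text(json.dumps(notes))

def ask_llm(senses, notes):
    """Stand-in for the real chat-completion call.

    A real implementation would send `prompt` to gpt-4o and parse the
    reply; the fixed return value here is purely for illustration.
    """
    prompt = f"Notes so far: {notes}\nSenses: {senses}\nChoose a move."
    return {"move": "wait", "note": "aggression failed; try waiting it out"}

def play_turn(senses):
    notes = load_notes()
    decision = ask_llm(senses, notes)   # one API round-trip per game turn
    notes.append(decision["note"])      # carry the lesson into the next session
    save_notes(notes)
    return decision["move"]
```

Because the game waits for each reply, API latency costs nothing but wall-clock time, and the notes file is what lets game three benefit from the failures of games one and two.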

Across three sequential games against the built-in AI, GPT-4o changed how it played. Game one was a "lucky win." Game two went aggressive. Game three settled on a "wait it out" strategy that worked. The persistent notes made the difference, letting the AI remember what failed and what succeeded.

Agent builders should pay attention here. Computer vision-based agents burn tokens and time on pixel processing. Harper's approach won't work for every scenario, but when you control the interface, abstracting perception into clean data is a genuinely useful trick.