Andrew Head's lab at UPenn just shipped something practical. web-scroll-video is an open-source tool that turns web pages into MP4 videos. You describe what you want in plain English ("scroll slowly to the bottom, click the first post, pause for two seconds"), and OpenAI's Codex generates a cue sheet that drives a headless Chrome browser. The tool pipes screenshots to FFmpeg and spits out 1080p H.264 files at 30 fps. The code is on GitHub under UPenn's CIS organization.
Want changes? Just ask, like you're giving notes to an editor. The cue sheet sits next to the video file, so re-rendering with tweaks takes one request. You can click, type, zoom, highlight text, and wait for content to load. It even does a warmup scroll pass to handle lazy-loaded images before recording starts. Viewport size, frame rate, and scroll speed are all configurable. The stack is simple: Node.js 22 or newer, a Chromium-based browser, and FFmpeg. No npm packages required.
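To make the cue-sheet idea concrete, here is a hypothetical sketch. The post does not document the real file format, so the field names below are illustrative only. The useful part is the expansion step: each cue becomes a frame count at the sheet's frame rate, so the recorder knows how many screenshots each step needs.

```javascript
// Hypothetical cue sheet shape: viewport, frame rate, and an ordered
// list of actions, mirroring the plain-English request from the post.
const cueSheet = {
  viewport: { width: 1920, height: 1080 },
  fps: 30,
  cues: [
    { action: 'scroll', to: 'bottom', durationSeconds: 4 },
    { action: 'click', selector: 'article:first-of-type a' },
    { action: 'wait', seconds: 2 },
  ],
};

// Expand one cue into the number of frames to capture for it.
function framesFor(cue, fps) {
  switch (cue.action) {
    case 'wait':   return Math.round(cue.seconds * fps);
    case 'scroll': return Math.round(cue.durationSeconds * fps);
    case 'click':  return 1; // one frame at the moment of the click
    default: throw new Error(`unknown cue action: ${cue.action}`);
  }
}

const total = cueSheet.cues.reduce(
  (n, cue) => n + framesFor(cue, cueSheet.fps),
  0,
);
// 4 s scroll + 1 click frame + 2 s pause at 30 fps = 120 + 1 + 60 = 181
```

Because the sheet is plain data sitting next to the video, "make the pause three seconds" is a one-field change followed by a re-render, which is exactly the editing loop the tool promises.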
This is what agent skills should look like. Codex doesn't try to render video itself. It delegates to a focused tool. Natural language bridges the gap between "I want this" and actual execution. The result is small, functional, and genuinely useful. Most agent integrations overreach. This one doesn't.