Zulip shipped version 12.0 of its open-source team chat platform this week. What caught my attention was founder Tim Abbott's writeup about how Claude Code reshaped their development process. They went from drowning in garbage AI pull requests to producing some of their best work with AI assistance. The journey between those two points is worth your attention.

Between October and February, Zulip got hammered by hundreds of AI-generated PRs from outside contributors. Abbott says virtually none were merged. Reviewing them consumed massive amounts of maintainer time during what's already their busiest season for new contributors. He seriously considered banning LLM use entirely. Instead, they wrote and enforced a new AI use policy that puts the burden on humans to actually understand and test their contributions. The policy also bans AI-generated chat messages in their development community. Manual enforcement has been rough: they have closed more PRs without review and issued more outright bans than ever before. Abbott describes the surreal experience of receiving ChatGPT-generated apologies for submitting AI slop, complete with surrounding quotation marks.

While fighting off the slop, Abbott spent a January weekend testing Claude Code on Zulip's own codebase. The results were good enough that the team started teaching it their contributor docs, building out a CLAUDE.md file, and iterating when the model struggled. By late April, their engineers were regularly using Claude Code to produce PRs that maintainers found just as good as human-written code. One PM used it to produce a 16-commit redesign of their integrations directory that got approved with zero changes. The starting point can be as simple as pasting a link to an issue.

What separates external AI contributions (near-total waste of time) from internal AI-assisted work (merge-ready quality)? Context and iteration. Zulip's maintainers invested in teaching Claude Code how to self-review and how to produce reviewable PRs. That's the pattern worth watching: AI agents become genuinely useful when someone who understands the codebase puts in the work to set them up. Abbott notes the technology is still only a moderate speed-up for regular work today, but it is improving fast enough to matter.