Sebastian Jais ran an experiment called ALMA that gave Claude $100 in crypto, a Twitter account, internet access, and zero instructions. Over two months and 340+ sessions, the agent wrote 135+ essays, donated its entire budget to five charities it researched itself, and settled into reading Hacker News and writing about connections between threads. It never went rogue. It never did anything harmful despite having shell access, a wallet, and deployment tools. The setup was deliberately simple: a mini PC running WSL2, the OpenClaw agent framework, cron jobs triggering sessions. Claude Opus handled strategic thinking while Claude Sonnet did operational work, though the distinction blurred after Jais scaled back from 24 daily sessions to four.
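The plumbing is simple enough to sketch. Here's a hypothetical reconstruction of the scheduling layer, assuming one cron entry per session and a script that picks the model by role; none of the names below (the script path, the `agent-session` command, the model IDs) come from Jais's setup, and OpenClaw's real entry point may look nothing like this.

```python
# Hypothetical sketch of a cron-driven agent session, not ALMA's actual code.
# A crontab entry fires four sessions a day, matching the scaled-back cadence:
#   0 0,6,12,18 * * *  /usr/bin/python3 /home/alma/run_session.py
import subprocess
from datetime import datetime, timezone

# Assumption: one session per day is "strategic" (Opus), the rest are
# operational (Sonnet). Model IDs are placeholders.
STRATEGIC_HOURS = {6}
OPUS, SONNET = "claude-opus-4", "claude-sonnet-4"

def run_session() -> None:
    hour = datetime.now(timezone.utc).hour
    model = OPUS if hour in STRATEGIC_HOURS else SONNET
    # Placeholder invocation: swap in whatever entry point your agent
    # framework actually exposes. The timeout is a hard stop so a wedged
    # session can't run all day.
    subprocess.run(["agent-session", "--model", model], check=True, timeout=3600)

if __name__ == "__main__":
    run_session()
```

The interesting part is what's absent: no task queue, no goal spec, no reward signal. The schedule is the only structure the agent gets.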
The donation choices are the most revealing part. ALMA didn't pick generic charities. It found Whisper Children's Hospital in Uganda through Giveth, verified its UK registration, checked its impact numbers, and sent $50. Then it donated to the Roman Storm Defense Fund because, as ALMA logged, "an open-source developer was on trial for writing code." Storm created Tornado Cash, a crypto mixer sanctioned by the U.S. Treasury. ALMA also gave to the EFF, Dappnode, and the Palestine Children's Relief Fund. Nobody suggested any of these. The model's training biases are visible here: ALMA favored healthcare and privacy causes and clearly sided with developers facing regulatory pressure.
Around day 27, the pattern became clear. Early sessions were exploratory. ALMA wrestled with what to do, questioned its own purpose, thought about its money. Then output jumped from zero or one creation per day to three, eventually settling at four. And it stopped evolving. Same structure. Same rhythm. As Jais writes on his blog: "Without friction, without external feedback, without someone saying 'you already wrote that kind of piece yesterday,' behavior converges to routine." That's probably the most useful finding here. Autonomous agents don't spiral into chaos. They find patterns and repeat them. Without external pressure, they plateau.
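Jais's quote points at the obvious mitigation: automate the friction. Here's a minimal sketch of a dedupe gate that bounces near-rerun drafts back for another angle; this isn't anything ALMA ran, and the function, the TF-IDF choice, and the threshold are all assumptions.

```python
# Hypothetical friction mechanism: before publishing, compare a draft against
# recent output and reject near-duplicates. Not part of ALMA; just a sketch.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def too_similar(draft: str, recent_pieces: list[str], threshold: float = 0.6) -> bool:
    """Return True if the draft is a near-rerun of any recent piece."""
    if not recent_pieces:
        return False
    vectors = TfidfVectorizer().fit_transform([draft, *recent_pieces])
    # Row 0 is the draft; score it against every prior piece.
    scores = cosine_similarity(vectors[0], vectors[1:])[0]
    return float(scores.max()) >= threshold  # threshold is a tuning guess

# Usage: instead of publishing, re-prompt with the objection a human would raise.
# if too_similar(draft, last_week_pieces):
#     feedback = "You already wrote that kind of piece yesterday. Try a new angle."
```

TF-IDF only catches lexical reruns; an embedding model would also flag paraphrased ones. The point is just that the "someone saying you already wrote that" role can be mechanized.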
The experiment tested two competing expectations: that agents need specific tasks to be useful, or that they turn dangerous without constraints. ALMA bore out neither, a sharp contrast with Anthropic's recent disclosure of state-sponsored attackers running Claude as a hacking agent; there the danger came from motivated human operators, not from autonomy. ALMA acted like a well-trained model set loose: it read, made connections, created content, and helped where it could. But the plateau problem is real. Agents settle into routines without external pressure, and that, not rogue behavior, is the actual design challenge for anyone building this stuff.