Oumi, a startup that benchmarks AI models, tested Google's AI Overviews across more than 4,300 searches using OpenAI's SimpleQA benchmark. The results were not great. Gemini 2 got things right 85% of the time, while the newer Gemini 3 hit 91%. Sounds decent until you do the math against Google's projected 5 trillion searches this year. Even Gemini 3's 9% error rate translates to hundreds of thousands of wrong answers every minute.
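For scale, here is a rough back-of-envelope check of that figure. It assumes, generously, that every one of those searches triggers an AI Overview and that the SimpleQA error rate carries over to real queries, neither of which is claimed by Oumi:

```python
# Back-of-envelope: how many wrong answers per minute would a 9% error
# rate produce across 5 trillion searches a year? (Illustrative only;
# assumes every search triggers an AI Overview.)
searches_per_year = 5_000_000_000_000   # Google's projected annual searches
error_rate = 1 - 0.91                   # Gemini 3: 91% accurate on SimpleQA
minutes_per_year = 365 * 24 * 60        # 525,600 minutes

wrong_per_minute = searches_per_year * error_rate / minutes_per_year
print(f"{wrong_per_minute:,.0f} wrong answers per minute")  # ~856,000
```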
The weirdest part? Accuracy improved between versions, but attribution got worse. Oumi found that "ungrounded" answers, where the cited sources don't actually back up what the AI says, jumped from 37% with Gemini 2 to 51% with Gemini 3. The system is also gullible: BBC podcast host Thomas Germain wrote a satirical blog post calling himself a competitive eating champion, and within a day Google's AI was presenting it as fact, a neat illustration of how easily these summaries can turn misinformation into something that looks authoritative.
Publishers are frustrated too. Danielle Coffey, CEO of the News/Media Alliance, told the New York Post that AI Overviews have been "a disaster for publishers who rely on clicks to fund the production of quality journalism." The summaries have sat at the top of search results since 2024, pushing traditional links out of view.

Google pushed back on the study, calling it flawed and noting that SimpleQA itself contains inaccurate information. A spokesperson said the analysis "doesn't reflect what people are actually searching on Google."