A flood of low-quality medical papers is being produced with TriNetX, a platform holding anonymised health records for more than 300 million patients, and large language models are quietly making the problem worse, according to Science.

When researchers asked seven leading LLMs how to correct for immortal-time bias on TriNetX, six recommended methods that cannot actually be implemented on the platform. Immortal-time bias is a common statistical trap: follow-up time before treatment begins, which a patient must by definition have survived, is credited to the treated group and makes the treatment look protective. The investigators then found eight published papers that used exactly those impossible, AI-suggested approaches. Push-button analysis already invites p-hacking and cherry-picked results; confident but wrong model advice adds a new failure mode.
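
For readers unfamiliar with the trap, here is a minimal simulation sketch of how immortal-time bias can arise. It is illustrative only: it does not come from the Science report, it does not use TriNetX or its data, and it is not a claim about what the flagged papers did or what the platform can run. The function name `simulate`, the hazard, the follow-up window and the treatment schedule are all assumptions chosen for the example. The simulated drug has no effect at all, so any apparent benefit is an artefact of how follow-up time is counted.

```python
import random


def simulate(n=20000, hazard=0.1, follow_up=10.0, seed=0):
    """Return (naive_rate_ratio, time_split_rate_ratio) for a drug with NO effect.

    All parameters are hypothetical values picked for illustration.
    """
    rng = random.Random(seed)
    # Each tally is [deaths, person_years].
    naive = {"treated": [0, 0.0], "untreated": [0, 0.0]}
    split = {"treated": [0, 0.0], "untreated": [0, 0.0]}

    for _ in range(n):
        death = rng.expovariate(hazard)   # same hazard for everyone: null effect
        end = min(death, follow_up)       # follow-up stops at death or year 10
        died = 1 if death <= follow_up else 0
        # Half the cohort is scheduled to start treatment some time in years 0-5.
        start = rng.uniform(0.0, 5.0) if rng.random() < 0.5 else None
        # A patient is only ever treated if still alive at the scheduled start.
        ever_treated = start is not None and start < end

        # Naive analysis: all follow-up of an ever-treated patient counts as treated,
        # including the pre-treatment ("immortal") time they had to survive.
        group = "treated" if ever_treated else "untreated"
        naive[group][0] += died
        naive[group][1] += end

        # Time-split analysis: pre-treatment time is counted as untreated.
        if ever_treated:
            split["untreated"][1] += start
            split["treated"][0] += died
            split["treated"][1] += end - start
        else:
            split["untreated"][0] += died
            split["untreated"][1] += end

    def rate_ratio(tally):
        treated_rate = tally["treated"][0] / tally["treated"][1]
        untreated_rate = tally["untreated"][0] / tally["untreated"][1]
        return treated_rate / untreated_rate

    return rate_ratio(naive), rate_ratio(split)


if __name__ == "__main__":
    naive_rr, split_rr = simulate()
    print(f"naive rate ratio:      {naive_rr:.2f}")
    print(f"time-split rate ratio: {split_rr:.2f}")
```

Under these assumptions the naive ratio should fall well below 1, making a useless drug look protective, while the time-split ratio should sit close to 1. Treating exposure as time-varying is one standard remedy; whether a given platform can express it is exactly the kind of detail an LLM may get wrong.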

It is a clean warning for agentic research tools: an LLM that sounds authoritative about a method it has never tested can launder bad statistics straight into the literature.