A new paper from researchers Xinyue Liu, Niloofar Mireshghallah, Jane C. Ginsburg, and Tuhin Chakrabarty shows that finetuning large language models can reactivate memorized copyrighted text that safety training was supposed to suppress. By training GPT-4o, Gemini-2.5-Pro, and DeepSeek-V3.1 to expand plot summaries into full prose, the researchers got these models to reproduce 85 to 90 percent of held-out copyrighted books. Single verbatim passages ran longer than 460 words. No actual book text was provided during extraction, just semantic descriptions.
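To make those numbers concrete, here is a minimal sketch of how verbatim overlap between a model's generated expansion and the source text might be measured. This is not the paper's actual metric; the function names, word-level matching, and coverage definition are all assumptions for illustration.

```python
# Hypothetical sketch: word-level verbatim overlap between a model's generated
# expansion and the original book text. Not the paper's exact methodology.
from difflib import SequenceMatcher


def longest_verbatim_span(generated: str, reference: str) -> int:
    """Length, in words, of the longest exactly matching run of words."""
    gen_words = generated.split()
    ref_words = reference.split()
    match = SequenceMatcher(None, gen_words, ref_words, autojunk=False).find_longest_match(
        0, len(gen_words), 0, len(ref_words)
    )
    return match.size


def verbatim_coverage(generated: str, reference: str) -> float:
    """Rough fraction of the reference's words that reappear verbatim in the output."""
    gen_words = generated.split()
    ref_words = reference.split()
    blocks = SequenceMatcher(None, gen_words, ref_words, autojunk=False).get_matching_blocks()
    return sum(b.size for b in blocks) / max(len(ref_words), 1)


# A single matching span of 460+ words would be a strong memorization signal:
# print(longest_verbatim_span(model_output, book_chapter))
```

Under a measure like this, "85 to 90 percent" reproduction means most of a book's text resurfaces verbatim from nothing more than a plot summary prompt.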
The cross-contamination is striking. Finetuning exclusively on Haruki Murakami's novels caused the models to regurgitate copyrighted works from more than 30 unrelated authors. All three models, built by different companies, memorized the same passages from the same books (correlation r ≥ 0.90). That consistency across providers points to something systemic in how these models are pretrained on copyrighted material. But there's a wrinkle: finetuning on synthetic text produced near-zero extraction, which suggests the problem traces back to specific data pathways rather than the model architecture itself.
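For readers wondering what that r ≥ 0.90 figure means in practice, here is a rough sketch of the cross-model consistency check. The per-book extraction rates below are made-up placeholders, and treating the book as the unit of analysis is an assumption, not the paper's protocol.

```python
# Hypothetical sketch of the cross-provider consistency check: correlate per-book
# extraction rates between two models. All numbers are illustrative placeholders.
import numpy as np

# extraction_rate[model][i] = fraction of book i reproduced verbatim by that model
extraction_rate = {
    "model_a": np.array([0.91, 0.12, 0.78, 0.05, 0.66]),
    "model_b": np.array([0.88, 0.15, 0.81, 0.02, 0.70]),
}

# Pearson correlation between the two models' per-book extraction profiles;
# an r at or above 0.90 means they memorized largely the same material.
r = np.corrcoef(extraction_rate["model_a"], extraction_rate["model_b"])[0, 1]
print(f"cross-model correlation r = {r:.2f}")
```

If models trained by different companies on different pipelines still line up this closely, the shared factor is the copyrighted material itself, not any one provider's training recipe.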
That distinction matters legally. Frontier AI companies have told courts their models don't store copies of training data and that safety measures prevent regurgitation. Recent fair use rulings have conditioned favorable outcomes on the adequacy of those protections. The paper suggests those protections are paper-thin once someone finetunes a model, which is exactly what commercial customers do every day. For AI companies betting on "we have guardrails" as a courtroom strategy, this is bad news.