In 2025, the National Center for Missing and Exploited Children processed 61.8 million files of suspected child sexual abuse material. At one second per file, human review would take nearly two years of nonstop work.
So machines do it instead. They use perceptual hashing to convert images into tiny numerical fingerprints, then compare those against databases of known abuse material. Nobody views the original photo. Microsoft's PhotoDNA and Meta's PDQ are the workhorses here. They work because perceptual hashes stay stable even when images are resized, watermarked, or recompressed. Two images that look the same to your eye produce nearly identical hashes, even if every pixel value is technically different. The image itself stays wherever it was scanned; only the hash travels onward for matching, and in PDQ's case that hash is just 256 bits.
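The core mechanics fit in a few lines. Here is a minimal sketch using a simple "difference hash," a stand-in for PhotoDNA and PDQ, whose actual algorithms are more sophisticated; the 31-bit threshold is an illustrative figure in the range PDQ's documentation discusses:

```python
# Illustrative difference hash ("dHash"). This is NOT PhotoDNA or PDQ,
# whose internals differ; it only demonstrates the perceptual-hashing idea.
from PIL import Image

def dhash(path: str, size: int = 16) -> int:
    """Reduce an image to a (size x size)-bit fingerprint.

    Shrinking to a tiny grayscale grid throws away exactly the detail
    that resizing, watermarking, and recompression perturb, so
    near-duplicate images land on near-identical bit strings.
    """
    img = Image.open(path).convert("L").resize((size + 1, size))
    px = list(img.getdata())
    bits = 0
    for row in range(size):
        for col in range(size):
            left = px[row * (size + 1) + col]
            right = px[row * (size + 1) + col + 1]
            bits = (bits << 1) | (left > right)  # 1 if brightness rises
    return bits  # 256 bits when size == 16

def hamming(a: int, b: int) -> int:
    """Count differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

# Matching is a threshold on Hamming distance, not exact equality:
# suspect = hamming(dhash("upload.jpg"), known_hash) <= 31
```

The threshold comparison is the whole trick: exact-match hashes like SHA-256 flip every bit when one pixel changes, while a perceptual hash moves only a few bits, so near-duplicates stay within a small Hamming distance of the database entry.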
But hashing only catches material that's been seen before and verified by a human expert.
Unknown CSAM requires a different approach. Newly produced abuse imagery or AI-generated content demands machine learning classifiers trained to recognize patterns rather than match exact fingerprints. This is harder. False positive rates climb. Compute costs balloon. And the privacy questions get thornier.
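The scale problem is easy to quantify. A back-of-envelope sketch, using the article's 61.8 million file volume and purely hypothetical error rates:

```python
# What a classifier's false-positive rate means at report scale.
# The 61.8M figure is from this article; the FPRs are hypothetical.
FILES_PER_YEAR = 61_800_000

for fpr in (0.01, 0.001, 0.0001):
    false_alarms = FILES_PER_YEAR * fpr
    print(f"FPR {fpr:.2%}: ~{false_alarms:,.0f} innocent files flagged/year")

# FPR 1.00%:  ~618,000 innocent files flagged/year
# FPR 0.10%:  ~61,800 innocent files flagged/year
# FPR 0.01%:  ~6,180 innocent files flagged/year
```

Every flagged file typically needs a human to look at it, which quietly reintroduces the review bottleneck the machines were supposed to eliminate.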
It's getting worse fast. Over 1.5 million reports in 2025 involved generative AI, including synthetic images built using the likenesses of real children who have never suffered contact abuse. Every report, real or synthetic, gets treated as potentially depicting a real child in danger. That floods investigation pipelines with content that traditional hash databases can't recognize.
The architecture choices matter for anyone building content systems. Server-side hashing works for plain uploads. End-to-end encrypted environments can't see the content at all, which is what pushed Apple to propose client-side scanning in 2021. Its system, built around a perceptual hash called NeuralHash, would scan photos on your device before encryption, using cryptographic protocols like Private Set Intersection to compare local hashes against a blinded database without revealing the database contents to the user or revealing non-matching photos to Apple.
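Apple's deployed protocol was considerably more elaborate (it layered threshold secret sharing on top, so even matches stayed hidden below a count threshold), but the private-matching core can be sketched with a classic Diffie-Hellman-style PSI, where two secret exponents commute so neither side ever sees the other's raw hashes. A toy version, with deliberately insecure parameters:

```python
# Toy Diffie-Hellman-style Private Set Intersection. This is NOT
# Apple's protocol; it only illustrates the blinding trick. The
# modulus below is far too small and weak for real use.
import hashlib
import secrets

P = 2**127 - 1  # Mersenne prime, demo only; real PSI uses a proper group

def to_group(item: bytes) -> int:
    """Hash an item (e.g. a perceptual hash) into the group."""
    return int.from_bytes(hashlib.sha256(item).digest(), "big") % P

# Server blinds its known-material hashes with secret exponent s and
# publishes the blinded set; without s, the entries are unreadable.
s = secrets.randbelow(P - 2) + 1
db = [b"known_hash_1", b"known_hash_2"]
blinded_db = {pow(to_group(x), s, P) for x in db}

# Client blinds its local photo hash with its own secret c and sends it.
c = secrets.randbelow(P - 2) + 1
query = pow(to_group(b"known_hash_2"), c, P)

# Server raises the query to s and returns it. It learns nothing about
# the photo, because c masks the underlying hash.
double_blinded = pow(query, s, P)  # = H(photo)^(c*s) mod P

# Client raises each published entry to c; exponents commute, so a
# database item produces the same double-blinded value.
client_view = {pow(y, c, P) for y in blinded_db}
print(double_blinded in client_view)  # True: the photo is in the database
```

One honest caveat: in this toy version the *client* learns whether it matched, which is exactly backwards from Apple's design, where vouchers and threshold secret sharing ensure only the server learns of matches, and only after enough of them accumulate.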
Apple paused the project after privacy researchers pointed out that once you build infrastructure to scan local files for one type of content, updating the hash database to target political dissent or other material becomes trivial. The technical capability is indifferent to what you point it at.