Academic publishing is under attack by generative AI. Organized paper mills now operate as fully automated factories—churning out synthetic data, fabricated images, and plausible-sounding manuscripts at industrial scale. A landmark 2025 PNAS study found a corpus of 32,786 suspected fraudulent papers linked to these operations, with fraudulent output doubling every 1.5 years—outpacing the growth of legitimate science itself.
What Are AI-Powered Paper Mills?
Paper mills are commercial services that manufacture fraudulent academic manuscripts and sell authorship credits to researchers under pressure to publish. The model isn’t new—it emerged from “publish or perish” academic culture in countries with institutional incentives tied to publication count. What’s new is the automation layer.
Generative AI has collapsed the cost and effort required to produce superficially credible research. Where fraud once required a researcher to manually fabricate datasets or steal images, a paper mill today can deploy large language models to draft manuscripts, generate fake statistical analyses, create plausible-but-synthetic images, and produce reviewer commentary—all without meaningful human involvement.
The output is harder to detect than older template-based fraud. LLM-generated text mimics the structure and register of legitimate academic writing. Fabricated figures can pass cursory visual inspection. And crucially, paper mills have adapted their workflows to explicitly evade detection tools.
How the Fraud Pipeline Actually Works
The paper mill supply chain involves several distinct actors:
- Content generation: LLMs draft manuscripts in target topic areas. Paraphrasing tools obfuscate copied source material
- Data fabrication: Statistical outputs are generated to support pre-determined conclusions. Images are synthesized or duplicated from legitimate papers
- Authorship brokerage: Researchers purchase authorship slots. Some mills offer tiered pricing: first author, co-author, corresponding author
- Peer review manipulation: Mills maintain networks of compromised reviewers who approve submissions for payment or reciprocal favor
- Journal targeting: Predatory journals with weak oversight are primary targets, but paper mill products routinely appear in mainstream publications
The PNAS study revealed something important: this isn’t just about fake papers. Researchers uncovered “footprints of activities connected to scientific fraud that extend beyond the production of fake papers to brokerage roles in a widespread network of editors and authors who cooperate to achieve the publication of scientific papers that escape traditional peer-review standards.”1
In other words, paper mills don’t just produce fraudulent content—they corrupt the gatekeeping infrastructure itself.
The Hindawi Collapse: A Case Study in Systemic Failure
The clearest demonstration of scale came in 2023, when Hindawi—the open-access arm of Wiley—retracted over 8,000 articles in a single mass action. The papers were almost entirely from “Special Issues,” a publishing format where guest editors, rather than editorial staff, managed submissions.
Paper mills systematically targeted Special Issues because guest editors are typically less experienced and more motivated to fill issues quickly. Indicators of manipulation included: duplicated reviewer text, suspiciously fast turnaround times, reviewer pools concentrated among a small number of individuals, and fraudulent use of legitimate researcher identities in reviewer databases.
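These indicators lend themselves to simple heuristic screening. The sketch below is illustrative only: the review-record fields and thresholds are hypothetical, not drawn from any real publisher’s system, but it flags the three machine-checkable signals described above.

```python
from collections import Counter

# Hypothetical review records and illustrative thresholds (not from any
# real screening tool).
FAST_TURNAROUND_DAYS = 2      # suspiciously quick review
CONCENTRATION_SHARE = 0.5     # one reviewer handling half an issue's reviews

def flag_special_issue(reviews):
    """Return human-readable warning flags for one Special Issue."""
    flags = []
    # 1. Duplicated reviewer text across reports
    texts = [r["text"] for r in reviews]
    if len(set(texts)) < len(texts):
        flags.append("duplicated reviewer text")
    # 2. Suspiciously fast turnaround times
    if any(r["days"] <= FAST_TURNAROUND_DAYS for r in reviews):
        flags.append("fast turnaround")
    # 3. Reviewer pool concentrated among a small number of individuals
    counts = Counter(r["reviewer"] for r in reviews)
    top_share = counts.most_common(1)[0][1] / len(reviews)
    if top_share >= CONCENTRATION_SHARE:
        flags.append("concentrated reviewer pool")
    return flags

reviews = [
    {"reviewer": "R1", "days": 1, "text": "Accept. Well written."},
    {"reviewer": "R1", "days": 2, "text": "Accept. Well written."},
    {"reviewer": "R2", "days": 14, "text": "Minor revisions needed."},
]
print(flag_special_issue(reviews))
```

A real system would also need the fourth indicator, identity verification against reviewer databases, which requires external data and cannot be reduced to a local heuristic.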
The financial cost to Wiley was substantial—an estimated $35–40 million in lost revenue.2 The reputational cost to academic publishing was harder to price, but arguably worse.
Why Detection Tools Are Losing the Arms Race
The response to paper mill fraud has produced several detection approaches—but each faces fundamental limitations.
Tortured Phrases
The Problematic Paper Screener, developed by Guillaume Cabanac, Cyril Labbé, and Alexander Magazinov, searches 130 million published papers weekly for “tortured phrases”—nonsensical substitutions produced when paraphrasing tools mangle established terminology to evade plagiarism detection.3
Examples include “counterfeit negative” (false negative), “flag to clamor” (signal to noise), and “cruel temperature” (mean temperature). As of September 2025, more than 7,500 such phrases had been catalogued. The tool has contributed to over 1,000 retractions and flagged more than 20,000 papers containing five or more tortured phrases.
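At its core, this kind of screening is dictionary lookup at scale. A minimal sketch, using the three catalogued phrases from the examples above (the real catalogue holds thousands, and matching at 130-million-paper scale requires indexed search rather than a linear scan):

```python
# Tiny excerpt of catalogued tortured phrases mapped to the standard
# terminology they replace; illustrative subset only.
TORTURED_PHRASES = {
    "counterfeit negative": "false negative",
    "flag to clamor": "signal to noise",
    "cruel temperature": "mean temperature",
}

def find_tortured_phrases(text):
    """Return (tortured phrase, standard term) pairs found in a manuscript."""
    lowered = text.lower()
    return [(p, t) for p, t in TORTURED_PHRASES.items() if p in lowered]

sample = "We measured the cruel temperature and the flag to clamor ratio."
print(find_tortured_phrases(sample))
# [('flag to clamor', 'signal to noise'), ('cruel temperature', 'mean temperature')]
```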
But this approach is reactive by definition. Mills identify flagged phrases and update their paraphrasing models. Springer Nature now integrates non-standard phrase detection directly into submission workflows, but the underlying dynamic—detection tools identify patterns, mills adapt—remains.
Image Analysis
Tools like Proofig use computer vision to detect duplicated or manipulated figures—a significant vector because fabricated research often reuses images across papers. But generative image models are now capable of producing synthetic microscopy, gel electrophoresis, and clinical imaging that is, according to some researchers, “indistinguishable from real.”
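Image duplication detection of this general kind is commonly built on perceptual hashing, which reduces an image to a compact fingerprint that survives small edits. A minimal pure-Python sketch on toy 4×4 grayscale grids — Proofig’s actual methods are proprietary, and a real pipeline would first decode and downscale actual image files:

```python
def average_hash(pixels):
    """Simple average-hash: one bit per pixel, above/below the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)

def hamming(h1, h2):
    """Number of differing bits; small distance suggests a near-duplicate."""
    return sum(a != b for a, b in zip(h1, h2))

# Toy 4x4 grayscale "figures" (values 0-255), purely illustrative.
fig_a = [[10, 200, 10, 200]] * 4
fig_b = [[12, 198, 11, 205]] * 4   # near-duplicate with slight noise
fig_c = [[200, 10, 200, 10]] * 4   # genuinely different figure

print(hamming(average_hash(fig_a), average_hash(fig_b)))  # 0: likely reuse
print(hamming(average_hash(fig_a), average_hash(fig_c)))  # 16: distinct
```

The limitation noted above follows directly: a freshly synthesized image shares no fingerprint with any prior figure, so duplication-based checks never fire against it.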
Network Analysis
The STM Integrity Hub—used by 40 publishers to screen over 125,000 papers monthly—takes a multi-signal approach: author credential verification, citation network analysis, reference validation, and AI-content detection. The platform intercepts approximately 1,000 suspected paper mill submissions each month.4
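Citation network analysis, one of the Hub’s listed signals, can be illustrated with a toy reciprocal-citation check; the graph and the idea of scoring mutual-citation density are hypothetical illustrations, not the Hub’s actual implementation:

```python
from itertools import combinations

# Hypothetical citation graph: paper -> set of papers it cites.
citations = {
    "P1": {"P2", "P3"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2"},
    "P4": {"P1"},
}

def mutual_citation_density(papers):
    """Fraction of paper pairs in the group that cite each other both ways.

    A value near 1.0 means a fully reciprocal ring, one pattern a
    network-analysis screener might surface for human review.
    """
    pairs = list(combinations(papers, 2))
    mutual = sum(
        1 for a, b in pairs
        if b in citations.get(a, set()) and a in citations.get(b, set())
    )
    return mutual / len(pairs)

suspect_group = ["P1", "P2", "P3"]
print(mutual_citation_density(suspect_group))  # 1.0: fully reciprocal ring
```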
That number sounds significant. But the PNAS study corpus alone contained 32,786 suspicious papers. Monthly interception of 1,000 represents a small fraction of the estimated throughput moving through the pipeline.
AI Content Detection
The most acute challenge is distinguishing AI-generated text from human writing. Current detection tools, including GPTZero and Turnitin’s AI detector, perform poorly against text that has been even minimally post-processed. Research has shown that tools like AIUndetect can fool detection systems “almost all the time.”5
A 2025 study found that AI-generated peer reviews—not just papers, but the reviews meant to catch fraudulent papers—are “nearly impossible to distinguish from human writing.” The tools designed to catch AI-generated content failed to identify most AI-generated review reports.6
The Peer Review Problem Is Worse Than Reported
Between 6.5% and 16.9% of peer review reports at major AI conferences in 2024 appeared to contain LLM-generated text—and in most cases this use was undetectable.6 This isn’t an academic publishing problem confined to predatory journals. It’s inside the conferences that define state-of-the-art benchmarks.
Over 6,400 retractions in 2024 and 2025 were attributed to fake peer reviews—a number that represents only detected and actioned cases, not the full population of compromised reviews in circulation.
PLOS and Frontiers took a blunt countermeasure in 2025: they stopped accepting submissions based solely on public health datasets, because the volume of AI-generated submissions exploiting those datasets had become unmanageable. This amounts to publishers conceding that a category of research has been so thoroughly compromised that gatekeeping individual papers is no longer viable; they closed the pipeline entirely.
Detection Tool Comparison
| Tool / Initiative | Approach | Scale | Key Limitation |
|---|---|---|---|
| Problematic Paper Screener | Tortured phrase detection | 130M papers scanned weekly | Reactive; mills adapt phrasing |
| STM Integrity Hub | Multi-signal (network, credentials, AI content) | 125K papers/month, 40 publishers | Intercepts ~1K/month vs. estimated throughput |
| Proofig | Image duplication/manipulation | Manuscript-level | Generative images increasingly undetectable |
| Springer Nature NLP tool | Non-standard phrase detection | Integrated into submission | Catches known patterns, not novel evasions |
| GPTZero / Turnitin AI | LLM text detection | Widely deployed | Post-processing defeats most detection |
| Cancer paper AI screener (Nature) | Title/abstract similarity to known mill products | 250K+ papers flagged | Similarity-based; won’t catch novel templates |
Who Gets Hurt
The practical consequences of fraudulent literature in circulation extend well beyond academic reputation.
Biomedical research: The cancer literature is particularly contaminated—an AI tool scanning manuscript titles and abstracts flagged over 250,000 cancer studies bearing textual similarities to known paper mill articles.7 If even a fraction of those represent fraudulent data, clinical researchers are building on corrupted foundations.
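Title/abstract similarity screening of the kind that tool performs can be approximated with a bag-of-words cosine comparison; the titles below are invented for illustration, and the actual tool’s model is not public:

```python
import re
from collections import Counter
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two texts as bag-of-words vectors."""
    wa = Counter(re.findall(r"[a-z]+", a.lower()))
    wb = Counter(re.findall(r"[a-z]+", b.lower()))
    dot = sum(wa[w] * wb[w] for w in wa)
    norm = (sqrt(sum(c * c for c in wa.values()))
            * sqrt(sum(c * c for c in wb.values())))
    return dot / norm if norm else 0.0

# Invented titles: paper mills often reuse a template and swap one entity.
known_mill_title = "MicroRNA-21 promotes proliferation via targeting PTEN in cancer cells"
candidate = "MicroRNA-155 promotes proliferation via targeting PTEN in tumor cells"

score = cosine_similarity(known_mill_title, candidate)
print(round(score, 2))  # 0.89: high overlap despite the swapped gene name
```

As the table above notes, this catches template reuse but not novel templates: a mill that drafts each title from scratch scores low against every known exemplar.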
Systematic reviews and meta-analyses: These are the highest-trust documents in evidence-based medicine. They aggregate findings across studies. If source studies are fraudulent, systematic reviews amplify the contamination with an additional layer of authority.
AI training data: Scientific papers are scraped as training data for large language models. Fraudulent papers entering the training corpus degrade model accuracy in specialized domains—a form of knowledge contamination that compounds over model generations.
Researcher careers: The TU Delft case in 2025 documented a researcher who discovered fraudulent papers published under their name without their knowledge.8 False authorship—where mills use stolen researcher identities to add credibility to fraudulent papers—is an emerging category of harm with no clear remediation pathway.
What Journals and Institutions Are (and Aren’t) Doing
Publisher responses in 2025 have been mostly incremental:
- Springer Nature integrated non-standard phrase detection into the STM Integrity Hub
- PLOS and Frontiers restricted specific submission categories
- COPE updated guidance on undisclosed third-party involvement and AI use
- Individual journals have adopted STM Integrity Hub screening, but adoption is uneven
What’s conspicuously absent: structural reform to the incentive architecture that creates demand for paper mills. That requires institutional and policy action—changes to faculty evaluation criteria, research funding requirements, and government mandates in countries where publication count remains a primary advancement metric.
A study co-author’s 2025 framing, reported by Retraction Watch, is apt: fighting coordinated publication fraud is “like emptying an overflowing bathtub with a spoon.”9 Detection tools are the spoon. The tap is still fully open.
The Arms Race Trajectory
The immediate trend is unfavorable for detection. Generative models improve continuously; detection tools update reactively. Mills are not static targets—they run detection evasion as a core competency, systematically testing their outputs against available screening tools.
The medium-term risk is normalization: as fraudulent papers accumulate in indexed databases, the signal-to-noise ratio in scientific literature degrades. Researchers cite papers without investigating provenance. Meta-analyses include fraudulent inputs. Clinical guidelines absorb corrupted evidence. The damage doesn’t manifest in a single catastrophic event—it diffuses through the literature in ways that are hard to trace and harder to undo.
The technology response—better detection, network analysis, publisher coordination—is necessary but insufficient. The paper mill problem is ultimately a structural problem in academic incentive design, and it will not be solved by screening tools alone.
Frequently Asked Questions
Q: How many fraudulent papers are currently in scientific literature? A: A 2025 PNAS study identified a corpus of 32,786 suspected paper mill articles, but this represents a fraction of total suspected fraud—the Problematic Paper Screener has flagged over 20,000 papers containing five or more tortured phrases alone, and an AI tool flagged over 250,000 cancer studies with similarities to known paper mill outputs.
Q: Can journals detect AI-generated papers before publication? A: Not reliably. Current AI detection tools perform poorly against post-processed LLM text, and specialized evasion tools like AIUndetect defeat them “almost all the time” according to research. The STM Integrity Hub intercepts approximately 1,000 suspected paper mill submissions monthly, but this represents a small fraction of estimated throughput.
Q: Are AI-generated peer reviews detectable? A: No. A 2025 study found that AI-generated peer review reports are “nearly impossible to distinguish from human writing,” with detection tools failing on most tested cases. Between 6.5% and 16.9% of peer reviews at major AI conferences appeared AI-generated.
Q: Which research fields are most affected? A: Biomedical research—particularly cancer, clinical studies, and public health—is heavily targeted, partly because these fields have large volumes of publicly accessible datasets that mills exploit. Engineering and computer science are also significantly affected.
Q: What can individual researchers do to protect their work? A: Researchers should monitor for false authorship using tools like Google Scholar alerts on their name, scrutinize citation sources before including them in reviews, and report suspicious papers to resources like Retraction Watch and PubPeer. Institutional change requires engaging with faculty evaluation criteria that currently reward publication count over quality.
Sources:
- The entities enabling scientific fraud at scale are large, resilient, and growing rapidly | PNAS
- AI tools combat paper mill fraud in scientific publishing as peer review system struggles | Chemistry World
- Low-quality papers are flooding the cancer literature — can this AI tool help to catch them? | Nature
- ‘A serious problem’: peer reviews created using AI can avoid detection | Nature
- Hindawi reveals process for retracting more than 8,000 paper mill articles | Retraction Watch
- Fighting coordinated publication fraud is like ‘emptying an overflowing bathtub with a spoon’ | Retraction Watch
- STM Integrity Hub expands with Springer Nature’s AI-Powered Text Detection Tool | STM Association
- Scientific Study Exposes Publication Fraud Involving Widespread Use of AI | TU Delft
- Problematic Paper Screener: Trawling for fraud in the scientific literature | TechXplore
- Fraudulent publication growth is outpacing legitimate science | EASE
- AI-Powered Paper Mills: The New Threat to Research Integrity | Enago Academy
- Artificial intelligence (AI) and fake papers | COPE
Footnotes

1. Rooij, B. et al. “The entities enabling scientific fraud at scale are large, resilient, and growing rapidly.” Proceedings of the National Academy of Sciences, August 2025. https://www.pnas.org/doi/10.1073/pnas.2420092122
2. “What the Collapse of Hindawi Reveals About Systemic Risk in Scholarly Communication.” Katina Magazine, 2026. https://katinamagazine.org/content/article/open-knowledge/2026/what-the-collapse-of-hindawi-reveals-systemic-risk
3. “Problematic Paper Screener: Trawling for fraud in the scientific literature.” The Conversation / TechXplore, January 2025. https://techxplore.com/news/2025/01-problematic-paper-screener-trawling-fraud.html
4. “STM Integrity Hub expands with Springer Nature’s AI-Powered Text Detection Tool.” STM Association, April 2025. https://stm-assoc.org/stm-integrity-hub-expands-with-springer-natures-ai-powered-text-detection-tool/
5. “AI-Powered Paper Mills: The New Threat to Research Integrity.” Enago Academy, 2025. https://www.enago.com/academy/ai-powered-paper-mills-research-integrity/
6. “‘A serious problem’: peer reviews created using AI can avoid detection.” Nature, 2025. https://www.nature.com/articles/d41586-025-04032-1
7. “Low-quality papers are flooding the cancer literature — can this AI tool help to catch them?” Nature, 2025. https://www.nature.com/articles/d41586-025-02906-y
8. “Scientific Study Exposes Publication Fraud Involving Widespread Use of AI.” TU Delft, 2025. https://www.tudelft.nl/en/2025/eemcs/scientific-study-exposes-publication-fraud-involving-widespread-use-of-ai
9. “Fighting coordinated publication fraud is like ‘emptying an overflowing bathtub with a spoon,’ study coauthor says.” Retraction Watch, August 2025. https://retractionwatch.com/2025/08/04/fighting-coordinated-publication-fraud-is-like-emptying-an-overflowing-bathtub-with-a-spoon-study-coauthor-says/