
Fable 5 Distillation Protection: How Anthropic Blocks Model Copying

Claude Fable 5 ships with distillation protection to prevent capability extraction. A first-principles look at what it is, how it works, and why API consumers should care.


Claude Fable 5 launched June 9, 2026 as Anthropic’s most capable widely released model, occupying a new Mythos-class tier above the entire Opus line. [1] Along with the model itself came a feature Anthropic has not publicly explained in detail: distillation protection, described as a mechanism that “prevents model capability extraction.” [1] That single phrase is the entirety of what Anthropic has disclosed about how it works. What follows is a first-principles analysis of what distillation is, why it threatens frontier model providers, and what a protection system of this kind can realistically accomplish.

What is model distillation?

Knowledge distillation is a training technique in which a smaller or weaker model (the student) learns to imitate the outputs of a larger or more capable model (the teacher). The student is trained not just on labeled data but on the probability distributions the teacher assigns to its outputs, capturing behavior that would be expensive or impossible to replicate by training from scratch on raw data alone.
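As a minimal sketch of the soft-target objective described above, the following computes the classic temperature-scaled distillation loss between a teacher's and a student's scores for a single prediction. The three-way logit vectors and the temperature of 2.0 are made-up illustrative values, and the code assumes the student can see the teacher's full probability distribution, which a text-only API generally does not expose.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same outcomes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Soft-target loss: distance from the student's distribution to the teacher's."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl_divergence(p_teacher, p_student)

# Hypothetical scores for one prediction over three candidate tokens.
teacher = [4.0, 1.5, 0.2]
close_student = [3.8, 1.6, 0.3]   # nearly agrees with the teacher
far_student = [0.2, 1.5, 4.0]     # ranks the candidates in reverse

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(f"close student: {loss_close:.4f}, far student: {loss_far:.4f}")
```

Training drives this loss toward zero, which is why the student ends up mirroring the teacher's behavior rather than only its top answer.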

In a legitimate context, distillation is how providers compress large models into smaller, cheaper variants for deployment on constrained hardware. The same technique, applied without authorization, becomes a mechanism for copying a proprietary model’s capabilities.

The threat model is straightforward. A competitor or independent actor uses the production API to generate a large volume of input-output pairs, then trains their own model to reproduce those outputs. With enough queries, the student model can approach the teacher’s performance on specific task domains without the multi-hundred-million-dollar training run.

Why does it matter for Fable 5 specifically?

Fable 5 sits at $10 per million input tokens and $50 per million output tokens, exactly twice Opus 4.8’s rate. [1] That pricing reflects substantial training investment in a model that currently leads among frontier models on coding evaluations including FrontierCode, CursorBench, and ViBench, and was the first model to exceed 90% on a core analytics benchmark Anthropic tracks internally. [1]

A model that can be copied through API queries represents a specific commercial threat at this tier. The higher the capability gap between a frontier model and its nearest competitor, the more valuable an unauthorized distillation becomes. Fable 5’s architecture also includes adaptive thinking that is always active, which produces richer intermediate reasoning traces than models where thinking is optional or disabled. Those traces are high-quality training signal for a student model attempting to replicate not just the outputs but the reasoning process.

The companion model Claude Mythos 5, which shares the same underlying architecture as Fable 5 with safeguards relaxed in some areas, is restricted to approved Project Glasswing partners and select biology researchers, with no self-serve access. [1][2] Mythos 5 traffic receives 30-day retention and is not used for training. [1] This separate access tier is itself a form of capability containment, though it is distinct from the distillation protection feature.

How does distillation protection work?

Anthropic has not published the mechanism. What follows is what can be inferred from the published description and from the broader research literature.

The most direct approaches operate at the output level. A protection system can inject subtle perturbations into model outputs that are imperceptible to human readers but degrade the quality of models trained on those outputs. This class of technique, sometimes called output poisoning or adversarial watermarking, exploits the fact that a distillation student is optimized to match whatever the teacher emits, whether that is sampled text or, where an API exposes them, token probabilities. If those outputs carry a consistent hidden signal, the student learns to reproduce the distortion rather than the underlying capability.
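To make the idea of a hidden statistical signal concrete, here is a toy version of a "green list" text watermark from the public research literature. It is an illustration of the general technique only, not a description of Anthropic's mechanism, which is unpublished. The 64-word vocabulary, the hash-based partition, and the random "generator" are all hypothetical stand-ins for a real model.

```python
import hashlib
import math
import random

VOCAB = [f"w{i}" for i in range(64)]  # hypothetical toy vocabulary
GAMMA = 0.5  # expected fraction of "green" tokens in unmarked text

def is_green(prev_token, token):
    """Pseudo-randomly assign each (previous, next) token pair to the green list."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GAMMA

def generate(length, watermark, seed):
    """Stand-in for a language model: emits tokens in a random preference order."""
    rng = random.Random(seed)
    tokens = ["<s>"]
    for _ in range(length):
        candidates = rng.sample(VOCAB, len(VOCAB))
        choice = candidates[0]
        if watermark:
            # Prefer the first candidate that falls on the green list.
            choice = next((t for t in candidates if is_green(tokens[-1], t)), choice)
        tokens.append(choice)
    return tokens

def z_score(tokens):
    """Standard deviations by which the green count exceeds what chance predicts."""
    n = len(tokens) - 1
    green = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return (green - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

marked = generate(200, watermark=True, seed=0)
plain = generate(200, watermark=False, seed=0)
z_marked, z_plain = z_score(marked), z_score(plain)
print(f"watermarked z = {z_marked:.1f}, unmarked z = {z_plain:.1f}")
```

A reader sees ordinary-looking tokens in both cases, but the marked sequence shows a green-token rate far above chance. A student trained on large volumes of such text may inherit the same bias, which is what would make the copy degraded, detectable, or both.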

A second class of approaches involves detection rather than prevention. By analyzing query patterns across API sessions, a provider can identify the statistical signatures of systematic data collection: unusually uniform prompts, high query volumes covering the same capability space, structured sampling of the model’s output distribution. Detection enables enforcement but does not directly degrade the quality of any collected data.
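A rough sketch of one such signature, prompt uniformity combined with volume, is shown below. The word-overlap metric, the function names, and both thresholds are assumptions chosen for this toy example. A production detector would presumably use far richer features and longer time windows.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap between two prompts' word sets (1.0 means identical vocabulary)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def mean_pairwise_similarity(prompts):
    """Average Jaccard similarity over every pair of prompts in a session."""
    pairs = list(combinations(prompts, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0

def looks_like_harvesting(prompts, min_volume=4, similarity_threshold=0.6):
    """Flag sessions that are both high-volume and templated.

    Both thresholds are hypothetical values picked for this illustration.
    """
    return (len(prompts) >= min_volume
            and mean_pairwise_similarity(prompts) >= similarity_threshold)

# A templated sweep: the same prompt with one slot varied.
templated = [f"Solve step by step: what is {n} times 17?" for n in range(10, 16)]

# Organic traffic: unrelated requests from a typical application.
organic = [
    "Why does my Rust borrow checker reject this closure?",
    "Draft a polite reply declining the meeting",
    "Summarize the attached quarterly report in three bullets",
    "What is a good name for a bakery in Lisbon?",
]

print(looks_like_harvesting(templated))  # True
print(looks_like_harvesting(organic))    # False
```

Note that a legitimate benchmarking or batch-generation job would trip the same heuristic, which is why detection of this kind produces candidates for review rather than proof of extraction.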

A third approach involves withholding information selectively. A model can be trained or instructed to produce less complete outputs in contexts where it identifies likely extraction attempts, making the collected dataset less representative of the model’s full capability.

These approaches are not mutually exclusive. A production system likely combines several layers. Anthropic’s use of the phrase “distillation protection” rather than “distillation detection” suggests the mechanism operates at the output level rather than purely at the enforcement layer, but that reading is speculative given the current absence of technical documentation. [1]

What does protection actually prevent?

The honest answer is that no distillation protection is absolute. The research literature on adversarial watermarking and output perturbation describes an ongoing tension between protection schemes and circumvention techniques, analogous to the dynamic between content filters and jailbreaks.

What protection achieves is raising the cost of extraction. A well-designed system forces an attacker to query at much higher volume to obtain useful training signal, makes the collected data less informative for training, and potentially exposes the extraction attempt to detection and enforcement. For a commercial competitor, these costs matter. The economics of unauthorized distillation depend on obtaining high-quality data cheaply. A mechanism that forces a ten-fold increase in required queries, or that degrades student model performance substantially, can make the attack economically unattractive even if it cannot prevent it entirely.
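The arithmetic behind that ten-fold figure can be sketched with the Fable 5 prices quoted earlier in this article. The dataset size (one million usable pairs) and the per-query token counts are hypothetical numbers picked only to show the scaling.

```python
INPUT_USD_PER_MTOK = 10.0   # Fable 5 input price quoted in this article
OUTPUT_USD_PER_MTOK = 50.0  # Fable 5 output price quoted in this article

def collection_cost(pairs, in_tokens, out_tokens, query_multiplier=1.0):
    """API spend in USD to collect `pairs` usable input-output examples.

    `query_multiplier` models protection forcing extra queries per usable pair.
    """
    total_queries = pairs * query_multiplier
    input_cost = total_queries * in_tokens / 1e6 * INPUT_USD_PER_MTOK
    output_cost = total_queries * out_tokens / 1e6 * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost

# Hypothetical extraction run: 1M pairs, 500 input and 2,000 output tokens each.
baseline = collection_cost(1_000_000, in_tokens=500, out_tokens=2_000)
protected = collection_cost(1_000_000, in_tokens=500, out_tokens=2_000,
                            query_multiplier=10)

print(f"baseline:  ${baseline:,.0f}")   # baseline:  $105,000
print(f"protected: ${protected:,.0f}")  # protected: $1,050,000
```

Under these assumptions the data bill rises from about $105,000 to about $1.05 million, before counting any loss in student quality or the added risk of detection.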

For API consumers building legitimate applications, distillation protection should be invisible. The stated goal is to impede capability extraction, not to alter normal use-case outputs. Anthropic has not documented any quality trade-offs for production applications.

Why API consumers should understand this feature

The presence of distillation protection has practical implications for anyone building on the Fable 5 API, even if they have no intention of extracting the model’s capabilities.

First, if you are building a training pipeline that uses Fable 5 outputs as input data for fine-tuning your own models, you are operating in territory that the protection mechanism is designed to disrupt. If the mechanism acts on outputs, it probably cannot distinguish between authorized and unauthorized uses of those outputs as training data, because what a customer later does with a response is not observable when the response is generated. A developer fine-tuning a domain-specific model on Fable 5-generated annotations is doing something structurally similar to what an adversarial extractor does, even if the intent and scale differ substantially.

Second, any application that systematically sweeps Fable 5 across a broad capability space, including benchmarking pipelines, capability evaluators, and large-scale content generation jobs, may present query patterns that activate detection components of the protection system.

Third, the precedent matters. Fable 5 is the first Claude model to ship with a named distillation protection feature. [1] If the mechanism proves effective and scales without quality impact on legitimate use, expect it to propagate across the model lineup in future releases. Understanding what it is now, before documentation is published, positions API consumers to adapt their architectures accordingly.

What this reveals about frontier model economics

Distillation protection is a response to a specific economic dynamic. As frontier models become more capable, the value of capturing their behavior through API queries increases relative to the cost of training from scratch. At some capability level, the extraction economics become attractive enough to warrant systematic attempts.

Anthropic’s decision to ship this feature with Fable 5 rather than waiting for a documented extraction incident suggests they have assessed the risk as material at this capability tier. The model’s pricing, its lead on specific coding and analytics benchmarks, and the decision to create a separate restricted tier for the higher-capability Mythos 5 variant all point to a provider that is treating capability containment as a commercial priority alongside safety.

For the AI industry, distillation protection at the API layer represents a new category of model defense. The technical arms race between extraction and protection will likely shape how frontier models are deployed and priced over the next several years.

Frequently Asked Questions

Q: Does distillation protection affect normal API use? A: Anthropic has stated only that it prevents model capability extraction. [1] No quality impact on standard applications has been disclosed. If protection is implemented at the output level, it would presumably be designed to be imperceptible in normal use.

Q: Can I use Fable 5 outputs to fine-tune my own model? A: Anthropic’s Terms of Service govern what you can do with API outputs. The distillation protection feature is a separate technical mechanism and does not define what is permitted. Review the current API usage policy directly before building training pipelines on top of Fable 5 outputs.

Q: How does this relate to the Mythos 5 data retention policy? A: Mythos 5 traffic has a 30-day retention policy and is not used for training. [1] That is a data handling policy, not a distillation protection mechanism. The two operate at different layers: data retention governs what Anthropic does with your inputs; distillation protection governs what happens to the outputs you receive.

Q: Is distillation protection the same as watermarking? A: Watermarking is one possible implementation approach. Anthropic has not confirmed whether Fable 5’s protection uses watermarking, output perturbation, query pattern detection, or some combination. [1] The mechanisms are distinct in how they work and what they protect against.

Sources

  1. Anthropic. "Introducing Claude Fable 5 and Claude Mythos 5." Anthropic News, June 9, 2026. Vendor source; accessed 2026-06-10.
  2. Anthropic. "Claude Models Overview." Anthropic Platform Documentation, June 2026. Vendor source; accessed 2026-06-10.
  3. Anthropic. "Introducing Claude Opus 4.7." Anthropic News. Vendor source; accessed 2026-06-10.