groundy

Distributed Training Breaks the Compute Thresholds Behind AI Regulation

A May 2026 paper shows DiLoCo-style distributed training can split a frontier model run across sub-threshold clusters, making FLOP-based regulatory caps bypassable by design.


FLOP Thresholds Are the Backbone of AI Regulation, And They Have a Blind Spot

The major AI governance frameworks of recent years tie their regulatory triggers to a single number: cumulative training compute, measured in FLOP. The EU AI Act presumes “high impact capabilities” when a model is trained above 10^25 FLOP (Article 51). The now-rescinded US Executive Order 14110 set its reporting threshold at 10^26 FLOP (arXiv:2605.29359). California’s SB 53 enforces a comparable bar with penalties up to $1 million (arXiv:2605.29359). A paper published on arXiv in May 2026 argues that distributed training algorithms have advanced far enough to make those thresholds an accounting fiction.

The argument is mechanical, not speculative: if you split a training run across enough small clusters, no individual cluster crosses the FLOP cap, even though the combined run produces a model that would have triggered the threshold if trained in one place. The paper models this under adversarial constraints and concludes that frontier-scale models are, in principle, trainable on dispersed nodes connected by consumer-grade internet (arXiv:2605.29359).
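The accounting gap can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: the function name and the 3.8 x 10^25 FLOP run size are hypothetical, and it assumes the run divides evenly across clusters with each cluster's share counted separately. That is the scenario the paper describes, not a statement of how any regulator actually attributes compute.

```python
def clusters_needed(total_flop: int, cap_flop: int) -> int:
    """Smallest number of equal shares that keeps every share strictly below cap_flop."""
    if total_flop <= 0 or cap_flop <= 0:
        raise ValueError("FLOP counts must be positive")
    # total / n < cap  <=>  n > total / cap, so take floor(total / cap) + 1.
    return total_flop // cap_flop + 1


EU_CAP = 10**25              # systemic-risk presumption cited in the article
run = 38 * 10**24            # hypothetical 3.8e25 FLOP training run

n = clusters_needed(run, EU_CAP)
per_cluster = run / n        # FLOP attributed to each cluster under per-cluster accounting

print(f"{n} clusters, {per_cluster:.2e} FLOP each")
```

Using integers avoids floating-point edge cases when the run size is an exact multiple of the cap.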

How Communication-Efficient Training Shrinks the Bandwidth Wall

Standard distributed training assumes high-bandwidth, low-latency interconnects. Recent algorithmic advances compress inter-node gradient transfers to the point where frontier-scale training becomes theoretically possible on links with bandwidth orders of magnitude below the datacenter standard (arXiv:2605.29359). DiLoCo-style methods also synchronize less often: each node performs many local gradient updates between synchronizations, trading compute for communication.
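A toy simulation shows why fewer synchronizations need not mean a worse result. This is a minimal sketch of plain local SGD with parameter averaging, not DiLoCo itself (which, as published, pairs an inner optimizer with a separate outer optimizer). The scalar quadratic loss, the `targets` values, and the function name are all illustrative assumptions. On this idealized problem the final answer is the same whatever the sync interval; real training runs generally pay some cost for infrequent synchronization.

```python
def local_sgd(targets, local_steps, total_steps, lr=0.1, w0=0.0):
    """Toy local SGD: worker k minimizes 0.5 * (w - targets[k])**2.

    Returns (final_w, sync_rounds). The shared optimum is the mean of targets.
    """
    w = w0
    sync_rounds = 0
    for _ in range(total_steps // local_steps):
        replicas = []
        for t in targets:
            w_k = w                          # each worker starts from the shared weights
            for _ in range(local_steps):
                w_k -= lr * (w_k - t)        # local gradient step, no communication
            replicas.append(w_k)
        w = sum(replicas) / len(replicas)    # the only communication: average the replicas
        sync_rounds += 1
    return w, sync_rounds


targets = [1.0, 2.0, 3.0, 6.0]               # hypothetical per-worker optima, mean 3.0
w_sync, rounds_sync = local_sgd(targets, local_steps=1, total_steps=200)
w_local, rounds_local = local_sgd(targets, local_steps=50, total_steps=200)

print(rounds_sync, rounds_local)             # 200 sync rounds versus 4
print(w_sync, w_local)                       # both close to 3.0
```

Both runs take the same number of gradient steps per worker; only the number of communication rounds differs.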

A separate empirical result reinforces the point. Work on distributed training under packet loss demonstrated that Llama 2 7B trained across 64 GPUs can tolerate 10% random packet loss with at most a 0.8% change in perplexity (arXiv:2507.07114). That finding addresses a different dimension of unreliable networks but supports the same conclusion: distributed training is robust on infrastructure that would have been considered unusable even a few years ago.

What the Feasibility Model Actually Shows (and Doesn’t)

The paper’s adversarial model is deliberately constrained: each node is limited in compute, bandwidth is capped at consumer-grade levels, and latency is set to reflect typical internet connections. These parameters were chosen because they approximate what a well-resourced actor could assemble without attracting the attention of compute-governance regimes that track large GPU clusters. Scher et al. (2025) proposed the most restrictive regime in the literature: banning pre-training above 10^24 FLOP (arXiv:2605.29359). The paper’s modeling suggests that models above that banned threshold could, in theory, be trained on sub-threshold nodes connected by consumer-grade internet.
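To get a feel for why consumer-grade bandwidth is the binding constraint, consider a back-of-envelope estimate of how long one full exchange of model-sized data would take. This is not the paper's model: the 100 Mbit/s link, 2 bytes per parameter, and the 100x compression ratio are illustrative assumptions, and the sketch ignores latency, network topology, and any overlap of communication with compute. The 405-billion-parameter size matches the model class the article cites for the paper's feasibility claim.

```python
def sync_hours(params: float, bytes_per_param: float, link_mbit_s: float,
               compression: float = 1.0) -> float:
    """Hours to push one model-sized payload through a single link."""
    payload_bytes = params * bytes_per_param / compression
    bytes_per_second = link_mbit_s * 1e6 / 8     # megabits per second -> bytes per second
    return payload_bytes / bytes_per_second / 3600


# Hypothetical inputs: 405e9 parameters, 16-bit values, 100 Mbit/s consumer link.
raw = sync_hours(405e9, 2, 100)
compressed = sync_hours(405e9, 2, 100, compression=100)

print(f"uncompressed: {raw:.1f} h, 100x compressed: {compressed:.2f} h")
```

Under these assumptions an uncompressed exchange takes most of a day, which is why both compression and infrequent synchronization matter to the feasibility argument.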

What the paper does not show is a live demonstration. No one has trained a frontier-scale model across dispersed consumer-internet nodes. The feasibility window is also projected over a multi-year timeframe, meaning the technique may require further algorithmic improvements before it works at that scale.

The paper also acknowledges additional bypass vectors beyond distributed training. Model distillation and mixture-of-agents techniques could produce capable models without any single training run crossing a FLOP threshold. The paper flags these as complicating factors but does not model them quantitatively (arXiv:2605.29359).

The Regulatory Patch: Chips, Memory, and Forensic Accounting

If FLOP-based thresholds are bypassable, the paper recommends shifting the regulatory target. Its proposed countermeasures include chip-level tracking, forensic accounting of hardware purchases, whistleblowing incentives, and registration requirements triggered by both compute throughput and accelerator memory thresholds rather than FLOP counts alone (arXiv:2605.29359). The logic is straightforward: records of physical hardware are harder to spoof than an after-the-fact accounting of a training run’s compute budget.

The EU’s current framework already contains hooks for amendment. Article 51 grants the Commission authority to adjust the 10^25 FLOP threshold via delegated acts to keep pace with algorithmic improvements (Article 51). The systemic-risk threshold is noted as “under review” in current EU guidance (EU AI Act GPAI obligations). EU guidelines also require providers to notify the Commission within two weeks when models meet or are expected to meet systemic-risk thresholds, extending to the planning phase (EU regulatory guidelines).

The problem is that lowering the FLOP threshold to catch distributed training also sweeps in smaller, legitimate actors whose combined compute is modest by any standard. Registration triggered by accelerator memory or cluster size, as the paper suggests, is a more targeted signal, but it requires a hardware-tracking infrastructure that does not currently exist at international scale.

Why Enforcement Gets Harder, Not Easier, After the Fix

The structural problem is not that regulators picked the wrong FLOP number. It is that any static threshold will erode as training algorithms become more communication-efficient. Each improvement widens the gap between what a threshold was designed to catch and what a dispersed training run actually needs.

The enforcement problem compounds. In a world where consumer-grade nodes with a handful of accelerators can collectively train a frontier model, the observable surface area for enforcement expands from a few hundred data centers to potentially thousands of small installations. Chip-level provenance tracking, if it existed, would narrow that surface. But building that tracking system requires the same international coordination that has stalled every prior attempt at compute governance, and the technical capability to evade it improves on a shorter timeline than the regulatory process typically operates on.

Frequently Asked Questions

Does the EU’s 10^23 FLOP GPAI classification threshold also get bypassed by distributed training?

Yes. The EU operates two tiers: 10^23 FLOP triggers GPAI classification, while 10^25 FLOP triggers systemic-risk designation. Distributed training splits the run so no single node crosses either bar. The 10^23 FLOP tier is also low enough that a single node with 16 H100 GPUs could approach it on its own over a months-long training job, so regulators cannot simply lower the threshold without pulling individual small-scale research setups into the reporting regime.
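The 16-GPU claim can be sanity-checked with rough arithmetic. The sketch below assumes a peak of about 10^15 FLOP/s per H100 (approximately the dense 16-bit datasheet figure) and 40% sustained utilization; both numbers are assumptions chosen for illustration, and real throughput varies with precision, model, and software stack.

```python
SECONDS_PER_DAY = 86_400


def days_to_reach(target_flop: float, gpus: int,
                  peak_flop_per_s: float, utilization: float) -> float:
    """Days of continuous training needed to accumulate target_flop."""
    sustained = gpus * peak_flop_per_s * utilization   # effective FLOP/s for the node
    return target_flop / sustained / SECONDS_PER_DAY


# Assumed values: 1e15 FLOP/s peak per GPU, 40% utilization.
days = days_to_reach(1e23, gpus=16, peak_flop_per_s=1e15, utilization=0.4)
print(f"about {days:.0f} days")
```

Under these assumptions the node needs roughly six months of continuous training to reach 10^23 FLOP.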

How do enforcement penalties differ between the EU and California frameworks?

The EU AI Act permits fines up to 3% of a provider’s global annual turnover or EUR 15 million, whichever is higher. California’s SB 53 caps penalties at $1 million. For a large AI lab, EU exposure could be two to three orders of magnitude larger. Neither framework’s penalty structure currently accounts for whether a training run was distributed across sub-threshold nodes.
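The size of that gap depends on turnover, which a short calculation makes visible. The turnover figures below are hypothetical, and the sketch treats EUR and USD as roughly equal for an order-of-magnitude comparison; it is not a statement of how either regime would actually set a penalty.

```python
import math

SB53_CAP = 1e6        # California cap cited in the article, in USD


def eu_max_fine(turnover_eur: float) -> float:
    """Upper bound cited in the article: 3% of global turnover or EUR 15 million, whichever is higher."""
    return max(0.03 * turnover_eur, 15e6)


def orders_of_magnitude(turnover: float) -> float:
    """log10 of the ratio between the EU ceiling and the SB 53 cap (EUR ~ USD assumed)."""
    return math.log10(eu_max_fine(turnover) / SB53_CAP)


large_lab = orders_of_magnitude(10e9)     # hypothetical 10 billion turnover
small_lab = orders_of_magnitude(100e6)    # hypothetical 100 million turnover

print(f"large lab: {large_lab:.1f} orders, small lab: {small_lab:.1f} orders")
```

For the smaller provider the EUR 15 million floor applies, so the gap narrows to roughly one order of magnitude.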

Is there a tool policymakers can use to model distributed training feasibility?

The paper’s authors published an interactive simulator at intelligence.org/research/distributed-training-simulator with open-source code on GitHub. Users specify bandwidth, latency, compute-per-node, and target model size, and the tool reports whether the configuration is feasible under current algorithmic assumptions. It is designed for governance bodies that need to set thresholds grounded in technical constraints rather than static FLOP counts.

How large is the gap between what was modeled and what has been empirically demonstrated?

The paper’s feasibility claim targets a Llama 3.1-405B-class model (roughly 405 billion parameters) trained across dispersed nodes, but this is mathematical modeling only. The largest empirical demonstration of distributed training under adverse network conditions is Llama 2 7B (7 billion parameters) across 64 GPUs with simulated packet loss, from a separate study. That is roughly a 58x gap in parameter count (405 / 7 ≈ 58). The paper projects a multi-year window for closing it through further algorithmic improvements.

sources · 5 cited

  1. Article 51: Classification of GPAI Models with Systemic Risk (primary source, accessed 2026-05-29)
  2. Does Distributed Training Undermine Compute Governance? (primary source, accessed 2026-05-29)
  3. Distributed Training under Packet Loss (primary source, accessed 2026-05-29)
  4. General-purpose AI obligations under the AI Act (primary source, accessed 2026-05-29)
  5. EU clarifies AI model thresholds in new regulatory guidelines (analysis, accessed 2026-05-29)