GDPR Rectification Rights Have No Clear Owner in ML Supply Chains

GDPR Articles 16 and 17 give EU data subjects the right to correct inaccurate personal data and, under certain conditions, have it erased. A June 2026 arXiv short paper (2606.05946) argues these rights become structurally unenforceable once personal data has been absorbed into an ML model through a chain of brokers, fine-tuners, and downstream deployers. The paper’s “models in the dark” framing names a specific problem: GDPR’s compliance model assumes a single identifiable controller, and ML supply chains do not have one.

What Articles 16 and 17 Require, and What They Assume

Article 16 grants data subjects the right to obtain rectification of inaccurate personal data “without undue delay.” Article 17 grants the right to erasure on grounds including withdrawal of consent or when the data is no longer necessary for its original purpose. Both provisions assume that the data subject’s information exists in a form that can be located, amended, or deleted by a party that both holds the data and has the authority to act on it.

The GDPR further requires controllers to demonstrate compliance under the accountability principle (Article 5(2)) and to implement data protection “by design and by default” under Article 25. These obligations bind the entity classified as the data controller: the party that decides why and how personal data is processed. Processors, defined as third parties acting on the controller’s behalf, carry fewer direct obligations. Fines in the upper tier, which covers infringements of data subject rights, reach up to €20 million or 4% of total worldwide annual turnover for the preceding financial year, whichever is higher.
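The “whichever is higher” rule is simple arithmetic. The following is a minimal sketch, assuming the worldwide annual turnover (revenue) figure is expressed in euros. The function name and the sample company figures are illustrative and are not drawn from any enforcement decision.

```python
def max_fine_eur(annual_turnover_eur: float) -> float:
    """Upper bound of the fine: EUR 20 million or 4% of turnover, whichever is higher."""
    flat_cap_eur = 20_000_000
    turnover_share = 0.04
    return float(max(flat_cap_eur, turnover_share * annual_turnover_eur))


# Hypothetical companies, for illustration only.
small_company_cap = max_fine_eur(100_000_000)    # 4% is EUR 4 million, so the flat cap applies
large_company_cap = max_fine_eur(2_000_000_000)  # 4% is EUR 80 million, which exceeds the flat cap
print(small_company_cap, large_company_cap)
```

The flat cap dominates for any party with turnover below €500 million, which is one reason smaller fine-tuners and deployers face proportionally heavier exposure than large model developers.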

The regulatory architecture is clear when a single company collects data, stores it in a database, and uses it for a stated purpose. It is less clear when the “database” is a set of trained model weights distributed across multiple organizations, each performing a different transformation on the data.

The ML Supply Chain: Brokers, Fine-Tuners, and Deployers

A modern ML pipeline typically involves multiple parties handling the same underlying data at different stages. A data broker curates a training corpus. A model developer trains a foundation model on that corpus. A fine-tuner adapts the model for a specific domain. A downstream deployer integrates the fine-tuned model into a product. At each stage, the original data subjects’ contributions become progressively harder to isolate.

The arXiv paper argues that this chain creates a regulatory dead zone. When personal data is baked into model weights through training, no single party in the chain can fully trace or reverse a specific individual’s contribution. The weights are a lossy, distributed representation of the entire training set. Asking a deployer to locate and remove one person’s data from a model they did not train, using data they did not collect, is asking them to perform an operation that the model’s architecture does not support.

Why Training Data Becomes Untraceable After Model Convergence

Once a neural network has converged on its training data, individual training examples do not map to identifiable regions of the parameter space. A single person’s records may have influenced thousands of weights, each shared with contributions from millions of other examples. The model does not maintain an index of which data points shaped which parameters, and the training process is not designed to preserve that mapping.
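The diffusion of one record across many parameters can be seen even in a toy model. The sketch below is an illustration under stated assumptions, not the paper's method: it trains a five-weight logistic regression on synthetic data twice, once with and once without a single record, and compares the resulting weights. The dataset, the `train` helper, and the hyperparameters are all hypothetical.

```python
import math
import random


def train(rows, epochs=200, lr=0.1):
    """Fit a tiny logistic regression with full-batch gradient descent."""
    n_features = len(rows[0][0])
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for x, y in rows:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(n_features):
                grad[j] += (p - y) * x[j]
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic "people": five features each, label depends on the first two.
rows = []
for _ in range(50):
    x = [rng.uniform(-1, 1) for _ in range(5)]
    y = 1 if x[0] + x[1] > 0 else 0
    rows.append((x, y))

w_full = train(rows)
w_without = train(rows[1:])  # same pipeline, one person's record left out

# Per-weight shift caused by that one record.
deltas = [abs(a - b) for a, b in zip(w_full, w_without)]
print(deltas)
```

Every weight moves by a small amount, and no weight stores the record. Note that the comparison was only possible here because the full training set was available and retraining took milliseconds. Neither condition holds for a party that receives finished weights from upstream.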

This is not a bug in the training pipeline. It is the intended behavior of gradient-based optimization. The whole point of training on large datasets is to produce representations that generalize beyond any single example. The legal framework, written for databases where records can be located and deleted row by row, assumes a data model that ML systems deliberately do not preserve.

The EUR-Lex summary of the GDPR describes the one-stop-shop mechanism, under which businesses with multiple EU establishments deal with a single lead supervisory authority. This mechanism presumes a single identifiable controller. The paper’s argument is that this presumption breaks in multi-party ML supply chains where brokers, fine-tuners, and deployers each claim processor status and no single party controls the full data lifecycle.

Controller or Processor? Nobody Wants to Own the Erasure Obligation

The controller-versus-processor classification determines who bears primary accountability for GDPR compliance. In conventional data processing, the distinction is workable: the hospital is the controller, the cloud provider is the processor. In an ML supply chain, the boundaries blur.

A data broker that curates training data might argue it is a processor acting on the model developer’s instructions. The model developer might argue it is a processor transforming data at the fine-tuner’s request. The fine-tuner might argue it merely adapts an existing model and never processes raw personal data. The deployer might argue it only runs inference and has no access to training data at all. Each party has a plausible claim to processor status, and none has a strong incentive to accept the controller label, given the €20 million or 4% revenue penalty exposure.

The paper argues that this classification game shifts the compliance burden onto whichever entity is easiest to identify and regulate, which is typically the one closest to the end user, and the one least technically equipped to trace or alter the model’s training lineage. As of June 2026, EU supervisory authorities have not reached a consensus position on how to assign controller status across AI training pipelines.
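The classification game described above can be modeled as a routing problem. The sketch below is a hypothetical illustration: the `Party` fields, the capability flags assigned to each role, and the "closest to the end user" fallback are assumptions made for the example, not findings from the paper or rules from the GDPR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Party:
    name: str
    claimed_role: str          # what the party says it is: "controller" or "processor"
    holds_training_data: bool  # assumed capability, for illustration
    can_modify_weights: bool   # assumed capability, for illustration


# Hypothetical chain, ordered from data source to end user.
chain = [
    Party("data broker", "processor", holds_training_data=True, can_modify_weights=False),
    Party("model developer", "processor", holds_training_data=True, can_modify_weights=True),
    Party("fine-tuner", "processor", holds_training_data=False, can_modify_weights=True),
    Party("deployer", "processor", holds_training_data=False, can_modify_weights=False),
]


def find_controller(parties: list[Party]) -> Optional[Party]:
    """Return the first party that accepts controller status, or None."""
    for party in parties:
        if party.claimed_role == "controller":
            return party
    return None


def fallback_target(parties: list[Party]) -> Party:
    """Assumed heuristic: the request lands on the party closest to the end user."""
    return parties[-1]


controller = find_controller(chain)
target = fallback_target(chain)
print(controller, target.name, target.holds_training_data, target.can_modify_weights)
```

With every party claiming processor status, the lookup returns nothing, and the fallback selects the one party in this hypothetical chain that neither holds the training data nor can change the weights.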

Machine Unlearning: Promising Research, Unresolved Production Gap

The academic response to the erasure problem is “machine unlearning”: techniques designed to remove a specific data point’s influence from a trained model without retraining from scratch. The research area is active. Whether any published technique works reliably at production scale, on large foundation models, with provable guarantees that the targeted data’s influence has been fully removed, is a different question.

Machine unlearning is often oversold by vendors. A technique demonstrated on a small classifier does not necessarily transfer to a 70-billion-parameter model trained on web-scale data. None of the sources reviewed for this article documents a production-grade unlearning system that satisfies GDPR’s erasure standard, and none should be assumed to exist without independent verification.
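To make the idea concrete, here is a toy sketch of one exact-unlearning strategy, sharded retraining: the training set is split into shards, one sub-model is trained per shard, and forgetting a record retrains only the shard that contained it. This is an illustration chosen for clarity, not the paper's proposal and not a production technique. The `ShardedModel` class is hypothetical, and the "model" is deliberately trivial (a mean per shard).

```python
from statistics import mean


def train_shard(values):
    """Stand-in for training: the 'model' is just the mean of the shard's values."""
    return mean(values)


class ShardedModel:
    def __init__(self, records, n_shards=4):
        # records: dict of record_id -> numeric value (hypothetical training data)
        self.shards = [dict() for _ in range(n_shards)]
        self.index = {}  # record_id -> shard number; forgetting depends on this
        for i, (record_id, value) in enumerate(sorted(records.items())):
            shard_no = i % n_shards
            self.shards[shard_no][record_id] = value
            self.index[record_id] = shard_no
        self.submodels = [train_shard(list(s.values())) for s in self.shards]
        self.shards_retrained = 0

    def predict(self):
        """Ensemble output: the average of the per-shard sub-models."""
        return mean(self.submodels)

    def forget(self, record_id):
        """Drop one record and retrain only its shard (assumes the shard stays non-empty)."""
        shard_no = self.index.pop(record_id)
        del self.shards[shard_no][record_id]
        self.submodels[shard_no] = train_shard(list(self.shards[shard_no].values()))
        self.shards_retrained += 1


records = {f"person-{i}": float(i) for i in range(12)}
model = ShardedModel(records)
before = model.predict()
model.forget("person-7")
after = model.predict()
print(before, after, model.shards_retrained)
```

The approach works only because the trainer kept the raw records and an index from each record to its shard. A party that receives finished weights has neither, which is the gap between the research setting and the supply chain the article describes.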

The arXiv paper frames this gap as structural rather than temporary. The argument is not that better unlearning techniques are impossible, but that the regulatory assumption of a traceable, reversible data pipeline is incompatible with how modern ML systems are actually built and distributed.

What the 2027 Cross-Border Rules Will Not Fix

Regulation (EU) 2025/2518, adopted November 26, 2025, establishes new procedural rules for cross-border GDPR enforcement. It clarifies procedural rights for complainants and parties under investigation, and it applies from April 2, 2027. The regulation addresses how enforcement works across borders. It does not address who the controller is when the “data” in question is distributed across a trained model’s weights.

This is the structural gap the paper identifies. Even with clearer enforcement procedures, a data subject who requests erasure under Article 17 still needs a controller who can locate their data and delete it. In an ML supply chain, the entity that receives the request may not be the entity that trained the model, may not have access to the training data, and may not be able to determine whether the data subject’s information is present in the model’s weights at all.

The GDPR was written for an era of structured databases and identifiable data controllers. The regulation’s own accountability principle requires controllers to demonstrate compliance. Demonstrating that a specific individual’s data has been removed from a model’s weights requires the ability to verify that removal, which in turn requires the ability to measure whether the individual’s data influenced the model in the first place. Current ML systems do not provide this, and the regulatory framework does not account for its absence.

Frequently Asked Questions

Does the supply-chain gap apply to open-source model releases?

When a foundation model is released as open weights, the original trainer has no contractual relationship with downstream fine-tuners. GDPR Article 28 requires a written contract between controller and processor that specifies processing instructions. Open-weight distribution has no such contract, so the legal chain of accountability breaks at the point of release. Downstream users operate outside any processor agreement, and the original trainer has no mechanism to compel erasure compliance from them.

How does Article 17 differ from the original “right to be forgotten” proposal?

The European Parliament’s March 2014 draft proposed a broader right to be forgotten with wider deletion grounds. Article 17 as adopted limits erasure to enumerated grounds, including consent withdrawal, data no longer necessary for its purpose, a successful objection to processing where the controller has no overriding legitimate grounds, and unlawful processing. Even this narrower version cannot be satisfied once data is baked into model weights, suggesting the original broader proposal would have been even more disconnected from how ML pipelines actually work.

What happens when a data subject files an erasure complaint and no controller is identified?

Regulation 2025/2518, effective April 2027, grants complainants the right to be heard and to access investigation files in cross-border cases. But procedural rights address process, not technical capability. An investigation can proceed against the most identifiable entity, but if no party can demonstrate that a specific person’s data was removed from distributed model weights, the complaint may close without an enforceable remedy.

Would retraining a model from scratch satisfy an Article 17 request?

Retraining on a corpus that excludes the data subject’s records produces a new model version without that data. However, copies of the old model may already be distributed to downstream fine-tuners and deployers who have no obligation to update. Article 19 requires the controller to inform recipients of any erasure, but in an open-weight ecosystem the controller often does not know who those recipients are. Retraining addresses one link in the chain while leaving the rest untouched.
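The notification problem can be sketched as a coverage check. The example below is a hypothetical illustration of the Article 19 duty as described above: the `ModelRelease` structure, the recipient names, and the download count are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRelease:
    version: str
    known_recipients: set[str] = field(default_factory=set)  # parties the trainer can identify
    anonymous_downloads: int = 0                              # open-weight copies, identity unknown


def notify_erasure(release: ModelRelease) -> dict:
    """Only identifiable recipients can be told that a record was erased upstream."""
    return {
        "notified": sorted(release.known_recipients),
        "unreachable_copies": release.anonymous_downloads,
    }


# Hypothetical figures for a model version that predates the erasure request.
old_release = ModelRelease(
    "v1",
    known_recipients={"fine-tuner-a", "deployer-b"},
    anonymous_downloads=1200,
)
report = notify_erasure(old_release)
print(report)
```

Retraining produces a clean new version, but the report shows what it cannot reach: every anonymously downloaded copy of the old weights stays in circulation with the data subject's influence intact.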