Canada's Joint Privacy Ruling: OpenAI Trained ChatGPT on Medical and Ideological Data Without Consent

On May 6, 2026, Canada’s federal Privacy Commissioner and counterparts from Quebec, British Columbia, and Alberta jointly ruled¹ that OpenAI’s training of ChatGPT on publicly scraped data violated five privacy statutes. The regulators found the collection “overbroad and therefore inappropriate,” sweeping in health conditions, political views, and children’s information, including data about people who never used ChatGPT. The finding rejects the argument that public availability equals consent, and it shifts the compliance burden to vendors, who must justify their data practices or fit narrow statutory exemptions.

The Joint Finding: What Four Regulators Agreed On

The joint investigation¹ concluded that OpenAI’s initial training and deployment of ChatGPT ran afoul of five privacy statutes, among them federal PIPEDA, British Columbia’s PIPA, Alberta’s PIPA, and Quebec’s Private Sector Act. The regulators examined data drawn from public websites and licensed datasets, and found the scope “overbroad and therefore inappropriate.” That overbreadth necessarily swept up personal information of varying sensitivity, including health conditions, political views, and children’s information.

OpenAI did not obtain valid consent, and the regulators rejected implied consent. Individuals posting data to social media or professional profiles would not reasonably have expected it to be repurposed for generative AI training, and there was no real choice about participation. The ruling¹ treats the absence of an opt-out as a structural failure, not a technical inconvenience.

Why ‘Publicly Available’ Failed as a Defense in All Four Jurisdictions

OpenAI’s central defense, that the data was publicly accessible and therefore fair game, failed in every jurisdiction. The regulators ruled that information scraped from social media or professional profiles does not fall under PIPEDA’s publicly available exception¹, and that indiscriminate scraping will not meet provincial PIPA requirements. The test under Canadian law is whether individuals would reasonably expect the use, not whether the data sits behind a login wall.

This is a significant doctrinal shift for AI vendors who have treated the open web as a consent-free zone. The ruling signals that bulk scraping for model training faces stricter scrutiny than, say, academic research or search indexing, because the ingestion is comprehensive, permanent, and directed toward commercial generative products.

The Split Verdict: Federal ‘Resolved’ vs. Provincial ‘Unresolved’

The four regulators did not reach a unified conclusion on remedy. The federal Office of the Privacy Commissioner deemed the matter well-founded and conditionally resolved¹, accepting OpenAI’s commitments as sufficient to close the file. British Columbia and Alberta, however, found it well-founded and unresolved², and Quebec left consent and retention issues partially unresolved².

This split creates immediate legal uncertainty. A model builder licensing datasets rather than scraping directly now faces uneven risk: federal acceptance of OpenAI’s mitigations does not preclude provincial enforcement actions. The divergence also complicates compliance architecture for any vendor operating across Canadian jurisdictions.

What OpenAI Retired, Filtered, and Committed To

OpenAI has retired the GPT-3.5 and GPT-4 models that were trained in violation of Canadian privacy laws and implemented an internal filtering tool to detect and mask identifying information about private individuals in training data, according to analysis by gblock³. The company also committed to quarterly compliance reporting with specific deadlines.

Within three months, OpenAI must provide clearer notice to signed-out users that chats may be reviewed and used for training. Within six months, it must improve data export formats, confirm strong protections for retired datasets, and test protective measures for children of public figures, per CP24’s reporting⁴ on the commitment schedule.

What This Means for Downstream Model Builders and Dataset Vendors

The precedent shifts the burden of proof onto AI vendors to demonstrate valid consent or fit a narrow statutory exemption for training data. For downstream model builders, the ruling raises the cost of bulk web scraping wherever federal-provincial joint enforcement applies. Vendors can no longer assume that publicly posted data carries an implicit license for model training.

The split verdict amplifies the risk for dataset licensing businesses. A vendor that purchases cleaned datasets from intermediaries may face liability if the original collection lacked consent, and the unresolved status in BC and Alberta leaves open the possibility of further orders or penalties. Compliance teams will need to trace provenance with more rigor than the current standard, which often treats “public web” as a self-certifying category.
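One way to make that tracing concrete is a per-source provenance record that a data pipeline checks before ingestion. The sketch below is illustrative only: the `SourceRecord` fields, the `LegalBasis` categories, and the rule that a “public web” label never clears a source on its own are assumptions made for this example, not requirements quoted from the ruling.

```python
from dataclasses import dataclass
from enum import Enum


class LegalBasis(Enum):
    """Hypothetical categories a compliance team might assign to a source."""
    EXPRESS_CONSENT = "express_consent"
    STATUTORY_EXEMPTION = "statutory_exemption"
    PUBLIC_WEB = "public_web"  # scraped data with no further justification
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class SourceRecord:
    name: str
    legal_basis: LegalBasis
    evidence: str = ""         # pointer to a consent log, contract, or legal memo
    upstream_vendor: str = ""  # intermediary the dataset was licensed from, if any


def needs_review(record: SourceRecord) -> bool:
    """Flag sources whose basis is 'public web' or unknown, or that lack evidence."""
    if record.legal_basis in (LegalBasis.PUBLIC_WEB, LegalBasis.UNKNOWN):
        return True
    return not record.evidence.strip()


def triage(records):
    """Split records into (cleared, flagged) lists, preserving input order."""
    cleared = [r for r in records if not needs_review(r)]
    flagged = [r for r in records if needs_review(r)]
    return cleared, flagged
```

Under these assumptions, a licensed dataset that claims consent but arrives without supporting evidence is flagged just like a raw scrape, which mirrors the concern that liability can follow the data through intermediaries.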

The Global Precedent: Canada vs. EU and US Trajectories

The joint investigation’s finding¹ that publicly accessible data is not consent-exempt data is likely to influence AI training jurisprudence well beyond Canada. For model builders operating across borders, the ruling means the “public web” defense is no longer reliable in Canada, and the split verdict between federal and provincial regulators introduces uncertainty that licensed dataset vendors will have to price into their compliance risk.

Privacy Commissioner Philippe Dufresne⁵ stated that “appropriate safeguards are the cornerstone of responsible innovation” and that addressing AI privacy impacts is key to ensuring Canadians can benefit from new technologies without giving up their fundamental right to privacy. OpenAI’s commitments carry fixed deadlines, and the August and November 2026 dates will test whether those safeguards arrive on schedule.

Frequently Asked Questions

Does this ruling apply to foreign AI companies that merely serve Canadian users?

PIPEDA applies to any organization collecting personal information in the course of commercial activities with a real and substantial connection to Canada, regardless of domicile. The ruling did not limit its scope to Canada-based entities, so offshore model providers whose products are accessible to Canadian users could face similar joint investigations. That reach is something non-Canadian AI vendors may not yet have factored into their compliance posture.

How does Canada’s ruling differ from Italy’s 2023 action against ChatGPT?

Italy’s Garante temporarily blocked ChatGPT in 2023 under GDPR, then lifted the restriction after OpenAI added age checks and transparency notices, a block-and-lift cycle aimed at deployment practices. Canada’s ruling goes further upstream, directly addressing the legality of the training data collection itself and imposing an ongoing quarterly compliance schedule. The split verdict between federal and provincial regulators, where some files remain open, also has no direct EU analogue.

What happens if OpenAI misses the August or November 2026 deadlines?

Under PIPEDA, the federal Privacy Commissioner cannot levy monetary penalties directly; enforcement requires escalation to the Federal Court. Quebec’s Law 25, by contrast, grants its regulator administrative monetary penalty powers, creating asymmetric consequences: the same missed deadline could trigger financial penalties in Quebec while producing only a court referral federally. Differences in enforcement powers of this kind may be part of the reason BC and Alberta kept their files open as unresolved.

Why did OpenAI retire entire models rather than removing only the flagged personal data?

Machine unlearning techniques cannot yet reliably erase specific individual data points from a trained large language model’s weights, because the knowledge is distributed across billions of parameters in ways that resist surgical removal. Retiring GPT-3.5 and GPT-4 outright and replacing them with newer models sidesteps that limitation, since selective erasure from an already-trained model remains impractical at production scale. The implication is that retroactive compliance may require full model deprecation rather than targeted corrections.