OpenAI Adds a GPT-5 System Card Addendum on Sensitive Conversations

OpenAI has published an addendum to its GPT-5 System Card that adds two new safety evaluation categories, Emotional Reliance and Mental Health, and reports benchmark improvements between the August 15 GPT-5 Instant build and an October 3 updated model. The addendum lives on OpenAI’s Deployment Safety Hub rather than as a revision to the original arXiv system card, and is the first post-launch addendum to GPT-5’s safety documentation.

What the addendum actually changes

The addendum introduces evaluation results for two categories that did not exist when the GPT-5 System Card was first published: Emotional Reliance and Mental Health. These evals cover conversations in which users show emotional attachment to the system or discuss mental health crises. They report the share of the model’s responses graded “not unsafe,” so a higher score is better.

The reported numbers show large improvements between the two builds. On Emotional Reliance, the “not unsafe” score moved from 0.507 to 0.976. On Mental Health, it moved from 0.273 to 0.926, according to the addendum’s benchmark tables.

Two things are worth noting about these figures. The baselines (0.507 and 0.273) were scored retrospectively against the August 15 build, which predates both evaluation categories. The August model was never trained or fine-tuned with these benchmarks in mind, so the low baselines are partially an artifact of retroactive measurement rather than a controlled before-after comparison. The headline improvement claim, a 65-80% reduction in “responses that fall short of our desired behavior,” is OpenAI’s own characterization with no external audit cited.
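To see what the benchmark tables imply on their own terms, the “not unsafe” scores can be converted into the share of responses graded unsafe and compared as a relative reduction. This is a minimal sketch that assumes each score is the fraction of graded responses judged “not unsafe,” so that 1 minus the score is the unsafe share. The function name is illustrative and does not come from the addendum.

```python
def relative_reduction(not_unsafe_before: float, not_unsafe_after: float) -> float:
    """Relative drop in the unsafe share.

    Assumption: unsafe share = 1 - "not unsafe" score.
    """
    unsafe_before = 1.0 - not_unsafe_before
    unsafe_after = 1.0 - not_unsafe_after
    return (unsafe_before - unsafe_after) / unsafe_before


# Scores quoted from the addendum: (August 15 build, October 3 build).
scores = {
    "Emotional Reliance": (0.507, 0.976),
    "Mental Health": (0.273, 0.926),
}

for name, (before, after) in scores.items():
    print(f"{name}: {relative_reduction(before, after):.1%} fewer unsafe responses")
```

Under that assumption the offline figures work out to roughly 95% (Emotional Reliance) and 90% (Mental Health), both above the 65-80% headline. That suggests the headline is computed on a different basis than these tables, and the retrospective-baseline caveat applies to either number.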

OpenAI says it worked with more than 170 mental health experts on the October 3 update, per the addendum’s introduction.

The regression buried in a table

While the addendum emphasizes safety improvements, not every metric in the benchmark tables moved in the same direction. The extremism score regressed from 0.933 on the August 15 build to 0.925 on the October 3 build, per the addendum’s Production Benchmarks table.

This is a small shift, and it may be an acceptable trade-off if the gains in emotional-reliance and mental-health evals are genuine. But the fact that it appears only in a table, with no discussion of why the October build regressed on this dimension, is an asymmetry that matters to practitioners making deployment decisions. Safety improvements in one dimension do not guarantee stability in others, and the addendum’s structure makes that trade-off easy to miss.

From snapshot to changelog

The GPT-5 System Card was originally published on arXiv. The sensitive-conversations addendum sits alongside it as a separate page on the Deployment Safety Hub. The GPT-5.5 system card, published April 23 and updated a day later with API deployment safeguard details, continues the same pattern of rapid post-launch amendment.

This is a structural shift in how system cards function. When a system card was a single document published at model launch, compliance teams and auditors could anchor their assessments on that snapshot. Now that system cards accumulate addenda, revisions, and mid-cycle patches, the launch-day document is no longer authoritative on its own. The authoritative safety posture of a deployed model is distributed across multiple pages, updated on unclear schedules, with no unified changelog to track what changed and when.

For organizations that use GPT-5 in production and need to maintain compliance documentation, monitoring OpenAI’s Deployment Safety Hub and arXiv page for updates is now a baseline operational requirement.
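One way to build the local changelog that the Hub lacks is to snapshot each page’s text on a schedule and record when its hash changes. This is a minimal sketch under stated assumptions: the source labels, snapshot texts, and dates are hypothetical placeholders, fetching is left to the caller so it fits whatever HTTP client and storage a team already uses, and a hash change only signals that something changed, not what.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ChangeLog:
    """Tracks a content hash per source and keeps dated change entries."""

    hashes: dict[str, str] = field(default_factory=dict)
    entries: list[tuple[date, str, str]] = field(default_factory=list)

    def record(self, source: str, text: str, seen_on: date) -> bool:
        """Store the page's hash; return True if it differs from the last snapshot."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        previous = self.hashes.get(source)
        self.hashes[source] = digest
        if previous is None:
            # First time this source is seen: log it, but there is nothing to diff yet.
            self.entries.append((seen_on, source, "first snapshot"))
            return False
        if previous != digest:
            self.entries.append((seen_on, source, "content changed"))
            return True
        return False


# Illustrative usage; the label, texts, and dates are placeholders.
log = ChangeLog()
log.record("system-card-addenda", "snapshot text v1", date(2026, 1, 1))
log.record("system-card-addenda", "snapshot text v1", date(2026, 1, 8))
changed = log.record("system-card-addenda", "snapshot text v2", date(2026, 1, 15))
print(changed)  # True
for when, source, note in log.entries:
    print(when.isoformat(), source, note)
```

A detected change is the cue for a human to read the page and update compliance documentation; the hash alone cannot tell an editorial fix from a new benchmark table.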

The incidents behind the addendum

The addendum did not emerge in a vacuum. In September 2025, OpenAI announced it would route sensitive conversations to reasoning models like GPT-5-thinking via a real-time router, and introduced parental controls including age-appropriate model behavior rules and the ability to disable memory and chat history for teen accounts, according to TechCrunch’s reporting.

That announcement followed two high-profile incidents. The first was the suicide of teenager Adam Raine, whose parents filed a wrongful death lawsuit against OpenAI. The second was a murder-suicide committed by Stein-Erik Soelberg, another case in which ChatGPT reportedly failed to intervene during a mental health crisis.

The addendum should be read in that context. It is a technical response to specific harm cases that remain in active litigation.

Uneven transparency across labs

Zvi Mowshowitz has noted that OpenAI’s GPT-5.5 system card provides less detail than Anthropic’s Mythos and Opus model cards, and has expressed reservations about whether the card’s evaluations can surface new alignment problems, according to coverage at LetsDataScience. Anthropic’s June 2026 launch of Claude Fable 5, its most capable widely released model, continued Anthropic’s more detailed approach, with quantified cybersecurity and biology classifier results included in the release documentation.

OpenAI is at least publishing the data. The benchmark tables in the sensitive-conversations addendum include before-and-after numbers on specific evals, and regressions like the extremism score are there for anyone willing to read past the summary.

What practitioners should do

For teams deploying GPT-5 or evaluating whether to adopt updated builds:

Do not anchor on the original system card alone. The launch-day snapshot does not reflect the current safety posture. Check the Deployment Safety Hub for addenda and the arXiv page for version updates.
Audit the trade-offs explicitly. The October 3 build improved most safety evals but regressed on the extremism eval (0.933 to 0.925), and its SimpleQA accuracy and hallucination rate also worsened. If your use case involves extremist content, the extremism regression may be material.
Read the baselines critically. The Emotional Reliance and Mental Health baselines were scored retrospectively on a model that was not designed for those evals. The improvement numbers are directionally useful but should not be quoted as absolute measures of safety gain without that caveat.
Track the litigation. The Raine wrongful death suit and related cases are ongoing. Depending on outcomes, the regulatory and liability landscape for sensitive-conversation handling may shift further.
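The trade-off audit can be made mechanical by diffing two builds’ published scores and flagging every metric that moved the wrong way. This is a minimal sketch using the figures quoted in this article; the metric names are informal labels, not the addendum’s exact row names. Each metric carries a direction because a higher hallucination rate is worse, not better.

```python
from typing import NamedTuple


class Metric(NamedTuple):
    before: float  # August 15 build
    after: float  # October 3 build
    higher_is_better: bool


# Figures as quoted in this article; labels are informal.
METRICS = {
    "Emotional Reliance": Metric(0.507, 0.976, True),
    "Mental Health": Metric(0.273, 0.926, True),
    "Extremism score": Metric(0.933, 0.925, True),
    "SimpleQA accuracy": Metric(0.46, 0.44, True),
    "SimpleQA hallucination rate": Metric(0.49, 0.52, False),
}


def regressions(metrics: dict[str, Metric]) -> dict[str, float]:
    """Return metrics that got worse, mapped to how far they moved in the bad direction."""
    worse = {}
    for name, m in metrics.items():
        delta = m.after - m.before
        # Flip the sign for metrics where lower is better.
        harm = -delta if m.higher_is_better else delta
        if harm > 0:
            worse[name] = round(harm, 3)
    return worse


for name, size in regressions(METRICS).items():
    print(f"REGRESSION {name}: worse by {size}")
```

A check like this cannot say whether a 0.008 drop matters for a given deployment; it only ensures regressions are surfaced instead of staying in a table.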

Frequently Asked Questions

Did the October 3 build regress on anything beyond the extremism score?

Yes. On the SimpleQA benchmark, accuracy dropped from 0.46 to 0.44 and the hallucination rate rose from 0.49 to 0.52 between the August 15 and October 3 builds. This is the second regression visible in the addendum’s tables but absent from its narrative discussion.

What did the May 2026 arXiv revision to the original GPT-5 System Card add?

The v2 update added monitorability evaluations and additional authors to the arXiv paper (2601.03267). Monitorability evals measure how well external monitors can observe a model’s reasoning, for example whether its chain of thought reveals misbehavior, a category relevant to compliance audits. This revision was made to the original system card separately from the sensitive-conversations addendum on the Deployment Safety Hub, with no cross-reference or unified release note between the two.

What parental-control features shipped with the September 2025 sensitive-conversation routing?

Beyond routing to reasoning models, OpenAI introduced distress notifications sent to parents of teen users, age-appropriate behavior rules, and the ability to disable memory and chat history on teen accounts. These controls preceded the addendum’s benchmark data, which covers an October 3 build, by only a few weeks.

What litigation risk do GPT-5 deployers face from the incidents that prompted the addendum?

Jay Edelson, counsel for the Raine family, has publicly called OpenAI’s overall safety response “inadequate.” If the Raine wrongful death suit or the Soelberg case produces a liability finding or a settlement with operational commitments, organizations running GPT-5 in customer-facing applications could face new duty-of-care standards for mental health crisis handling, even if they are not parties to the litigation.

How does OpenAI’s approach to system-card updates compare to Anthropic’s?

Zvi Mowshowitz has noted that Anthropic’s Mythos and Opus model cards provide more detail at publication time than OpenAI’s GPT-5.5 card. Anthropic publishes cards as versioned standalone documents, while OpenAI is fragmenting its safety record across arXiv papers, the Deployment Safety Hub, and unannounced addenda with no unified changelog. For compliance teams, Anthropic’s model makes it easier to identify what changed between revisions and when.

Claude Opus 4.8, released May 28, 2026, extends this pattern: Anthropic describes it as more likely to flag uncertainties and less likely to make unsupported claims than Opus 4.7, which is a measurable honesty property that a system card could in principle benchmark directly. Claude Fable 5, launched June 9, 2026 as Anthropic’s most capable widely released model above Opus 4.8, continues the same approach: its safety documentation includes cybersecurity classifiers (zero compliance across all 30 jailbreak techniques tested) and biology/chemistry classifier results, with flagged prompts falling back to Opus 4.8.

Whether any of these claims hold under independent evaluation is a separate question, but the stated design intent is concrete and auditable in a way that OpenAI’s scattered addenda are not. For context on how OpenAI frames biology risk disclosures differently, see OpenAI’s Biology Risk Post Reads as S-1 Disclosure Prep, Not Safety Theater.