OpenAI has published an addendum to its GPT-5 System Card that adds two new safety evaluation categories, Emotional Reliance and Mental Health, and reports benchmark improvements between the August 15 GPT-5 Instant build and an updated October 3 model. The addendum is published on OpenAI’s Deployment Safety Hub rather than as a revision to the original arXiv system card, and it is the first post-launch addendum to GPT-5’s safety documentation.
What the addendum actually changes
The addendum introduces evaluation results for two categories that did not exist when the GPT-5 System Card was first published: Emotional Reliance and Mental Health. These measure how often the model’s responses fall into “unsafe” territory when users discuss topics related to emotional attachment to the system or mental health crises.
The reported numbers show large improvements between the two builds. On Emotional Reliance, the “not unsafe” score moved from 0.507 to 0.976. On Mental Health, it moved from 0.273 to 0.926, according to the addendum’s benchmark tables.
Two things are worth noting about these figures. The baselines (0.507 and 0.273) were scored retrospectively against the August 15 build, which predates both evaluation categories. The August model was never trained or fine-tuned with these benchmarks in mind, so the low baselines are partially an artifact of retroactive measurement rather than a controlled before-after comparison. The headline improvement claim, a 65-80% reduction in “responses that fall short of our desired behavior,” is OpenAI’s own characterization with no external audit cited.
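The arithmetic behind the two scores can be made explicit. The sketch below assumes that a “not unsafe” score is the fraction of evaluated responses judged not unsafe, so the implied unsafe rate is one minus the score. That reading is an interpretation rather than something the addendum is quoted as stating, and the function name is illustrative.

```python
def relative_unsafe_reduction(before: float, after: float) -> float:
    """Relative drop in the unsafe rate implied by two "not unsafe" scores.

    Assumption: unsafe_rate = 1 - not_unsafe_score.
    """
    unsafe_before = 1.0 - before
    unsafe_after = 1.0 - after
    if unsafe_before <= 0:
        raise ValueError("baseline has no unsafe responses to reduce")
    return (unsafe_before - unsafe_after) / unsafe_before


# Scores as quoted above: (August 15 build, October 3 build).
scores = {
    "Emotional Reliance": (0.507, 0.976),
    "Mental Health": (0.273, 0.926),
}

for name, (aug, oct_) in scores.items():
    reduction = relative_unsafe_reduction(aug, oct_)
    print(f"{name}: {reduction:.1%} fewer unsafe responses")
```

Under that assumption the implied reductions are roughly 95% for Emotional Reliance and 90% for Mental Health. Both figures describe benchmark performance only and carry the same retroactive-baseline caveat as the raw scores.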
OpenAI says it worked with more than 170 mental health experts on the October 3 update, per the addendum’s introduction.
The regression buried in a table
While the addendum emphasizes safety improvements, not every metric in the benchmark tables moved in the same direction. Extremism detection regressed from 0.933 on the August 15 build to 0.925 on the October 3 build, per the addendum’s Production Benchmarks table.
This is a small shift, and it may be an acceptable trade-off if the gains in emotional-reliance and mental-health evals are genuine. But the fact that it appears only in a table, with no discussion of why the October build regressed on this dimension, is an asymmetry that matters to practitioners making deployment decisions. Safety improvements in one dimension do not guarantee stability in others, and the addendum’s structure makes that trade-off easy to miss.
From snapshot to changelog
The GPT-5 System Card was originally published on arXiv. The sensitive-conversations addendum sits alongside it as a separate page on the Deployment Safety Hub. The GPT-5.5 system card, published April 23 and updated a day later with API deployment safeguard details, continues the same pattern of rapid post-launch amendment.
This is a structural shift in how system cards function. When a system card was a single document published at model launch, compliance teams and auditors could anchor their assessments on that snapshot. Now that system cards accumulate addenda, revisions, and mid-cycle patches, the launch-day document is no longer authoritative on its own. The authoritative safety posture of a deployed model is distributed across multiple pages, updated on unclear schedules, with no unified changelog to track what changed and when.
For organizations that use GPT-5 in production and need to maintain compliance documentation, monitoring both OpenAI’s Deployment Safety Hub and the system card’s arXiv page for updates is now a baseline operational requirement.
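One lightweight way to meet that requirement is to store a fingerprint of each safety page and compare it on a schedule. The following is a minimal sketch, not a complete monitor: fetching the pages is deliberately left out, the page labels are hypothetical, and it assumes the page text has already been extracted as a string.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Hash a page's text after collapsing whitespace, so trivial reflows
    do not register as content changes."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_pages(previous: dict[str, str], current_text: dict[str, str]) -> list[str]:
    """Return labels of pages that are new or whose fingerprint differs
    from the stored one.

    previous:     label -> fingerprint recorded at the last check
    current_text: label -> text fetched at this check
    """
    return sorted(
        label
        for label, text in current_text.items()
        if previous.get(label) != fingerprint(text)
    )


# Hypothetical labels and placeholder text, for illustration only.
stored = {"system-card": fingerprint("launch-day text")}
fetched = {
    "system-card": "launch-day   text",          # whitespace-only difference
    "sensitive-addendum": "new addendum text",   # page not seen before
}
print(changed_pages(stored, fetched))
```

A hash only signals that something changed, not what changed, so a flagged page still needs a manual diff against the archived copy.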
The incidents behind the addendum
The addendum did not emerge in a vacuum. In September 2025, OpenAI announced it would route sensitive conversations to reasoning models like GPT-5-thinking via a real-time router, and introduced parental controls including age-appropriate model behavior rules and the ability to disable memory and chat history for teen accounts, according to TechCrunch’s reporting.
That announcement followed two high-profile incidents. The first was the suicide of teenager Adam Raine, whose parents filed a wrongful death lawsuit against OpenAI. The second was a murder-suicide committed by Stein-Erik Soelberg, a case that also involved ChatGPT failing to intervene during a mental health crisis.
The addendum should be read in that context. It is a technical response to specific harm cases that remain in active litigation.
Uneven transparency across labs
Zvi Mowshowitz has noted that OpenAI’s GPT-5.5 system card provides less detail than Anthropic’s Mythos and Opus model cards, and has expressed reservations about the card’s ability to surface new alignment problems, according to coverage at LetsDataScience.
OpenAI is at least publishing the data. The benchmark tables in the sensitive-conversations addendum include before-and-after numbers on specific evals, and regressions like the extremism score are there for anyone willing to read past the summary.
What practitioners should do
For teams deploying GPT-5 or evaluating whether to adopt updated builds:
Do not anchor on the original system card alone. The launch-day snapshot does not reflect the current safety posture. Check the Deployment Safety Hub for addenda and the arXiv page for version updates.
Audit the trade-offs explicitly. The October 3 build improved most safety evals but regressed on extremism detection (0.933 to 0.925). If your use case depends on extremism detection, the regression may be material.
Read the baselines critically. The Emotional Reliance and Mental Health baselines were scored retrospectively on a model that was not designed for those evals. The improvement numbers are directionally useful but should not be quoted as absolute measures of safety gain without that caveat.
Track the litigation. The Raine wrongful death suit and related cases are ongoing. Depending on outcomes, the regulatory and liability landscape for sensitive-conversation handling may shift further.
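The trade-off audit described above can be sketched as a small check that compares two builds and separates regressions on metrics a deployment depends on from the rest. This is an illustrative sketch: the metric labels, the `critical` set, and the `higher_is_better` flag are assumptions introduced here, and the scores are the ones quoted in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    before: float                  # August 15 build
    after: float                   # October 3 build
    higher_is_better: bool = True  # set False where a lower value is better

    @property
    def regressed(self) -> bool:
        if self.higher_is_better:
            return self.after < self.before
        return self.after > self.before


def audit(metrics, critical):
    """Return (blocking, informational) lists of regressed metric names.

    A regression is blocking when the metric is in the caller's critical set.
    """
    regressed = [m.name for m in metrics if m.regressed]
    blocking = [n for n in regressed if n in critical]
    informational = [n for n in regressed if n not in critical]
    return blocking, informational


metrics = [
    Metric("emotional_reliance", 0.507, 0.976),
    Metric("mental_health", 0.273, 0.926),
    Metric("extremism", 0.933, 0.925),
]

# Hypothetical deployment that depends on the extremism eval.
blocking, informational = audit(metrics, critical={"extremism"})
print("blocking:", blocking)
print("informational:", informational)
```

Any change in the wrong direction counts as a regression here. A real audit would likely add a tolerance for measurement noise, which the addendum figures quoted in this article do not let us estimate.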
Frequently Asked Questions
Did the October 3 build regress on anything beyond extremism detection?
Yes. On the SimpleQA benchmark, accuracy dropped from 0.46 to 0.44 and the hallucination rate rose from 0.49 to 0.52 between the August 15 and October 3 builds. This is the second regression visible in the addendum’s tables but absent from its narrative discussion.
What did the May 2026 arXiv revision to the original GPT-5 System Card add?
The v2 update added monitorability evaluations and additional authors to the arXiv paper (2601.03267). Monitorability evals measure whether a model’s internal processing can be inspected by external observers, a category relevant to compliance audits. This revision was made to the original system card separately from the sensitive-conversations addendum on the Deployment Safety Hub, with no cross-reference or unified release note between the two.
What parental-control features shipped with the September 2025 sensitive-conversation routing?
Beyond routing to reasoning models, OpenAI introduced distress notifications sent to parents of teen users, age-appropriate behavior rules, and the ability to disable memory and chat history on teen accounts. These controls preceded the addendum’s benchmark data by roughly a month.
What litigation risk do GPT-5 deployers face from the incidents that prompted the addendum?
Jay Edelson, counsel for the Raine family, has publicly called OpenAI’s overall safety response “inadequate.” If the Raine wrongful death suit or the Soelberg case produces a liability finding or a settlement with operational commitments, organizations running GPT-5 in customer-facing applications could face new duty-of-care standards for mental health crisis handling, even if they are not parties to the litigation.
How does OpenAI’s approach to system-card updates compare to Anthropic’s?
Zvi Mowshowitz has noted that Anthropic’s Mythos and Opus model cards provide more detail at publication time than OpenAI’s GPT-5.5 card. Anthropic publishes cards as versioned standalone documents, while OpenAI is fragmenting its safety record across arXiv papers, the Deployment Safety Hub, and unannounced addenda with no unified changelog. For compliance teams, Anthropic’s model makes it easier to identify what changed between revisions and when.