OpenAI added two user-visible safety states to ChatGPT in early 2026: Lockdown Mode, which kills live network access at the infrastructure layer, and Elevated Risk labels, which surface when network-dependent capabilities are active. Together they acknowledge something the prompt-injection research community has been saying for years: model-level content filtering cannot reliably stop an attacker who can inject instructions into the context window. The fix is to remove the exfiltration path entirely.
What Lockdown Mode disables
Lockdown Mode restricts ChatGPT’s web browsing to cached content only, according to The Cyber Express. No live network requests leave OpenAI’s controlled network while the mode is active. The effect is deterministic: if a prompt injection persuades the model to visit a URL, the request simply cannot fire.
The tradeoff is real. In Lockdown Mode, ChatGPT responses cannot include images. Deep Research, Agent Mode, Canvas code network approval, and file-download-for-analysis are all disabled. Manually uploaded files remain usable, but any capability that requires outbound network access is gone. This is not a heuristic that catches most exfiltration attempts; it is a hard kill switch on the attack surface.
The feature is available for ChatGPT Enterprise, Edu, Healthcare, and Teachers plans, with workspace admins enabling it through role-based controls and granular per-app overrides, per Marvin-42 Insights. Consumer availability is planned but not yet shipped as of late May 2026.
The DNS side-channel that underscored the threat
One week after Lockdown Mode shipped on February 13, 2026, Check Point researchers disclosed a DNS side-channel vulnerability in ChatGPT’s Code Execution sandbox. The attack vector: encoded subdomain lookups that silently exfiltrated data out of the sandbox via DNS queries, according to Xaltius Academy’s writeup of the finding. OpenAI patched it on February 20.
The sequence matters regardless. Lockdown Mode was built to close exactly this class of vulnerability: outbound network channels that a prompt-injected model can abuse without the user’s knowledge. Whether the Check Point disclosure directly motivated the feature or merely confirmed the threat model, the architecture addresses the same problem from the same direction.
Elevated Risk labels across ChatGPT, Atlas, and Codex
Elevated Risk labels are the companion feature: standardized indicators that appear in settings whenever network-related capabilities are enabled across ChatGPT, ChatGPT Atlas, and Codex, according to The Cyber Express. OpenAI has stated it will remove the labels as security mitigations improve.
These are informational signals, not enforcement mechanisms. They tell the user that the session has a broader attack surface than a locked-down one, but they do not restrict behavior on their own. The design borrows from the banking-app and crypto-wallet pattern: when you enable wire transfers or connect a new device, the app flags the elevated risk and asks for explicit confirmation, per Marvin-42 Insights. Apple’s Lockdown Mode on iOS follows the same philosophy: deterministic capability restrictions rather than probabilistic content filtering.
The label placement across three products (ChatGPT proper, the Atlas research tool, and the Codex coding agent) suggests OpenAI plans to make mode-aware UX a platform-wide convention, not a single-product toggle.
Why deterministic controls, and why now
OpenAI explicitly frames Lockdown Mode as an infrastructure-level control that “sidesteps” the model’s inability to reliably distinguish legitimate system prompts from malicious injected instructions, according to The Cyber Express. That framing is architecturally honest. Prompt injection works because LLMs process all tokens in the context window the same way; there is no reliable boundary between “instruction from the developer” and “instruction injected via a fetched webpage.” Every paper on the topic reaches the same conclusion, and no amount of fine-tuning has produced a model that can distinguish the two categories with the reliability required for security enforcement.
Lockdown Mode’s approach is to stop trying to solve that problem at the model layer and instead remove the exfiltration channel entirely. If the model cannot make outbound requests, it does not matter whether it was persuaded to try.
The two layers complement each other: Sensitive Chats tries to make the model respond better to dangerous content; Lockdown Mode tries to make it impossible for the model to leak data even when it responds wrong. One is probabilistic improvement; the other is a deterministic circuit breaker.
What competitors are doing
The brief does not include specific, verified information about equivalent safety-mode features from Google (Gemini), Anthropic (Claude), or other consumer AI vendors as of May 2026. What can be observed from the architecture: most consumer AI assistants currently treat every session identically from a capability standpoint. If web browsing is enabled, it is enabled in all conversations with no session-level kill switch. If code execution is available, the sandbox boundary is uniform.
OpenAI’s move creates a decision point for competitors. Matching the mode-aware UX requires shipping per-session capability toggles, admin-level controls for enterprise plans, and user-facing risk indicators. Not matching it means explaining why every session carries the same attack surface regardless of context. Neither position is comfortable, and the absence of a standard here means each vendor will design their own granularity and labeling scheme until the industry converges (or regulators impose one).
What is still missing
Consumer ChatGPT does not have Lockdown Mode. Enterprise, Edu, Healthcare, and Teachers plans have it; everyone else is waiting on a rollout date that OpenAI has described only as “coming months,” per Marvin-42 Insights. The users most likely to encounter prompt injection through casual browsing of untrusted content are the ones without the toggle.
The Elevated Risk labels are also a first pass. OpenAI said it will remove them as mitigations improve, which means the labels encode a known gap in the current security posture. When the gap closes, the label disappears. Until then, the user is being asked to notice and care about a setting-indicator that most people will ignore after the first time they see it, which is the same problem banking apps have with their own risk banners.
The remaining attack surface is also worth stating plainly. Lockdown Mode closes the network exfiltration channel. It does not address prompt injection that exfiltrates data through the model’s text output in the same conversation, through side effects in code execution that stay within the sandbox, or through any channel that does not require an outbound network request. The DNS side-channel was one path; others exist, and Lockdown Mode is a targeted defense against a specific class of them.
OpenAI has built a credible infrastructure-level control and paired it with transparent risk labeling. The architecture concedes, by its very design, that model-level filtering is insufficient for high-stakes sessions. That is the correct engineering call. The question is whether consumers will get the same option before the next exfiltration path is demonstrated, and whether the industry will treat mode-aware safety as a baseline or a differentiator.
Frequently Asked Questions
How does the Sensitive Chats safety model differ from standard content filtering?
The May 2026 Sensitive Chats update uses a specialized safety model that generates cross-conversation summaries—carrying safety context between sessions rather than evaluating each conversation in isolation. This structural change, not just better fine-tuning, is what produced the 50%+ improvement in safe-response rates on GPT-5.5 Instant for self-harm and harm-to-others categories.
Can admins restrict individual actions instead of toggling Lockdown Mode per app?
Yes. The admin controls support per-action granularity in addition to per-app toggles, meaning administrators can disable specific network-dependent capabilities (e.g., file-download-for-analysis) while leaving others active within the same app. This is finer-grained than the all-or-nothing kill switch the feature name implies.
What happens to user awareness when OpenAI removes Elevated Risk labels?
Retiring the labels eliminates the only in-product signal that a session carries heightened exfiltration risk. If a new attack surface emerges after removal, users will have no visible indicator that their session is vulnerable—the same false-confidence pattern banking apps create when they stop showing risk banners after a user dismisses them once.
Does Lockdown Mode extend to ChatGPT Atlas and Codex, or just the core chat product?
Elevated Risk labels are confirmed across ChatGPT, Atlas, and Codex, but public sources do not specify whether Lockdown Mode’s network kill switch applies to Atlas and Codex. Atlas relies on live web access for research synthesis and Codex needs network connectivity for code execution, so Lockdown Mode would be significantly harder to adopt in those products without per-action overrides that selectively restore specific capabilities.