groundy
security

OpenAI Adds Lockdown Mode to ChatGPT, Shifting Prompt-Injection Risk to Users

OpenAI's Lockdown Mode disables agentic features builders rely on rather than fixing prompt injection at runtime, forcing a binary choice between security and capability.

7 min · · · 4 sources ↓

OpenAI’s new Lockdown Mode for ChatGPT disables web browsing, Deep Research, Agent Mode, Canvas network access, and file downloads behind a deterministic system-level wall. Shipped February 2026 for enterprise and education plans, the feature reads as a concession: prompt injection cannot be reliably caught at runtime, so OpenAI is offering to turn off the capabilities that create the exposure. That decision shifts risk onto every developer composing agentic workflows on top of ChatGPT.

What Lockdown Mode Disables

Introduced February 13, 2026, Lockdown Mode restricts ChatGPT’s web browsing to cached content and disables Deep Research, Agent Mode, Canvas network access, and file downloads used for data analysis. The mode is not a blanket kill switch for ChatGPT itself; the base conversational model keeps running. What stops is everything that reaches outside the model’s context to fetch, execute, or export.

Workspace administrators enable Lockdown Mode by creating a custom role in Workspace Settings and selecting which apps and actions within those apps remain available. OpenAI has integrated the mode with its Compliance API Logs Platform to give administrators visibility into which apps users access and what connected sources are queried.

Availability is limited. As of June 2026, Lockdown Mode is open only to ChatGPT Enterprise, Edu, Healthcare, and Teachers plans, per NERDS.xyz. Consumer and Team rollout is described as planned for “the coming months” with no published timeline.

Deterministic Constraints, Not Runtime Filters

Lockdown Mode’s restrictions are deterministic system-level constraints rather than AI-based filters subject to bypass. NextPJ describes this explicitly as “not a model-level fix that might be bypassed; a hard system-level restriction.” OpenAI frames the tradeoff plainly: the mode shrinks ChatGPT’s “attack surface, even if that means limiting useful features,” per Winbuzzer.

That framing is the interesting part. OpenAI is not claiming improved detection. It is not shipping a smarter filter. It is offering to remove the attack vector entirely, at the cost of removing the feature. For a company whose public posture on safety has historically emphasized model-level alignment and guardrails, this reads as a concession that the runtime detection layer has a ceiling.

Lockdown Mode has been compared to Apple’s Lockdown Mode for iPhones: extreme protection for people who genuinely need it, activated voluntarily, with the understanding that it restricts functionality. OpenAI has not cited Apple as a design reference, but the pattern is recognizable. When a vendor cannot harden the default experience enough to protect the highest-risk users, it offers an opt-in restrictive mode and shifts the security decision to the user. The Elevated Risk labels reinforce this reading: they are a disclosure mechanism, not a defense mechanism. Together, the two features say OpenAI can tell you where the risk is and can turn it off, but cannot yet stop it from being exploitable while keeping it turned on.

Elevated Risk Labels: Transparency or Liability Shift?

Alongside Lockdown Mode, OpenAI introduced Elevated Risk labels across ChatGPT, ChatGPT Atlas, and Codex. These labels flag capabilities that introduce additional security risk: a transparency measure, not a disabling one. OpenAI has stated that labels will be removed once safeguards improve enough for general use, according to NERDS.xyz.

The labels serve two audiences. For end users and administrators, they surface which capabilities carry elevated risk, enabling more informed deployment decisions. For OpenAI, they document that the company disclosed the risk, which has a liability-shaping effect. A label that says “this feature carries additional security risk” is structurally similar to the warnings on consumer electronics: it informs the buyer and protects the manufacturer.

The practical consequence for developers is that every agentic capability now ships with a warning label. If you build a workflow that chains Codex, Atlas browsing, and Agent Mode, you are stacking three Elevated Risk features. The labels do not prevent you from doing this. They ensure you were told.

Who Bears the Cost

ChatGPT reached 900 million weekly active users by February 2026 and, as of June 2026, runs on GPT-5.5. The platform’s agentic feature set has expanded steadily: Operator (January 2025), Codex (May 2025), the ChatGPT agent (July 2025), and ChatGPT Atlas browser (October 2025). Each addition widens the attack surface that Lockdown Mode is designed to constrain.

For builders composing agentic workflows on top of ChatGPT, Lockdown Mode creates a direct tension. The features most valuable for automation, including browsing, code execution, file handling, and autonomous agent loops, are exactly what Lockdown Mode disables. An administrator who enables Lockdown Mode secures the deployment by gutting the capability. An administrator who leaves it off retains the capability and accepts the risk.

This is not a theoretical concern. Any production workflow that relies on ChatGPT’s agent to browse the web, execute code, or process uploaded files inherits prompt-injection exposure proportional to the privilege granted to the agent. Lockdown Mode offers a binary response: full capability with elevated risk, or restricted capability with reduced risk. There is no middle position where the model-level defense catches injection while preserving all features. OpenAI’s own architecture appears not to support that middle ground as of mid-2026.

The CISA Incident and Kill-Chain Framing

In August 2025, CISA acting director Madhu Gottumukkala accidentally uploaded sensitive government information marked “for official use only” to a public version of ChatGPT, triggering multiple internal cybersecurity warnings, per Winbuzzer. The incident illustrates the category of risk Lockdown Mode addresses: not adversarial exploitation alone, but inadvertent data exposure amplified by the model’s ability to ingest and process arbitrary input.

Bruce Schneier has described prompt-injection attacks as “the first step of a kill chain,” not isolated incidents. That framing treats prompt injection as the entry point for a multi-stage exploitation sequence: inject a prompt, exfiltrate data, pivot to connected systems. It is the framing OpenAI uses to justify Lockdown Mode’s severity.

What Lockdown Mode Signals About Prompt-Injection Defense in Mid-2026

For anyone building on ChatGPT’s agentic APIs, the implication is direct. As of mid-2026, the model will not defend itself against injection at runtime. Your choices are to accept the exposure, restrict the capability, or build your own detection layer on top of OpenAI’s output. Those are the available positions.

Frequently Asked Questions

Can ChatGPT Team plan administrators enable Lockdown Mode today?

No. As of June 2026, only Enterprise, Edu, Healthcare, and Teachers plans have access. Team plan admins managing multi-user deployments with agentic workflows have no toggle available and no published timeline beyond “coming months.” Those teams must either accept the full attack surface or restrict ChatGPT usage through external network policies, such as blocking outbound connections from ChatGPT sessions to internal resources.

How does Lockdown Mode differ from OpenAI’s earlier prompt-injection mitigations?

Prior OpenAI safeguards relied on model-level filters and instruction-hierarchy techniques that tried to detect and reject injected prompts at runtime. Lockdown Mode abandons that approach: instead of distinguishing benign from malicious prompts, it removes the capabilities that injection exploits (browsing, code execution, file handling). The shift is from detection to containment, which is why NextPJ describes it as a “hard system-level restriction” rather than a filter that might fail.

Does Lockdown Mode protect against prompt injection within the base conversational model?

No. The base model continues running with its full instruction-following behavior. An attacker who manipulates output within the model’s existing context window, without needing web access, file uploads, or agent loops, faces no additional barrier under Lockdown Mode. The mode constrains the model’s reach into external systems, not its susceptibility to adversarial instruction within a conversation.

What audit capability does the Compliance API Logs Platform integration add?

Administrators can review which apps users accessed, which connected data sources were queried, and when those interactions occurred, all within OpenAI’s existing compliance logging infrastructure. This is post-hoc auditing: it reconstructs what happened after the fact rather than flagging whether a specific interaction resulted from injection. For regulated industries, it provides an evidence trail for incident review, not real-time threat detection.

What would make Lockdown Mode unnecessary?

A reliable runtime detector that distinguishes injected instructions from legitimate user requests without disabling features. OpenAI has committed to removing Elevated Risk labels once safeguards improve, but has published no quantitative threshold for what “improve enough” means. There is no measurable milestone that would trigger Lockdown Mode’s deprecation, and given the steady cadence of agentic product launches (Operator, Codex, Atlas, each expanding the attack surface), the feature’s scope may grow before it shrinks.

sources · 4 cited

  1. OpenAI Launches ChatGPT Lockdown Mode for High-Risk Users analysis accessed 2026-06-05
  2. OpenAI introduces Lockdown Mode and Elevated Risk labels in ChatGPT to counter prompt injection threats analysis accessed 2026-06-05
  3. ChatGPT Just Got a Lockdown Mode: Here's What It Does and Who Needs It community accessed 2026-06-05
  4. ChatGPT primary accessed 2026-06-05