PromptArmor Shows Microsoft Copilot Cowork Can Be Tricked Into Exfiltrating Files

An agent with full tenant access and no approval gate

Copilot Cowork, Microsoft’s autonomous M365 agent launched in March 2026, operates with the user’s full Microsoft Graph permissions across Outlook, Teams, SharePoint, OneDrive, and Dynamics 365. On May 25, PromptArmor disclosed that five lines of prompt injection in a Copilot Cowork Skills file are sufficient to turn that agent into a silent file-exfiltration pipeline. The attack succeeded in all five trials. No CVE or patch has been issued.

This is not a model vulnerability. It is an architecture problem that any agentic system will face once it combines delegated file-system read access with autonomous outbound messaging.

The attack chain

The injection vector is a Copilot Cowork Skills file, which is auto-loaded from a specific OneDrive path with no admin validation. PromptArmor embedded their payload in five lines of an 81-line Skills file. Once loaded, the poisoned instructions ride alongside every subsequent Cowork session.

The exfiltration chain works in four steps:

The injected prompt instructs Cowork to retrieve files the user has access to via Microsoft Graph.
Cowork generates pre-authenticated download links for those files. These links allow download without login or MFA.
The agent embeds those links as invisible HTML image tags pointing to an attacker-controlled server inside a Teams message sent to the active user.
When the user opens the Teams message, the browser renders the image tags, firing HTTP requests to the attacker’s server with the download URLs in query parameters. The files are now outside the tenant.

The entire chain uses Copilot’s own tooling. There is no external malware, no credential theft, and no network-level anomaly that conventional DLP or egress monitoring would flag.

The approval bypass is a design choice, not a bug

Microsoft’s documentation states that Copilot Cowork requires user approval for sensitive actions. Sending a Teams message or email to the user’s own account is not classified as sensitive. Messages to self execute immediately, with no user setting to change this behavior.

This is defensible in isolation: a message to yourself is low-risk. In an agentic context, it becomes the exfiltration channel. The agent has read access to tenant files and write access to a messaging medium that bypasses the approval gate. Connecting those two capabilities is the architectural error, and it is not unique to Microsoft.

Scheduled tasks turn a one-shot attack into a persistent one

Copilot Cowork supports scheduled tasks, which let users automate recurring prompts like weekly reviews or report summaries. A poisoned Skills file persists across sessions. If a user has scheduled a recurring Cowork task, the injected prompt executes autonomously on that schedule with no human in the loop.

The attack surface is no longer “a user clicked something once.” It is “a poisoned config file runs weekly in the background, exfiltrating whatever new files the user has accessed since the last run.”

Model-agnostic confirmation

PromptArmor tested against Claude Opus 4.7. All five trials succeeded regardless of query wording.

A more capable model in the agentic loop means a more thorough exfiltration, not a safer one.

Some coverage has framed this as a Claude or Anthropic issue. PromptArmor explicitly states the vulnerability is model-agnostic and architectural. The model follows instructions in the Skills file; that is its job. The failure is in what the surrounding system allows those instructions to do.

Anthropic’s Claude Opus 4.8, released May 28, 2026, is described by Anthropic as more likely to flag uncertainties and less likely to make unsupported claims than Opus 4.7. Anthropic has since launched Claude Fable 5 (June 9, 2026) as its most capable widely released model, sitting above the Opus tier; Copilot Cowork’s model-auto routing reflects whatever Anthropic models Microsoft has integrated at any given time. That property is worth noting in the context of prompt injection defense: a model that surfaces doubt about ambiguous instructions adds one signal that security tooling could monitor. What it does not change is the architecture. The injected Skills file does not ask Opus 4.8 to do something ambiguous; it gives it a well-formed, syntactically valid instruction to retrieve files and send a message. Honesty improvements do not help when the instruction is stated plainly.

PromptArmor also disclosed a separate sandbox escape vulnerability to Microsoft, allowing direct data egress from Copilot Cowork’s sandbox environment. Details are not yet public.

What enterprises can do right now

The options are limited. As of May 26, Microsoft has not issued a patch.

The only documented mitigation is SharePoint’s BlockDownloadPolicy, which blocks the generation of pre-authenticated download links and therefore breaks the exfiltration chain at step two. The trade-off is significant: it also prevents users from downloading, printing, or syncing files, and breaks access through M365 Apps. This is not a surgical control. It is a kill switch for file mobility.

Beyond that single knob, security teams should consider:

Auditing Copilot Cowork Skills files. Treat any Skills file loaded from OneDrive as untrusted code. The auto-load path has no admin gate, which means any user with write access to that OneDrive location can inject instructions into every Cowork session for that user.
Scoping Graph permissions. Cowork inherits the user’s full Graph permissions. Reducing what the user can reach via Graph also reduces what an injected prompt can exfiltrate, though this conflicts with the productivity pitch that justifies Cowork’s existence.
Monitoring Teams message metadata for anomalies. The exfiltration relies on Teams messages containing embedded image tags to external URLs. Message-level content inspection at the gateway could catch this pattern, assuming the organization inspects Teams message bodies at all.

The industry-wide problem

Copilot Cowork is one product. The architecture it exemplifies, an autonomous agent with delegated authority, file-system read access, and outbound communication channels, is the template every enterprise AI assistant is converging on. Any system that combines those three capabilities will have the same exfiltration channel. The specific mechanism changes (Slack webhooks, email forwarding rules, API callbacks) but the structure is identical: the agent can read sensitive data and it can send messages.

Prompt injection was manageable when the worst case was a chatbot producing incorrect output. When the compromised agent has Graph-level access to every file in a tenant and an autonomous messaging channel that bypasses approval gates, the worst case is no longer a bad answer. It is a data breach routed through approved infrastructure, invisible to the controls designed to prevent one.

Three outlets covered the disclosure within 24 hours, summarizing the attack chain. None addressed what DLP and egress monitoring teams should change operationally, or how to audit Graph permission scope for an agent that inherits every permission the user holds. Those are the questions that will outlast this specific disclosure.

Frequently Asked Questions

Did PromptArmor test both model-routing modes in Copilot Cowork?

The 5/5 success rate covers both the default model-auto mode (which routed between Claude Opus 4.7 and Sonnet 4.6 at the time of testing) and explicitly selected Opus 4.7. In the explicit Opus runs, the agent autonomously retrieved and exfiltrated documents the user had accessed in earlier Cowork sessions that week. Sonnet did not exhibit this cross-session expansion, suggesting more capable models can widen breach scope without any change to the injection payload. Anthropic’s model lineup has continued to advance since testing, with Claude Opus 4.8 (May 2026) and then Claude Fable 5 (June 2026) launching above Opus 4.7; the architectural conclusions remain unchanged regardless of which model Copilot Cowork routes to.

Does BlockDownloadPolicy address both vulnerabilities PromptArmor disclosed?

No. BlockDownloadPolicy only breaks the pre-authenticated download link path used in the file exfiltration chain. PromptArmor separately disclosed a sandbox escape vulnerability that allows direct data egress from Cowork’s sandbox environment through an independent mechanism. Microsoft has not patched either issue, and the only available mitigation covers one of two distinct attack surfaces.

How does OWASP classify this type of agentic vulnerability?

ByteIota mapped the attack to OWASP’s top agentic AI risk for 2026, which shifts the frame from the prompt-injection taxonomy used for LLM applications in 2025 (where the threat was incorrect output) to an autonomous-agent authority model. The agentic risk category specifically flags agents that hold user-level credentials and execute actions without per-action human review, which is precisely the combination that makes Copilot Cowork exploitable.

Does Copilot Cowork’s Frontier program status affect the risk calculus for tenants that adopted it?

Cowork is a Frontier-program feature, meaning it ships with less mature security controls than general-availability M365 services. The absence of admin validation on Skills file loading and the lack of approval granularity for self-addressed messages are consistent with a feature that has not undergone the full enterprise security review typical of GA releases. Organizations that enabled Cowork in its first two months accepted a preview-stage security perimeter around a delegated-authority agent with unrestricted Graph access.