On April 21, 2026, GitHub published GHSA-6w67-hwm5-92mq: CVSS 7.5, an SSRF in LMDeploy’s vision-language endpoint that lets any unauthenticated caller hand the inference server a URL and watch it fetch. No public proof-of-concept existed. Twelve hours and thirty-one minutes later, Sysdig recorded the first exploitation from a Hong Kong IP scanning AWS metadata endpoints and localhost.
The 12-Hour Window: From Advisory to Exploit
The GHSA described the vulnerable parameter (image_url in vision-language chat completions), the affected version range (≤0.12.2), and the SSRF class. That’s enough to write an exploit without a PoC, which is what happened.
Sysdig’s analysis places the first hit at 03 UTC on April 22 from 103.116.72.119 (Hong Kong, AS400618). No independent PoC had circulated by that point. The advisory text was the PoC.

How the SSRF Works: Vision URLs as HTTP Primitives
LMDeploy’s vision-language endpoint accepts an image_url parameter in the chat completions request body. The server fetches that URL server-side before passing the image to the model. No auth check gates the fetch; the URL is treated as a well-intentioned pointer to an image.
That fetch is an outbound HTTP request the caller controls entirely. Point it at http://169.254.169.254/latest/meta-data/iam/security-credentials/ and the server returns AWS instance metadata. Point it at http://127.0.0.1:6379 and Redis answers if it’s running without auth. The inference server becomes an HTTP proxy, running with whatever IAM role the EC2 instance carries.
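A sketch of what such a request looks like, assuming an OpenAI-compatible chat completions body (the exact message schema here is an assumption for illustration, not taken from the advisory):

```python
import json

# Hypothetical SSRF payload against a vision-language chat endpoint.
# The message structure mimics the OpenAI-compatible format many inference
# servers expose; field names here are illustrative assumptions.
IMDS_URL = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"

payload = {
    "model": "some-vision-model",  # placeholder model name
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            # The server fetches this URL server-side before inference runs:
            {"type": "image_url", "image_url": {"url": IMDS_URL}},
        ],
    }],
}

body = json.dumps(payload)
```

The attacker-controlled URL rides inside an otherwise ordinary chat request; nothing in the request shape distinguishes it from a legitimate image reference.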
The design assumption that broke here is that image URLs are benign. Once a server fetches attacker-controlled URLs server-side, the distinction between “image fetcher” and “HTTP proxy” collapses.
What Sysdig Saw: IMDS, Redis, and Localhost Sweeps
Sysdig’s telemetry shows the attacker probing four targets: the AWS IMDS at 169.254.169.254, Redis at 127.0.0.1:6379, localhost ports 8080 and 3306, and an out-of-band callback to requestrepo.com. The requestrepo callback is a standard SSRF confirmation technique: if the callback arrives, the SSRF is live and the attacker proceeds. The IMDS and Redis probes are the monetization layer.

A credential retrieved from the IMDS carries the same IAM scope as the instance role. In many inference deployments that scope extends well beyond what model serving actually requires, making IMDS credential theft a pivot to broader infrastructure access.
The Patch: Blocking Non-Global IPs (and the DNS Rebinding Gap)
LMDeploy v0.12.3 added _is_safe_url() in lmdeploy/vl/media/connection.py. The fix resolves the hostname, checks whether the resulting IP falls into any private, loopback, link-local, or multicast range, and rejects the request if so. Redirects are capped as well.
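The shape of that check can be sketched with Python’s standard ipaddress module. This illustrates the pattern, not the actual _is_safe_url() implementation:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    """Reject URLs whose host resolves to a non-global address.

    Sketch of the v0.12.3 pattern: resolve the hostname, then refuse
    private, loopback, link-local, and multicast ranges.
    """
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False  # unresolvable hosts are rejected, not fetched
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_multicast:
            return False
    return True

print(is_safe_url("http://169.254.169.254/latest/meta-data/"))  # link-local -> False
print(is_safe_url("http://127.0.0.1:6379/"))                    # loopback -> False
```

Note that the check must run on every hop of a redirect chain, which is why the patch caps redirects as well.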
That covers the direct case. A Copilot review on PR #4447 flagged a residual gap: the safety check and the actual HTTP fetch resolve the hostname in separate operations, creating a TOCTOU window. An attacker controlling a DNS record can return a globally routable IP on the first resolution (passing the check) and a private IP on the second (used by the fetch). DNS rebinding against SSRF mitigations is a known technique; the standard fix is to pin the resolved IP at check time and reuse it for the fetch rather than resolving twice.
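The pin-at-check-time mitigation can be sketched as resolve once, validate, then hand the HTTP client the already-validated IP rather than the hostname (function name and return shape are illustrative, not LMDeploy’s):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def pin_url(url: str):
    """Resolve the host exactly once; return (pinned_url, headers) or None.

    Because the fetch reuses the IP validated here, a DNS record that
    flips between resolutions (rebinding) cannot bypass the check.
    """
    parsed = urlparse(url)
    host = parsed.hostname
    if host is None:
        return None
    try:
        ip_str = socket.getaddrinfo(host, None)[0][4][0]
    except socket.gaierror:
        return None
    ip = ipaddress.ip_address(ip_str)
    if not ip.is_global:
        return None  # private, loopback, link-local, etc. all fail is_global
    # Rewrite the URL to target the pinned IP directly.
    netloc = ip_str if parsed.port is None else f"{ip_str}:{parsed.port}"
    pinned = parsed._replace(netloc=netloc).geturl()
    # Carry the original hostname so virtual hosting still routes correctly.
    return pinned, {"Host": host}
```

The key property is that no second resolution ever happens: the address the check approved is the address the socket connects to.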
What Operators Should Do: Egress, IMDSv2, and Network Segmentation
Upgrading to v0.12.3 is the floor, not the ceiling. Three additional controls narrow the remaining surface:
IMDSv2 enforcement. AWS IMDSv2 requires a PUT request with a session token before serving credentials; a simple GET to 169.254.169.254 returns nothing. Enforcing IMDSv2 on every instance running LMDeploy removes the IMDS credential theft vector even when SSRF remains exploitable.
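The IMDSv2 handshake can be sketched as two request descriptions (the header names are AWS’s documented ones; nothing is sent here):

```python
# IMDSv2's session-oriented flow: a PUT for a token, then GETs that carry it.
# A bare GET -- which is all a typical SSRF image fetch can issue -- satisfies
# neither step.
IMDS = "http://169.254.169.254"

def token_request(ttl_seconds: int = 21600):
    """Step 1: PUT /latest/api/token with a TTL header; the response body is the token."""
    return {
        "method": "PUT",
        "url": f"{IMDS}/latest/api/token",
        "headers": {"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    }

def credentials_request(token: str):
    """Step 2: GET the credentials path, presenting the session token."""
    return {
        "method": "GET",
        "url": f"{IMDS}/latest/meta-data/iam/security-credentials/",
        "headers": {"X-aws-ec2-metadata-token": token},
    }
```

An SSRF primitive that controls only the URL, not the method or headers, cannot complete this flow.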
Egress policy. The inference server should not be able to reach 169.254.169.254, RFC 1918 space, or loopback from the application layer. A host firewall rule or a Kubernetes NetworkPolicy blocking those destinations stops the exploit at the network regardless of whether the application-level check holds.
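As a sketch, a Kubernetes NetworkPolicy implementing that egress restriction for inference pods might look like the following (the pod label and the DNS allowance are assumptions to adapt):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: lmdeploy-egress-lockdown
spec:
  podSelector:
    matchLabels:
      app: lmdeploy            # assumed pod label
  policyTypes:
    - Egress
  egress:
    # Allow DNS lookups (adjust to your cluster's DNS service).
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Allow everything else except metadata and private ranges.
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.0.0/16   # link-local, incl. IMDS at 169.254.169.254
              - 10.0.0.0/8       # RFC 1918
              - 172.16.0.0/12    # RFC 1918
              - 192.168.0.0/16   # RFC 1918
```

One caveat: traffic to the pod’s own loopback never crosses the CNI, so a NetworkPolicy cannot stop probes against services in the same pod; that is what the internal-service authentication control below is for.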
Internal service authentication. If Redis or other internal services run alongside the inference server, they need auth. An unauthenticated Redis reachable via SSRF is a second-order persistence path regardless of whether IMDS credentials were captured first.
The Bigger Picture: Inference Servers as Unintentional Proxies
URL-fetch functions have been SSRF vectors for as long as SSRF has been named. The deployment context is what’s different now: inference servers run on cloud instances with broad IAM roles, accept requests from users or downstream services without network-level egress controls, and are added to infrastructure by teams whose primary expertise is model evaluation.
Vision-language endpoints are the sharp edge because the image-fetch requirement is legitimate. A vision model that can’t retrieve a URL is half the product. The fetch cannot be removed; it has to be hardened. The gap isn’t an unusual mistake by LMDeploy’s developers. Inference servers are routinely deployed as API-auth problems when they are also network-perimeter problems.
The advisory text alone was sufficient to build an exploit, as Sysdig’s timeline documents. The SSRF class is known; the parameter name was in the GHSA; the rest is curl.
Frequently Asked Questions
Why did EPSS score this CVE at 0.04% when exploitation occurred within hours?
EPSS is trained on historical exploitation patterns and systematically underestimates CVE classes not yet well-represented in its dataset. AI inference CVEs with SSRF primitives pointing at cloud-standard targets like IMDS and Redis represent a category where existing attacker tooling and advisory-text explicitness compress exploitation timelines below what base rates predict. The 0.04% score should be treated as a floor for any exposed LMDeploy instance, not a realistic probability.
Does the SSRF threat model change for LMDeploy running on GCP or Azure instead of AWS EC2?
The v0.12.3 IP-range block covers 169.254.169.254 on all cloud platforms, so both GCP and Azure metadata services are blocked by the same rule as AWS IMDS. The practical difference is that GCP requires a Metadata-Flavor: Google header and Azure requires a Metadata: true header to return credentials, meaning a generic SSRF probe without those headers gets an error rather than credentials — a marginally higher bar than AWS IMDS, which serves instance metadata via plain GET.
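The header gate can be seen by sketching the minimal probe each metadata service accepts (the header names are the documented ones for each cloud):

```python
# Minimal request shapes per cloud metadata service. AWS IMDSv1 answers a
# bare GET; GCP and Azure refuse requests missing their marker header, so a
# generic SSRF probe that cannot set headers gets an error instead of creds.
PROBES = {
    "aws_imdsv1": {
        "url": "http://169.254.169.254/latest/meta-data/",
        "headers": {},  # plain GET is enough
    },
    "gcp": {
        "url": "http://169.254.169.254/computeMetadata/v1/",
        "headers": {"Metadata-Flavor": "Google"},
    },
    "azure": {
        "url": "http://169.254.169.254/metadata/instance?api-version=2021-02-01",
        "headers": {"Metadata": "true"},
    },
}
```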
Is 12.5 hours the fastest weaponization on record for an AI inference CVE?
Sysdig documented CVE-2026-33626 as the fastest-known weaponization of a 2026 AI inference CVE. Prior SSRF CVEs in AI tooling had exploitation timelines measured in days; the combination of an explicit vulnerable parameter name in the GHSA text and a threat actor already equipped for cloud SSRF pivot chains collapsed that to within the same calendar day as the advisory publication.
Should teams apply egress firewall rules or enforce IMDSv2 first when triaging this CVE?
A single host-firewall or NetworkPolicy rule blocking link-local, loopback, and RFC 1918 egress from the inference process is architecturally more comprehensive than IMDSv2 enforcement alone — it closes IMDS access, Redis sweeps, and localhost port probes in one operation and remains effective regardless of the DNS rebinding gap left open in v0.12.3. The IMDSv2-first recommendation is a speed argument for teams working manually in the AWS Console; teams with infrastructure-as-code firewall automation should apply egress rules first.