
On April 20, 2026, CISA-ADP scored CVE-2026-5760 CVSS 9.8 CRITICAL, disclosing that SGLang’s /v1/rerank endpoint executes arbitrary Python when it renders a model’s tokenizer.chat_template through an unsandboxed jinja2.Environment() (NVD CVE-2026-5760 Detail). A poisoned GGUF file from any source can embed a Jinja2 SSTI payload in that metadata field, which turns a routine model download into remote code execution on the inference host. That means self-hosted teams can no longer treat Hugging Face pulls as weights-only operations.

The April 20 Disclosure: What CVE-2026-5760 Actually Does

On April 20, 2026, CISA-ADP published CVE-2026-5760 with a CVSS 9.8 CRITICAL score (NVD CVE-2026-5760 Detail). The vulnerability affects SGLang’s /v1/rerank endpoint and stems from a specific implementation choice: the code in python/sglang/srt/entrypoints/openai/serving_rerank.py instantiates a plain jinja2.Environment() rather than jinja2.sandbox.ImmutableSandboxedEnvironment when rendering a model’s tokenizer.chat_template (SGLang serving_rerank.py source (main branch)).

Because the environment is unsandboxed, any Jinja2 template expressions in that metadata are evaluated with full Python access. Three days after disclosure, the SGLang main branch still contains the exact vulnerable code, and the project’s security advisory page lists zero published advisories (SGLang Security Advisories (showing none published)). CERT/CC, which coordinated the disclosure, states explicitly that “no response or patch was obtained during the coordination process” and that it has not received a statement from the vendor (CERT/CC Vulnerability Note VU#915947).
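The difference is easy to demonstrate in isolation. Below is a minimal sketch (not SGLang’s actual code path) showing a classic SSTI probe rendering freely under a plain jinja2.Environment(), while ImmutableSandboxedEnvironment rejects the same attribute walk at render time:

```python
from jinja2 import Environment
from jinja2.exceptions import SecurityError
from jinja2.sandbox import ImmutableSandboxedEnvironment

# A classic SSTI probe: walk from a string literal up Python's
# object graph via dunder attributes.
payload = "{{ ''.__class__.__mro__ }}"

# Unsandboxed: the attribute walk succeeds and leaks Python internals.
leaked = Environment().from_string(payload).render()

# Sandboxed: the identical template raises SecurityError instead.
try:
    ImmutableSandboxedEnvironment().from_string(payload).render()
    blocked = False
except SecurityError:
    blocked = True
```

From the probe shown here an attacker can pivot to `__subclasses__` and reach `os` or `subprocess`, which is why the sandbox blocks underscore-prefixed attribute access wholesale.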

How the Attack Chain Works: From Poisoned GGUF to RCE

An attacker embeds a Jinja2 server-side template injection (SSTI) payload inside a GGUF model’s tokenizer.chat_template metadata (CERT/CC Vulnerability Note VU#915947). GGUF files carry more than quantized weights; they include tokenizer configuration, chat templates, and other metadata that inference runtimes parse during model load.

When a self-hosted SGLang instance loads that model and the /v1/rerank endpoint processes a request, the runtime passes the poisoned template through the unsandboxed jinja2.Environment() (SGLang serving_rerank.py source (main branch)). The SSTI payload executes arbitrary Python on the inference host at load time. This is not prompt injection; the malicious code runs during model deserialization, before any user query is processed.

Why Hugging Face’s Scanners Miss It: The Trust Boundary Gap

Hugging Face runs ClamAV malware scans and pickle import checks on every uploaded file (Hugging Face Hub Pickle Scanning Documentation). Those scanners are designed to catch known malware signatures and unsafe Python deserialization, not to inspect GGUF metadata or embedded Jinja2 templates for SSTI payloads (Hugging Face Hub Pickle Scanning Documentation).

The gap matters because most self-hosted inference pipelines treat a model hub download as a passive data transfer. Teams verify hashes for integrity, but rarely audit the internal metadata fields that the runtime will execute. CVE-2026-5760 forces those teams to move provenance verification upstream: every downloaded model file must be treated as untrusted code, not as inert weights.

The Repeating Pattern: llama-cpp-python’s CVE-2024-34359 and What SGLang Ignored

This vulnerability class is not new. In 2024, llama-cpp-python disclosed CVE-2024-34359, scored CVSS 9.7, for the exact same flaw: rendering tokenizer.chat_template through an unsandboxed jinja2.Environment() (llama-cpp-python GHSA-56xg-wfcc-g829 (CVE-2024-34359)). The project patched it in v0.2.72 by replacing the unsandboxed environment with ImmutableSandboxedEnvironment (llama-cpp-python GHSA-56xg-wfcc-g829 (CVE-2024-34359)), a change visible in commit b454f40a (llama-cpp-python commit b454f40a (ImmutableSandboxedEnvironment fix)).

The fix pattern was established in 2024. SGLang’s current main branch reproduces the identical mistake, despite the prior art and despite the disclosure coordination process that CERT/CC ran for CVE-2026-5760 (CERT/CC Vulnerability Note VU#915947).

What Teams Should Do Now: Mitigations and Deploy-Path Checks

Until a patch ships, teams running SGLang should audit every GGUF file before it reaches the inference host. That means inspecting tokenizer.chat_template for Jinja2 expressions that call into Python internals, and treating any template that uses __import__, os, subprocess, or similar constructs as malicious regardless of the model’s origin.
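A pre-load audit can start with a simple deny-list scan over the extracted template. The pattern set below is a heuristic illustration, not an exhaustive deny-list, and should complement (never replace) sandboxed rendering:

```python
import re

# Heuristic: flag Jinja2 expressions that reach into Python internals.
# The pattern names here are illustrative; real SSTI payloads can be
# obfuscated, so treat this as a first-pass filter only.
SUSPICIOUS = re.compile(
    r"__import__|__class__|__globals__|__subclasses__|__mro__"
    r"|\bos\b|\bsubprocess\b|\beval\b|\bexec\b"
)

def template_looks_malicious(chat_template: str) -> bool:
    """Return True if the template references Python internals that a
    legitimate chat-formatting template has no reason to touch."""
    return bool(SUSPICIOUS.search(chat_template))
```

Because a scan like this can be evaded, the safe default is to reject on any match and quarantine the file for manual review rather than attempting to prove a flagged template harmless.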

For maintainers, the remediation is the same one llama-cpp-python applied in 2024: swap jinja2.Environment() for jinja2.sandbox.ImmutableSandboxedEnvironment anywhere tokenizer templates are rendered during model loading (llama-cpp-python commit b454f40a (ImmutableSandboxedEnvironment fix)). The root cause is systemic; while the disclosed trigger is the /v1/rerank endpoint, any code path that renders model-supplied templates without sandboxing carries the same risk.
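The remediation pattern might look like the following sketch. The helper name and keyword options are illustrative, not SGLang’s API; the essential change is that the environment class comes from jinja2.sandbox:

```python
from jinja2.sandbox import ImmutableSandboxedEnvironment

def render_chat_template(chat_template: str, **context) -> str:
    """Render a model-supplied chat template inside Jinja2's immutable
    sandbox, mirroring the llama-cpp-python remediation. Malicious
    templates raise jinja2.exceptions.SecurityError instead of running."""
    env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
    return env.from_string(chat_template).render(**context)
```

With this swap, a poisoned template fails loudly at load time with a SecurityError, which a caller can catch and surface as a model-rejection error rather than silently executing attacker code.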

What Still Needs Fixing: Vendor Response and Scanner Coverage

As of April 23, 2026, SGLang has not published a security advisory, shipped a patch, or publicly acknowledged the vulnerability (CERT/CC Vulnerability Note VU#915947, SGLang Security Advisories (showing none published)). The PoC author references a related vLLM CVE-2025-61620, though NVD did not list that entry as of the research date, so independent verification is not yet available.

Model hub security scanning also remains incomplete. Hugging Face’s current pipeline does not inspect GGUF metadata for executable Jinja2 (Hugging Face Hub Pickle Scanning Documentation), which leaves a window open for poisoned uploads that pass every existing check. Closing that window will require either hub-side template static analysis or runtime sandboxing mandates that inference frameworks adopt as a default.

Frequently Asked Questions

Does this vulnerability affect SGLang endpoints other than /v1/rerank?

While CVE-2026-5760 was disclosed for the /v1/rerank endpoint, the root cause is unsandboxed Jinja2 template rendering during model loading. Any code path that renders model-supplied tokenizer.chat_template without sandboxing carries the same risk.

How is this different from prompt injection?

Prompt injection manipulates user-facing inputs to alter model behavior. CVE-2026-5760 is server-side template injection: the Jinja2 payload executes during model deserialization on the inference host, before any user query is processed. Firewalls inspecting user inputs will not detect it.

What should teams do to protect themselves until a patch is available?

Teams should audit every GGUF file before loading it, inspecting tokenizer.chat_template for Jinja2 expressions that call Python internals like __import__, os, or subprocess. Any template using such constructs should be treated as malicious regardless of the model’s origin.

Will Hugging Face’s security scanners catch a poisoned GGUF file?

No. Hugging Face’s current scanners run ClamAV and check for unsafe pickle imports, but they do not inspect GGUF metadata or embedded Jinja2 templates for SSTI payloads. A clean scan does not guarantee a GGUF file is safe to load.

Sources

  1. NVD CVE-2026-5760 Detail (primary; accessed 2026-04-23)
  2. CERT/CC Vulnerability Note VU#915947 (primary; accessed 2026-04-23)
  3. SGLang serving_rerank.py source, main branch (primary; accessed 2026-04-23)
  4. SGLang Security Advisories, showing none published (primary; accessed 2026-04-23)
  5. Hugging Face Hub Pickle Scanning Documentation (vendor; accessed 2026-04-23)
  6. llama-cpp-python GHSA-56xg-wfcc-g829, CVE-2024-34359 (vendor; accessed 2026-04-23)
  7. llama-cpp-python commit b454f40a, ImmutableSandboxedEnvironment fix (primary; accessed 2026-04-23)
