
Pod-Level Remote Attestation in Kubernetes: Confidential Workloads on dstack

dstack-capsule binds pod identity into Intel TDX hardware quotes, enabling multi-pod confidential VMs without the per-VM density tax of Confidential Containers.


Confidential computing on Kubernetes has a granularity problem. Current Confidential Containers (CoCo) deployments attest the virtual machine, not the workload inside it, and they incur prohibitive per-VM resource overhead for the privilege. A preprint from researchers at OPPO and Phala proposes a different trade: share a single Intel TDX confidential VM across multiple pods, but bind each pod’s identity directly into hardware-signed attestation quotes.

The Granularity Problem

As of 2026, Confidential Containers, the CNCF-incubating project built on Kata Containers, works by launching each pod inside its own microVM with confidential-computing flags set. The VM’s measured boot chain proves the guest OS booted correctly. What it does not prove is which container image is running, which pod spec was admitted, or what workload identity the pod claims. The attestation boundary stops at the hypervisor-facing VM metadata, which is exactly the boundary a compromised kubelet or scheduler can tamper with.

On the resource side, every pod pays for a dedicated Kata microVM, and that fixed per-VM overhead makes density the binding constraint for multi-tenant clusters running confidential LLM inference or regulated data processing.

As of 2026, managed cloud offerings from Azure, GKE, and AWS have stable interfaces for confidential containers, but they inherit the same one-pod-per-VM model. Standard runc containers remain unprotected even when SEV-SNP or TDX hardware is available on the node.

Two-Layer Attestation in dstack-capsule

The dstack-capsule paper splits attestation into two layers that run on a single shared TDX confidential VM.

Static layer. The platform’s boot measurements are frozen into RTMR[3], one of the TDX runtime measurement registers. They cover the guest kernel, the initrd, and the dm-verity-protected OS image. Once frozen, these values cannot change for the lifetime of the VM.
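A minimal sketch of why an extend-only register gives a relying party something stable to check. It assumes the usual TDX behaviour where an RTMR is a 48-byte SHA-384 value that can only be extended, never written directly. The event names are hypothetical, and how dstack-capsule actually freezes RTMR[3] is not described here, so this models only the replay side.

```python
import hashlib

RTMR_SIZE = 48  # SHA-384 digest length, the assumed width of a TDX RTMR


def extend(rtmr: bytes, event: bytes) -> bytes:
    """Fold one measured event into the register: new = SHA384(old || SHA384(event))."""
    event_digest = hashlib.sha384(event).digest()
    return hashlib.sha384(rtmr + event_digest).digest()


def replay(events: list[bytes]) -> bytes:
    """Recompute the register from an event log, starting from all zeros."""
    value = bytes(RTMR_SIZE)
    for event in events:
        value = extend(value, event)
    return value


# Hypothetical event log; a verifier would compare the replayed value
# against the RTMR value reported inside the quote.
boot_log = [b"kernel", b"initrd", b"dm-verity-root-hash"]
expected_rtmr = replay(boot_log)
```

Because each new value depends on the previous one, there is no sequence of extends that returns the register to an earlier state, and reordering or dropping an event yields a different final value.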

Dynamic layer. On every attestation request, each pod’s identity is packed into the TDX Quote’s 64-byte report_data field, signed by the CPU. The field contains the pod_uid, a pod_spec_hash, and a workload_id. A relying party can verify not just that a confidential VM is running, but that a specific pod with a specific spec digest is running inside it.
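One way to fit three variable-length identifiers into a fixed 64-byte field is to hash them. The sketch below assumes SHA-512 over length-prefixed fields; that encoding and the function names are illustrative assumptions, not the layout used by the preprint.

```python
import hashlib
import struct

REPORT_DATA_LEN = 64  # size of the report_data field in a TDX Quote


def _length_prefixed(value: bytes) -> bytes:
    # A 4-byte big-endian length prefix keeps field boundaries unambiguous.
    return struct.pack(">I", len(value)) + value


def build_report_data(pod_uid: str, pod_spec_hash: bytes, workload_id: str) -> bytes:
    """Derive a 64-byte value binding the three pod identity fields together."""
    material = b"".join(
        _length_prefixed(part)
        for part in (pod_uid.encode(), pod_spec_hash, workload_id.encode())
    )
    digest = hashlib.sha512(material).digest()
    assert len(digest) == REPORT_DATA_LEN
    return digest


# Hypothetical identity values for illustration only.
spec_hash = hashlib.sha256(b'{"containers": []}').digest()
report_data = build_report_data("pod-uid-1234", spec_hash, "inference-api")
```

The length prefixes matter: without them, the pairs ("ab", "c") and ("a", "bc") would concatenate to the same bytes and produce the same digest.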

The key difference from CoCo’s approach: because quotes are generated per request and carry per-pod identity, multiple pods can coexist in the same VM without diluting the attestation signal. Each pod gets its own hardware-backed proof of identity, and a co-resident pod cannot forge that proof because it cannot obtain a quote whose report_data carries another pod’s identity.

The Privilege Fuse

dstack-capsule introduces a mechanism the authors call the privilege fuse: an irreversible, atomic state transition that moves a Kubernetes node from a privileged setup phase to a locked-down runtime phase. The implementation uses a compare-and-swap operation plus a persistent marker file.
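The one-way transition can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: Python has no native compare-and-swap, so a lock emulates one, and the class name, marker contents, and marker path are hypothetical.

```python
import os
import threading


class PrivilegeFuse:
    SETUP, LOCKED = 0, 1

    def __init__(self, marker_path: str):
        self._marker_path = marker_path
        self._lock = threading.Lock()
        # A marker left by an earlier process means the fuse stays blown.
        self._state = self.LOCKED if os.path.exists(marker_path) else self.SETUP

    def _compare_and_swap(self, expected: int, new: int) -> bool:
        # Emulated CAS: swap only if the current state matches `expected`.
        with self._lock:
            if self._state != expected:
                return False
            self._state = new
            return True

    def blow(self) -> bool:
        """Return True only for the single caller that performs the transition."""
        if not self._compare_and_swap(self.SETUP, self.LOCKED):
            return False
        try:
            # O_EXCL makes creation fail if the marker already exists.
            fd = os.open(self._marker_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        except FileExistsError:
            return False  # another process blew the fuse first
        try:
            os.write(fd, b"fused\n")
            os.fsync(fd)  # flush the marker to disk before reporting success
        finally:
            os.close(fd)
        return True

    @property
    def blown(self) -> bool:
        with self._lock:
            return self._state == self.LOCKED
```

There is deliberately no method that moves the state back to SETUP, and the marker file carries the locked state across process restarts.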

Before the fuse is blown, privileged pods (those requesting hostNetwork, hostPID, hostIPC, hostPath, or the privileged security context flag) can be scheduled. After the fuse is blown, the admission controller rejects all such pods and RTMR[3] is frozen. There is no unfuse operation.
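The admission rule can be sketched as a predicate over a pod spec. The field names follow the Kubernetes PodSpec schema, while the function names and the plain-dict input are assumptions for illustration. A real admission webhook would receive an AdmissionReview object and would also need to inspect ephemeral containers.

```python
def requests_host_privileges(pod_spec: dict) -> bool:
    """True if the spec asks for any of the privileges listed above."""
    if any(pod_spec.get(flag) for flag in ("hostNetwork", "hostPID", "hostIPC")):
        return True
    if any("hostPath" in volume for volume in pod_spec.get("volumes") or []):
        return True
    containers = (pod_spec.get("containers") or []) + (pod_spec.get("initContainers") or [])
    return any(
        bool((container.get("securityContext") or {}).get("privileged"))
        for container in containers
    )


def admit(pod_spec: dict, fuse_blown: bool) -> bool:
    """Before the fuse is blown everything is admitted; afterwards privileged pods are rejected."""
    return not (fuse_blown and requests_host_privileges(pod_spec))


setup_pod = {"hostNetwork": True, "containers": [{"name": "cni-installer"}]}
tenant_pod = {"containers": [{"name": "app", "securityContext": {"privileged": False}}]}
```

With these two specs, `setup_pod` is admitted only while the fuse is intact, and `tenant_pod` is admitted in both phases.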

This is a design pattern worth naming because it solves a real operational problem. Confidential VMs need a setup phase where the node joins the cluster, pulls images, and configures networking. Without a fuse-like mechanism, that setup phase is a permanent privilege window. With it, the window closes once and cannot be reopened without reprovisioning the entire VM.

The operational cost is real: a misconfigured fuse locks you out of node maintenance without reprovisioning. Platform teams running dstack-capsule would need to treat node lifecycle as immutable after fuse-blow, similar to how Flatcar Container Linux handles updates via partition swaps rather than in-place mutation.

The Multi-Layer Sandbox

dstack-capsule does not rely on attestation alone. The implementation, built on Kubernetes 1.32 with Intel TDX and Sysbox, layers several containment mechanisms on top of it.

The choice of Sysbox over Kata is what enables the shared-VM model. Sysbox provides user-namespace-based isolation at container granularity without launching a separate VM per pod, which is why the memory footprint stays closer to a standard Kubernetes deployment than to CoCo’s dedicated VM per pod.

CoCo vs. dstack-capsule

| Dimension | Confidential Containers (CoCo) | dstack-capsule |
| --- | --- | --- |
| Isolation unit | One pod per Kata microVM | Multiple pods per TDX VM with Sysbox |
| Memory per pod | Dedicated VM per pod with prohibitive overhead | Shared VM pool, no per-pod VM overhead |
| Attestation target | Guest OS boot measurements | Pod UID + spec hash + workload ID in TDX Quote |
| Attestation granularity | VM-level | Pod-level |
| Maturity | CNCF incubating; Azure, GKE, AWS offerings | arXiv preprint, research prototype |
| Hardware | AMD SEV-SNP, Intel TDX | Intel TDX only |

The density advantage is straightforward: dstack-capsule’s shared-VM model should accommodate more pods per node than CoCo because each pod is a Sysbox container, not a full VM. The exact density multiplier depends on workload memory requirements, which the preprint does not benchmark in production-scale deployments.

What This Means for Multi-Tenant Clusters

The second-order effect of pod-level attestation lands on secrets injection and compromised-node containment.

In current CoCo deployments, secrets are typically delivered to the VM during boot, before the workload starts. The attestation verifies the VM booted correctly, then the secrets are released. But if attestation only covers the VM and not the pod, a compromised kubelet could schedule a different pod on the same node and the VM-level attestation would still pass. The secrets would be available to the wrong workload.

dstack-capsule’s model binds the pod spec hash into the CPU-signed quote. A relying party (a KMS running in an independent TEE) can verify that the exact pod spec it approved is the one requesting the secret. A co-resident pod with a different spec hash cannot impersonate the attested pod because it cannot produce a valid TDX quote with the target’s pod_uid and pod_spec_hash.
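The relying party's release decision might look like the sketch below. It assumes the quote's signature chain has already been verified by a separate step that is not shown, and that report_data is a SHA-512 over the three length-prefixed identity fields. That encoding, the function names, and the policy table are illustrative assumptions rather than details from the preprint.

```python
import hashlib
import hmac
import struct


def identity_digest(pod_uid: str, pod_spec_hash: bytes, workload_id: str) -> bytes:
    """Recompute the 64-byte value the pod's quote is expected to carry."""
    parts = (pod_uid.encode(), pod_spec_hash, workload_id.encode())
    material = b"".join(struct.pack(">I", len(p)) + p for p in parts)
    return hashlib.sha512(material).digest()


def should_release_secret(
    report_data: bytes,
    pod_uid: str,
    pod_spec_hash: bytes,
    workload_id: str,
    approved_spec_hashes: dict[str, bytes],
) -> bool:
    """Release only if the claimed spec is approved AND the quote binds that exact claim."""
    approved = approved_spec_hashes.get(workload_id)
    if approved is None or not hmac.compare_digest(approved, pod_spec_hash):
        return False
    expected = identity_digest(pod_uid, pod_spec_hash, workload_id)
    return hmac.compare_digest(expected, report_data)


# Hypothetical policy: one approved spec digest per workload.
good_hash = hashlib.sha256(b"approved pod spec").digest()
policy = {"inference-api": good_hash}
quote_field = identity_digest("uid-1", good_hash, "inference-api")
```

A co-resident pod that presents its own quote while claiming the approved spec hash fails the second comparison, because its report_data was computed over a different identity.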

The threat model assumes simultaneous collusion between the cloud platform operator (who controls the hypervisor, host OS, and Kubernetes control plane) and the pod developer (who may try to extract user data or escalate privileges). Trust is placed in the Intel TDX hardware and microcode, the dm-verity-protected OS image, and the independently-attested KMS. If any of those layers breaks, the model breaks with it.

For platform teams evaluating confidential Kubernetes, the question is not whether dstack-capsule is production-ready today (it is not). The question is whether pod-level attestation becomes a requirement as regulated workloads move onto shared infrastructure. If it does, the two-layer architecture, with static platform measurements and dynamic pod identity, is a plausible design for getting there without the density tax of one VM per pod.

Frequently Asked Questions

Does dstack-capsule work on AMD SEV-SNP or ARM CCA hardware?

No. The implementation relies on Intel TDX-specific primitives, including RTMR registers and the 64-byte report_data field in TDX Quotes, with guest support that merged in Linux 5.19. Porting to AMD SEV-SNP or ARM CCA would require mapping those primitives to each platform’s attestation structures. CoCo’s vendor-neutral Kata abstraction is what gives it multi-platform support across SEV-SNP and TDX, which dstack-capsule trades away for pod-level attestation depth.

How many confidential pods can a single node run under CoCo versus dstack-capsule?

CoCo’s Kata microVMs require approximately 2 GB of dedicated memory per pod, limiting a 64 GB host to roughly 30 confidential pods before memory exhaustion. dstack-capsule’s Sysbox-container model shares a single TDX VM, so per-pod memory overhead is determined by the workload itself rather than a fixed VM allocation. The preprint provides no production-scale density benchmarks, so the actual improvement ratio remains unvalidated.
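The "roughly 30" figure follows from simple division. The helper below is a back-of-the-envelope sketch, and the 4 GB reserved for the host is an assumption chosen for illustration, not a number from the preprint.

```python
def max_pods(host_memory_gb: float, per_pod_vm_gb: float, host_reserved_gb: float = 0.0) -> int:
    """Upper bound on pods when each pod needs a fixed, dedicated VM allocation."""
    usable = host_memory_gb - host_reserved_gb
    return max(0, int(usable // per_pod_vm_gb))


ceiling = max_pods(64, 2)                           # no memory held back for the host
with_reserve = max_pods(64, 2, host_reserved_gb=4)  # assumed 4 GB host reserve
```

Here `ceiling` is 32 and `with_reserve` is 30, which brackets the estimate above. The same formula does not apply to the shared-VM model, where there is no fixed per-pod VM allocation to divide by.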

What attack vectors does the dstack-capsule threat model explicitly exclude?

Side-channel attacks (cache timing, power analysis) and Intel microcode vulnerabilities are explicitly out of scope. The trust chain depends on TDX hardware integrity, a dm-verity-protected OS image, and an independently-attested KMS running in a separate TEE. A microcode-level compromise bypasses all three layers. The codebase itself, approximately 7,700 lines of Rust and 660 lines of Go, has no production deployment history.

How does the Springer 2025 pod-integrity approach differ in its trust root from dstack-capsule?

The Springer paper uses TPM hardware roots of trust to verify node-level integrity and detect unauthorized pod modifications. dstack-capsule embeds pod identity directly into CPU-signed TDX Quotes rather than measuring through a TPM. The practical implication: TPM-based approaches run on most server hardware shipped since 2016, while dstack-capsule requires Intel TDX-capable processors (4th-gen Xeon Sapphire Rapids and later), restricting the eligible node pool.

What happens to co-resident pods if a TDX microcode vulnerability is disclosed?

A quote-integrity break would collapse the entire attestation chain, because pod identity, privilege fuse state, and secret-release decisions all depend on TDX Quotes being trustworthy. Both CoCo and dstack-capsule would lose attestation guarantees, but CoCo’s Kata VM boundary provides isolation that does not depend on attestation correctness. dstack-capsule’s shared-VM model separates pods only with container-level isolation, so a TDX break would expose co-resident workloads to each other more directly than CoCo’s per-pod microVMs would.

sources · 4 cited

  1. Implement Kubernetes Pod-Level Remote Attestation for Confidential Workloads on dstack (full HTML) primary accessed 2026-06-06
  2. Implement Kubernetes Pod-Level Remote Attestation for Confidential Workloads on dstack primary accessed 2026-06-06
  3. Confidential Containers on Kubernetes: AMD SEV-SNP, Intel TDX, and the Attestation Flow analysis accessed 2026-06-06
  4. Extending Kubernetes for Pods Integrity Verification analysis accessed 2026-06-06