Using Your NVIDIA GPU's VRAM as Linux Swap: Where the NBD Hack Breaks Down

What NBD-VRAM Does and How It Works

An open-source project called NBD-VRAM exposes unused NVIDIA GeForce video memory as a Linux block device using the Network Block Device protocol, letting the kernel swap pages into VRAM. The hack is real. The question worth answering is narrower: on what specific hardware profile does this beat the alternatives, and what are you giving up to get there?

The mechanism is straightforward at the protocol level. NBD, the Linux Network Block Device protocol, exports a block device over a socket, typically over a network. NBD-VRAM repoints that socket at local VRAM, allocating a region of the GPU’s memory and presenting it to the kernel as a swap-capable block device. The kernel treats it like any other swap target. Pages get written out, pages get read back. The GPU’s memory controller handles the actual storage. Consumer GeForce cards ship with substantial VRAM, much of which sits idle in workloads that don’t stress the framebuffer.

The Memory Hierarchy Problem

The conventional memory hierarchy is ordered by latency: registers, then L1/L2/L3 cache, then DRAM, then storage (NVMe, SATA, network). VRAM occupies an unusual position. On bandwidth, GDDR6 and GDDR6X are competitive with or faster than DDR4/DDR5 system RAM, but only as seen from the GPU; the CPU reaches that memory through the PCIe link, which caps host-side throughput well below the on-card figure. On latency, VRAM is worse than system RAM because every access from the host has to traverse the PCIe bus, and the GPU’s memory controller is not optimized for the small-random-read pattern that swap generates.

Swapping to VRAM inverts the usual assumption that the swap tier is slower than the RAM tier. In this configuration, swap lives on hardware that is, in raw on-card throughput terms, comparable to the RAM it backs up. But the access path (kernel page fault, NBD socket, PCIe bus, GPU memory controller, PCIe bus back, NBD socket, kernel) is longer than the direct path to system RAM, and it is much longer than the path through a compressed in-memory block device like zram.
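A rough calculation helps show why the length of the access path matters more than raw bandwidth. The sketch below assumes a PCIe 4.0 x16 link (an assumption; the actual link generation and width depend on the card and slot) and computes how long a single 4 KiB page spends on the wire at the theoretical link rate. Real throughput is lower once packet overhead is counted, and the function names are illustrative only.

```python
# Back-of-envelope: wire time for one 4 KiB swap page over PCIe.
# Assumes a PCIe 4.0 x16 link; real links lose more to packet overhead.

PAGE_BYTES = 4096
GT_PER_S = 16e9        # PCIe 4.0 signalling rate per lane (transfers/s)
LANES = 16
ENCODING = 128 / 130   # 128b/130b line encoding efficiency


def link_bytes_per_s(gt_per_s: float = GT_PER_S, lanes: int = LANES) -> float:
    """Theoretical one-direction link bandwidth in bytes per second."""
    return gt_per_s * lanes * ENCODING / 8


def page_wire_time_ns(page_bytes: int = PAGE_BYTES) -> float:
    """Time to serialize one page at the full link rate, in nanoseconds."""
    return page_bytes / link_bytes_per_s() * 1e9


if __name__ == "__main__":
    print(f"link bandwidth: {link_bytes_per_s() / 1e9:.1f} GB/s")
    print(f"4 KiB page wire time: {page_wire_time_ns():.0f} ns")
```

At the theoretical rate the page itself crosses the link in roughly 130 ns, so for page-sized transfers the fixed per-request costs (fault handling, socket hops, context switches) are what set the latency, not the link bandwidth.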

The NBD Protocol Tax

NBD was designed for network-attached storage, not for loopback swap on the same machine. Every page that the kernel swaps to VRAM travels through the full NBD request/response cycle: a write request goes from the kernel’s block layer to the NBD client driver, over a socket to the NBD-VRAM userspace process, across PCIe to the GPU, and the completion acknowledgment reverses the path. A read makes the same trip, except that the page data rides back on the response instead of out on the request.

That is multiple context switches, a socket traversal, and a PCIe round trip per swapped page. No benchmarks for NBD-VRAM have been published as of June 2026, so no specific latency figures exist, but the architectural overhead is predictable from the protocol design: it will be slower than a direct kernel-managed block device in system RAM because it must pass through additional layers that system-RAM swap does not. The comparison that matters is not VRAM versus disk (VRAM wins that easily), but VRAM-through-NBD versus compressed system RAM through zram, where the protocol and bus overhead tip the comparison the other way.
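To make the per-page framing concrete, here is a minimal sketch of the NBD transmission-phase headers as the protocol specification defines them: a 28-byte request and a 16-byte simple reply. It only packs and unpacks bytes. It is not NBD-VRAM's code, and the helper names `pack_request` and `unpack_reply` are made up for illustration.

```python
import struct

# Constants from the NBD protocol specification (transmission phase).
NBD_REQUEST_MAGIC = 0x25609513
NBD_SIMPLE_REPLY_MAGIC = 0x67446698
NBD_CMD_READ = 0
NBD_CMD_WRITE = 1
PAGE_BYTES = 4096

REQUEST_FMT = ">IHHQQI"  # magic, command flags, type, handle, offset, length
REPLY_FMT = ">IIQ"       # magic, error, handle


def pack_request(cmd: int, handle: int, offset: int,
                 length: int = PAGE_BYTES, flags: int = 0) -> bytes:
    """Build the fixed-size header that precedes every NBD request."""
    return struct.pack(REQUEST_FMT, NBD_REQUEST_MAGIC, flags, cmd,
                       handle, offset, length)


def unpack_reply(raw: bytes) -> tuple[int, int]:
    """Parse a simple reply header and return (error, handle)."""
    magic, error, handle = struct.unpack(REPLY_FMT, raw)
    if magic != NBD_SIMPLE_REPLY_MAGIC:
        raise ValueError("not an NBD simple reply")
    return error, handle


if __name__ == "__main__":
    header = pack_request(NBD_CMD_WRITE, handle=1, offset=0)
    # A swap-out sends header + one page; the reply is header only.
    print(len(header), "byte request header +", PAGE_BYTES, "byte payload")
    print(struct.calcsize(REPLY_FMT), "byte reply header")
```

The header bytes are small next to a 4 KiB page. The cost lies in the fact that each page needs its own request, its own reply, and a userspace wakeup in between.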

The Zero-Sum Trade

The capacity you gain for swap is capacity you lose for everything else the GPU does. When VRAM runs out under a workload that actually needs it, allocations either fail outright or spill into system RAM across PCIe, with a severe performance penalty. Dedicating VRAM to swap directly competes with any CUDA workload, inference job, or rendering pass running on the same card.

This is the core tension. The machines where spare VRAM is most available (desktops with a GPU doing light or no compute work) are the machines least likely to need additional swap. The machines where swap pressure is highest (memory-constrained servers running model inference) are the machines where every megabyte of VRAM is already allocated to model weights, KV caches, or activation buffers.

zram as the In-Kernel Baseline

Before committing GPU memory to swap, compare it against what the kernel already provides. zram, mainlined in Linux 3.14 (March 2014), creates a compressed block device directly in system RAM with no protocol overhead, no PCIe traversal, and no userspace daemon. It supports LZ4, LZO-RLE, ZSTD, and DEFLATE compression algorithms and is used in production by Fedora (enabled by default since release 33).

The compression ratios are the key number. The Arch Linux wiki documents a real-world zstd compression ratio of roughly 3:1 and notes that even assuming a more conservative 2:1 ratio, zram still provides more effective storage than uncompressed RAM alone. A zram device consumes approximately 0.1% of its configured disksize when idle.

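The practical upshot: on a 16 GB system, a zram swap device with a 32 GB disksize is a ceiling, not a cost. If 16 GB of pages are swapped out and zstd compresses them 2:1, they occupy ~8 GB of physical RAM, for a net gain of roughly 8 GB, with zero bus overhead and zero competition with GPU workloads. Filling the full 32 GB at 2:1 would consume all 16 GB of physical RAM, so the device only approaches its nominal size when compression does better than that. For sysctl tuning guidance, consult the Arch Linux wiki; the general principle is that the kernel should prefer in-memory swap aggressively over filesystem I/O.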
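The capacity arithmetic is easy to get wrong, so here is a small sketch of it. It ignores zram's own metadata overhead and assumes a single uniform compression ratio, which real workloads will not deliver. The function name `zram_footprint` is illustrative.

```python
def zram_footprint(swapped_gb: float, ratio: float) -> tuple[float, float]:
    """Return (physical RAM held by compressed pages, net memory gained) in GB.

    swapped_gb: uncompressed size of the pages sitting in zram
    ratio:      compression ratio, e.g. 2.0 means 2:1
    """
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    physical = swapped_gb / ratio
    return physical, swapped_gb - physical


if __name__ == "__main__":
    for swapped, ratio in [(16, 2.0), (16, 3.0), (32, 2.0)]:
        physical, gained = zram_footprint(swapped, ratio)
        print(f"{swapped} GB swapped at {ratio}:1 -> "
              f"{physical:.1f} GB RAM used, {gained:.1f} GB net gain")
```

The net gain is always the swapped amount minus the RAM the compressed copy occupies, which is why a better ratio matters more than a larger configured disksize.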

When NBD-VRAM Actually Makes Sense

The legitimate use case is narrow but real:

You already own a consumer GeForce GPU with significant idle VRAM. A desktop with a 24 GB RTX 4090 doing no GPU compute has ~24 GB of memory sitting unused. If the machine is also RAM-constrained and cannot accept more DIMMs (small-form-factor board, all slots populated, or budget zero), NBD-VRAM gives you swap capacity that would otherwise require purchasing hardware.

The machine runs no GPU inference or rendering workloads. If the GPU’s only job is display output, or the box is headless entirely, dedicating most of its VRAM to swap does not compete with anything. A headless Linux box with a leftover GeForce card is the canonical target.

zram alone is insufficient and no additional system RAM can be installed. If you have 8 GB of RAM, zram at 2:1 gives you at most roughly 8 GB of effective additional capacity, and only when nearly all physical RAM is holding compressed pages. If you need more than that and cannot add DIMMs, the VRAM on an idle GPU is the only remaining local option short of NVMe swap, which is slower still.

Outside that profile, the tradeoff degrades quickly. On any machine running CUDA workloads, dedicating VRAM to swap steals memory from the exact process that made the GPU valuable in the first place. On a machine where DIMMs can be added, buying a 16 GB DIMM kit outperforms the NBD hack in every dimension: lower latency, higher bandwidth, no protocol overhead, no userspace dependency, no risk of GPU-reset-induced swap corruption.

Operational Gotchas

NBD-VRAM introduces several failure modes that conventional swap does not have:

GPU resets destroy swap contents. A driver crash, a CUDA illegal memory access, or a thermal event severe enough to trigger a GPU reset wipes the VRAM region backing the swap device. The kernel will then find corrupted or missing swap pages. Depending on what was in those pages, this ranges from process-level segfaults to a full kernel panic. Conventional disk-backed swap and zram do not share this vulnerability.

No hibernation support. Hibernation writes the full contents of system RAM to the swap device and powers off. VRAM is volatile and loses its contents on power loss, so hibernate-to-VRAM is architecturally impossible without a battery-backed GPU or a secondary non-volatile swap target for the hibernation image. Suspend-to-RAM is not automatically safe either: discrete GPUs are commonly powered down during suspend, so the swap region survives only if the driver saves and restores VRAM contents across the suspend cycle.

NBD userspace dependency. The swap device depends on a userspace daemon (the NBD-VRAM process). If that process crashes or gets OOM-killed, the swap device disappears mid-operation. This is the same fragility that affects network-backed NBD swap, and it is why zram, which operates entirely in kernel space, is preferred for swap on production systems.

Consumer-GPU specific. The project targets GeForce cards specifically. Datacenter GPUs (A100, H100) already have their VRAM fully utilized in most deployments, and the cards themselves cost enough that adding system RAM is trivial by comparison. The hack is for the consumer surplus: commodity GPUs with memory to spare.

The Practitioner’s Summary

NBD-VRAM is a clever piece of systems engineering that solves a real but narrow problem. On a memory-constrained Linux box with an idle consumer GPU and no option to add RAM, it gives you a swap tier that is faster than disk without spending money. For every other configuration, zram delivers comparable or better effective capacity (via 2:1 to 3:1 compression), lower latency (no PCIe round trip, no NBD protocol), and none of the operational fragility of a GPU-backed swap device. Set up zram first. Reach for NBD-VRAM only when zram is not enough and the hardware profile fits.
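If both tiers are active, "zram first" becomes a matter of swap priority: the kernel fills higher-priority swap devices before lower ones. The sketch below parses `/proc/swaps` and checks that every zram device outranks every NBD device. It assumes the NBD-VRAM device appears under the usual `/dev/nbdN` name, which the project may or may not do, and the helper names are illustrative.

```python
from pathlib import Path


def parse_swaps(text: str) -> list[dict]:
    """Parse /proc/swaps content into one dict per active swap device."""
    devices = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue
        devices.append({
            "name": fields[0],
            "type": fields[1],
            "size_kib": int(fields[2]),
            "used_kib": int(fields[3]),
            "priority": int(fields[4]),
        })
    return devices


def zram_outranks_nbd(devices: list[dict]) -> bool:
    """True if every zram device has higher priority than every NBD device.

    Returns True when either kind is absent, since there is nothing to compare.
    """
    zram = [d["priority"] for d in devices if d["name"].startswith("/dev/zram")]
    nbd = [d["priority"] for d in devices if d["name"].startswith("/dev/nbd")]
    if not zram or not nbd:
        return True
    return min(zram) > max(nbd)


if __name__ == "__main__":
    swaps = Path("/proc/swaps")
    if swaps.exists():
        devs = parse_swaps(swaps.read_text())
        for d in devs:
            print(d["name"], d["priority"], f'{d["used_kib"]} KiB used')
        print("zram preferred over NBD:", zram_outranks_nbd(devs))
    else:
        print("/proc/swaps not available on this system")
```

Run periodically, the same parse also shows whether the NBD-backed device is still present, which gives a cheap early warning for the daemon-crash failure mode described above.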

Frequently Asked Questions

What sysctl tuning does zram swap require?

Set vm.swappiness to 180 and vm.page-cluster to 0. The first tells the kernel to prefer swap over dropping page cache, which is correct when swap is compressed RAM rather than slow disk. The second disables the kernel’s default of reading adjacent pages on a swap-in, since sequential readahead saves nothing on an in-memory device. These values are counterintuitive if you learned swap tuning on spinning disks, where low swappiness and large readahead were standard advice.
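A quick way to see whether a machine already matches those two values is to read them from `/proc/sys`. The following is a minimal read-only sketch; the `RECOMMENDED` table simply restates the two values above, and the helper names are illustrative. It changes nothing on the system.

```python
from pathlib import Path

# The two values discussed above, keyed by their path under /proc/sys.
RECOMMENDED = {"vm/swappiness": 180, "vm/page-cluster": 0}


def read_sysctls(root: str = "/proc/sys") -> dict:
    """Read current values; a key maps to None if it cannot be read."""
    values = {}
    for key in RECOMMENDED:
        try:
            values[key] = int((Path(root) / key).read_text().strip())
        except (OSError, ValueError):
            values[key] = None
    return values


def mismatches(values: dict) -> dict:
    """Return {key: (current, recommended)} for every value that differs."""
    return {
        key: (values.get(key), want)
        for key, want in RECOMMENDED.items()
        if values.get(key) != want
    }


if __name__ == "__main__":
    diff = mismatches(read_sysctls())
    if not diff:
        print("sysctls already match the zram recommendations")
    for key, (have, want) in diff.items():
        print(f"{key.replace('/', '.')}: current={have} recommended={want}")
```

Note that swappiness values above 100 are only accepted on kernels that extended the range to 200, so older kernels will reject a setting of 180.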

How much physical RAM does zram consume when no pages are swapped into it?

Roughly 0.1% of the configured disksize. A 32 GB zram swap device uses about 32 MB of physical RAM until pages start filling it. You can provision zram generously without paying for capacity you are not using, which is a different cost model from NBD-VRAM where allocated VRAM is immediately unavailable to GPU workloads regardless of whether the kernel has written any swap pages to it.

Does a GPU that also drives a desktop reduce the swap-capable VRAM?

Yes. The framebuffer, compositor, and hardware-accelerated applications (browsers with GPU compositing, video players) all claim VRAM before NBD-VRAM can allocate its swap region. A 24 GB RTX 4090 driving a 4K display might leave 20 to 22 GB for swap in a light desktop session, and less once additional GPU-accelerated windows open. The usable reserve fluctuates at runtime, unlike a DIMM whose capacity is fixed and exclusive.
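Because the reserve fluctuates, it is worth measuring before sizing a swap region. The sketch below shells out to `nvidia-smi` with its CSV query flags and subtracts a safety margin. The 1024 MiB headroom is an arbitrary assumption rather than a project recommendation, and how NBD-VRAM itself sizes its allocation is not covered here.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.total,memory.used",
    "--format=csv,noheader,nounits",  # plain numbers in MiB, one row per GPU
]


def parse_memory(csv_text: str) -> list[tuple[int, int]]:
    """Parse nvidia-smi CSV output into [(total_mib, used_mib), ...]."""
    rows = []
    for line in csv_text.strip().splitlines():
        total, used = (int(field.strip()) for field in line.split(","))
        rows.append((total, used))
    return rows


def swap_candidate_mib(rows: list[tuple[int, int]],
                       headroom_mib: int = 1024) -> list[int]:
    """VRAM per GPU left after current usage and a safety margin."""
    return [max(total - used - headroom_mib, 0) for total, used in rows]


if __name__ == "__main__":
    try:
        out = subprocess.run(QUERY, capture_output=True, text=True,
                             check=True).stdout
        for idx, mib in enumerate(swap_candidate_mib(parse_memory(out))):
            print(f"GPU {idx}: about {mib} MiB could back swap right now")
    except (FileNotFoundError, subprocess.CalledProcessError, ValueError) as exc:
        print("could not query nvidia-smi:", exc)
```

The result is a snapshot only; opening another GPU-accelerated window a minute later shrinks the number.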

Is there a ceiling on how large a zram swap device can be?

The kernel documentation recommends not exceeding twice physical RAM, since a 2:1 compression ratio is the expected baseline. On a 16 GB system, a 32 GB zram device is the practical maximum before you are counting on compression the documentation does not guarantee. NBD-VRAM has no compression dependency but is hard-capped by the GPU’s physical VRAM, which on consumer GeForce cards currently tops out at 32 GB (RTX 5090), with 8 to 24 GB far more common, and cannot be expanded without replacing the card.
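The sizing rule and the idle-overhead figure from the earlier answer can be combined into one small calculation. This sketch reads `MemTotal` from `/proc/meminfo`, applies the twice-RAM ceiling, and estimates idle overhead at 0.1% of disksize. The function names are illustrative, and the 0.1% figure is the approximate one quoted above.

```python
from pathlib import Path


def mem_total_kib(meminfo_text: str) -> int:
    """Extract MemTotal (in KiB) from /proc/meminfo content."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def zram_sizing(mem_kib: int, max_multiple: int = 2,
                idle_per_mille: int = 1) -> tuple[int, int]:
    """Return (max disksize, approximate idle overhead), both in KiB.

    max_multiple:   ceiling as a multiple of physical RAM (2 = twice RAM)
    idle_per_mille: idle overhead in thousandths of disksize (1 = 0.1%)
    """
    disksize = mem_kib * max_multiple
    idle = disksize * idle_per_mille // 1000
    return disksize, idle


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        disksize, idle = zram_sizing(mem_total_kib(meminfo.read_text()))
        print(f"max sensible zram disksize: {disksize / 1024**2:.1f} GiB")
        print(f"approx idle overhead:       {idle / 1024:.1f} MiB")
    else:
        print("/proc/meminfo not available on this system")
```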