Kioxia and Dell Technologies have fitted 9.8 PB of flash into a single 2U server[^1], using forty 245.76 TB LC9 Series SSDs[^2] in a Dell PowerEdge R7725xd. A full rack of twenty units, as Blocks & Files notes[^3], would carry 196 PB. The milestone is real, but the operational implication is sharper: at this density, rebuild bandwidth and drive endurance become the hard constraints, and raw $/GB stops being the primary design variable.
The Chassis and the Drives
The PowerEdge R7725xd[^1] runs AMD EPYC 9005 processors and packs 40 E3.L NVMe SSDs. The drives are Kioxia’s LC9 Series, built on 8th-generation BiCS FLASH QLC NAND with CMOS-directly-bonded-to-array (CBA) packaging and 32 stacked 2Tb dies. Kioxia rates them at up to 12 GB/s sequential read, 3 GB/s sequential write, and 1.3 million random read IOPS[^2]. Endurance is 0.3 DWPD, which is standard for QLC but worth noting when each drive holds a quarter petabyte.
Kioxia’s own comparison claims that a 9.8 PB configuration built from 30.72 TB SSDs would need seven extra servers and 280 additional drives, and would draw 8 times the power[^1]. The server also offers up to five 400 Gbps NICs[^4] and is air-cooled, a deliberate fit for GPU-heavy AI pipelines where liquid cooling is already spoken for.
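The drive and server counts in that comparison can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes decimal units and 40 bays per server for both configurations. The power claim is not modeled, since the source gives no per-drive wattage.

```python
# Capacities in decimal GB, kept as integers to avoid floating-point rounding.
LC9_GB = 245_760         # 245.76 TB per drive, from the article
SMALL_GB = 30_720        # 30.72 TB per drive, from the article
BAYS_PER_SERVER = 40     # E3.L bays per server, assumed identical for both fleets


def fleet_for_capacity(target_gb: int, drive_gb: int, bays: int = BAYS_PER_SERVER):
    """Return (drives, servers) needed to reach target_gb of raw capacity."""
    drives = -(-target_gb // drive_gb)   # ceiling division on integers
    servers = -(-drives // bays)
    return drives, servers


target_gb = BAYS_PER_SERVER * LC9_GB     # raw capacity of one full chassis

big = fleet_for_capacity(target_gb, LC9_GB)
small = fleet_for_capacity(target_gb, SMALL_GB)

print(f"raw capacity: {target_gb / 1e6:.2f} PB")
print(f"245.76 TB drives: {big[0]} drives in {big[1]} server(s)")
print(f"30.72 TB drives:  {small[0]} drives in {small[1]} server(s)")
print(f"extra servers: {small[1] - big[1]}, extra drives: {small[0] - big[0]}")
```

Under these assumptions the smaller-drive fleet comes out to 320 drives in eight servers, which matches the seven extra servers and 280 additional drives quoted above.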
The Rebuild Time Problem
A 245 TB NVMe drive does not rebuild instantly. Murat Karslioglu estimates[^5] that at realistic sustained throughput, 30 to 50 percent of peak, a rebuild takes 14 to 27 hours on Gen5 or Gen4 links. During that window, a second failure in the same redundancy group risks data loss.
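To get a feel for how sensitive the window is to sustained throughput, a minimal sketch follows. The two rates used in the example (5.0 and 2.5 GB/s) are assumptions picked to land near the cited range; they are not figures from Karslioglu's estimate, and the function name is hypothetical.

```python
def rebuild_hours(capacity_tb: float, sustained_gb_per_s: float) -> float:
    """Hours needed to write capacity_tb at a constant sustained rate.

    Uses decimal units (1 TB = 1000 GB) and ignores any foreground I/O
    competing with the rebuild stream.
    """
    if sustained_gb_per_s <= 0:
        raise ValueError("sustained rate must be positive")
    seconds = capacity_tb * 1000 / sustained_gb_per_s
    return seconds / 3600


# Illustrative sustained rates, assumed for this example only.
for rate in (5.0, 2.5):
    print(f"{rate:.1f} GB/s -> {rebuild_hours(245.76, rate):.1f} h")
```

Halving the sustained rate doubles the exposure window, which is why rebuild throttling under production load matters as much as the drive's peak specification.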
The math is unforgiving. Mean Time To Data Loss (MTTDL) scales inversely with Mean Time To Repair (MTTR) raised to the parity count. Doubling the rebuild time does not merely halve durability: for double-parity schemes, MTTDL falls by a factor of four, and for triple parity, by a factor of eight. This is not a theoretical edge case. It is the central operational concern when a single chassis holds forty of these drives.
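Written out, the scaling relationship looks like the following. This is the usual first-order approximation, assuming independent drive failures and a fixed failure rate; the symbols $p$ (parity count) and $k$ (factor by which rebuild time grows) are introduced here for illustration.

```latex
\mathrm{MTTDL} \;\propto\; \frac{1}{\mathrm{MTTR}^{\,p}}
\qquad\Longrightarrow\qquad
\frac{\mathrm{MTTDL}(k \cdot \mathrm{MTTR})}{\mathrm{MTTDL}(\mathrm{MTTR})} \;=\; \frac{1}{k^{\,p}}

% Doubling the rebuild time (k = 2):
%   double parity (p = 2):  1 / 2^2 = 1/4
%   triple parity (p = 3):  1 / 2^3 = 1/8
```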
What This Means for Cluster Design
The design tradeoff has shifted. Capacity planning was the bottleneck for the last decade; now recovery-time SLOs are. Teams specifying storage for self-hosted or on-prem infrastructure must treat erasure-coding parameters as a first-class decision, not a default preset.
Wider parity groups, declustered placement, and faster rebuild networks all become necessary. The 400 Gbps NICs in the R7725xd[^4] are not there for show; they are structural relief for a rebuild stream that must move nearly 250 TB before the next drive rolls the dice. Endurance also deserves scrutiny. At 0.3 DWPD, the rated ceiling is one full-drive rewrite roughly every 3.3 days, or about 74 TB of writes per drive per day. Workloads with high churn will chew through that faster than operators accustomed to 30 TB drives might expect.
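The endurance figure is easier to reason about as a daily write budget. The sketch below converts the rated 0.3 DWPD into terabytes per day and an equivalent average write rate; the helper names are hypothetical, and it assumes decimal units and writes spread evenly across the day.

```python
SECONDS_PER_DAY = 86_400


def daily_write_budget_tb(capacity_tb: float, dwpd: float) -> float:
    """Terabytes that may be written per day at the rated DWPD."""
    return capacity_tb * dwpd


def average_write_rate_gb_per_s(capacity_tb: float, dwpd: float) -> float:
    """Average write rate that exactly consumes the daily budget."""
    return daily_write_budget_tb(capacity_tb, dwpd) * 1000 / SECONDS_PER_DAY


budget_tb = daily_write_budget_tb(245.76, 0.3)
days_per_full_write = 1 / 0.3
avg_rate = average_write_rate_gb_per_s(245.76, 0.3)

print(f"daily budget: {budget_tb:.1f} TB")
print(f"one full drive write every {days_per_full_write:.1f} days")
print(f"equivalent average write rate: {avg_rate:.2f} GB/s")
```

The last figure is the useful one for planning: a workload that averages more than that write rate per drive around the clock exceeds the rated endurance, even though it sits well under the drive's sequential write specification.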
The Competitive Landscape
Kioxia and Dell are first to ship, but they are not alone. Blocks & Files reports[^3] that Micron is developing the 6600 ION, Sandisk the UltraQLC SN670, SK Hynix the AIN D, and Samsung is working on nearline SSDs with a stated roadmap toward 1 PB per drive. Scality provided the Samsung detail, which suggests the race is toward half-petabyte and petabyte-class devices rather than incremental density gains.
The system will be on display at Dell Technologies World 2026 in Las Vegas[^4]. For practitioners, the demo worth watching is not the capacity number. It is whether Dell and Kioxia publish sustained-rebuild benchmarks under load, because that figure, not the petabyte count, determines whether this chassis belongs in a production cluster.
Frequently Asked Questions
What happens if the chassis itself fails, not just a single drive?
A backplane, power-supply, or motherboard failure in the R7725xd takes all 40 drives offline simultaneously. That is a correlated failure that in-node erasure coding cannot remediate. Operators must treat each 2U enclosure as a single 9.8 PB failure domain and stripe across multiple chassis, which partially offsets the density advantage.
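One way to make the failure-domain rule concrete is a placement check: a stripe survives the loss of a whole chassis only if no chassis holds more shards than the scheme has parity. The sketch below is a generic illustration, not the placement logic of any particular storage system; the function name and chassis labels are hypothetical.

```python
from collections import Counter


def survives_chassis_loss(shard_chassis: list[str], parity: int) -> bool:
    """Return True if losing any single chassis removes at most `parity` shards.

    shard_chassis lists, for one erasure-coded stripe, the chassis that
    holds each shard (data and parity alike).
    """
    if not shard_chassis:
        raise ValueError("stripe has no shards")
    worst_case_loss = max(Counter(shard_chassis).values())
    return worst_case_loss <= parity


# An 8+3 stripe with one shard per chassis tolerates a chassis failure.
spread = [f"chassis-{i}" for i in range(11)]
print(survives_chassis_loss(spread, parity=3))

# The same stripe packed into one chassis does not.
packed = ["chassis-0"] * 11
print(survives_chassis_loss(packed, parity=3))
```

A check like this implies that a wide stripe needs roughly as many chassis as shards divided by parity, which is where the density advantage starts to be traded back for rack space.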
How much worse does the rebuild window get if Samsung ships its planned 1 PB SSDs?
At the same sustained-throughput percentages used for the 245 TB estimate, a 1 PB drive would rebuild in roughly 57 to 110 hours. Under triple parity, that roughly fourfold increase in MTTR would degrade MTTDL by approximately 67x (MTTDL ∝ 1/MTTR³), assuming the scaling relationship holds.
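The extrapolation can be reproduced with a short calculation. This sketch assumes rebuild time scales linearly with capacity at a fixed sustained rate, and that MTTDL varies inversely with MTTR raised to the parity count; both helper names are hypothetical.

```python
def scaled_rebuild_hours(base_hours: float, base_tb: float, new_tb: float) -> float:
    """Scale a rebuild time linearly with drive capacity."""
    return base_hours * new_tb / base_tb


def mttdl_degradation(mttr_ratio: float, parity: int) -> float:
    """Factor by which MTTDL shrinks when MTTR grows by mttr_ratio."""
    return mttr_ratio ** parity


BASE_TB = 245.76   # LC9 capacity, from the article
NEW_TB = 1000.0    # hypothetical 1 PB drive

ratio = NEW_TB / BASE_TB
low = scaled_rebuild_hours(14, BASE_TB, NEW_TB)
high = scaled_rebuild_hours(27, BASE_TB, NEW_TB)

print(f"capacity ratio: {ratio:.2f}x")
print(f"rebuild window: {low:.0f} to {high:.0f} hours")
for parity in (2, 3):
    print(f"parity {parity}: MTTDL lower by {mttdl_degradation(ratio, parity):.0f}x")
```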
Why is the rebuild 14-27 hours when the server has 400 Gbps NICs?
At 400 Gbps (50 GB/s), transferring 245 TB takes under 90 minutes in theory. The bottleneck is the destination drive’s write path: the LC9 is rated at 3 GB/s sequential write, and rebuild traffic involves random I/O patterns and inline parity calculation that can pull sustained throughput below that figure. The NICs help with cross-node rebuilds but cannot make a single drive ingest data faster than its own write pipeline allows.
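The gap between the two stages can be shown by treating the rebuild as a pipeline limited by its slowest stage. The sketch below uses only the two rates quoted above (one 400 Gbps link and the 3 GB/s rated sequential write); the stage names and functions are hypothetical, and real rebuilds involve more stages than these two.

```python
def slowest_stage(rates_gb_per_s: dict[str, float]) -> tuple[str, float]:
    """Return the (name, rate) of the stage with the lowest throughput."""
    name = min(rates_gb_per_s, key=rates_gb_per_s.get)
    return name, rates_gb_per_s[name]


def transfer_hours(capacity_tb: float, rate_gb_per_s: float) -> float:
    """Hours to move capacity_tb at a constant rate, in decimal units."""
    return capacity_tb * 1000 / rate_gb_per_s / 3600


stages = {
    "network_400gbps": 400 / 8,   # 400 Gbps expressed as GB/s
    "drive_write": 3.0,           # rated sequential write, best case
}

name, rate = slowest_stage(stages)
network_minutes = transfer_hours(245.76, stages["network_400gbps"]) * 60

print(f"network alone: {network_minutes:.0f} minutes")
print(f"bottleneck: {name} at {rate} GB/s -> {transfer_hours(245.76, rate):.1f} hours")
```

Adding NICs raises only the first entry in the table, so the result does not move until the write path does.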
How does the rebuild time compare against a 9.8 PB fleet of 30 TB drives?
Kioxia’s reference 30.72 TB configuration spreads 9.8 PB across eight servers with roughly 320 drives. Each drive holds one-eighth the data, so a single rebuild at comparable per-drive throughput finishes in roughly 1.75 to 3.4 hours, about eight times faster. The larger fleet therefore maintains dramatically higher MTTDL despite having more individual failure candidates.