nand2mario’s z386 is a compact 80386-compatible FPGA core that skips the usual clean-room instruction reimplementations and instead wires its control logic directly to Intel’s original 386 microcode ROM, extracted from die photographs by a separate reverse-engineering effort. On a DE10-Nano board it runs Doom at 16.5 FPS, boots DOS, and executes protected-mode software. The approach works. It also sits on legally uncertain ground that none of the project’s coverage has addressed.
What z386 Is and Why It Matters
Most open-source x86 FPGA cores implement the instruction set behaviorally: each instruction is modeled as a separate RTL block, verified against a software reference, and released under an open-source license. ao486, the best-known 486SX-compatible core, follows exactly this path, using Bochs as its behavioral reference.
z386 takes a different route. Rather than re-implementing what each 386 instruction does, it implements the hardware structures needed to execute Intel’s own microcode the way the original silicon did. The RTL provides the datapath, the ALU, the paging unit, and the bus interface. A recovered 2,560-entry, 37-bit-wide microcode ROM provides the control program that sequences micro-operations through those structures. The result is a CPU that runs instruction streams the same way the 1985 Intel part did, not merely one that produces compatible results.
nand2mario developed z386 in SystemVerilog from January to April 2026 and describes it as a compact core (roughly 8K lines of code by cloc, versus ao486’s 17.6K). It runs on Altera Cyclone V and Gowin GW5A FPGAs.
How Intel’s 386 Microcode Was Extracted
The microcode ROM that z386 depends on was not supplied by Intel. It was recovered from high-resolution die images of an original 80386 by a team consisting of reenigne, gloriouscow, smartest blob, and Ken Shirriff, using a pipeline of image processing, neural-network-based pattern recognition, and manual analysis. The ROM contains 2,560 entries, each 37 bits wide, encoding the micro-operations that drive the processor’s internal datapath.
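The ROM’s stated dimensions fix two numbers that any reimplementation has to accommodate: the width of the microcode address and the raw storage the contents occupy. The short Python sketch below only does that arithmetic from the 2,560 × 37 figures. It says nothing about how z386 actually lays the ROM out in FPGA memory, and the helper name `rom_geometry` is hypothetical.

```python
import math

# Dimensions of the recovered 80386 microcode ROM, as quoted in the text.
ROM_ENTRIES = 2560
WORD_BITS = 37

def rom_geometry(entries: int, word_bits: int) -> dict:
    """Hypothetical helper: address width and raw storage for a ROM of this shape."""
    addr_bits = math.ceil(math.log2(entries))  # bits needed to index every entry
    total_bits = entries * word_bits           # raw contents, ignoring any padding
    return {
        "addr_bits": addr_bits,
        "total_bits": total_bits,
        "total_bytes": math.ceil(total_bits / 8),
    }

print(rom_geometry(ROM_ENTRIES, WORD_BITS))
# {'addr_bits': 12, 'total_bits': 94720, 'total_bytes': 11840}
```

That works out to a 12-bit microcode address and about 11.6 KiB of raw contents.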
This extraction built on prior work. nand2mario’s earlier z8086 project demonstrated the same approach for the Intel 8086. The 386 microcode proved considerably harder: the micro-operations are denser, more contextual, and full of implicit assumptions about hidden hardware state that the RTL must replicate exactly or the microcode produces wrong results. Getting the microcode right was necessary but not sufficient; the hardware structures surrounding it had to match Intel’s original design closely enough that the micro-ops would sequence correctly.
The 386’s instruction decoder uses two PLA-style lookup tables: a Control PLA that determines which byte comes next in an instruction stream, and an Entry PLA that determines where in the microcode ROM to start.
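The two-stage lookup can be illustrated with a toy model. A PLA matches its input against a set of product terms, each effectively a mask-and-compare on the input bits, and emits the outputs of the terms that fire. The Python sketch below models this with first-match priority, which is a simplification: a real PLA ORs together the outputs of every matching term. The table contents, the entry addresses, and the `pla_lookup` and `decode` names are invented for illustration and are not the real 386 PLA contents; only the x86 opcode meanings (0x90 NOP, 0x05 ADD EAX, imm32 in 32-bit operand mode, 0x88 to 0x8B MOV with a ModR/M byte) are standard.

```python
from typing import Optional, Sequence, Tuple

# A PLA row: (mask, match, output). The row fires when (opcode & mask) == match.
Row = Tuple[int, int, object]

def pla_lookup(rows: Sequence[Row], opcode: int) -> Optional[object]:
    """Return the output of the first matching row (real PLAs OR all matches)."""
    for mask, match, out in rows:
        if opcode & mask == match:
            return out
    return None

# Invented tables for illustration; NOT the real 386 PLA contents.
CONTROL_PLA = [            # what the decoder must consume after the opcode
    (0xFF, 0x90, "none"),  # NOP: no further bytes
    (0xFF, 0x05, "imm32"), # ADD EAX, imm32: a 4-byte immediate follows
    (0xFC, 0x88, "modrm"), # MOV 88-8B: a ModR/M byte follows
]
ENTRY_PLA = [              # hypothetical starting addresses in the microcode ROM
    (0xFF, 0x90, 0x010),
    (0xFF, 0x05, 0x020),
    (0xFC, 0x88, 0x030),   # the four MOV forms share one entry point here
]

def decode(opcode: int):
    return pla_lookup(CONTROL_PLA, opcode), pla_lookup(ENTRY_PLA, opcode)

print(decode(0x8B))  # ('modrm', 48)
```

Covering 0x88 to 0x8B with one masked term is the sort of sharing that lets a PLA stay smaller than a full 256-entry lookup table.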
Architecture: Eight Units, One Recovered ROM
z386 is organized into roughly eight functional units: the microcode sequencer, the instruction decoder (with its two PLAs), the ALU, the barrel shifter, the paging unit with a 32-entry TLB, the bus interface, the register file, and a 16 KB unified L1 cache. That cache is a deliberate addition the original 80386 lacked; it is 4-way set-associative with 16-byte lines, and it accounts for a significant fraction of the core’s performance on modern FPGA hardware. The core also uses FPGA DSP blocks for multiplication rather than implementing a multiplier in general-purpose logic.
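The cache parameters determine how an address is split. With 16 KB of capacity, 4 ways, and 16-byte lines there are 16384 / (4 × 16) = 256 sets, so 4 address bits select the byte within a line and 8 select the set. The Python sketch below assumes a conventional tag/index/offset split over 32-bit physical addresses; the source does not say whether z386 indexes its cache exactly this way, and `cache_split` is a hypothetical name.

```python
def cache_split(addr: int, size: int = 16 * 1024, ways: int = 4,
                line: int = 16, addr_bits: int = 32):
    """Split an address into (tag, set index, byte offset) for a set-associative cache."""
    sets = size // (ways * line)  # 16 KB / (4 ways * 16 B) = 256 sets
    for n in (line, sets):
        if n & (n - 1):
            raise ValueError("line size and set count must be powers of two")
    offset_bits = line.bit_length() - 1              # log2(16) = 4
    index_bits = sets.bit_length() - 1               # log2(256) = 8
    tag_bits = addr_bits - offset_bits - index_bits  # 20 with the defaults
    offset = addr & (line - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = (addr >> (offset_bits + index_bits)) & ((1 << tag_bits) - 1)
    return tag, index, offset

print([hex(f) for f in cache_split(0x12345678)])  # ['0x12345', '0x67', '0x8']
```

Each of the 256 sets holds four tagged lines, and a lookup compares the address tag against all four.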
The microcode ROM drives the sequencer directly. Each fetched instruction is decoded through the PLAs into a starting address in the ROM, and the sequencer walks through the micro-operations for that instruction, dispatching control signals to the datapath units. This is structurally identical to how the original 386 worked. The difference is that the “silicon” is an FPGA and the ROM contents were recovered from die images rather than mask-programmed into the chip.
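The sequencer loop described above can be sketched in a few lines of Python: fetch a micro-op, dispatch its control signal to the datapath, then choose the next ROM address. This is a deliberately simplified model. The `MicroOp` fields (`action`, `jump`, `end`), the `run_instruction` helper, and the toy ROM are hypothetical, and the real 37-bit micro-instruction format and its conditional branches are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class MicroOp:
    action: str        # control signal sent to the datapath (hypothetical encoding)
    jump: int = -1     # explicit next ROM address, or -1 to fall through
    end: bool = False  # True on the last micro-op of an instruction

def run_instruction(rom: List[MicroOp], entry: int,
                    datapath: Callable[[str], None], max_steps: int = 1000) -> int:
    """Walk the ROM from `entry` until a micro-op marked `end`; return the count executed."""
    upc = entry                                   # micro-program counter
    for executed in range(1, max_steps + 1):
        uop = rom[upc]
        datapath(uop.action)                      # dispatch control signals
        if uop.end:
            return executed
        upc = uop.jump if uop.jump >= 0 else upc + 1
    raise RuntimeError("micro-program did not terminate")

# Toy ROM for one made-up instruction whose entry point is address 1.
rom = [
    MicroOp("idle"),
    MicroOp("read operand"),
    MicroOp("alu add", jump=4),
    MicroOp("never reached"),
    MicroOp("write result", end=True),
]
trace: List[str] = []
print(run_instruction(rom, entry=1, datapath=trace.append), trace)
# 3 ['read operand', 'alu add', 'write result']
```

In a core like z386 the entry address would come from the Entry PLA, and `datapath` stands in for the RTL units each micro-op controls.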
z386 vs ao486: Microcode Fidelity vs Clean-Room Implementation
The two cores target different generations (386 vs 486SX), so the comparison is not a direct benchmark contest. It is a structural one: two approaches to the same problem class, running on the same FPGA board.
| Metric | z386 | ao486 |
|---|---|---|
| Target architecture | 80386 | 486SX |
| Code size (cloc) | 8K lines | 17.6K lines |
| ALUTs (DE10-Nano) | 18K | 21K |
| BRAM (DE10-Nano) | 116K | 131K |
| Clock (DE10-Nano) | 85 MHz | 90 MHz |
| Doom FPS (max detail) | 16.5 | 21.0 |
| Control program | Extracted Intel microcode ROM | Behavioral RTL, Bochs-verified |
| Approach | Depends on extracted Intel microcode | Clean-room behavioral model |
z386 uses roughly half the code and fewer FPGA resources, but delivers lower performance. That gap reflects a real tradeoff. A clean-room core can optimize each instruction’s implementation independently because it controls the micro-architecture entirely. A microcode-driven core is constrained by the sequencing the ROM dictates. If Intel’s microcode takes four cycles for an operation that a behavioral model could do in two, the microcode core runs it in four. The fidelity is the point, not the speed.
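The table’s own figures give a rough sense of how much of the gap survives once clock speed is factored out. Doom frames per second per MHz is a crude proxy, and the two cores implement different processor generations, so the result below is an illustration rather than a benchmark; `fps_per_mhz` is a hypothetical helper name.

```python
# DE10-Nano figures from the table above (Doom, maximum detail).
cores = {
    "z386": {"mhz": 85, "fps": 16.5},
    "ao486": {"mhz": 90, "fps": 21.0},
}

def fps_per_mhz(core: dict) -> float:
    """Frames per second delivered per MHz of core clock."""
    return core["fps"] / core["mhz"]

zc, ac = cores["z386"], cores["ao486"]
fps_ratio = zc["fps"] / ac["fps"]                         # raw frame-rate ratio
clock_ratio = zc["mhz"] / ac["mhz"]                       # clock-speed ratio
per_clock_ratio = fps_per_mhz(zc) / fps_per_mhz(ac)       # ratio after normalizing clock

print(f"{fps_ratio:.2f} {clock_ratio:.2f} {per_clock_ratio:.2f}")  # 0.79 0.94 0.83
```

On these numbers the 5 MHz clock deficit accounts for only about 6% of z386’s roughly 21% lower frame rate. Per clock, z386 still delivers about 17% less, which is where ROM-dictated sequencing, along with the cores’ other architectural differences, would show up.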
Hackaday’s coverage frames the project as archaeological reconstruction, a characterization that fits. z386 is not trying to be the fastest retro x86 core. It is trying to be the most faithful to how the original silicon actually worked.
Performance on Real Hardware: Doom at 16.5 FPS
On a DE10-Nano (Cyclone V FPGA), z386 runs at 85 MHz and achieves 16.5 FPS in Doom at maximum detail settings. nand2mario benchmarks it as performing roughly like a 70 MHz cached 386 or a low-end 486, though with a worse CPI (cycles per instruction) than historical 386 processors. The CPI penalty follows from the microcode-driven approach: the recovered ROM dictates Intel’s original micro-op sequences, which were tuned for the original silicon’s timing rather than an FPGA’s, and z386 cannot shorten them to suit its new substrate.
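The 70 MHz comparison also yields a rough CPI figure. Because throughput is clock rate divided by average CPI, a core that needs 85 MHz to match a 386 at 70 MHz averages about 85 / 70 ≈ 1.21 times as many cycles per instruction. The Python sketch below captures that relationship. It assumes the same instruction mix on both machines, `relative_cpi` is a hypothetical helper, and since the 70 MHz equivalence is itself approximate, the result is a ballpark rather than a measurement.

```python
def relative_cpi(actual_mhz: float, equivalent_mhz: float) -> float:
    """CPI of a core relative to a reference chip that matches its throughput.

    Throughput = clock / CPI, so equal throughput at different clocks implies
    CPI_actual / CPI_reference = actual_mhz / equivalent_mhz.
    """
    if actual_mhz <= 0 or equivalent_mhz <= 0:
        raise ValueError("clock rates must be positive")
    return actual_mhz / equivalent_mhz

print(f"{relative_cpi(85, 70):.2f}")  # 1.21
```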
The software compatibility list is incomplete but already covers real workloads. z386 boots DOS 6 and DOS 7, runs protected-mode programs under the DOS/4GW and DOS/32A extenders, and handles Doom, Doom II, Cannon Fodder, FreeDOS, HIMEM, EMM386, and SeaBIOS. As of May 2026, Windows is not yet supported, and protected-mode coverage remains a work in progress.
The IP Question Nobody Is Asking
None of the coverage from Hackaday, LinuxIAC, or SesameDisk addresses the intellectual-property status of the extracted microcode. This is not a minor omission.
ao486’s clean-room approach has a legal rationale. By modeling instruction behavior against Bochs (itself a clean-room software emulator) and never referencing Intel’s proprietary microcode, ao486’s authors built a defensible argument that their RTL is an original work.
z386 cannot make the same claim. Its control program is a verbatim copy of Intel’s mask-programmed ROM, recovered through die imaging. Whether reusing that recovered code counts as fair use, permissible reverse engineering, or copyright infringement has not been tested in court. Microcode occupies an awkward space in IP law: it is functional (it controls hardware), which weighs against copyright protection, but it is also expressive (it is a program written by Intel engineers), which weighs in favor.
For retrocomputing practitioners, the practical risk is low. Intel is unlikely to pursue a hobbyist FPGA project reviving a 41-year-old processor it no longer manufactures. But the structural question matters for the model this project represents: if microcode extraction becomes the standard approach for vintage-CPU revival (and it is becoming more feasible as die imaging and ML-assisted extraction mature), the community is building on a foundation with unresolved legal footing.
What Comes Next
As of May 2026, z386’s current milestone is a usable DOS and protected-mode platform that runs real software. The road ahead is predictable: Windows support, more complete protected-mode coverage, and potentially targeting additional FPGA platforms beyond the Cyclone V and Gowin GW5A.
The more interesting trajectory is methodological. If the z8086-to-z386 progression holds, the extraction-plus-RTL approach could extend to later processors: the 486, the Pentium, perhaps beyond. Each step raises the extraction difficulty (later microcode is larger, more encoded, and protected by progressively more aggressive obfuscation) and the IP stakes (Intel has shown willingness to enforce x86 IP against modern competitors, as the battles with AMD and VIA demonstrated). The bottleneck for accurate vintage-CPU revival has shifted from reverse-engineering instruction behavior to extracting and justifying reuse of proprietary microcode. The technical problem is being solved. The legal one is being deferred.
Frequently Asked Questions
What license covers ao486’s code, and how does it compare to z386’s IP situation?
ao486 splits its licensing: RTL components are BSD-licensed, while portions derived from Bochs carry the LGPL. This dual structure reflects its clean-room provenance, where behavioral modeling against Bochs kept most of the codebase under permissive terms. z386 has no comparable licensing clarity because its control program is a recovered copy of Intel’s proprietary mask ROM, not an independently authored work.
What is the 111-bit instruction word produced by the 386’s PLA decoders?
The Control PLA and Entry PLA together generate a 111-bit decoded instruction word that tells the microcode sequencer where to begin and what conditions to test. Its format was documented in Jim Slager’s 1986 ICCD paper. z386 must reproduce the exact bit positions within this encoding because the microcode ROM’s conditional branches reference specific bits by position. A wrong bit assignment in the decoder causes the microcode to misroute, producing bugs that look like random state corruption rather than repeatable instruction failures.
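Position-dependence is easy to demonstrate on a toy word. Python integers are arbitrary-precision, so a 111-bit decoded word fits in a single `int`, and a micro-branch that tests one bit by position is a shift and a mask. The field names, bit positions, and the `field` and `branch_taken` helpers below are invented for illustration and do not reflect the layout documented in Slager’s paper.

```python
WORD_BITS = 111

# Hypothetical layout: (lsb, width) pairs invented for illustration only.
FIELDS = {
    "entry_addr": (0, 12),
    "has_modrm": (12, 1),
    "cond_x": (40, 1),
}

def field(word: int, lsb: int, width: int) -> int:
    """Extract `width` bits starting at bit `lsb` of a decoded word."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("decoded word must fit in 111 bits")
    return (word >> lsb) & ((1 << width) - 1)

def branch_taken(word: int, bit: int) -> bool:
    """A conditional micro-branch that tests a single bit of the word by position."""
    return field(word, bit, 1) == 1

word = (1 << 40) | (1 << 12) | 0x030
print(field(word, *FIELDS["entry_addr"]))           # 48
print(branch_taken(word, FIELDS["cond_x"][0]))      # True
print(branch_taken(word, FIELDS["cond_x"][0] + 1))  # False: decoder wired one bit off
```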
Does z386’s CPI penalty vary by instruction type?
The penalty is uneven. Instructions whose microcode was tuned for the original 386’s bus timing lose the most on FPGA fabric with different latency characteristics. The added 16 KB L1 cache offsets memory-bound instructions, but ALU-heavy sequences still execute at the ROM’s prescribed cycle counts. A behavioral core like ao486 can reorder or compress micro-operations for a target FPGA, which is why ao486 achieves higher throughput on the same hardware despite implementing a more complex instruction set.
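The unevenness follows from how average CPI is composed: it is the frequency-weighted sum of each instruction class’s CPI, so an improvement that only touches memory-bound classes leaves the ALU-bound term untouched. The Python sketch below uses invented fractions and CPI values purely to show the arithmetic; they are not measurements of z386 or ao486, and `average_cpi` is a hypothetical helper.

```python
from typing import Dict, Tuple

def average_cpi(mix: Dict[str, Tuple[float, float]]) -> float:
    """Weighted CPI; `mix` maps instruction class -> (fraction of instructions, CPI)."""
    total = sum(frac for frac, _ in mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(frac * cpi for frac, cpi in mix.values())

# Invented numbers: a cache lowers memory-bound CPI but not ALU-bound CPI.
no_cache = {"memory": (0.5, 6.0), "alu": (0.5, 3.0)}
with_cache = {"memory": (0.5, 4.0), "alu": (0.5, 3.0)}
print(average_cpi(no_cache), average_cpi(with_cache))  # 4.5 3.5
```

In this toy mix, even a cache that made memory-bound instructions free would leave the average at 1.5, the contribution of the ALU term alone.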
Could the microcode-extraction approach extend to the 486 or Pentium?
Technically yes, but each generation raises the difficulty. The 8086 microcode that z8086 uses is relatively straightforward. The 386 introduced denser encoding and implicit hardware-state dependencies. The 486 and Pentium add larger ROMs, more encoded fields, and on-chip caches, pipelining, and branch prediction whose control logic is interleaved with the microcode. Replicating that interleaving in RTL requires matching far more hidden state than the 386 demanded, and Intel employed progressively more aggressive obfuscation of the microcode storage itself in later parts.