For interpreted languages, the manifest your build pipeline produced is not the same thing as what actually runs. A preprint released on arXiv last week, MEM-SBOM, from Virginia Commonwealth University and the Volatility Foundation, argues that build-time SBOMs miss dynamically imported, version-drifted, or injected code that only exists in a live process. The fix the authors propose is to stop trusting manifests and start reading memory.
What does “what you see is not what you execute” mean?
A software bill of materials is supposed to be an inventory of what is in your software. For Python it has always been an inventory of what was available to be loaded, not what actually got loaded.
The MEM-SBOM paper (arXiv:2606.22827) formalizes a taxonomy the supply-chain field already half-knew: a build SBOM describes what was compiled or packaged; a deployed SBOM enumerates what is installed on disk; a runtime SBOM enumerates what is resident in memory. The gap between them is the gap between “the install” and “the execution,” and in dynamic languages that gap carries real code.
Python resolves imports at execution time. Conditional imports, plugins pulled in via importlib.import_module(), monkey-patches, and sub-interpreters all mean the set of code that runs is a subset, and occasionally a superset, of what the manifest declares. A build-time SBOM reads requirements.txt, pyproject.toml, or setup.py and reports the declared graph. The interpreter, at runtime, may load something else entirely, and the manifest will never know.
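The gap between the declared graph and the loaded set can be illustrated from inside a running interpreter. The sketch below is an in-process analogy, not the paper's memory-forensics method: `DECLARED` is a hypothetical stand-in for names parsed from a manifest, and the standard-library module `colorsys` plays the role of a plugin whose name is only known at runtime. Note that import names and distribution names often differ, which a real comparison would have to reconcile.

```python
import importlib
import sys

# Hypothetical stand-in for the top-level names a manifest declares.
DECLARED = {"json", "math"}


def loaded_top_level():
    """Return the top-level names of every module currently in sys.modules."""
    return {name.split(".")[0] for name in list(sys.modules)}


def undeclared(declared):
    """Names resident in the interpreter that the manifest never listed."""
    return loaded_top_level() - declared


# The plugin name is resolved at runtime, so a manifest reader cannot see it.
plugin_name = "colorsys"
plugin = importlib.import_module(plugin_name)

extra = undeclared(DECLARED)
print(plugin_name in extra)
```

The set difference is deliberately naive: it also reports every standard-library and interpreter-internal module, so a practical version would filter those out before comparing against the manifest.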
Why build-time SBOMs systematically miss Python
Build-time SBOMs fail for Python because the language resolves dependencies during execution rather than at packaging time, and no manifest can enumerate code that did not exist when the build ran.
The paper frames the gap plainly: SBOMs built from metadata or filesystem artifacts “fail to capture the components loaded and executed at runtime, especially in dynamic ecosystems such as Python.” Existing tools recover what is declared and what sits on disk; the abstract positions MEM-SBOM as recovering “all runtime packages missed by existing SBOM tools.” Version drift, dynamically loaded plugins, and conditional imports are the categories of code that fall through the manifest’s net.
How MEM-SBOM extracts the runtime SBOM from memory
MEM-SBOM is a suite of Volatility 3 plugins that recovers modules from the CPython interpreter’s internal structures, resolves their versions, and analyzes bytecode to build dependency graphs and flag reachable vulnerable functions. The abstract describes it as “the first memory forensics framework that generates SBOMs directly from the runtime state of Python applications.”
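To make the idea of a runtime inventory concrete, here is a minimal in-process sketch that lists the distributions behind currently loaded modules and emits a CycloneDX-shaped dictionary. It is an assumption-laden analogy: MEM-SBOM reads these facts from a memory image through Volatility 3 plugins, whereas this code asks the live interpreter through `sys.modules` and `importlib.metadata`. The function name `runtime_components` and the choice of `specVersion` are illustrative, and `packages_distributions()` requires Python 3.10 or later.

```python
import sys
from importlib import metadata


def runtime_components():
    """Build a CycloneDX-shaped dict from modules loaded in this process."""
    # Map top-level import names to distribution names (Python 3.10+).
    if hasattr(metadata, "packages_distributions"):
        mapping = metadata.packages_distributions()
    else:
        mapping = {}

    top_level = {name.split(".")[0] for name in list(sys.modules)}
    dists = set()
    for name in top_level:
        dists.update(mapping.get(name, []))

    components = []
    for dist in sorted(dists):
        try:
            version = metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue  # metadata vanished or was never installed
        components.append({"type": "library", "name": dist, "version": version})

    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",  # assumed for illustration
        "version": 1,
        "components": components,
    }


bom = runtime_components()
print(len(bom["components"]), "loaded distributions")
```

Because this runs inside the process it inspects, it inherits exactly the weakness the paper is designed to avoid: code that controls the interpreter can also falsify the answer.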
The choice of volatile memory is deliberate. Instrumentation-based runtime SBOMs require monitoring to be deployed in advance and to persist throughout execution, conditions the authors call difficult to satisfy in production and incident-response settings. A memory image, by contrast, is a forensic snapshot that can be taken after the fact, including during incident response when an attacker has already tampered with the running system.
On a 51-application evaluation set, MEM-SBOM reports 100% extraction accuracy for runtime packages. Treat that number with the skepticism a preprint deserves: it is 51 Python applications in a lab, under a non-adversarial baseline. What it does establish is that, on benign workloads, memory-based recovery loses nothing the manifest contained and gains what the manifest never saw.
The tornado case study: function-level reachability cuts false positives
The paper’s headline result for vulnerability assessment is that, across the tornado-dependent applications in its evaluation set, only Streamlit actually calls the vulnerable routines in the tornado dependency. Every other tornado importer in the set would be flagged by a package-level scanner, yet none reach the vulnerable code path.
MEM-SBOM gets there by analyzing bytecode to determine whether each vulnerable function is reachable from the application’s call graph. A CVE match counts as exposure only when the vulnerable routine sits on a live call path, not when it merely exists in an imported module.
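A toy version of bytecode-level reachability can be written with the standard `dis` module. This is a simplified sketch, not the paper's algorithm: it treats any global or attribute name referenced in a function's bytecode as a potential call edge, and every function name here (`vulnerable_parse`, `app_a_main`, and so on) is hypothetical.

```python
import dis
from collections import deque

# Opcodes whose argument is a name the function may go on to call.
NAME_OPS = {"LOAD_GLOBAL", "LOAD_NAME", "LOAD_ATTR", "LOAD_METHOD"}


def referenced_names(func):
    """Names a function's bytecode loads, used as approximate call edges."""
    return {
        ins.argval
        for ins in dis.get_instructions(func)
        if ins.opname in NAME_OPS and isinstance(ins.argval, str)
    }


def reachable(entry, functions):
    """Breadth-first walk over the approximate call graph from `entry`."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        current = queue.popleft()
        for name in referenced_names(functions[current]):
            if name in functions and name not in seen:
                seen.add(name)
                queue.append(name)
    return seen


# Hypothetical dependency with one vulnerable and one safe routine.
def vulnerable_parse(data):
    return data.strip()


def safe_parse(data):
    return data.lower()


# Two hypothetical applications that both import the dependency.
def app_a_main(data):
    return vulnerable_parse(data)


def app_b_main(data):
    return safe_parse(data)


FUNCTIONS = {
    "vulnerable_parse": vulnerable_parse,
    "safe_parse": safe_parse,
    "app_a_main": app_a_main,
    "app_b_main": app_b_main,
}

print("vulnerable_parse" in reachable("app_a_main", FUNCTIONS))
print("vulnerable_parse" in reachable("app_b_main", FUNCTIONS))
```

Matching by bare name over-approximates: it cannot tell two functions with the same name apart and it misses calls made through `getattr` or other indirection, so a real analysis needs more than this.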
The payoff is precision. A package-level scanner flags every application that imports tornado; the runtime evidence narrows exposure to the one application that actually exercises the vulnerable code. For a team triaging CVE matches across a fleet, that is the difference between upgrading a dependency everywhere and upgrading it where it matters.
The operational tax: privileged access, short-lived processes, and a trusted interpreter
Memory-based SBOM generation is not drop-in for a container fleet. It presupposes privileged access to a process’s memory image, and it still misses what is not resident at acquisition time.
The resident-at-acquisition limit is the hard one. MEM-SBOM analyzes only components loaded at the moment of memory capture; code that ran during installation and exited before the snapshot is invisible. For short-lived request handlers that load and exit between snapshots, the resident set at acquisition may be incomplete. The framework also assumes the Python runtime itself is not malicious; if the interpreter has been tampered with, the internal structures the plugins walk cannot be trusted. And vulnerability detection is bounded by the advisory feeds behind whatever scanner consumes the SBOM, so it catches known CVEs only.
Volatility 3 operates on a memory image, a premise the paper states outright. In a container environment that maps to host-level memory capture or equivalent privileged visibility into the running workload, not a sidecar you attach to a Pod.
What security teams should do this quarter
Treat your static SBOM as a lower bound, layer runtime evidence onto high-value workloads, and audit EU Cyber Resilience Act exposure before runtime-grounded transparency becomes a procurement requirement.
Keep the build-time pipeline. Syft, CycloneDX, and their peers remain the cheap floor; the abstract frames MEM-SBOM as recovering “all runtime packages missed by existing SBOM tools,” not as replacing them. The cases where they lose are exactly the cases the paper targets: runtime-only packages, dynamically loaded plugins, and version drift. OWASP’s Dependency Graph and SBOM Cheat Sheet already says as much: it lists build-time generation as best practice but separately calls for “runtime/deployed: telemetry to validate what executes in production,” and recommends CycloneDX or SPDX, cosign/Sigstore/in-toto signing, and VEX documents to express exploitability.
For externally facing, high-risk applications, the regulatory clock is running. Matproof frames the MEM-SBOM work as relevant to the EU Cyber Resilience Act and AI Act and advises compliance teams to check whether their SBOM process relies solely on build-time or manifest methods, and to pilot runtime SBOM generation for high-risk workloads. The threat scale the paper cites (arXiv:2606.22827) is concrete: more than 700,000 malicious packages were discovered in public registries in 2024, a 156% year-over-year increase, and over 12,000 were removed from PyPI in 2022 alone.
The defensible posture is not “build-time or runtime.” It is build-time as the cheap floor, runtime evidence layered where the blast radius justifies it, and a VEX discipline that distinguishes a dependency in memory from a vulnerability that actually fires. The container running uncatalogued code is not a hypothetical. The question is whether you can prove, from runtime evidence, which code that is.
Frequently Asked Questions
How does MEM-SBOM compare to the SBOM tools already on the market?
The paper benchmarked it against eight existing SBOM tools and found none support every Python metadata configuration or capture dynamically loaded packages. MEM-SBOM additionally surfaced ten cases where the version embedded in loaded code differed from the version recorded at install time, a class of drift manifest scanners cannot see.
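The drift check itself is simple to state in code. The sketch below compares a version string carried by a loaded module with the version recorded in install metadata. It assumes the module exposes `__version__`, which many packages do not, and the helper name `detect_drift`, the package name, and the version numbers in the usage lines are all hypothetical; the paper recovers the embedded version from memory rather than from a live attribute lookup.

```python
import types
from importlib import metadata


def detect_drift(module, dist_name, lookup=metadata.version):
    """Return (installed, loaded) when the two versions disagree, else None."""
    loaded = getattr(module, "__version__", None)
    try:
        installed = lookup(dist_name)
    except metadata.PackageNotFoundError:
        installed = None
    if loaded is None or installed is None:
        return None  # not enough evidence to call it drift
    return (installed, loaded) if loaded != installed else None


# Hypothetical module whose in-memory version is newer than the metadata.
fake_module = types.ModuleType("hypothetical_pkg")
fake_module.__version__ = "3.8.1"

drift = detect_drift(fake_module, "hypothetical_pkg", lookup=lambda name: "3.7.0")
print(drift)
```

The `lookup` parameter exists only so the example can run without installing anything; left at its default it queries the real install metadata.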
What in-memory evasion techniques is MEM-SBOM built to defeat?
The framework catalogs five: module deletion or overwrite, import-hook interception, import-bypassing module creation, sub-interpreter injection, and garbage-collector manipulation. It counters them with a multi-layered traversal of module registries, thread contexts, the garbage collector, arenas, and heap regions, rather than trusting a single structure an attacker could poison.
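Why a single registry is not enough can be shown with a small in-process experiment. The sketch below cross-checks `sys.modules` against the garbage collector's tracked objects and reports module objects that exist in memory but are absent from the registry. It is an analogy for the multi-layered idea only: the real plugins traverse these structures in a memory image, and the module name used here is hypothetical.

```python
import gc
import sys
import types


def modules_outside_registry():
    """Module objects the GC tracks that sys.modules does not list."""
    registered = {id(mod) for mod in list(sys.modules.values())}
    return [
        obj
        for obj in gc.get_objects()
        if issubclass(type(obj), types.ModuleType) and id(obj) not in registered
    ]


# A module created without the import system never enters sys.modules.
hidden = types.ModuleType("hypothetical_hidden_mod")

names = {getattr(mod, "__name__", "?") for mod in modules_outside_registry()}
print("hypothetical_hidden_mod" in names)
print("hypothetical_hidden_mod" in sys.modules)
```

A benign process will also have a few unregistered modules of its own, so the result is a list of candidates to examine, not a verdict.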
Can two apps on the same host really run different versions of a shared dependency?
Yes, and the paper documents it with asgiref. Django’s StatReloader dynamically loads the newer version into memory during execution, while Celery, which has no auto-reload, keeps running the older version loaded at startup. A memory capture shows both versions resident; a manifest shows one.
Will the 100% extraction figure hold up against adversarial workloads?
Probably not as stated. The 100% result comes from 51 benign Python applications under a non-adversarial baseline, and the same paper builds an entire evasion taxonomy, from import-hook interception to sub-interpreter injection, that an attacker would deliberately deploy against it. Read the figure as an upper bound on benign recovery, not a production detection rate.
How does the runtime-drift gap translate into active compromise?
The fshec2 malware concealed credential theft inside compiled .pyc bytecode, the layer a build-time SBOM never inspects. The 2022 PyPI purge removed over 12,000 malicious packages that had been installable through declared dependencies. A payload loaded after install and then deleted from disk leaves a trace only in memory, the case memory forensics exists to close.