Can You Trust an AI Robustness Certificate? A Paper Says Verify It

A robustness certificate is only as trustworthy as the numerical machinery that produced it. A June 2026 preprint argues that certifications for neural networks deserve scrutiny in their own right, and older results show the floating-point verifiers that issue them can both miss planted backdoors and return wrong verdicts. Treat a passing certificate as a reading from an instrument, not as a guarantee.

What does a neural-network robustness certificate actually promise?

A robustness certificate states, for one specific input, the largest input distortion that will not change the network’s prediction: a numeric margin of safety against adversarial examples, the slightly perturbed inputs that flip a classifier’s output. According to arXiv:2606.23858, the certificate is conventionally modelled as an axis-aligned hyper-rectangle sitting in the input domain around the point being certified. Stay inside that box and the prediction holds; step outside it and the certificate says nothing.
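The box model can be made concrete in a few lines of Python. This is a minimal sketch: the `Box` class, its field names and the sample numbers are illustrative assumptions, not taken from arXiv:2606.23858.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned hyper-rectangle: one [lower, upper] interval per input feature."""
    lower: Sequence[float]
    upper: Sequence[float]

    def contains(self, x: Sequence[float]) -> bool:
        # The certificate only speaks about points that sit inside every interval.
        return all(lo <= xi <= hi for lo, xi, hi in zip(self.lower, x, self.upper))


# Hypothetical certificate around the input (0.50, 0.20).
cert = Box(lower=(0.45, 0.10), upper=(0.55, 0.30))

inside = cert.contains((0.52, 0.25))   # covered: the prediction is claimed to hold
outside = cert.contains((0.60, 0.25))  # not covered: the certificate is silent
```

A `False` from `contains` does not mean the prediction flips. It means the certificate makes no claim either way.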

The promise is conditional, and the precision matters. A certificate does not say a model is robust. It says the model is robust within a region, under a chosen distance metric, for a single input. Compress that into the line “our model is certified-robust” and you have already discarded the qualifiers that make the claim mean anything. The box is defined per input, so the natural way to summarise a model’s safety is to ask how large that box can be made. That maximisation is where most of the research effort goes, and where most of the trouble starts.

What does the June 2026 preprint actually contribute?

arXiv:2606.23858, posted 22 June 2026 by Papamichail, Varsos, Flouris and Marques-Silva, is largely constructive rather than a takedown of existing certificates. Its headline contribution is the apothem measure, which scores a certification by its shortest side, the distance from the centre of the box to its nearest face, rather than by its volume. The authors give an algorithm that computes an apothem-optimal certification in a number of calls to a neural-network verifier that is linear in the input domain’s diameter.
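The difference between scoring by volume and scoring by apothem shows up on a two-dimensional toy example. The sketch below reads the apothem as half the shortest side of the box, following the description above. The second half is a naive grow-until-failure loop over integer-valued inputs, included only to show how an oracle-call count can be bounded by the size of the domain. It is an illustration under assumed names (`grow_cube`, `toy_oracle`), not the paper's algorithm.

```python
from math import prod
from typing import Callable, List, Sequence, Tuple


def apothem(lower: Sequence[float], upper: Sequence[float]) -> float:
    """Half the shortest side: distance from the box centre to its nearest face."""
    return min((hi - lo) / 2 for lo, hi in zip(lower, upper))


def volume(lower: Sequence[float], upper: Sequence[float]) -> float:
    return prod(hi - lo for lo, hi in zip(lower, upper))


# Two hypothetical boxes: a long thin slab and a small cube.
slab_lo, slab_hi = (0.0, 0.0), (100.0, 0.02)
cube_lo, cube_hi = (0.0, 0.0), (1.0, 1.0)
# Volume ranks the slab higher (about 2 vs 1); apothem ranks the cube
# higher (0.5 vs about 0.01), because the slab is fragile along one axis.

Oracle = Callable[[List[int], List[int]], bool]


def grow_cube(center: Sequence[int], max_radius: int, is_robust: Oracle) -> Tuple[int, int]:
    """Widen a cube around `center` by one unit per oracle call until the oracle objects.

    Returns (certified radius, number of oracle calls). Clipping at the
    domain boundary is ignored to keep the sketch short.
    """
    radius, calls = 0, 0
    while radius < max_radius:
        candidate = radius + 1
        lo = [c - candidate for c in center]
        hi = [c + candidate for c in center]
        calls += 1
        if not is_robust(lo, hi):
            break
        radius = candidate
    return radius, calls


def toy_oracle(lo: List[int], hi: List[int]) -> bool:
    # Stand-in for a real verifier: "robust" while the first feature stays below 26.
    return hi[0] < 26


radius, calls = grow_cube((20, 20), max_radius=255, is_robust=toy_oracle)
```

In this toy run the loop certifies a radius of 5 after 6 oracle calls, and it can never make more than `max_radius` calls. Every one of those calls trusts the oracle's answer.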

Two results lift the paper above an incremental optimisation. The first is a proof of a negative: no volume-optimal, oracle-based algorithm can exist, even if the cost of calling the oracle is set to zero. Volume-maximisation, the dominant objective in prior work, is provably out of reach under this oracle model, which is why the apothem reframing is not cosmetic. The second is the dual certification, an interval that covers all instances of a class and yields an apothem-minimum upper bound on a certification rather than a single-input box.

The empirical claim is modest and should be quoted as such. The authors’ ParallelepipedoNN system, evaluated on MNIST and Fashion-MNIST, reports at least a two-fold improvement over prior work on the minimum edge length of the certification. That is a per-edge figure on two standard datasets, not a declaration that the certification field is unsound. The word “trustworthy” in the paper’s title refers to the optimality and computability of the certificate, not to a discovery that certificates are unreliable.

Why aren’t “complete” verifiers airtight?

A “complete” neural-network verifier is supposed to return a definite yes or no: sound and exhaustive. That sets it apart from heuristic, attack-based checks, which can miss counterexamples and so leave a non-robust input looking clean, and from sound but “incomplete” verifiers, which may simply fail to decide. That exhaustiveness is what makes a certificate feel like a proof rather than an estimate. But the mathematical guarantees of complete verifiers hold only under arbitrary-precision arithmetic and a reliable implementation, according to the ICLR 2021 “Fooling a Complete Neural Network Verifier” line of work. In practice, both the network and the verifier run on limited-precision floating point, and roundoff error becomes an attack surface in its own right.

This caveat does not come from the June 2026 paper. The floating-point loophole is documented in that earlier work, which shows the verifier’s own numerical methods can be exploited to make a sound-looking tool return unsound answers. The completeness guarantee is a statement about the algorithm under exact arithmetic; what the executed binary delivers is a statement about floating-point arithmetic, and the two are not the same object.
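A three-weight toy neuron is enough to show the gap between exact arithmetic and what a float64 binary computes. This is an illustrative sketch only: the weights, the function names and the sign-of-margin decision rule are assumptions chosen to make the roundoff visible, not the construction used in the ICLR 2021 work.

```python
from fractions import Fraction
from typing import Sequence


def margin_float(weights: Sequence[float], inputs: Sequence[float], bias: float) -> float:
    """Dot product plus bias, accumulated left to right in float64."""
    total = 0.0
    for w, x in zip(weights, inputs):
        total += w * x
    return total + bias


def margin_exact(weights: Sequence[float], inputs: Sequence[float], bias: float) -> Fraction:
    """The same computation in exact rational arithmetic."""
    total = Fraction(0)
    for w, x in zip(weights, inputs):
        total += Fraction(w) * Fraction(x)
    return total + Fraction(bias)


weights = [1e16, 1.0, -1e16]
inputs = [1.0, 1.0, 1.0]
bias = -0.5

executed = margin_float(weights, inputs, bias)   # -0.5: the 1.0 is absorbed by 1e16
idealised = margin_exact(weights, inputs, bias)  # 1/2: exact arithmetic keeps it

# Same numbers, different summation order, different sign.
reordered = margin_float([1e16, -1e16, 1.0], inputs, bias)  # 0.5
```

If the sign of the margin decides the class, the exact computation and the executed one disagree on this input, and simply reordering the additions changes the executed answer.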

How often do verifiers actually get it wrong?

If you want a single number for how reliable today’s automated verifiers are, the NeuroCodeBench 2.0 benchmark (arXiv:2510.23389, 2025) supplies a blunt one. The benchmark contains 912 neural-network verification problems spanning activation functions, common layer types, and full networks up to 170,000 parameters, written in plain C and compatible with the SV-COMP software-verification competition format. It is the first rigorous evaluation of general software verifiers against NN code rather than against hand-rolled ML-specific tools that share the verifier’s own assumptions.

On those 912 problems, eight state-of-the-art software verifiers correctly solved only an average of 11% of cases while producing roughly 3% incorrect verdicts, according to the benchmark. Read the two figures together. The 11% correct rate says verifiers are weak; many problems simply go unsolved. The 3% wrong-verdict rate is the dangerous number, because a wrong verdict is not a timeout or an “unknown.” It is a confident, false answer that a downstream certificate will treat as ground truth. An auditor who sees a certificate pass has no easy way to tell whether it passed because the model is robust or because the verifier handed back a silent false positive.
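Reading the two rates together is a short piece of arithmetic. The sketch below treats the benchmark averages as if they described one typical verifier, and assumes every definite verdict is either correct or incorrect, with everything else unsolved. Both are simplifying assumptions made here, not claims from arXiv:2510.23389.

```python
TOTAL_PROBLEMS = 912
CORRECT_RATE = 0.11   # average share of problems solved correctly
WRONG_RATE = 0.03     # average share of problems given an incorrect verdict

# Approximate problem counts for a hypothetical "average" verifier.
approx_correct = round(TOTAL_PROBLEMS * CORRECT_RATE)
approx_wrong = round(TOTAL_PROBLEMS * WRONG_RATE)

# Of the verdicts actually returned, what share is wrong?
wrong_share_of_verdicts = WRONG_RATE / (CORRECT_RATE + WRONG_RATE)

print(f"correct: ~{approx_correct}, wrong: ~{approx_wrong}")
print(f"wrong share of returned verdicts: {wrong_share_of_verdicts:.0%}")
```

Under those assumptions, roughly one returned verdict in five is wrong. For anyone consuming a verdict without seeing the unsolved cases, that is a more relevant figure than 3%.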

Can a verifier miss a planted backdoor?

The wrong-verdict problem is not purely accidental. The ICLR 2021 poster demonstrates that roundoff error can be weaponised to construct “adversarial networks” whose true robustness radically differs from what a state-of-the-art complete verifier computes. The same numerical trick can insert a backdoor into a network that the verifier completely misses: a model that looks certified-clean but behaves maliciously under conditions the verifier’s arithmetic cannot represent.

The defence the authors propose is strikingly small, which is itself a signal. They suggest adding a very small perturbation to the weights, and they conjecture that other numerical attacks remain open and that exact verification would have to model every detail of the executed computation: every rounding mode, every fused-multiply-add, every order of operations. That is a high bar, and it reframes “complete verification” as an idealisation that real systems approach only approximately. The gap between the idealised algorithm and the executed binary is where both accidental wrong verdicts and planted backdoors live.
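The flavour of that defence can be sketched in a few lines. The noise distribution, the `scale` default and the function name below are assumptions chosen for illustration; the source only says that a very small perturbation is added to the weights.

```python
import random
from typing import List, Sequence


def perturb_weights(weights: Sequence[float], scale: float = 1e-6, seed: int = 0) -> List[float]:
    """Add independent uniform noise from [-scale, scale] to every weight."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    return [w + rng.uniform(-scale, scale) for w in weights]


original = [0.5, -1.25, 2.0]   # hypothetical layer weights
perturbed = perturb_weights(original)
```

The intuition is that an attack depending on exactly tuned weight values is disturbed by noise far too small to change the model's ordinary behaviour. As the authors' own conjecture makes clear, that addresses their particular attack and is not a general fix.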

What does this mean for auditors and conformity assessment?

The practitioner consequence is that the unit of verification has grown. Where high-stakes systems are expected to carry documented evidence of safety properties (the kind of conformity-assessment posture anticipated under instruments such as the EU AI Act), a robustness certificate is one of the artefacts an assessor will reach for. None of the papers cited here directly addresses AI Act conformity assessment, though. The connection is prospective rather than documented in these papers, and should be read as anticipated use, not established practice.

What the sources do support is narrower and more useful. A certificate’s value is bounded above by the soundness of the verifier that produced it. If verifiers return wrong verdicts around 3% of the time on a general benchmark (arXiv:2510.23389) and can be fooled into missing backdoors (ICLR 2021), then an assessor who accepts a bare certificate is accepting the verifier’s error rate along with it. Auditing the verifier (its arithmetic mode, its precision, its known soundness gaps) becomes part of auditing the model. The June 2026 preprint sharpens the certificate itself (arXiv:2606.23858); it does not remove the verifier-soundness problem, and verifier soundness is the load-bearing assumption a certificate rests on.

A note on what not to conflate. A separate 2025 framework (arXiv:2512.20865, IEEE Open Journal of Control Systems 2026) recasts data-poisoning and test-time robustness as a formal safety-verification problem using barrier certificates from control theory, with PAC bounds, and claims the first unified formal guarantees across both attack settings. That is a distinct control-theoretic approach to a different robustness question, and it should not be read as the same object as the per-input robustness certificates of 2606.23858. Lumping them together muddies both.

How should a practitioner read a robustness certificate?

When a vendor or paper hands you a robustness certificate, the certificate itself is the easy part. The hard questions are about the instrument that produced it.

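Which verifier issued it, and is that verifier complete or incomplete? A sound but incomplete verifier can fail to decide, and a heuristic, attack-based check can report “robust” simply because it found no counterexample. Complete methods claim exhaustiveness but, per the ICLR 2021 result, are not immune to floating-point soundness failures, and arXiv:2510.23389 reports wrong verdicts from general software verifiers on NN code.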
What arithmetic did it run on? Arbitrary-precision guarantees do not survive a move to limited-precision floating point. If the deployment binary and the verification binary use different precision or rounding, the certificate describes a computation that is not the one that will run.
What is the optimality target? According to arXiv:2606.23858, no oracle-based algorithm can compute a volume-optimal certificate, whereas an apothem-optimal one can be computed with a number of oracle calls linear in the input domain’s diameter. A certificate that reports a volume without addressing that impossibility result is either approximate or mislabelled.
Is the certificate per-input or per-class? A single-input box says nothing about other inputs. Dual certifications that bound all instances of a class give a weaker but broader guarantee.
Where is the wrong-verdict rate? If the verifier has been benchmarked, ask for its error rate. Around 3% incorrect on NeuroCodeBench 2.0 is a current datapoint for software verifiers on NN code, not a ceiling.
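One way to keep these questions from getting lost is to record the answers alongside the certificate. The sketch below is a hypothetical record format: the `CertificateRecord` fields, the string values and the verifier name are all assumptions made for illustration, not an established schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CertificateRecord:
    verifier: str
    complete: bool
    verifier_arithmetic: str      # e.g. "float32", "float64", "exact"
    deployment_arithmetic: str    # arithmetic of the binary that will actually run
    objective: str                # e.g. "apothem" or "volume"
    scope: str                    # "per-input" or "per-class"
    wrong_verdict_rate: Optional[float] = None  # benchmarked error rate, if known


def open_questions(rec: CertificateRecord) -> List[str]:
    """List the checklist items this record leaves unanswered or exposed."""
    issues = []
    if rec.verifier_arithmetic != "exact":
        issues.append("verifier ran on limited-precision arithmetic")
    if rec.verifier_arithmetic != rec.deployment_arithmetic:
        issues.append("verification and deployment arithmetic differ")
    if rec.objective == "volume":
        issues.append("volume-optimality claim needs justification")
    if rec.scope == "per-input":
        issues.append("guarantee covers a single input only")
    if rec.wrong_verdict_rate is None:
        issues.append("no benchmarked wrong-verdict rate supplied")
    return issues


# Hypothetical certificate handed over by a vendor.
record = CertificateRecord(
    verifier="ExampleVerifier",
    complete=True,
    verifier_arithmetic="float32",
    deployment_arithmetic="float16",
    objective="volume",
    scope="per-input",
)
issues = open_questions(record)
```

A non-empty list is not a verdict that the certificate is wrong. It is the set of error bars the certificate has not yet reported.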

A certificate is a measurement, and every measurement carries the error bars of its instrument. The June 2026 work sharpens the measurement; the older verifier-soundness results are the error bars. Reading one without the other is how false assurance gets stamped onto a document.

Frequently Asked Questions

Do these verifier-soundness numbers apply to LLM-scale models, or only small classifiers?

The empirical evidence is narrow. ParallelepipedoNN was evaluated only on MNIST and Fashion-MNIST, and NeuroCodeBench 2.0 caps at 170,000 parameters. Both are orders of magnitude below modern production networks, so the 3% wrong-verdict rate is a datapoint measured on small networks, and nothing in these sources shows how it transfers to billion-parameter systems.

How does apothem-optimal certification differ from CROWN or Lipschitz bounds?

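CROWN and α-CROWN are bound-propagation methods: they push linear relaxations of the activations through the layers to bound the output over a given input region, and α-CROWN additionally optimises the relaxation. Lipschitz-based methods instead cap how fast the output can change with the input. Both families give a conservative robustness radius. The apothem measure instead scores the certificate by its shortest side and admits an oracle-optimal algorithm, whereas no volume-optimal oracle-based algorithm can exist. Bound-propagation and Lipschitz methods are typically cheaper to compute but carry no such optimality guarantee.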

What does arbitrary-precision verification cost in practice?

Arbitrary-precision arithmetic is the regime in which a complete verifier’s mathematical guarantees hold, and it is generally far slower than hardware floating point, which is why production inference does not use it. The soundness regime the proofs assume and the performance regime the deployment binary runs in are therefore different systems, and none of the sources cited here describes a deployed complete verifier that bridges them. The June 2026 preprint does not change this, because its apothem-optimal certificates still rely on calls to a verifier oracle.

Does an apothem-optimal certificate escape the floating-point soundness gap?

No. An apothem-optimal certificate is computed by calling a neural-network verifier as an oracle, so its trustworthiness is capped by that oracle’s soundness. If the oracle runs on limited-precision floating point and can be fooled into missing backdoors, the certificate inherits that exposure. The June 2026 work optimises which box the oracle returns, not whether the oracle itself tells the truth.

What would push regulators to stop treating a passing certificate as proof of safety?

A demonstrated backdoor that survives in a certified-robust model after deployment. The ICLR 2021 work demonstrates that such backdoors can be planted and missed by a complete verifier, but no public incident has yet tied a deployed certified system to a floating-point soundness failure. Until one does, certificates will keep being accepted as stronger evidence than the verifier-reliability baseline justifies.