Can a Cryptographic Certificate Prove an AI Agent's Output Is Valid?

NIST has spent roughly five decades developing cryptographic standards, including the digital-signature algorithms a certificate of validity would ride on. Such a certificate can bind an agent’s output to a signed claim, yet the cryptography itself proves integrity and authorship, not correctness, and it only helps where validity is cheap to re-check. For tasks where “valid” is a predicate a verifier can re-run (does the code compile, do the tests pass, does the output satisfy the spec), the certificate turns a model’s self-report into something checkable. For open-ended work, the certificate still shows who produced the bytes but proves nothing about their quality, and review stays probabilistic.

The distinction matters because the two failure modes look identical from the outside. A model that merely claims a task is done and a model that can produce a checkable proof of done-ness both report success. Only one of them can be caught lying by a third party who does not re-do the work.

What a certificate of validity actually verifies (and what it doesn’t)

A certificate of validity can prove integrity, authentication, and non-repudiation, but not correctness. Cryptography in its standard formulation rests on four principles: confidentiality, integrity, non-repudiation, and authentication. Three of those are directly attestable with a digital signature. Under non-repudiation, the sender of encrypted information “cannot deny their intention to send the information,” which is what lets a signature bind a signer to a claim. Correctness is not on the list.

That gap is the whole story. A signature says the bytes are authentic and intact. Whether those bytes are right is a separate question, and cryptography has nothing to say about it on its own.

To turn “authentic and intact” into “valid,” you need a second ingredient: a predicate the verifier can evaluate against the output. The certificate signs a claim (“this output satisfies predicate P”), and the verifier re-runs P against the output to confirm. Without P, the certificate is a receipt. With P, it becomes a proof, because the verifier is no longer trusting the signer’s word; it is checking the signer’s claim.
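The receipt-versus-proof distinction can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not a real scheme: the names `issue_certificate`, `verify_certificate`, and `is_sorted_permutation` are hypothetical, and HMAC-SHA256 from the standard library stands in for a digital signature. HMAC is a shared-key MAC, so it shows integrity and authenticity but not non-repudiation; an actual certificate would use an asymmetric signature.

```python
import hashlib
import hmac
import json
from collections import Counter

# ASSUMPTION: a shared demo key. HMAC is only a stand-in for a real
# asymmetric signature and does NOT provide non-repudiation.
DEMO_KEY = b"demo-key-not-for-production"


def is_sorted_permutation(task_input, output):
    """Predicate P: output is the task input, sorted ascending."""
    in_order = all(a <= b for a, b in zip(output, output[1:]))
    return in_order and Counter(output) == Counter(task_input)


# The verifier's own registry of predicates it is willing to re-run.
PREDICATES = {"sorted_permutation": is_sorted_permutation}


def _message(task_input, output, predicate_name):
    # Canonical JSON so signer and verifier hash identical bytes.
    claim = {"input": task_input, "output": output, "predicate": predicate_name}
    return json.dumps(claim, sort_keys=True, separators=(",", ":")).encode()


def issue_certificate(task_input, output, predicate_name):
    """Producer side: tag the claim 'output satisfies predicate P'."""
    msg = _message(task_input, output, predicate_name)
    return hmac.new(DEMO_KEY, msg, hashlib.sha256).hexdigest()


def verify_certificate(task_input, output, predicate_name, tag):
    """Verifier side: check the tag, then re-run P instead of trusting it."""
    expected = issue_certificate(task_input, output, predicate_name)
    authentic = hmac.compare_digest(expected, tag)
    valid = PREDICATES[predicate_name](task_input, output)
    return authentic, valid


# A producer can tag a wrong answer; the tag verifies, the predicate does not.
wrong_tag = issue_certificate([3, 1, 2], [3, 1, 2], "sorted_permutation")
receipt_only = verify_certificate([3, 1, 2], [3, 1, 2], "sorted_permutation", wrong_tag)

right_tag = issue_certificate([3, 1, 2], [1, 2, 3], "sorted_permutation")
proof = verify_certificate([3, 1, 2], [1, 2, 3], "sorted_permutation", right_tag)
```

In this sketch `receipt_only` comes back authentic but not valid, which is the receipt case: the bytes are intact and attributable, and the claim is still false. Only the second call, where the verifier's own re-run of P succeeds, amounts to a proof.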

Why attestation is not the same as a correctness proof

Attestation proves where code ran; a correctness proof proves the output is right, and most “verifiable AI” pitches quietly swap one for the other. Hardware attestation in confidential computing proves that a model ran on a particular trusted environment, on a particular build, with particular protections. That is a real guarantee. It is not a correctness proof.

Attestation answers “did this output come from the expected model, running the way it was supposed to run?” A validity certificate is supposed to answer “is this output correct?” Conflating the two is the easy mistake, because both sound like “trust.” June 2026 commentary on verifiable execution captured the right instinct when it argued that verifiable execution “belongs alongside the control plane, not against it.” Verifiability is a control you layer in, not a property a model grants you. But the control still has to prove something about the output, not just the execution environment.

Mechanism	What it proves	How it fails
Hardware attestation	Code ran on a known build in a protected environment	Proves provenance, not correctness
Validity certificate + predicate	Output satisfies a specific, re-runnable check	Only as good as the predicate
LLM-as-judge	Output probably resembles “good” per a model	Probabilistic; biased by the judge

Which agent tasks can carry a validity proof?

A task can carry a validity proof if and only if “valid” can be written as a predicate a verifier can re-run. The triage test is the single most useful thing a practitioner can take from this entire discussion.

That maps cleanly onto a class of agent work. Code that must compile. Code that must pass a given test suite. Outputs that must parse against a schema. Formal specifications with a solver. Type-checked programs. Configuration that must validate against a policy. In each case the predicate is cheap, deterministic, and independent of the model that produced the output. The verifier does not need the model at all; it needs the output and the checker.
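To make “cheap, deterministic, and independent of the model” concrete, here is a hedged Python sketch of two such predicates. The function names `parses_as_python` and `matches_schema` are hypothetical, and the schema format (a dict mapping required keys to Python types) is an assumption made for illustration, not a standard. Note that a syntax check is a weaker stand-in for “compiles” in a compiled language.

```python
import ast
import json


def parses_as_python(source: str) -> bool:
    """Predicate: the text is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except (SyntaxError, ValueError):
        return False


def matches_schema(raw: str, required: dict) -> bool:
    """Predicate: raw is a JSON object with the required keys and types.

    ASSUMPTION: 'required' maps key names to Python types; this is a toy
    schema format, not JSON Schema.
    """
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(doc, dict):
        return False
    return all(isinstance(doc.get(key), typ) for key, typ in required.items())


# Neither check needs the model that produced the output, only the output.
CONFIG_SCHEMA = {"name": str, "retries": int}
ok_config = matches_schema('{"name": "job", "retries": 3}', CONFIG_SCHEMA)
bad_config = matches_schema('{"name": "job"}', CONFIG_SCHEMA)
```

Both functions return the same answer every time for the same input, which is what lets a third party re-run them years later and reach the same verdict.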

It maps just as cleanly onto the work that is not proof-eligible. “Write a good strategy memo.” “Is this analysis correct?” “Did the research answer the user’s real question?” “Is this code change the right change to make?” These have no cheap predicate. “Good,” “correct,” and “right” are judgments, and judgments require either a human or a probabilistic surrogate like an LLM judge or a benchmark score. No signature turns a judgment into a fact.

This is why the framing has to be per-task, not per-agent or per-deployment. A single coding agent produces proof-eligible outputs (the diff compiles, the tests pass) and judgment outputs (the change is worth making) in the same session. The certificate covers the first and is silent on the second.

Proof-carrying output versus LLM-as-judge: where each fails

A validity predicate fails by being too strict; an LLM judge fails by being probabilistic, and those are different diseases. The distinction is an old one. A mathematical proof is the canonical checkable artifact: each step follows from the last by a rule any reader can re-apply, and the result is determined by the proof, not by the person who wrote it. An LLM judge is, by definition, a subjective measure. It produces a score that probably correlates with quality, not a verdict determined by the artifact.

Both can be wrong, but they fail differently. A validity predicate has false negatives (correct output that trips a too-strict predicate) but no false positives on the thing it checks: if the tests pass, the tests pass. An LLM judge errs in both directions, with false positives and false negatives alike, because its verdict is a sample from a distribution, not an evaluation of the artifact.
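A small Python sketch of the false-negative case, assuming a task whose output is a JSON document. The names `strict_predicate` and `semantic_predicate` and the sample strings are hypothetical. The strict check demands byte-for-byte equality with a reference and so rejects a correct answer that merely orders its keys differently; neither check ever accepts the wrong answer.

```python
import json


def strict_predicate(output: str, expected: str) -> bool:
    """Too strict: demands byte-for-byte equality with a reference string."""
    return output == expected


def semantic_predicate(output: str, expected: str) -> bool:
    """Looser: compares parsed JSON values, ignoring key order and spacing."""
    try:
        return json.loads(output) == json.loads(expected)
    except json.JSONDecodeError:
        return False


reference = '{"status": "ok", "count": 2}'
correct_but_reordered = '{"count": 2, "status": "ok"}'
wrong = '{"status": "ok", "count": 3}'

# False negative: a correct output rejected by the too-strict check.
false_negative = not strict_predicate(correct_but_reordered, reference)

# The looser check accepts the reordered output and still rejects the wrong one.
accepted = semantic_predicate(correct_but_reordered, reference)
rejected = not semantic_predicate(wrong, reference)
```

The cure for this failure is a better-written predicate, which is a one-time engineering fix. A sampled judge verdict has no equivalent fix, because re-running it can change the answer.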

Where this bites hardest is long-horizon agents, where a run produces a long trajectory and a single claim of success. With no checkable artifact, the success claim is the model’s own report, graded by another model. That is two probabilistic systems vouching for each other. The cost of fabricating success is low: the agent reports done, the judge agrees. A certificate backed by a real predicate changes the economics. To fake success, an agent now has to produce output that passes a checker it does not control. That is a much higher bar, and it is the bar that matters.

What the cryptographic substrate looks like

The cryptography is solved; the open questions are all about the predicate. The substrate is the standard one. NIST runs the public collaborations that would underpin any such scheme: digital signature algorithms, post-quantum cryptography, cryptographic hash algorithms, and privacy-enhancing cryptography. A certificate of validity is, mechanically, a digital signature over (output, predicate, result) issued by whatever party is willing to stake its non-repudiation on the claim.

The interesting design questions are not cryptographic. Who defines the predicate? Who runs it? Is it part of the signed claim, so the verifier re-runs the same check, or referenced by hash? The signature scheme is settled; the predicate layer is where most “verifiable AI” proposals will quietly fall apart.
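One way to picture the “referenced by hash” option is the sketch below, in Python with only the standard library. Everything named here is an assumption for illustration: the claim layout, the field names, `build_claim`, `predicate_matches`, and the toy predicate source. The bytes returned by `build_claim` are what a signer would sign; the signing step itself is omitted.

```python
import hashlib
import json

# ASSUMPTION: the predicate is distributed as source text; a toy example.
PREDICATE_SOURCE = "def check(output):\n    return output.strip().isdigit()\n"


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_claim(output: str, predicate_source: str, result: bool) -> bytes:
    """Serialize (output, predicate, result), referencing both by hash."""
    claim = {
        "output_sha256": sha256_hex(output.encode()),
        "predicate_sha256": sha256_hex(predicate_source.encode()),
        "result": result,
    }
    # Canonical JSON: sorted keys, no extra whitespace, so bytes are stable.
    return json.dumps(claim, sort_keys=True, separators=(",", ":")).encode()


def predicate_matches(claim_bytes: bytes, local_predicate_source: str) -> bool:
    """Verifier side: is my copy of the predicate the one the claim names?"""
    claim = json.loads(claim_bytes)
    return claim["predicate_sha256"] == sha256_hex(local_predicate_source.encode())


claim_bytes = build_claim("42", PREDICATE_SOURCE, True)
same_check = predicate_matches(claim_bytes, PREDICATE_SOURCE)
different_check = predicate_matches(claim_bytes, PREDICATE_SOURCE + "# edited\n")
```

Referencing by hash keeps the signed claim small, at the price that the verifier must obtain the exact predicate text from somewhere it trusts. Embedding the predicate in the claim removes that lookup but does nothing to establish that the predicate is a meaningful check.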

One durability caveat. Any signature scheme chosen today has to survive the transition to the quantum-resistant algorithms NIST is standardizing. A certificate meant to be verified years later is only as durable as the signature scheme it rides on.

The second-order effect: raising the cost of fabricated success

The real consequence is an audit-cost shift: the verifier stops paying to re-derive the work and pays only to re-run a check. Today, confirming that an agent succeeded at a coding task means re-reading the diff or re-running the work yourself, and the verifier pays the full cost of re-derivation. A certificate with a real predicate lets the verifier pay only the cost of the predicate: re-run the tests, re-check the spec. The producer pays the cost of producing a checkable artifact instead of a bare claim.

That is the real value, and it is narrow. It applies to the tasks in the proof-eligible bucket and to nothing else. A checkable artifact is how a claim of success stops being a promise and becomes evidence a third party can re-run.

The honest version of the pitch is not “cryptographic certificates make AI trustworthy.” It is narrower: for the subset of agent tasks where validity is a cheap predicate, certificates let a third party confirm success without re-doing the work or trusting the model. Everywhere else, the model still grades its own homework, and the only honest answer is that you are reviewing probabilistically. The technology does not close that gap. It draws a bright line around where the gap is closable and where it is not.

Frequently Asked Questions

What signature algorithms would a certificate use today to stay verifiable once quantum computers can break today’s schemes?

A durability-conscious certificate would ride on NIST’s post-quantum signature standards finalized in 2024: ML-DSA (FIPS 204, lattice-based) for general use and SLH-DSA (FIPS 205, hash-based) for the longest-lived claims, because hash-based signatures rest only on hash-function assumptions and are treated as the conservative bet. Legacy RSA and ECDSA certificates that someone verifies years from now could be forged once a large-scale quantum machine runs Shor’s algorithm.

How does this connect to the regulatory pressure on AI claims?

The push for certificates maps onto the standard regulators and auditors already apply to AI claims, which Perrie Weiner framed in Fortune as requiring them to be “technically accurate, operationally supportable, and consistent with the company’s financial results.” A certificate backed by a re-runnable predicate is one of the few ways to make a claim operationally supportable at the level auditors expect, whereas a benchmark score or a model self-report is not.

Who should define the validity predicate, the agent or the verifier?

The verifier, or a trusted third party, must own the predicate; if the producer picks the check, the economics of fabrication return. An agent that chooses its own test suite and then signs a certificate against it has replaced “did the work” with “passed a test it wrote,” which is the same self-grading failure the certificate was supposed to retire.

Can a certificate of validity be forged?

The signature layer is forge-resistant under standard cryptographic assumptions, but forgery is the wrong threat to worry about. A producer cannot counterfeit a signature, yet it can submit output against a check it wrote to be easy, pass, and walk away with a cryptographically valid certificate for incorrect work. The crypto protects the bytes; it does not protect against a producer who games the predicate instead of the signature.
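The gaming scenario can be sketched with a toy task, factoring the number 91, chosen because checking a factorization is far cheaper than finding one. The task, the names `producer_predicate`, `verifier_predicate`, and `accept`, and the boolean `signature_ok` (standing in for an already-completed signature check) are all assumptions for illustration.

```python
import math

N = 91  # toy task: return a non-trivial factorization of N


def producer_predicate(factors) -> bool:
    """Written by the producer to be easy: any non-empty list passes."""
    return isinstance(factors, list) and len(factors) > 0


def verifier_predicate(factors) -> bool:
    """Owned by the verifier: non-trivial integer factors whose product is N."""
    return (
        isinstance(factors, list)
        and len(factors) >= 2
        and all(isinstance(f, int) and 1 < f < N for f in factors)
        and math.prod(factors) == N
    )


def accept(factors, signature_ok: bool) -> bool:
    """Accept only if the signature verifies AND the verifier's own check passes."""
    return signature_ok and verifier_predicate(factors)


wrong_answer = [7, 12]
right_answer = [7, 13]

# The gamed check passes incorrect work, so a certificate over it is worthless.
gamed_pass = producer_predicate(wrong_answer)

# With a valid signature in both cases, only the correct answer is accepted.
accepted_wrong = accept(wrong_answer, signature_ok=True)
accepted_right = accept(right_answer, signature_ok=True)
```

The signature check is identical in both outcomes; what changes the verdict is whose predicate runs.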

What happens to old certificates when a signature scheme gets deprecated?

A certificate signed under a deprecated scheme becomes unverifiable once verifiers drop support, so a long-lived claim needs either re-signing under the new algorithm or a hybrid signature that bundles a classical and a post-quantum scheme. This is why the durability question is operational, not theoretical: certificates attesting to software provenance or compliance will need re-issue on a multi-year cadence as algorithms rotate.