Can Knowledge-Based Pull Requests Make Agent Contributions Auditable?

A June 2026 preprint makes agent PRs auditable by splitting knowledge admission from code merge and regenerating code via a project-owned agent, backed only by a 7-PR pilot.

Knowledge-Based Pull Requests (arXiv:2606.26721), posted 2026-06-25 by Weiwei Sun et al., proposes a workflow that makes an agent’s contribution auditable by separating two decisions a normal pull request collapses: whether the knowledge should enter the project, and whether a particular implementation should be merged. The maintainer signs off on a rendered provenance record, then a project-owned agent regenerates the code locally. Whether that reduces rework in practice is, by the authors’ own framing, an open empirical question.

What does a knowledge-based pull request actually propose?

A knowledge-based pull request redefines what a maintainer reviews. Instead of a diff, the maintainer reviews a distilled knowledge package, rendered as design memos, risk checklists, test plans, or implementation briefs, and signs off on the provenance rather than reading the raw change.

In KPR’s model, an external collaborator’s local code, tests, and cleaned agent interaction trace are treated as “knowledge sources” distilled into a human-confirmed knowledge package, rather than as the default merge candidate. The contribution arrives as evidence and reasoning, not as code written elsewhere and imported whole. The maintainer’s job shifts from “is this diff safe to merge?” to “is this knowledge worth admitting, and did the agent reason about it correctly?”

That split is the structural heart of the proposal. Conventional pull requests bundle the two judgements: a maintainer looks at a diff and implicitly votes on both the idea and the implementation in a single gesture, weighted almost entirely toward the implementation because that is the only artifact on the screen. KPR puts knowledge admission first and implementation second, with the implementation produced on the maintainer’s side of the boundary.
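The two-decision split can be sketched as a small data model. This is an illustration only, not the paper's implementation: the class names, field names, and the `admit`, `attach_candidate`, and `merge` functions are all hypothetical. It assumes the only hard rule is ordering, so no locally generated candidate can be attached until the knowledge package has been admitted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Admission(Enum):
    PENDING = "pending"
    ADMITTED = "admitted"
    REJECTED = "rejected"


@dataclass
class KnowledgePackage:
    # Rendered artifacts named in the article; the field names are hypothetical.
    design_memo: str
    risk_checklist: List[str]
    test_plan: List[str]
    # Contributor code is kept as a reference, not as the merge candidate.
    reference_diff: str = ""
    admission: Admission = Admission.PENDING


@dataclass
class Contribution:
    package: KnowledgePackage
    # Code produced on the maintainer's side; empty until regenerated.
    local_candidate: Optional[str] = None
    merged: bool = False


def admit(contribution: Contribution, approved: bool) -> None:
    """Decision 1: should this knowledge enter the project?"""
    contribution.package.admission = (
        Admission.ADMITTED if approved else Admission.REJECTED
    )


def attach_candidate(contribution: Contribution, code: str) -> None:
    """Record locally regenerated code, allowed only after admission."""
    if contribution.package.admission is not Admission.ADMITTED:
        raise PermissionError("knowledge package has not been admitted")
    contribution.local_candidate = code


def merge(contribution: Contribution) -> None:
    """Decision 2: should this particular implementation be merged?"""
    if contribution.local_candidate is None:
        raise ValueError("no locally generated candidate to merge")
    contribution.merged = True
```

Keeping `admit` and `merge` as separate calls means a package can be admitted while a given candidate is still refused, which is the distinction a single diff review collapses.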

How does the inner trusted coding agent fit in?

Once a knowledge package is admitted, a project-owned “inner trusted coding agent” regenerates candidate code inside the receiving repository’s own context, conventions, tests, and security policy. The merged implementation is produced locally, not imported from the contributor’s machine.

The point of regenerating, rather than merging, is that the code which lands conforms to what the project already enforces. The contributor’s code becomes a reference, not a merge target. The agent that writes the final version runs under the project’s rules, sees the project’s tests, and is bounded by the project’s security policy. In principle this moves the usual review friction around style, test coverage, and integration fit to generation time, where the constraints are applied, rather than to a later review where they are only discovered.

The caveat is in the name. “Inner trusted” describes the agent’s position, inside the repo and project-owned, and the maintainer’s intent, to trust it more than an unknown external collaborator. It does not describe a formal property of its output. It is still a coding agent, subject to the same failure modes as any other.

What did the seven-PR pilot actually test?

The paper’s evidence is a “minimal controlled simulation pilot over seven merged public pull requests,” stress-tested under three conditions: description ablation, diff ablation, and a synthetic poisoned-patch scenario. The authors are explicit about what that is and is not. It is not a production deployment, not a maintainer-time-saved measurement, and not an adversarial-attack evaluation.

The ablations are the interesting part of the pilot, because they probe what the workflow needs in order to function. Description ablation strips the contributor’s description; diff ablation strips the change itself; the poisoned-patch condition injects a synthetic fault. These test whether the knowledge package can still be reconstructed, or whether the workflow degrades gracefully when one of its inputs is missing or corrupted. What they do not test is a determined adversary who has read the workflow and is crafting a submission to defeat it. The poisoned patch is a lab stress test, not a supply-chain attack model, and the paper does not claim otherwise.
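The three conditions are easy to picture as transformations of a single pull-request record. The sketch below is an assumption about their shape, not the authors' harness: `PullRequestRecord`, the function names, and the choice to model the poisoned patch as a fault appended to the diff are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class PullRequestRecord:
    description: str
    diff: str


def description_ablation(pr: PullRequestRecord) -> PullRequestRecord:
    # Strip the contributor's description, keep the change.
    return replace(pr, description="")


def diff_ablation(pr: PullRequestRecord) -> PullRequestRecord:
    # Strip the change itself, keep the description.
    return replace(pr, diff="")


def poisoned_patch(pr: PullRequestRecord, fault: str) -> PullRequestRecord:
    # Inject a synthetic fault; appending it to the diff is an assumption.
    return replace(pr, diff=pr.diff + "\n" + fault)


def build_conditions(pr: PullRequestRecord, fault: str) -> Dict[str, PullRequestRecord]:
    """Return the unmodified record plus the three stress conditions."""
    return {
        "baseline": pr,
        "description_ablation": description_ablation(pr),
        "diff_ablation": diff_ablation(pr),
        "poisoned_patch": poisoned_patch(pr, fault),
    }
```

Each condition removes or corrupts exactly one input, which is why the pilot can say something about graceful degradation but nothing about an adversary who controls several inputs at once.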

What does KPR change about maintainer burden and agent PR spam?

KPR raises the cost floor of a cheap, unaudited agent pull request. A maintainer now has a structured provenance record to judge, and can reject a contribution on that record before reaching code review, where reviewer fatigue is highest.

Most of the existing discussion about agent-generated pull requests has centered on two things: detection, flagging diffs that look machine-written, and volume, the rising tide of low-effort agent PRs hitting open-source maintainers. KPR neither detects nor throttles. It reframes. It asks the contributor to do additional structured work up front: the design memo, the risk checklist, the test plan. A contributor who cannot or will not produce those artifacts has handed the maintainer something that can be sent back with a one-line reason having nothing to do with the code’s correctness.

The trap is reading more into the poisoned-patch condition than it supports. Because the adversarial test is a synthetic stress test rather than a real attack model, it would be a mistake to describe KPR as defending against malicious pull requests. It gives a maintainer a clearer rejection path and a richer review surface. Nothing in the paper shows that it resists an attacker who is deliberately trying to poison a knowledge package.

Does regenerating code locally actually make it correct?

No. The inner agent that regenerates the code is itself an agent step and inherits the same unsolved verification problem every coding agent faces. Running inside the repository does not make its output correct.

A separate preprint posted the day before KPR, “The Verification Horizon” (arXiv:2606.26300), characterizes that problem in terms that apply directly to KPR’s inner agent. It argues that “no fixed reward function can remain effective as policy capability continues to grow,” and that every verifier is “only a proxy for human intent, never the intent itself,” across three axes: scalability, faithfulness, and robustness. The inner trusted coding agent needs a reward or a verification signal to know whether the code it regenerated is right. Whatever signal it uses is a proxy. A clever-enough agent, or a subtle-enough bug, can satisfy the proxy while failing the intent. KPR moves the proxy inside the repo, which is a real improvement in terms of whose tests and conventions are enforced, but it does not close the gap between “passes the project’s checks” and “is what the maintainer wanted.”
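A toy example makes the proxy gap concrete. Everything here is hypothetical and comes from neither paper: it assumes the maintainer wants a `dedupe` helper that preserves first-seen order, and that the project's existing tests happen to use only sorted inputs.

```python
from typing import Callable, List


def project_checks(dedupe: Callable[[List[int]], List[int]]) -> bool:
    # The proxy: the project's existing tests, which only use sorted inputs.
    return dedupe([1, 1, 2, 3, 3]) == [1, 2, 3] and dedupe([]) == []


def regenerated_dedupe(items: List[int]) -> List[int]:
    # A candidate the inner agent might produce: it satisfies the checks,
    # but it sorts the result instead of preserving first-seen order.
    return sorted(set(items))


def intended_dedupe(items: List[int]) -> List[int]:
    # What the maintainer actually wanted: keep first-seen order.
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


passes_proxy = project_checks(regenerated_dedupe)
matches_intent = regenerated_dedupe([3, 1, 3, 2]) == intended_dedupe([3, 1, 3, 2])
```

Here `passes_proxy` is true while `matches_intent` is false. Running the checks inside the repository changes whose tests are applied, but it does not change the fact that the tests are a proxy.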

The distill-a-trace-into-a-reusable-artifact step that KPR depends on is itself an active research area, not a settled capability. SkillDisCo (arXiv:2606.26669), posted the same day as KPR, distills successful agent traces into reusable, parameterized control-flow subgraphs compiled into callable skills, improving success rates and reducing turns on ALFWorld and WebArena. But it works only for scenarios describable as finite state machines, and the open-ended code-contribution setting KPR targets is not obviously one of them. So the input step to KPR, turning a messy agent interaction trace into a clean knowledge package, rests on a capability demonstrated in constrained settings and untested in the one KPR actually addresses.

How does KPR compare to formal policy-as-code?

KPR’s trust layer is procedural: it rearranges who reviews what and when, and moves code generation inside a trusted boundary. A concurrent preprint (arXiv:2606.26649), accepted at the ICML 2026 AIWILD workshop, offers a different and stronger guarantee that KPR does not claim.

That work autoformalizes agent prompts and MCP tool descriptions into formally verified Cedar Policy Language policies via a generator-critic loop, achieving broader coverage than hand-coded symbolic enforcement on MedAgentBench. Where KPR relies on a maintainer reading a memo and trusting an inner agent, policy-as-code turns the rules an agent must follow into machine-checked constraints that hold regardless of who is reading. The two are complementary more than competitive: KPR governs the human-facing knowledge-admission decision, which is inherently judgement-based and not the kind of thing a policy language expresses, while Cedar-style enforcement governs the agent’s permissible actions, which is exactly what it does express.

| Dimension | KPR (procedural) | Cedar policy-as-code (formal) |
| --- | --- | --- |
| What it governs | Knowledge admission and local regeneration | Permitted agent actions and tool calls |
| Type of guarantee | Workflow plus maintainer judgement | Machine-checked policy |
| Evidence in the paper | Seven-PR simulation, hypothesized benefit | Outcovers hand-coded enforcement on MedAgentBench |
| Handles intent ambiguity | Yes, a human signs off | No, it must be expressible as policy |

Does it actually make the contribution auditable?

Auditable, yes; solved, no. KPR gives a maintainer a provenance record to review before code review and a clean rejection path for contributions that cannot produce one, and it moves the final implementation into the project’s own context. Those are real changes to how agent contributions could be governed. What remains unproven is the load-bearing part: that the workflow reduces rework, that the inner agent produces correct code, and that the knowledge package resists adversarial input. Each of those is either hypothesized or scoped to a seven-PR simulation. The verification problem the inner agent inherits is independently characterized as unsolved, and the formal alternative published the same week does a job KPR explicitly does not attempt.

The honest summary is the one the authors give: a workflow you can instantiate and stress-test, with the rest left as empirical work somebody still has to do.

Frequently Asked Questions

What does a project need in place before KPR is even instantiable?

An in-repo coding agent wired into CI with access to the project’s tests, conventions, and security policy, plus a rendering step that turns the contributor’s cleaned trace into design memos and risk checklists. The baseline is roughly what teams already running a project-owned coding agent have, with knowledge-package rendering as the added surface.

Which kinds of pull requests does KPR fit poorly?

Contributions whose intent is a judgment call rather than a spec, such as taste-driven API redesigns, open-ended refactors, or performance tuning. The trace-distillation step KPR depends on has been demonstrated in SkillDisCo only for scenarios describable as finite state machines, and these contribution types are the opposite case: there may be no clean knowledge package to extract.

How is this different from tools that detect AI-written diffs?

Detection flags whether a diff looks machine-generated, a binary signal that false-positives on legitimate assistive coding and tells the maintainer nothing about whether the change is sound. KPR ignores authorship and asks instead whether the contribution carries auditable provenance, turning the decision into knowledge admission rather than bot-spotting.

Does the split apply to a project’s own internal agent, or only outside contributors?

In principle, both. The paper scopes the trust boundary to external collaborators whose machine and trace a maintainer cannot inspect, but an internal agent’s output still raises an admission decision that is separable from the merge decision. The external framing is where the spam-rejection payoff concentrates, not a hard limit on where the workflow fits.

What is the lower-risk subset a team could adopt before the empirical work lands?

Just the provenance requirement: demanding a design memo, risk checklist, and test plan from contributors, and using their absence as a pre-review rejection. That slice holds value independent of whether local regeneration ever reduces rework, so a project can capture the spam-rejection benefit without betting on the unproven half of the workflow.
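A minimal sketch of that pre-review gate follows. It assumes a submission arrives as a dictionary keyed by artifact name. The key names, the `pre_review_gate` function, and the wording of the rejection message are hypothetical and not taken from the paper.

```python
from typing import Any, Dict, Tuple

# The three artifacts the article names; the key spellings are an assumption.
REQUIRED_ARTIFACTS = ("design_memo", "risk_checklist", "test_plan")


def is_present(value: Any) -> bool:
    # Treat None, empty collections, and whitespace-only strings as missing.
    if isinstance(value, str):
        return bool(value.strip())
    return bool(value)


def pre_review_gate(submission: Dict[str, Any]) -> Tuple[bool, str]:
    """Accept or reject a submission before anyone looks at the code."""
    missing = [
        name for name in REQUIRED_ARTIFACTS
        if not is_present(submission.get(name))
    ]
    if missing:
        return False, "missing provenance artifacts: " + ", ".join(missing)
    return True, "provenance complete, proceed to review"
```

For example, a submission containing only a diff and a design memo would be rejected with a one-line reason that names the risk checklist and test plan, and that reason says nothing about whether the code is correct.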
