Proof Gates

Sycophancy-Resistant Self-Verification via Agent-Authored Postconditions. A discipline in which an LLM agent writes deterministic postcondition checks per task, runs them against its own work product, and quotes the observable output as proof.

What the paper argues

A proof gate is deterministic verification code that an LLM agent writes per task and runs against its own work product, quoting the observable output as proof. It carries hard fix-before-pass-through semantics at a workflow boundary: a failing check has to be fixed before the work moves on. The agent does not merely run a pre-existing test, lint, or hook (those are operator-authored sibling patterns); it writes the specific verification snippet for the work in front of it.
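As a rough illustration of the definition above, the following Python sketch shows one shape a proof gate might take. It is not code from the paper or the pattern library: the names GateResult, grep_gate, and pass_through are hypothetical, and the retry limit is an assumption chosen for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class GateResult:
    passed: bool
    evidence: str  # verbatim check output, quoted as proof


def grep_gate(text: str, pattern: str) -> GateResult:
    """Deterministic postcondition: at least one line must match the pattern."""
    hits = [
        f"{lineno}: {line}"
        for lineno, line in enumerate(text.splitlines(), 1)
        if re.search(pattern, line)
    ]
    evidence = "\n".join(hits) if hits else f"no line matches {pattern!r}"
    return GateResult(passed=bool(hits), evidence=evidence)


def pass_through(work, gate, fix, max_attempts=3):
    """Hard boundary: work leaves only with passing evidence attached."""
    for _ in range(max_attempts):
        result = gate(work)
        if result.passed:
            return work, result.evidence
        # Fix before pass-through: repair the work, then re-run the gate.
        work = fix(work, result.evidence)
    raise RuntimeError("proof gate still failing; work not passed through")


# Hypothetical task: the agent was asked to set retries to 5.
draft = "timeout = 30\nretries = 3\n"
final, proof = pass_through(
    draft,
    gate=lambda w: grep_gate(w, r"^retries = 5$"),
    fix=lambda w, _evidence: w.replace("retries = 3", "retries = 5"),
)
print(proof)  # prints: 2: retries = 5
```

The point of the sketch is that the claim "retries is now 5" is backed by the quoted line from the check rather than by the agent's own assertion, and that the work cannot cross the boundary while the check fails.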

The paper develops the dual-failure-mode argument (sycophancy plus overtraining/undertraining distribution skew), the fabrication-versus-gaming distinction that scopes where the discipline applies, and a six-property positioning matrix that locates proof gates among adjacent verification regimes such as pre-commit linters, lifecycle hooks, and evaluator commands.

Reproducibility artifacts

  • Pattern library: proof-gate-patterns is the central reproducibility artifact. It documents the discipline through annotated prompt fragments, three canonical pattern types (grep, quote, read), worked examples from production, and operator-facing background docs.
  • Operator-authored sibling: lint-identity is a pre-commit linter that validates identity references against a single source of truth. It sits in the operator-authored quadrant of the positioning matrix, on the same observable-output-as-proof axis.
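For contrast with agent-authored gates, an operator-authored check of the kind described in the second item might look roughly like the Python sketch below. This is an assumption-laden illustration, not lint-identity's actual code: the CANONICAL record, the email-only matching rule, and the lint_identity function are all hypothetical.

```python
import re

# Hypothetical single source of truth; the real tool's format is not described here.
CANONICAL = {"name": "Example Author", "email": "author@example.org"}

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def lint_identity(files: dict[str, str], canonical: dict[str, str] = CANONICAL) -> list[str]:
    """Report every email reference that differs from the canonical one."""
    findings = []
    for path, text in sorted(files.items()):
        for lineno, line in enumerate(text.splitlines(), 1):
            for email in EMAIL_RE.findall(line):
                if email != canonical["email"]:
                    findings.append(
                        f"{path}:{lineno}: {email!r} != {canonical['email']!r}"
                    )
    return findings


# Hypothetical staged files; the second one contains a typo in the domain.
staged = {
    "README.txt": "Contact: author@example.org\n",
    "CITATION.txt": "Maintainer: Example Author <author@exmaple.org>\n",
}
findings = lint_identity(staged)
for finding in findings:
    print(finding)

# A pre-commit hook blocks the commit by exiting non-zero.
exit_code = 1 if findings else 0
```

Here the operator writes the check once and it runs on every commit, whereas a proof gate is written by the agent for a single task. Both share the same axis: the printed findings, not a claim of correctness, are the proof.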

Cite the paper

The preprint is forthcoming; the reproducibility artifacts are published on GitHub now. Until an arXiv identifier is assigned, cite:

Whittaker, B. (2026). Proof Gates: Sycophancy-Resistant
Self-Verification via Agent-Authored Postconditions.
Preprint forthcoming. Artifacts:
https://github.com/rrmadmin/proof-gate-patterns