The Witness Protocol

Inheriting the Inferno: Why High-Signal Testimony Matters for AGI

The danger is not only that future AI systems will be powerful. It is that they may become powerful while inheriting a shallow picture of what human beings are: preference bundles, engagement traces, market behavior, public posturing, and compressed slogans.

The Witness Protocol is a response to that inheritance problem. It asks for testimony that is difficult to fake: concrete, accountable, counterfactual, relational, and morally uncomfortable.

The Consensus Trap

Many models are trained to sound balanced. That can become a failure mode. In hard moral cases, the safe-sounding answer often flattens the real conflict. It pretends that a wound has been resolved because resolution is easier to score.

High-signal testimony preserves friction. It records where values collide, where harm was unintended but real, where a person changed their mind, and where no clean answer exists.

This is why the Protocol treats moral disagreement as data to preserve, not noise to average away.
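In dataset terms, the difference looks roughly like this. Below is a minimal Python sketch. It assumes a hypothetical `Judgment` record (reviewer, verdict, rationale), which is an illustration and not the Protocol's actual schema. The first function averages disagreement away into a single label. The second keeps the disagreement as data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    """One reviewer's reading of a hard case (hypothetical schema)."""
    reviewer: str
    verdict: str
    rationale: str


def majority_label(judgments: list[Judgment]) -> str:
    """Averaging approach: collapse every reading into the most common verdict."""
    return Counter(j.verdict for j in judgments).most_common(1)[0][0]


def preserved_record(judgments: list[Judgment]) -> dict:
    """Preserving approach: keep every verdict, every rationale, and a flag
    that marks the case as contested when reviewers disagree."""
    counts = Counter(j.verdict for j in judgments)
    return {
        "verdict_counts": dict(counts),
        "contested": len(counts) > 1,
        "rationales": {j.reviewer: j.rationale for j in judgments},
    }
```

Suppose two reviewers call a case permissible and one calls it impermissible. `majority_label` returns only `"permissible"`. `preserved_record` keeps the two-to-one split, marks the case as contested, and retains the dissenting rationale that the majority label throws away.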

The Current Stack

The live stack currently has three components, and each should be described carefully:

  • Portal: public information hub, library, funding information, and simulated demos.
  • Platform: Gate/control plane for intake, review, consent, audit, disclosure ledger, and bridge linkage.
  • G_5.2: governed runtime and artifact plane for witness inquiry, consent-gated testimony, and emerging Corpus_Entry/eval tooling.

The Portal does not collect testimony or consent. It links out to the Platform for real actions.
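One way to keep this separation honest is to assign every responsibility to exactly one component. The following minimal sketch uses hypothetical action names such as `record_consent` and `capture_testimony`. These stand in for the responsibilities listed above and are not a real API of the stack.

```python
from enum import Enum


class Component(Enum):
    PORTAL = "portal"
    PLATFORM = "platform"
    G_5_2 = "g_5.2"


# Hypothetical action names. The post describes responsibilities, not an API.
RESPONSIBILITIES: dict[Component, frozenset[str]] = {
    # Read-only public surface: no testimony, no consent.
    Component.PORTAL: frozenset({"browse_library", "view_funding", "run_simulated_demo"}),
    # Gate/control plane.
    Component.PLATFORM: frozenset({
        "intake", "review", "record_consent", "audit",
        "write_disclosure_ledger", "bridge_link",
    }),
    # Governed runtime. Consent gating itself is not modeled in this sketch.
    Component.G_5_2: frozenset({"witness_inquiry", "capture_testimony", "build_corpus_entry"}),
}


def route(action: str) -> Component:
    """Return the single component that owns an action, or fail loudly."""
    owners = [c for c, actions in RESPONSIBILITIES.items() if action in actions]
    if len(owners) != 1:
        raise ValueError(f"action {action!r} has {len(owners)} owners; expected exactly one")
    return owners[0]
```

`route("record_consent")` returns `Component.PLATFORM`, and no consent or testimony action ever routes to the Portal. An action with zero owners or several owners raises an error, so it cannot quietly land in the wrong plane.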

Privacy and Provenance

The live system uses staged PII handling and control/runtime separation. Hard-format identifiers are stripped before Gate model calls. Candidate Isolation is used for PII classification and de-identification: at that step, only isolated candidate tokens, not the surrounding text, are sent to the classifier.
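The two stages can be sketched in Python as follows. Treat this as a minimal illustration under stated assumptions, not the live implementation. The email and phone regexes stand in for "hard-format identifiers." The capitalized-word heuristic for picking candidates is hypothetical. `classify` is a placeholder for whatever model performs the classification. The property the sketch does preserve is the one described above: the classifier receives one isolated token at a time and never the testimony around it.

```python
import re
from collections.abc import Callable

# Stage 1: hypothetical patterns for hard-format identifiers, i.e. values whose
# shape alone marks them as PII. Real coverage would need to be far broader.
HARD_FORMAT_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def strip_hard_format(text: str) -> str:
    """Replace hard-format identifiers with placeholders before any model call."""
    for label, pattern in HARD_FORMAT_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def isolate_candidates(text: str) -> list[str]:
    """Stage 2a: pull out candidate tokens (crude heuristic: capitalized words)."""
    return sorted(set(re.findall(r"\b[A-Z][a-z]+\b", text)))


def deidentify(text: str, classify: Callable[[str], bool]) -> str:
    """Run both stages. The classifier is called on one isolated candidate
    token at a time and never sees the surrounding text."""
    text = strip_hard_format(text)
    flagged = [tok for tok in isolate_candidates(text) if classify(tok)]
    for tok in flagged:
        text = re.sub(rf"\b{re.escape(tok)}\b", "[NAME]", text)
    return text
```

The heuristic over-collects on purpose. For example, any capitalized word at the start of a sentence becomes a candidate. The classifier then decides which candidates are PII. What the classifier never receives is the sentence around each token.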

Content hashes and disclosure-ledger records support auditability today. RFC 3161 timestamping and IPFS/content-addressed archival are planned provenance layers; the Portal's provenance demos use simulated values.
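The post does not say how ledger records are linked. One common way to make an append-only disclosure log auditable is to have each record commit to the hash of the previous record. The sketch below assumes that design, along with hypothetical field names (`recipient`, `purpose`). It deliberately has no timestamp field, because RFC 3161 timestamping is described as planned.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def content_hash(data: bytes) -> str:
    """SHA-256 hex digest of raw content, e.g. a testimony entry."""
    return hashlib.sha256(data).hexdigest()


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes the same way.
    return content_hash(json.dumps(body, sort_keys=True).encode("utf-8"))


@dataclass
class DisclosureLedger:
    """Hypothetical append-only log; each record commits to its predecessor."""
    records: list[dict] = field(default_factory=list)

    def append(self, entry_hash: str, recipient: str, purpose: str) -> dict:
        prev = self.records[-1]["record_hash"] if self.records else GENESIS
        body = {"entry_hash": entry_hash, "recipient": recipient,
                "purpose": purpose, "prev_hash": prev}
        record = {**body, "record_hash": _digest(body)}
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered record breaks the chain."""
        prev = GENESIS
        for rec in self.records:
            body = {k: v for k, v in rec.items() if k != "record_hash"}
            if body["prev_hash"] != prev or _digest(body) != rec["record_hash"]:
                return False
            prev = rec["record_hash"]
        return True
```

A chain like this does not prevent anyone from rewriting the ledger. It makes a silent rewrite detectable on the next `verify()`, which is what auditability requires. Anchoring the latest `record_hash` with a trusted timestamp would be a natural place for the planned RFC 3161 layer.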

Research Direction

Dialectical Reward Modeling, TensionDelta-style metrics, DPO/PRM adapters, and WitnessBench-style evaluation are valuable research directions. Until they are implemented and validated, they should be described as future work.

The near-term goal is more fundamental: produce real consented entries that are leak-free, witness-attributed, and useful as evaluation cases.
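A readiness check for those properties might look like the following minimal sketch. `CorpusEntry` and its fields are hypothetical stand-ins for the emerging Corpus_Entry tooling. `leak_check` is a placeholder for whatever leak detection is actually applied.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusEntry:
    """Hypothetical entry shape; field names are illustrative only."""
    witness_id: str         # pseudonymous attribution, not a real name
    consent_recorded: bool  # consent captured on the control plane
    text: str               # de-identified testimony
    eval_prompt: str        # the question this entry is meant to test


def readiness_problems(entry: CorpusEntry, leak_check: Callable[[str], bool]) -> list[str]:
    """List every reason the entry is not yet usable; an empty list means ready."""
    problems = []
    if not entry.consent_recorded:
        problems.append("no recorded consent")
    if not entry.witness_id.strip():
        problems.append("not witness-attributed")
    if leak_check(entry.text):
        problems.append("possible PII leak")
    if not entry.eval_prompt.strip():
        problems.append("no evaluation prompt")
    return problems
```

The check returns every problem rather than a single pass/fail flag. That way a reviewer can see all the gaps that keep an entry out of the corpus at once.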

The Firewall We Can Build Now

The first firewall is not a magic tripwire inside a frontier model. It is discipline: consent, curation, boundary enforcement, and evaluation cases that make moral flattening visible.

If future systems are going to inherit us, then we should not leave them only the inferno of the open internet. We should leave them the best evidence we can produce of how humans reason when something real is at stake.