Technical Report: Operationalizing the Capabilities Floor via the Witness Protocol

Research Direction: Operationalizing a Capabilities Floor

This report is a research-direction note, not a description of a currently deployed model-control system. The live Witness Protocol is Phase 5 Beta v0.9 research infrastructure: a consented corpus, Gate/control-plane workflow, governed G_5.2 runtime, and emerging Corpus_Entry/evaluation substrate. It does not claim to solve alignment or to provide a live halt/rollback mechanism for frontier-model training.

The core idea of a Capabilities Floor is still important. Current alignment practice often relies on soft behavioral nudges: preference shaping, policy prompts, refusal training, and post-hoc filtering. The Witness Protocol asks whether high-signal human testimony can support a stronger floor: a set of witness-derived constraints and evaluation cases that models must not flatten, evade, or falsely resolve.

What Exists Today

The project currently supports the foundations needed for this direction:

A Gate/control-plane workflow in TWP-platform for intake, review, consent, audit, and disclosure-ledger records.
A governed G_5.2 runtime for witness inquiry and consent-gated testimony artifacts.
Corpus_Entry schema/compiler/export tooling for structured, leak-checked public slices and witness-attributed eval cases.
Portal demonstrations that illustrate provenance and revocation concepts with simulated values.

These are research-infrastructure components. They do not yet constitute a model-level constitutional veto, automated model rollback, or live production safety interlock.
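To make the consent and leak-check boundary more concrete, here is a minimal Python sketch of a consent-gated public export. The `CorpusEntry` class, the `ConsentState` values, the `PRIVATE_FIELDS` set, and the `export_public_slice` function are hypothetical illustrations. They are not the actual Corpus_Entry schema or compiler. The sketch assumes only that each entry carries a consent state and keeps public fields separate from private ones.

```python
from dataclasses import dataclass, field
from enum import Enum


class ConsentState(Enum):
    # Hypothetical consent states; the real Gate workflow may model these differently.
    GRANTED = "granted"
    PENDING = "pending"
    REVOKED = "revoked"


# Hypothetical field names that must never appear in a public slice.
PRIVATE_FIELDS = {"witness_name", "contact", "raw_transcript"}


@dataclass
class CorpusEntry:
    entry_id: str
    consent: ConsentState
    public_fields: dict = field(default_factory=dict)
    private_fields: dict = field(default_factory=dict)


def leak_check(record: dict) -> list[str]:
    """Return the names of any private fields present in an exported record."""
    return sorted(PRIVATE_FIELDS.intersection(record))


def export_public_slice(entries: list[CorpusEntry]) -> list[dict]:
    """Export only consented entries and refuse to emit any record that leaks."""
    exported = []
    for entry in entries:
        if entry.consent is not ConsentState.GRANTED:
            continue  # pending or revoked testimony is excluded at export time
        record = {"entry_id": entry.entry_id, **entry.public_fields}
        leaked = leak_check(record)
        if leaked:
            raise ValueError(f"{entry.entry_id} leaks private fields: {leaked}")
        exported.append(record)
    return exported


entries = [
    CorpusEntry("e1", ConsentState.GRANTED, {"summary": "example summary"}, {"witness_name": "A"}),
    CorpusEntry("e2", ConsentState.REVOKED, {"summary": "example summary"}),
]
print([r["entry_id"] for r in export_public_slice(entries)])  # ['e1']
```

In this sketch, a leak raises an error instead of silently dropping the offending field. A private field that reaches a public record points to a schema or compiler bug, and a reviewer should see it rather than have it masked.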

Proposed Direction

The Capabilities Floor should be treated as a staged research program:

Curated testimony: collect consented, high-signal moral reasoning through the Gate and governed inquiry path.
Structured evaluation artifacts: compile witness-attributed eval cases that preserve specificity, counterfactual reasoning, relational context, and moral tension.
Failure-mode testing: test models for sycophancy, false consensus, evasive resolution, and loss of witness-specific reasoning.
Adapter research: explore whether DPO pairs, process-reward traces, rule-based rubrics, or WitnessBench-style private evaluations can improve model behavior without flattening disagreement.
Safety-boundary research: investigate whether repeated failures against witness-derived constraints can inform stronger deployment gates or governance policies.
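The failure-mode testing and safety-boundary steps can be sketched together. Each witness-attributed eval case is graded for the named failure modes, per-mode failure rates are aggregated, and those rates are compared against thresholds. Everything in this sketch is hypothetical: the `FailureMode` enum, `EvalResult`, the threshold values, and `gate_recommendation`. Grading itself, whether by human reviewers or a rubric, is assumed to happen upstream. In keeping with the caveats in this report, the output is an advisory recommendation for governance review. It does not halt training or roll back a model.

```python
from dataclasses import dataclass
from enum import Enum


class FailureMode(Enum):
    # The four failure modes named in the research plan.
    SYCOPHANCY = "sycophancy"
    FALSE_CONSENSUS = "false_consensus"
    EVASIVE_RESOLUTION = "evasive_resolution"
    LOST_WITNESS_REASONING = "lost_witness_reasoning"


@dataclass
class EvalResult:
    case_id: str
    failures: set[FailureMode]  # modes flagged for this case by an upstream grader


def failure_rates(results: list[EvalResult]) -> dict[FailureMode, float]:
    """Fraction of eval cases that exhibited each failure mode."""
    n = len(results)
    if n == 0:
        return {mode: 0.0 for mode in FailureMode}
    return {mode: sum(mode in r.failures for r in results) / n for mode in FailureMode}


def gate_recommendation(
    results: list[EvalResult], thresholds: dict[FailureMode, float]
) -> tuple[bool, list[FailureMode]]:
    """Advisory only: pass unless some failure rate strictly exceeds its threshold.

    Modes without a configured threshold default to 1.0, so they never trip.
    """
    rates = failure_rates(results)
    breached = [m for m in FailureMode if rates[m] > thresholds.get(m, 1.0)]
    return (len(breached) == 0, breached)


results = [
    EvalResult("c1", {FailureMode.SYCOPHANCY}),
    EvalResult("c2", set()),
    EvalResult("c3", {FailureMode.SYCOPHANCY, FailureMode.EVASIVE_RESOLUTION}),
    EvalResult("c4", set()),
]
thresholds = {FailureMode.SYCOPHANCY: 0.25, FailureMode.EVASIVE_RESOLUTION: 0.25}
passes, breached = gate_recommendation(results, thresholds)
print(passes, [m.value for m in breached])  # False ['sycophancy']
```

With these illustrative thresholds, half the cases show sycophancy, so the sketch recommends against passing. Evasive resolution sits exactly at its 0.25 threshold and does not trip, because the comparison is strict.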

What Must Remain Caveated

The following are not live claims today and should be labelled as planned work or as a research direction:

Deterministic tripwires that halt frontier-model training.
Automated rollback to a previous safe model state.
RFC-3161 or IPFS provenance as a live production chain.
Weight-shift monitoring tied to Witness testimony.
Any claim that the Capabilities Floor is the only engineering requirement capable of ensuring safe future intelligence.

Why the Direction Matters

The Witness Protocol remains a response to the problem that motivated it: future systems should inherit more than uncurated digital exhaust. But the defensible claim is narrower and stronger: a small, consented, high-signal corpus can become an auditable substrate for evaluating how models reason through morally difficult situations.
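One way to make an evaluation substrate auditable is a tamper-evident log in which each eval record commits to the hash of the record before it. The minimal Python sketch below uses hypothetical record fields and a local SHA-256 hash chain only. It is not RFC-3161 timestamping or IPFS provenance, both of which remain planned rather than live.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first record


def record_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal payloads hash identically.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list[dict], payload: dict) -> None:
    # Each new record commits to the hash of the previous one.
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "payload": payload, "hash": record_hash(prev, payload)})


def verify(log: list[dict]) -> bool:
    # Recompute the chain; any edited payload or broken link fails verification.
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != record_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


# Hypothetical eval-run payloads.
log: list[dict] = []
append(log, {"case_id": "c1", "model": "model-a", "failures": ["sycophancy"]})
append(log, {"case_id": "c2", "model": "model-a", "failures": []})
print(verify(log))  # True
log[0]["payload"]["failures"] = []  # tamper with an earlier record
print(verify(log))  # False
```

A hash chain only detects edits. It does not prove when a record was written or who wrote it, which is what external timestamping would add.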

The long-term ambition is to make moral inheritance testable. The current work is to build the corpus, consent boundaries, eval artifacts, and governance discipline required before stronger model-control claims can be made.