Loopzero

A falsifiable benchmark for collapse-warning claims in recursive systems.

A matched-false-positive benchmark, evaluated on frozen public market and recommender data, with the claim boundary specified in Lean 4.

An independent research program led by David Mullett.
Preprint · June 2026

The program

Recursive systems — markets and recommenders in the current benchmarks, with model-training loops as a secondary directional check — can enter a regime where local updates stay coherent while real forward progress becomes inaccessible. Loopzero asks one bounded question: does that regime leave an observable footprint before overt failure, and can a warning built on that footprint survive a fair test?

This sits in the lineage of early-warning signals for critical transitions (the variance, autocorrelation, and critical-slowing-down precursors developed by Scheffer, Dakos, and others) but starts from a different object: a formally specified no-progress obstruction rather than a statistical precursor. It then holds that object to the matched-false-positive discipline that the same literature's skeptics have rightly demanded.

The claim boundary is specified in Lean 4. The bridge from that formal object to real telemetry is treated as a conditional, falsifiable empirical question — not a verified one. Motivation and result are kept in separate registers throughout.

Claim ladder

  • Verified · Formal obstruction: A minimal no-progress obstruction over preorders, machine-checked in Lean 4 (collapse_via_progresscycle_public). Intentionally elementary: its job is to fix the claim boundary, not to contribute to formal methods.
  • Stated · Conditional telemetry bridge: An assumed measurement map from system state to three observable witnesses: gain (G), recursive persistence (p), diversity (δ). Stated as an assumption, tested empirically, never claimed as verified.
  • Protocol · Matched-false-positive contract: Detectors are compared at a single locked alert budget, the equal-false-positive band [0.03, 0.07], rather than at convenient family-specific thresholds.
  • Tested · Benchmark outcomes: Frozen public benchmarks in two domains. Outcome, including non-acceptance, reported in full. See below.
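
The matched-false-positive contract can be illustrated with a minimal Python sketch. It is not the Loopzero benchmark code. It assumes that the band [0.03, 0.07] is a false-positive rate measured on labelled negative (no-event) windows, that each detector emits one score per window, and that an alert fires when the score exceeds a threshold. The names calibrate_threshold and evaluate_at_matched_fpr, the 0.05 target, and the toy data are all hypothetical.

```python
import numpy as np

# Locked band quoted in the text; read here as a false-positive rate (assumption).
LOCKED_BAND = (0.03, 0.07)

def calibrate_threshold(negative_scores, target_fpr=0.05):
    """Return (threshold, realized_fpr) for the rule 'alert when score > threshold'."""
    neg = np.sort(np.asarray(negative_scores, dtype=float))[::-1]  # descending
    # Number of false alerts the budget allows on the negative windows.
    allowed = int(np.floor(target_fpr * neg.size + 1e-9))
    allowed = min(allowed, neg.size - 1)
    threshold = neg[allowed]
    # With tied scores the realized rate can fall below the target.
    realized_fpr = float(np.mean(neg > threshold))
    return threshold, realized_fpr

def evaluate_at_matched_fpr(negative_scores, positive_scores,
                            band=LOCKED_BAND, target_fpr=0.05):
    """Score one detector at the shared alert budget instead of a hand-picked threshold."""
    threshold, fpr = calibrate_threshold(negative_scores, target_fpr)
    pos = np.asarray(positive_scores, dtype=float)
    return {
        "threshold": float(threshold),
        "fpr": fpr,
        "in_band": bool(band[0] <= fpr <= band[1]),  # outside the band: not comparable
        "recall": float(np.mean(pos > threshold)),
    }

# Toy data: 100 negative windows and 5 pre-event windows (illustrative only).
neg = np.arange(100.0)
pos = np.array([10.0, 90.0, 95.0, 97.0, 99.0])
result = evaluate_at_matched_fpr(neg, pos)
print(result)
```

On the toy data the threshold lands at 94, the realized false-positive rate is 0.05, and three of the five pre-event windows trigger an alert. The point of the design is that every detector is calibrated on the same negatives to the same budget, so a detector whose realized rate falls outside the band is flagged rather than quietly compared at a more flattering threshold.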

The witnesses

  • G · Gain: Perturbation is amplified rather than damped.
  • p · Persistence: Recent internal state increasingly determines what comes next.
  • δ · Diversity: The accessible range of trajectories contracts.

The result

Reported in full

Across both flagship benchmarks — a segmented markets family (Volmageddon 2018, COVID 2020) and a MovieLens-25M recommender replay — no tested detector reached an accepted operating point under the locked band [0.03, 0.07].

That includes the standard early-warning comparators (variance EWS, AC1, CUSUM, Page-Hinkley, matrix profile, permutation entropy) and Loopzero's own pre-registered detector. What survives is the directional G/p/δ bridge and a falsifiable benchmark framework. Binary operational-detector acceptance remains open. A digitized LLM training-loop trajectory (Shumailov et al., 2024) is reported only as a directional-consistency check — not as a third benchmark.
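
For orientation, two of the comparators named above, variance EWS and AC1 (lag-1 autocorrelation), are conventionally computed over a sliding window. The sketch below is a generic textbook form under stated assumptions, not the configuration used in the benchmark: the window length, the lack of detrending, and the function names rolling_variance and rolling_ac1 are all illustrative choices.

```python
import numpy as np

def rolling_variance(x, window):
    """Sample variance over each trailing window of length `window`."""
    x = np.asarray(x, dtype=float)
    return np.array([x[i - window:i].var(ddof=1)
                     for i in range(window, x.size + 1)])

def rolling_ac1(x, window):
    """Lag-1 autocorrelation over each trailing window of length `window`."""
    x = np.asarray(x, dtype=float)
    out = []
    for i in range(window, x.size + 1):
        w = x[i - window:i]
        w = w - w.mean()           # centre the window
        denom = np.dot(w, w)       # sum of squared deviations
        # A constant window has no defined autocorrelation; report 0.0 by convention.
        out.append(np.dot(w[:-1], w[1:]) / denom if denom > 0 else 0.0)
    return np.array(out)

# Toy series (illustrative only): one window covering all four points.
series = [1.0, 2.0, 3.0, 4.0]
print(rolling_variance(series, window=4))
print(rolling_ac1(series, window=4))
```

In the early-warning literature a rising trend in either statistic is read as a precursor. Under a matched-false-positive protocol, each such statistic would be turned into a per-window score and thresholded at the same locked alert budget as every other detector.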

What comes next

Non-acceptance is the starting condition.

The next round hardens the conditions the result was measured under: external replication on independent data, harder negative controls, a wider set of comparators, and additional domains — including a matched-false-positive benchmark for model collapse. The aim is to make the result easier to break, not easier to believe.

That work is open. Loopzero is recruiting coauthors and technical reviewers — formal-methods contributors to extend the Lean claim boundary, recommender-systems and market-microstructure researchers to pressure-test the empirical bridge, and early-warning-signal researchers working the same problem from other directions. The benchmark artifacts, code, frozen outputs, and reproducibility anchors are public where licensing allows, so the critique can be specific.

What comes after this paper depends on what survives it.