Researchers have introduced a metric called "cross-sample prediction churn" to measure the instability of machine learning models in scientific applications. It quantifies how a model's predictions change when the model is trained on different subsets of the same data. Standard techniques such as deep ensembles do not reduce this churn, but two data-side methods, K-bootstrap bagging and the proposed twin-bootstrap method, reduce it significantly.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Introduces a new metric to better evaluate the reliability of scientific machine learning models, potentially leading to more robust AI systems in research.
RANK_REASON Academic paper introducing a new metric and methods for scientific machine learning.
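The summarized paper is not reproduced here, so the exact definition of cross-sample prediction churn is an assumption. A common reading is: the fraction of held-out points on which two models disagree when each is trained on a different resample of the same data. The sketch below illustrates that reading with a toy nearest-centroid classifier (a hypothetical stand-in for any model); all names are illustrative, not from the paper.

```python
import numpy as np

def churn(preds_a, preds_b):
    """Fraction of test points where two models disagree (assumed definition)."""
    return float(np.mean(np.asarray(preds_a) != np.asarray(preds_b)))

def fit_centroids(X, y):
    # Toy nearest-centroid "model": one mean vector per class.
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(centroids, X):
    # Assign each point to the nearest class centroid.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
# Toy data: two Gaussian blobs, one per class.
X = np.concatenate([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.concatenate([np.zeros(100, dtype=int), np.ones(100, dtype=int)])
X_test = np.concatenate([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])

# Two bootstrap resamples of the training set stand in for the
# "different subsets of training data" that the metric varies over.
idx1 = rng.integers(0, len(X), len(X))
idx2 = rng.integers(0, len(X), len(X))
p1 = predict(fit_centroids(X[idx1], y[idx1]), X_test)
p2 = predict(fit_centroids(X[idx2], y[idx2]), X_test)
print(churn(p1, p2))  # disagreement rate in [0, 1]; 0 = perfectly stable
```

Under this reading, a stable model has churn near 0 regardless of which resample it was trained on; methods like the paper's K-bootstrap bagging would aim to drive this number down.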