Real significance in large datasets: the similarity structure, a general method to assess the effect size beyond p-value

Diez Hermano, Sergio; Sánchez Jiménez, Abel; Aparicio Rodríguez, Gonzalo; Manubens, Paloma; Calvo Tapia, Carlos; Villacorta Atienza, José Antonio

Please use this identifier to cite or link to this item: https://uvadoc.uva.es/handle/10324/84286

Year

2026

Publisher

PNAS

Description

Scientific output

Source document

PNAS Nexus (under review, preprint)

Abstract

Statistical inference often relies on p-values to test for the absence of effects, but in large datasets even negligible differences become statistically significant. Evaluating the practical, biological, or clinical magnitude of effects therefore becomes essential. Standardized measures such as Cohen's d enable broad comparisons but can obscure practical meaning, as can dimensional metrics such as confidence intervals, which in addition lack standardization. We introduce the similarity structure, a general framework that estimates the probability distribution of the sizes of similar subsamples. The method is applicable to any data type, dimensionality, or statistical test. It quantifies the effect size as the expected number of observations under similarity, which gives a practical interpretation in terms of experimental effort, and it extends Cohen's d effect sizes to parameters beyond the mean. The method also enables effect sizes to be compared (within the same study or across studies) through direct statistical inference, clarifying the relevance of observed differences. The similarity structure, applied here to real datasets, offers a general, transparent, and versatile approach for interpreting and comparing effects in large-sample studies.
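A short simulation can make two of the abstract's points concrete. First, with very large samples a negligible difference yields a tiny p-value while Cohen's d stays near zero. Second, whether two groups look "similar" depends on how many observations are compared. The sketch below is only an illustrative reading of "similar subsamples" under stated assumptions. It is not the authors' estimator, which the abstract does not describe. Here two random subsamples of size n count as similar when a large-sample two-sample z-test fails to reject equal means at alpha = 0.05, and the sketch reports the share of similar pairs for several n. The simulated normal data, the 0.03 SD mean shift, the z-test, and the helper names (`mean_var`, `cohens_d`, `z_test_p`, `similarity_rate`) are all hypothetical.

```python
import math
import random
from statistics import NormalDist


def mean_var(v):
    """Sample mean and unbiased sample variance (fsum for numerical accuracy)."""
    m = math.fsum(v) / len(v)
    var = math.fsum((t - m) ** 2 for t in v) / (len(v) - 1)
    return m, var


def cohens_d(x, y):
    """Cohen's d: difference of means divided by the pooled standard deviation."""
    mx, vx = mean_var(x)
    my, vy = mean_var(y)
    nx, ny = len(x), len(y)
    pooled_sd = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (my - mx) / pooled_sd


def z_test_p(x, y):
    """Two-sided p-value of a large-sample (normal-approximation) test of equal means."""
    mx, vx = mean_var(x)
    my, vy = mean_var(y)
    z = (my - mx) / math.sqrt(vx / len(x) + vy / len(y))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def similarity_rate(x, y, n, rng, trials=100, alpha=0.05):
    """HYPOTHETICAL helper: share of random size-n subsample pairs that the
    test does not separate (p >= alpha), i.e. pairs treated as 'similar'."""
    similar = 0
    for _ in range(trials):
        if z_test_p(rng.sample(x, n), rng.sample(y, n)) >= alpha:
            similar += 1
    return similar / trials


rng = random.Random(42)
N = 100_000
# Simulated data (assumption): two normal groups whose means differ by 0.03 SD.
x = [rng.gauss(0.00, 1.0) for _ in range(N)]
y = [rng.gauss(0.03, 1.0) for _ in range(N)]

d = cohens_d(x, y)
p = z_test_p(x, y)
print(f"full data: Cohen's d = {d:.3f}, p = {p:.1e}")

# Share of 'similar' subsample pairs as the subsample size grows.
rates = {n: similarity_rate(x, y, n, rng) for n in (50, 500, 5_000, 20_000)}
for n, r in rates.items():
    print(f"n = {n:>6}: P(similar) ~ {r:.2f}")
```

On the full data the test should reject decisively even though d is tiny. Small subsamples should almost always look similar, and the share should fall as n grows. The subsample size at which similarity becomes unlikely is the kind of "experimental effort" quantity the abstract refers to, although the paper's actual definition of the expected number of observations under similarity may differ from this illustration.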

Peer review