RT info:eu-repo/semantics/doctoralThesis
T1 Fair Learning: an optimal transport based approach
A1 Gordaliza Pastor, Paula
A2 Universidad de Valladolid. Facultad de Ciencias
K1 Repair methodology
K1 Wasserstein variation
K1 12 Mathematics
AB The aim of this thesis is two-fold. On the one hand, optimal transportation methods are studied for statistical inference purposes. On the other hand, the recent problem of fair learning is addressed through the prism of optimal transport theory. The generalization of applications based on machine learning models in everyday life and the professional world has been accompanied by concerns about the ethical issues that may arise from the adoption of these technologies. In the first part of the thesis, we motivate the fairness problem by presenting some comprehensive results from the study of the statistical parity criterion through the analysis of the disparate impact index on the real and well-known Adult Income dataset. Importantly, we show that trying to make fair machine learning models may be a particularly challenging task, especially when the training observations contain bias. Then a review of mathematics for fairness in machine learning is given in a general setting, with some novel contributions in the analysis of the price for fairness in regression and classification. In the latter, we finish this first part by recasting the links between fairness and predictability in terms of probability metrics. We analyze repair methods based on mapping conditional distributions to the Wasserstein barycenter. Finally, we propose a random repair which yields a tradeoff between minimal information loss and a certain amount of fairness. The second part is devoted to the asymptotic theory of the empirical transportation cost. We provide a Central Limit Theorem for the Monge-Kantorovich distance between two empirical distributions with different sizes n and m, W_p(P_n, Q_m), p ≥ 1, for observations on ℝ.
In the case p > 1 our assumptions are sharp in terms of moments and smoothness. We prove results dealing with the choice of centering constants. We provide a consistent estimate of the asymptotic variance, which enables us to build two-sample tests and confidence intervals to certify the similarity between two distributions. These are then used to assess a new criterion of data set fairness in classification. Additionally, we provide a moderate deviation principle for the empirical transportation cost in general dimension. Finally, Wasserstein barycenters and variance-like criteria using the Wasserstein distance are used in many problems to analyze the homogeneity of collections of distributions and structural relationships between the observations. We propose the estimation of the quantiles of the empirical process of the Wasserstein variation using a bootstrap procedure. Then we use these results for statistical inference on a distribution registration model for general deformation functions. The tests are based on the variance of the distributions with respect to their Wasserstein barycenters, for which we prove central limit theorems, including bootstrap versions.
YR 2020
FD 2020
LK http://uvadoc.uva.es/handle/10324/43392
UL http://uvadoc.uva.es/handle/10324/43392
LA eng
NO Departamento de Estadística e Investigación Operativa
DS UVaDOC
RD 24-nov-2024