RT info:eu-repo/semantics/doctoralThesis
T1 Contributions to data science from an optimal transport perspective
A1 Rodríguez Vítores, David
A2 Universidad de Valladolid. Escuela de Doctorado
K1 Estadística
K1 Statistics
K1 Optimal transport
K1 Transporte Óptimo
K1 12 Matemáticas
AB The theory of optimal transport originated in the 18th century with Monge’s problem, which consists of moving mass from one location to another while minimizing the transportation cost. Kantorovich’s reformulation in the 20th century, by allowing mass splitting, transformed the problem into a convex one and guaranteed the existence of solutions. This advance led to the definition of the Wasserstein distance, a metric between probability distributions with an intuitive geometric interpretation and strong mathematical properties, which sparked significant interest within the scientific community. While early research focused on optimal transport as a rigorous mathematical tool, particularly due to its connections with weak convergence, its applications have since expanded to numerous fields. More recently, optimal transport has gained growing importance in data science, with applications including generative models, domain adaptation, and algorithmic fairness. In this new paradigm, interest in optimal transport and the Wasserstein distance has expanded well beyond their original theoretical scope, while also posing new mathematical challenges that have kept optimal transport among the most active and promising areas of contemporary research. This thesis is situated within this rich and evolving context and may be viewed as a collection of contributions to both the theoretical foundations of optimal transport and related distributional problems.
The first contribution is an improved central limit theorem for the sliced Wasserstein distance. The classical Wasserstein distance presents significant computational and statistical challenges in high dimensions, which, for instance, hinder the formulation of a central limit theorem in arbitrary dimensions. The sliced Wasserstein distance, defined via one-dimensional projections, circumvents these issues. Existing asymptotic results, however, typically require compactly supported distributions. Based on the Efron-Stein inequality, this thesis establishes a new central limit theorem that holds without assuming compact support.
The second contribution addresses the growing importance of privacy in data-driven applications. We develop a framework for private learning with sliced Wasserstein gradients. Although Wasserstein losses do not enjoy the typical finite-sum structure, we show that their gradients admit a favorable decomposition that enables private optimization with rigorous differential privacy guarantees, applicable to tasks such as fairness-aware training and sliced Wasserstein autoencoders.
The third contribution addresses model selection in Gaussian mixture models. Classical parsimonious approaches to covariance matrix estimation often impose overly restrictive assumptions or require a large number of parameters. We propose a novel method that classifies covariance matrices into similarity groups based on likelihood criteria. This yields intermediate models that offer greater flexibility and interpretability, as well as improved statistical performance in clustering and classification tasks.
Finally, the last contribution investigates Wasserstein barycenters of singular Gaussian distributions.
While barycenters of non-singular Gaussian distributions are well understood and can be computed efficiently, the singular case continues to pose significant challenges. In this context, we present novel results on the characterization of optimal transport maps, the optimality conditions for barycenters, and the convergence of first-order optimization methods.
YR 2025
FD 2025
LK https://uvadoc.uva.es/handle/10324/80838
UL https://uvadoc.uva.es/handle/10324/80838
LA eng
NO Escuela de Doctorado
DS UVaDOC
RD 12-Jan-2026