Show simple item record

dc.contributor.author: Villaizán Vallelado, Mario
dc.contributor.author: Salvatori, Matteo
dc.contributor.author: Segura, Carlos
dc.contributor.author: Arapakis, Ioannis
dc.date.accessioned: 2025-10-20T12:02:20Z
dc.date.available: 2025-10-20T12:02:20Z
dc.date.issued: 2025
dc.identifier.citation: ACM Transactions on Knowledge Discovery from Data, 2025, vol. 19, n.º 6. [es]
dc.identifier.issn: 1556-4681 [es]
dc.identifier.uri: https://uvadoc.uva.es/handle/10324/78812
dc.description: Producción Científica [es]
dc.description.abstract: Data imputation and data generation have important applications across many domains where incomplete or missing data can hinder accurate analysis and decision-making. Diffusion models have emerged as powerful generative models capable of capturing complex data distributions across various data modalities such as image, audio, and time series. Recently, they have also been adapted to generate tabular data. In this article, we propose a diffusion model for tabular data that introduces three key enhancements: (1) a conditioning attention mechanism, (2) an encoder-decoder transformer as the denoising network, and (3) dynamic masking. The conditioning attention mechanism is designed to improve the model’s ability to capture the relationship between the condition and the synthetic data. The transformer layers help model interactions within the condition (encoder) or synthetic data (decoder), while dynamic masking enables our model to efficiently handle both missing data imputation and synthetic data generation tasks within a unified framework. We conduct a comprehensive evaluation by comparing the performance of diffusion models with transformer conditioning against state-of-the-art techniques such as Variational Autoencoders, Generative Adversarial Networks, and Diffusion Models on benchmark datasets. Our evaluation focuses on the assessment of the generated samples with respect to three important criteria, namely: (1) machine learning efficiency, (2) statistical similarity, and (3) privacy risk mitigation. For the task of data imputation, we consider the efficiency of the generated samples across different levels of missing features. The results demonstrate, on average, superior machine learning efficiency and statistical accuracy compared to the baselines, while maintaining privacy risks at a comparable level; gains are particularly pronounced on datasets with a large number of features. By conditioning the data generation on a desired target variable, the model can mitigate systemic biases, generate augmented datasets to address data imbalance issues, and improve data quality for subsequent analysis. This has significant implications for domains such as healthcare and finance, where accurate, unbiased, and privacy-preserving data are critical for informed decision-making and fair model outcomes. [es]
dc.format.mimetype: application/pdf [es]
dc.language.iso: eng [es]
dc.publisher: Association for Computing Machinery [es]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [es]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: Imputación de datos [es]
dc.subject: Generación de datos sintéticos [es]
dc.subject: Modelo de difusión [es]
dc.subject: Modelo generativo [es]
dc.subject: Transformadores
dc.title: Diffusion Models for Tabular Data Imputation and Synthetic Data Generation [es]
dc.type: info:eu-repo/semantics/article [es]
dc.rights.holder: © 2025 Copyright held by the owner/author(s). [es]
dc.identifier.doi: 10.1145/3742435 [es]
dc.relation.publisherversion: https://dl.acm.org/doi/pdf/10.1145/3742435 [es]
dc.identifier.publicationfirstpage: 1 [es]
dc.identifier.publicationissue: 6 [es]
dc.identifier.publicationlastpage: 32 [es]
dc.identifier.publicationtitle: ACM Transactions on Knowledge Discovery from Data [es]
dc.identifier.publicationvolume: 19 [es]
dc.peerreviewed: SI [es]
dc.description.project: Unión Europea-Horizonte 2020: 101168560 [es]
dc.identifier.essn: 1556-472X [es]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 Internacional [*]
dc.type.hasVersion: info:eu-repo/semantics/publishedVersion [es]
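
The abstract above names three technical components: a conditioning attention mechanism, an encoder-decoder transformer used as the denoising network, and dynamic masking that lets a single model cover both missing-data imputation and full synthetic-data generation. The short Python sketch below illustrates only the dynamic-masking idea, as one plausible reading of the abstract; it is not the authors' code, and the function name dynamic_mask, the per-column Bernoulli probability p_condition, and the zero-filled condition view are hypothetical choices made for illustration.

import torch

def dynamic_mask(batch: torch.Tensor, p_condition: float = 0.5):
    """Randomly split tabular rows into conditioning and target entries.

    batch: (batch_size, num_features) tensor of already-encoded tabular features.
    p_condition: probability that a given entry is treated as observed/conditioning.
    Returns (mask, condition): a boolean mask and the masked condition view.
    """
    # Per-entry Bernoulli mask: True marks entries given to the encoder as the
    # condition; False marks entries the decoder would have to denoise/generate.
    mask = torch.rand_like(batch) < p_condition
    # Condition view keeps observed values and zeroes out the target entries.
    condition = torch.where(mask, batch, torch.zeros_like(batch))
    return mask, condition

# Toy usage: during training the mask would be resampled at every step; at
# inference, imputation uses the actually observed columns as the mask, while
# unconditional generation corresponds to an all-False mask.
x = torch.randn(4, 8)  # 4 rows, 8 numeric features
mask, condition = dynamic_mask(x)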


Files in this item


This item appears in the following collection(s)
