    Citation

    Please use this identifier to cite or link to this item: https://uvadoc.uva.es/handle/10324/78812

    Title
    Diffusion Models for Tabular Data Imputation and Synthetic Data Generation
    Authors
    Villaizán Vallelado, Mario
    Salvatori, Matteo
    Segura, Carlos
    Arapakis, Ioannis
    Year
    2025
    Publisher
    Association for Computing Machinery
    Description
    Scientific Production
    Source
    ACM Transactions on Knowledge Discovery from Data, 2025, vol. 19, no. 6.
    Abstract
    Data imputation and data generation have important applications across many domains where incomplete or missing data can hinder accurate analysis and decision-making. Diffusion models have emerged as powerful generative models capable of capturing complex data distributions across various data modalities such as image, audio, and time series. Recently, they have also been adapted to generate tabular data. In this article, we propose a diffusion model for tabular data that introduces three key enhancements: (1) a conditioning attention mechanism, (2) an encoder-decoder transformer as the denoising network, and (3) dynamic masking. The conditioning attention mechanism is designed to improve the model’s ability to capture the relationship between the condition and synthetic data. The transformer layers help model interactions within the condition (encoder) or synthetic data (decoder), while dynamic masking enables our model to efficiently handle both missing data imputation and synthetic data generation tasks within a unified framework. We conduct a comprehensive evaluation by comparing the performance of diffusion models with transformer conditioning against state-of-the-art techniques such as Variational Autoencoders, Generative Adversarial Networks, and Diffusion Models, on benchmark datasets. Our evaluation assesses the generated samples with respect to three criteria: (1) machine learning efficiency, (2) statistical similarity, and (3) privacy risk mitigation. For the task of data imputation, we consider the efficiency of the generated samples across different levels of missing features. The results demonstrate, on average, superior machine learning efficiency and statistical accuracy compared to the baselines, while maintaining privacy risks at a comparable level, with the largest gains on datasets with a large number of features.
    By conditioning the data generation on a desired target variable, the model can mitigate systemic biases, generate augmented datasets to address data imbalance issues, and improve data quality for subsequent analysis. This has significant implications for domains such as healthcare and finance, where accurate, unbiased, and privacy-preserving data are critical for informed decision-making and fair model outcomes.
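    The abstract says that dynamic masking lets a single model cover both imputation and full synthetic generation. The sketch below is one possible reading of that idea and is not taken from the paper: the function names `sample_dynamic_mask` and `split_condition_and_target`, the uniform per-row masking ratio, and the zero fill value are all assumptions made for illustration. Each row gets a random subset of features marked "to generate", and the remaining features act as the condition. Masking every feature corresponds to unconditional generation, and masking only the missing entries corresponds to imputation.

```python
import numpy as np


def sample_dynamic_mask(n_rows, n_features, rng):
    """Boolean mask: True = feature to generate, False = feature used as condition.

    Assumption: each row draws its own masking ratio uniformly from [0, 1].
    """
    ratios = rng.uniform(0.0, 1.0, size=(n_rows, 1))
    return rng.uniform(0.0, 1.0, size=(n_rows, n_features)) < ratios


def split_condition_and_target(x, mask, fill_value=0.0):
    """Split a numeric table into the observed part and the part to reconstruct."""
    condition = np.where(mask, fill_value, x)  # masked entries are blanked out
    target = np.where(mask, x, fill_value)     # only masked entries are kept
    return condition, target


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5))  # toy numeric table: 4 rows, 5 features

# Training-style mask: a different random subset of features per row.
mask = sample_dynamic_mask(4, 5, rng)
condition, target = split_condition_and_target(x, mask)

# Generation-style mask: every feature is generated, nothing is conditioned on.
full_mask = np.ones_like(mask)
empty_condition, full_target = split_condition_and_target(x, full_mask)

# Imputation-style mask: generate exactly the entries that are missing.
x_missing = x.copy()
x_missing[0, 1] = np.nan
imputation_mask = np.isnan(x_missing)
```

    In this reading, the condition would feed the encoder and the noised target would feed the decoder of the denoising network; how the paper actually wires these together is described in the full text, not in this record.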
    Subjects (normalized)
    Data imputation
    Synthetic data generation
    Diffusion model
    Generative model
    Transformer
    ISSN
    1556-4681
    Peer reviewed
    Yes
    DOI
    10.1145/3742435
    Sponsor
    European Union, Horizon 2020: 101168560
    Publisher's version
    https://dl.acm.org/doi/pdf/10.1145/3742435
    Rights holder
    © 2025 Copyright held by the owner/author(s).
    Language
    eng
    URI
    https://uvadoc.uva.es/handle/10324/78812
    Version type
    info:eu-repo/semantics/publishedVersion
    Rights
    openAccess
    Collections
    • DEP24 - Artículos de revista [81]
    Files in this item
    Name:
    Diffusion Models for Tabular Data Imputation.pdf
    Size:
    9.561 MB
    Format:
    Adobe PDF
    Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International

    Universidad de Valladolid
