    Citation

    Please use this identifier to cite or link to this item: https://uvadoc.uva.es/handle/10324/67944

    Title
    A MapReduce opinion mining for COVID-19-related tweets classification using enhanced ID3 decision tree classifier
    Authors
    Es-Sabery, Fatima
    Es-Sabery, Khadija
    Qadir, Junaid
    Sainz de Abajo, Beatriz
    Hair, Abdellatif
    García Zapirain, Begoña
    Torre Díez, Isabel de la
    Year
    2021
    Publisher
    IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC.
    Description
    Scientific production
    Source
    IEEE Access, April 2021, vol. 9, pp. 58706-58739.
    Abstract
    Opinion Mining (OM) is a field of Natural Language Processing (NLP) that aims to capture human sentiment in a given text. With the continuing spread of online shopping websites, micro-blogging sites, and social media platforms, OM on social media has attracted the interest of thousands of researchers, because the reviews, tweets, and blogs collected from these networks are a significant source for improving decision-making. To analyze this textual data (reviews, tweets, or blogs) and extract relevant information from it, each item is classified into one of three class labels: negative, neutral, or positive. In this contribution, we introduce an innovative MapReduce-based, improved weighted ID3 decision tree classification approach for OM, which consists of three main steps. First, we used several feature extractors to detect and capture the relevant information in the tweets: character-level N-grams, Bag-of-Words, word embeddings (GloVe, Word2Vec), FastText, and TF-IDF. Second, we applied several feature selectors to reduce the high dimensionality of the feature space: Chi-square, Gain Ratio, Information Gain, and Gini Index. Finally, we used the selected features to perform the classification with an improved ID3 decision tree classifier, which computes a weighted information gain instead of the plain information gain used by traditional ID3. The weighted information gain of the current conditioned feature is obtained in two steps: first, compute the weighted correlation function of that feature; second, multiply this weighted correlation function by the feature's information gain. The work is implemented in a distributed environment using the Hadoop framework, with its MapReduce programming model and its distributed file system, HDFS.
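The two-step weighted gain described above can be illustrated with a minimal, single-machine sketch. It computes the classic ID3 information gain, IG(S, A) = H(S) - sum over v of |S_v|/|S| * H(S_v), and multiplies it by a per-feature weight. The abstract names a "weighted correlation function" but does not define it, so here the weight is simply an input number (an assumption, not the authors' formula); the function names and the toy features `a` and `b` are hypothetical, and the MapReduce/HDFS distribution is not modeled.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Classic ID3 gain: H(S) minus the size-weighted entropy of each partition S_v."""
    partitions: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    total = len(labels)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


def weighted_information_gain(values: Sequence[Hashable], labels: Sequence[Hashable],
                              weight: float) -> float:
    """ASSUMPTION: `weight` stands in for the paper's weighted correlation function,
    which the abstract does not define. Step 2: multiply it by the plain IG."""
    return weight * information_gain(values, labels)


def best_split(features: Dict[str, Sequence[Hashable]], labels: Sequence[Hashable],
               weights: Dict[str, float]) -> str:
    """Pick the feature with the largest weighted gain (plain ID3 would rank by IG alone)."""
    return max(features,
               key=lambda name: weighted_information_gain(features[name], labels, weights[name]))


if __name__ == "__main__":
    labels = ["neg", "neg", "neu", "pos"]
    features = {"a": [0, 0, 1, 1], "b": [0, 1, 1, 1]}  # hypothetical toy features
    print(information_gain(features["a"], labels))              # 1.0
    print(best_split(features, labels, {"a": 1.0, "b": 1.0}))   # a
    print(best_split(features, labels, {"a": 0.2, "b": 1.0}))   # b
```

With equal weights the sketch behaves like plain ID3 and splits on `a` (IG = 1.0 versus about 0.31 for `b`); lowering the weight of `a` to 0.2 makes `b` win, which is the kind of re-ranking a weighted gain can introduce.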
Its primary goal is to improve the performance of the well-known ID3 classifier in terms of accuracy, execution time, and ability to handle massive datasets. We carried out several experiments to assess the effectiveness of the proposed classifier against other approaches selected from the literature. The experimental results show that our ID3 classifier outperforms the other classifiers on the COVID-19_Sentiments dataset in terms of recall (85.72 %), specificity (86.51 %), error rate (11.18 %), false-positive rate (13.49 %), execution time (15.95 s), kappa statistic (87.69 %), F1-score (85.54 %), classification rate (88.82 %), false-negative rate (14.28 %), precision (86.67 %), convergence (it converges towards iteration 90), stability (it is more stable, with a mean standard deviation of 0.12 %), and complexity (it requires much lower time and space computational complexity).
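Several of the reported figures are complements of one another: error rate = 100 % - classification rate (11.18 = 100 - 88.82), false-positive rate = 100 % - specificity (13.49 = 100 - 86.51), and false-negative rate = 100 % - recall (14.28 = 100 - 85.72). The sketch below shows how these metrics and Cohen's kappa can be derived from a three-class confusion matrix. It assumes one-vs-rest rates that are macro-averaged over the classes; the abstract does not state how the multi-class figures were aggregated, and the confusion matrix is hypothetical, not data from the paper.

```python
from typing import Dict, List, Tuple

Matrix = List[List[int]]  # cm[i][j]: samples of true class i predicted as class j


def one_vs_rest(cm: Matrix, k: int) -> Dict[str, float]:
    """Binary rates for class k versus all other classes."""
    n = len(cm)
    total = sum(map(sum, cm))
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                       # class k predicted as another class
    fp = sum(cm[i][k] for i in range(n)) - tp  # other classes predicted as k
    tn = total - tp - fn - fp
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "recall": recall,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
        "fnr": fn / (tp + fn),  # false-negative rate = 1 - recall
        "fpr": fp / (tn + fp),  # false-positive rate = 1 - specificity
    }


def macro_average(cm: Matrix) -> Dict[str, float]:
    """ASSUMPTION: unweighted mean of the per-class one-vs-rest rates."""
    rows = [one_vs_rest(cm, k) for k in range(len(cm))]
    return {name: sum(r[name] for r in rows) / len(rows) for name in rows[0]}


def accuracy_error_kappa(cm: Matrix) -> Tuple[float, float, float]:
    """Classification rate, error rate (1 - classification rate) and Cohen's kappa."""
    n = len(cm)
    total = sum(map(sum, cm))
    p_observed = sum(cm[i][i] for i in range(n)) / total
    # Chance agreement: product of each class's row total and column total.
    p_expected = sum(sum(cm[i]) * sum(row[i] for row in cm) for i in range(n)) / total ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, 1 - p_observed, kappa


if __name__ == "__main__":
    # Hypothetical matrix, rows/columns = (negative, neutral, positive).
    cm = [[50, 5, 5], [4, 40, 6], [3, 7, 80]]
    print(macro_average(cm))
    print(accuracy_error_kappa(cm))
```

Because macro-averaging is linear, the averaged false-negative and false-positive rates remain exact complements of the averaged recall and specificity. Macro-averaged F1, however, is generally not equal to the harmonic mean of macro-averaged precision and recall.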
    Keywords
    Big data
    Opinion Mining
    ISSN
    2169-3536
    Peer reviewed
    Yes
    DOI
    10.1109/ACCESS.2021.3073215
    Sponsor
    This work was funded through grant IT 905-16 of the eVIDA Research Group, Universidad de Deusto.
    Publisher's version
    https://ieeexplore.ieee.org/document/9404185
    Rights holder
    "© All rights reserved". Rights holder: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC.
    Language
    eng
    URI
    https://uvadoc.uva.es/handle/10324/67944
    Version type
    info:eu-repo/semantics/publishedVersion
    Rights
    openAccess
    Collections
    • DEP71 - Journal articles (Dpto. Teoría de la Señal y Comunicaciones e Ingeniería Telemática)
    Files in this item
    Name:
    A_MapReduce OM.pdf
    Size:
    3.921 MB
    Format:
    Adobe PDF
    Description:
    Main article
    Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International.

    Universidad de Valladolid