<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-27T19:45:38Z</responseDate><request verb="GetRecord" identifier="oai:uvadoc.uva.es:10324/80150" metadataPrefix="edm">https://uvadoc.uva.es/oai/request</request><GetRecord><record><header><identifier>oai:uvadoc.uva.es:10324/80150</identifier><datestamp>2025-12-15T09:25:05Z</datestamp><setSpec>com_10324_1191</setSpec><setSpec>com_10324_931</setSpec><setSpec>com_10324_894</setSpec><setSpec>col_10324_1379</setSpec></header><metadata><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:doc="http://www.lyncode.com/xoai" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ore="http://www.openarchives.org/ore/terms/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:ds="http://dspace.org/ds/elements/1.1/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:edm="http://www.europeana.eu/schemas/edm/" xsi:schemaLocation="http://www.w3.org/1999/02/22-rdf-syntax-ns# http://www.europeana.eu/schemas/edm/EDM.xsd">
<edm:ProvidedCHO rdf:about="https://uvadoc.uva.es/handle/10324/80150">
<dc:creator>Chaves Villota, Andrea</dc:creator>
<dc:creator>Jiménez Martín, Ana</dc:creator>
<dc:creator>Jojoa Acosta, Mario Fernando</dc:creator>
<dc:creator>Bahillo Martínez, Alfonso</dc:creator>
<dc:creator>García Domínguez, Juan Jesús</dc:creator>
<dc:date>2026</dc:date>
<dc:description>Producción Científica</dc:description>
<dc:description>Emotion Recognition (ER) has gained significant attention due to its importance in advanced human-machine interaction and its widespread real-world applications. In recent years, research on ER systems has focused on several key aspects, including the development of high-quality emotional databases, the selection of robust feature representations, and the implementation of advanced classifiers leveraging AI-based techniques. Despite this progress, ER still faces significant challenges and gaps that must be addressed to develop accurate and reliable systems. To systematically assess these critical aspects, particularly those centered on AI-based techniques, we employed the PRISMA methodology. We include journal and conference papers that provide essential insights into the key parameters required for dataset development, covering emotion modeling (categorical or dimensional), the type of speech data (natural, acted, or elicited), the modalities most commonly integrated with acoustic and linguistic data from speech, and the technologies used. Following this methodology, we also identified the representative features that serve as critical sources of emotional information in both modalities: for the acoustic modality, those extracted from the time and frequency domains; for the linguistic modality, earlier embeddings and the most common transformer models. In addition, Deep Learning (DL) and attention-based methods were analyzed for both. Given the importance of effectively combining these diverse features to improve ER, we then explore fusion techniques organized by level of abstraction, focusing on feature-, decision-, DL-, and attention-based fusion methods. Next, we provide a comparative analysis of the performance of the approaches included in our study. Our findings indicate that, for the most commonly used datasets in the literature, IEMOCAP and MELD, the integration of acoustic and linguistic features reached weighted accuracies (WA) of 85.71% and 63.80%, respectively. Finally, we discuss the main challenges and propose future guidelines that could enhance the performance of ER systems using acoustic and linguistic features from speech.</dc:description>
<dc:format>application/pdf</dc:format>
<dc:identifier>https://uvadoc.uva.es/handle/10324/80150</dc:identifier>
<dc:language>eng</dc:language>
<dc:publisher>Elsevier Ltd.</dc:publisher>
<dc:title>Deep feature representations and fusion strategies for speech emotion recognition from acoustic and linguistic modalities: A systematic review</dc:title>
<dc:type>info:eu-repo/semantics/article</dc:type>
<edm:type>TEXT</edm:type>
</edm:ProvidedCHO>
<ore:Aggregation rdf:about="https://uvadoc.uva.es/handle/10324/80150#aggregation">
<edm:aggregatedCHO rdf:resource="https://uvadoc.uva.es/handle/10324/80150"/>
<edm:dataProvider>UVaDOC. Repositorio Documental de la Universidad de Valladolid</edm:dataProvider>
<edm:isShownAt rdf:resource="https://uvadoc.uva.es/handle/10324/80150"/>
<edm:isShownBy rdf:resource="https://uvadoc.uva.es/bitstream/10324/80150/1/Deep%20feature%20representations%20and%20fusion%20strategies%20for%20speech%20emotion%20recognition%20from%20acoustic%20and%20linguistic%20modalities%20A%20systematic%20review.pdf"/>
<edm:provider>Hispana</edm:provider>
<edm:rights rdf:resource="http://creativecommons.org/licenses/by/4.0/"/>
</ore:Aggregation>
<edm:WebResource rdf:about="https://uvadoc.uva.es/bitstream/10324/80150/1/Deep%20feature%20representations%20and%20fusion%20strategies%20for%20speech%20emotion%20recognition%20from%20acoustic%20and%20linguistic%20modalities%20A%20systematic%20review.pdf">
<edm:rights rdf:resource="http://creativecommons.org/licenses/by/4.0/"/>
</edm:WebResource>
</rdf:RDF></metadata></record></GetRecord></OAI-PMH>