<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-05-05T18:26:53Z</responseDate><request verb="GetRecord" identifier="oai:uvadoc.uva.es:10324/80150" metadataPrefix="marc">https://uvadoc.uva.es/oai/request</request><GetRecord><record><header><identifier>oai:uvadoc.uva.es:10324/80150</identifier><datestamp>2025-12-15T09:25:05Z</datestamp><setSpec>com_10324_1191</setSpec><setSpec>com_10324_931</setSpec><setSpec>com_10324_894</setSpec><setSpec>col_10324_1379</setSpec></header><metadata><record xmlns="http://www.loc.gov/MARC21/slim" xmlns:doc="http://www.lyncode.com/xoai" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dcterms="http://purl.org/dc/terms/" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
<leader>00925nab 22002777a 4500</leader>
<datafield tag="042" ind1=" " ind2=" ">
<subfield code="a">dc</subfield>
</datafield>
<datafield tag="720" ind1=" " ind2=" ">
<subfield code="a">Chaves Villota, Andrea</subfield>
<subfield code="e">author</subfield>
</datafield>
<datafield tag="720" ind1=" " ind2=" ">
<subfield code="a">Jiménez Martín, Ana</subfield>
<subfield code="e">author</subfield>
</datafield>
<datafield tag="720" ind1=" " ind2=" ">
<subfield code="a">Jojoa Acosta, Mario Fernando</subfield>
<subfield code="e">author</subfield>
</datafield>
<datafield tag="720" ind1=" " ind2=" ">
<subfield code="a">Bahillo Martínez, Alfonso</subfield>
<subfield code="e">author</subfield>
</datafield>
<datafield tag="720" ind1=" " ind2=" ">
<subfield code="a">García Domínguez, Juan Jesús</subfield>
<subfield code="e">author</subfield>
</datafield>
<datafield tag="260" ind1=" " ind2=" ">
<subfield code="c">2026</subfield>
</datafield>
<datafield tag="520" ind1=" " ind2=" ">
<subfield code="a">Emotion Recognition (ER) has gained significant attention due to its importance in advanced human-machine interaction and its widespread real-world applications. In recent years, research&#xd;
on ER systems has focused on multiple key aspects, including the development of high-quality&#xd;
emotional databases, the selection of robust feature representations, and the implementation&#xd;
of advanced classifiers leveraging AI-based techniques. Despite this progress in research, ER&#xd;
still faces significant challenges and gaps that must be addressed to develop accurate and&#xd;
reliable systems. To systematically assess these critical aspects, particularly those centered on&#xd;
AI-based techniques, we employed the PRISMA methodology. Thus, we include journal and&#xd;
conference papers that provide essential insights into key parameters required for dataset&#xd;
development, involving emotion modeling (categorical or dimensional), the type of speech&#xd;
data (natural, acted, or elicited), the most common modalities integrated with acoustic and&#xd;
linguistic data from speech and the technologies used. Similarly, following this methodology,&#xd;
we identified the key representative features that serve as critical emotional information sources&#xd;
in both modalities. For acoustic, this included those extracted from the time and frequency&#xd;
domains, while for linguistic, earlier embeddings and the most common transformer models&#xd;
were considered. In addition, Deep Learning (DL) and attention-based methods were analyzed&#xd;
for both. Given the importance of effectively combining these diverse features for improving ER,&#xd;
we then explore fusion techniques based on the level of abstraction. Specifically, we focus on&#xd;
traditional approaches, including feature-, decision-, DL-, and attention-based fusion methods.&#xd;
Next, we provide a comparative analysis to assess the performance of the approaches included&#xd;
in our study. Our findings indicate that for the most commonly used datasets in the literature:&#xd;
IEMOCAP and MELD, the integration of acoustic and linguistic features reached a weighted&#xd;
accuracy (WA) of 85.71% and 63.80%, respectively. Finally, we discuss the main challenges&#xd;
and propose future guidelines that could enhance the performance of ER systems using acoustic&#xd;
and linguistic features from speech.</subfield>
</datafield>
<datafield tag="024" ind2=" " ind1="8">
<subfield code="a">Computer Speech &amp; Language Volume, 2026, vol. 96, p. 101873</subfield>
</datafield>
<datafield tag="024" ind2=" " ind1="8">
<subfield code="a">0885-2308</subfield>
</datafield>
<datafield tag="024" ind2=" " ind1="8">
<subfield code="a">https://uvadoc.uva.es/handle/10324/80150</subfield>
</datafield>
<datafield tag="024" ind2=" " ind1="8">
<subfield code="a">10.1016/j.csl.2025.101873</subfield>
</datafield>
<datafield tag="024" ind2=" " ind1="8">
<subfield code="a">101873</subfield>
</datafield>
<datafield tag="024" ind2=" " ind1="8">
<subfield code="a">Computer Speech &amp; Language</subfield>
</datafield>
<datafield tag="024" ind2=" " ind1="8">
<subfield code="a">96</subfield>
</datafield>
<datafield tag="245" ind1="0" ind2="0">
<subfield code="a">Deep feature representations and fusion strategies for speech emotion recognition from acoustic and linguistic modalities: A systematic review</subfield>
</datafield>
</record></metadata></record></GetRecord></OAI-PMH>