Depression classification from tweets using small deep transfer learning Language Models

Rizwan, Muhammad; Mushtaq, Muhammad Faheem; Akram, Urooj; Mehmood, Arif; Ashraf, Imran; Sahelices Fernández, Benjamín

doi:10.1109/ACCESS.2022.3223049

Please use this identifier to cite or link to this item: https://uvadoc.uva.es/handle/10324/65895

Title

Depression classification from tweets using small deep transfer learning Language Models

Authors

Rizwan, Muhammad

Mushtaq, Muhammad Faheem

Akram, Urooj

Mehmood, Arif

Ashraf, Imran

Sahelices Fernández, Benjamín

Document Year

2022

Publisher

Institute of Electrical and Electronics Engineers

Source Document

Vol. 10, pp. 129176-129189

Abstract

Detecting depression from social media texts such as tweets or Facebook comments could be very beneficial, as early detection may help prevent the extreme consequences of long-term depression, such as suicide. In this study, depression intensity classification is performed using a labeled Twitter dataset. The study also presents a detailed performance evaluation of four transformer-based pre-trained small language models, each with fewer than 15 million tunable parameters: Electra Small Generator (ESG), Electra Small Discriminator (ESD), XtremeDistil-L6 (XDL), and Albert Base V2 (ABV). The models are fine-tuned with different hyperparameter settings to obtain the best performance. They are tested by classifying the depression intensity of labeled tweets into three classes, ‘severe’, ‘moderate’, and ‘mild’, after downstream fine-tuning of their parameters. Accuracy, F1, precision, recall, and specificity are calculated to evaluate the models. The models are also compared with a moderately larger model, DistilBert, which has 67 million tunable parameters, on the same task and under the same experimental settings. Results indicate that ESG outperforms all other models, including DistilBert, owing to its better deep contextualized text representation: it achieves the best F1 score of 89% with comparatively less training time. Further optimization of ESG is also proposed to make it suitable for low-powered devices. This study helps to achieve better classification performance for depression detection and to choose the best language model, in terms of performance and training time, for Twitter-related downstream NLP tasks.
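The five evaluation metrics named in the abstract can be illustrated with a minimal Python sketch for a three-class problem. This is not the authors' evaluation code: the toy labels below are invented for illustration, the helper names (`confusion_matrix`, `per_class_metrics`, `macro_average`) are hypothetical, and the one-vs-rest treatment of specificity and the macro averaging are assumptions, since the abstract does not say how per-class scores were aggregated.

```python
from typing import Dict, List

# Class names taken from the abstract; their order here is arbitrary.
LABELS = ["mild", "moderate", "severe"]


def confusion_matrix(y_true: List[str], y_pred: List[str],
                     labels: List[str]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_metrics(matrix: List[List[int]],
                      labels: List[str]) -> Dict[str, Dict[str, float]]:
    """One-vs-rest precision, recall, specificity and F1 for each class."""
    total = sum(sum(row) for row in matrix)
    results: Dict[str, Dict[str, float]] = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                   # true class k, predicted other
        fp = sum(row[k] for row in matrix) - tp    # predicted k, true class other
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results[label] = {
            "precision": precision,
            "recall": recall,
            "specificity": specificity,
            "f1": f1,
        }
    return results


def macro_average(results: Dict[str, Dict[str, float]], metric: str) -> float:
    """Unweighted mean of one metric over all classes (an assumed choice)."""
    return sum(scores[metric] for scores in results.values()) / len(results)


# Toy data, invented purely for illustration (not from the paper's dataset).
y_true = ["mild", "mild", "moderate", "moderate", "severe", "severe"]
y_pred = ["mild", "moderate", "moderate", "moderate", "severe", "mild"]

matrix = confusion_matrix(y_true, y_pred, LABELS)
metrics = per_class_metrics(matrix, LABELS)

print("accuracy:", round(accuracy(matrix), 3))
for name in LABELS:
    print(name, {key: round(value, 3) for key, value in metrics[name].items()})
print("macro F1:", round(macro_average(metrics, "f1"), 3))
```

Specificity is computed per class because it is only defined for a binary split. Each class is treated in turn as the positive class and the other two as negative.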

Keywords

Depression

Bit error rate

Social networking (online)

Transformers

Public healthcare

Transfer learning

Blogs

ISSN

2169-3536

Peer Reviewed

DOI

10.1109/ACCESS.2022.3223049

Sponsor

This work was supported in part by the Department of Informatics, University of Valladolid, Spain; in part by the Spanish Ministry of Economy and Competitiveness through Feder Funds under Grant TEC2017-84321-C4-2-R; in part by MINECO/AEI/ERDF (EU) under Grant PID2019-105660RB-C21 / AEI / 10.13039/501100011033; in part by the Aragón Government under Grant T58_20R research group; and in part by the Construyendo Europa desde Aragón under Grant ERDF 2014-2020.

Publisher's Version

https://ieeexplore.ieee.org/document/9954391/keywords#keywords