RT info:eu-repo/semantics/article
T1 Exploratory study on class imbalance and solutions for network traffic classification
A1 Egea Gómez, Santiago
A1 Hernández Callejo, Luis
A1 Carro Martínez, Belén
A1 Sánchez Esguevillas, Antonio Javier
K1 Machine learning
K1 Network management
K1 Class Imbalance
K1 Network traffic classification
K1 33 Ciencias Tecnológicas
K1 3325 Tecnología de las Telecomunicaciones
AB Network Traffic Classification is a fundamental component of network management, and the fast-paced advances in Machine Learning have motivated the application of learning techniques to identify network traffic. The intrinsic features of Internet networks lead to imbalanced class distributions when datasets are assembled, a phenomenon called Class Imbalance that is attracting increasing attention in many research fields. Despite the performance losses caused by Class Imbalance, this issue has not been thoroughly studied in Network Traffic Classification, and some previous works are limited to a few solutions and/or adopt misleading methodological approaches. In this article, we deal with Class Imbalance in Network Traffic Classification, studying the presence of this phenomenon and analyzing a wide range of solutions in two different Internet environments: a lab network and a high-speed backbone. Namely, we experimented with 21 data-level algorithms, six ensemble methods and one cost-level approach. Throughout the experiments, we applied the most recent methodological practices for imbalanced problems, such as the DOB-SCV validation approach and the performance metrics adopted. Finally, we present and share with the research community our strategies for tuning parameters and our algorithm implementations that adapt binary methods to multiclass problems, including two ensemble techniques used, to the best of our knowledge, for the first time in Machine Learning. Our experimental results reveal that some techniques mitigated Class Imbalance, with notable benefits for traffic classification models. More specifically, some algorithms reached increases greater than 8% in overall accuracy and greater than 4% in AUC-ROC for the most challenging network scenario.
PB Elsevier
SN 0925-2312
YR 2019
FD 2019
LK https://uvadoc.uva.es/handle/10324/54159
UL https://uvadoc.uva.es/handle/10324/54159
LA eng
NO Neurocomputing, 2019, vol. 343, p. 100-119
NO Producción Científica
DS UVaDOC
RD 19-nov-2024