RT info:eu-repo/semantics/article
T1 Exploratory study on class imbalance and solutions for network traffic classification
A1 Egea Gómez, Santiago
A1 Hernández Callejo, Luis
A1 Carro Martínez, Belén
A1 Sánchez Esguevillas, Antonio Javier
K1 Machine learning
K1 Network management
K1 Class imbalance
K1 Network traffic classification
K1 33 Technological Sciences
K1 3325 Telecommunications Technology
AB Network Traffic Classification is a fundamental component of network management, and the fast-paced advances in Machine Learning have motivated the application of learning techniques to identify network traffic. The intrinsic features of Internet networks lead to imbalanced class distributions when datasets are assembled, a phenomenon called Class Imbalance that is attracting increasing attention in many research fields. Despite the performance losses caused by Class Imbalance, this issue has not been thoroughly studied in Network Traffic Classification, and some previous works are limited to a few solutions and/or adopt misleading methodological approaches. In this article, we address Class Imbalance in Network Traffic Classification, studying the presence of this phenomenon and analyzing a wide range of solutions in two different Internet environments: a lab network and a high-speed backbone. Specifically, we experimented with 21 data-level algorithms, six ensemble methods and one cost-level approach. Throughout the experiments, we applied the most recent methodological practices for imbalanced problems, such as the DOB-SCV validation approach and the performance metrics adopted. Finally, the strategies used to tune parameters and our algorithm implementations for adapting binary methods to multiclass problems are presented and shared with the research community, including two ensemble techniques that, to the best of our knowledge, are used for the first time in Machine Learning. Our experimental results reveal that some techniques mitigated Class Imbalance, with notable benefits for traffic classification models. More specifically, some algorithms achieved increases greater than 8% in overall accuracy and greater than 4% in AUC-ROC for the most challenging network scenario.
PB Elsevier
SN 0925-2312
YR 2019
FD 2019
LK https://uvadoc.uva.es/handle/10324/54159
UL https://uvadoc.uva.es/handle/10324/54159
LA eng
NO Neurocomputing, 2019, vol. 343, p. 100-119
NO Producción Científica
DS UVaDOC
RD 06-Aug-2025