
dc.contributor.author: López Martín, Manuel
dc.contributor.author: Carro Martínez, Belén
dc.contributor.author: Arribas Sánchez, Juan Ignacio
dc.contributor.author: Sánchez Esguevillas, Antonio Javier
dc.date.accessioned: 2022-07-25T07:41:21Z
dc.date.available: 2022-07-25T07:41:21Z
dc.date.issued: 2021
dc.identifier.citation: Knowledge-Based Systems, 2021, vol. 219, p. 106887 [es]
dc.identifier.issn: 0950-7051 [es]
dc.identifier.uri: https://uvadoc.uva.es/handle/10324/54206
dc.description: Scientific output [es]
dc.description.abstract: Including high-dimensional categorical predictors in a machine learning model is a major challenge. This is particularly true of the IP and Port addresses of network connections when they are used as predictors (features) in machine learning models. These features are particularly important for network intrusion detection, as many attacks exploit information about IP/Port addresses. The sparsity and high dimensionality of these features make them difficult to include in models, so in many cases they are discarded despite the useful information they carry. This work proposes replacing the original network addresses with new features based on a set of distances defined between different components of the source and destination IP and Port addresses. These distances incorporate information on the probability of co-occurrence of source and destination addresses. The distances are calculated using a dense, low-dimensional vector representation (embedding) of the different network address components. The embeddings are obtained with a neural network, which requires few computational resources, plus an additional hash function that collapses the extremely large range of IP and Port values, making the model implementation feasible. A self-supervised learning framework under a hierarchical model is used to train the encoding network. The novel features can be used to predict future co-occurrence of source and destination network addresses and, when applied as features in a supervised model, they significantly increase the prediction performance of most classifiers for the detection of network intrusions. We demonstrate this prediction improvement on two modern network intrusion datasets: CICIDS2017 and CICDDoS2019. [es]
dc.format.mimetype: application/pdf [es]
dc.language.iso: eng [es]
dc.publisher: Elsevier [es]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [es]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject.classification: Hash function [es]
dc.subject.classification: Self-supervised learning [es]
dc.subject.classification: Neural network [es]
dc.subject.classification: Network address embedding [es]
dc.subject.classification: Network intrusion detection [es]
dc.title: Network intrusion detection with a novel hierarchy of distances between embeddings of hash IP addresses [es]
dc.type: info:eu-repo/semantics/article [es]
dc.identifier.doi: 10.1016/j.knosys.2021.106887 [es]
dc.identifier.publicationfirstpage: 106887 [es]
dc.identifier.publicationtitle: Knowledge-Based Systems [es]
dc.identifier.publicationvolume: 219 [es]
dc.peerreviewed: Yes [es]
dc.description.project: Ministerio de Ciencia, Innovación y Universidades, Proyectos de I+D+i “Retos investigación” (grant RTI2018-098958-B-I00) [es]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.type.hasVersion: info:eu-repo/semantics/submittedVersion [es]
dc.subject.unesco: 33 Technological Sciences [es]
dc.subject.unesco: 3325 Telecommunications Technology [es]
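
The abstract describes hashing IP and Port values into a manageable range, embedding the hashed components, and using distances between source and destination embeddings as features. The following is only a minimal sketch of that general idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the CRC32 hash, the bucket count, the embedding width, the use of IP prefixes as the hierarchy, the Euclidean distance, and the randomly initialised table that stands in for embeddings trained with the self-supervised network.

```python
import math
import random
import zlib

NUM_BUCKETS = 1024  # assumption: size of the hashed value range
EMBED_DIM = 8       # assumption: embedding width


def bucket(component, value):
    """Collapse an address component into a fixed range of bucket ids."""
    key = f"{component}:{value}".encode("utf-8")
    return zlib.crc32(key) % NUM_BUCKETS


def make_embedding_table(seed=0):
    """Random stand-in for an embedding matrix; a real one would be trained."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
            for _ in range(NUM_BUCKETS)]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_features(src_ip, src_port, dst_ip, dst_port, table):
    """Distances between embeddings of matching source/destination components."""
    features = {}
    src_octets = src_ip.split(".")
    dst_octets = dst_ip.split(".")
    # Assumed hierarchy: progressively longer IPv4 prefixes, then the ports.
    for depth in range(1, 5):
        s = table[bucket(f"ip{depth}", ".".join(src_octets[:depth]))]
        d = table[bucket(f"ip{depth}", ".".join(dst_octets[:depth]))]
        features[f"ip_prefix_{depth}"] = euclidean(s, d)
    s = table[bucket("port", str(src_port))]
    d = table[bucket("port", str(dst_port))]
    features["port"] = euclidean(s, d)
    return features


if __name__ == "__main__":
    table = make_embedding_table()
    print(distance_features("10.0.0.5", 51234, "192.168.1.10", 443, table))
```

The resulting dictionary of distances would replace the raw addresses as input columns for a downstream classifier. With a random table the values carry no meaning; they only become informative once the embeddings are trained on address co-occurrence, as the abstract describes.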

