RT info:eu-repo/semantics/masterThesis
T1 Compresión de datasets RDF en HDT usando Spark
A1 Barrales Ruiz, Carlos Vladimir
A2 Universidad de Valladolid. Escuela Técnica Superior de Ingenieros de Telecomunicación
K1 Apache Spark (Data processor)
K1 RDF
K1 Hadoop MapReduce
K1 HDT (Format)
AB Apache Spark is a general-purpose big data processing framework based on the MapReduce paradigm that is quickly becoming very popular. Although the information provided by the Spark authors indicates a substantial performance improvement over Hadoop, there is very little evidence in the literature of specific tests that reliably prove such claims. This Master's thesis studies the benefits of Spark and the most important factors on which they depend, taking as a reference the transformation of RDF datasets into the HDT format. The main objective of this work is to perform an exploratory study of how to leverage Spark to solve the HDT serialization problem, finding ways to remove limitations of the current implementations, such as memory requirements that tend to increase with the dataset size. To do that, we first set up an open environment to ensure reproducibility and contributed 3 different approaches implementing the heaviest task in the HDT serialization. The tests performed with different dataset sizes showed the benefits obtained with the proposed solution compared to the legacy Hadoop MapReduce implementation, as well as some pointers for further improving the serialization algorithm.
YR 2017
FD 2017
LK http://uvadoc.uva.es/handle/10324/27615
UL http://uvadoc.uva.es/handle/10324/27615
LA eng
NO Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos)
DS UVaDOC
RD 01-may-2024