RT info:eu-repo/semantics/masterThesis
T1 Compresión de datasets RDF en HDT usando Spark
A1 Barrales Ruiz, Carlos Vladimir
A2 Universidad de Valladolid. Escuela Técnica Superior de Ingenieros de Telecomunicación
K1 Apache Spark (Procesador de datos)
K1 RDF
K1 Hadoop MapReduce
K1 HDT (Formato)
AB Apache Spark is a general-purpose big data processing framework using the MapReduce paradigm that is quickly becoming very popular. Although the information provided by the Spark authors indicates a substantial performance improvement over Hadoop, there is very little evidence in the literature of specific tests that reliably prove such claims. This Master's thesis studies the benefits of Spark and the most important factors on which they depend, taking as a reference the transformation of RDF datasets into HDT format. The main objective of this work is to perform an exploratory study of how to leverage Spark to solve the HDT serialization problem, finding ways to remove limitations of the current implementations, such as memory requirements that tend to increase with dataset size. To do that, we first set up an open environment to ensure reproducibility and contributed three different approaches to implementing the heaviest task in HDT serialization. The tests performed with different dataset sizes showed the benefits of the proposed solution compared to the legacy Hadoop MapReduce implementation, as well as some pointers for further improving the serialization algorithm.
YR 2017
FD 2017
LK http://uvadoc.uva.es/handle/10324/27615
UL http://uvadoc.uva.es/handle/10324/27615
LA eng
NO Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos)
DS UVaDOC
RD 01-may-2024