Compression of RDF datasets into HDT using Spark
Barrales Ruiz, Carlos Vladimir
Fuente Redondo, Pablo Lucio de la
Martínez Prieto, Miguel Angel
Universidad de Valladolid. Escuela Técnica Superior de Ingenieros de Telecomunicación
Apache Spark is a general-purpose big data processing framework based on the MapReduce
paradigm that is quickly becoming very popular. Although the information provided
by the Spark authors indicates a substantial performance improvement over Hadoop,
there is very little evidence in the literature of specific tests that reliably prove such
claims. In this Master's Thesis we study the benefits of Spark and the most important factors
on which they depend, taking as a reference the transformation of RDF datasets
into the HDT format. The main objective of this work is to perform an exploratory study
of how Spark can be leveraged to solve the HDT serialization problem, finding ways to remove limitations
of the current implementations, such as memory requirements that tend to grow
with the dataset size. To that end, we first set up an open environment to ensure reproducibility,
and then contributed three different approaches implementing the heaviest
task in HDT serialization. The tests performed with different dataset sizes showed
the benefits of the proposed solution compared to the legacy Hadoop MapReduce
implementation, as well as some directions to further improve the serialization
algorithm.
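The abstract does not spell out which serialization task is the heaviest, but in HDT (Header-Dictionary-Triples) a natural candidate is the construction of the term dictionary and the rewriting of triples as ID-triples, since the dictionary must hold every distinct term and thus grows with the dataset. A minimal single-machine sketch of that phase, with hypothetical example data (real HDT additionally splits the dictionary into shared, subject, predicate, and object sections, which this sketch omits):

```python
# Illustrative sketch of HDT-style dictionary encoding (not the thesis code).
# Terms are collected from all triple positions, sorted, assigned consecutive
# integer IDs, and the triples are rewritten as compact ID-triples.

triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name", '"Alice"'),
]

# Collect every distinct term used in any position and sort lexicographically.
terms = sorted({t for triple in triples for t in triple})

# Build the dictionary, assigning consecutive 1-based IDs.
term_id = {term: i + 1 for i, term in enumerate(terms)}

# Rewrite each triple as an ID-triple.
id_triples = [tuple(term_id[t] for t in triple) for triple in triples]

print(term_id)
print(id_triples)
```

In a Spark implementation this shape maps naturally onto distributed operations (e.g. `distinct`, sort, `zipWithIndex`, then a join of triples against the dictionary), which is what makes the dictionary phase a good fit for the exploratory study described above.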
2017-12-13T19:31:04Z
2017
info:eu-repo/semantics/masterThesis
http://uvadoc.uva.es/handle/10324/27615
eng
info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by-nc-nd/4.0/
Attribution-NonCommercial-NoDerivatives 4.0 International