dc.contributor.advisor | Fuente Redondo, Pablo Lucio de la | es |
dc.contributor.advisor | Martínez Prieto, Miguel Angel | es |
dc.contributor.author | Barrales Ruiz, Carlos Vladimir | |
dc.contributor.editor | Universidad de Valladolid. Escuela Técnica Superior de Ingenieros de Telecomunicación | es |
dc.date.accessioned | 2017-12-13T19:31:04Z | |
dc.date.available | 2017-12-13T19:31:04Z | |
dc.date.issued | 2017 | |
dc.identifier.uri | http://uvadoc.uva.es/handle/10324/27615 | |
dc.description.abstract | Apache Spark is a general-purpose big data processing framework based on the MapReduce paradigm that is quickly becoming very popular. Although the information provided by Spark's authors indicates a substantial performance improvement over Hadoop, there is very little evidence in the literature of specific tests that reliably prove such claims. In this Master's thesis we study the benefits of Spark and the most important factors on which they depend, taking as a reference the transformation of RDF datasets into the HDT format. The main objective of this work is to perform an exploratory study of how to leverage Spark to solve the HDT serialization problem, finding ways to remove limitations of the current implementations, such as memory requirements that tend to grow with the dataset size. To do so, we first set up an open environment to ensure reproducibility and contributed three different approaches to implementing the heaviest task in HDT serialization. The tests performed with different dataset sizes showed the benefits of the proposed solution compared to the legacy Hadoop MapReduce implementation, as well as some insights for further improving the serialization algorithm. | es |
dc.description.sponsorship | Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos) | es |
dc.format.mimetype | application/zip | es |
dc.language.iso | eng | es |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | |
dc.subject.classification | Apache Spark (Data processor) | es |
dc.subject.classification | RDF | es |
dc.subject.classification | Hadoop MapReduce | es |
dc.subject.classification | HDT (Format) | es |
dc.title | Compresión de datasets RDF en HDT usando Spark | es |
dc.type | info:eu-repo/semantics/masterThesis | es |
dc.description.degree | Master's Degree in Research in Information and Communication Technologies | es |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
This item appears in the following collection(s)
- Trabajos Fin de Máster UVa [6577]
The item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International