dc.contributor.advisor: Martínez Prieto, Miguel Angel [es]
dc.contributor.advisor: Bregón Bregón, Aníbal [es]
dc.contributor.author: Alonso Isla, Álvaro
dc.contributor.editor: Universidad de Valladolid. Escuela Técnica Superior de Ingenieros de Telecomunicación [es]
dc.date.accessioned: 2018-11-23T12:24:36Z
dc.date.available: 2018-11-23T12:24:36Z
dc.date.issued: 2018
dc.identifier.uri: http://uvadoc.uva.es/handle/10324/32896
dc.description.abstract: The distributed system Hadoop has become very popular for storing and processing large amounts of data (Big Data). As it is composed of many machines, its file system, called HDFS (Hadoop Distributed File System), is also distributed. But as HDFS is not a traditional storage system, plenty of new file formats have been developed to take advantage of its features. In this work we study these new formats to find out their characteristics, so that we can decide which ones best suit the needs of our data. For that goal, we have built a theoretical framework to compare them and easily recognize which formats fit our needs. We have also conducted an experimental study to find out how the formats behave in some specific situations, selecting two very different datasets and a set of simple queries, resolved with MapReduce jobs written in Java or run using the Hive tool. The final goal of this work is to identify the different strengths and weaknesses of the file formats. [es]
dc.description.sponsorship: Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos) [es]
dc.format.mimetype: application/pdf [es]
dc.language.iso: eng [es]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [es]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject.classification: Big Data [es]
dc.subject.classification: Hadoop [es]
dc.subject.classification: HDFS [es]
dc.subject.classification: MapReduce [es]
dc.title: HDFS File Formats: Study and Performance Comparison [es]
dc.type: info:eu-repo/semantics/masterThesis [es]
dc.description.degree: Máster en Investigación en Tecnologías de la Información y las Comunicaciones [es]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International

