dc.contributor.advisor | Martínez Prieto, Miguel Angel | es |
dc.contributor.advisor | Bregón Bregón, Aníbal | es |
dc.contributor.author | Alonso Isla, Álvaro | |
dc.contributor.editor | Universidad de Valladolid. Escuela Técnica Superior de Ingenieros de Telecomunicación | es |
dc.date.accessioned | 2018-11-23T12:24:36Z | |
dc.date.available | 2018-11-23T12:24:36Z | |
dc.date.issued | 2018 | |
dc.identifier.uri | http://uvadoc.uva.es/handle/10324/32896 | |
dc.description.abstract | The distributed system Hadoop has become very popular for storing and processing large amounts of data (Big Data). As it is composed of many machines, its file system, called
HDFS (Hadoop Distributed File System), is also distributed. However, because HDFS is not a traditional
storage system, plenty of new file formats have been developed to take advantage
of its features. In this work we study these new formats to determine their characteristics
and to be able to decide which ones are best suited to the needs of our data. To
that end, we have developed a theoretical framework to compare them and easily recognize
which formats fit our needs. We have also carried out an experimental study to find out how the
formats perform in specific situations, selecting two very different datasets and a set of
simple queries, resolved with MapReduce jobs written in Java or run using the Hive tool.
The final goal of this work is to be able to identify the different strengths and weaknesses
of the file formats. | es |
dc.description.sponsorship | Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos) | es |
dc.format.mimetype | application/pdf | es |
dc.language.iso | eng | es |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | |
dc.subject.classification | Big Data | es |
dc.subject.classification | Hadoop | es |
dc.subject.classification | HDFS | es |
dc.subject.classification | MapReduce | es |
dc.title | HDFS File Formats: Study and Performance Comparison | es |
dc.type | info:eu-repo/semantics/masterThesis | es |
dc.description.degree | Máster en Investigación en Tecnologías de la Información y las Comunicaciones | es |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | |