Title: A Framework for Building Hypercubes Using MapReduce
Author: Tapiador, D.
O'Mullane, William
Brown, A. G. A.
Luri Carrascoso, Xavier
Huedo, E.
Osuna, P.
Keywords: Diagrams
Data mining
Distributed computing
Computational grids (Computer systems)
Milky Way
Issue Date: 13-May-2014
Publisher: Elsevier B.V.
Abstract: The European Space Agency's Gaia mission will create the largest and most precise three-dimensional chart of our galaxy (the Milky Way), by providing unprecedented position, parallax, proper motion, and radial velocity measurements for about one billion stars. The resulting catalogue will be made available to the scientific community and will be analyzed in many different ways, including the production of a variety of statistics. The latter will often entail the generation of multidimensional histograms and hypercubes as part of the precomputed statistics for each data release, or for scientific analysis involving either the final data products or the raw data coming from the satellite instruments. In this paper we present and analyze a generic framework that allows the hypercube generation to be easily done within a MapReduce infrastructure, providing all the advantages of the new Big Data analysis paradigm but without dealing with any specific interface to the lower level distributed system implementation (Hadoop). Furthermore, we show how executing the framework for different data storage model configurations (i.e. row or column oriented) and compression techniques can considerably improve the response time of this type of workload for the currently available simulated data of the mission. In addition, we put forward the advantages and shortcomings of the deployment of the framework on a public cloud provider, benchmark against other popular solutions available (that are not always the best for such ad-hoc applications), and describe some user experiences with the framework, which was employed for a number of dedicated astronomical data analysis techniques workshops.
Note: Postprint version of the document published at:
It is part of: Computer Physics Communications, 2014, vol. 185, num. 5, p. 1429-1438
Related resource:
ISSN: 0010-4655
Appears in Collections: Articles publicats en revistes (Física Quàntica i Astrofísica)

Files in This Item:
File: 633595.pdf
Size: 1.17 MB
Format: Adobe PDF
