Please use this identifier to cite or link to this item: http://uvadoc.uva.es/handle/10324/39071
Simplifying the distributed multi-GPU programming of a hyperspectral image registration algorithm
High Performance Computing Systems Conference 2019
High Performance Computing Systems Conference 2019, July 15 – 19, 2019 Dublin, Ireland
Hyperspectral image registration is a relevant task for real-time applications such as environmental disaster management or search and rescue scenarios. Traditional algorithms for this problem were not designed for real-time performance. The HYFMGPU algorithm arose as a high-performance GPU-based solution to fill this gap. Nevertheless, a single-GPU solution is not enough, as sensors keep evolving and generate images with finer resolutions and wider wavelength ranges. An MPI+CUDA distributed multi-GPU implementation of HYFMGPU was previously presented. However, that solution exposes the programming complexity of combining MPI with an accelerator programming model. In this paper we present a new, more abstract programming approach for this type of application, which provides high efficiency while simplifying the programming of the distributed code. The solution uses Hitmap, a library that eases the programming of parallel applications based on distributed arrays. Hitmap takes a more algorithm-oriented approach than MPI, including abstractions for the automatic partitioning and mapping of arrays at runtime with arbitrary granularity, as well as techniques to build flexible communication patterns that transparently adapt to the data partitions. We show how these abstractions apply to this application class. We present a comparison of development effort metrics between the original MPI implementation and the one based on Hitmap, with reductions of up to 95% in the Halstead score for specific work redistribution steps. Finally, we present experimental results showing that these abstractions are internally implemented in a highly efficient way that can reduce the overall execution time by up to 37% compared with the original MPI implementation.
This work is part of the PCAS research project (Grant TIN2017-88614-R) and of the PROPHET project of the Junta de Castilla y León (VA082P17).