Simplifying the distributed multi-GPU programming of a hyperspectral image registration algorithm

Fernández Fabeiro, Jorge; González Escribano, Arturo; Llanos Ferraris, Diego Rafael

Please use this identifier to cite or link to this item: http://uvadoc.uva.es/handle/10324/39071

Title

Simplifying the distributed multi-GPU programming of a hyperspectral image registration algorithm

Author

Fernández Fabeiro, Jorge

González Escribano, Arturo

Llanos Ferraris, Diego Rafael

Conference

High Performance Computing Systems Conference 2019

Document Year

2019

Publisher

IEEE

Physical Description

8 p.

Source Document

High Performance Computing Systems Conference 2019, July 15–19, 2019, Dublin, Ireland

Abstract

Hyperspectral image registration is a relevant task for real-time applications such as environmental disaster management or search and rescue scenarios. Traditional algorithms for this problem were not designed for real-time performance. The HYFMGPU algorithm arose as a high-performance GPU-based solution to fill this gap. Nevertheless, a single-GPU solution is not enough, as sensors are evolving and generating images with finer resolutions and wider wavelength ranges. An MPI+CUDA distributed multi-GPU implementation of HYFMGPU was previously presented. However, that solution exposes the programming complexity of combining MPI with an accelerator programming model. In this paper we present a new and more abstract programming approach for this type of application, which provides high efficiency while simplifying the programming of the distributed code. The solution uses Hitmap, a library that eases the programming of parallel applications based on distributed arrays. It uses a more algorithm-oriented approach than MPI, including abstractions for the automatic partition and mapping of arrays at runtime with arbitrary granularity, as well as techniques to build flexible communication patterns that transparently adapt to the data partitions. We show how these abstractions apply to this application class. We present a comparison of development effort metrics between the original MPI implementation and the one based on Hitmap, with reductions of up to 95% in the Halstead score for specific work redistribution steps. Finally, we present experimental results showing that these abstractions are internally implemented in a highly efficient way, which can reduce the overall execution time by up to 37% compared with the original MPI implementation.