
dc.contributor.author: Rodríguez Gutiez, Eduardo
dc.contributor.author: Moretón Fernández, Ana
dc.contributor.author: González Escribano, Arturo
dc.contributor.author: Llanos Ferraris, Diego Rafael
dc.date.accessioned: 2019-11-05T18:08:18Z
dc.date.available: 2019-11-05T18:08:18Z
dc.date.issued: 2019
dc.identifier.citation: Toward a BLAS library truly portable across different accelerator types. Eduardo Rodríguez Gutiez, Arturo González Escribano, Diego R. Llanos. The Journal of Supercomputing (Q2). First online: 10 June 2019. DOI: 10.1007/s11227-019-02925-3 [es]
dc.identifier.issn: 0920-8542 [es]
dc.identifier.uri: http://uvadoc.uva.es/handle/10324/39040
dc.description.abstract: Scientific applications are some of the most computationally demanding software pieces. Their core is usually a set of linear algebra operations, which may represent a significant part of the overall run-time of the application. BLAS libraries aim to solve this problem by exposing a set of highly optimized, reusable routines. There are several implementations specifically tuned for different types of computing platforms, including coprocessors. Some examples include the one bundled with the Intel MKL library, which targets Intel CPUs or Xeon Phi coprocessors, or the cuBLAS library, which is specifically designed for NVIDIA GPUs. Nowadays, computing nodes in many supercomputing clusters include one or more different coprocessor types. Fully exploiting these platforms may require programs that adapt at run-time to the chosen device type, hardwiring into the program the code needed to use a different library for each device type that can be selected. This also forces the programmer to deal with different interface particularities and mechanisms to manage the memory transfers of the data structures used as parameters. This paper presents a unified, performance-oriented, and portable interface for BLAS. This interface has been integrated into a heterogeneous programming model (Controllers) which supports groups of CPU cores, Xeon Phi accelerators, or NVIDIA GPUs in a transparent way. The contributions of this paper include: an abstraction layer to hide programming differences between diverse BLAS libraries; new types of kernel classes to support the context manipulation of different external BLAS libraries; a new kernel selection policy that considers both programmer kernels and different external libraries; and a complete new Controller library interface for the whole collection of BLAS routines.
This proposal enables the creation of BLAS-based portable codes that can execute on top of different types of accelerators by changing a single initialization parameter. Our software internally exploits different preexisting and widely known BLAS library implementations, such as cuBLAS, MAGMA, or the one found in Intel MKL. It transparently uses the most appropriate library for the selected device. Our experimental results show that our abstraction does not introduce significant performance penalties, while achieving the desired portability. [es]
dc.format.mimetype: application/pdf [es]
dc.language.iso: spa [es]
dc.publisher: The Journal of Supercomputing [es]
dc.rights.accessRights: info:eu-repo/semantics/embargoedAccess [es]
dc.title: Toward a BLAS library truly portable across different accelerator types [es]
dc.type: info:eu-repo/semantics/article [es]
dc.identifier.doi: 10.1007/s11227-019-02925-3 [es]
dc.identifier.publicationtitle: The Journal of Supercomputing [es]
dc.peerreviewed: Yes [es]
dc.description.project: This work is part of the research project PCAS, Grant TIN2017-88614-R, and of the Junta de Castilla y León project PROPHET, VA082P17. [es]
dc.identifier.essn: 1573-0484 [es]
dc.type.hasVersion: info:eu-repo/semantics/draft [es]
