dc.contributor.author: Santamaria-Valenzuela, Inmaculada
dc.contributor.author: Carratalá-Sáez, Rocío
dc.contributor.author: Torres, Yuri
dc.contributor.author: Llanos, Diego R.
dc.contributor.author: Gonzalez-Escribano, Arturo
dc.date.accessioned: 2024-09-15T08:55:37Z
dc.date.available: 2024-09-15T08:55:37Z
dc.date.issued: 2023
dc.identifier.citation: Mappings and patterns to improve the triangular matrix product on distributed systems. 2023 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), Santa Fe, New Mexico, USA, October 2023 [es]
dc.identifier.uri: https://uvadoc.uva.es/handle/10324/69761
dc.description.abstract: Matrix multiplication is one of the most costly linear algebra operations, very often present in scientific computational applications. Current generic linear algebra libraries, such as ScaLAPACK and its recent evolution SLATE, include functionalities for generic and triangular matrix multiplication. They generally rely on block-cyclic partitioning, which has two main advantages. First, it provides good interoperability with other functionalities of the libraries. Second, it provides a good balance of computation and inter-process communications. The focus of these libraries is performance and scalability, targeting even huge numbers of processes. Nevertheless, many enterprises and computing centers work with commodity clusters or small partitions with a reduced number of nodes. In this paper, we propose and evaluate a combination of data distributions and communication patterns intended to optimize the triangular matrix product in distributed memory systems when targeting commodity clusters (up to approximately 36 nodes). The four main ideas are: use panels (horizontal or vertical band partitions) instead of tiling; avoid zero elements in communication buffers; balance the number of elements in communicated buffers; and evaluate the performance when combined with both pipeline and broadcast communication strategies. We compare the performance of our implementation against the state-of-the-art implementations provided by ScaLAPACK and SLATE. The results show that we outperform both of them. Our proposal is up to 41% faster than ScaLAPACK, and up to 6.7% faster than SLATE. [es]
dc.format.mimetype: application/pdf [es]
dc.language.iso: eng [es]
dc.publisher: IEEE [es]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [es]
dc.subject: Computer Science [es]
dc.subject.classification: Triangular matrices, matrix product, distributed systems, SLATE, ScaLAPACK [es]
dc.title: Mappings and patterns to improve the triangular matrix product on distributed systems [es]
dc.type: info:eu-repo/semantics/article [es]
dc.identifier.doi: 10.1109/CLUSTERWorkshops61457.2023.00026 [es]
dc.relation.publisherversion: https://ieeexplore.ieee.org/abstract/document/10321877 [es]
dc.identifier.publicationfirstpage: 62 [es]
dc.identifier.publicationlastpage: 63 [es]
dc.peerreviewed: Yes [es]
dc.type.hasVersion: info:eu-repo/semantics/publishedVersion [es]
dc.subject.unesco: 1203 Computer Science [es]
dc.subject.unesco: 3304 [es]

