Mappings and patterns to improve the triangular matrix product on distributed systems

Santamaría Valenzuela, María Inmaculada; Carratalá Sáez, Rocío; Torres de la Sierra, Yuri; Llanos Ferraris, Diego Rafael; González Escribano, Arturo

doi:10.1109/CLUSTERWorkshops61457.2023.00026

dc.contributor.author	Santamaría Valenzuela, María Inmaculada
dc.contributor.author	Carratalá Sáez, Rocío
dc.contributor.author	Torres de la Sierra, Yuri
dc.contributor.author	Llanos Ferraris, Diego Rafael
dc.contributor.author	González Escribano, Arturo
dc.date.accessioned	2024-09-15T08:55:37Z
dc.date.available	2024-09-15T08:55:37Z
dc.date.issued	2023
dc.identifier.citation	Mappings and patterns to improve the triangular matrix product on distributed systems. 2023 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), Santa Fe, New Mexico, USA, October 2023
dc.identifier.uri	https://uvadoc.uva.es/handle/10324/69761
dc.description.abstract	Matrix multiplication is one of the most costly linear algebra operations, and it is very often present in scientific computing applications. Current generic linear algebra libraries, such as ScaLAPACK and its recent evolution SLATE, include functionality for generic and triangular matrix multiplication. They generally rely on block-cyclic partitioning, which has two main advantages. First, it provides good interoperability with other functionalities of the libraries. Second, it provides a good balance of computation and inter-process communication. The focus of these libraries is performance and scalability, targeting even huge numbers of processes. Nevertheless, many enterprises and computing centers work with commodity clusters or small partitions with a reduced number of nodes. In this paper, we propose and evaluate a combination of data distributions and communication patterns intended to optimize the triangular matrix product in distributed-memory systems when targeting commodity clusters (up to approximately 36 nodes). The four main ideas are: use panels (horizontal or vertical band partitions) instead of tiling; avoid zero elements in communication buffers; balance the number of elements in communicated buffers; and evaluate the performance when combined with both pipeline and broadcast communication strategies. We compare the performance of our implementation against the state-of-the-art implementations provided by ScaLAPACK and SLATE. The results show that we outperform both of them: our proposal is up to 41% faster than ScaLAPACK and up to 6.7% faster than SLATE.
dc.format.mimetype	application/pdf
dc.language.iso	eng
dc.publisher	IEEE
dc.rights.accessRights	info:eu-repo/semantics/openAccess
dc.subject	Computer Science
dc.subject.classification	Triangular matrices, matrix product, distributed systems, SLATE, ScaLAPACK
dc.title	Mappings and patterns to improve the triangular matrix product on distributed systems
dc.type	info:eu-repo/semantics/article
dc.identifier.doi	10.1109/CLUSTERWorkshops61457.2023.00026
dc.relation.publisherversion	https://ieeexplore.ieee.org/abstract/document/10321877
dc.identifier.publicationfirstpage	62
dc.identifier.publicationlastpage	63
dc.peerreviewed	Yes
dc.type.hasVersion	info:eu-repo/semantics/publishedVersion
dc.subject.unesco	1203 Computer Science
dc.subject.unesco	3304
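
Three of the abstract's ideas (panels instead of tiles, no zero elements in communication buffers, and balanced buffer sizes) can be illustrated with a short Python sketch. It rests on our own simplifying assumptions rather than on the paper's method: an n x n lower-triangular matrix held as a list of rows and split into p horizontal panels (row bands). The names `tri`, `nnz_rows`, `balanced_row_panels` and `pack_lower_panel` are hypothetical. Row i holds i + 1 meaningful elements, so equal-height panels carry very unequal amounts of data. Choosing panel boundaries from the cumulative element count T(r) = r(r+1)/2 evens this out, and packing only the lower-triangle entries keeps zeros out of the buffer.

```python
from math import isqrt


def tri(k):
    """Elements in the first k rows of a lower-triangular matrix: k(k+1)/2."""
    return k * (k + 1) // 2


def nnz_rows(r0, r1):
    """Stored (non-structural-zero) elements in rows r0..r1-1; row i holds i+1."""
    return tri(r1) - tri(r0)


def balanced_row_panels(n, p):
    """Hypothetical helper: split rows 0..n-1 into p horizontal panels whose
    element counts are as even as row granularity allows.

    Returns p+1 boundaries; panel k covers rows bounds[k]..bounds[k+1]-1.
    """
    total = tri(n)
    bounds = [0]
    for k in range(1, p):
        target = (k * total + p - 1) // p          # ceil(k * total / p)
        r = (isqrt(8 * target + 1) - 1) // 2       # largest r with tri(r) <= target
        if tri(r) < target:
            r += 1                                 # smallest r with tri(r) >= target
        bounds.append(max(bounds[-1], min(r, n)))  # keep boundaries monotone and <= n
    bounds.append(n)
    return bounds


def pack_lower_panel(matrix, r0, r1):
    """Hypothetical helper: copy only the lower-triangle entries of rows
    r0..r1-1 into a flat buffer, so no structural zeros are communicated."""
    return [matrix[i][j] for i in range(r0, r1) for j in range(i + 1)]


if __name__ == "__main__":
    bounds = balanced_row_panels(8, 2)
    # Element count carried by each panel
    print(bounds, [nnz_rows(a, b) for a, b in zip(bounds, bounds[1:])])
```

For n = 8 and p = 2, `balanced_row_panels(8, 2)` returns `[0, 6, 8]`, giving panels of 21 and 15 elements, whereas the equal-height split `[0, 4, 8]` gives 10 and 26. How the actual implementation assigns panels to processes, and how it combines them with pipeline or broadcast communication, is described in the paper itself.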

Files in this item

Name: IEEE_Cluster_extended_abstract.pdf
Size: 440.1 KB
Format: PDF

This item appears in the following collection(s)

DEP41 - Journal articles [137]
