RT info:eu-repo/semantics/article
T1 An autotuning approach to select the inter-GPU communication library on heterogeneous systems
A1 Cámara Moreno, Jesús
A1 Cuenca, Javier
A1 Galindo, Víctor
A1 Vicente, Arturo
A1 Boratto, Murilo
K1 Autotuning
K1 Communication libraries
K1 Multi-GPU
K1 Heterogeneous computing
K1 1203.17 Informática
AB In this work, an automatic optimisation approach for parallel routines on multi-GPU systems is presented. Several inter-GPU communication libraries (such as CUDA-Aware MPI or NCCL) are used with a set of routines to perform the numerical operations among the GPUs located on the compute nodes. The main objective is the selection of the most appropriate communication library, the number of GPUs to be used and the workload to be distributed among them in order to reduce the cost of data movements, which represent a large percentage of the total execution time. To this end, a hierarchical modelling of the execution time of each routine to be optimised is proposed, combining experimental and theoretical approaches. The results show that near-optimal decisions are taken in all the scenarios analysed.
PB Springer
SN 0920-8542
YR 2024
FD 2024
LK https://uvadoc.uva.es/handle/10324/75227
UL https://uvadoc.uva.es/handle/10324/75227
LA eng
NO The Journal of Supercomputing, 2024, vol. 81, n. 1
NO Producción Científica
DS UVaDOC
RD 07-abr-2025