Show simple item record

dc.contributor.author: Cámara Moreno, Jesús
dc.contributor.author: Cuenca, Javier
dc.contributor.author: García, Luis Pedro
dc.contributor.author: Giménez, Domingo
dc.date.accessioned: 2025-01-27T18:28:31Z
dc.date.available: 2025-01-27T18:28:31Z
dc.date.issued: 2014
dc.identifier.citation: Parallel Computing, 2014, Volume 40, Issue 7, Pages 309-327
dc.identifier.issn: 0167-8191
dc.identifier.uri: https://uvadoc.uva.es/handle/10324/74462
dc.description: Producción Científica
dc.description.abstract: The most computationally demanding scientific problems are solved on large parallel systems. In some cases these systems are Non-Uniform Memory Access (NUMA) multiprocessors made up of a large number of cores that share a hierarchically organized memory. The main basic component of these scientific codes is often matrix multiplication, and the efficient development of other linear algebra packages is directly based on the matrix multiplication routine implemented in the BLAS library. BLAS is used either in the form of vendor-supplied packages or as free implementations. The latest versions of this library are multithreaded and can be used efficiently in multicore systems, but when they are called inside parallel codes, the two parallelism levels can interfere and degrade performance. In this work, an auto-tuning method is proposed to automatically select the optimum number of threads to use at each parallel level when multithreaded linear algebra routines are called from OpenMP parallel codes. The method is based on a simple but effective theoretical model of the execution time of the two-level routines. The methodology is applied to a two-level matrix–matrix multiplication and to different matrix factorizations (LU, QR and Cholesky) by blocks. Traditional schemes that directly use the multithreaded BLAS routine, dgemm, are compared with schemes combining the multithreaded dgemm with OpenMP.
dc.format.mimetype: application/pdf
dc.language.iso: eng
dc.publisher: Elsevier
dc.rights.accessRights: info:eu-repo/semantics/restrictedAccess
dc.subject: Parallel Computing
dc.subject: Auto-Tuning
dc.subject.classification: Auto-tuning
dc.subject.classification: Linear Algebra
dc.subject.classification: Performance Modeling
dc.title: Auto-tuned nested parallelism: A way to reduce the execution time of scientific software in NUMA systems
dc.type: info:eu-repo/semantics/article
dc.rights.holder: Elsevier B.V.
dc.identifier.doi: 10.1016/j.parco.2014.03.011
dc.relation.publisherversion: https://www.sciencedirect.com/science/article/abs/pii/S0167819114000416
dc.identifier.publicationfirstpage: 309
dc.identifier.publicationissue: 7
dc.identifier.publicationlastpage: 327
dc.identifier.publicationtitle: Parallel Computing
dc.identifier.publicationvolume: 40
dc.peerreviewed: SI
dc.description.project: This work is part of research project TIN2012-38341-C04-03, funded by the Ministerio de Economía (MINECO)
dc.type.hasVersion: info:eu-repo/semantics/publishedVersion
dc.subject.unesco: 1203 Ciencia de Los Ordenadores
dc.subject.unesco: 3304 Tecnología de Los Ordenadores

