Using Fermi Architecture Knowledge to Speed up CUDA and OpenCL Programs

Torres de la Sierra, Yuri; González Escribano, Arturo; Llanos Ferraris, Diego Rafael

doi:10.1109/ISPA.2012.92

dc.contributor.author	Torres de la Sierra, Yuri
dc.contributor.author	González Escribano, Arturo
dc.contributor.author	Llanos Ferraris, Diego Rafael
dc.date.accessioned	2025-02-20T08:58:03Z
dc.date.available	2025-02-20T08:58:03Z
dc.date.issued	2012
dc.identifier.citation	IEEE 10th International Symposium on Parallel and Distributed Processing with Applications (ISPA), 2012, At: Leganés, Madrid, Spain, p. 617-624	es
dc.identifier.uri	https://uvadoc.uva.es/handle/10324/75090
dc.description	Scientific Production	es
dc.description.abstract	NVIDIA graphics processing units (GPUs) play an important role as general-purpose programming devices. Implementing parallel codes that exploit the GPU hardware architecture is a task for experienced programmers. The choice of threadblock size and shape is one of the most important decisions the user makes when coding a parallel problem, because the threadblock configuration has a significant impact on the global performance of the program. While in the CUDA parallel programming model it is always necessary to specify the threadblock size and shape, the OpenCL standard also offers an automatic mechanism to make this delicate decision. In this paper we present a study of these criteria for the Fermi architecture, introducing a general approach for threadblock choice and showing that there is considerable room for improvement in the OpenCL automatic strategy.	es
dc.format.extent	8 p.	es
dc.format.mimetype	application/pdf	es
dc.language.iso	eng	es
dc.publisher	IEEE	es
dc.rights.accessRights	info:eu-repo/semantics/openAccess	es
dc.subject	Computer Science	es
dc.subject.classification	GPGPU, automatic code tuning, Fermi, CUDA, OpenCL	es
dc.title	Using Fermi Architecture Knowledge to Speed up CUDA and OpenCL Programs	es
dc.type	info:eu-repo/semantics/conferenceObject	es
dc.identifier.doi	10.1109/ISPA.2012.92	es
dc.relation.publisherversion	https://ieeexplore.ieee.org/document/6280352	es
dc.title.event	IEEE 10th International Symposium on Parallel and Distributed Processing with Applications (ISPA), 2012	es
dc.description.project	This research is partly supported by the Ministerio de Industria, Spain (CENIT OCEANLIDER), MICINN (Spain) and the European Union FEDER (CAPAP-H3 network TIN2010-12011-E, TIN2011-25639), and the HPC-EUROPA2 project (project number: 228398) with the support of the European Commission - Capacities Area - Research Infrastructures Initiative.	es
dc.type.hasVersion	info:eu-repo/semantics/publishedVersion	es
dc.subject.unesco	1203	es
dc.subject.unesco	3304	es

Files in this item

Name: Using_Fermi_Architecture_Knowl ...
Size: 190.8 KB
Format: PDF

This item appears in the following collection(s)

DEP41 - Comunicaciones a congresos, conferencias, etc. [103]