RT info:eu-repo/semantics/conferenceObject
T1 Using Fermi Architecture Knowledge to Speed up CUDA and OpenCL Programs
A1 Torres de la Sierra, Yuri
A1 González Escribano, Arturo
A1 Llanos Ferraris, Diego Rafael
K1 Computer Science
K1 GPGPU, automatic code tuning, Fermi, CUDA, OpenCL
K1 1203
K1 3304
AB NVIDIA graphics processing units (GPUs) play an important role as general-purpose programming devices. Implementing parallel codes that exploit the GPU hardware architecture is a task for experienced programmers. The choice of threadblock size and shape is one of the most important decisions the user makes when coding a parallel problem, because the threadblock configuration has a significant impact on the overall performance of the program. While the CUDA parallel programming model always requires the threadblock size and shape to be specified, the OpenCL standard also offers an automatic mechanism to make this delicate decision. In this paper we present a study of these criteria for the Fermi architecture, introduce a general approach for threadblock choice, and show that there is considerable room for improvement in the OpenCL automatic strategy.
PB IEEE
YR 2012
FD 2012
LK https://uvadoc.uva.es/handle/10324/75090
UL https://uvadoc.uva.es/handle/10324/75090
LA eng
NO IEEE 10th International Symposium on Parallel and Distributed Processing with Applications (ISPA), 2012, At: Leganés, Madrid, Spain, p. 617-624
NO Producción Científica
DS UVaDOC
RD 28-feb-2026