2024-03-28T21:57:35Zhttp://uvadoc.uva.es/oai/requestoai:uvadoc.uva.es:10324/240492021-10-20T07:05:17Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Llamas Bello, César
González Delgado, Manuel Ángel
González Rebollo, Miguel Ángel
Vegas Hernández, Jesús María
2016
Innovación Educativa
Creation of a novel platform that applies open source hardware and software models and technologies to innovate first-year Physics laboratories in the area of kinematics, using electronic sensors, controllers, and wireless communication technologies.
application/pdf
http://uvadoc.uva.es/handle/10324/24049
eng
IATED
Computer science
Education
Improving the Physics Laboratory Experience Through Sensors on a Wireless Open Source Hardware and Software Platform
info:eu-repo/semantics/conferenceObject
7 pg.
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/291142021-06-23T11:18:14Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Moretón Fernández, Ana
González Escribano, Arturo
Llanos Ferraris, Diego Rafael
2015
Producción Científica
Current multicomputers are typically built as interconnected clusters of shared-memory multicore computers. A common programming approach for these clusters is to simply use a message-passing paradigm, launching as many processes as there are cores available. Nevertheless, to better exploit the scalability of these clusters and of highly parallel multicore systems, their distributed- and shared-memory hierarchies must be used efficiently. This implies combining different programming paradigms and tools at different levels of the program design.
Programming in this kind of environment is challenging. Many successful parallel programming models and tools have been proposed for specific environments. However, the application programmer still faces many important decisions that are not related to the parallel algorithms, but to implementation issues that are key to obtaining efficient programs. Examples include decisions about partition and locality vs. synchronization/communication costs; grain selection and tiling; proper parallelization strategies for each grain level; or mapping, layout, and scheduling details. Moreover, many of these decisions may change with machine details or structure, or even with data sizes. This paper presents an automatic code generation system for mixed distributed- and shared-memory parallel multicomputers. We present an extension of the Trasgo programming model. This extended model supports a wider range of parallel structures and applications where coordination is expressed at an abstract level. Transparent modular objects are invoked to guide the partition and mapping of both data and processes across the whole system. We present a technique that, for affine expressions, computes exact aggregated communications at the distributed level. It uses the intersection of remote and local footprints in terms of the mapping policies selected. Moreover, Trasgo 2.0 integrates polyhedral analysis tools to obtain optimizations inside each shared-memory parallel node at the shared level. This approach allows the automatic generation of multilevel parallel programs that adapt their communication and synchronization structures to the target machine. Our experimental results for both shared- and distributed-memory environments show how this approach can automatically produce efficient codes when compared with manually optimized codes using MPI or OpenMP models.
application/pdf
http://uvadoc.uva.es/handle/10324/29114
eng
Universidad de Valladolid, Escuela de Ingeniería Informática
Trasgo 2.0: Code generation for parallel distributed- and shared-memory hierarchical systems
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/291152021-06-23T11:18:15Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Moretón Fernández, Ana
González Escribano, Arturo
Llanos Ferraris, Diego Rafael
2015
Producción Científica
Current multicomputers are typically built as interconnected clusters of shared-memory multicore computers. A common programming approach for these clusters is to simply use a message-passing paradigm, launching as many processes as there are cores available. Nevertheless, to better exploit the scalability of these clusters and of highly parallel multicore systems, their distributed- and shared-memory hierarchies must be used efficiently. This implies combining different programming paradigms and tools at different levels of the program design. This paper presents an approach to ease programming for mixed distributed- and shared-memory parallel computers. Coordination at the distributed-memory level is simplified using Hitmap, a library for distributed computing that uses hierarchical tiling of data structures. We show how this tool can be integrated with shared-memory programming models and automatic code generation tools to efficiently exploit the multicore environment of each multicomputer node. This approach makes it possible to exploit the most appropriate techniques for each model, easily generating multilevel parallel programs that automatically adapt their communication and synchronization structures to the target machine. Our experimental results show how this approach matches or even improves on the best performance results obtained with manually optimized codes using pure MPI or OpenMP models.
application/pdf
http://uvadoc.uva.es/handle/10324/29115
eng
IEEE Press
On the run-time cost of distributed memory communications generated using the polyhedral model
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/291162021-06-23T11:18:16Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Moretón Fernández, Ana
González Escribano, Arturo
Llanos Ferraris, Diego Rafael
2015
Producción Científica
Currently, the generation of parallel codes that are portable to different kinds of parallel computers is a challenge. Many approaches have been proposed in recent years, following two different paths: programming from scratch using new programming languages and models that deal with parallelism explicitly, or automatically generating parallel codes from already existing sequential programs. With the current mainstream parallel languages, the programmer deals with mapping and optimization details, and is forced to take into account details of the execution platform to obtain good performance. With code generators that start from sequential programs, programmers cannot control basic mapping decisions, and the programmer often needs to transform the code to expose to the compiler the information needed to leverage important optimizations. This paper presents a new high-level parallel programming language named CMAPS, designed to be used with the Trasgo parallel programming framework. This language provides a simple and explicit way to express parallelism at a highly abstract level. The programmer does not face decisions about granularity, thread management, or interprocess communication. Thus, the programmer can express different parallel paradigms in an easy, unified, abstract, and portable form. The language supports the features required by transformation models such as Trasgo to generate parallel codes that adapt their communication and synchronization structures to target machines composed of mixed distributed- and shared-memory parallel multicomputers.
application/pdf
http://uvadoc.uva.es/handle/10324/29116
spa
Universidad de Salamanca
A New High-Level Parallel Portable Language for Hierarchical Systems in Trasgo
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/291172021-06-23T11:18:17Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Estébanez López, Álvaro
Llanos Ferraris, Diego Rafael
Orden, David
Palop del Río, Belén
2015
Producción Científica
Scheduling is one of the factors that most directly affect performance in Thread-Level Speculation (TLS). Since loops may present dependences that cannot be predicted before runtime, finding a good chunk size is not a simple task. The most widely used mechanism, Fixed-Size Chunking (FSC), requires many "dry-runs" to set the optimal chunk size. If the loop does not present dependence violations at runtime, scheduling only needs to deal with load balancing issues. For loops where the general pattern of dependences is known, as is the case with Randomized Incremental Algorithms, specialized mechanisms have been designed to maximize performance. To make TLS available to a wider community, a general scheduling algorithm is needed that requires neither a priori knowledge of the expected pattern of dependences nor previous dry-runs to adjust any parameter. In this paper, we present an algorithm that estimates at runtime the best size of the next chunk to be scheduled.
This algorithm takes advantage of our previous knowledge in the design and testing of other scheduling mechanisms, and it has a solid mathematical basis. The result is a method that, using information from the execution of the previous chunks, decides the size of the next chunk to be scheduled. Our experimental results show that the proposed scheduling function matches or even improves on the performance that can be obtained by FSC, greatly reducing the need for a costly and careful search for the best fixed chunk size.
application/pdf
http://uvadoc.uva.es/handle/10324/29117
eng
Springer
Moody Scheduling for Speculative Parallelization
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/291182021-06-23T11:18:18Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Estébanez López, Álvaro
Llanos Ferraris, Diego Rafael
González Escribano, Arturo
2015
Producción Científica
Intel Xeon Phi accelerators are one of the newest devices used in the field of parallel computing. However, there are comparatively few studies concerning their performance when using most of the existing parallelization techniques. One of them is thread-level speculation, a technique that optimistically tries to extract parallelism from loops without the need for a compile-time analysis that guarantees that the loop can be executed in parallel. In this article we evaluate the performance delivered by an Intel Xeon Phi coprocessor when using a state-of-the-art, software-only thread-level speculative parallelization library to execute well-known benchmarks. Our results show that, although the Xeon Phi delivers a relatively good speedup in comparison with a shared-memory architecture in terms of scalability, the low computing power of its computational units when specific vectorization and SIMD instructions are not exploited indicates that new techniques specific to this platform must be developed to make it competitive for speculative parallelization compared with high-end processors or conventional shared-memory systems.
application/pdf
http://uvadoc.uva.es/handle/10324/29118
eng
Universidad de Valladolid, Escuela de Ingeniería Informática
Evaluating the capabilities of the Xeon Phi platform in the context of software-only, thread-level speculation
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/291192021-06-23T11:18:19Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Fresno Bausela, Javier
González Escribano, Arturo
Llanos Ferraris, Diego Rafael
2016
Producción Científica
Dataflow programming consists of developing a program by describing its sequential stages and the interactions between them. The runtimes supporting this kind of programming are responsible for exploiting the parallelism by concurrently executing the different stages when their dependencies have been met. In this paper we introduce a new parallel programming model and framework based on the dataflow paradigm. Its features are: it is a unique one-tier model that supports hybrid shared- and distributed-memory systems; it can express arbitrarily linked activities, including cycles; it uses a distributed work-stealing mechanism to allow Multiple-Producer/Multiple-Consumer configurations; and it has a run-time mechanism for reconfiguring the dependence network, which also allows the creation of task-to-task affinities. We present an evaluation using examples of different classes of applications. Experimental results show that programs generated using this framework deliver good performance, and that the new abstractions introduce minimal overheads.
application/pdf
http://uvadoc.uva.es/handle/10324/29119
eng
Universidad de Valladolid, Escuela de Ingeniería Informática
One Tier Dataflow Programming Model for Hybrid Distributed- and Shared-Memory Systems
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/291212021-06-23T11:18:20Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Rodríguez Gutiez, Eduardo
Martinez Gil, Francisco
Orduña Huertas, Juan Manuel
González Escribano, Arturo
2016
Producción Científica
Multi-agent systems allow the modelling of complex, heterogeneous, and distributed systems in a realistic way. MARL-Ped is a multi-agent system tool, based on the MPI standard, for the simulation of different scenarios of pedestrians who autonomously learn the best behavior by Reinforcement Learning. By design, MARL-Ped uses one MPI process for each agent, with a fixed fine-grain granularity. This requirement limits simulation performance when the number of processors is smaller than the number of agents. Hitmap, on the other hand, is a library that eases the programming of parallel applications based on distributed arrays. It includes abstractions for the automatic partition and mapping of arrays at runtime with arbitrary granularity, as well as functionalities to build flexible communication patterns that transparently adapt to the data partitions. In this work, we present the methodology and techniques of granularity selection in Hitmap, applied to the simulation of agent systems. As a first approximation, we use the MARL-Ped multi-agent pedestrian simulation software as a case study for intra-node cases. Hitmap allows agents to be transparently mapped to processes, reducing oversubscription and intra-node communication overheads. The evaluation results show significant advantages when using Hitmap: it increases flexibility, performance, and agent-number scalability for a fixed number of processing elements, allowing better exploitation of isolated nodes.
application/pdf
http://uvadoc.uva.es/handle/10324/29121
eng
Springer
MARL-Ped+Hitmap: Towards Improving Agent-Based Simulations with Distributed Arrays
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/291222021-06-23T11:18:21Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Rodríguez Gutiez, Eduardo
Martinez Gil, Francisco
Orduña Huertas, Juan Manuel
González Escribano, Arturo
2016
Producción Científica
Multi-agent systems are made up of software components called agents that can perceive their environment and act in it autonomously. MARL-Ped is a multi-agent pedestrian model in which each agent (pedestrian) learns the appropriate behavior for simulating different situations (crowds, crossings, evacuations of enclosed spaces, ...). MARL-Ped uses the MPI message-passing standard so that it can be run portably on distributed systems. Programming directly with message-passing systems requires great effort if flexible load-distribution policies that adapt to the target platform are to be introduced. Hitmap is a library of functions, based on distributed arrays, that eases the programming of parallel applications. It introduces abstractions for the transparent partitioning and mapping of arrays, as well as for building flexible communication patterns that adapt automatically to the partition. In this work we present the Hitmap methodology and techniques applied to agent simulation, using MARL-Ped as a case study. We show conceptually and experimentally the advantages of applying Hitmap to increase the productivity of this kind of application, in both adaptability and performance, allowing agents to be grouped into processes and transparently reducing communication costs and overheads.
application/pdf
http://uvadoc.uva.es/handle/10324/29122
spa
Universidad de Salamanca
MARL-Ped+Hitmap: aumentando la productividad de simulaciones basadas en agentes con una herramienta de arrays distribuidos
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/291232021-06-23T11:18:22Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Alonso Mayo, Alejandro
Ortega Arranz, Héctor
González Escribano, Arturo
2016
Producción Científica
Nowadays the use of hardware accelerators, such as Graphics Processing Units (GPUs) or Xeon Phi coprocessors, is key to solving computationally costly problems that require High Performance Computing (HPC). However, programming solutions that deploy efficiently on this kind of device is a very complex task that relies on the manual management of memory transfers and configuration parameters. The programmer has to study in depth which data each computing platform needs at each moment, taking architectural details into account. We introduce the communicator concept as an abstract entity that allows the programmer to easily manage communications and kernel launching details on hardware accelerators or multi-core devices in a transparent way. Furthermore, this model also lets the programmer launch CPU kernels on multi-core processors with the same abstraction and methodology used for the accelerators. In this way, the burden of writing two different codes to manage the different computational devices is alleviated. Additionally, this entity simplifies the selection of proper values for kernel-launching configuration parameters. This is done through a simple characterization process of the kernel code to be executed. A programming model involving the communicator entity is described in this article. Finally, we also present a prototype library that implements the communicator model, together with its application in several case studies. Its use has led to reductions in development costs, with significantly low overheads in execution times when compared to manually programmed and optimized solutions using CUDA and OpenMP directly.
application/pdf
http://uvadoc.uva.es/handle/10324/29123
eng
Universidad de Valladolid, Escuela de Ingeniería Informática
Communicators: an abstraction to ease the use of hardware accelerators
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/291252021-06-23T11:18:23Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Barba Gutiérrez, Daniel
González Escribano, Arturo
Llanos Ferraris, Diego Rafael
2016
Producción Científica
OpenACC has existed for some years now and, during that time, a number of compilers have appeared in both academia and industry. Given the novelty of the OpenACC standard and the continuous development of the existing compilers, a benchmark suite specifically created to analyze the behavior of the code generated by these compilers on different machines is particularly useful. In this article we present TORMENT OpenACC, a benchmark suite prepared to be compiled by different compilers and that offers a summary of the results obtained. Along with this tool, we have developed a suitable metric for scoring compiler-machine pairs, which we have named the TORMENT ACC Score.
application/pdf
http://uvadoc.uva.es/handle/10324/29125
spa
Universidad de Salamanca
Una herramienta de benchmarking para compiladores de OpenACC
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/291302021-06-23T11:18:24Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Barba Gutiérrez, Daniel
González Escribano, Arturo
Llanos Ferraris, Diego Rafael
2016
Producción Científica
The purpose of this contribution is to discuss the role of software-based TLS solutions in the coming years. On the software side, automatic parallelization techniques such as those based on the polyhedral model extract parallelism from an increasing number of applications. The question here is whether this reduces the need for speculative runtime techniques. On the hardware side, the advent of manycore systems with dozens or hundreds of processors causes classic TLS techniques to yield diminishing returns. To deal with this scenario, an update of TLS runtime architectures may be desirable.
application/pdf
http://uvadoc.uva.es/handle/10324/29130
eng
Universidad de Valladolid, Escuela de Ingeniería Informática
The role of thread-level speculation in the manycore era
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/291312021-06-23T11:18:25Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Barba Gutiérrez, Daniel
González Escribano, Arturo
Llanos Ferraris, Diego Rafael
2016
Producción Científica
OpenACC has been in development for a few years now. The OpenACC 2.5 specification was recently made public and there are some initiatives for developing full implementations of the standard to make use of accelerator capabilities. There is much to be done yet, but currently, OpenACC for GPUs is reaching a good maturity level in various implementations of the standard, using CUDA and OpenCL as backends. Nvidia is investing in this project and has released an OpenACC Toolkit, including the PGI Compiler. There are, however, more developments out there. In this work, we analyze different available OpenACC compilers that have been developed by companies or universities in recent years. We check their performance and maturity, keeping in mind that OpenACC is designed to be used without extensive knowledge of parallel programming. Our results show that the compilers are on their way to a reasonable maturity, presenting different strengths and weaknesses.
application/pdf
http://uvadoc.uva.es/handle/10324/29131
eng
Springer
Comparative Analysis of OpenACC Compilers
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/291322021-06-23T11:18:26Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Aldea López, Sergio
Llanos Ferraris, Diego Rafael
González Escribano, Arturo
2016
Producción Científica
Transactional Memory (TM) is a technique that aims to mitigate the performance losses that are inherent to the serialization of accesses in critical sections. Some studies have shown that the use of TM may lead to performance improvements, despite the existence of management overheads. However, the relative performance of TM with respect to classical critical section management depends greatly on the actual percentage of times that the same data is handled simultaneously by two transactions. In this paper, we compare the relative performance of the critical sections provided by OpenMP with respect to two Software Transactional Memory (STM) implementations. These three methods are used to manage concurrent data accesses in ATLaS, a software-based Thread-Level Speculation (TLS) system. The complexity of this application makes it extremely difficult to predict whether two transactions may conflict or not, and how many times the transactions will be executed. Our experimental results show that the STM solutions only deliver a performance comparable to OpenMP when there are almost no conflicts. In any other case, their performance losses make OpenMP the best alternative to manage critical sections.
application/pdf
http://uvadoc.uva.es/handle/10324/29132
eng
Universidad de Salamanca
Critical Sections and Software Transactional Memory Comparison in the Context of a TLS Runtime Library
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/291332021-06-23T11:18:27Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Barba Gutiérrez, Daniel
González Escribano, Arturo
Llanos Ferraris, Diego Rafael
2017
Producción Científica
OpenACC is a parallel programming model for hardware accelerators, such as GPUs or Xeon Phi, which has been in development for several years. During this time, different compilers have appeared, both commercial and open source, which are still at the development stage. Because both the OpenACC standard and its implementations are relatively recent, we propose a benchmark suite specifically designed to check the performance of the OpenACC features in the code generated by different compilers on different architectures. Our benchmark suite is named TORMENT OpenACC2016. Along with this tool we have developed a suitable metric for comparing performance among different machine-compiler pairs, which we have named the TORMENT ACC2016 Score. Version 1 of TORMENT OpenACC2016, presented in this paper, contains six benchmarks and is available online.
application/pdf
http://uvadoc.uva.es/handle/10324/29133
eng
IEEE Press
TORMENT OpenACC2016: A benchmarking tool for OpenACC compilers
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/291342021-06-23T11:18:32Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Moretón Fernández, Ana
González Escribano, Arturo
Llanos Ferraris, Diego Rafael
2017
Producción Científica
We propose moving to runtime part of the compile-time analysis needed to generate the communication code for distributed-memory systems. Communication stages on distributed-memory systems have a significant impact on performance, so reducing communication times is key to improving execution time. We have developed a technique that uses a hierarchical tiling array library to represent and manage rectangular index spaces at runtime. The data to be received and/or sent by a local process to another one is calculated by intersecting the set of indexes read or written by a process with the set of indexes written or read by the local process.
application/pdf
http://uvadoc.uva.es/handle/10324/29134
eng
Universidad de Valladolid, Escuela de Ingeniería Informática
A Runtime Analysis for communication calculation on Distributed-memory Systems
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/291352021-06-23T11:18:33Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Moretón Fernández, Ana
Rodríguez Gutiez, Eduardo
González Escribano, Arturo
Llanos Ferraris, Diego Rafael
2017
Producción Científica
Supercomputers are becoming more heterogeneous. They are composed of several machines with different computation capabilities and different kinds and families of accelerators, such as GPUs or Intel Xeon Phi coprocessors. Programming these machines is a hard task that requires a deep study of the architectural details in order to exploit each computational unit efficiently.
In this paper, we present an extension of a GPU-CPU heterogeneous programming model to include support for Intel Xeon Phi coprocessors. This contribution extends the previous model and its implementation by taking advantage of both the GPU communication model and the CPU execution model of the original approach to derive a new approach for the Xeon Phi. Our experimental results show that, using our approach, the programming effort needed to change the kind of target device is greatly reduced for several case studies. For example, using our model to program a Mandelbrot benchmark, 97% of the application code is reused between a GPU implementation and a Xeon Phi implementation.
application/pdf
http://uvadoc.uva.es/handle/10324/29135
eng
Springer
Supporting the Xeon Phi coprocessor in a Heterogeneous Programming Model
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/291362021-06-23T11:18:34Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Moretón Fernández, Ana
González Escribano, Arturo
Llanos Ferraris, Diego Rafael
2017
Producción Científica
Current HPC clusters are composed of several machines with different computation capabilities and different kinds and families of accelerators. Programming efficiently for these heterogeneous systems has become an important challenge. There are many proposals to simplify the programming and management of accelerator devices, and hybrid programming that mixes accelerators and CPU cores. However, portability compromises efficiency on different devices in many cases, and there are details about the coordination of different types of devices that must still be tackled by the programmer. In this work we introduce the Multi-Controller (MCtrl), an abstract entity implemented in a library that coordinates the management of heterogeneous devices, including accelerators with different capabilities and sets of CPU cores. Our proposal improves on state-of-the-art solutions, simplifying the data partition, mapping, and transparent deployment of both simple generic kernels that are portable across different device types and specialized implementations defined and optimized using specific native or vendor programming models (such as CUDA for NVIDIA's GPUs, or OpenMP for CPU cores). The run-time system automatically selects and deploys the most appropriate implementation of each kernel for each device, managing the data movements and hiding the launching details. Results of an experimental study with four case studies indicate that our abstraction allows the development of flexible and highly efficient programs that adapt to the heterogeneous environment.
application/pdf
http://uvadoc.uva.es/handle/10324/29136
eng
Universidad de Valladolid, Escuela de Ingeniería Informática
Multi-Device Controllers: A Library To Simplify The Parallel Heterogeneous Programming
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/291372021-06-23T11:18:35Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Rodríguez Gutiez, Eduardo
González Escribano, Arturo
Llanos Ferraris, Diego Rafael
2017
Producción Científica
BLAS linear algebra routines are widely used in scientific applications of all kinds. There are implementations specifically optimized for different types of computing platforms, including accelerators. For example, the implementation in the Intel MKL library, besides running on CPUs, includes versions for Xeon Phi, while the cuBLAS library is specially designed for NVIDIA GPUs.
However, the mechanisms for managing the memory used by the data structures on which the computation is performed differ between implementations, as do some mechanisms related to calls and parameter passing. In this article we present a single interface for BLAS, integrated into a heterogeneous programming model (Controllers) that supports groups of CPU cores, Xeon Phi accelerators, or NVIDIA GPUs transparently for the programmer. With this proposal it is possible to build portable programs based on BLAS routines that run on different types of accelerators by simply changing an initialization parameter. Internally, our proposal exploits the specific library for each type of device. The differences in their interfaces and in the external mechanisms for managing device memory, minimizing transfers, are transparent to the programmer. The experimental results show that our abstraction does not introduce significant performance losses.
application/pdf
http://uvadoc.uva.es/handle/10324/29137
spa
Universidad de Málaga
Hacia una biblioteca BLAS realmente portable entre diferentes tipos de aceleradores
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/291382021-06-23T11:18:36Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Ji Ye, Senmao
González Escribano, Arturo
Llanos Ferraris, Diego Rafael
2017
Producción Científica
In the computation pattern known as stencil, each element of an array data structure is iteratively updated as a function of the values of its neighbors. Among other applications, this pattern makes it possible to numerically solve systems of partial differential equations, so it is of great interest in scientific computing, where the data-size and computational-load requirements of real problems keep growing. The structure of this pattern allows simple data-parallel strategies, so its parallelization, both on CPUs and on accelerators, is of great interest. However, the need for synchronization and communication between processing elements leads to problems related to the ability to distribute the load and exploit multiple devices simultaneously. In this work we present a review and update of efficient programming techniques based on MPI and CUDA to exploit this computation pattern on distributed multi-GPU systems. Our results show how the techniques used can alleviate communication problems between host and GPUs, obtaining performance and scalability that depend on the capabilities of the interconnection system between nodes.
application/pdf
http://uvadoc.uva.es/handle/10324/29138
spa
Universidad de Málaga
Técnicas de implementación de Stencils en multi-GPU distribuidas
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/291392021-06-23T11:18:42Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Barba Gutiérrez, Daniel
González Escribano, Arturo
Llanos Ferraris, Diego Rafael
2017
Producción Científica
OpenACC is a parallel programming model for automatic parallelization of sequential code using compiler directives or pragmas. OpenACC is intended to be used with accelerators such as GPUs and Xeon Phi. The different implementations of the standard, although still in early development, are primarily focused on GPU execution. In this study, we analyze how the different OpenACC compilers available under certain premises behave when the clauses affecting the underlying block geometry implementation are modified. These clauses are the Gang number, Worker number, and Vector Size defined by the standard.
application/pdf
http://uvadoc.uva.es/handle/10324/29139
eng
Universidad de Salamanca
Analysis of OpenACC Performance using Different Block Geometries
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/321852021-06-23T11:18:43Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Moretón Fernández, Ana
Rodríguez Gutiez, Eduardo
Torres de la Sierra, Yuri
González Escribano, Arturo
Llanos Ferraris, Diego Rafael
2018
In this paper we summarize the recent research advances of our group in designing runtime and code generation solutions in the context of the Multi-Controller model. The Multi-Controller is an abstract entity implemented in a library that coordinates the management of several heterogeneous devices, including different types of accelerators and sets of CPU-cores, in a homogeneous way.
application/pdf
http://uvadoc.uva.es/handle/10324/32185
spa
Informática
Advances in the MultiController model: Programming heterogeneous systems in a homogeneous way
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/321862021-06-23T11:18:44Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Fernández Fabeiro, Jorge
Ordóñez, Álvaro
González Escribano, Arturo
Blanco Heras, Dora
2018
The task of estimating the translation, rotation and scaling of an image with respect to another take of the same scene obtained at different times, viewpoints and/or lighting conditions is known as image registration. Applications like environmental disaster management or rescue operations depend on real-time hyperspectral image registration, but most of the current FFT-based techniques ignore such performance needs. Ordóñez et al. proposed HYFMGPU [1], a single-GPU algorithm whose performance makes it suitable for real-time use cases. As hyperspectral sensors improve, both the size of images and the wavelength ranges covered are expected to increase, so a multi-GPU implementation is proposed to satisfy these growing needs.
application/pdf
http://uvadoc.uva.es/handle/10324/32186
spa
Informática
Towards a multi-device version of the HYFMGPU Algorithm
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/321872021-06-23T11:18:45Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Taboada Rodero, Ismael José
Torres de la Sierra, Yuri
González Escribano, Arturo
Llanos Ferraris, Diego Rafael
2018
The use of high-performance hardware accelerators, such as graphics processing units (GPUs), has been steadily increasing in supercomputing systems. This trend is readily apparent in the list of computers in the TOP500 ranking. Programming this type of device is a costly task that requires in-depth knowledge of the architecture of each accelerator. This difficulty grows when one aims to exploit, efficiently, the different hardware resources of a device. This work proposes a programming model that allows communication and computation tasks to overlap on GPU devices, thereby improving application performance. Our experimental study shows that this model transparently hides communication latencies if there is sufficient communication load, obtaining up to a 61.10% performance improvement compared with our synchronous implementation.
application/pdf
http://uvadoc.uva.es/handle/10324/32187
spa
Universidad de Zaragoza
Solapamiento transparente de tareas de comunicación y computación para mejor rendimiento de aplicaciones de GPU
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/321882021-06-23T11:18:46Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Diego, Ester
Carravilla, David
Vicente, Guillermo
Campo Pando, Héctor del
Barba Gutiérrez, Daniel
Llanos Ferraris, Diego Rafael
March, Juan A.
2018
Waste disposal and recycling is becoming one of the main problems in Western countries. Improving both the recycling culture among citizens and the logistics of waste collection and treatment is critical to increasing the percentage of waste being recycled. In this paper we present STERLING, an initiative that aims to help in both fields. STERLING is a framework composed of a low-cost, low-energy sensor installed in recycling containers to measure fill level and other physical parameters. The sensor is activated magnetically each time the container lid is opened by a user. Instead of directly sending this information to a cloud-based server, our sensor broadcasts a Bluetooth Low Energy (BLE) packet to the surrounding area. An app running on the user's mobile phone performs two actions: it captures this information and re-sends it to the cloud-based server, and it assigns credit to this particular user for having used the recycling container. In this way, users are rewarded for using the container, and the infrastructure benefits from cost-free communications from the container to the server. In this paper we describe our idea in detail, showing how it can be used to develop a reward scheme that encourages recycling.
application/pdf
http://uvadoc.uva.es/handle/10324/32188
eng
SPIE
Informática
STERLING: A framework for serious games to encourage recycling
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/390712022-01-18T12:44:28Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Fernández Fabeiro, Jorge
González Escribano, Arturo
Llanos Ferraris, Diego Rafael
2019
Hyperspectral image registration is a relevant task for real-time applications like environmental disaster management or search and rescue scenarios. Traditional algorithms for this problem were not designed for real-time performance. The HYFMGPU algorithm arose as a high-performance GPU-based solution to fill this gap. Nevertheless, a single-GPU solution is not enough, as sensors are evolving and generating images with finer resolutions and wider wavelength ranges. An MPI+CUDA distributed multi-GPU implementation of HYFMGPU was previously presented. However, this solution shows the programming complexity of combining MPI with an accelerator programming model. In this paper we present a new and more abstract programming approach for this type of application, which provides high efficiency while simplifying the programming of the distributed code. The solution uses Hitmap, a library to ease the programming of parallel applications based on distributed arrays. It uses a more algorithm-oriented approach than MPI, including abstractions for the automatic partition and mapping of arrays at runtime with arbitrary granularity, as well as techniques to build flexible communication patterns that transparently adapt to the data partitions. We show how these abstractions apply to this application class. We present a comparison of development effort metrics between the original MPI implementation and the one based on Hitmap, with reductions of up to 95% for the Halstead score in specific work redistribution steps. We finally present experimental results showing that these abstractions are internally implemented in a highly efficient way that can reduce the overall execution time by up to 37% compared with the original MPI implementation.
application/pdf
http://uvadoc.uva.es/handle/10324/39071
spa
IEEE
Simplifying the distributed multi-GPU programming of a hyperspectral image registration algorithm
info:eu-repo/semantics/conferenceObject
8 p
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/390722021-06-23T11:18:53Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Rodríguez Gutiez, Eduardo
Moretón Fernández, Ana
González Escribano, Arturo
Llanos Ferraris, Diego Rafael
2019
Linear algebra kernels are at the core of many scientific applications. We propose a unified, performance-oriented, and portable interface for BLAS.
application/pdf
http://uvadoc.uva.es/handle/10324/39072
spa
Toward a BLAS library truly portable across different accelerator types [Poster]
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/576992023-01-10T08:55:50Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Cardeñoso Payo, Valentín
González Ferreras, César
Escudero Mancebo, David
2021
IberSPEECH2020 is a two-day event, bringing together the best researchers and practitioners in speech and language technologies in Iberian languages to promote interaction and discussion. The organizing committee has planned a wide variety of scientific and social activities, including technical paper presentations, keynote lectures, presentation of projects, laboratory activities, recent PhD theses, discussion panels, a round table, and awards to the best thesis and papers. The program of IberSPEECH2020 includes a total of 32 contributions that will be presented across 5 oral sessions, a PhD session, and a projects session. To ensure the quality of all the contributions, each submitted paper was reviewed by three members of the scientific review committee. All the papers in the conference will be accessible through the International Speech Communication Association (ISCA) Online Archive. Paper selection was based on the scores and comments provided by the scientific review committee, which includes 73 researchers from different institutions (mainly from Spain and Portugal, but also from France, Germany, Brazil, Iran, Greece, Hungary, Czech Republic, Ukraine, Slovenia). Furthermore, an extension of selected papers will be published as a special issue of the Journal of Applied Sciences, “IberSPEECH 2020: Speech and Language Technologies for Iberian Languages”, published by MDPI with fully open access. In addition to regular paper sessions, the IberSPEECH2020 scientific program features the following activities: the ALBAYZIN evaluation challenge session.
application/pdf
https://uvadoc.uva.es/handle/10324/57699
spa
Tecnologías del Habla
Proceedings IberSPEECH 2020
info:eu-repo/semantics/conferenceObject
292 p.
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/590152023-03-30T10:14:23Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Aparicio de la Fuente, Amador
Martínez González, María Mercedes
Cardeñoso Payo, Valentín
2022
Android is the operating system with the largest presence on mobile devices. The permissions mechanism is used to grant or restrict applications' access to the device's data and resources. Applications request permissions to access them. Our proposal aims to obtain a permissions-based metric, easy to use for device owners, that gives them guidance on the privacy risk they assume when they install an application on their device. As a relevant novelty compared with previous proposals, we propose using permission groups as one of its parameters. Permission groups express concepts that are more accessible to any type of user than individual permissions, and they are what users can actually act on. We thus introduce the criterion of usability, which allows us to obtain a more human technology.
application/pdf
https://uvadoc.uva.es/handle/10324/59015
spa
Métrica basada en grupos de permisos para entender el impacto de las aplicaciones Android sobre la privacidad
info:eu-repo/semantics/conferenceObject
CISTI'2022: 17ª Conferencia Ibérica de Sistemas y Tecnologías de Información
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/590162023-03-30T10:10:54Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Aparicio de la Fuente, Amador
Martínez González, María Mercedes
Cardeñoso Payo, Valentín
2022
One of the ways to authenticate users of mobile devices is by sending One Time Password (OTP) codes via SMS messages. In order to facilitate the use of these codes by customers, Google has proposed APIs that allow the automatic verification of the SMS messages without the intervention of the users themselves. One of these APIs is the SMS Retriever API for Android devices. This article presents a study of this API. Different scenarios of interaction between mobile apps and SMS OTP servers are posed to determine which implementations of the SMS Retriever API are vulnerable. The study presented here focuses on Spain’s banking sector. The results show that there are vulnerable implementations which would allow cybercriminals to steal the users’ SMS OTP codes. The desirable equilibrium between ease of use and security needs to be improved in order to maintain the high level of security which has traditionally characterized this sector. The proposed methodology, applied here to this particular sector (banking), is nevertheless simple enough to be applied to any other sector. One of its advantages is that it proposes a method for detecting bad implementations of the SMS Retriever API on the server side, based on analyses of the apps, which would make it easily applicable.
application/pdf
https://uvadoc.uva.es/handle/10324/59016
eng
Springer, Cham
Vulnerabilities of the SMS Retriever API for the automatic verification of SMS OTP codes in the banking sector
info:eu-repo/semantics/conferenceObject
UCAmI 2022: 14th International Conference on Ubiquitous Computing and Ambient Intelligence
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/599442023-07-04T10:27:49Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Aparicio de la Fuente, Amador
Crespo Guerrero, Javier
Martínez González, María Mercedes
Cardeñoso Payo, Valentín
2023
Producción Científica
Android is the operating system with the largest presence on mobile devices. The permissions mechanism is used to grant or restrict the access of applications to the device’s data and resources. Applications request permission to access them and users decide whether to grant or deny them. Our proposal is to obtain a permissions-based metric, easy to use for device owners, to provide them with guidance on the risk to their privacy that they assume when they install an app on their device and how to minimize this risk. A distinctive feature compared to other proposals is that we use permission groups as one of the parameters. These permission groups express concepts that are more accessible to any type of user than individual permissions and are what users can actually act on. This has the advantage of being easier for users to understand. To facilitate its use, we have developed a service that allows you to consult it, but also to perform simulations to check how granting or denying each group of permissions requested by an application affects it, before making decisions and taking risks on the device itself. We thus introduce the criterion of usability, which allows us to obtain a more human technology, available to empowered users.
application/pdf
https://uvadoc.uva.es/handle/10324/59944
eng
A human oriented Privacy Impact Metric for mobile apps
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/602552023-08-17T09:04:42Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Pisabarro Marrón, Alma María
Vivaracho Pascual, Carlos Enrique
Arias Herguedas, Silvia
Ortega Arranz, Alejandro
Jiménez Gil, Luis Ignacio
2023
The use of games in education has shown potential benefits for university students, including greater engagement and motivation toward their courses and content. However, the creation, adaptation, and monitoring of these games by teachers has been identified as a limitation for their use and adoption. This work presents a serious games platform (GamiSpace) that allows teachers to add, configure, remove, and hide games, and that can be used by different courses simultaneously. The platform incorporates an API that collects analytics on students' interaction with the games (game analytics). In this way, teachers can monitor students' participation and progress less intrusively and use that information to design their classes. Finally, the platform also allows individual or team competitions to be configured, which can foster collaboration/competition among students. The platform, together with a set of games installed on it, has been evaluated in the Fundamentals of Programming course (Fundamentos de Programación) of the Degree in Computer Engineering at a Spanish university, with promising results for its use in other courses.
application/pdf
https://uvadoc.uva.es/handle/10324/60255
spa
GamiSpace: una plataforma de juegos abierta y configurable con soporte para analíticas
info:eu-repo/semantics/conferenceObject
8 pages
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/635092023-12-11T20:01:15Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Ortega-Arranz, Alejandro
Topali, Paraskevi
Asensio-Pérez, Juan Ignacio
Villagrá Sobrino, Sara L.
Martínez-Monés, Alejandra
Dimitriadis, Yannis
2022
Producción Científica
The provision of personalized and timely feedback can become challenging when shifting from face-to-face to online learning. Feedback is not only about providing support to students, but also about identifying when and which students need what kind of support. Usually, educators carry out such activities manually. However, the manual identification, personalization and provision of feedback might become unmanageable, especially in large-scale environments. Previous works proposed the use of data-driven tools to automate the feedback provision with the active involvement of human agents in its design. Nevertheless, to the best of our knowledge, these tools do not guide instructors in the process of feedback design and sense-making of the data-driven information. This paper presents e-FeeD4Mi, a web-based tool developed to support instructors in the design and automatic enactment of feedback in multiple virtual learning environments. We developed e-FeeD4Mi following a Design-Based Research approach, and its potential for adoption has been assessed in two evaluation studies.
application/pdf
https://uvadoc.uva.es/handle/10324/63509
eng
Springer
1203.10 Enseñanza Con Ayuda de Ordenador
e-FeeD4Mi: Automating tailored LA-informed feedback in Virtual Learning Environments
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana
oai:uvadoc.uva.es:10324/659452024-02-07T20:01:53Zcom_10324_1165com_10324_931com_10324_894col_10324_1337
Leier, A.
Marquez-Lago, TT.
Barrio, M.
2013-07-06
Reduction of chemical reaction networks with distributed delays
application/pdf
https://uvadoc.uva.es/handle/10324/65945
eng
WILEY-BLACKWELL
Reduction of chemical reaction networks with distributed delays
info:eu-repo/semantics/conferenceObject
TEXT
UVaDOC. Repositorio Documental de la Universidad de Valladolid
Hispana