Predictable Parallel Computing

Framework for the Analysis and Configuration of Real-Time OpenMP Applications

Link: https://ieeexplore.ieee.org/abstract/document/10218276

Authors: Tiago Carvalho, Luis Miguel Pinho, Mohammad Samadi, Sara Royuela, Adrian Munera, Eduardo Quiñones

Date: 2023/07/18

In: IEEE 21st International Conference on Industrial Informatics (INDIN)

Abstract

High-performance cyber-physical applications impose several requirements with respect to performance, functional correctness and non-functional aspects. Nowadays, the design of these systems usually follows a model-driven approach, where models generate executable applications, usually with an automated approach. As these applications might execute in different parallel environments, their behavior becomes very hard to predict, making the verification of non-functional requirements complicated. In this regard, it is crucial to analyse and understand the impact that the mapping and scheduling of computation have on the real-time response of the applications. In fact, different strategies in these steps of the parallel orchestration may produce significantly different interference, leading to different timing behaviour. Tuning the application parameters and the system configuration proves to be one of the most fitting solutions. However, the design space can be too large for a developer to manually test all combinations of application and system configurations. This paper presents a methodology and a toolset to profile, analyse, and configure the timing behaviour of high-performance cyber-physical applications and the target platforms. The methodology leverages the possibility of generating a task dependency graph representing the parallel computation to evaluate, through measurements, different mapping configurations and select the one that minimizes response time.

Bibtex

@inproceedings{carvalho2023framework, title={Framework for the Analysis and Configuration of Real-Time OpenMP Applications}, author={Carvalho, Tiago and Pinho, Luis Miguel and Samadi, Mohammad and Royuela, Sara and Munera, Adrian and Qui{\~n}ones, Eduardo}, booktitle={IEEE 21st International Conference on Industrial Informatics (INDIN)}, pages={1--8}, year={2023}, organization={IEEE} }

Software-Based Fault-Detection Technique for Object Tracking in Autonomous Vehicles

Link: https://ieeexplore.ieee.org/abstract/document/10155027

Authors: Alessio Medaglini, Sandro Bartolini, Gianluca Mandó, Eduardo Quiñones, Sara Royuela

Date: 2023/06/06

In: Mediterranean Conference on Embedded Computing (MECO)

Abstract

Autonomous vehicles are nowadays gaining popularity in many different sectors, from automotive to aviation, and find application in increasingly complex and strategic contexts. In this domain, Obstacle Detection and Avoidance Systems (ODAS) are crucial and, since they are safety-critical systems, they must employ fault-detection and management techniques to maintain correct behavior. One of the most popular techniques to obtain a reliable system is the use of redundancy, both at the hardware and at the software levels. With the objective of improving fault-detection while producing little impact on the programmability of the system, this paper introduces a general and lightweight monitoring technique based on a user-directed observer design pattern, which aims at monitoring the validity of predicates over state variables of the algorithms in execution. This can increase the fault-detection capability and even anticipate the detection time of some faults that would be caught by replication only at later times. Results are evaluated on a real-world use-case from the railway domain, and show how the proposed fault-detection mechanism can increase the overall reliability of the system by up to 24.4% compared to replication alone in case of crowded scenarios over the entire tracking process, and up to 43.9% in specific phases.
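
The monitoring idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names (PredicateMonitor, Tracker), the state variables and the two predicates are hypothetical, chosen only to show how an observer might check user-supplied predicates over the state of a tracking algorithm at every update.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Predicate = Callable[[State], bool]


class PredicateMonitor:
    """Observer that checks user-supplied predicates on every state update."""

    def __init__(self) -> None:
        self._predicates: List[Tuple[str, Predicate]] = []
        self.violations: List[Tuple[int, str]] = []

    def watch(self, name: str, predicate: Predicate) -> None:
        # Register a named predicate that must hold for every observed state.
        self._predicates.append((name, predicate))

    def notify(self, step: int, state: State) -> None:
        # Record every predicate that does not hold for the observed state.
        for name, predicate in self._predicates:
            if not predicate(state):
                self.violations.append((step, name))


class Tracker:
    """Toy tracked object (hypothetical): a 1-D position updated once per step."""

    def __init__(self, monitor: PredicateMonitor) -> None:
        self.monitor = monitor
        self.state: State = {"x": 0.0, "speed": 0.0}
        self.step = 0

    def update(self, new_x: float) -> None:
        self.step += 1
        self.state = {"x": new_x, "speed": abs(new_x - self.state["x"])}
        # The algorithm exposes its state variables to the observer.
        self.monitor.notify(self.step, self.state)


monitor = PredicateMonitor()
# Hypothetical validity predicates over the state variables.
monitor.watch("speed_is_plausible", lambda s: s["speed"] <= 5.0)
monitor.watch("x_in_field_of_view", lambda s: 0.0 <= s["x"] <= 100.0)

tracker = Tracker(monitor)
for x in [1.0, 2.5, 4.0, 60.0, 61.0]:
    tracker.update(x)

print(monitor.violations)
```

In this toy run the jump from 4.0 to 60.0 violates the speed predicate at the fourth update, so the fault is flagged at the step where the state becomes implausible rather than when a replica later disagrees.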

Bibtex

@inproceedings{medaglini2023software, title={Software-Based Fault-Detection Technique for Object Tracking in Autonomous Vehicles}, author={Medaglini, Alessio and Bartolini, Sandro and Mand{\'o}, Gianluca and Quinones, Eduardo and Royuela, Sara}, booktitle={12th Mediterranean Conference on Embedded Computing (MECO)}, pages={1--7}, year={2023}, organization={IEEE} }

Taskgraph: A Low Contention OpenMP Tasking Framework

Link: https://ieeexplore.ieee.org/abstract/document/10146446

Authors: Chenle Yu, Sara Royuela, Eduardo Quiñones

Date: 2023/06/08

In: IEEE Transactions on Parallel and Distributed Systems

Abstract

OpenMP is the de-facto standard for shared memory systems in High-Performance Computing (HPC). It includes a tasking model that offers a high level of abstraction to effectively exploit structured (loop-based) and highly dynamic unstructured (task-based) parallelism in an easy and flexible way. Unfortunately, the run-time overheads introduced to manage tasks are (very) high in most common OpenMP frameworks (e.g., GCC, LLVM), which defeats the potential benefits of the tasking model, and makes it suitable for coarse-grained tasks only. This paper presents taskgraph, a framework that uses a task dependency graph (TDG) to represent a region of code implemented with OpenMP tasks in order to reduce the run-time overheads associated with the management of tasks, i.e., contention and parallel orchestration, including task creation and synchronization. The TDG avoids the overheads related to the resolution of task dependencies and greatly reduces those deriving from accesses to shared resources. Moreover, the taskgraph framework introduces in OpenMP the record-and-replay execution model that accelerates the taskgraph region from its second execution. Overall, the multiple optimizations presented in this paper allow exploiting fine-grained OpenMP tasks to cope with the trend in current applications pointing to leverage massive on-node parallelism, fine-grained and dynamic scheduling paradigms. The framework is implemented on LLVM 15.0. Results show that the taskgraph implementation outperforms the vanilla OpenMP system in terms of performance and scalability, for both structured and unstructured parallelism, and considering coarse and fine grained tasks. Furthermore, the proposed framework makes the tasking model a competitive alternative to the OpenMP thread model in most cases.
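
The record-and-replay model mentioned in the abstract can be sketched roughly as follows. This is a simplified illustration and not the LLVM implementation: the TaskGraph class, its submit callback and the resolutions counter are hypothetical, and tasks are replayed sequentially in dependency order, whereas a real runtime would dispatch ready tasks to threads.

```python
from collections import defaultdict, deque


class TaskGraph:
    """Records a task region once, then replays the stored dependency graph."""

    def __init__(self):
        self.tasks = []        # recorded task bodies
        self.succ = []         # succ[i]: tasks that must wait for task i
        self.npred = []        # npred[i]: number of predecessors of task i
        self.recorded = False
        self.resolutions = 0   # how many times dependencies were resolved

    def _record(self, region):
        last_writer = {}
        readers = defaultdict(list)

        def submit(fn, ins=(), outs=()):
            tid = len(self.tasks)
            self.resolutions += 1
            preds = set()
            for d in ins:                      # read-after-write
                if d in last_writer:
                    preds.add(last_writer[d])
            for d in outs:                     # write-after-write / write-after-read
                if d in last_writer:
                    preds.add(last_writer[d])
                preds.update(readers[d])
            for d in ins:
                readers[d].append(tid)
            for d in outs:
                last_writer[d] = tid
                readers[d] = []
            self.tasks.append(fn)
            self.succ.append([])
            self.npred.append(len(preds))
            for p in preds:
                self.succ[p].append(tid)

        region(submit)
        self.recorded = True

    def run(self, region):
        if not self.recorded:
            self._record(region)               # first execution: build the TDG
        pending = list(self.npred)             # replay: no dependency resolution
        ready = deque(i for i, n in enumerate(pending) if n == 0)
        while ready:
            t = ready.popleft()
            self.tasks[t]()
            for s in self.succ[t]:
                pending[s] -= 1
                if pending[s] == 0:
                    ready.append(s)


data = {"a": 1, "b": 0, "c": 0}


def region(submit):
    submit(lambda: data.update(b=data["a"] + 1), ins=["a"], outs=["b"])
    submit(lambda: data.update(c=data["a"] * 10), ins=["a"], outs=["c"])
    submit(lambda: data.update(a=data["b"] + data["c"]), ins=["b", "c"], outs=["a"])


graph = TaskGraph()
graph.run(region)   # records the graph, then executes it
graph.run(region)   # replays the recorded graph only
print(data, graph.resolutions)
```

The dependency bookkeeping runs only during the first call to run; the second call walks the stored graph, which is the source of the speed-up the abstract attributes to executions after the first.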

Bibtex

@article{yu2023taskgraph, title={Taskgraph: A Low Contention OpenMP Tasking Framework}, author={Yu, Chenle and Royuela, Sara and Qui{\~n}ones, Eduardo}, journal={IEEE Transactions on Parallel and Distributed Systems}, year={2023} }

A Non Linear Control Method with Reinforcement Learning for Adaptive Optics with Pyramid Sensors

Link: https://upcommons.upc.edu/bitstream/handle/2117/328240/extrae_paraver_acm.pdf

Authors: Bartomeu Pou, Jeffrey Smith, Eduardo Quiñones, Mario Martin, Damien Gratadour

Date: 2022/05/01

In: Artificial Intelligence for Science and Operations in Astronomy

Abstract

Extreme Adaptive Optics (AO) systems are designed to provide high resolution and high contrast observing capabilities on the largest ground-based telescopes through exquisite phase reconstruction accuracy. In that context, the pyramid wavefront sensor (P-WFS) has shown promise to deliver the means to provide such accuracy due to its high sensitivity. However, traditional methods cannot leverage the highly non-linear P-WFS measurements to their full potential. We present a predictive control method based on Reinforcement Learning (RL) for AO control with a P-WFS. The proposed approach is data-driven, has no assumptions about the system's evolution, and is non-linear due to the usage of neural networks. First, we discuss the challenges of using an RL control method with a P-WFS and propose solutions. Then, we show that our method outperforms an optimized integrator controller. Finally, we discuss its possible path for an actual implementation.

Bibtex

@inproceedings{pou2022non, title={A Non Linear Control Method with Reinforcement Learning for Adaptive Optics with Pyramid Sensors}, author={Pou, Bartomeu and Smith, Jeffrey and Quinones, Eduardo and Martin, Mario and Gratadour, Damien}, booktitle={SciOps 2022: Artificial Intelligence for Science and Operations in Astronomy (SCIOPS). Proceedings of the ESA/ESO SCOPS Workshop held 16-20 May}, pages={26}, year={2022} }

Heuristic-based Task-to-Thread Mapping in Multi-Core Processors

Link: https://upcommons.upc.edu/bitstream/handle/2117/377548/MSG_EFTA.pdf

Authors: Mohammad S. Gharajeh, Sara Royuela, Luis M. Pinho, Tiago Carvalho, Eduardo Quiñones

Date: 2022/09/06

In: International Conference on Emerging Technologies and Factory Automation

Abstract

OpenMP can be used in real-time applications to enhance system performance. However, predictability of OpenMP applications is still a challenge. This paper investigates heuristics for the mapping of OpenMP task graphs to underlying threads, for the development of time-predictable OpenMP programs. These approaches are based on a global scheduling queue, as well as per-thread allocation queues. The proposed method is divided into scheduling and allocation phases. In the former phase, OpenMP task-parts are discovered from the OpenMP graph and placed in the scheduling queue. Afterwards, an appropriate allocation queue is selected for each task-part using four heuristic algorithms. In the latter phase, the best task-part is selected from the allocation queue to be allocated to and executed by an idle thread. Preliminary simulation results show that the new method outperforms BFS and WFS in terms of scheduling time and idle time.
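
The two-phase structure described in the abstract (a global scheduling queue feeding per-thread allocation queues) could be sketched as below. The abstract does not name the four heuristics, so the two rules used here are stand-in assumptions: "least-loaded queue" for choosing an allocation queue and "shortest estimated cost first" for choosing the best task-part. Task-parts are assumed to be independent and already discovered, and all names and costs are hypothetical.

```python
from collections import deque


def map_task_parts(task_parts, num_threads):
    """Scheduling phase: move task-parts from the global scheduling queue
    into per-thread allocation queues.

    task_parts: list of (name, estimated_cost) pairs.
    Assumed heuristic: pick the allocation queue with the least pending work.
    """
    scheduling_queue = deque(task_parts)
    allocation_queues = [[] for _ in range(num_threads)]
    load = [0] * num_threads
    while scheduling_queue:
        name, cost = scheduling_queue.popleft()
        target = min(range(num_threads), key=lambda t: load[t])
        allocation_queues[target].append((name, cost))
        load[target] += cost
    return allocation_queues


def execution_order(allocation_queue):
    """Allocation phase: an idle thread repeatedly takes the 'best' task-part
    from its own queue. Assumed heuristic: shortest estimated cost first."""
    pending = list(allocation_queue)
    order = []
    while pending:
        best = min(pending, key=lambda part: part[1])
        pending.remove(best)
        order.append(best[0])
    return order


parts = [("A", 4), ("B", 2), ("C", 3), ("D", 1), ("E", 2)]
queues = map_task_parts(parts, num_threads=2)
orders = [execution_order(q) for q in queues]
print(queues)
print(orders)
```

Swapping either rule for a different heuristic only changes the key functions passed to min, which is one way to compare several heuristics on the same set of task-parts.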

Bibtex

@inproceedings{gharajeh2022heuristic, title={Heuristic-based Task-to-Thread Mapping in Multi-Core Processors}, author={Gharajeh, Mohammad Samadi and Royuela, Sara and Pinho, Luis Miguel and Carvalho, Tiago and Qui{\~n}ones, Eduardo}, booktitle={2022 IEEE 27th International Conference on Emerging Technologies and Factory Automation (ETFA)}, pages={1--4}, year={2022}, organization={IEEE} }

Real-time Issues in the Ada Parallel Model with OpenMP

Link: https://upcommons.upc.edu/handle/2117/349202

Authors: Luis Miguel Pinho, Sara Royuela, Eduardo Quiñones

Date: 2021/04/27

In: ACM SIGAda Ada Letters

Abstract

The current proposal for the next revision of the Ada language considers the possibility of mapping the language's parallel features to an underlying OpenMP runtime. As presented and discussed in previous workshops, the work on fine-grain parallelism in Ada maps well to the OpenMP tasking model for parallelism. Nevertheless, although the general model of integration and the semantic constructs are already reflected in the proposed revision of the standard, the integration of these new features with the Real-Time Systems Annex of Ada is still not complete. This paper presents an overview of what is supported and the issues that remain open.

Bibtex

@article{miguel2021real, title={Real-time Issues in the Ada Parallel Model with OpenMP}, author={Miguel Pinho, Luis and Royuela, Sara and Qui{\~n}ones, Eduardo}, journal={ACM SIGAda Ada Letters}, volume={40}, number={2}, pages={96--102}, year={2021}, publisher={ACM New York, NY, USA} }

The OpenMP API for High Integrity Systems: Moving Responsibility from Users to Vendors

Link: https://upcommons.upc.edu/handle/2117/346447

Authors: Michael Klemm, Eduardo Quiñones, Tucker Taft, Dirk Ziegenbein, Sara Royuela

Date: 2021/04/27

In: HILT 2020 Workshop on Safe Languages and Technologies for Structured and Efficient Parallel and Distributed/Cloud Computing

Abstract

OpenMP is traditionally focused on boosting performance in HPC systems. However, other domains are showing an increasing interest in the use of OpenMP by virtue of key aspects introduced in recent versions of the specification: the tasking model, the accelerator model, and other features like the requires and the assumes directives, which allow defining certain contracts. One example is the safety-critical embedded domain, where several efforts have been initiated towards the adoption of OpenMP. However, the OpenMP specification states that "application developers are responsible for correctly using the OpenMP API to produce a conforming program", which is not acceptable in high integrity systems, where aspects such as reliability and resiliency have to be ensured at different levels of criticality. In this scope, programming languages like Ada propose a different paradigm by exposing fewer features to the user, and leaving the responsibility of safely exploiting the full underlying architecture to the compiler and the runtime systems instead. The philosophy behind this kind of model is to move the responsibility of producing correct parallel programs from users to vendors. In this panel, actors from different domains involved in the use of parallel programming models for the development of high-integrity systems share their thoughts about this topic.

Bibtex

@article{klemm2021openmp, title={{The OpenMP API for High Integrity Systems: Moving Responsibility from Users to Vendors}}, author={Klemm, Michael and Qui{\~n}ones, Eduardo and Taft, Tucker and Ziegenbein, Dirk and Royuela, Sara}, journal={ACM SIGAda Ada Letters}, volume={40}, number={2}, pages={48--50}, year={2021}, publisher={ACM New York, NY, USA} }

Enhancing OpenMP Tasking Model: Performance and Portability

Link: https://upcommons.upc.edu/handle/2117/351422

Authors: Chenle Yu, Sara Royuela, Eduardo Quiñones

Date: 2021/9/14

In: International Workshop on OpenMP

Abstract

OpenMP, as the de-facto standard programming model in symmetric multiprocessing for HPC, has seen its performance boosted continuously by the community, either through implementation enhancements or specification augmentations. Furthermore, the language has evolved from a prescriptive nature, as defined by the thread-centric model, to a descriptive behavior, as defined by the task-centric model. However, the overhead related to the orchestration of tasks is still relatively high. Applications exploiting very fine-grained parallelism and systems with a large number of cores available might fail to scale. In this work, we propose to include the concept of Task Dependency Graph (TDG) in the specification by introducing a new clause, named taskgraph, attached to task or target directives. By design, the TDG allows alleviating the overhead associated with the OpenMP tasking model, and it also facilitates linking OpenMP with other programming models that support task parallelism. According to our experiments, a GCC implementation of the taskgraph is able to significantly reduce the execution time of fine-grained task applications and increase their scalability with regard to the number of threads.

Bibtex

@inproceedings{yu2021enhancing, title={{Enhancing OpenMP Tasking Model: Performance and Portability}}, author={Yu, Chenle and Royuela, Sara and Qui{\~n}ones, Eduardo}, booktitle={International Workshop on OpenMP}, pages={35--49}, year={2021}, organization={Springer} }

Denoising wavefront sensor images with deep neural networks

Link: https://www.spiedigitallibrary.org/conference-proceedings-of-spie/11448/114484J/Denoising-wavefront-sensor-images-with-deep-neural-networks/10.1117/12.2576242.short?SSO=1

Authors: Bartomeu Pou Mulet, Eduardo Quiñones, Damien Gratadour, Mario Martín

Date: 2020/12/13

In: SPIE

Abstract

A classical closed-loop adaptive optics system with a Shack-Hartmann wavefront sensor (WFS) relies on a center of gravity approach to process the WFS information and an integrator with gain to produce the commands to a Deformable Mirror (DM) to compensate wavefront perturbations. In this kind of system, noise in the WFS images can propagate to errors in centroid computation and thus lead the AO system to perform poorly in closed-loop operations. In this work, we present a deep supervised learning method to denoise the WFS images based on convolutional denoising autoencoders. Our method is able to denoise the images up to a high noise level and improve the integrator performance almost to the level of a noise-free situation.

Bibtex

@inproceedings{pou2020denoising, title={Denoising wavefront sensor images with deep neural networks}, author={Pou, B and Qui{\~n}ones, E and Gratadour, Damien and Martin, M}, booktitle={Adaptive Optics Systems VII}, volume={11448}, pages={114484J}, year={2020}, organization={International Society for Optics and Photonics} }

A toolchain to verify the parallelization of OmpSs-2 applications

Link: https://upcommons.upc.edu/handle/2117/330464

Authors: Simone Economo, Sara Royuela, Eduard Ayguadé, Vicenç Beltran

Date: 2020/08/24

In: European Conference on Parallel Processing (Euro-Par)

Abstract

Programming models for task-based parallelization based on compile-time directives are very effective at uncovering the parallelism available in HPC applications. Despite that, the process of correctly annotating complex applications is error-prone and may hinder the general adoption of these models. In this paper, we target the OmpSs-2 programming model and present a novel toolchain able to detect parallelization errors coming from non-compliant OmpSs-2 applications. Our toolchain verifies the compliance with the OmpSs-2 programming model using local task analysis to deal with each task separately, and structural induction to extend the analysis to the whole program. To improve the effectiveness of our tools, we also introduce some ad-hoc verification annotations, which can be used manually or automatically to disable the analysis of specific code regions. Experiments run on a sample of representative kernels and applications show that our toolchain can be successfully used to verify the parallelization of complex real-world applications.

Bibtex

@inproceedings{economo2020toolchain, title={A toolchain to verify the parallelization of OmpSs-2 applications}, author={Economo, Simone and Royuela, Sara and Ayguad{\'e}, Eduard and Beltran, Vicen{\c{c}}}, booktitle={European Conference on Parallel Processing}, pages={18--33}, year={2020}, organization={Springer} }

Towards a Qualifiable OpenMP Framework for Embedded Systems

Link: https://upcommons.upc.edu/handle/2117/191570

Authors: Adrian Munera, Sara Royuela, Eduardo Quiñones

Date: 2020/03/09

In: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2020

Abstract

OpenMP is a very convenient programming model for critical real-time parallel applications due to its powerful tasking model and its proven time predictability. However, current implementations are not suitable for critical environments because of the intensive use of dynamically allocated memory needed to efficiently manage the parallel execution. This jeopardizes the qualification processes needed to ensure that the integrated software stack is compliant with system requirements. This paper proposes a novel OpenMP framework that statically allocates the data structures needed to efficiently manage the parallel execution of OpenMP tasks. Our framework is composed of a compiler that captures the environment of the OpenMP tasks instantiated along the parallel execution and bounds the exposed parallelism, and a runtime implementing a lazy task creation policy that significantly reduces the runtime memory requirements, whilst exploiting parallelism efficiently. The evaluation shows that our tool achieves the same performance as current OpenMP implementations, while bounding and drastically reducing the dynamic memory requirements at run-time.

Bibtex

@inproceedings{munera2020towards, title={Towards a qualifiable OpenMP framework for embedded systems}, author={Munera, Adrian and Royuela, Sara and Qui{\~n}ones, Eduardo}, booktitle={2020 Design, Automation \& Test in Europe Conference \& Exhibition (DATE)}, pages={903--908}, year={2020}, organization={IEEE} }

Experiences on the characterization of parallel applications in embedded systems with Extrae/Paraver

Link: https://upcommons.upc.edu/bitstream/handle/2117/328240/extrae_paraver_acm.pdf

Authors: Adrian Munera, Sara Royuela, German Llort, Estanislau Mercadal, Franck Wartel, Eduardo Quiñones

Date: 2020/08/17

In: International Conference on Parallel Processing

Abstract

Cutting-edge functionalities in embedded systems require the use of parallel architectures to meet their performance requirements. This imposes the introduction of a new layer in the software stacks of embedded systems: the parallel programming model. Unfortunately, the tools used to analyze embedded systems fall short of characterizing the performance of parallel applications at a parallel programming model level, and of correlating this with information about non-functional requirements such as real-time, energy, memory usage, etc. HPC tools, like Extrae, are designed with that level of abstraction in mind, but their main focus is on performance evaluation. Overall, providing insightful information about the performance of parallel embedded applications at the parallel programming model level, and relating it to the non-functional requirements, is of paramount importance to fully exploit the performance capabilities of parallel embedded architectures. This paper contributes to the state-of-the-art of analysis tools for embedded systems by: (1) analyzing the particular constraints of embedded systems compared to HPC systems (e.g., static setting, restricted memory, limited drivers) to support HPC analysis tools; (2) porting Extrae, a powerful tracing tool from the HPC domain, to the GR740 platform, a SoC used in the space domain; and (3) augmenting Extrae with new features needed to correlate the parallel execution with the following non-functional requirements: energy, temperature and memory usage. Finally, the paper presents the usefulness of Extrae to characterize OpenMP applications and their non-functional requirements, evaluating different aspects of the applications running in the GR740.

Bibtex

@inproceedings{munera2020experiences, title={Experiences on the characterization of parallel applications in embedded systems with extrae/paraver}, author={Munera, Adrian and Royuela, Sara and Llort, Germ{\'a}n and Mercadal, Estanislao and Wartel, Franck and Qui{\~n}ones, Eduardo}, booktitle={Proceedings of the 49th International Conference on Parallel Processing}, pages={1--11}, year={2020} }

OpenMP to CUDA graphs: a compiler-based transformation to enhance the programmability of NVIDIA devices

Link: https://upcommons.upc.edu/bitstream/handle/2117/190303/OpenMP_to_CUDA.pdf

Authors: Chenle Yu, Sara Royuela, Eduardo Quiñones

Date: 2020/05/25

In: International Workshop on Software and Compilers for Embedded Systems

Abstract

Heterogeneous computing is increasingly being used in a diversity of computing systems, ranging from HPC to the real-time embedded domain, to cope with the performance requirements. Due to the variety of accelerators, e.g., FPGAs, GPUs, the use of high-level parallel programming models is desirable to exploit their performance capabilities, while maintaining an adequate productivity level. In that regard, OpenMP is a well-known high-level programming model that incorporates powerful task and accelerator models capable of efficiently exploiting structured and unstructured parallelism in heterogeneous computing. This paper presents a novel compiler transformation technique that automatically transforms OpenMP code into CUDA graphs, combining the benefits of programmability of a high-level programming model such as OpenMP, with the performance benefits of a low-level programming model such as CUDA. Evaluations have been performed on two NVIDIA GPUs from the HPC and embedded domains, i.e., the V100 and the Jetson AGX respectively.

Bibtex

@inproceedings{yu2020openmp, title={OpenMP to CUDA graphs: a compiler-based transformation to enhance the programmability of NVIDIA devices}, author={Yu, Chenle and Royuela, Sara and Qui{\~n}ones, Eduardo}, booktitle={Proceedings of the 23th International Workshop on Software and Compilers for Embedded Systems}, pages={42--47}, year={2020} }

The AMPERE Project: A Model-driven development framework for highly Parallel and EneRgy-Efficient computation supporting multi-criteria optimization

Link: https://upcommons.upc.edu/handle/2117/191574

Authors: Eduardo Quiñones, Sara Royuela, Claudio Scordino, Paolo Gai, Luís Miguel Pinho, Luís Nogueira, Jan Rollo, Tommaso Cucinotta, Alessandro Biondi, Arne Hamann, Dirk Ziegenbein, Hadi Saoud, Romain Soulat, Björn Forsberg, Luca Benini, Gianluca Mandò, Luigi Rucher

Date: 2020/05/19

In: International Symposium on Real-Time Distributed Computing (ISORC)

Abstract

The high-performance requirements needed to implement the most advanced functionalities of current and future Cyber-Physical Systems (CPSs) are challenging the development processes of CPSs. On one side, CPSs rely on model-driven engineering (MDE) to satisfy the non-functional constraints and to ensure a smooth and safe integration of new features. On the other side, the use of complex parallel and heterogeneous embedded processor architectures becomes mandatory to cope with the performance requirements. In this regard, parallel programming models, such as OpenMP or CUDA, are a fundamental brick to fully exploit the performance capabilities of these architectures. However, parallel programming models are not compatible with current MDE approaches, creating a gap between the MDE used to develop CPSs and the parallel programming models supported by novel and future embedded platforms. The AMPERE project will bridge this gap by implementing a novel software architecture for the development of advanced CPSs. To do so, the proposed software architecture will be capable of capturing the definition of the components and communications described in the MDE framework, together with the non-functional properties, and transform it into key parallel constructs present in current parallel models, which may require extensions. These features will allow for making an efficient use of underlying parallel and heterogeneous architectures, while ensuring compliance with non-functional requirements, including those on real-time performance of the system.

Bibtex

@inproceedings{quinones2020ampere, title={The AMPERE Project: A Model-driven development framework for highly Parallel and EneRgy-Efficient computation supporting multi-criteria optimization}, author={Qui{\~n}ones, Eduardo and Royuela, Sara and Scordino, Claudio and Gai, Paolo and Pinho, Lu{\'\i}s Miguel and Nogueira, Lu{\'\i}s and Rollo, Jan and Cucinotta, Tommaso and Biondi, Alessandro and Hamann, Arne and others}, booktitle={2020 IEEE 23rd International Symposium on Real-Time Distributed Computing (ISORC)}, pages={201--206}, year={2020}, organization={IEEE} }

Enabling Ada and OpenMP runtimes interoperability through template-based execution

Link: https://upcommons.upc.edu/handle/2117/189546

Authors: Sara Royuela, Luis Miguel Pinho, Eduardo Quiñones

Date: 2020/05/01

In: Journal of Systems Architecture

Abstract

The growing trend to support parallel computation to enable the performance gains of the recent hardware architectures is increasingly present in more conservative domains, such as safety-critical systems. Applications such as autonomous driving require levels of performance only achievable by fully leveraging the potential parallelism in these architectures. To address this requirement, the Ada language, designed for safety and robustness, is considering supporting parallel features in the next revision of the standard (Ada 202X). Recent works have motivated the use of OpenMP, a de facto standard in high-performance computing, to enable parallelism in Ada, showing the compatibility of the two models, and proposing static analysis to enhance reliability. This paper summarizes these previous efforts towards the integration of OpenMP into Ada to exploit its benefits in terms of portability, programmability and performance, while providing the safety benefits of Ada in terms of correctness. The paper extends those works by proposing and evaluating an application transformation that enables the OpenMP and the Ada runtimes to operate (under certain restrictions) as if they were integrated. The objective is to allow Ada programmers to (naturally) experiment with and evaluate the benefits of parallelizing concurrent Ada tasks with OpenMP while ensuring compliance with both specifications.

Bibtex

@article{royuela2020enabling, title={Enabling Ada and OpenMP runtimes interoperability through template-based execution}, author={Royuela, Sara and Pinho, Lu{\'\i}s Miguel and Qui{\~n}ones, Eduardo}, journal={Journal of Systems Architecture}, volume={105}, pages={101702}, year={2020}, publisher={Elsevier} }

An ILP-based real-time scheduler for distributed and heterogeneous computing environments

Link: https://upcommons.upc.edu/bitstream/handle/2117/167774/43_An_ILP-based_real-time.pdf

Authors: Eudald Sabaté, María A Serrano, Eduardo Quiñones

Date: 2019/5/7

In: BSC Severo Ochoa International Doctoral Symposium (6th: 2019: Barcelona). Book of abstracts

Abstract

The digitalization process is causing cities to rapidly increase the amount of data to be processed, from which data analytics can extract valuable knowledge. However, this phenomenon faces many important challenges. On one side, the advent of connected and autonomous vehicles challenges data analytics methods due to the need to meet real-time requirements. On the other side, the dispersed nature of data sources makes current big data analytics methods, commonly designed to execute in centralized and computationally intensive (cloud-based) environments, not suitable for smart cities. The use of distributed computing environments composed of advanced parallel embedded processor architectures at the edge, e.g., NVIDIA Jetson or Kalray MPPA, can help alleviate the pressure on centralized cloud-based solutions, while providing the real-time guarantees needed to implement advanced mobility functionalities on cars and cities. To do so, this work presents a novel scheduler (based on an ILP formulation) to optimally distribute the computation across the compute continuum composed of multiple edge devices, while providing real-time guarantees. Our scheduler, implemented in the COMPSs distributed programming model developed at BSC, statically assigns tasks to those edge devices so that the overall response time of the workflow is minimized. It takes into account an execution time upper bound of the computation and communication existing in the workflow.
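
The static assignment problem described in the abstract can be illustrated on a toy workflow. Note that this sketch replaces the ILP formulation with an exhaustive search over all task-to-device assignments, which is only practical for very small workflows. The task names, execution time upper bounds and the uniform communication cost are hypothetical values, and each device is assumed to run one task at a time.

```python
from itertools import product

devices = ["jetson", "mppa"]

# Hypothetical upper bounds on execution time of each task on each device.
wcet = {
    "detect": {"jetson": 4, "mppa": 6},
    "track": {"jetson": 5, "mppa": 3},
    "fuse": {"jetson": 2, "mppa": 2},
}
# Workflow dependencies: task -> list of predecessor tasks.
deps = {"detect": [], "track": [], "fuse": ["detect", "track"]}
order = ["detect", "track", "fuse"]   # a topological order of the workflow
comm = 1                              # assumed cost of moving data between devices


def response_time(assignment):
    """Worst-case response time of the workflow for a static assignment."""
    device_free = {d: 0 for d in devices}
    finish = {}
    for task in order:
        dev = assignment[task]
        ready = 0
        for pred in deps[task]:
            delay = comm if assignment[pred] != dev else 0
            ready = max(ready, finish[pred] + delay)
        start = max(ready, device_free[dev])
        finish[task] = start + wcet[task][dev]
        device_free[dev] = finish[task]
    return max(finish.values())


def best_assignment():
    """Exhaustive search (stand-in for the ILP solver) over all assignments."""
    best = None
    for combo in product(devices, repeat=len(order)):
        assignment = dict(zip(order, combo))
        rt = response_time(assignment)
        if best is None or rt < best[0]:
            best = (rt, assignment)
    return best


best = best_assignment()
print(best)
```

For this toy input, running the two independent tasks on different devices and paying one communication delay gives a shorter response time than keeping everything on a single device. An ILP solver would encode the same objective and constraints without enumerating every assignment.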

Bibtex

@inproceedings{sabate2019ilp, title={An ILP-based real-time scheduler for distributed and heterogeneous computing environments}, author={Sabat{\'e}, Eudald and Serrano, Maria A and Qui{\~n}ones, Eduardo}, booktitle={Book of abstracts}, pages={100--101}, year={2019}, organization={Barcelona Supercomputing Center} }

The cooperative parallel: A discussion about run-time schedulers for nested parallelism

Link: https://upcommons.upc.edu/handle/2117/168515

Authors: Sara Royuela, María A Serrano, Marta Garcia-Gasulla, Sergi Mateo Bellido, Jesús Labarta, Eduardo Quiñones

Date: 2019/9/11

In: International Workshop on OpenMP

Abstract

Nested parallelism is a well-known parallelization strategy to exploit irregular parallelism in HPC applications. This strategy also fits in critical real-time embedded systems, composed of a set of concurrent functionalities. In this case, nested parallelism can be used to further exploit the parallelism of each functionality. However, current run-time implementations of nested parallelism can produce inefficiencies and load imbalance. Moreover, in critical real-time embedded systems, it may lead to incorrect executions due to, for instance, a work non-conserving scheduler. In both cases, the reason is that the teams of OpenMP threads are a black-box for the scheduler, i.e., the scheduler that assigns OpenMP threads and tasks to the set of available computing resources is agnostic to the internal execution of each team. This paper proposes a new run-time scheduler that considers dynamic information of …

Bibtex

@inproceedings{royuela2019cooperative, title={The cooperative parallel: A discussion about run-time schedulers for nested parallelism}, author={Royuela, Sara and Serrano, Maria A and Garcia-Gasulla, Marta and Bellido, Sergi Mateo and Labarta, Jes{\'u}s and Qui{\~n}ones, Eduardo}, booktitle={International Workshop on OpenMP}, pages={171--185}, year={2019}, organization={Springer} }

Techniques for reducing and bounding OpenMP dynamic memory

Link: https://upcommons.upc.edu/handle/2117/167769

Authors: Adrian Munera, Sara Royuela, Eduardo Quiñones

Date: 2019/5/7

In: BSC Severo Ochoa International Doctoral Symposium (6th: 2019: Barcelona). Book of abstracts

Abstract

OpenMP offers a tasking model that is very convenient for developing critical real-time parallel applications by virtue of its time predictability. However, current implementations make intensive use of dynamic memory to efficiently manage the parallel execution. This jeopardizes the qualification process and limits the use of OpenMP in architectures with a limited amount of memory. This work introduces an OpenMP framework that statically allocates the data structures needed to efficiently manage parallel execution in OpenMP programs. We achieve the same performance as current implementations, while bounding and reducing the dynamic memory requirements at runtime.

Bibtex

@inproceedings{munera2019techniques, title={Techniques for reducing and bounding OpenMP dynamic memory}, author={Munera, Adrian and Royuela, Sara and Qui{\~n}ones, Eduardo}, booktitle={Book of abstracts}, pages={91--92}, year={2019}, organization={Barcelona Supercomputing Center} }

High-Performance and Time-Predictable Embedded Computing

Link: https://pdfs.semanticscholar.org/5431/7ad86b580d852e22761ef4ed95c3b163ffe2.pdf

Authors: Luís Miguel Pinho, Eduardo Quiñones, Marko Bertogna, Andrea Marongiu, Vincent Nélis, Paolo Gai, Juan Sancho

Date: 2018/7/4

In:

Abstract

Nowadays, the prevalence of computing systems in our lives is so ubiquitous that we live in a cyber-physical world dominated by computer systems, from pacemakers to cars and airplanes. These systems demand for more computational performance to process large amounts of data from multiple data sources with guaranteed processing times. Actuating outside of the required timing bounds may cause the failure of the system, being vital for systems like planes, cars, business monitoring, e-trading, etc. High-Performance and Time-Predictable Embedded Computing presents recent advances in software architecture and tools to support such complex systems, enabling the design of embedded computing devices which are able to deliver high-performance whilst guaranteeing the application required timing bounds. Technical topics discussed in the book include: parallel embedded platforms; programming models; mapping and scheduling of parallel computations; timing and schedulability analysis; runtimes and operating systems. The work reflected in this book was done in the scope of the European project P-SOCRATES, funded under the FP7 framework program of the European Commission. High-performance and time-predictable embedded computing is ideal for personnel in computer/communication/embedded industries as well as academic staff and master/research students in computer science, embedded systems, cyber-physical systems and internet-of-things.

Bibtex

@book{pinho2018high, title={High-Performance and Time-Predictable Embedded Computing}, author={Pinho, Lu{\'\i}s Miguel and Qui{\~n}ones, Eduardo and Marongiu, Andrea}, year={2018}, publisher={River Publishers} }

OpenMP Runtime

Link: http://books.google.com/books?hl=en&lr=&id=pRNwDwAAQBAJ&oi=fnd&pg=PA145&dq=info:hULDxfqtSvgJ:scholar.google.com&ots=sKWxW76pje&sig=tZfmVZLNUPuMZu3kAc2B4b4TCJo

Authors: Andrea Marongiu, Giuseppe Tagliavini, Eduardo Quiñones

Date: 2018/7/1

In: High-Performance and Time-Predictable Embedded Computing

Abstract

This chapter introduces the design of the OpenMP runtime and its key components, the offloading library and the tasking runtime library. Starting from the execution model introduced in the previous chapters, we first abstractly describe the main interactions among the main actors involved in program execution. Then we focus on the optimized design of the offloading library and the tasking runtime library, followed by their performance characterization.

Bibtex

@article{marongiu2018openmp, title={OpenMP Runtime}, author={Marongiu, Andrea and Tagliavini, Giuseppe and Qui{\~n}ones, Eduardo}, journal={High-Performance and Time-Predictable Embedded Computing}, pages={145}, year={2018}, publisher={River Publishers} }

Predictable Parallel Programming with OpenMP

Link: http://books.google.com/books?hl=en&lr=&id=pRNwDwAAQBAJ&oi=fnd&pg=PA33&dq=info:KTyufEpsSFcJ:scholar.google.com&ots=sKWxW87rjj&sig=E4tao0WHyLuBaWHK2lipPgL2jiY

Authors: María A Serrano, Sara Royuela, Andrea Marongiu, Eduardo Quiñones

Date: 2018/7/4

In: High-Performance and Time-Predictable Embedded Computing

Abstract

This chapter motivates the use of the OpenMP (Open Multi-Processing) parallel programming model to develop future critical real-time embedded systems, and analyzes the time-predictable properties of the OpenMP tasking model. Moreover, this chapter presents the set of compiler techniques needed to extract the timing information of an OpenMP program in the form of an OpenMP Directed Acyclic Graph, or OpenMP-DAG.

Bibtex

@article{serrano2018predictable, title={Predictable Parallel Programming with OpenMP}, author={Serrano, Maria A and Royuela, Sara and Marongiu, Andrea and Qui{\~n}ones, Eduardo}, journal={High-Performance and Time-Predictable Embedded Computing}, pages={33}, year={2018}, publisher={River Publishers} }

Mapping, Scheduling, and Schedulability Analysis

Link: http://books.google.com/books?hl=en&lr=&id=pRNwDwAAQBAJ&oi=fnd&pg=PA63&dq=info:VdDkSwlAB-gJ:scholar.google.com&ots=sKWxW87qrm&sig=xEGPdG4XPS_ddc8dCRbcf3zDgOk

Authors: Paolo Burgio, Marko Bertogna, Alessandra Melani, Eduardo Quiñones, María A Serrano

Date: 2018/7/4

In: High-Performance and Time-Predictable Embedded Computing

Abstract

This chapter presents how the P-SOCRATES framework addresses the issue of scheduling multiple real-time tasks (RT tasks), made of multiple and concurrent non-preemptable task parts. In its most generic form, the scheduling problem in the architectural framework is a dual problem: scheduling task-to-threads, and scheduling thread-to-core replication.

Bibtex

@article{burgio2018mapping, title={Mapping, Scheduling, and Schedulability Analysis}, author={Burgio, Paolo and Bertogna, Marko and Melani, Alessandra and Qui{\~n}ones, Eduardo and Serrano, Maria A}, journal={High-Performance and Time-Predictable Embedded Computing}, pages={63}, year={2018}, publisher={River Publishers} }

Big data analytics for smart cities: the H2020 CLASS project

Link: https://upcommons.upc.edu/handle/2117/167233

Authors: Eduardo Quiñones, Marko Bertogna, Erez Hadad, Ana J Ferrer, Luca Chiantore, Alfredo Reboa

Date: 2018/6/4

In: Proceedings of the 11th ACM International Systems and Storage Conference (SYSTOR)

Abstract

Applying big-data technologies to field applications has resulted in several new needs. First, processing data across a compute continuum spanning from cloud to edge to devices, with varying capacity, architecture etc. Second, some computations need to be made predictable (real-time response), thus supporting both data-in-motion processing and larger-scale data-at-rest processing. Last, employing an event-driven programming model that supports mixing different APIs and models, such as Map/Reduce, CEP, sequential code, etc.

Bibtex

@inproceedings{quinones2018big, title={Big data analytics for smart cities: the H2020 CLASS project}, author={Qui{\~n}ones, Eduardo and Bertogna, Marko and Hadad, Erez and Ferrer, Ana J and Chiantore, Luca and Reboa, Alfredo}, booktitle={SYSTOR'18 Proceedings of the 11th ACM International Systems and Storage Conference}, pages={130--130}, year={2018}, organization={ACM} }

Response-time analysis of DAG tasks supporting heterogeneous computing

Link: https://upcommons.upc.edu/bitstream/handle/2117/118441/Response-Time%20Analysis%20of%20DAG%20Tasks%20Supporting.pdf?sequence=1&isAllowed=y

Authors: María A Serrano, Eduardo Quiñones

Date: 2018/6/24

In: ACM/ESDA/IEEE Design Automation Conference (DAC)

Abstract

Hardware platforms are evolving towards parallel and heterogeneous architectures to overcome the increasing necessity of more performance in the real-time domain. Parallel programming models are fundamental to exploit the performance capabilities of these architectures. This paper proposes a novel response time analysis (RTA) for verifying the schedulability of DAG tasks supporting heterogeneous computing. It analyzes the impact of executing part of the DAG in the accelerator device. As a result, the response time upper bound of the system is more precise than the one provided by currently existing RTA targeting homogeneous architectures.
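To give a feel for the kind of homogeneous bound this abstract uses as its point of comparison, the sketch below computes the classic response-time bound for a single DAG task running alone on m identical cores under a work-conserving scheduler: R <= len + (vol - len) / m, where len is the longest path and vol is the sum of all node execution times. This is a textbook result, not the heterogeneous analysis proposed in the paper; the function name and the example graph are hypothetical.

```python
from collections import defaultdict


def dag_response_time_bound(wcet, edges, m):
    """Return len + (vol - len) / m for a DAG on m identical cores.

    wcet  -- {node: worst-case execution time}
    edges -- iterable of (predecessor, successor) pairs
    m     -- number of identical cores
    """
    succs = defaultdict(list)
    indegree = {v: 0 for v in wcet}
    for u, v in edges:
        succs[u].append(v)
        indegree[v] += 1

    # Kahn's algorithm: visit nodes in topological order while tracking
    # the longest path that ends at each node.
    earliest_start = {v: 0 for v in wcet}
    finish = {}
    ready = [v for v in wcet if indegree[v] == 0]
    while ready:
        u = ready.pop()
        finish[u] = earliest_start[u] + wcet[u]
        for v in succs[u]:
            earliest_start[v] = max(earliest_start[v], finish[u])
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(finish) != len(wcet):
        raise ValueError("the graph contains a cycle")

    length = max(finish.values())  # critical path length
    volume = sum(wcet.values())    # total work
    return length + (volume - length) / m


# Hypothetical fork-join DAG: a -> {b, c} -> d
wcet = {"a": 2, "b": 4, "c": 3, "d": 1}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
bound_two_cores = dag_response_time_bound(wcet, edges, m=2)
bound_one_core = dag_response_time_bound(wcet, edges, m=1)
```

With one core the bound collapses to the total work, and as m grows it approaches the critical path length, which matches the intuition that extra cores only help with the work outside the critical path.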

Bibtex

@inproceedings{serrano2018response, title={Response-time analysis of DAG tasks supporting heterogeneous computing}, author={Serrano, Maria A and Qui{\~n}ones, Eduardo}, booktitle={2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)}, pages={1--6}, year={2018}, organization={IEEE} }

Combining the tasklet model with OpenMP

Link: https://www.cister.isep.ipp.pt/docs/combining_the_tasklet_model_with_openmp/1358/

Authors: Luís Miguel Pinho, Eduardo Quiñones, Sara Royuela

Date: 2018/4/18

In: 19th International Real-Time Ada Workshop

Abstract

Previous workshops have discussed a proposal to augment Ada with fine-grained parallelism, based on the notion of tasklets, a lightweight parallel entity. Recent works have shown the convergence of this model with the OpenMP tasking model and have proposed their coexistence. In this paper we provide a status of the existent works, and describe how these models could be combined.

Bibtex

@inproceedings{pinho2018combining, title={Combining the tasklet model with OpenMP}, author={Pinho, Luis Miguel and Qui{\~n}ones, Eduardo and Royuela, Sara}, booktitle={19th International Real-Time Ada Workshop}, pages={14--18}, year={2018} }

Safe parallelism: compiler analysis techniques for Ada and OpenMP

Link: https://upcommons.upc.edu/bitstream/handle/2117/119214/Royuela%20et%20al.pdf

Authors: Sara Royuela, Xavier Martorell, Eduardo Quiñones, Luís Miguel Pinho

Date: 2018/6/18

In: Ada-Europe International Conference on Reliable Software Technologies

Abstract

There is a growing need to support parallel computation in Ada to cope with the performance requirements of the most advanced functionalities of safety-critical systems. In that regard, the use of parallel programming models is paramount to exploit the benefits of parallelism. Recent works motivate the use of OpenMP for being a de facto standard in high-performance computing for programming shared memory architectures. These works address two important aspects towards the introduction of OpenMP in Ada: the compatibility of the OpenMP syntax with the Ada language, and the interoperability of the OpenMP and the Ada runtimes, demonstrating that OpenMP complements and supports the structured parallelism approach of the tasklet model. This paper addresses a third fundamental aspect: functional safety from a compiler perspective. Particularly, it focuses on race conditions and …

Bibtex

@inproceedings{royuela2018safe, title={Safe parallelism: compiler analysis techniques for Ada and OpenMP}, author={Royuela, Sara and Martorell, Xavier and Qui{\~n}ones, Eduardo and Pinho, Luis Miguel}, booktitle={Ada-Europe International Conference on Reliable Software Technologies}, pages={141--157}, year={2018}, organization={Springer} }

Converging safety and high-performance domains: Integrating OpenMP into Ada

Link: https://ieeexplore.ieee.org/abstract/document/8342162/

Authors: Sara Royuela, Luís Miguel Pinho, Eduardo Quiñones

Date: 2018/3/19

In: 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Abstract

The use of parallel heterogeneous embedded architectures is needed to implement the level of performance required in advanced safety-critical systems. Hence, there is a demand for using high level parallel programming models capable of efficiently exploiting the performance opportunities. In this paper, we evaluate the incorporation of OpenMP, a parallel programming model used in HPC, into Ada, a language spread in safety-critical domains. We demonstrate that the execution model of OpenMP is compatible with the recently proposed Ada tasklet model, meant to exploit fine-grain structured parallelism. Moreover, we show the compatibility of the OpenMP and tasklet models, enabling the use of OpenMP directives in Ada to further exploit unstructured parallelism and heterogeneous computation. Finally, we state the safety properties of OpenMP and analyze the interoperability between the OpenMP and Ada …

Bibtex

@inproceedings{royuela2018converging, title={Converging safety and high-performance domains: Integrating OpenMP into Ada}, author={Royuela, Sara and Pinho, Luis Miguel and Qui{\~n}ones, Eduardo}, booktitle={2018 Design, Automation \& Test in Europe Conference \& Exhibition (DATE)}, pages={1021--1026}, year={2018}, organization={IEEE} }

Towards an OpenMP Specification for Critical Real-Time Systems

Link: https://upcommons.upc.edu/bitstream/handle/2117/125131/Towards%20an%20OpenMP%20Specification%20for%20Critical.pdf?sequence=1&isAllowed=y

Authors: María A Serrano, Sara Royuela, Eduardo Quiñones

Date: 2018/9/26

In: International Workshop on OpenMP

Abstract

OpenMP is increasingly being considered as a convenient parallel programming model to cope with the performance requirements of critical real-time systems. Recent works demonstrate that OpenMP makes it possible to derive guarantees on the functional and timing behavior of the system, a fundamental requirement of such systems. These works, however, focus only on the exploitation of fine-grained parallelism and do not take into account the peculiarities of critical real-time systems, commonly composed of a set of concurrent functionalities. OpenMP allows exploiting the parallelism exposed within real-time tasks and among them. This paper analyzes the challenges of combining the concurrency model of real-time tasks with the parallel model of OpenMP. We demonstrate that OpenMP is suitable to develop advanced critical real-time systems by virtue of a few changes to the specification, which allow the …

Bibtex

@inproceedings{serrano2018towards, title={Towards an OpenMP specification for critical real-time systems}, author={Serrano, Maria A and Royuela, Sara and Qui{\~n}ones, Eduardo}, booktitle={International Workshop on OpenMP}, pages={143--159}, year={2018}, organization={Springer} }

High-performance parallelisation of real-time applications

Link: http://recipp.ipp.pt/handle/10400.22/9680

Authors: Luís Miguel Pinho, Vincent Nélis, Eduardo Quiñones, Paolo Burgio, Andrea Marongiu, Paolo Gai, Juan Sancho

Date: 2017

In: Embedded World Conference

Abstract

This paper presents an overview of the P-SOCRATES methodology and tools, instantiated in the UpScale SDK (Software Development Kit) for the development of time-predictable high-performance applications. The proposed methodology was designed to provide an integrated SDK to fully exploit the huge performance opportunities brought by the most advanced many-core processors, whilst ensuring a predictable performance and maintaining (or even reducing) development costs of applications. The paper also provides the performance results of the application of the SDK in relevant embedded usecases.

Bibtex

@inproceedings{pinho2017high, title={High-performance parallelisation of real-time applications}, author={Pinho, Lu{\'\i}s Miguel and N{\'e}lis, Vincent and Qui{\~n}ones, Eduardo and Burgio, Paolo and Marongiu, Andrea and Gai, Paolo and Sancho, Juan}, booktitle={Embedded World Conference 2017}, year={2017} }

Parcus: energy-aware and robust parallelization of AUTOSAR legacy applications

Link: https://ieeexplore.ieee.org/abstract/document/7939052/

Authors: Sebastian Kehr, Eduardo Quiñones, Dominik Langen, Bert Böddeker, Günter Schäfer

Date: 2017/4/18

In: IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)

Abstract

Embedded multicore processors are an attractive alternative to sophisticated single-core processors for the use in automobile electronic control units (ECUs), due to their expected higher performance and energy efficiency. Parallelization approaches for AUTOSAR legacy software exploit these benefits. Nevertheless, these approaches focus on extracting performance neglecting the system's worst-case sensor/actuator latency and energy consumption. This paper presents Parcus, an energy- and latency-aware parallelization technique that combines both runnable- and task-level parallelism. Parcus explicitly models the traversal of data from sensor to actuator through task instances, enabling to consider the latency imposed by parallelization techniques. The parallel schedule quality (PSQ) metric quantifies the success of the parallelization, for which it takes the latency and the processor frequency into account. We …

Bibtex

@inproceedings{kehr2017parcus, title={Parcus: energy-aware and robust parallelization of AUTOSAR legacy applications}, author={Kehr, Sebastian and Qui{\~n}ones, Eduardo and Langen, Dominik and B{\"o}ddeker, Bert and Sch{\"a}fer, G{\"u}nter}, booktitle={2017 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)}, pages={343--352}, year={2017}, organization={IEEE} }

OpenMP tasking model for Ada: safety and correctness

Link: http://www.cister.isep.ipp.pt/docs/openmp_tasking_model_for_ada__safety_and_correctness/1306/view.pdf

Authors: Sara Royuela, Xavier Martorell, Eduardo Quiñones, Luís Miguel Pinho

Date: 2017/6/12

In: Ada-Europe International Conference on Reliable Software Technologies

Abstract

The safety-critical real-time embedded domain increasingly demands the use of parallel architectures to fulfill performance requirements. Such architectures require the use of parallel programming models to exploit the underlying parallelism. This paper evaluates the applicability of using OpenMP, a widespread parallel programming model, with Ada, a language widely used in the safety-critical domain. Concretely, this paper shows that applying the OpenMP tasking model to exploit fine-grained parallelism within Ada tasks does not affect program safety and correctness, which is vital in the environments where Ada is mostly used. Moreover, we compare the OpenMP tasking model with the proposal of Ada extensions to define parallel blocks, parallel loops and reductions. Overall, we conclude that the OpenMP tasking model can be safely used in such environments, being a promising …

Bibtex

@inproceedings{royuela2017openmp, title={OpenMP tasking model for Ada: safety and correctness}, author={Royuela, Sara and Martorell, Xavier and Qui{\~n}ones, Eduardo and Pinho, Luis Miguel}, booktitle={Ada-Europe International Conference on Reliable Software Technologies}, pages={184--200}, year={2017}, organization={Springer} }

A Functional Safety OpenMP for Critical Real-Time Embedded Systems

Link: https://upcommons.upc.edu/bitstream/handle/2117/107846/A%20functional%20safety%20OpenMP%20for%20critical.pdf

Authors: Sara Royuela, Alejandro Duran, María A Serrano, Eduardo Quiñones, Xavier Martorell

Date: 2017/9/20

In: International Workshop on OpenMP

Abstract

OpenMP* has recently gained attention in the embedded domain by virtue of the augmentations implemented in the last specification. Yet, the language has a minimal impact in the embedded real-time domain mostly due to the lack of reliability and resiliency mechanisms. As a result, functional safety properties cannot be guaranteed. This paper analyses in detail the latest specification to determine whether and how the compliant OpenMP implementations can guarantee functional safety. Given the conclusions drawn from the analysis, the paper describes a set of modifications to the specification, and a set of requirements for compiler and runtime systems to qualify for safety critical environments. Through the proposed solution, OpenMP can be used in critical real-time embedded systems without compromising functional safety.

Bibtex

@inproceedings{royuela2017functional, title={A Functional Safety OpenMP$^{*}$ for Critical Real-Time Embedded Systems}, author={Royuela, Sara and Duran, Alejandro and Serrano, Maria A and Qui{\~n}ones, Eduardo and Martorell, Xavier}, booktitle={International Workshop on OpenMP}, pages={231--245}, year={2017}, organization={Springer} }

Time-predictable parallel programming models

Link: https://upcommons.upc.edu/handle/2117/108151

Authors: María A Serrano, Eduardo Quiñones

Date: 2017/5/4

In: BSC Severo Ochoa International Doctoral Symposium (4th: 2017: Barcelona). Book of abstracts

Abstract

Embedded Computing (EC) systems are increasingly concerned with providing higher performance in real-time while HPC applications require huge amounts of information to be processed within a bounded amount of time. Addressing this convergence and mixed set of requirements needs suitable programming methodologies to exploit the massively parallel computation capabilities of the available platforms in a predictable way. OpenMP has evolved to deal with the programmability of heterogeneous many-cores, with mature support for fine-grained task parallelism. Unfortunately, while these features are very relevant for EC heterogeneous systems, often modeled as periodic task graphs, both the OpenMP programming interface and the execution model are completely agnostic to any timing requirement that the target applications may have. The goal of our work is to enable the use of the OpenMP parallel programming model in real-time embedded systems, such that many-core architectures can be adopted in critical real-time embedded systems. To do so, it is required to guarantee the timing behavior of OpenMP applications.

Bibtex

@inproceedings{serrano2017time, title={Time-predictable parallel programming models}, author={Serrano, Maria A and Qui{\~n}ones, Eduardo}, booktitle={Book of abstracts}, pages={108--109}, year={2017}, organization={Barcelona Supercomputing Center} }

An analysis of lazy and eager limited preemption approaches under DAG-based global fixed priority scheduling

Link: https://upcommons.upc.edu/bitstream/handle/2117/107002/An%20Analysis%20of%20Lazy%20and%20Eager%20Limited%20Preemption.pdf

Authors: María A Serrano, Alessandra Melani, Sebastian Kehr, Marko Bertogna, Eduardo Quiñones

Date: 2017/5/16

In: IEEE International Symposium on Real-Time Distributed Computing (ISORC)

Abstract

DAG-based scheduling models have been shown to effectively express the parallel execution of current many-core heterogeneous architectures. However, their applicability to real-time settings is limited by the difficulty of finding tight estimations of the worst-case timing parameters of tasks that may arbitrarily be preempted/migrated at any instruction. An efficient approach to increase the system predictability is to limit task preemptions to a set of pre-defined points. This limited preemption model supports two different preemption approaches, eager and lazy, which have been analyzed only for sequential task-sets. This paper proposes a new response time analysis that computes an upper bound on the lower priority blocking that each task may incur with eager and lazy preemptions. We evaluate our analysis with both synthetic DAG-based task-sets and a real case-study from the automotive domain. Results from the …

Bibtex

@inproceedings{serrano2017analysis, title={An analysis of lazy and eager limited preemption approaches under DAG-based global fixed priority scheduling}, author={Serrano, Maria A and Melani, Alessandra and Kehr, Sebastian and Bertogna, Marko and Qui{\~n}ones, Eduardo}, booktitle={2017 IEEE 20th International Symposium on Real-Time Distributed Computing (ISORC)}, pages={193--202}, year={2017}, organization={IEEE} }

A static scheduling approach to enable safety-critical OpenMP applications

Link: http://people.site.ac.upc.edu/~equinone/docs/2017/aspdac_2017.pdf

Authors: Alessandra Melani, María A Serrano, Marko Bertogna, Isabella Cerutti, Eduardo Quiñones, Giorgio Buttazzo

Date: 2017/1/16

In: Asia and South Pacific Design Automation Conference (ASP-DAC)

Abstract

Parallel computation is fundamental to satisfy the performance requirements of advanced safety-critical systems. OpenMP is a good candidate to exploit the performance opportunities of parallel platforms. However, safety-critical systems are often based on static allocation strategies, whereas current OpenMP implementations are based on dynamic schedulers. This paper proposes two OpenMP-compliant static allocation approaches: an optimal but costly approach based on an ILP formulation, and a sub-optimal but tractable approach that computes a worst-case makespan bound close to the optimal one.
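The idea of a tractable, sub-optimal static allocation can be illustrated with a generic greedy list scheduler that fixes, ahead of time, which core runs each task and when. This is a stand-in technique chosen for illustration, not the heuristic or the ILP formulation from the paper; the function name, the example graph, and the tie-breaking rules are assumptions.

```python
import heapq


def list_schedule(wcet, edges, m):
    """Greedy static allocation of a DAG onto m identical cores.

    Returns ({task: (core, start, end)}, makespan). Each ready task is
    placed on the core that becomes free first.
    """
    preds = {v: [] for v in wcet}
    succs = {v: [] for v in wcet}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)

    pending = {v: len(preds[v]) for v in wcet}  # unfinished predecessors
    core_free = [0] * m                         # time each core becomes idle
    schedule = {}

    # Ready tasks ordered by release time, then by name for determinism.
    ready = [(0, v) for v in sorted(wcet) if pending[v] == 0]
    heapq.heapify(ready)

    while ready:
        release, task = heapq.heappop(ready)
        core = min(range(m), key=lambda c: core_free[c])
        start = max(release, core_free[core])
        end = start + wcet[task]
        schedule[task] = (core, start, end)
        core_free[core] = end
        for nxt in succs[task]:
            pending[nxt] -= 1
            if pending[nxt] == 0:
                # A task is released when its last predecessor finishes.
                nxt_release = max(schedule[p][2] for p in preds[nxt])
                heapq.heappush(ready, (nxt_release, nxt))

    if len(schedule) != len(wcet):
        raise ValueError("the graph contains a cycle")
    makespan = max(end for _, _, end in schedule.values())
    return schedule, makespan


# Hypothetical fork-join DAG: a -> {b, c} -> d
wcet = {"a": 2, "b": 4, "c": 3, "d": 1}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
schedule, makespan = list_schedule(wcet, edges, m=2)
```

Because the resulting table is computed offline from worst-case execution times, the makespan it reports is a bound for that particular allocation; a greedy choice like this one can be further from the optimum than the approaches the abstract describes.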

Bibtex

@inproceedings{melani2017static, title={A static scheduling approach to enable safety-critical OpenMP applications}, author={Melani, Alessandra and Serrano, Maria A and Bertogna, Marko and Cerutti, Isabella and Qui{\~n}ones, Eduardo and Buttazzo, Giorgio}, booktitle={2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC)}, pages={659--665}, year={2017}, organization={IEEE} }

Response-time analysis of DAG tasks under fixed priority scheduling with limited preemptions

Link: https://upcommons.upc.edu/bitstream/handle/2117/89373/Response-Time%20Analysis%20of%20DAG%20Tasks%20under.pdf?sequence=1

Authors: María A Serrano, Alessandra Melani, Marko Bertogna, Eduardo Quiñones

Date: 2016/3/14

In: Design, Automation & Test in Europe Conference & Exhibition (DATE)

Abstract

Limited preemptive (LP) scheduling has been demonstrated to effectively improve the schedulability of fully preemptive (FP) and fully non-preemptive (FNP) paradigms. On one side, LP reduces the preemption related overheads of FP; on the other side, it restricts the blocking effects of FNP. However, LP has been applied to multi-core scenarios only when completely sequential task systems are considered. This paper extends the current state-of-the-art response time analysis for global fixed priority scheduling with fixed preemption points by deriving a new response time analysis for DAG-based task-sets.

Bibtex

@inproceedings{serrano2016response, title={Response-time analysis of DAG tasks under fixed priority scheduling with limited preemptions}, author={Serrano, Maria A and Melani, Alessandra and Bertogna, Marko and Qui{\~n}ones, Eduardo}, booktitle={2016 Design, Automation \& Test in Europe Conference \& Exhibition (DATE)}, pages={1066--1071}, year={2016}, organization={IEEE} }

A Lightweight OpenMP4 Run-time for Embedded Systems

Link: https://people.ac.upc.edu/equinone/docs/2016/aspdac_2016.pdf

Authors: Roberto E Vargas, Sara Royuela, María A Serrano, Xavier Martorell, Eduardo Quiñones

Date: 2016/1/25

In: 21st Asia and South Pacific Design Automation Conference (ASP-DAC)

Abstract

OpenMP is increasingly being adopted by current many-core embedded processors to exploit their parallel computation capabilities. Unfortunately, current run-time implementations of the latest specification (v4.0) are not suitable for processors relying on small and fast on-chip memories, due to their memory consumption. This paper proposes an OpenMP4 run-time that reduces the memory consumption while providing the same performance. Our run-time relies on a new compiler pass capable of generating the task dependency graph of OpenMP programs, which is then efficiently stored in memory.

Bibtex

@inproceedings{vargas2016lightweight, title={A lightweight OpenMP4 run-time for embedded systems}, author={Vargas, Roberto E and Royuela, Sara and Serrano, Maria A and Martorell, Xavier and Qui{\~n}ones, Eduardo}, booktitle={2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC)}, pages={43--49}, year={2016}, organization={IEEE} }

A system model and stack for the parallelization of time-critical applications on many-core architectures

Link: http://recipp.ipp.pt/handle/10400.22/6885

Authors: Vincent Nélis, Patrick Meumeu Yomsi, Luís Miguel Pinho, Eduardo Quiñones, Marko Bertogna, Andrea Marongiu, Paolo Gai, Claudio Scordino

Date: 2015

In:

Abstract

Many embedded systems are subject to stringent timing requirements that compel them to "react" within predefined time bounds. The said "reaction" may be understood as simply outputting the results of a basic computation, but may also mean engaging in complex interactions with the surrounding environment. Although these strict temporal requirements advocate the use of simple and predictable hardware architectures that allow for the computation of tight upper-bounds on the software response time, meanwhile most of these embedded systems steadily demand for more and more computational performance, which weighs in favor of specialized, complex, and optimized multi-core and many-core processors on which the execution of the application can be parallelized. However, it is not straightforward how event-based embedded applications can be structured in order to take advantage and fully exploit the parallelization opportunities and achieve higher performance and energy-efficient computing. The P-SOCRATES project envisions the necessity to bring together next-generation many-core accelerators from the embedded computing domain with the programming models and techniques from the high-performance computing domain, supporting this with real-time methodologies to provide timing predictability. This paper gives an overview of the system model and software stack proposed in the P-SOCRATES project to facilitate the deployment and execution of parallel applications on many-core infrastructures, while preserving the time-predictability of the execution required by real-time practices to upper-bound the response time of the embedded …

Bibtex

@article{nelis2015system, title={A system model and stack for the parallelization of time-critical applications on many-core architectures}, author={N{\'e}lis, Vincent and Yomsi, Patrick Meumeu and Pinho, Luis Miguel and Qui{\~n}ones, Eduardo and Bertogna, Marko and Marongiu, Andrea and Gai, Paolo and Scordino, Claudio}, year={2015} }

OpenMP and timing predictability: a possible union?

Link: http://people.site.ac.upc.edu/~equinone/docs/2015/OpenMP-date_2015.pdf

Authors: Roberto E Vargas, Eduardo Quiñones, Andrea Marongiu

Date: 2015/3/9

In: Design, Automation & Test in Europe Conference & Exhibition (DATE)

Abstract

Next-generation many-core embedded platforms have the chance of intercepting a converging need for high performance and predictability. Programming methodologies for such platforms will have to promote predictability as a first-class design constraint, along with features for massive parallelism exploitation. OpenMP, increasingly adopted in the embedded systems domain, has recently evolved to deal with the programmability of heterogeneous many-cores, with mature support for fine-grained task parallelism. While tasking is potentially very convenient for coding real-time applications modeled as periodic task graphs, OpenMP adopts an execution model completely agnostic to any timing requirement that the target application may have. In this position paper we reason about the suitability of the current OpenMP v4 specification and execution model to provide timing guarantees in many-cores.

Bibtex

@inproceedings{vargas2015openmp, title={OpenMP and timing predictability: a possible union?}, author={Vargas, Roberto and Qui{\~n}ones, Eduardo and Marongiu, Andrea}, booktitle={Proceedings of the 2015 Design, Automation \& Test in Europe Conference \& Exhibition}, pages={617--620}, year={2015}, organization={EDA Consortium} }

Timing characterization of OpenMP4 tasking model

Link: https://ieeexplore.ieee.org/abstract/document/7324556/

Authors: María A Serrano, Alessandra Melani, Roberto E Vargas, Andrea Marongiu, Marko Bertogna, Eduardo Quiñones

Date: 2015/10/4

In: 2015 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)

Abstract

OpenMP is increasingly being supported by the newest high-end embedded many-core processors. Despite the lack of any notion of real-time execution, the latest specification of OpenMP (v4.0) introduces a tasking model that resembles the way real-time embedded applications are modeled and designed, i.e., as a set of periodic task graphs. This makes OpenMP4 a convenient candidate to be adopted in future real-time systems. However, OpenMP4 incorporates as well features to guarantee backward compatibility with previous versions that limit its practical usability in real-time systems. The most notable example is the distinction between tied and untied tasks. Tied tasks force all parts of a task to be executed on the same thread that started the execution, whereas a suspended untied task is allowed to resume execution on a different thread. Moreover, tied tasks are forbidden to be scheduled in threads in which …

Bibtex

@inproceedings{serrano2015timing, title={Timing characterization of OpenMP4 tasking model}, author={Serrano, Mar{\'\i}a A and Melani, Alessandra and Vargas, Roberto and Marongiu, Andrea and Bertogna, Marko and Qui{\~n}ones, Eduardo}, booktitle={2015 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)}, pages={157--166}, year={2015}, organization={IEEE} }

P-SOCRATES: A parallel software framework for time-critical many-core systems

Link: https://www.sciencedirect.com/science/article/pii/S0141933115000836

Authors: Luís Miguel Pinho, Vincent Nélis, Patrick Meumeu Yomsi, Eduardo Quiñones, Marko Bertogna, Paolo Burgio, Andrea Marongiu, Claudio Scordino, Paolo Gai, Michele Ramponi, Michal Mardiak

Date: 2015/11/1

In: Microprocessors and Microsystems

Abstract

Current generation of computing platforms is embracing multi-core and many-core processors to improve the overall performance of the system, meeting at the same time the stringent energy budgets requested by the market. Parallel programming languages are nowadays paramount to extracting the tremendous potential offered by these platforms: parallel computing is no longer a niche in the high performance computing (HPC) field, but an essential ingredient in all domains of computer science. The advent of next-generation many-core embedded platforms has the chance of intercepting a converging need for predictable high-performance coming from both the High-Performance Computing (HPC) and Embedded Computing (EC) domains. On one side, new kinds of HPC applications are being required by markets needing huge amounts of information to be processed within a bounded amount of time. On the …

Bibtex

@article{pinho2015p, title={P-SOCRATES: A parallel software framework for time-critical many-core systems}, author={Pinho, Luis Miguel and N{\'e}lis, Vincent and Yomsi, Patrick Meumeu and Qui{\~n}ones, Eduardo and Bertogna, Marko and Burgio, Paolo and Marongiu, Andrea and Scordino, Claudio and Gai, Paolo and Ramponi, Michele and others}, journal={Microprocessors and Microsystems}, volume={39}, number={8}, pages={1190--1203}, year={2015}, publisher={Elsevier} }

Efficient Execution of Mixed Application Workloads in a Hard Real-Time Multicore System

Link: http://personals.ac.upc.edu/equinone/docs/2009/repp_2009.pdf

Authors: Marco Paolieri, Eduardo Quiñones, Francisco J Cazorla, Mateo Valero

Date: 2014

In: Workshop on Reconciling Performance with Predictability

Abstract

In this paper we present a multicore architecture that introduces a novel hardware shared-resource management policy, called Worst-Case Resource Management (WC-RM), that allows executing efficiently mixed application workloads composed by hard real-time and non real-time applications in a multicore platform. Our multicore architecture forces hard real-time tasks to be executed close to their worst-case execution time, leaving more free shared resources that can be used by the non real-time tasks. Our WC-RM policy improves the performance of NHRTs up to 10% compared to a resource management policy in which hard real-time tasks access the shared resources as soon as they are available.

Bibtex

@article{paolieriefficient, title={Efficient Execution of Mixed Application Workloads in a Hard Real-Time Multicore System}, author={Paolieri, Marco and Qui{\~n}ones, Eduardo and Cazorla, Francisco J and Valero, Mateo} }

Time criticality challenge in the presence of parallelised execution

Link: https://www.cister.isep.ipp.pt/docs/time_criticality_challenge_in_the_presence_of_parallelised_execution/828/

Authors: Luís Miguel Pinho, Eduardo Quiñones, Marko Bertogna, Luca Benini, Jorge Pereira Carlos, Claudio Scordino, Michele Ramponi

Date: 2014/1/20

In: 2nd Workshop on High-performance and Real-time Embedded Systems

Abstract

The recent technological advancements and market trends are causing an interesting phenomenon towards the convergence of High-Performance Computing (HPC) and Embedded Computing (EC) domains. On one side, new kinds of HPC applications are being required by markets needing huge amounts of information to be processed within a bounded amount of time. On the other side, EC systems are increasingly concerned with providing higher performance in real-time, challenging the performance capabilities of current architectures. The advent of next-generation many-core embedded platforms has the chance of intercepting this converging need for predictable high-performance, allowing HPC and EC applications to be executed on efficient and powerful heterogeneous architectures integrating general-purpose processors with many-core computing fabrics. This convergence, however, raises the problem about how to guarantee timing requirements in presence of parallel execution. This paper presents a novel approach to address this challenge through the design of an integrated framework for the execution of workload-intensive applications with real-time requirements.

Bibtex

@inproceedings{pinho2014time, title={Time criticality challenge in the presence of parallelised execution}, author={Pinho, Luis Miguel and Qui{\~n}ones, Eduardo and Bertogna, Marko and Benini, Luca and Carlos, Jorge Pereira and Scordino, Claudio and Ramponi, Michele}, booktitle={2nd Workshop on High-performance and Real-time Embedded Systems}, year={2014} }

The challenge of time-predictability in modern many-core architectures

Link: http://www.cister.isep.ipp.pt/docs/the_challenge_of_time_predictability_in_modern_many_core_architectures/929/

Authors: Vincent Nélis, Patrick Meumeu Yomsi, Luís Miguel Pinho, José Carlos Fonseca, Marko Bertogna, Eduardo Quiñones, Roberto E Vargas, Andrea Marongiu

Date: 2014/7/8

In: 14th International Workshop on Worst-Case Execution Time Analysis

Abstract

The recent technological advancements and market trends are causing an interesting phenomenon towards the convergence of High-Performance Computing (HPC) and Embedded Computing (EC) domains. Many recent HPC applications require huge amounts of information to be processed within a bounded amount of time while EC systems are increasingly concerned with providing higher performance in real-time. The convergence of these two domains towards systems requiring both high performance and a predictable time-behavior challenges the capabilities of current hardware architectures. Fortunately, the advent of next-generation many core embedded platforms has the chance of intercepting this converging need for predictability and high-performance, allowing HPC and EC applications to be executed on efficient and powerful heterogeneous architectures integrating general-purpose processors with many-core computing fabrics. However, addressing this mixed set of requirements is not without its own challenges and it is now of paramount importance to develop new techniques to exploit the massively parallel computation capabilities of many-core platforms in a predictable way.

Bibtex

@inproceedings{nelis2014challenge, title={The challenge of time-predictability in modern many-core architectures}, author={N{\'e}lis, Vincent and Yomsi, Patrick Meumeu and Pinho, Lu{\'\i}s Miguel and Fonseca, Jos{\'e} and Bertogna, Marko and Qui{\~n}ones, Eduardo and Vargas, Roberto and Marongiu, Andrea}, booktitle={14th International Workshop on Worst-Case Execution Time Analysis}, year={2014} }