Thursday, September 3rd
10:00 - 11:00 Welcome & Keynote
11:00 - 11:30 Coffee Break
11:30 - 12:30 Session 1: Applications
Session Chair: Christos D. Antonopoulos
11:30 - 12:00
Jose I. Aliaga, Sandra Catalan, Charalampos Chalios, Dimitrios S. Nikolopoulos and Enrique S. Quintana-Orti.
Performance and Fault Tolerance of Preconditioned Iterative Solvers on Low-Power ARM Architectures.
12:00 - 12:30
Michael Firbach and Michael Gerndt.
Automatic Energy-Tuning using the Periscope Tuning Framework.
12:30 - 14:00 Lunch Break
14:00 - 15:00 Session 2: Programming models and system software support for energy efficiency and resilience
Session Chair: Dimitrios S. Nikolopoulos
14:00 - 14:30
Manolis Maroudas, Michalis Spyrou, Christos Kalogirou, Christos Konstantas, Panos Koutsovasilis, Christos D. Antonopoulos and Nikolaos Bellas.
Energy Minimization on Heterogeneous Systems through Approximate Computing.
14:30 - 15:00
Sam Kaplan, Sergio Pino, Aaron Landwehr and Guang Gao.
Landing Containment Domains on SWARM: Toward a Robust Resiliency Solution on a Dynamic Adaptive Runtime Machine.
Friday, September 4th
10:00 - 11:00 Session 3: Compilers and analysis for energy efficiency and resilience
Session Chair: Ioannis E. Venetis
10:00 - 10:30
Norman Rink, Dmitrii Kuvaiskii, Jeronimo Castrillon and Christof Fetzer.
Compiling for Resilience: the Performance Gap.
10:30 - 11:00
Jens Deussen, Jan Riehme and Uwe Naumann.
Automation of Significance Analysis with Interval Splitting.
11:00 - 11:30 Coffee Break
11:30 - 12:30 Panel discussion
Energy and Reliability: Separate concerns or two sides of the same coin?
Michele Weiland, EPCC, University of Edinburgh.
Christoph Kessler, Linköping University.
Hugh Leather, ICSA, University of Edinburgh.
Christos D. Antonopoulos, CERTH & University of Thessaly.
Dr. Shidhartha Das is a Principal Engineer in the research and development division of ARM Ltd., UK. He received the B.Tech. degree in electrical engineering from the Indian Institute of Technology, Bombay in 2002, and the M.S. and Ph.D. degrees in computer science and engineering from the University of Michigan, Ann Arbor in 2005 and 2009. His research interests include micro-architectural and circuit design for variation measurement and mitigation, on-chip power delivery, and VLSI architectures for digital signal processing (DSP) accelerators. His research has been featured in IEEE Spectrum and has won several awards, including the Microprocessor Review analysts' choice award in innovation and best paper awards at MICRO 2003 and SAME 2010. He has authored more than 25 papers in peer-reviewed journals and conferences, including invited publications in top-tier journals. Dr. Das works on several aspects of low-power, variation-tolerant circuit and micro-architectural design. He serves on the Technical Program Committees of the European Solid-State Circuits Conference (ESSCIRC), the International Symposium on Low Power Electronics and Design (ISLPED), and the International On-Line Testing Symposium.
Computing systems are typically designed with sufficient margins to mitigate rising uncertainties. These uncertainties are incorporated at each layer of abstraction in the design process, beginning with basic transistor modelling, through circuit and system-level design, and even extending to the system software. The widening safety margins required to ensure correct operation inevitably lead to unacceptable power and performance overheads. Reconciling the conflicting objectives of uncertainty mitigation and energy-efficient computing will require fundamental departures from conventional circuit and system design practices. In my talk, I posit error-resilient general-purpose computing as an effective approach for achieving this. The fundamental concept behind error-resilient computing is to treat computational errors not as catastrophic system failures but as a signal for tuning a system to its most efficient operating point. I will discuss how resiliency can be combined with cross-layer optimisations spanning circuit, micro-architecture and algorithmic boundaries to obtain an overall energy-efficient system.
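To make the idea concrete, here is a minimal Python sketch of such an error-driven tuning loop, assuming a hypothetical error-rate monitor in the spirit of Razor-style designs; the toy error-rate model below stands in for real timing-error detection circuits:

    def error_rate(vdd):
        # Toy hardware model (hypothetical): timing errors appear once
        # the supply voltage drops below roughly 0.85 V.
        return 0.0 if vdd >= 0.85 else (0.85 - vdd) * 0.5

    def tune(vdd=1.10, step=0.01, target=0.001):
        # Treat detected errors as a tuning signal, not a failure: keep
        # lowering Vdd while the error rate stays below `target`, and
        # stop one step before it would be exceeded.
        while vdd > 0.5:
            if error_rate(vdd - step) > target:
                return vdd  # most efficient safe operating point
            vdd -= step
        return vdd

    print("settled at Vdd = %.2f V" % tune())

The margin that a conventional design would reserve for worst-case conditions is instead reclaimed at run time, which is the essence of the operating-point tuning described above.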
We present some of the results of our involvement in the Score-E project, during which we built energy-tuning capabilities into an existing auto-tuning framework. We use the well-known techniques of parallelism capping and dynamic voltage & frequency scaling and combine them with automatic tuning to significantly lower the entry barrier for application developers. While the potential of auto-tuning is inherently limited, it can be unlocked with relative ease and should therefore not be left untapped. Using synthetic applications, each of which demonstrates a specific problem found in application codes, as well as a widely used application benchmark from the NPB suite, we show that (1) automatic energy-tuning is feasible and can reduce the energy consumption of a wide range of applications, and (2) it can be applied to existing codes with reasonable effort.
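As an illustration of the kind of search such a framework performs, here is a minimal Python sketch; the Periscope Tuning Framework itself drives real measurements and search strategies, so the energy model and the configuration space below are purely hypothetical:

    from itertools import product

    FREQS_GHZ = [1.2, 1.6, 2.0, 2.4]   # DVFS settings (assumed)
    THREADS = [1, 2, 4, 8, 16]         # parallelism-capping settings (assumed)

    def measure_energy(threads, freq):
        # Stand-in for a real measurement (e.g., RAPL counters): runtime
        # shrinks with threads and frequency, power grows with both.
        runtime = 100.0 / (threads ** 0.8) / freq
        power = 10.0 + 2.0 * threads * freq ** 2.5
        return runtime * power  # joules

    best = min(product(THREADS, FREQS_GHZ),
               key=lambda cfg: measure_energy(*cfg))
    print("best configuration: %d threads @ %.1f GHz (%.1f J)"
          % (best[0], best[1], measure_energy(*best)))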
As the complexity of computing systems grows, reliability and energy become two crucial challenges calling for holistic solutions. In this paper, we investigate the interplay among concurrency, power dissipation, energy consumption and voltage-frequency scaling for a key numerical kernel for the solution of sparse linear systems. Concretely, we leverage a task-parallel implementation of the Conjugate Gradient method, equipped with a state-of-the-art preconditioner embedded in the ILUPACK software, and target a low-power multicore processor from ARM. In addition, we perform a theoretical analysis of the impact of a technique such as Near Threshold Voltage Computing (NTVC) from the points of view of increased hardware concurrency and error rate.
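Analyses of this kind typically rest on the standard CMOS scaling relations; the following LaTeX fragment is a hedged sketch, not the paper's actual model:

    P_{\mathrm{dyn}} \approx C\,V^{2}\,f, \qquad E = P \cdot t, \qquad
    \frac{N_{\mathrm{NTV}}}{N_{\mathrm{nom}}} \approx
      \frac{V_{\mathrm{nom}}^{2}\,f_{\mathrm{nom}}}{V_{\mathrm{NTV}}^{2}\,f_{\mathrm{NTV}}}

Lowering V towards the threshold voltage (and with it the attainable f) lets a fixed power budget feed more cores, but raises the error rate: precisely the concurrency-versus-reliability trade-off the theoretical analysis weighs for the iterative solver.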
Recent advances in technology, most prominently smaller feature sizes, lead to reduced hardware reliability. Hence, faults will occur more frequently in future hardware generations, and the results of computations cannot always be relied upon. Protecting against faults in hardware is expensive and inflexible. Therefore, to perform resilient computations on unreliable hardware, software-based protection mechanisms have been proposed.
Software-based approaches can detect faults by inserting additional checking instructions into an application's program code. Fault coverage, i.e., the proportion of detected faults, increases with the number of checking instructions, but so too does the runtime of the application. Typical slow-downs due to software-based code hardening are in the range of 10x-100x: a prohibitively wide performance gap between the original, unhardened application and its hardened counterpart.
Our goal is to close the performance gap by developing compiler optimizations specific to software-based fault protection. In the present paper we assess the potential scope of such optimizations. To this end, we introduce a compiler infrastructure for software-based code hardening based on encoding. We use our framework to analyze the trade-off between fault coverage and slow-downs of applications when applying simple, encoding-specific strategies for the generation of faster code.
Our results show that the performance of hardened programs can be improved by up to 2x while incurring no noticeable penalty in fault coverage. However, we have also identified programs where a drop in fault coverage is not accompanied by improved program performance. This calls for future work on code analyses that enable the compiler to reason about how a program's structure affects both the severity of faults and the impact of checking instructions on performance.
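For readers unfamiliar with encoding-based hardening, the following Python sketch illustrates one classic scheme, AN codes, on which such compiler infrastructures commonly build; the paper's own encoding and inserted checks may differ, and the constant below is purely illustrative:

    A = 58659  # encoding constant (illustrative); each value x is carried as A*x

    def encode(x):
        return x * A

    def check(xc):
        # The checking instruction the compiler would insert: a fault that
        # corrupts an encoded word almost surely breaks divisibility by A.
        if xc % A != 0:
            raise RuntimeError("fault detected")
        return xc

    def add_encoded(ac, bc):
        return ac + bc  # addition is closed under AN encoding

    a, b = encode(7), encode(35)
    print(check(add_encoded(a, b)) // A)   # prints 42
    check(add_encoded(a, b) ^ 0x10)        # simulated bit flip -> fault detected

More checks mean higher fault coverage but more executed instructions, which is exactly the coverage-versus-runtime trade-off the abstract describes.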
In the SCoRPiO project we are interested in the significance of program code with regard to its outputs, so that less significant parts can be computed, for instance, on less reliable but power-saving hardware. Multiple approaches can be taken, including pure interval arithmetic, Monte Carlo methods, and a tool chain for interval-derivative-based significance analysis.
The tool chain dco/scorpio was introduced in the SCoRPiO project. In this paper we propose to automate the process of interval-derivative-based significance analysis in order to widen input domains. We present an interval-splitting approach to handle difficulties introduced by the interval evaluation, e.g., relational operators that cannot be decided over an interval, or the wrapping effect. Each split results in multiple scenarios, which can be computed in parallel.
The presented approach is a step towards a fully automatic significance analysis of computer code.
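A toy Python sketch of the splitting idea follows; it is a simplification for illustration only (the actual dco/scorpio tool chain operates on interval derivatives), deciding a single branch condition over an input interval:

    def split_eval(lo, hi, eps=1e-3):
        # Decide the branch `x < 0` over the interval [lo, hi]; if the
        # interval evaluation cannot decide it, bisect and recurse. Each
        # resulting scenario is independent and could run in parallel.
        if hi < 0:
            return [((lo, hi), "branch x < 0 taken on whole interval")]
        if lo >= 0:
            return [((lo, hi), "branch x >= 0 taken on whole interval")]
        if hi - lo < eps:
            return [((lo, hi), "undecided (interval straddles the branch)")]
        mid = (lo + hi) / 2.0
        return split_eval(lo, mid, eps) + split_eval(mid, hi, eps)

    for interval, verdict in split_eval(-1.0, 2.0):
        print(interval, verdict)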
Energy efficiency is a prime concern for both HPC and conventional workloads. Heterogeneous systems typically improve energy efficiency at the expense of increased programmer effort.
A novel, complementary approach is to approximate selected computations in order to minimize the energy footprint of applications. Not all applications or application components are amenable to this method, as approximations may be detrimental to the quality of the end result. Therefore, the programmer should be able to express algorithmic wisdom on the importance of specific computations for the quality of the end result, and thus their tolerance to approximations.
We introduce a framework comprising a parallel meta-programming model based on OpenCL, a compiler which supports this programming model, and a runtime system which serves as the compiler backend. The proposed framework: (a) allows the programmer to express the relative importance of different computations for the quality of the output, thus facilitating the dynamic exploration of energy / quality tradeoffs in a disciplined way, and (b) simplifies the development of parallel algorithms on heterogeneous systems, relieving the programmer of tasks such as work scheduling and data manipulation across address spaces.
We evaluate our approach using a number of real-world applications, beyond mere kernels, with diverse characteristics. Our results indicate that significant energy savings can be achieved by combining execution on heterogeneous systems with approximations, while gracefully degrading output quality.
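The following Python sketch illustrates the programming idiom rather than the paper's actual OpenCL-based model: each task carries a programmer-assigned significance, and a (hypothetical) runtime executes low-significance tasks in their approximate variant when trading quality for energy:

    def run(tasks, approx_ratio):
        # Run the `approx_ratio` least-significant fraction of tasks in
        # their approximate (cheaper) variant, the rest accurately.
        ordered = sorted(tasks, key=lambda t: t["significance"])
        cutoff = int(len(ordered) * approx_ratio)
        return [(t["approx"]() if i < cutoff else t["accurate"]())
                for i, t in enumerate(ordered)]

    data = list(range(1000))
    task = {
        "significance": 0.2,                      # programmer annotation
        "accurate": lambda: sum(data),            # exact reduction
        "approx": lambda: sum(data[::10]) * 10,   # sampled, scaled estimate
    }
    print(run([task], approx_ratio=0.0))  # [499500] accurate
    print(run([task], approx_ratio=1.0))  # [495000] approximate, ~10x less work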
Software and hardware errors are expected to be a much larger issue on exascale systems than on current hardware. For this reason, resilience must be a major component of the design of an exascale system. Using containment domains, we propose a resilience scheme that works with the type of codelet-based runtimes expected to be used on exascale systems. We implemented a prototype of our containment-domain framework in SWARM and adapted a Cholesky decomposition program written in SWARM to use this framework. We demonstrate the feasibility of this approach by showing the low overhead and high adaptability of our framework.
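As background, the containment-domain pattern (preserve / compute / detect / recover) can be sketched in a few lines of Python; the paper realizes this inside the SWARM codelet runtime, which is not reproduced here:

    import copy

    def run_in_domain(state, body, detect, max_retries=3):
        # Preserve the inputs, run the body, check the result, and
        # re-execute from the preserved state on a detected error;
        # escalate to the enclosing domain once retries are exhausted.
        preserved = copy.deepcopy(state)             # preserve
        for _ in range(max_retries):
            result = body(copy.deepcopy(preserved))  # compute
            if detect(result):                       # detect
                return result
            # recover: loop and retry from the preserved state
        raise RuntimeError("retries exhausted; escalate to parent domain")

    attempts = [0]
    def flaky_double(v):
        attempts[0] += 1
        if attempts[0] == 1:                         # inject a fault on try 1
            return [float("nan")] * len(v)
        return [x * 2 for x in v]

    out = run_in_domain([1.0, 2.0, 3.0], flaky_double,
                        detect=lambda v: all(x == x for x in v))
    print(out)  # [2.0, 4.0, 6.0] after one recovery

Nesting such domains gives the hierarchical, low-overhead error containment that the abstract evaluates on the Cholesky decomposition.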