Schedule
Abstract
To meet ever-increasing computational needs within a fixed power budget, computing systems are forced to adopt more efficient computational engines. With the end of Moore's law and Dennard scaling, technology alone cannot satisfy these needs, hence systems incorporate heterogeneous accelerators, that is, units optimized for specific (sets of) functions.
At the Microprocessor and Hardware Laboratory of the Technical University of Crete we have a long track record of research in reconfigurable accelerators. I will give a brief overview of our recent work on accelerators for intensive big-data applications (Classification and Frequent sub-graph mining), streaming applications (Stream Join and Exponential Sketch (ECM) generation) and bioinformatics. These works have been designed and prototyped for high-performance reconfigurable platforms such as Convey and Maxeler.
Bio
Dionisios Pnevmatikatos is a Professor and the Director of the Microprocessor and Hardware Laboratory of the School of ECE at the Technical University of Crete. He received his PhD in Computer Science from the University of Wisconsin-Madison in 1995. His research interests include Computer Architecture, with a focus on the use of Reconfigurable Logic to build efficient accelerators in heterogeneous parallel systems. He has also worked on the design of reliable systems, on architectures for accelerating applications in hardware or with reconfigurable logic, on network packet processors, and more. He served as coordinator of the European research project FASTER (FP7), and has been or is Principal Investigator in the European research projects DeSyRe (FP7), AXIOM, dRedBox and EXTRA (H2020), as well as in several national projects. He regularly serves on the Program Committees of key conferences in his field, such as FPL and DATE, in the areas of Computer Architecture and Reconfigurable Systems.
Abstract
Single-core processing power has stagnated, forcing us to use increasingly complex processing systems in order to extract performance: multicores, GPUs, asymmetric multiprocessors, distributed computing, computation offloading. Writing fast and correct programs for them is tough even for experts. For most programmers it is almost impossible. Existing development tools do not help enough. Most analysis and optimization is left to the programmer, while the decisions that our tools can make are often suboptimal or wrong. With hardware becoming more complicated, the gap between what we need our tools to do and what they can achieve will only grow.
In this talk, I will present a new method for bridging this gap. The central idea is to replace expert understanding of how code is structured and works with automatically trained deep neural networks. Such learned models can give us all the information we need to analyze the code and drive optimization decisions. This approach allows us to build powerful new tools with little human input, even less expertise, and in a mostly language-agnostic way, dramatically reducing the difficulty and cost of creating such tools.
Bio
Pavlos Petoumenos is a Senior Researcher at the University of Edinburgh and a Research Fellow of the Royal Academy of Engineering. His work focuses on code optimization techniques for performance, energy, and size. Much of his recent output explores ways of automating optimization decisions through machine and deep learning. In a previous life, he was awarded a PhD from the University of Patras for his work on cache sharing and cache replacement techniques.
Abstract
The continuous growth of computer systems has introduced a new era for computing. The performance and power gains that came through advancements in transistor technology driven by Moore’s law have begun to diminish as Dennard scaling hits its physical limits. The increasing demand for performance, along with resource constraints, has brought energy and power efficiency to the forefront of the research agenda. The power efficiency requirement is imposed by thermal problems in modern chips, while energy efficiency is needed for long-lasting batteries and low electricity costs. The inability of multi-core processors to meet these requirements has shifted research towards heterogeneous architectures.
This work explores scheduling techniques on single-ISA heterogeneous architectures, and more specifically on ARM big.LITTLE systems. The state-of-the-art schedulers for big.LITTLE systems are based on the default Time Preemptive Scheduling mechanism of the Linux kernel, which can miss rapid phase changes of the workload. This work proposes a novel scheduling mechanism, called Context Preemptive Scheduling, that exploits features of the ARM architecture to closely track phase changes in running programs and invoke the migration process of the scheduler in time.
Bio
Ioanna is a software engineer at Canonical Ltd. Prior to this, she was a Ph.D student at the University of Manchester. She received her Diploma in Electrical and Computer Engineering from NTUA in 2014.
Abstract
Over the past few years, a large body of research has been devoted to optimizing sparse matrix-vector multiplication (SpMV) on General Purpose Graphics Processing Units (GPGPUs). Numerous sparse matrix formats and associated algorithms have been proposed, with different strengths and weaknesses. However, while previous works particularly focus on parallelization strategies that tackle load imbalance, in this paper we emphasize that other SpMV bottlenecks have not been thoroughly addressed on GPGPUs. Towards this direction, we present a bottleneck-aware SpMV auto-tuner (BASMAT), a holistic approach for the optimization of SpMV on GPGPUs that addresses all encountered bottlenecks, focusing both on fast execution and low preprocessing.
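For readers unfamiliar with the kernel itself, the following is a minimal sequential sketch of SpMV over the widely used CSR (compressed sparse row) format. It is only an illustration and not BASMAT's implementation or a GPU kernel; the function name `spmv_csr` and the example matrix are assumptions made here. The uneven row lengths and the indirect access `x[col_idx[k]]` hint at the kind of load imbalance and irregular memory access that SpMV optimizations have to deal with.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # The nonzeros of this row occupy positions row_ptr[row] .. row_ptr[row + 1] - 1.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Indirect, data-dependent access into x.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Hypothetical 3x3 example matrix:
# A = [[10, 0, 0],
#      [ 0, 0, 2],
#      [ 3, 4, 5]]
row_ptr = [0, 1, 2, 5]
col_idx = [0, 2, 0, 1, 2]
values = [10.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)
```

Here the last row holds three of the five nonzeros, so a naive one-thread-per-row parallelization would leave most threads idle while one does the bulk of the work.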
Bio
Athena Elafrou is a graduate of the Electrical and Computer Engineering (ECE) School of NTUA. She is currently a PhD candidate with the parallel systems research group of CSlab @ECE/NTUA. Her current research interests focus on high-performance sparse linear algebra and deep learning on parallel systems.
Abstract
The recently proposed dataplanes for microsecond scale applications, such as IX and ZygOS, use non-preemptive policies to schedule requests to cores. For the many real-world scenarios where request service times follow distributions with high dispersion or a heavy tail, they allow short requests to be blocked behind long requests, which leads to poor tail latency.
Shinjuku is a single-address space operating system that uses hardware support for virtualization to make preemption practical at the microsecond scale. This allows Shinjuku to implement centralized scheduling policies that preempt requests as often as every 5µsec and work well for both light and heavy tailed request service time distributions. We demonstrate that Shinjuku provides significant tail latency and throughput improvements over IX and ZygOS for a wide range of workload scenarios. For the case of a RocksDB server processing both point and range queries, Shinjuku achieves up to 6.6× higher throughput and 88% lower tail latency.
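The effect of preemption on short requests can be illustrated with a toy single-core simulation. This is a sketch under simplifying assumptions, not Shinjuku's scheduler: all requests arrive at time 0, preemption is free, and the function names `fcfs` and `round_robin` as well as the service times are hypothetical. Times are in microseconds, and the 5 µs quantum mirrors the preemption interval mentioned above.

```python
from collections import deque


def fcfs(service_times):
    """Non-preemptive: run each request to completion in arrival order."""
    now, finish = 0.0, []
    for s in service_times:
        now += s
        finish.append(now)
    return finish


def round_robin(service_times, quantum):
    """Preemptive: a request runs for at most `quantum`, then is re-queued."""
    remaining = list(service_times)
    queue = deque(range(len(service_times)))
    finish = [0.0] * len(service_times)
    now = 0.0
    while queue:
        i = queue.popleft()
        run = min(quantum, remaining[i])
        now += run
        remaining[i] -= run
        if remaining[i] > 0:
            queue.append(i)   # preempted: go to the back of the queue
        else:
            finish[i] = now   # completed
    return finish


# One long request followed by three short ones (hypothetical values).
service = [500.0, 1.0, 1.0, 1.0]
fcfs_finish = fcfs(service)
rr_finish = round_robin(service, quantum=5.0)
print(fcfs_finish)
print(rr_finish)
```

Without preemption the short requests wait behind the long one and finish after roughly 500 µs; with a 5 µs quantum they finish within 8 µs, while the long request is delayed only slightly.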
Bio
Kostis is a PhD student in Electrical Engineering at Stanford University, advised by Christos Kozyrakis. He has also worked with David Mazieres and Adam Belay. His research interests lie in the areas of computer systems, cloud computing, and scheduling. Recently, he has been working on end-host preemptive scheduling for μs-scale tail latency. Previously, he completed his Diploma in Electrical and Computer Engineering at the National Technical University of Athens, Greece. There, he worked with Nectarios Koziris and Georgios Goumas on interference-aware VM scheduling. He has also done internships at Google and worked on data mobility at Arrikto.
Abstract
Modern large-scale computer clusters benefit significantly from elasticity. Elasticity allows a cluster to dynamically allocate computing resources based on the user’s fluctuating workload demands. Many cloud providers use threshold-based approaches, which have proven difficult to configure and optimise, while others use reinforcement learning and decision-tree approaches, which struggle to handle large multidimensional cluster states. In this work we use deep reinforcement learning techniques to achieve automatic elasticity. We explore three variants of a deep reinforcement learning agent, called DERP (Deep Elastic Resource Provisioning), that takes as input the current multidimensional state of a cluster and converges to the optimal elasticity behaviour after a finite number of training steps. The system automatically decides on and proceeds with requesting or releasing VM resources from the provider and orchestrating them inside a NoSQL cluster according to user-defined policies and rewards. We compare our agent to state-of-the-art reinforcement learning and decision-tree based approaches in demanding simulation environments and show that it collects up to 1.6 times higher rewards over its lifetime. We then test our approach in a real-life cluster environment and show that the system resizes clusters in real time and adapts its performance across a variety of demanding optimisation strategies, input loads and training loads.
Bio
Constantinos Bitsakos is a graduate of the Electrical and Computer Engineering (ECE) School of NTUA. He has worked in the industry for 5 years as a full stack web developer. He is currently a PhD candidate with the distributed systems research group of CSlab@ECE/NTUA. His current research interests focus on deep reinforcement learning and game theoretic approaches applied to high elasticity in cloud computing.
Abstract
In recent years we have observed the rapid growth of large-scale analytics applications in a wide range of domains, from healthcare infrastructures to traffic management. The high volume of data that needs to be processed has stimulated the development of special-purpose frameworks which handle the data deluge by parallelizing data processing and concurrently using multiple computing nodes. These frameworks differ significantly in the policies they follow to decompose their workloads into multiple tasks and in the way they exploit the available computing resources. As a result, depending on the framework in which applications have been implemented, we observe significant variations in their resource utilization and execution times. Therefore, determining the appropriate framework for executing a big data application is not trivial. In this work we propose Orion, a novel resource negotiator for cloud infrastructures that support multiple big data frameworks such as Apache Spark, Apache Flink and TensorFlow. More specifically, given an application, Orion determines the most appropriate framework to assign it to. Additionally, Orion reserves the required resources so that the application is able to meet its performance requirements. Our negotiator exploits state-of-the-art prediction techniques for estimating the application’s execution time when it is assigned to a specific framework with varying configuration parameters and processing resources. Finally, our detailed experimental evaluation, using practical big data workloads on our local cluster, illustrates that our approach outperforms its competitors.
Bio
Nikolaos Chalvantzis is a graduate of the Electrical and Computer Engineering (ECE) School of NTUA. After bouncing around in the industry for a while, he now works as a PhD candidate with the distributed systems research group of CSlab@ECE/NTUA. His current research interests and publications focus on distributed systems, cloud elasticity and resource provisioning. Nikolaos also holds a degree in Music (String Performance).