3rd Computing Systems Research Day

Schedule

12:45 – 13:45
Christina Delimitrou

Abstract

Cloud computing promises flexibility, high performance, and low cost. Despite its prevalence, most datacenters hosting cloud computing services still operate at very low utilization, posing serious scalability concerns. Low utilization has several causes, the dominant one being overly conservative users who over-reserve resources to avoid the unpredictable performance of multi-tenancy.

A crucial system that can improve the efficiency of cloud infrastructures, while guaranteeing high performance for each submitted application, is the cluster manager: the system that decides where applications are placed and how many resources they receive. In this talk, I will first describe Quasar, a cluster management system that leverages practical ML techniques to quickly determine the type and amount of resources a new cloud application needs to satisfy its quality-of-service constraints. Quasar also introduces a new declarative interface to cluster management, where users express their applications' performance requirements, not their resource requirements, to the system. We have built and deployed Quasar on local clusters as well as production systems, including Twitter's and AT&T's, and shown that it guarantees high application performance while improving system utilization by 2-3x.
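As a concrete illustration of the declarative idea, the sketch below contrasts the two interfaces; the class and field names are hypothetical, chosen for illustration, and are not Quasar's actual API:

```python
from dataclasses import dataclass

@dataclass
class ResourceRequest:
    """Traditional reservation-based interface: the user guesses resources."""
    cores: int
    memory_gb: int

@dataclass
class PerformanceRequest:
    """Quasar-style declarative interface: the user states the QoS constraint
    that matters, and the cluster manager derives the resource allocation."""
    metric: str      # e.g. "p99_latency_ms" or "throughput_qps"
    target: float

# With a reservation-based manager, the user must size the job up front:
reservation = ResourceRequest(cores=16, memory_gb=64)

# With a performance-based manager, the user only states the goal:
qos = PerformanceRequest(metric="p99_latency_ms", target=10.0)
```

The point of the contrast is that the second form leaves the resource decision to the manager, which can then pack workloads more aggressively without violating the stated target.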

Second, I will discuss the security vulnerabilities that cloud multi-tenancy creates, and show how ML techniques similar to those used in Quasar can enable an adversary to extract confidential information about an application and degrade its performance.

Finally, I will briefly discuss the direction in which cloud applications and systems are evolving, and how big data can help us improve the way we design and manage these complex, large-scale systems.

Bio

Christina Delimitrou is an assistant professor of Electrical and Computer Engineering, and Computer Science, at Cornell, working in computer architecture, systems, and applied data mining. She is a member of the Computer Systems Lab and directs the SAIL group at Cornell. She received her PhD in Electrical Engineering from Stanford University. She previously earned an MS in Electrical Engineering, also from Stanford, and a diploma in Electrical and Computer Engineering from the National Technical University of Athens. She is the recipient of a John and Norma Balen Sesquicentennial Faculty Fellowship, a Facebook Research Fellowship, and a Stanford Graduate Fellowship.

Break
14:30 – 15:00
Nikos Pleros
Dptm of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece, http://phos-net.csd.auth.gr/
email: npleros@csd.auth.gr

Abstract

The vast amount of new data being generated is outpacing the development of infrastructure and continues to grow at much higher rates than Moore's law, a problem commonly referred to as the "data deluge". This puts current computing machines under pressure to reach exascale processing by 2020, and energy sets the second, bottom-side constraint: a reasonable power envelope for future supercomputers has been projected at 20 MW, while the world's current No. 2 supercomputer, Sunway TaihuLight, delivers 93 PFlops and already requires 15.37 MW. In other words, we are still below 10% of the exascale performance target, yet we already consume more than 75% of the targeted energy budget!

The current way out follows the paradigm of disaggregating and disintegrating resources, while massively introducing optical technologies for interconnects. Disaggregating compute from memory and storage modules allows flexible, modular settings where the hardware can be tailored to the energy and performance targets of each application. At the same time, optical interconnect and photonic integration technologies are rapidly replacing electrical interconnects, penetrating ever deeper levels of the hierarchy: silicon photonics has brought optical technology into the computing environment, starting from rack-to-rack links and gradually shifting towards board-level communications.

In this talk, we will discuss the main performance and energy challenges currently faced by the computing industry, and we will present our recent research on photonic technologies for resource disaggregation at all hierarchy levels, from rack- through board- down to disintegrated chip-scale computing. We will present high-port-count optical switch layouts with up to 1024×1024 input/output ports that achieve latencies well below the 1 μs target of disaggregated datacenters, and we will overview respective technology advances in electro-optical PCBs and silicon photonic transceiver and routing elements towards on-board resource disaggregation for multi-socket compute nodes. Finally, we will discuss the transfer of resource disaggregation to the chip scale and the deployment of disintegrated computational setups, highlighting how novel photonic network-on-chip technologies and emerging optical RAM and optical cache memory technologies can shape a radically new chip-level computing environment with increased granularity, modularity, performance, and energy efficiency.

This work has been carried out within the frame of the European FP7 projects RAMPLAS and PhoxTrot and the H2020 projects ICT-STREAMS, L3MATRIX and dREDBox.

Bio

Dr. Nikos Pleros joined the faculty of the Department of Informatics, Aristotle University of Thessaloniki, Greece, in September 2007, where he is currently serving as an Assistant Professor. He obtained the Diploma and the PhD Degree in Electrical & Computer Engineering from the National Technical University of Athens (NTUA) in 2000 and 2004, respectively. His research interests include optical interconnect technologies and architectures, photonic integrated circuit technologies, optical technologies for disaggregated data center architectures and high-performance computing, optical RAM memories and optical caches, silicon photonics and plasmonics, optical signal processing, optical switching, and fiber-wireless technologies and protocols for 5G mobile networks. He has more than 220 archival journal publications and conference presentations, including several invited contributions, and his work has been cited more than 2200 times (G&S, h-index=27). He has held positions of responsibility on several major conference committees, including ECOC, OFC, and SPIE Photonics West. Dr. Pleros has coordinated several FP7 and H2020 European projects, including ICT-STREAMS, PlasmoFab, RAMPLAS, PLATON, and 5G-PHOS, and has participated as a partner in more than 10 additional projects. He received the 2003 IEEE Photonics Society Graduate Student Fellowship, granted to 12 PhD candidates worldwide in the field of photonics, and he has supervised the PhD theses of three more Fellowship winners (Dr. D. Fitsios in 2014, Dr. C. Vagionas in 2016, and Dr. P. Maniotis in 2017). Dr. Pleros was also awarded the 15th prize in the Greek Mathematical Olympiad. He is a member of the IEEE Photonics Society and the IEEE Communications Society.

15:00 – 15:30
Christos Kozanitis

Abstract

FPGA- and GPU-based accelerators have recently become first-class citizens in datacenters. Despite their high cost, however, accelerators remain underutilized for long periods of time, as vendors prefer to dedicate them to specific workloads to guarantee QoS. At the same time, accelerator sharing is difficult due to vendor-locked communication paths with software applications. In this work in progress, we modified the agents of Apache Mesos with Vinetalk, an accelerator middleware that abstracts the entire communication path between OS processes and accelerator hardware without adding more than 10% performance overhead. We demonstrate the ease of integrating software applications with GPUs and, in collaboration with ICCS, with FPGA logic. Finally, we show that Vinetalk-enhanced Mesos allows analytics pipelines, such as Apache Spark, to use, for the first time, executors with heterogeneous characteristics.

Bio

Christos Kozanitis is a research collaborator at FORTH-ICS. He received his M.S. and Ph.D. in Computer Science and Engineering from the University of California, San Diego, in 2009 and 2013, respectively. Parts of his PhD work influenced products from companies such as Cisco and Illumina. He also held a two-year postdoctoral appointment at the AMPLab of the University of California, Berkeley, where he used and adapted state-of-the-art big data technologies, such as Apache Spark SQL, Apache Parquet, and Apache Avro, to process large amounts of DNA sequencing data. His current research interests involve improvements at the software, storage, and hardware levels of modern datacenters in order to speed up the processing of big data workloads.

15:30 – 16:00
Dimitrios Siakavaras

Abstract

In this work we introduce RCU-HTM, a technique that combines Read-Copy-Update (RCU) with Hardware Transactional Memory (HTM) to implement highly efficient concurrent Binary Search Trees (BSTs). As in RCU-based algorithms, we perform modifications of the tree structure in private copies of the affected parts of the tree rather than in place. This allows threads that traverse the tree to proceed without any synchronization and without being affected by concurrent modifications. The novelty of RCU-HTM lies in leveraging HTM to permit multiple updating threads to execute concurrently. After appropriately modifying the private copy, we execute an HTM transaction that atomically validates that all affected parts of the tree have remained unchanged since they were read and, only if this validation succeeds, installs the copy in the tree structure.
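The copy-validate-install pattern can be sketched as follows. This is only a schematic illustration on a single shared record, not the BST algorithm itself; and since HTM primitives are not accessible from Python, the atomic validate-and-install step is emulated here with a lock and a version counter (on real HTM hardware, the transaction provides this atomicity without any lock):

```python
import threading

class VersionedNode:
    """A shared record whose version changes on every successful update."""
    def __init__(self, value):
        self.value = value
        self.version = 0

# Stand-in for the hardware transaction; real HTM needs no global lock.
_commit = threading.Lock()

def rcu_htm_update(node, update_fn):
    """Copy-validate-install loop in the spirit of RCU-HTM."""
    while True:
        # 1. Read phase: no synchronization, like an RCU traversal.
        seen_version = node.version
        # 2. Perform the modification on a private copy of the data.
        new_value = update_fn(node.value)
        # 3. "Transaction": atomically validate that what we read has not
        #    changed, and only then install the private copy.
        with _commit:
            if node.version == seen_version:
                node.value = new_value
                node.version += 1
                return
        # Validation failed: a concurrent update won the race; retry.
```

Even with many racing updaters, no increment is lost, because a stale copy always fails validation and is retried.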

We apply RCU-HTM on AVL and Red-Black balanced BSTs and compare their performance to state-of-the-art lock-based, non-blocking, RCU- and HTM-based BSTs. Our experimental evaluation reveals that BSTs implemented with RCU-HTM achieve high performance, not only for read-only operations, but also for update operations. More specifically, our evaluation includes a diverse range of tree sizes and operation workloads and reveals that BSTs based on RCU-HTM outperform other alternatives by more than 18%, on average, on a multi-core server with 44 hardware threads.

Bio

Dimitrios Siakavaras is a Ph.D. candidate at the Computing Systems Laboratory of the National Technical University of Athens (NTUA). His research interests include concurrent programming, concurrent data structures and transactional memory. He received his Diploma in Electrical and Computer Engineering from NTUA in 2012.

Break
16:30 – 17:00
Georgios Alexandridis

Abstract

Review-based recommender systems have become dominant in recent years. In these systems, the traditional user-item ratings matrix is augmented with textual evaluations of the items by the users. In this talk, we are going to explore how this extra information source can be incorporated into matrix factorization algorithms, which constitute the state-of-the-art in recommender systems. More specifically, we will examine a special category of machine learning techniques for text analysis known as neural language models. The talk will conclude with the presentation of some preliminary results of the discussed techniques on reference datasets.
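As background, here is a minimal sketch of the matrix factorization baseline that such review-based models extend: plain SGD over the observed ratings only. Incorporating review text (e.g. via neural language models) would add extra terms to the objective that are not shown here, and all names and hyperparameters are illustrative:

```python
import numpy as np

def factorize(R, mask, k=2, lr=0.02, reg=0.02, epochs=1000, seed=0):
    """Fit R ≈ U @ V.T by stochastic gradient descent on observed cells.

    R    : (n_users, n_items) ratings matrix
    mask : boolean matrix marking which cells are observed
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = 0.1 * rng.standard_normal((n_users, k))   # user latent factors
    V = 0.1 * rng.standard_normal((n_items, k))   # item latent factors
    users, items = np.nonzero(mask)
    for _ in range(epochs):
        for u, i in zip(users, items):
            err = R[u, i] - U[u] @ V[i]           # prediction error
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * U[u] - reg * V[i])
    return U, V
```

Unknown cells of `R` are then predicted as the corresponding entries of `U @ V.T`, which is what makes the factorization usable as a recommender.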

Bio

Georgios Alexandridis is an electrical and computer engineer and a post-doc affiliate of the Intelligent Systems, Content and Interaction Laboratory of the National Technical University of Athens (NTUA). He graduated from the Department of Electrical and Computer Engineering of the University of Patras (major in Telecommunications, minor in Computer Science) and also holds a doctoral degree from the School of Electrical and Computer Engineering of NTUA. His research interests are in the areas of Machine Learning, Artificial Intelligence, and Big Data analysis. His work, related to Recommender Systems, Social Network analysis, and web-server log analysis, has appeared at a number of international conferences and in peer-reviewed journals.

17:00 – 17:30
Vassilis Papaefstathiou

Abstract

Task-based dataflow programming models and runtimes are promising candidates for programming multicore and manycore architectures. These programming models dynamically analyze task dependencies at runtime and schedule independent tasks concurrently on the processing elements. In such models, cache locality and efficient utilization of the on-chip cache resources are critical for performance and energy efficiency. In this talk we will describe a number of combined hardware-software approaches to improve data movement and locality in the cache hierarchy and better utilize the on-chip cache resources. We will also present our recent research activities on interconnects and communication primitives for exascale systems.
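A toy sketch of the dependency-analysis idea (not any particular runtime): each task names the data it reads and writes, the runtime records the last writer of each datum, and a task waits only for the producers of its inputs, so tasks touching disjoint data run concurrently on a thread pool:

```python
from concurrent.futures import ThreadPoolExecutor

class DataflowRuntime:
    """Toy task-dataflow runtime: tracks read-after-write dependencies
    through named data items and runs independent tasks concurrently."""

    def __init__(self, workers=4):
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.last_writer = {}   # datum name -> Future of the task producing it

    def submit(self, fn, reads=(), writes=None):
        # Dependencies discovered at submission time, from declared data use.
        deps = [self.last_writer[d] for d in reads if d in self.last_writer]
        def task():
            for f in deps:      # block until every input has been produced
                f.result()
            return fn()
        fut = self.pool.submit(task)
        if writes is not None:
            self.last_writer[writes] = fut
        return fut
```

For example, two tasks writing `"x"` and `"y"` may run in parallel, while a third task that reads both is held back until they finish.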

Bio

Vassilis Papaefstathiou received his Ph.D. in Computer Science (2013) from the University of Crete. From 2001 to 2003 he worked on IC design and verification at ISD S.A. and collaborated closely with STMicroelectronics on industrial SoC designs. From 2005 to 2013 he was a Research Engineer in the Computer Architecture and VLSI Systems Laboratory at the Institute of Computer Science, FORTH, Greece. From 2014 to 2016 he was a Postdoctoral Researcher at the Computer Science and Engineering Department of Chalmers University of Technology, Sweden. Since September 2016 he has been with FORTH. He has been heavily involved in several EU-funded research projects (EuroEXA, ExaNest, ExaNoDe, ECOSCALE, EuroServer, ERC MECCA, SHARCS, ENCORE, SARC, UNISIX, SIVSS) and has designed several FPGA-based hardware prototypes for multicore architectures and high-performance interconnects. His research interests are in Parallel Computer Architecture, High-Performance Computing, High-Speed Interconnects, Low-Power Datacenter Servers, and Storage Systems, with particular emphasis on cross-layer design and optimization.

17:30 – 18:00
Giannis Giannakopoulos

Abstract

The advent of the Big Data era has given rise to a variety of new architectures targeting increased scalability, robustness, and fault tolerance. At the same time, though, these architectures have complicated application structure, leading to exponential growth of applications' configuration spaces and increased difficulty in predicting their performance.

In this work, we describe a novel, automated profiling methodology that makes no assumptions about application structure. Our approach uses oblique Decision Trees to recursively partition an application's configuration space into disjoint regions, chooses a set of representative samples from each subregion according to a defined policy, and returns a model for the entire space as a composition of linear models over the subregions.
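The partition-and-compose idea can be sketched as follows; for brevity this uses a 1-D configuration space and axis-aligned midpoint splits rather than the oblique splits of the actual methodology, and all names and thresholds are illustrative:

```python
import numpy as np

def build_model(xs, ys, max_err=0.05, min_pts=4):
    """Recursively split the (here 1-D) configuration space until a linear
    model fits each region well enough; returns [(lo, hi, coeffs), ...]."""
    def fit(lo, hi):
        m = (xs >= lo) & (xs <= hi)
        x, y = xs[m], ys[m]
        if len(x) < 2 * min_pts:            # too few samples to keep splitting
            return [(lo, hi, np.polyfit(x, y, 1))] if len(x) >= 2 else []
        coeffs = np.polyfit(x, y, 1)        # linear model for this region
        if np.max(np.abs(np.polyval(coeffs, x) - y)) <= max_err:
            return [(lo, hi, coeffs)]       # good enough: stop here
        mid = (lo + hi) / 2                 # otherwise split and recurse
        return fit(lo, mid) + fit(mid, hi)
    return fit(xs.min(), xs.max())

def predict(model, x):
    """Compose the per-region linear models into one global predictor."""
    for lo, hi, coeffs in model:
        if lo <= x <= hi:
            return np.polyval(coeffs, x)
```

On a performance function with a kink, such as y = |x|, a single linear fit fails the error test, the space is split at the discontinuity in slope, and each half is then captured exactly by its own linear piece.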

An extensive evaluation over real-life applications and synthetic performance functions shows that our scheme outperforms other state-of-the-art profiling methodologies. It particularly excels at capturing abnormalities and discontinuities of the performance function, and it allows the user to steer the sampling policy based on modeling accuracy and space coverage.

Bio

Giannis Giannakopoulos is a Ph.D. candidate at the Computing Systems Laboratory of the National Technical University of Athens (NTUA). His research interests include Large Scale Data Management, Distributed Systems and Cloud Computing. He received his Diploma in Electrical and Computer Engineering from NTUA in 2012.