
Research focus: ETH Zurich/University of Bologna and pAElla-powered data centres

Thu 21 May 2020 | Antonio Libri | Andrea Bartolini | Luca Benini

A collaborative research effort in Italy and Switzerland is pushing the boundaries of the energy-aware and automated data centre

ETH Zurich in Switzerland is one of the most highly regarded science and technology universities, one known for its cutting-edge research and innovation.

When it comes to data centres, the pinnacle of innovation right now centres on how data analytics, sensors and AI can be used to improve power efficiency and performance.

Over the last few years, a group of researchers from both ETH Zurich and the University of Bologna has been at the forefront of advanced data centre monitoring research.

By combining AI, edge computing and low power systems, the group has developed data centre monitoring infrastructures that have been shown to improve energy efficiency, detect power anomalies, and even shore up data centre cyber security via a breakthrough method codenamed pAElla (recent attacks on European HPC systems underline the value of such an application).

The group, headed by Professor Luca Benini, Chair of Digital Circuits and Systems at ETH Zurich and a pioneer in AI-powered holistic monitoring systems for data centres, recently partnered with computing and storage technology innovator E4 Computer Engineering to test its monitoring infrastructure on the beastly D.A.V.I.D.E., a supercomputer in Bologna, Italy, developed by E4.

In this Q&A, Benini and fellow researchers Antonio Libri (ETH Zurich) and Andrea Bartolini (University of Bologna) guide us through the technology that powers their research, the results so far, and what differentiates their system from existing industry efforts.

How long have you been working on your research and what are its long-term aims?

Over the last five years, we have worked on advanced data centre monitoring infrastructures that can monitor and analyse, at the edge, both the performance and the power consumption of the system at fine granularity (microseconds) and with ultra-high precision.

Thanks to the collaboration with our industrial partner E4 Computer Engineering and the Italian supercomputing centre CINECA, we were able to integrate our monitoring infrastructure DiG [1][2][3][4] into D.A.V.I.D.E. [5], a supercomputer that was ranked 18th in the November 2017 Green500 list.

This monitoring infrastructure allowed us to carry out several research studies on anomaly detection and cybersecurity in data centres. For our latest work on data centre cybersecurity, pAElla (Power-AutoEncoder-WeLch for anomaLy and Attacks), our long-term aims are to validate it on a real data centre in production and to integrate it as an advanced feature in the latest-generation processors used in this market segment.

How extensively have you tested your system?

We tested pAElla against 95 malware samples and seven widely used supercomputing applications on a data centre node equipped with DiG. The plan is to carry out this analysis on a large-scale system as well, one that integrates similar technology for advanced monitoring and edge AI analytics.

How does your research differ from other industry efforts that use AI to automate functions and improve energy efficiency?

Standard monitoring systems for supercomputers and data centres measure power and energy consumption at a coarse grain. In the last few years, industry and academic researchers have been pushing towards fine-grain monitoring of power consumption (see BULL-HDEEM, Intel RAPL and DiG) to gain deeper insight into the energy consumption of data centre infrastructure and applications, with the aim of creating energy awareness.
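
Of the fine-grain interfaces mentioned above, Intel RAPL is the one most readers can try on a Linux machine, through the kernel's powercap framework. The minimal sketch below reads a RAPL energy counter twice and converts the difference into average power, handling a single counter wraparound. It rests on assumptions: the zone path `intel-rapl:0` varies between systems, reading `energy_uj` may require root on recent kernels, and the helper names are hypothetical. It only illustrates reading a software-visible energy counter; it is not how DiG acquires its high-resolution measurements.

```python
import time
from pathlib import Path

# Assumed powercap zone for CPU package 0; zone names differ between machines.
RAPL_ZONE = Path("/sys/class/powercap/intel-rapl:0")


def energy_delta_uj(before_uj, after_uj, max_range_uj):
    """Energy in microjoules between two counter reads, allowing one wraparound."""
    if after_uj >= before_uj:
        return after_uj - before_uj
    # The counter reached max_energy_range_uj and restarted from zero.
    return (max_range_uj - before_uj) + after_uj


def average_power_w(before_uj, after_uj, max_range_uj, interval_s):
    """Average power in watts over an interval of interval_s seconds."""
    return energy_delta_uj(before_uj, after_uj, max_range_uj) / 1e6 / interval_s


def read_uj(zone, name):
    """Read one integer-valued powercap attribute (hypothetical helper)."""
    return int((zone / name).read_text())


def sample_package_power(interval_s=0.1, zone=RAPL_ZONE):
    """Read the energy counter twice and return the average package power (W)."""
    max_range = read_uj(zone, "max_energy_range_uj")
    e0, t0 = read_uj(zone, "energy_uj"), time.monotonic()
    time.sleep(interval_s)
    e1, t1 = read_uj(zone, "energy_uj"), time.monotonic()
    return average_power_w(e0, e1, max_range, t1 - t0)
```

On a Linux host with RAPL support, `sample_package_power()` returns one averaged reading. Keeping the wraparound arithmetic in pure functions means it can be checked without the hardware.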

With DiG we show for the first time that it is possible to build a cost-effective monitoring system for high-resolution power and performance measurements that is capable of edge AI and can be integrated into large-scale data centres. Moreover, we show for the first time that these high-resolution measurements allow us to deploy advanced real-time solutions for anomaly detection and cybersecurity that would not be viable otherwise.

This is a clear distinction from the state of the art and from industrial best practices, which have proposed fine-grain power monitoring only for energy awareness. We are the first to prove that it is also helpful for anomaly detection and cybersecurity, which is a hot topic in ICT today.

In particular, focusing on cybersecurity, past research has shown how to use performance measurements for malware detection. In pAElla we show that measuring power consumption at very fine granularity and combining it with advanced AI methods allows us to cover a wider range of malware with high accuracy.

Moreover, in another paper we show how we can carry out online anomaly detection in data centres using power and performance measurements. Finally, in the last few years, we have also developed advanced solutions to improve the energy efficiency of data centres.

How can the sensors and the data produced be directed towards cybersecurity?

In the past, power measurements were thought to be useful only for evaluating solutions aimed at increasing energy efficiency. Over the last few years, our research group has developed high-resolution monitoring systems, hand in hand with holistic monitoring systems, integrated into large-scale data centres and supercomputers. Building on this work, in pAElla we show that these sensors can detect perturbations in power consumption when specific patterns of software (e.g., malware) run on the system. Using AI, we can then discriminate these patterns from the system's healthy activity.
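
The "WeLch" in pAElla's name presumably refers to Welch's method for estimating a power spectral density (PSD), a standard way to expose periodic perturbations in a sampled signal. As a hedged illustration of how a high-rate power trace could be turned into spectral features for the AI stage, the sketch below implements Welch's method in plain Python: overlapping segments, per-segment mean removal, a Hann window, and averaged periodograms. The sampling rate, segment length, overlap and the synthetic 128 Hz trace are illustrative assumptions, not DiG's or pAElla's actual parameters; in practice `scipy.signal.welch` offers an optimised implementation of the same idea.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def welch_psd(x, fs, nperseg=64, noverlap=32):
    """One-sided PSD of x via Welch's method (density scaling, units^2/Hz)."""
    if len(x) < nperseg:
        raise ValueError("signal is shorter than one segment")
    if not 0 <= noverlap < nperseg:
        raise ValueError("noverlap must be in [0, nperseg)")
    step = nperseg - noverlap
    win = hann(nperseg)
    scale = 1.0 / (fs * sum(w * w for w in win))
    nfreq = nperseg // 2 + 1
    psd = [0.0] * nfreq
    nseg = 0
    for start in range(0, len(x) - nperseg + 1, step):
        seg = x[start:start + nperseg]
        mean = sum(seg) / nperseg
        seg = [(v - mean) * w for v, w in zip(seg, win)]  # detrend, then window
        for k in range(nfreq):
            # Naive O(n^2) DFT bin; a real implementation would use an FFT.
            coef = sum(v * cmath.exp(-2j * math.pi * k * n / nperseg)
                       for n, v in enumerate(seg))
            p = abs(coef) ** 2 * scale
            # One-sided spectrum: double all bins except DC and an even-length Nyquist bin.
            if k > 0 and not (nperseg % 2 == 0 and k == nperseg // 2):
                p *= 2
            psd[k] += p
        nseg += 1
    freqs = [k * fs / nperseg for k in range(nfreq)]
    return freqs, [p / nseg for p in psd]  # average the periodograms


# Hypothetical power trace: a 128 Hz perturbation sampled at 1024 Hz.
fs = 1024
trace = [math.sin(2 * math.pi * 128 * n / fs) for n in range(1024)]
freqs, psd = welch_psd(trace, fs)
peak_hz = freqs[psd.index(max(psd))]
```

Averaging over many short, windowed segments trades frequency resolution for a lower-variance estimate, which is what makes the resulting spectrum (or its logarithm) a stable feature vector for a downstream model.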

What kind of AI are we talking about here?

Our group, which is headed by Prof. Luca Benini and involves researchers at both ETH Zurich and the University of Bologna, is a pioneer in combining AI and holistic monitoring systems for data centres.

In particular, in AI analytics for data centres we are at the cutting edge in the use of semi-supervised deep learning techniques (e.g., AutoEncoders), which have proved robust and capable of handling a wide range of anomalies at both the hardware and the software level of the computing nodes (e.g., in cybersecurity).
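
In anomaly detection, "semi-supervised" usually means the model is trained only on healthy activity and then flags inputs it cannot reconstruct well. The sketch below illustrates that idea with the simplest possible autoencoder: a linear model with a one-unit bottleneck, trained by plain stochastic gradient descent in pure Python. It is a deliberately reduced stand-in for the deep AutoEncoders described here, not the group's model; the synthetic 3-dimensional feature vectors, the hyperparameters and the threshold rule (twice the worst healthy reconstruction error, via the hypothetical `fit_threshold`) are all assumptions.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class LinearAutoencoder:
    """Minimal linear autoencoder: dim inputs -> 1 hidden unit -> dim outputs."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.enc = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
        self.dec = [rng.uniform(-0.5, 0.5) for _ in range(dim)]

    def reconstruct(self, x):
        z = dot(self.enc, x)  # one-dimensional bottleneck code
        return [d * z for d in self.dec]

    def error(self, x):
        """Mean squared reconstruction error for one feature vector."""
        return sum((r - v) ** 2 for r, v in zip(self.reconstruct(x), x)) / len(x)

    def fit(self, data, epochs=100, lr=0.01, seed=0):
        """Plain SGD on squared error, using healthy samples only."""
        rng = random.Random(seed)
        data = list(data)
        for _ in range(epochs):
            rng.shuffle(data)
            for x in data:
                z = dot(self.enc, x)
                residual = [d * z - v for d, v in zip(self.dec, x)]
                back = dot(self.dec, residual)  # gradient w.r.t. z (factor 2 folded into lr)
                # Update decoder and encoder simultaneously from the old weights.
                self.dec, self.enc = (
                    [d - lr * r * z for d, r in zip(self.dec, residual)],
                    [w - lr * back * v for w, v in zip(self.enc, x)],
                )
        return self


def fit_threshold(model, healthy, margin=2.0):
    """Hypothetical rule: anything worse than margin x the worst healthy error is anomalous."""
    return margin * max(model.error(x) for x in healthy)


# Synthetic "healthy" feature vectors lying close to a single direction.
rng = random.Random(42)
direction = [1.0, 2.0, 1.0]
healthy = []
for _ in range(200):
    t = rng.uniform(-1.0, 1.0)
    healthy.append([t * d + rng.gauss(0.0, 0.01) for d in direction])

model = LinearAutoencoder(dim=3).fit(healthy)
threshold = fit_threshold(model, healthy)
normal_score = model.error([0.5, 1.0, 0.5])   # follows the healthy pattern
attack_score = model.error([2.0, -1.0, 0.0])  # orthogonal to the healthy pattern
```

Because the model never sees anomalous samples during training, it needs no labelled malware; the price is that the detection threshold has to be calibrated on healthy data alone.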

What hardware did you use and why?

We developed DiG, which exploits open and cost-effective solutions and can easily be integrated into different data centre architectures (we tested it on Intel, ARM and IBM-based nodes) and into systems in production (e.g., D.A.V.I.D.E.).

DiG is the first monitoring system for data centres that allows edge AI analytics on high-resolution measurements (a 50x improvement in time resolution with respect to state-of-the-art solutions, and ultra-precise measurement synchronisation, with an error below 1%).

It also supports processing and inference at the edge, which is essential for real-time, high-frequency analysis in scalable systems. These fundamental advantages are what made this kind of study possible.

What sort of edge hardware did you leverage?

Our group is at the cutting edge of research and development of advanced processors that combine AI, edge computing and low power, for applications ranging from IoT to data centres. In the specific case of DiG, we use a widely adopted IoT device, the BeagleBone Black, because it is open and, when we started the project five years ago, it was the best match for the features we needed out of the box.

How close are we to a truly energy-aware and automated data centre?

Technologies for energy-aware systems are already available on the market, and several research groups and companies are now working on standardising the approaches and on more sophisticated algorithms (e.g., PowerStack). Automated data centres are a very hot topic that still requires further effort from the research community, as well as from leading data centre vendors and owners.

What are your main focuses for 2020?

Across ETH Zurich and the University of Bologna, our group is carrying out several studies on energy efficiency, anomaly detection and cybersecurity in data centres as part of different projects.

  • Find out more about Luca Benini, Antonio Libri and Andrea Bartolini below

Experts featured:

Antonio Libri

Doctorate
D-ITET, ETH Zurich

Andrea Bartolini

Assistant professor
Department of Electrical, Electronic and Information Engineering (DEI), University of Bologna

Luca Benini

Professor of Digital Circuits and Systems
ETH Zurich
