

Urania Berlin, June 6-7, 2011

ANALYZING THE INTERNET IN REAL-TIME USING HADOOP AND HBASE

Friso is Xebia's principal in the Netherlands on all things NoSQL, focusing on Hadoop and HBase for handling large volumes of data. He has a background in software architecture aimed at building software that is sufficiently scalable, performant and, above all, working. He draws on more than ten years of hands-on experience behind the keyboard when teaching and speaking. Recent buzz: Hadoop, HBase, nodeJS, HTML5.

What happens to the internet when Egypt decides to switch off its part of it? How long does it take for the internet to route traffic around broken cables? Only data can tell... The global internet has grown into a complex network. Understanding its actual topology and operation requires a substantial amount of data gathered from measurements taken around the world.

In this talk I will present a project carried out at the European internet number registry and network coordination center (RIPE NCC), in which we used Hadoop and HBase to provide near real-time insight into the topology and operation of the global internet, based on millions of data points per day collected from several hundred routers around the world. The system can also hold ten years' worth of historical measurement data and make it available for live querying from a web application.

I will give an overview of the project background and the system, and discuss the important lessons we learned when deploying Hadoop and HBase in a system that must cope with a continuous write load while maintaining query performance.
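The abstract does not describe the schema used at RIPE NCC. As a hypothetical illustration of the tension between a continuous write load and query performance, a common HBase pattern for time-series data is a composite row key with a small hash-derived bucket prefix (often called salting). The prefix spreads concurrent writes for different resources across key ranges, while all rows for one resource stay contiguous and time-ordered, so a time-window query becomes a single range scan. The Python sketch below only builds the byte keys and simulates HBase's lexicographic row ordering with `sorted`; the names `make_row_key`, `scan_range` and `NUM_BUCKETS`, and the example resources, are illustrative assumptions, not part of any HBase client API.

```python
import hashlib
import struct

# Hypothetical bucket count; in practice it might be sized to the cluster's region servers.
NUM_BUCKETS = 16


def make_row_key(resource: str, timestamp: int, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Build a row key laid out as: bucket byte | resource | 0x00 | 8-byte timestamp."""
    resource_bytes = resource.encode("utf-8")
    # The bucket is derived from the resource only, so all rows for one resource
    # share a prefix, while different resources are spread over num_buckets ranges.
    # md5 is used instead of hash() because Python's str hash is randomized per process.
    bucket = hashlib.md5(resource_bytes).digest()[0] % num_buckets
    # Big-endian unsigned 64-bit timestamps sort byte-wise in chronological order.
    # The 0x00 separator (assuming resource names never contain NUL) keeps all
    # "AS1" rows sorted before any "AS11" rows that share the same bucket.
    return bytes([bucket]) + resource_bytes + b"\x00" + struct.pack(">Q", timestamp)


def scan_range(resource: str, start_ts: int, stop_ts: int,
               num_buckets: int = NUM_BUCKETS) -> tuple[bytes, bytes]:
    """Return (start_row, stop_row) for one contiguous scan; stop_row is exclusive."""
    return (make_row_key(resource, start_ts, num_buckets),
            make_row_key(resource, stop_ts, num_buckets))


if __name__ == "__main__":
    # Simulate HBase's lexicographic row ordering with a sorted list of keys.
    rows = sorted(make_row_key(r, t)
                  for r in ["AS3333", "AS1", "AS11"]
                  for t in [100, 200, 300])
    start, stop = scan_range("AS3333", 150, 301)
    hits = [key for key in rows if start <= key < stop]
    print(len(hits), "rows for AS3333 in [150, 301)")
```

A query for one resource over a time window then maps onto the start and stop rows of a single HBase `Scan`. The trade-off of bucketing by resource is that all writes for one very active resource still land in a single key range; bucketing by time instead would spread those writes but force a query to scan every bucket.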

Additional information, from an internal presentation, is available here: https://www.slideshare.net/fvanvollenhoven/hadoop-hbase-project-ripe-ncc
An example use case of the system is here: https://stat.ripe.net/egypt