Analyzing the internet in real-time using Hadoop and HBase
What happens to the internet when Egypt decides to switch off their part of it? How long does it take for the internet to route traffic around broken cables? Only data can tell... The global internet has grown into a complex network. Understanding its actual topology and operation requires a substantial amount of data gathered by measurements around the world.
In this talk I will present a project carried out at the European internet numbers registrar and network coordination center (RIPE NCC) in which we used Hadoop and HBase to provide near real-time insight into the operation and topology of the global internet, using millions of data points per day collected from several hundred routers around the world. The system can also hold ten years' worth of historical measurement data, available for live querying from a web application.
I will give an overview of the background and the system, and discuss important lessons we learned when deploying Hadoop and HBase in a system that needs to cope with a continuous write load while maintaining query performance.
Some additional information, used for an internal presentation, is here: https://www.slideshare.net/fvanvollenhoven/hadoop-hbase-project-ripe-ncc
An example use case of the system is here: https://stat.ripe.net/egypt