
You are currently visiting an old archive website with limited functionality. If you are looking for the current Berlin Buzzwords website, please visit https://berlinbuzzwords.de

Urania Berlin, June 6-7, 2011

Urania Loft

NoSQL: Past, Present, and Future

Location: 
Loft
Date and time: 
Mon, 2011-06-06 11:00 - 11:40
Speaker: 
Mathias Meyer

Two years of NoSQL, two years of a roller coaster ride through new ways of scaling, high availability, and storing data. The whole idea of NoSQL started out as a differentiator, as a way to draw a clear line between new and traditional databases.

But how far apart are they really, two years after the term NoSQL was coined? It's high time to look at the past and into the future: what's in store for NoSQL, where we're headed with databases, why high scale and big data aren't everything we should strive for, and why NoSQL must die so that databases can live.

Wrap Your SQL Head Around Riak & MapReduce

Location: 
Loft
Date and time: 
Mon, 2011-06-06 13:30 - 14:10
Speaker: 
Sean Cribbs

"NoSQL is awesome! I need to use it on my next project!" ... [hours later] ... "How the heck do I get my data out of this thing?!"

Sound familiar? Non-relational data storage solutions (NoSQL) promise all kinds of benefits -- scalability, flexibility, fault-tolerance -- but (by the nature of the moniker) don't have SQL to query with. Riak is one such solution: a distributed key-value store that implements MapReduce for querying and has some awesome client libraries for JavaScript and Node.js.

From content storage to scaling smart data

Location: 
Loft
Date and time: 
Mon, 2011-06-06 14:20 - 15:00
Speaker: 
Steven Noels

A presentation on the Lily roadmap, and on how content storage with HBase and indexing/search with Solr evolve into a smart data management system with audience data analytics and data augmentation through recommendation engines.

Scaling Big Data Search with Solr and HBase

Location: 
Loft
Date and time: 
Mon, 2011-06-06 15:20 - 16:00
Speaker: 
Rod Cope

HBase can easily store terabytes of data, but how do you scale your search mechanism to sift through these mountains of bits and retrieve large result sets in a matter of milliseconds? We used a combination of Solr sharding, careful index creation, and result pruning to meet these strict requirements in our production environment. Come see how we handle millions of rapid-fire queries from dozens of parallel search clients against many terabytes of data while addressing high availability through load balancing and replication.

Common MapReduce Patterns

Location: 
Loft
Date and time: 
Mon, 2011-06-06 16:10 - 16:50
Speaker: 
Chris Wensel

In this talk I will introduce the MapReduce model and discuss in some depth the most common patterns seen in Hadoop MapReduce applications, including joins, secondary sorting, and partial aggregations.
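
As a rough illustration of one pattern named in this abstract, partial aggregation, here is a minimal sketch in plain Python rather than Hadoop code. It is not taken from the talk; the function names and the word-count task are assumptions chosen for illustration. Each simulated mapper sums counts locally before emitting, so fewer pairs reach the shuffle step.

```python
from collections import Counter, defaultdict


def map_with_partial_aggregation(lines):
    """Simulated mapper: count words locally and emit one pair per distinct word."""
    local_counts = Counter()
    for line in lines:
        for word in line.lower().split():
            local_counts[word] += 1
    return list(local_counts.items())


def shuffle(mapper_outputs):
    """Simulated shuffle: group the partial counts from all mappers by key."""
    grouped = defaultdict(list)
    for pairs in mapper_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Simulated reducer: sum the partial counts for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(splits):
    """Run the three phases over a list of input splits (hypothetical helper)."""
    mapper_outputs = [map_with_partial_aggregation(split) for split in splits]
    return reduce_counts(shuffle(mapper_outputs))


# Two input splits, each handled by its own simulated mapper.
splits = [["hadoop hbase hadoop"], ["hbase solr", "hadoop"]]
result = word_count(splits)
print(result)
```

The first mapper emits a single pair for "hadoop" with a count of 2 instead of two pairs with a count of 1, which is the saving that partial aggregation aims for.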

Highly Available Data Storage & Processing for Socorro

Location: 
Kleistsaal
Date and time: 
Mon, 2011-06-06 16:10 - 16:50
Speaker: 
Daniel Einspanjer

The Mozilla Metrics team in their role as data services for Mozilla is developing a new data storage and processing infrastructure for the Socorro crash report system.

ZooKeeper - the unsung hero

Location: 
Einsteinsaal
Date and time: 
Mon, 2011-06-06 17:00 - 17:40
Speaker: 
Thomas Koch

ZooKeeper started as an internal Yahoo project. After being contributed as free software to the Apache Software Foundation, it became a critical building block of well-known systems like HBase, Cassandra, Neo4j and Solr, and of many lesser-known but critical systems at LinkedIn, Yahoo, Facebook and Rackspace. Despite being so useful and important, most people are not aware of the hard-working ZooKeeper.

ZooKeeper helps you to supervise the availability of processes (heartbeat), to implement distributed locking, leader election, central configuration and reliable messaging.
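
To make the leader election idea more concrete, the following is a minimal in-memory sketch in Python. It does not use ZooKeeper or any ZooKeeper client library; the class `ElectionSketch` and its methods are hypothetical names. It only mimics the commonly described recipe in which every candidate registers under an increasing sequence number and the candidate holding the lowest live number is the leader.

```python
import itertools


class ElectionSketch:
    """In-memory stand-in for sequence-numbered candidate nodes (not a ZooKeeper API)."""

    def __init__(self):
        self._next_seq = itertools.count()
        self._candidates = {}  # sequence number -> candidate name

    def join(self, name):
        """Register a candidate and return its sequence number."""
        seq = next(self._next_seq)
        self._candidates[seq] = name
        return seq

    def leave(self, seq):
        """Remove a candidate, as would happen when its session ends."""
        self._candidates.pop(seq, None)

    def leader(self):
        """The candidate with the lowest live sequence number leads."""
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]


election = ElectionSketch()
a = election.join("worker-a")
b = election.join("worker-b")
c = election.join("worker-c")
first_leader = election.leader()

# When the current leader disappears, the next-lowest candidate takes over.
election.leave(a)
second_leader = election.leader()
print(first_leader, second_leader)
```

In a real deployment the registration, ordering and failure detection would be provided by the coordination service; the sketch only shows the selection rule.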

Analyzing the internet in real-time using Hadoop and HBase

Location: 
Loft
Date and time: 
Tue, 2011-06-07 11:00 - 11:40

What happens to the internet when Egypt decides to switch off their part of it? How long does it take for the internet to route traffic around broken cables? Only data can tell... The global internet has grown into a complex network.

Making Hadoop Secure

Location: 
Loft
Date and time: 
Tue, 2011-06-07 11:50 - 12:30
Speaker: 
Devaraj Das

Until recently, Hadoop would trust any user to be who they claimed to be. This is clearly not enough in large companies that run Hadoop instances storing sensitive data (such as financial or revenue data), and where these instances are used by many users, potentially from different groups. In this talk, I will cover the security threats in Hadoop across its various communication paths (in the Hadoop Distributed File System, MapReduce, and the client components), and I will present the solutions we designed for each of them.

Web Scale Crawling with Apache Nutch

Location: 
Loft
Date and time: 
Tue, 2011-06-07 13:30 - 13:45
Speaker: 
Julien Nioche

This talk will give an overview of Apache Nutch. I will describe its main components and how it fits with other Apache projects such as Hadoop, Lucene, Solr, Tika and HBase. The presentation will contain examples of real-world use cases.

The second part of the presentation will focus on the latest developments in Nutch and the changes introduced by the forthcoming version 2.0.
