Loft (Blue)

NoSQL: Past, Present, and Future

Location:

Loft

Date and time:

Mon, 2011-06-06 11:00 - 11:40

Speaker:

Mathias Meyer

Two years of NoSQL, two years of a roller coaster ride through new ways of scaling, high availability, and storing data. The whole idea of NoSQL started out as a differentiator, as a way to draw a clear line between new and traditional databases.

But how far apart are they really, two years after the term NoSQL was coined? It's high time to look at the past and into the future: what's in store for NoSQL, where we're headed with databases, why high scale and big data aren't everything we should strive for, and why NoSQL must die so that databases can live.

Scaling with MongoDB

Location:

Loft

Date and time:

Mon, 2011-06-06 11:50 - 12:30

Speaker:

Mathias Stearn

For applications that outgrow the resources of a single database server, MongoDB can convert to a sharded cluster, automatically managing failover and balancing of nodes, with few or no changes to the original application code. This talk starts by discussing when to shard and continues on to describe MongoDB's sharding architecture. We'll describe how to configure a sharded cluster and provide several example topologies. We'll also give some advice on schema design for sharding and how to pick the best shard key.
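
As a rough illustration of why the choice of shard key matters, the sketch below compares range-based and hash-based routing for monotonically increasing keys (such as timestamps or auto-incrementing IDs). It is a generic simulation, not MongoDB's implementation: the function names, the shard count, and the range boundaries are all hypothetical.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size


def range_shard(key, boundaries):
    """Route by key range: shard i holds keys below boundaries[i]."""
    for shard, upper in enumerate(boundaries):
        if key < upper:
            return shard
    return len(boundaries)  # the last shard holds every remaining key


def hashed_shard(key, num_shards=NUM_SHARDS):
    """Route by a hash of the key, which scatters neighbouring keys."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(keys, route):
    """Count how many keys land on each shard under a routing function."""
    return Counter(route(k) for k in keys)


# Newly inserted, ever-increasing keys (hypothetical workload).
new_keys = range(1000, 1100)
boundaries = [250, 500, 750]  # hypothetical split points for 4 shards

by_range = distribution(new_keys, lambda k: range_shard(k, boundaries))
by_hash = distribution(new_keys, hashed_shard)

print("range-based:", dict(by_range))
print("hash-based: ", dict(by_hash))
```

With range-based routing, every new key in this workload lands on the last shard, so one node absorbs all writes. Hashing spreads the writes, at the cost of turning range queries into requests that touch every shard.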

Wrap Your SQL Head Around Riak & MapReduce

Location:

Loft

Date and time:

Mon, 2011-06-06 13:30 - 14:10

Speaker:

Sean Cribbs

"NoSQL is awesome! I need to use it on my next project!" ... [hours later] ... "How the heck do I get my data out of this thing?!"

Sound familiar? Non-relational data storage solutions (NoSQL) promise all kinds of benefits -- scalability, flexibility, fault-tolerance -- but (by the nature of the moniker) don't have SQL to query with. Riak is one such solution, a distributed key-value store that implements MapReduce for querying, and has some awesome client libraries for JavaScript and Node.js.

From content storage to scaling smart data

Location:

Loft

Date and time:

Mon, 2011-06-06 14:20 - 15:00

Speaker:

Steven Noels

A presentation on the Lily roadmap, and how content storage with HBase and indexing/search with Solr evolve into a smart data management system with audience data analytics and data augmentation through recommendation engines.

Scaling Big Data Search with Solr and HBase

Location:

Loft

Date and time:

Mon, 2011-06-06 15:20 - 16:00

Speaker:

Rod Cope

HBase can easily store terabytes of data, but how do you scale your search mechanism to sift through these mountains of bits and retrieve large result sets in a matter of milliseconds? We used a combination of Solr sharding, careful index creation, and result pruning to meet these strict requirements in our production environment. Come see how we handle millions of rapid-fire queries from dozens of parallel search clients against many terabytes of data while addressing high availability through load balancing and replication.

Common MapReduce Patterns

Location:

Loft

Date and time:

Mon, 2011-06-06 16:10 - 16:50

Speaker:

Chris Wensel

In this talk I will introduce the MapReduce model and discuss in some depth the most common patterns seen in Hadoop MapReduce applications, including Joins, Secondary Sorting, and Partial Aggregations.
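
To make one of these patterns concrete, here is a minimal in-process sketch of partial aggregation for a word count. It simulates the map, shuffle, and reduce phases in plain Python rather than using Hadoop's API, and all function names are hypothetical. Each mapper sums counts within its own input split before emitting, so far fewer key/value pairs have to cross the shuffle.

```python
from collections import Counter, defaultdict


def map_with_partial_aggregation(split):
    """Mapper: pre-sum word counts within one input split before emitting."""
    partial = Counter()
    for line in split:
        partial.update(line.lower().split())
    return list(partial.items())  # one (word, count) pair per distinct word


def shuffle(mapper_outputs):
    """Group the emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for output in mapper_outputs:
        for key, value in output:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reducer: sum the partial counts for each word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(splits):
    """Run the simulated job over a list of input splits."""
    outputs = [map_with_partial_aggregation(split) for split in splits]
    return reduce_counts(shuffle(outputs))


# Two hypothetical input splits, each a list of text lines.
splits = [["a b a"], ["b b c"]]
print(word_count(splits))
```

Without the partial sum, the first split would emit three pairs instead of two. On real data with many repeated keys, the reduction in shuffle volume is usually much larger.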

Hadoop: A Reality Check

Location:

Loft

Date and time:

Mon, 2011-06-06 17:00 - 17:40

Speaker:

Stefan Groschupf

There’s a huge amount of interest in and use of Hadoop for processing large amounts of data, given its scalability and cost/performance. It’s great for a lot of Big Data needs, but it was never designed to replace an RDBMS. Hadoop is batch-oriented, it doesn’t support queries, and its environment can best be described as “ideal” for programmers but abysmal for business users.

Analyzing the internet in real-time using Hadoop and HBase

Location:

Loft

Date and time:

Tue, 2011-06-07 11:00 - 11:40

Speaker:

Friso van Vollenhoven

What happens to the internet when Egypt decides to switch off its part of it? How long does it take for the internet to route traffic around broken cables? Only data can tell... The global internet has grown into a complex network.

Making Hadoop Secure

Location:

Loft

Date and time:

Tue, 2011-06-07 11:50 - 12:30

Speaker:

Devaraj Das

Hadoop, until recently, would trust any user to be whoever they claimed to be. This is clearly not enough in large companies that have Hadoop instances storing sensitive data (such as financial and revenue data), and where these instances are used by many users, potentially from different groups. In this talk, I will cover the security threats in Hadoop in the various communication paths (in the Hadoop Distributed File System, MapReduce, and the client components). I will present the solutions we designed for each of them.

Web Scale Crawling with Apache Nutch

Location:

Loft

Date and time:

Tue, 2011-06-07 13:30 - 13:45

Speaker:

Julien Nioche

This talk will give an overview of Apache Nutch. I will describe its main components and how it fits with other Apache projects such as Hadoop, Lucene, Solr, Tika, and HBase. The presentation will contain examples of real-world use cases.

The second part of the presentation will focus on the latest developments in Nutch and the changes introduced by the forthcoming version 2.0.

Berlin Buzzwords 2011 is a conference for developers and users of open source software projects, focussing on the issues of scalable search, data analysis in the cloud, and NoSQL databases. Berlin Buzzwords presents more than 30 talks and presentations by international speakers, organized around the three tags "search", "store" and "scale".