Composing Mahout clustering jobs
Clustering is a popular technique for analyzing and understanding large corpora, and it is a key feature of services such as Google News. This talk introduces clustering and explains how it is implemented in Mahout. It then shows, step by step, how to compose a sequence of Mahout jobs in Java to cluster text. It also shows how to tweak your chain of Mahout jobs and how those changes affect the clustering results. The talk is suitable for people with some experience with Hadoop and perhaps Mahout. No prior knowledge of clustering is required.