Making Hadoop Secure
Until recently, Hadoop trusted every user to be whoever they claimed to be. This is clearly not enough for large companies whose Hadoop instances store sensitive data (such as financial and revenue data) and are shared by many users, potentially from different groups. In this talk, I will cover the security threats along Hadoop's various communication paths: in the Hadoop Distributed File System (HDFS), in MapReduce, and in the client components. I will present the solutions we designed for each of them. I will also briefly cover the security solutions for external services, such as Oozie and HDFSProxy, that communicate with Hadoop.