Hadoop, is that a kids cartoon?

February 7
blog author


Solutions Architect

Hadoop, Pig, Hive, no no I am not talking about kids cartoons here. These are the names of some of the projects in Hadoop ecosystem. Hmm there I said Hadoop again. So, what exactly is Hadoop and why should you be interested in it?

Let’s first look at, what is Hadoop? Hadoop is an open source project under Apache Software foundation. It was built on the idea of reliable, scalable, distributed computing. It can store and process large volumes of data (in petabytes) on commodity hardware.

The two most important projects under Hadoop are:

  • MapReduce - A framework which enables you to process a large volume of data.
  • Hadoop Distributed File System (HDFS) - A distributed file system that ensures reliability by replicating data.

Apart from these two projects, Hadoop has its own ecosystem of Apache projects like Pig, Hive, HBase, Zookeeper and more.

Hadoop is mainly designed to solve big complex data problems. The underlying technology idea of Hadoop was formulated at Google where they were doing the complex task of indexing the web. Such data would not fit into a relational database system like MySql or Oracle. Hence they designed a technology that would scale linearly and be able to store and process petabytes of data.

To summarize, the main characteristics of Hadoop are:

  • It is reliable because data is held on multiple nodes.
  • It is scalable and it scales linearly.
  • It has simple API’s to work with.
  • It is cost effective and can run on commodity hardware.

Now let's look at, why should you be interested in Hadoop? Applications of Hadoop are varied and many. They range from using it for fraud detection and prevention by credit card companies to predictive modeling for new drugs. Following are some examples:

  • Chevron uses Hadoop to process large amount of data which will help them identify reservoirs of oil.
  • Facebook uses Hadoop to process information that is generated by all its users to make decisions about improvements to the product.
  • UC Irvine Medical Center uses to Hadoop to improve the quality of care.

So, as you see above there are many useful applications for Hadoop. There are companies out there who are sitting on a vast amount of data not knowing how to use it to their advantage. Also, there are companies out there who know what kind of information they would like to extract by processing that data but the processing power required is huge so they are held back. Well, they don’t need to hold back any more. Hadoop has matured quite a bit and they can use it now. Contact us today if you need help setting up Hadoop in your organization.