Sunday, April 27, 2014

What's Hadoop?

HADOOP SIMPLIFIED


Since Hadoop is creating quite a buzz in the developer community and in the industry as well, I couldn't stop myself from writing about it. Though this topic can include multiple high-speed bouncers that can go over your head, I will try to keep it very simple and make sure you get a basic idea of Hadoop.

What is Hadoop?

To be very frank, Hadoop is all about parallel computing on clusters of commodity hardware. The basic idea is that instead of having a few costly super minicomputers/mainframes that do all the processing and data storage, it's better to have clusters of less powerful computers and use parallel computing. Truly speaking, these super minicomputers/mainframes are not cost effective. You pay a great deal for them, and all you may get is 10-50 times more processing power. You also have to rely on these big machines for just about everything. What if they fail or crash?

This is where Hadoop comes to the rescue. Hadoop is an open source software framework for storage and large-scale processing of data on clusters of commodity hardware, i.e. many less powerful computers connected together. Here you get better performance at a lower cost. Also, if one of the nodes fails, its job can be given to another node in almost no time. So the job of the Hadoop framework is to handle these problems of resource management, storing and replicating data, data processing, and so on.
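To make the "give the job to another node" idea concrete, here is a toy sketch in plain Python. It is not Hadoop code and does not use any Hadoop API; the function name `reassign_tasks` and the node and task names are made up for illustration. Real Hadoop does this inside the framework with far more machinery.

```python
def reassign_tasks(assignments, failed_node):
    """Toy model: move every task from failed_node to the least-loaded survivor.

    `assignments` maps a (hypothetical) node name to its list of task names.
    """
    # Copy the task lists of the surviving nodes so the input is not modified.
    survivors = {n: list(t) for n, t in assignments.items() if n != failed_node}
    if not survivors:
        raise RuntimeError("no surviving nodes to take over the work")
    for task in assignments.get(failed_node, []):
        # Pick the node that currently has the fewest tasks.
        target = min(survivors, key=lambda n: len(survivors[n]))
        survivors[target].append(task)
    return survivors


if __name__ == "__main__":
    cluster = {"node1": ["task-a", "task-b"], "node2": ["task-c"], "node3": []}
    print(reassign_tasks(cluster, "node1"))
```

The point is only that no single machine is special: when one drops out, its work is spread over the rest.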

Hadoop handles all of this through four modules:

Hadoop Common - Libraries and utilities needed by the other Hadoop modules.

Hadoop Distributed File System (HDFS) - A distributed file system that stores data on commodity machines.

Hadoop MapReduce - A programming model for large-scale data processing.

Hadoop YARN - Handles resource management in clusters.
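To give a feel for the MapReduce programming model, here is a minimal word-count sketch in plain Python. It runs on a single machine and uses none of the Hadoop API (real Hadoop jobs are typically written in Java against the MapReduce API); the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are my own, chosen only to mirror the stages of the model.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # In a real cluster each line (or block of lines) could be mapped on a
    # different node; here everything runs in one process.
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["hadoop stores data", "hadoop processes data"]))
```

Because each map call looks at only one line and each reduce call at only one key, the work can in principle be split across many machines, which is what makes the model a good fit for a cluster.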

Why Hadoop?

  • Less expensive
  • Highly scalable
  • Good community support
  • Open source software framework
By scalability, I mean that if at some point I decide to add 10 more computers to the cluster, it can be done easily without any major technical hurdle.

Famous Companies That Use Hadoop:

  • Facebook 
  • Yahoo!
  • Amazon
  • IBM
Many smaller companies and start-ups are also adopting Hadoop.

Future of Hadoop:

Given the current scenario, it seems to me that Hadoop has a promising future. I think building expertise in the Hadoop framework will not be a bad deal.



If you feel anything needs to be added or modified, please share your views. 

 

               
