Monday, July 15, 2013

Introduction to Big Data and Hadoop

Hi, today we would like to cover some of the basics of Big Data analytics and how you can learn about it. We will also look at a stable open source version of Hadoop that you can play around with to get some hands-on experience.

Big Data : What is Big Data?

You may encounter many definitions of Big Data, but the short, to-the-point definition is this: your data volume is large, and typical database processing on that data set does not meet the business response timelines.

So it's not only about the volume of the data; it's more about the processing time it takes to produce any value.

Get Started : Where do I start?

Where should you get started if you are interested and want to explore more? You may have often heard about Hadoop. Hortonworks provides a stable open source version of Hadoop which you can easily install on your machine and start playing around with.

Download Link : http://hortonworks.com/get-started/

Hadoop Architecture 

Hadoop comes with a lot of components. You can view the list of components available in Hortonworks Hadoop below.




Below is a very brief, to-the-point description of each of these components.


SQOOP : A utility in Apache Hadoop to move data from a SQL database into Hadoop.

PIG : A script-based utility for writing transformations (e.g. aggregations, joins), similar to SQL. It is for people who are more comfortable with SQL than Java. You can also write UDFs in Java for complex transformations and call them directly from Pig.

Hive : A SQL interface for Hadoop, used for data analysis. It can connect to other sources through ODBC drivers, e.g. Excel, MStr, other data stores etc. The language is called HQL (Hive Query Language) and is very similar to SQL.

HBase : NoSQL database for Hadoop.

HCatalog : Metadata about the Hadoop database.

AMBARI : Ambari enables you to manage, monitor and install your cluster.

OOZIE : The Hadoop scheduler. It can schedule the jobs you develop in Pig, Hive or Sqoop.
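
To make the Pig and Hive descriptions above a little more concrete, here is a minimal sketch of the kind of join and aggregation you would typically write in those tools. It is plain Python using the built-in sqlite3 module as a stand-in for a SQL-style engine, not Hive or Pig itself, and the customers and orders tables and their values are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a SQL-style engine.
# This is NOT Hive or Pig; table names, columns and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'North'), (2, 'South'), (3, 'North');
    INSERT INTO orders VALUES
        (10, 1, 100.0), (11, 1, 50.0), (12, 2, 70.0), (13, 3, 30.0);
""")

# Join orders to customers, then aggregate per region.
query = """
    SELECT c.region, COUNT(*) AS order_count, SUM(o.amount) AS total_amount
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    GROUP BY c.region
    ORDER BY c.region
"""
rows = conn.execute(query).fetchall()
conn.close()

for region, order_count, total_amount in rows:
    print(region, order_count, total_amount)
```

In Hive the query text would look broadly similar, only it would run over data stored in Hadoop rather than over a small in-memory database.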

Where does Hadoop fit in the Enterprise Model ? 

Hadoop is meant to deliver fast, high-quality analytics; it is NOT a replacement for existing DWH or OLTP systems. It can consume information and help you achieve your target SLA by giving you the capability to perform analytics on unstructured and semi-structured data sets.



 


Thanks




