In this tutorial we will get started to load data into Horton
Hadoop .We would explain step by step instructions to load data
using Hcatalog and will later use Hive and do simple data manipulation on the loaded data using HIVE SQL.
Loading Data using HIVE
You can load sample data into Hadoop using File Explorer or
Hive Directly. Let’s use Hive UI interface to create a table using File as
input. For this example we are going to use the sample file available at Hadoop
website (Download this file and save at your disk)
Follow below steps to create table from file and load data
into target table.
-
1.
Click on “HCatalog Icon” and Click on “Create a
new table from File. (See Fig below).
2.
Fill the required details enter Table name,
Description and chose to upload file which you have saved on your disk.
3.
Click Next and on next step “Choose Delimiter” Chose the
delimiter for this sample file it’s TAB
but other options are also available if your file delimiter is something else e.g.
comma , Pipe etc.
4.
Click Next and on next step “Define Columns”
Check the data types by default it treats everything as string you can change
the data type to suitable data type as per the column contents. E.g. in our
current example file stock_price_high, stock_price_low etc are good candidates
to be handled as float so change it.
5.
Once you are done the process will create table
and will also load the table with the file which you have just selected in
above steps.
So what is next now? Let’s check the data content using the
HIVE SQL and verify if the contents are loaded properly and all data types are
OK.
Click on Hive UI and you will see an interface where you can
enter your query and execute them.
To check if the table has been created and what are the data
types you can enter your query as
describe nyse_stocks_t;
It will display the column names and data type of the
columns.
Now let’s try to do a simple manipulation let’s say we want to
calculate the average of Stock volume for IBM. HIVE SQL is similar to standard
SQL and supports almost all SQL aggregations. You can write below HIVE SQL to
calculate the average.
select AVG(stock_volume) from nyse_stocks_t
where stock_symbol == 'IBM';
This will return the Average for IBM.
Hope you learned the basics.
Thanks
Tahir Aziz