Thursday, August 29, 2013

Alter Table Vs Ins Select for Modifying Table Structure in Teradata

Conclusion

It is strongly recommend implementing Alter Table, at least start considering it. If you're concerned about availability you should bear in mind that this process will probably be scheduled out of business hours anyway.
And when you need to change the [P]PI or you just want the safeness of a copy of the old table you should definitely prefer Merge Into over good ol' Insert Select.

Please review below figure to see the Pros and Cons of using Alter/Insert Selct and Merge into options for modifying any table in Teradata.

The above conclusion is based on Dieter Blog reference link for details is as follows

http://developer.teradata.com/blog/dnoeth/2013/08/why-dont-you-use-alter-table-to-alter-a-table

Wednesday, August 28, 2013

Dynamic DDL Generation using BTEQ

Dynamic script can be generated using Teradata DBC tables. we would use dbc.tables to generate the DDL of all objects available in development database and later we can change that script to deploy to Test or any other environment.

You can use below BTEQ script which will actually generate the standard SHOW TABLE ; statement for all objects in development database and results will be exported to Prepare_DDL.BTEQ .

.SET ERROROUT STDOUT
.logon 10.16.X.X/tdusre,tduserpwd
.Set Echoreq off
.Set Titledashes off
.Set Separator '|'
.Set Format off
.set width 5000
.Set Null ''
.export file=.\Prepare_DDL.BTEQ;
select distinct 'SHOW TABLE ' || TRIM(databasename) || '.' || TRIM(tablename) || ';'
from dbc.tables
where tablekind =t' and databasename in
(
'DD_DEV_ENV'

);

.export reset;

.Logoff;
.Quit;

You can invoke this BTEQ on command prompt as "Bteq < Bteq_filename > FileName.logs

In next step you can open up Prepare_DDL.BTEQ and modify it as above to execute that will give you all table DDL;s in one file. The modified file will look like as follows

.SET ERROROUT STDOUT
.logon 10.16.X.X/tdusre,tduserpwd
.Set Echoreq off
.Set Titledashes off
.Set Separator '|'
.Set Format off
.Set Null ''
.set width 5000
.export file=.\Generated_DDL.sql;

SHOW TABLE DD_DEV_ENV.ACTION_TYPE;
SHOW TABLE DD_DEV_ENV.BARRING;
...........
......
......
.export reset;
.Logoff;
.Quit;

The final exported file "Generated_DDL.sql" will have all the Create table statements which you can modify /parametrized and use to migrate to any other environment.

Thanks

Thursday, August 22, 2013

Hadoop : Getting Started with PIG

In this tutorial lets get started to learn some basics about PIG. If you recall from previous tutorial PIG is a script based utility to write transformations e.g Agg , Join etc similar to SQL it is for people who are more comfortable in SQL then Java. But you can also make UDF for complex transformation which are written in Java and called directly in PIG.

If you are new to Hadoop please go through http://tahir-aziz.blogspot.com/2013/07/introduction-to-big-data-and-hadoop.html

Our current case study which we want to implement is a very simple example we want to compuet the average Stock Volume for IBM company .From last tutrial we loaded the Stock exchange data and we would use the same to complete this.

Refer to the blog to http://tahir-aziz.blogspot.com/2013/08/getting-started-with-hadooploading-data.html

To write our first PIG script follow below steps.

Login to your Horton Sandbox and on Main screen Click on "Pig" icon. Refer to diagram below
You can create your first script by entering the Script Name and in the middle of the screen you can see Editor where you should actually code the logic.
Enter below script into Editing Area
A = LOAD 'nyse_stocks_t' USING org.apache.hcatalog.pig.HCatLoader(); // Load the file into a variable A but make a note that the file should be first loaded using File Uploader at /User/SandBox.
B = FILTER A BY stock_symbol == 'IBM'; // Filter data for IBM only
C = GROUP B all; // Simple Group by function on ALL columns
D = FOREACH C GENERATE AVG(B.stock_volume); // Iterate records to calculate Average using AVG function.
dump D;
Click on "Save" to save this script to sandbox.
Later you can Open the script and "Execute" to make sure it works fine. The expected result is
```
(7915934.0)
```

Hope that helps you get started.

Thanks

Tahir Aziz

Tuesday, August 20, 2013

Getting Started with Hadoop:Loading Data using Hcatalog/Hive

In this tutorial we will get started to load data into Horton Hadoop .We would explain step by step instructions to load data using Hcatalog and will later use Hive and do simple data manipulation on the loaded data using HIVE SQL.

Refer to http://tahir-aziz.blogspot.com/2013/07/introduction-to-big-data-and-hadoop.html if you are new to Hadoop and need more basic information.

Loading Data using HIVE

You can load sample data into Hadoop using File Explorer or Hive Directly. Let’s use Hive UI interface to create a table using File as input. For this example we are going to use the sample file available at Hadoop website (Download this file and save at your disk)

https://s3.amazonaws.com/hw-sandbox/tutorial1/NYSE-2000-2001.tsv.gz

Follow below steps to create table from file and load data into target table.

1. Click on “HCatalog Icon” and Click on “Create a new table from File. (See Fig below).

2. Fill the required details enter Table name, Description and chose to upload file which you have saved on your disk.

3. Click Next and on next step “Choose Delimiter” Chose the delimiter for this sample file it’s TAB but other options are also available if your file delimiter is something else e.g. comma , Pipe etc.

4. Click Next and on next step “Define Columns” Check the data types by default it treats everything as string you can change the data type to suitable data type as per the column contents. E.g. in our current example file stock_price_high, stock_price_low etc are good candidates to be handled as float so change it.

5. Once you are done the process will create table and will also load the table with the file which you have just selected in above steps.

So what is next now? Let’s check the data content using the HIVE SQL and verify if the contents are loaded properly and all data types are OK.

Click on Hive UI and you will see an interface where you can enter your query and execute them.

To check if the table has been created and what are the data types you can enter your query as

describe nyse_stocks_t;

It will display the column names and data type of the columns.

Now let’s try to do a simple manipulation let’s say we want to calculate the average of Stock volume for IBM. HIVE SQL is similar to standard SQL and supports almost all SQL aggregations. You can write below HIVE SQL to calculate the average.

select AVG(stock_volume) from nyse_stocks_t

where stock_symbol == 'IBM';

This will return the Average for IBM.

Hope you learned the basics.

Thanks

Tahir Aziz

Friday, August 2, 2013

Push down Optimization - Informatica

Pushdown Optimization Option enables data transformation processing, to be pushed down into any relational database to make the best use of database processing power. It converts the transformation logic into SQL statements, which can directly execute on database. This minimizes the need of moving data between servers and utilizes the power of database engine.

There are three main types of Push down

Source-side pushdown optimization
Target-side pushdown optimization
Full pushdown optimization

To read more about PDO and configurations please refer to below very use ful article.

http://www.disoln.org/2013/07/Informatica-PowerCenter-Pushdown-Optimization-an-ELT-Approach.html?goback=%2Egde_1690897_member_262700329

Thanks