Friday, July 26, 2013

Partitioning in Informatica

A short, to-the-point use case for understanding partitioning in Informatica and the configuration settings needed to make it work.

In Informatica you can use the different types of partition algorithms listed below:

-Pass Through
-Round Robin
-Hash Auto Keys
-Key Range
-Hash User Keys
-Database Partitioning (see the Oracle sketch below)
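
The Database Partitioning option in particular relies on the partitions defined in the source database itself, since the Integration Service reads the partition information from the database. As a rough, untested illustration (table and column names are made up), an Oracle range-partitioned source table could look like this:

------
-- Illustrative Oracle DDL only: a range-partitioned source table of the kind
-- the Database Partitioning option can read from. All names are made up.
CREATE TABLE sales_fact (
  sale_id   NUMBER,
  sale_date DATE,
  amount    NUMBER(10,2)
)
PARTITION BY RANGE (sale_date) (
  PARTITION p_2012 VALUES LESS THAN (TO_DATE('01-01-2013','DD-MM-YYYY')),
  PARTITION p_2013 VALUES LESS THAN (TO_DATE('01-01-2014','DD-MM-YYYY')),
  PARTITION p_max  VALUES LESS THAN (MAXVALUE)
);
------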

http://www.disoln.org/2013/07/Informatica-PowerCenter-Partitioning-When-Where-and-How.html?goback=.gde_1690897_member_260653409

Thanks

Tuesday, July 16, 2013

Load Balancing In Informatica

A very useful article explaining load balancing in Informatica.

http://ptlshivakumar.blogspot.in/2013/07/load-balancing-in-informatica.html?goback=.gde_37180_member_257933343

Thanks

Monday, July 15, 2013

Introduction to Big Data and Hadoop

Hi, so today we would like to cover some of the basics of Big Data analytics and how you can learn about it. We will also look at a stable open source version you can use to play around with Hadoop and get some hands-on experience.

Big Data: What is Big Data?

Now, you may encounter many definitions of Big Data, but the short, to-the-point definition is: a situation where your data volume is large and typical database processing on that data set does not meet business response timelines.

So it's not only about the volume of the data; it's more about the processing time it takes to produce any value.

Get Started: Where do I start?

Now, where should you get started if you are interested and want to explore more? You may have often heard about Hadoop. Hortonworks provides a stable open source version of Hadoop which you can easily install on your machine and start playing around with.

Download Link : http://hortonworks.com/get-started/

Hadoop Architecture 

Hadoop comes with a lot of components; you can view the list of components available in the Hortonworks Hadoop distribution below.




Below is a very brief, to-the-point description of each of these components.


Sqoop: A utility in Apache Hadoop to move data from a SQL database into Hadoop.

Pig: A script-based utility for writing transformations (e.g. aggregations, joins, etc.), similar to SQL; it is for people who are more comfortable with SQL than Java. You can also write UDFs in Java for complex transformations and call them directly in Pig.

Hive: An SQL interface for Hadoop, used for data analysis; it can connect to other tools with ODBC drivers, e.g. Excel, MicroStrategy, other data stores, etc. The language is called HQL (Hive Query Language) and works much like SQL (see the sketch after this component list).

HBase: A NoSQL database for Hadoop.

HCatalog: Metadata about the data stored in Hadoop (shared table definitions).

Ambari: Enables you to install, manage, and monitor your cluster.

Oozie: The Hadoop scheduler; it can schedule the jobs you develop in Pig, Hive, or Sqoop.
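
Since Hive's HQL is so close to SQL, a small query is the quickest way to get a feel for it. The following is only a rough sketch; the table and columns (web_logs, page_url) are made up for illustration and assume such a table already exists in Hive:

------
-- Hypothetical HiveQL query: top 10 most-hit pages from an illustrative web_logs table.
-- Table and column names are made up for this example.
SELECT page_url, COUNT(*) AS hit_count
FROM web_logs
GROUP BY page_url
ORDER BY hit_count DESC
LIMIT 10;
------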

Where does Hadoop fit in the Enterprise Model?

Hadoop is there to deliver the best and quickest analytics; it is NOT a replacement for existing DWH or OLTP systems. It can consume information and help you achieve your target SLAs by giving you the capability to perform analytics on non-structured and semi-structured data sets.



 


Thanks





Tuesday, July 9, 2013

Exporting Data From Oracle to Flat File



Hi, here's an example of how a sqlplus export script should look if you would like to export data from Oracle to a flat file.

------
-- separate columns with a "|" character; remove heading separators
set colsep |
set headsep off
-- line size set to 1000 or whatever your maximum record length is
set linesize 1000
-- display width for numbers: set this to whatever length you need; this also avoids numbers in scientific notation such as 1.77382E... etc.
set numwidth 20
-- no headers, and get rid of trailing blanks
set pagesize 0
set trimspool on
-- don't echo the data to the screen, which slows down the process significantly; to monitor the file you can use the unix command tail -f /usr/loadfiles/loadfile1.dat
set echo off
-- no messages
set feedback off

-- location where the flat file will be exported... You might have to convert dates, numbers etc. to varchars
spool /usr/loadfiles/loadfile1.dat

-- query for downloading the data... You can put here your "latest data" vs. "cold data" filtering conditions etc.
select col1,col2,col3,... from tableX where ... ;

spool off
exit
------

If you want everything in a single concatenated column (without trailing blanks after every column) you can use a concatenation query: select col1|| '|' ||col2|| '|' ||col3,... from tableX where ... In that case you don't need the "set colsep ..." statement. There are many ways of specifying the query. Also (this is trivial, but worth mentioning) be sure you have enough space and write permissions in the file system you are exporting the file to.
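
For illustration only, here is a rough sketch of such a concatenation query; the table (employees) and columns (emp_id, emp_name, hire_date, salary) are made up, so substitute your own, and note the explicit to_char conversions for dates and numbers:

------
-- Hypothetical concatenation query for spooling each row as one delimited line.
-- Table and column names are made up; dates and numbers are converted with to_char.
select emp_id
       || '|' || emp_name
       || '|' || to_char(hire_date, 'YYYY-MM-DD')
       || '|' || to_char(salary)
  from employees
 where hire_date >= date '2013-01-01';
------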

Put the text between the two "------" markers into a file called, for example, script.sql, then run it from the unix prompt as follows:

unix prompt> sqlplus username/password@oracle_connection @script.sql


You can put multiple sqlplus calls into a unix shell script for exporting from multiple sources/tables. Be aware that I did not test this, so you will have to experiment a bit in your specific environment, but it should give you a good conceptual idea of how to do it...