Posts

Showing posts from February, 2014

Different types of Facts

Fact - A fact table typically has two types of columns, foreign keys to dimension tables and measures those that contain numeric facts. A fact table can contain fact’s data on detail or aggregated level. Types of Facts - Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table. A sales fact is a good example for additive fact. Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others. Eg: Daily balances fact can be summed up through the customers dimension but not through the time dimension. Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table. Eg: Facts which have percentages, ratios calculated. Factless Fact Table:  In the real world, it is possible to have a fact table that contains no measures or facts. These tables are called “Factless Fact tables”. Eg: A fact tabl...

Different types of Dimensions

Dimension - A dimension table typically has two types of columns, primary keys to fact tables and textual\descreptive data. Types of Dimensions - Slowly Changing Dimensions:   Attributes of a dimension that would undergo changes over time. It depends on the business requirement whether particular attribute history of changes should be preserved in the data warehouse. This is called a Slowly Changing Attribute and a dimension containing such an attribute is called a Slowly Changing Dimension. Rapidly Changing Dimensions: A dimension attribute that changes frequently is a Rapidly Changing Attribute. If you don’t need to track the changes, the Rapidly Changing Attribute is no problem, but if you do need to track the changes, using a standard Slowly Changing Dimension technique can result in a huge inflation of the size of the dimension. One solution is to move the attribute to its own dimension, with a separate foreign key in the fact table. This new dimen...

Slowly Changing Dimensions

Slowly Changing Dimensions are  dimensions that change slowly over time, rather than changing on regular schedule, time-base. In Data Warehouse there is a need to track changes in dimension attributes in order to report historical data. In other words, implementing one of the SCD types should enable users assigning proper dimension's attribute value for given date. Example of such dimensions could be: customer, geography, employee. There are many approaches how to deal with SCD. The most popular are:  Type 0  - The passive method Type 1  - Overwriting the old value Type 2  - Creating a new additional record Type 3  - Adding a new column Type 0  - The passive method. In this method no special action is performed upon dimensional changes. Some  dimension data  can remain the same as it was first time inserted, others may be overwritten.  Type 1  - Overwriting the old value. In this method no history of dimension changes ...

Deploying your job - Talend

After the completion of development you have the scenario in which you want to deploy your ETL, may it be test environment or production environment. In Talend we have two widely used methods to deploy your ETL. 1. Autonomous Job The job is exported as a Java file with a .bat/.sh executable. So you can put your ETL job on any machine with java to run on it. 2. Web service  -  (WAR file) The job is exported as a .war for running it as a webservice & integrating it in your web application.