Posts

Ways to schedule Talend Job

Ways to schedule your Talend job (open studio version) : 1       Export your job as Executable .bat or .sh file : Right click your job in Repository >  Build Job >  Build Type > Standalone Job        You can Export your job as  .bat or .sh executable  which can then run on any machine without the      help of Talend. It is fully portable. You just need java. It is similar to running a java code. You can then schedule via Windows/Linux scheduler to run this executable. 2.     Export  your job as webservice : Right click your job in Repository >  Build Job > Build Type > War file                                       You can export your job as WAR file (.war) too. You must then add it in your tomcat and run as a webapp.

Perfomance tuning : Tmap Component in Talend

The settings mostly depend on the volume of data you require for processing while applying a join, i.e.,  If Lookup table have few rows: select   "Lookup model"  as Load Once .  This will load lookup data into memory only once and uses that to compare all other records from parent table.   This is a faster option as memory loading is less IF lookup table have large amount of data : Talend cant store the lookup table data in memory. you will get java heap space Execption.  To resolve this issue follow the below steps open tmap go to lookup table click on tmap settings select the value for  store temp data  property to  True  click on ok In Tmap properties basic settings  set  Temp Data Directory path  by browsing folder Go to Advance settings set   max buffer size(nb of row)  to some value based on  lookup condition data type

Different types of Facts

Fact - A fact table typically has two types of columns, foreign keys to dimension tables and measures those that contain numeric facts. A fact table can contain fact’s data on detail or aggregated level. Types of Facts - Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table. A sales fact is a good example for additive fact. Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others. Eg: Daily balances fact can be summed up through the customers dimension but not through the time dimension. Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table. Eg: Facts which have percentages, ratios calculated. Factless Fact Table:  In the real world, it is possible to have a fact table that contains no measures or facts. These tables are called “Factless Fact tables”. Eg: A fact table wh

Different types of Dimensions

Dimension - A dimension table typically has two types of columns, primary keys to fact tables and textual\descreptive data. Types of Dimensions - Slowly Changing Dimensions:   Attributes of a dimension that would undergo changes over time. It depends on the business requirement whether particular attribute history of changes should be preserved in the data warehouse. This is called a Slowly Changing Attribute and a dimension containing such an attribute is called a Slowly Changing Dimension. Rapidly Changing Dimensions: A dimension attribute that changes frequently is a Rapidly Changing Attribute. If you don’t need to track the changes, the Rapidly Changing Attribute is no problem, but if you do need to track the changes, using a standard Slowly Changing Dimension technique can result in a huge inflation of the size of the dimension. One solution is to move the attribute to its own dimension, with a separate foreign key in the fact table. This new dimension is called a Ra

Slowly Changing Dimensions

Slowly Changing Dimensions are  dimensions that change slowly over time, rather than changing on regular schedule, time-base. In Data Warehouse there is a need to track changes in dimension attributes in order to report historical data. In other words, implementing one of the SCD types should enable users assigning proper dimension's attribute value for given date. Example of such dimensions could be: customer, geography, employee. There are many approaches how to deal with SCD. The most popular are:  Type 0  - The passive method Type 1  - Overwriting the old value Type 2  - Creating a new additional record Type 3  - Adding a new column Type 0  - The passive method. In this method no special action is performed upon dimensional changes. Some  dimension data  can remain the same as it was first time inserted, others may be overwritten.  Type 1  - Overwriting the old value. In this method no history of dimension changes is kept in the database. The old dimension value i

Deploying your job - Talend

After the completion of development you have the scenario in which you want to deploy your ETL, may it be test environment or production environment. In Talend we have two widely used methods to deploy your ETL. 1. Autonomous Job The job is exported as a Java file with a .bat/.sh executable. So you can put your ETL job on any machine with java to run on it. 2. Web service  -  (WAR file) The job is exported as a .war for running it as a webservice & integrating it in your web application.