34 Subsystems of ETL

In this post and the series that follows, I will explore the 34 subsystems of ETL data integration as defined by the Kimball Group. I introduce the subsystems in this post, and then I will discuss how each fits (or does not fit) into Talend and PDI.

The subsystem concept is a best-practice initiative formulated by The Kimball Group to help organizations design effective and efficient Data Integration environments for Data Warehousing using the Dimensional Model.

The Kimball Group categorizes the subsystems into 4 distinct groups: Data Extraction, Cleansing and Conforming Tasks, Data Delivery, and Management.

Data Extraction

1. Data Profiling

Talend: Talend has a separate tool for data profiling and data quality called 'Talend Open Studio for Data Quality'.
Pentaho: The 'DataCleaner' plugin is available for download for this purpose.
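
To make the idea of profiling concrete, here is a minimal sketch in Python of the kind of column statistics a profiling tool reports (row count, nulls, distinct values, min/max, most frequent value). It is not how Talend Open Studio for Data Quality or DataCleaner work internally; the function name `profile_column` and the sample data are hypothetical and exist only for illustration.

```python
from collections import Counter


def profile_column(rows, column):
    """Return basic profiling statistics for one column of a list of dict rows."""
    values = [row.get(column) for row in rows]
    # Treat None and empty strings as missing values (an assumption of this sketch).
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "most_common": counts.most_common(1)[0] if counts else None,
    }


# Hypothetical sample data, for illustration only.
customers = [
    {"id": 1, "country": "DE"},
    {"id": 2, "country": "FR"},
    {"id": 3, "country": ""},
    {"id": 4, "country": "DE"},
]
profile = profile_column(customers, "country")
print(profile)
```

Statistics like these are usually reviewed before any ETL job is designed, since a high null count or an unexpected distinct count tends to point to cleansing work later on.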

2. Change Data Capture (CDC)

Talend: Talend has a built-in, trigger-based CDC feature that is easy to apply (Enterprise version only). It also offers log-based CDC for Oracle databases.
Pentaho: PDI has no dedicated CDC components.
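
For readers new to the term, trigger-based CDC can be sketched with plain database triggers that write every insert, update, and delete to a change table. The example below uses Python's standard `sqlite3` module purely as a stand-in database. The table and trigger names (`customer`, `customer_cdc`, `customer_ins`, and so on) are hypothetical, and this is not a description of how Talend generates or manages its own CDC objects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);

-- Change table: one row per captured change (hypothetical layout).
CREATE TABLE customer_cdc (
    change_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    op          TEXT NOT NULL,      -- I = insert, U = update, D = delete
    customer_id INTEGER NOT NULL
);

CREATE TRIGGER customer_ins AFTER INSERT ON customer
BEGIN
    INSERT INTO customer_cdc (op, customer_id) VALUES ('I', NEW.id);
END;

CREATE TRIGGER customer_upd AFTER UPDATE ON customer
BEGIN
    INSERT INTO customer_cdc (op, customer_id) VALUES ('U', NEW.id);
END;

CREATE TRIGGER customer_del AFTER DELETE ON customer
BEGIN
    INSERT INTO customer_cdc (op, customer_id) VALUES ('D', OLD.id);
END;
""")

# Ordinary application activity on the source table.
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ann')")
conn.execute("INSERT INTO customer (id, name) VALUES (2, 'Bob')")
conn.execute("UPDATE customer SET name = 'Anne' WHERE id = 1")
conn.execute("DELETE FROM customer WHERE id = 2")

# The ETL job reads the change table instead of rescanning the source.
changes = conn.execute(
    "SELECT op, customer_id FROM customer_cdc ORDER BY change_id"
).fetchall()
print(changes)
```

An extraction job would typically remember the last `change_id` it processed and pick up only newer rows on the next run. In a tool without dedicated CDC components, a similar pattern can be built by hand in the source database.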

3. Extraction

Both Talend and PDI have plenty of connectors available for a wide variety of sources.

Cleansing and Conforming Tasks

4. Data Cleansing Subsystem
5. Error Event Management
6. Auditing
7. Removing Duplicates
8. Data Conformance

Data Delivery

9. Slowly Changing Dimensions (SCD)
10. Surrogate Key Generator
11. Hierarchy Manager
12. Special Dimensions Manager
13. Fact Table Builders
14. Surrogate Key Management
15. Bridge Table Builder
16. Late Arriving Data Handler
17. Dimension Manager
18. Fact Table Provider
19. Aggregate Generation
20. OLAP Cube Builder
21. Data Propagation Manager

Management

22. Scheduler
23. Backup System
24. Recovery and Restart
25. Version Control
26. Version Migration
27. Workflow Monitor
28. Sorting
29. Data Lineage and Dependency
30. Problem Escalation
31. Parallelizing and Pipelining
32. Security
33. Compliance Manager
34. Metadata Repository
