34 Subsystems of ETL
In this and the following series of posts, I will explore the 34 subsystems of ETL data integration as defined by the Kimball Group. This post introduces the subsystems; later posts will discuss how each one fits (or does not fit) into Talend and PDI.
The subsystem concept is a best-practice initiative formulated by The Kimball Group to help organizations design effective and efficient Data Integration environments for Data Warehousing using the Dimensional Model.
The Kimball Group categorizes the subsystems into 4 distinct groups: Data Extraction, Cleansing and Conforming Tasks, Data Delivery, and Management.
Data Extraction
1. Data Profiling
Talend: Talend has a separate tool for data profiling & data quality called 'Talend Open Studio for Data Quality'
Pentaho: 'DataCleaner' plugin is available for download for this purpose
2. Change Data Capture (CDC)
Talend: Talend has a built-in, trigger-based CDC feature that is easy to apply (Enterprise version only). It also offers log-based CDC for Oracle databases.
Pentaho: Has no specific components for CDC
3. Extraction
Both Talend & PDI have plenty of connectors available to connect to a variety of sources
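To make the trigger-based CDC idea from subsystem 2 more concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. It is not how Talend implements the feature. The table names (customer, customer_cdc), the trigger names, and the helper functions are all hypothetical and exist only for illustration. Triggers on the source table write one row per insert, update, or delete into a change table, and the ETL job reads only the rows it has not yet processed.

```python
import sqlite3


def build_demo():
    """Create an in-memory source table plus a trigger-fed change table.

    All table and trigger names are hypothetical, chosen for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);

        -- Change table: one row per captured operation on customer.
        CREATE TABLE customer_cdc (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            op  TEXT NOT NULL,      -- 'I', 'U' or 'D'
            id  INTEGER NOT NULL    -- key of the affected customer row
        );

        CREATE TRIGGER customer_ins AFTER INSERT ON customer
        BEGIN
            INSERT INTO customer_cdc (op, id) VALUES ('I', NEW.id);
        END;

        CREATE TRIGGER customer_upd AFTER UPDATE ON customer
        BEGIN
            INSERT INTO customer_cdc (op, id) VALUES ('U', NEW.id);
        END;

        CREATE TRIGGER customer_del AFTER DELETE ON customer
        BEGIN
            INSERT INTO customer_cdc (op, id) VALUES ('D', OLD.id);
        END;
        """
    )
    return conn


def read_changes(conn, last_seq=0):
    """Return captured changes with a sequence number above last_seq."""
    return conn.execute(
        "SELECT seq, op, id FROM customer_cdc WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ann')")
    conn.execute("UPDATE customer SET name = 'Anna' WHERE id = 1")
    conn.execute("DELETE FROM customer WHERE id = 1")
    for change in read_changes(conn):
        print(change)
```

An extraction job built this way would remember the highest seq it has processed and pass it as last_seq on the next run, so each change is picked up once. The trade-off of the trigger approach is extra write load on the source system, which is one reason log-based CDC is sometimes preferred.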
Cleansing and Conforming Tasks
4. Data Cleansing Subsystem
5. Error Event Management
6. Auditing
7. Removing Duplicates
8. Data Conformance
Data Delivery
9. Slowly Changing Dimensions (SCD)
10. Surrogate Key Generator
11. Hierarchy Manager
12. Special Dimensions Manager
13. Fact Table Builders
14. Surrogate Key Management
15. Bridge Table Builder
16. Late Arriving Data Handler
17. Dimension Manager
18. Fact Table Provider
19. Aggregate Generation
20. OLAP Cube Builder
21. Data Propagation Manager
Management
22. Scheduler
23. Backup System
24. Recovery and Restart
25. Version Control
26. Version Migration
27. Workflow Monitor
28. Sorting
29. Data Lineage and Dependency
30. Problem Escalation
31. Parallelizing and Pipelining
32. Security
33. Compliance Manager
34. Metadata Repository