It involved 4 Phases: Planning, Designing, Implementation and Maintenance
Planning involves
- Understanding of business requirements and Identifying the scope of whole DW ( is it for reporting or for creating cube)
- Understanding the heterogeneous sources from where the data is going to pulled and the destinations to which the data is going to.
- Planning for conceptual, logical and physical schema of pre-staging DB, staging DB and destination data-warehouse.
- Understanding the backup and recovery OLTP database
Designing
- Designing an algorithm for Data Profiling
- Documenting the “Finding” for Data Profiling
- Designing Pre-staging area
- Designing the Incremental load algorithm for packages whether it will be Cyclic Redundancy Checksum algorithm or triggers implementation
- Design the debugging methods (Checkpoints, Breakpoints and data viewers)
- Designing the event handlers on pre-execute, post-execute, on-error, on failure, on variable change etc
- Designing log mechanisms for error handling using Audit transformation in SSIS 2005
- Designing the Package Configurations to move packages between environments
- Designing the Destination Data-warehouse schema.
Implementation
- Implementing everything what we designed
- Documenting each database object created
- Perform the unit testing and functionality testing for each of the package
Maintenance
- Improving latency of packages, queries and stored procedures involved in ETL process
- Taking care of Backup Schedules and Scheduling Packages
What is Data Profiling?
Data Profiling is analyzing the data to find if the data is fit for the project or should be discarded.
Data Profiling involve creating “Dynamic SQL Procedure” for
- Finding length of the columns
- Finding the max value
- Finding the min value
- Count of all the rows
- Count of all the null values
- Finding the null percentage
- Checking patterns if any (i.e data consistencies)
- Incorrect spelling
- Mis-spelt spelling
- White space (blank space)
Appropriate action should be taken to handle the following situations, along with many others: name and address parsing, inconsistent descriptive values, missing decodes, invalid data, and missing data.Then Document this finding to pinpoint the discrepancies in the data and validate if the data can be used for project or can it be the source of ETL Process
What does ETL even Mean? Well, Its and abbreviation for
Extraction: Extract the relevant data
Transform : Transfer data to DW format
— Build Keys
— Cleansing the data
–Confirm the data
Load: Load data into DW
An ETL developer is an important person to almost every organization because he/she has lot of impact on the data use for analysis and other business decisions.