It involved 4 Phases: Planning, Designing, Implementation and Maintenance

Planning involves

  • Understanding of  business requirements and  Identifying the scope of whole DW ( is it for reporting or for creating cube)
  • Understanding the heterogeneous sources from where the data is going to pulled and the destinations to which the data is going  to.
  • Planning for conceptual, logical and physical schema of pre-staging DB, staging DB and destination data-warehouse.
  • Understanding  the backup and recovery OLTP database

Designing

  • Designing an algorithm for Data Profiling
  • Documenting the “Finding” for Data Profiling
  • Designing Pre-staging  area
  • Designing the Incremental load algorithm for packages whether it will be Cyclic Redundancy Checksum algorithm or triggers implementation
  • Design the debugging methods (Checkpoints, Breakpoints and data viewers)
  • Designing the event handlers on pre-execute, post-execute, on-error, on failure, on variable change etc
  • Designing log mechanisms for error handling using Audit transformation in SSIS 2005
  • Designing the  Package Configurations to move packages between environments
  • Designing the Destination Data-warehouse schema.

Implementation

  • Implementing everything what we designed
  • Documenting each database object created
  • Perform the unit testing and functionality testing for each of the package

Maintenance

  • Improving latency of packages, queries and stored procedures involved in ETL process
  • Taking care of Backup Schedules and Scheduling Packages

 

What is Data Profiling?

Data Profiling is analyzing  the data to find if the data is fit for the project or should be discarded.

Data Profiling involve creating “Dynamic SQL Procedure” for

  • Finding length of the columns
  • Finding the max value
  • Finding the min value
  • Count of all the rows
  • Count of all the null values
  • Finding the null percentage
  • Checking patterns if any (i.e data consistencies)
  • Incorrect spelling
  • Mis-spelt spelling
  • White space (blank space)

Appropriate action should be taken to handle the following situations, along with many others: name and address parsing, inconsistent descriptive values, missing decodes, invalid data, and missing data.Then Document this finding to pinpoint the discrepancies in the data and validate if the data can be used for project or can it be the source of ETL Process

What does ETL even Mean? Well, Its and abbreviation for

Extraction:  Extract the relevant data

Transform : Transfer data to DW format

— Build Keys

— Cleansing the data

–Confirm the data

Load:  Load data into DW

An ETL developer is an important person to almost every organization because he/she has lot of impact on the data use for analysis and other business decisions.

Leave a Reply

Your email address will not be published. Required fields are marked *

Name *