Tuesday, May 30, 2017

What is ETL?



What is ETL?


ETL (Extract, Transform and Load) is a procedure in data warehousing to transform information out of the source systems and putting it into an information distribution center (Data Warehouse).



  • Extract is the process of reading data from a one or multiple source systems.

  • Transform is the process of converting the extracted data from its previous form into the form it needs to be in so that it can be placed into another system. 
Transforming of data means performing any of the below tasks.
    •  Applying business rules (e.g. calculating new measures and dimensions),
    •  Cleaning (e.g. mapping NULL to 0 or "Male" to "M" and "Female" to "F" etc.),
    •  Filtering (e.g. selecting only certain rows/columns to load),
    •  Splitting a column into multiple columns and vice versa,
    •  Joining together data from multiple sources (e.g. lookup, merge),
    • Changing date format (e.g. From YYYY-MM-DD to MM-DD-YYYY)
    •  Transposing rows and columns,
    •  Applying any kind of simple or complex data validation (e.g. if the first 2 columns in a row are empty then reject the row from processing)

  • Load is the process of writing the data into the target systems (Generally Data Warehouse).