
Construction of a Data Warehouse


The process by which a data warehouse is fed with data extracted from source systems is generally known as ETL (Extraction, Transformation and Loading). ETL is a critical process in the construction of a data warehouse.
The three stages of the ETL process consist of:
• Extraction: data is identified and extracted from one or more different source systems, such as applications and database systems.
• Transformation: data is transformed to ensure consistency and to satisfy business requirements.
• Loading: data is loaded into the target data warehouse.
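The three stages can be sketched as a tiny Python pipeline. This is a minimal illustration under assumed inputs, not a production ETL tool: the two sources (`CSV_SOURCE`, `API_SOURCE`), the `sales` table and the function names are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical sources: a CSV export from one application and a list of
# records from another system, standing in for "one or more sources".
CSV_SOURCE = "id,name,amount\n1, alice ,10.5\n2,BOB,3\n"
API_SOURCE = [{"id": 3, "name": "carol", "amount": "7.25"}]


def extract():
    """Extraction: pull raw rows from every source into one list of dicts."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows.extend(API_SOURCE)
    return rows


def transform(rows):
    """Transformation: enforce consistent types and formatting."""
    cleaned = []
    for row in rows:
        cleaned.append((
            int(row["id"]),
            str(row["name"]).strip().title(),  # " alice " -> "Alice", "BOB" -> "Bob"
            round(float(row["amount"]), 2),
        ))
    return cleaned


def load(rows, conn):
    """Loading: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
load(transform(extract()), conn)
print(conn.execute("SELECT * FROM sales ORDER BY id").fetchall())
# [(1, 'Alice', 10.5), (2, 'Bob', 3.0), (3, 'Carol', 7.25)]
```

Real ETL tools add scheduling, error handling and incremental loads, but the division of work between the three stages stays the same.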

There are many challenges involved in building a dependable ETL process. Some of these relate to:
• Data volumes: as huge amounts of data are now available, …

During extraction, the desired data is identified and extracted from the source system and made available for further processing. The data can be extracted from many different sources. In most cases the data sources are internal, but sometimes they are external. The ultimate aim is to retrieve all the essential data from the source system using as few resources as possible. The size of the extracted data can range from kilobytes to gigabytes.
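One common way to keep resource use low regardless of data size is to read the source in fixed-size batches rather than all at once. The sketch below assumes a SQL source reachable through Python's `sqlite3` module; the `orders` table, its contents and the function name `extract_in_batches` are hypothetical.

```python
import sqlite3
from typing import Iterator, List, Tuple


def extract_in_batches(conn: sqlite3.Connection, query: str,
                       batch_size: int = 1000) -> Iterator[List[Tuple]]:
    """Yield rows from the source in fixed-size batches.

    Only one batch is held in memory at a time, so memory use stays roughly
    constant whether the source holds kilobytes or gigabytes of data.
    """
    cursor = conn.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch


# Hypothetical source system: an in-memory table with 2,500 orders.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, total REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?)",
                   [(i, i * 1.5) for i in range(2500)])

batch_sizes = [len(b) for b in extract_in_batches(source, "SELECT id, total FROM orders")]
print(batch_sizes)  # [1000, 1000, 500]
```

The batch size trades memory for the number of round trips to the source; each batch can be handed to the transformation step before the next one is read.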
For the extraction step to be successful and efficient, a thorough understanding of the source data's layout is essential. The data should also be backed up in a separate location. Sometimes it is not possible to identify the exact subset of interest, so more data than needed has to be extracted; the relevant data is then identified later in the ETL process. Depending on the capabilities of the source system, some transformations can take place during the extraction stage.
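The difference between selecting data at the source and identifying it later can be illustrated with a small sketch. It assumes a SQL source accessed through `sqlite3`; the `events` table and its rows are hypothetical. Option 1 pushes the selection into the extraction query, a simple case of work done during extraction; Option 2 extracts everything and filters further down the pipeline.

```python
import sqlite3

# Hypothetical source table of customer events.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE events (customer TEXT, region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("a", "EU", 10.0), ("b", "US", 20.0), ("c", "EU", 5.0), ("d", "APAC", 8.0)],
)

# Option 1: the source can evaluate the selection itself, so the filter is
# pushed into the extraction query and only the needed rows leave the source.
pushed_down = source.execute(
    "SELECT customer, amount FROM events WHERE region = ? ORDER BY customer", ("EU",)
).fetchall()

# Option 2: the subset cannot be identified at the source, so every row is
# extracted and the relevant rows are identified later in the pipeline.
everything = source.execute("SELECT customer, region, amount FROM events").fetchall()
filtered_later = sorted((c, amt) for c, region, amt in everything if region == "EU")

print(len(everything), pushed_down == filtered_later)  # 4 True
```

Both options yield the same rows, but Option 2 moves four rows out of the source instead of two; on large sources that extra volume is the cost of not being able to identify the subset up front.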
The difficulty of the extraction step depends on how similar or dissimilar the source systems are and on the type of source data. Good documentation, proper maintenance and the use of similar technologies across systems should result in a fairly simple extraction process. On the other hand, poor documentation, inadequate maintenance and a mix of different data formats and technologies will result in …
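Part of why dissimilar sources make extraction harder is that each format typically needs its own adapter to map records into a common shape. The sketch below assumes two hypothetical sources, a semicolon-delimited CRM export and a JSON shop feed with different field names; the adapter names and the target fields `customer_id` and `name` are illustrative only.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical sources describing the same entity in different formats and
# with different field names.
CRM_CSV = "cust_id;full_name\n7;Dana Lee\n"
SHOP_JSON = '[{"customerId": "8", "name": {"first": "Eli", "last": "Moss"}}]'


def from_crm(text: str) -> List[Dict]:
    """Adapter for the semicolon-delimited CRM export."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return [{"customer_id": int(r["cust_id"]), "name": r["full_name"]} for r in reader]


def from_shop(text: str) -> List[Dict]:
    """Adapter for the shop's JSON feed, which nests the name in two fields."""
    return [
        {"customer_id": int(r["customerId"]),
         "name": f'{r["name"]["first"]} {r["name"]["last"]}'}
        for r in json.loads(text)
    ]


customers = from_crm(CRM_CSV) + from_shop(SHOP_JSON)
print(customers)
# [{'customer_id': 7, 'name': 'Dana Lee'}, {'customer_id': 8, 'name': 'Eli Moss'}]
```

Every additional format or technology adds another adapter to write and maintain, which is where poor documentation of a source becomes costly.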
