DATA WAREHOUSE COMPONENTS & ARCHITECTURE
Lecture Note # 02
The data in a data warehouse comes from the operational systems of the organization as well as from external sources. These are collectively referred to as source systems. The data extracted from source systems is stored in an area called the data staging area, where it is cleaned, transformed, combined, and deduplicated to prepare it for use in the data warehouse. The data staging area is generally a collection of machines where simple activities like sorting and sequential processing take place. The data staging area does not provide any query or presentation services. As soon as a system provides query or presentation services, it is categorized as a presentation server. A presentation server is the target machine on which the data from the data staging area is loaded, organized, and stored for direct querying by end users, report writers, and other applications. The three different kinds of systems that are required for a data warehouse are:
1. Source Systems
2. Data Staging Area
3. Presentation servers
The data travels from source systems to presentation servers via the data staging area. The entire process is popularly known as ETL (extract, transform, and load) or ETT (extract, transform, and transfer). Oracle’s ETL tool is called Oracle Warehouse Builder (OWB), and MS SQL Server’s ETL tool is called Data Transformation Services (DTS), which was superseded by SQL Server Integration Services (SSIS) starting with SQL Server 2005.
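The flow from source system to staging area to presentation server can be sketched in a few lines of Python. This is only an illustrative sketch, not how OWB or DTS work internally: the function names, the `dim_customer` table, and the sample records are all assumptions made for the example, and an in-memory SQLite database stands in for the presentation server.

```python
import sqlite3

def extract(source_rows):
    """Extract: pull raw records from a (hypothetical) source system."""
    return list(source_rows)

def transform(rows):
    """Transform in the staging area: clean, standardize, and deduplicate."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()   # trim whitespace, normalize case
        city = row["city"].strip().upper()
        key = (name, city)
        if key in seen:                      # drop duplicates after cleaning
            continue
        seen.add(key)
        cleaned.append({"name": name, "city": city})
    return cleaned

def load(conn, rows):
    """Load: write prepared rows into the presentation server for querying."""
    conn.execute("CREATE TABLE IF NOT EXISTS dim_customer (name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO dim_customer (name, city) VALUES (:name, :city)", rows
    )
    conn.commit()

# Hypothetical source data containing a duplicate in different formatting.
source = [
    {"name": " alice smith", "city": "london "},
    {"name": "Alice Smith", "city": "London"},
    {"name": "bob jones", "city": "paris"},
]

presentation = sqlite3.connect(":memory:")   # stand-in for a presentation server
load(presentation, transform(extract(source)))
result = presentation.execute(
    "SELECT name, city FROM dim_customer ORDER BY name"
).fetchall()
print(result)
```

Note that only the final database answers queries; the intermediate lists play the role of the staging area, which offers no query or presentation services.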
A typical architecture of a data warehouse is shown below:
Up until this point, Third Star Financial Services has operated via a succession of mergers and acquisitions where systems were inherited but never integrated into the network. Its data management has been virtually non-existent and entirely ineffective. Evidence of this can be found in the absence of an enterprise-wide data management solution and the presence of several disparate systems operating independently with no measurable benefit to the company. Due to a lack of actionable data, management makes decisions based on instinct rather than analysis. A direct consequence of this is a steadily declining market share and the loss of high-level employees to competing companies. Fortunately, this shortcoming has been identified, and Third Star executives have established the new goal of modernizing and streamlining operations. Using concepts outlined by the Data Management Association (DAMA), this proposed enterprise architecture will allow Third Star to transform its data from a liability into an asset.
5. What special issues about data warehouse management (e.g., data capture and loading for the data warehouse (ETL processes) and query workload balancing) does this case suggest occur for real-time data warehousing? How has Continental addressed these issues?
Extraction, Transformation, and Loading (ETL) processes are responsible for the operations that take place in the back stage of a data warehouse architecture. Broadly, the data is first extracted from the source data stores, which can be On-Line Transaction Processing (OLTP) or legacy systems, files of any format, web pages, or other documents such as spreadsheets and text documents. In this step, only the data that has changed since the previous execution of the ETL process (newly inserted or updated records) is extracted from the sources. Next, the extracted data is sent to the Data Staging Area, where it is transformed and cleaned. Finally, the data is loaded into the central data warehouse and all its counterparts, e.g., data marts and views. (Kabiri & Chiadmi 2013, p.1)
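Extracting only what changed since the previous run is often done with a timestamp "watermark". The sketch below assumes each source row carries an `updated_at` column, which is an assumption for illustration and not something the cited paper prescribes; the function and variable names are likewise hypothetical.

```python
from datetime import datetime

def extract_changed(rows, last_run):
    """Return only rows inserted or updated after the previous ETL run."""
    return [r for r in rows if r["updated_at"] > last_run]

# Hypothetical source table; each row records when it was last modified.
source_rows = [
    {"id": 1, "amount": 100, "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "amount": 250, "updated_at": datetime(2024, 1, 2, 14, 30)},
    {"id": 3, "amount": 75, "updated_at": datetime(2024, 1, 3, 8, 15)},
]

last_run = datetime(2024, 1, 2, 0, 0)        # watermark saved by the previous run
changed = extract_changed(source_rows, last_run)

# Advance the watermark so the next run skips what was just extracted.
new_watermark = max(r["updated_at"] for r in changed) if changed else last_run
print([r["id"] for r in changed], new_watermark)
```

A timestamp watermark captures inserts and updates but not deletions in the source; detecting those needs a different mechanism, such as change logs or full comparisons.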
- this is to support their information-based system while maintaining shared communication between different branches
24) Before it can be loaded into the data warehouse, operational data must be extracted and
What information is accessible? The data warehouse offers ways to define what is on offer through metadata, published reports, and parameterized analytic applications. Is the data of high quality? Data warehouse patrons expect consistency and quality. The presentation area’s data must be properly prepared and safe to consume. In terms of design, the presentation area should be planned for the comfort of its consumers. It must be planned based on the preferences articulated by the data warehouse diners, not the staging staff. Service is also critical in the data warehouse. Data must be delivered, as ordered, promptly and in a form that appeals to the business user or the reporting/delivery application designer. Lastly, cost is a factor for the data
A data warehouse is a large database organized for reporting. It preserves history, integrates data from multiple sources, and is typically not updated in real time. The key components of data warehousing are the operational source systems, the data staging area, the data presentation area, and data access tools (HIMSS, 2009). The goal of the data warehouse platform is to improve decision-making for clinical, financial, and operational purposes.
A data warehouse is a unique kind of database in which current and historical data about a certain group of people, such as customers, is stored. Information from operational systems, such as transaction processing systems, is extracted, summarised, and then stored in a data warehouse. This type of information includes records of customer interaction patterns, customer purchasing history or trends, and current customer records. The information in a data warehouse is used for management analysis and decision making.
In today's organizations, basic decision-making procedures and day-to-day operations frequently rely on data that is stored in a variety of data storage systems, formats, and locations. To turn this data into useful business information, it typically has to be consolidated, cleansed, standardized, and summarized. For example, data may need to be converted to a different data type, or different database servers may store the required data using different schemas. Differences like these must be resolved before the data can be loaded successfully into a target. Once the data warehouse has been designed and developed according to the business requirements, the process of consolidating data into the data warehouse from the various sources has to be considered. Extract, Transform, Load (ETL) processes are critical to the success of data warehousing projects. Extracting data from a source (extract), transforming it according to the design of the data warehouse (transform), and loading it into the data warehouse (load) together constitute ETL. In other words, ETL is the process of extracting data from different data sources, transforming it according to the requirements of the target data warehouse, and loading it efficiently into the data warehouse (database). In the transformation process, data is standardized to make it compatible with the target database, along with data cleansing.
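The kind of standardization described above, where sources differ in column names and data types, can be sketched as follows. The two source names (`crm`, `billing`), their column names, and the date formats are hypothetical and chosen only to illustrate the idea.

```python
from datetime import date, datetime

# Hypothetical mapping from each source's column names to the target schema.
COLUMN_MAP = {
    "crm": {"cust_id": "customer_id", "signup": "signup_date", "rev": "revenue"},
    "billing": {
        "CustomerID": "customer_id",
        "SignupDate": "signup_date",
        "Revenue": "revenue",
    },
}

# Each (assumed) source writes dates in its own format.
DATE_FORMATS = {"crm": "%d/%m/%Y", "billing": "%Y-%m-%d"}

def standardize(record, source):
    """Rename columns and convert values to the target warehouse types."""
    renamed = {COLUMN_MAP[source][k]: v for k, v in record.items()}
    return {
        "customer_id": int(renamed["customer_id"]),
        "signup_date": datetime.strptime(
            renamed["signup_date"], DATE_FORMATS[source]
        ).date(),
        "revenue": round(float(renamed["revenue"]), 2),
    }

# The same customer as stored by two different source systems.
crm_row = {"cust_id": "42", "signup": "05/03/2021", "rev": "1200.5"}
billing_row = {"CustomerID": 42, "SignupDate": "2021-03-05", "Revenue": "1200.50"}

a = standardize(crm_row, "crm")
b = standardize(billing_row, "billing")
print(a == b)
```

Once both records share one schema and one set of types, they can be compared, deduplicated, and loaded into the same target table.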
The data warehouse comes ready for use, but an organization has to get prepared to use it. The main factor is data warehouse usage. A data warehouse can be used for decision making for management staff.
Chapter 11 Enterprise Resource Planning Systems
1. Closed database architecture is
a. a control technique intended to prevent unauthorized access from trading partners.
b. a limitation inherent in traditional information systems that prevents data sharing.
c. a data warehouse control that prevents unclean data from entering the warehouse.
d. a technique used to restrict access to data marts.
e. a database structure that many of the leading ERPs use to support OLTP applications.
2. Each of the following is a necessary element for the successful warehousing of data EXCEPT
a. cleansing extracted data.
b. transforming data.
c. modeling data.
d. loading data.
e. all of the above are necessary.
3. Which of the following is typically NOT part of
· Extracting data from source systems, transforming it, and then loading it into a data warehouse
This represents the different data sources that feed data into the data warehouse. A data source can be in any format: plain text files, relational databases, other types of databases, Excel files, and so on can all act as data sources.
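As a rough illustration of feeding from sources in different formats, the sketch below reads a plain-text CSV source, a relational database (in-memory SQLite), and a JSON document into one common list of records. The reader functions, the `sales` table, and the sample data are assumptions made for the example.

```python
import csv
import io
import json
import sqlite3

def from_csv(text):
    """Read a plain-text CSV source into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def from_database(conn, query):
    """Read a relational source into a list of dicts."""
    conn.row_factory = sqlite3.Row           # rows become addressable by column name
    return [dict(row) for row in conn.execute(query)]

def from_json(text):
    """Read a JSON source (assumed to be a list of objects)."""
    return json.loads(text)

# Three hypothetical sources holding the same kind of record.
csv_text = "product,qty\nwidget,3\n"

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (product TEXT, qty INTEGER)")
db.execute("INSERT INTO sales VALUES ('gadget', 5)")

json_text = '[{"product": "gizmo", "qty": 2}]'

records = (
    from_csv(csv_text)
    + from_database(db, "SELECT product, qty FROM sales")
    + from_json(json_text)
)
print(records)
```

The CSV reader returns `qty` as the string `"3"`, while the other two sources return integers, which is exactly the kind of difference the staging area has to resolve before loading.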
A data warehouse draws on multiple databases that work together. In other words, a data warehouse integrates data from other databases, which provides a better understanding of the data. Its primary goal is not just to store data, but to give the business, in this case a higher education institute, a means to make decisions that can influence its success. This is accomplished by the data warehouse providing architecture and tools which organize and understand the
Efficient data integration can address some of the issues associated with increasing demands for accessing data from numerous sources and of varied structure and format. Yet some complications remain in populating data warehouses in a timely and consistent manner that meets the performance requirements of consuming systems. When the impediments are linked to the complexity of extraction and transformation in a synchronous manner, you run the risk of timing and synchronization issues that lead to inconsistencies between the consumers of data and the original source systems.