What is the meaning of a data warehouse?

A data warehouse is a repository that stores large quantities of historical data, used mainly to produce reports that help businesses identify their strengths and weaknesses. For example, demographic data about the people of a region can help businesses identify the needs and preferences of that region's population. A data warehouse acts as central storage for data that comes from one or more heterogeneous sources, including older data kept for its historical value. One of its most important uses is supporting business intelligence and data analytics applications.

Introduction to data warehouse

A data warehouse is storage for huge volumes of data. Most of this data originates as transactional data and is used to make critical business decisions. The data is drawn from various sources, and data analytics techniques are applied to it to extract useful insights from the raw data.

Figure: The position of the data warehouse in the system.

The data stored in the data warehouse is collected from the sales, marketing, finance, product management, and other departments of an organization. Transactional data, which consists of day-to-day online transactions, is also stored in the data warehouse. All of this data can be used for analysis.

The data warehouse therefore acts as a back-end engine for business intelligence tools, which present reports and dashboards to business users. Data warehouses are used in domains such as banking and insurance, marketing, health care, and e-commerce.

The common data warehouse characteristics set out by William Inmon are as follows.

Integrated: The data warehouse brings data from various disparate sources into a consistent logical format at a single storage location.
Subject-oriented: The data in the warehouse is organized around subjects such as sales, customers, or orders.
Non-volatile: The stored data is archived data that is not updated in real time the way transactional data is.
Time-variant: The data stored in the data warehouse has a time stamp attached to it and covers data collected over several years.
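
The four characteristics above can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: the two source systems (`sales_system`, `web_shop`), their field names, and the `integrate` and `load` helpers are hypothetical and do not come from any particular warehouse product.

```python
from datetime import date

# Two hypothetical source systems that describe the same kind of event
# with different field names and units.
sales_system = [{"cust": "C1", "amount_usd": 120.0, "day": date(2023, 1, 5)}]
web_shop = [{"customer_id": "C2", "amount_cents": 4550, "order_date": date(2023, 1, 6)}]

def integrate(sales_rows, web_rows):
    """Map both sources onto one common record layout (integrated)."""
    unified = []
    for r in sales_rows:
        unified.append({"customer": r["cust"], "amount": r["amount_usd"],
                        "date": r["day"], "source": "sales_system"})
    for r in web_rows:
        unified.append({"customer": r["customer_id"], "amount": r["amount_cents"] / 100,
                        "date": r["order_date"], "source": "web_shop"})
    return unified

# The warehouse is modelled as an append-only list: rows are added by
# periodic loads and never edited in place (non-volatile).
warehouse = []

def load(batch, load_date):
    for row in batch:
        # Each stored row keeps its event date and the load date (time-variant).
        warehouse.append({**row, "loaded_on": load_date})

load(integrate(sales_system, web_shop), date(2023, 1, 7))

# Subject-oriented view: total sales per customer across all sources.
sales_by_customer = {}
for row in warehouse:
    sales_by_customer[row["customer"]] = sales_by_customer.get(row["customer"], 0) + row["amount"]
print(sales_by_customer)  # {'C1': 120.0, 'C2': 45.5}
```

A real warehouse would perform these steps in an extract, transform, load pipeline rather than in application code, but the roles of the common format, the append-only store, and the time stamps are the same.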

Key characteristics of data warehouse

The data stored in a data warehouse is structured for easy access and high-speed performance.
The data in a data warehouse is present in large volumes and it consists of large amounts of historical data.
Ad hoc and predefined queries are commonly answered using the data available in data warehouses.
It helps in the decision-making process.
It gives better insight into the current business scenario and offers guidance for future planning.

OLTP vs data warehousing environment

The major differences between Online Transaction Processing (OLTP), performed on traditional database systems, and Online Analytical Processing (OLAP), performed on a data warehouse, are listed below.

Workload: The data warehouse accommodates data analysis and ad hoc queries and is optimized for a wide range of queries and analytical operations, whereas an OLTP system supports only predefined operations.
Data modifications: In OLTP systems, the contents of the database are updated often by issuing SQL statements, whereas the contents of a data warehouse are not updated often because it acts as a repository of historical data.
Schema design: The data warehouse uses a de-normalized schema to optimize queries and analytics, whereas an OLTP system uses fully normalized schemas to optimize insert, delete, and update performance.
Historical data: The data warehouse supports reporting and analysis on historical data, whereas an OLTP system stores only recent transactional data.
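
The difference in workload can be sketched with Python's built-in `sqlite3` module. This is a simplified illustration, not a description of a real deployment: the `orders` table and its columns are hypothetical, and both kinds of statement run against one in-memory table only for brevity, whereas in practice they would run on separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (region, year, amount, status) VALUES (?, ?, ?, ?)",
    [("North", 2022, 100.0, "shipped"),
     ("North", 2023, 150.0, "shipped"),
     ("South", 2022, 80.0, "shipped"),
     ("South", 2023, 120.0, "open")],
)

# OLTP-style work: a predefined operation that touches one row by its key.
conn.execute("UPDATE orders SET status = 'shipped' WHERE order_id = ?", (4,))

# Warehouse-style work: a read-only aggregate that scans all historical rows.
yearly_sales = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
print(yearly_sales)  # [(2022, 180.0), (2023, 270.0)]
```

The update changes a single current record, while the aggregate reads every row and changes nothing, which is why the two workloads favor different schema designs.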

Data warehousing application types

The data warehouse caters to business intelligence (BI) applications. The following are the types of data warehousing applications:

Information processing.
Analytical processing.
Data mining.

Information processing

The data in the data warehouse is processed using well-known data analytics and statistical techniques, and the final results are communicated to business users in the form of charts, tables, graphs, or reports.

Analytical processing

Data in a data warehouse is represented in the form of a multi-dimensional data cube. The following operations can be performed on the cube.

(i) Slice-and-dice: Slicing selects a single value for one of the dimensions, and dicing selects a sub-cube by restricting two or more dimensions. For example, a slice can be used to determine the sales of various products in various regions in the year 2010.

(ii) Drill-down: The data in a data warehouse is stored at multiple levels of abstraction. Drill-down is used to view the data at a more detailed level.

For example, the sales data can be stored at country level $\to$ regional level $\to$ state level $\to$ district level $\to$ store level along the location dimension. The drill-down operation can be used to move from state-level sales to store-level sales by moving down the hierarchy.

Figure: An example of drill-down, in which the view of sales moves from country level down to store level.

(iii) Roll-up: Roll-up is the opposite of drill-down. It is used to view data at a higher level of abstraction, and the data is aggregated by moving up the concept hierarchy. For example, sales data can be rolled up from state level to country level.

(iv) Pivot: In pivoting, the dimensional data is analyzed by rotating the cube. For example, the row dimension can be swapped with the column dimension and vice versa.
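
The cube operations listed above can be sketched in plain Python. This is a minimal illustration that assumes a cube stored as a dictionary keyed by (product, state, store, year); the sample figures and the function names `slice_year`, `dice`, `roll_up_to_state`, and `pivot` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, state, store, year) -> units sold.
facts = {
    ("Laptop", "Texas", "Store A", 2010): 5,
    ("Laptop", "Texas", "Store B", 2010): 3,
    ("Laptop", "Ohio", "Store C", 2011): 4,
    ("Phone", "Texas", "Store A", 2010): 7,
    ("Phone", "Ohio", "Store C", 2010): 2,
    ("Phone", "Ohio", "Store C", 2011): 6,
}

def slice_year(cube, year):
    """Slice: fix one dimension (year) to a single value."""
    return {k: v for k, v in cube.items() if k[3] == year}

def dice(cube, products, years):
    """Dice: keep a sub-cube by restricting two dimensions to sets of values."""
    return {k: v for k, v in cube.items() if k[0] in products and k[3] in years}

def roll_up_to_state(cube):
    """Roll-up: aggregate store-level rows to the state level."""
    totals = defaultdict(int)
    for (product, state, _store, year), units in cube.items():
        totals[(product, state, year)] += units
    return dict(totals)

def pivot(cube_by_state):
    """Pivot: lay out totals as state rows by product columns, summed over years."""
    table = defaultdict(lambda: defaultdict(int))
    for (product, state, _year), units in cube_by_state.items():
        table[state][product] += units
    return {state: dict(cols) for state, cols in table.items()}

sales_2010 = slice_year(facts, 2010)
by_state = roll_up_to_state(facts)
print(by_state[("Laptop", "Texas", 2010)])  # 8
print(pivot(by_state))
```

Drill-down is the reverse of `roll_up_to_state`: starting from the state-level totals in `by_state`, the analyst returns to the store-level rows in `facts`, which is why the detailed data has to be retained.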

Data mining

Data mining is used to derive useful insights by applying various descriptive and predictive modelling techniques to the data stored in the data warehouse. It is also known as Knowledge Discovery in Databases (KDD).

Data mining uses past results and associations found in the data to forecast the future. Data mining is therefore data-driven rather than user-driven. Knowledge is discovered with the help of associations, hidden patterns, predictions, and classification.
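
As one small example of discovering associations, the sketch below computes the support and confidence of a rule over a handful of transactions. The basket data and the function names are hypothetical, and this covers only one simple descriptive technique among the many that data mining includes.

```python
# Hypothetical market-basket data; each set is one customer transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "milk"}))       # 0.5
print(confidence({"bread"}, {"milk"}))  # about 0.667
```

Here the rule "bread implies milk" holds in two of the three transactions that contain bread, so the pattern comes from the data itself rather than from a question the user posed in advance.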

Multi-dimensional data model and schemas

Data in a data warehouse is modelled in the form of a data cube. It stores precomputed data, which helps in Online Analytical Processing (OLAP). The data can be stored at different levels of abstraction along each dimension. The abstraction levels available along each dimension are represented by a concept hierarchy. For example, the concept hierarchy for the time dimension consists of levels such as year $\to$ quarter $\to$ month $\to$ week $\to$ day.

Data cube in Data warehouse corresponding to sales — CC-BY | Image Credits: https://binaryterms.com/data-cube.html
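
A concept hierarchy and the idea of precomputed data can be sketched as follows. The daily sales figures and the `time_levels` helper are hypothetical, and the week level is left out for brevity.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales facts: (day, amount).
daily_sales = [
    (date(2023, 1, 15), 100),
    (date(2023, 2, 10), 50),
    (date(2023, 4, 3), 70),
    (date(2024, 1, 20), 30),
]

def time_levels(d):
    """Map a day to its ancestors in the hierarchy year -> quarter -> month."""
    quarter = (d.month - 1) // 3 + 1
    return {
        "year": (d.year,),
        "quarter": (d.year, quarter),
        "month": (d.year, d.month),
    }

# Precompute one table of totals per abstraction level.
aggregates = {
    "year": defaultdict(int),
    "quarter": defaultdict(int),
    "month": defaultdict(int),
}
for day, amount in daily_sales:
    for level, key in time_levels(day).items():
        aggregates[level][key] += amount

print(aggregates["year"][(2023,)])       # 220
print(aggregates["quarter"][(2023, 1)])  # 150
```

Because the totals for each level are computed once at load time, a query at the year or quarter level becomes a lookup instead of a scan over the daily rows.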

The three schemas used to organize data in a data warehouse are listed below:

Star schema - It consists of a single fact table at the center and many dimension tables arranged in a radial pattern around the fact table. It is used when data is gathered with respect to a single subject.
Snowflake schema - It consists of a single fact table, but a particular dimension may be represented by more than one dimension table. The dimension tables are normalized and split into multiple tables.
Fact-constellation schema - It consists of multiple fact tables and several dimension tables related to the fact tables. It is used when data is gathered around more than one subject.
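
A star schema can be sketched with Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_store`, `dim_date`), their columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
-- The fact table holds the measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    units_sold INTEGER,
    revenue    REAL
);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_store VALUES (1, 'Austin', 'USA'), (2, 'Toronto', 'Canada');
INSERT INTO dim_date VALUES (1, 2023, 1), (2, 2023, 2);
INSERT INTO fact_sales VALUES
    (1, 1, 1, 2, 2000.0),
    (1, 2, 2, 1, 1000.0),
    (2, 1, 2, 3, 600.0);
""")

# A typical star-schema query: join the fact table to one dimension and aggregate.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(revenue_by_category)  # [('Electronics', 3000.0), ('Furniture', 600.0)]
```

In a snowflake variant of this sketch, `category` would move out of `dim_product` into its own table referenced by a key. A fact-constellation variant would add a second fact table, for example one for shipments, that shares the same dimension tables.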

Data warehousing benefits

Helps organizations make informed decisions.
Increases ROI.
Provides visualizations that make the data easier to interpret.
Maintains historical data.

Data warehousing disadvantages

Creating a data warehouse is a tedious task.
It incurs high maintenance costs.
The data warehouse administrator must be a skilled professional.
Data integration is difficult.

Context and Applications

This topic is important for postgraduate and undergraduate courses, particularly for Bachelors in Computer Science Engineering and Associate of Science in Computer Science.

Practice Problems

Question 1: Which of the following are applications of a data warehouse?

A. Data mining
B. Analytical processing
C. Information processing
D. All the above

Answer: Option D is correct.

Explanation: Analytical processing, information processing, and data mining are the 3 applications of the data warehouse.

Question 2: A data warehouse does not require recovery, concurrency controls, and transaction processing.

A. Cannot say
B. True
C. False
D. Can be True or False

Answer: Option B is correct.

Explanation: A data warehouse does not require recovery, concurrency control, or transaction processing mechanisms because its data is physically stored separately from the operational database.

Question 3: _____________ supports knowledge discovery by finding hidden patterns and associations, constructing analytical models, and performing classification and prediction.

A. Information processing
B. Analytical processing
C. Data mining
D. None of these

Answer: Option C is correct.

Explanation: Data mining supports knowledge discovery by finding hidden patterns and associations, constructing analytical models, and performing classification and prediction. Visualization tools are used to present the mining results.

Question 4: Data transformation and data cleaning are major steps in improving the quality of data and data mining results.

A. False
B. True
C. Can be True or False
D. Cannot say

Answer: Option B is correct.

Explanation: The statement is true. Both data transformation and data cleaning are vital steps in improving the quality of data and data mining results.

Question 5: Which of the following are schemas used in a data warehouse?

A. Star
B. Snowflake
C. Fact-constellation
D. All the above

Answer: Option D is correct.

Explanation: Data in a data warehouse is organized using star, snowflake or fact-constellation schema.
