SampleMidTerm

docx

School

California State University, East Bay *

*We aren’t endorsed by this school

Course

620

Subject

Information Systems

Date

Oct 30, 2023

Type

docx

Pages

6

Report

Uploaded by AmbassadorBeaver2154

1 Data Warehousing Name_______________________________ Provide brief answers to the following questions. 1. A dimensional model for an auction process consists of a FactAuction table, DimDate, DimSeller, DimBuyer, DimProduct. What would be the primary key, foreign key(s) and attributes in the FactAuction table? The only measure of interest is the Auction Amount. DimDate DimProduct DateKey (PK) ProductKey (PK) DimSeller SellerKey (PK) FactAuction Auction Key (Primary Key): A unique identifier for each auction. Auction Date: The date and time when the auction took place. Buyer Key (Foreign Key): A reference to the DimBuyer dimension, linking the auction to the buyer. Seller Key (Foreign Key): A reference to the DimSeller dimension, associating the auction with the seller. Product Key (Foreign Key): A reference to the DimProduct dimension, connecting the auction to the product being auctioned. Auction Amount (Measure): The amount at which the auctioned item was sold 2. Mixing multiple granularities in a single fact table is not recommended. Give two reason why. Explain with an example. Mixing different levels of detail in a single data table is not a good idea for two main reasons: 1. **Data Quality Issues:** It can lead to problems with data accuracy and consistency. When you put very detailed data and summary data together, it's easy to make mistakes and have data that doesn't match up correctly. For instance, merging total sales revenue with individual sales transactions within the same table can lead to data inconsistencies. 2. **Slow Queries:** It can slow down how quickly you can get information from the data. If you want specific details, having a mix of detailed and summary data in the same table can make it take longer to get the answers you need. 3.Find two most critical errors and propose a solution to correct the errors for the following Product Dimension Table. Attribute Description ProductNo Business Key and Primary Key ProductName Name of Product ProductCategory Category of Product ProductInventory Current inventory (Quantity on hand) of product StartDate Start date of row effective period EndDate End date of row effective period Two critical errors in the Product Dimension Table and proposed solutions: Error 1: StartDate and EndDate should represent the effective period for each row. However, there's no provision for the current row, which may have a null EndDate, causing confusion. The EndDate value for ProductNo 1002 is earlier than the StartDate value. This is not allowed, as the StartDate value must be less than or equal to the EndDate value. Solution: Introduce a "Current Row" flag (e.g., IsActive or CurrentRow) to indicate the active row without relying on a null EndDate. Ensure only one row is marked as the current row at any given time. Error 2: The ProductNo is both a business key and the primary key. ProductNo column is the primary key of the table and must be unique . A primary key should be a surrogate key, not a business key. Solution: Introduce a surrogate key (e.g., ProductID) as the primary key, and use ProductNo as a unique business key. This helps maintain data integrity and facilitates historical tracking when products change their identifiers. Use the Pharmacy Data Warehouse Dimensional Model and related information provided below to answer Q6 and Q7 DimBuyer BuyerKey (PK)
2 Data Warehousing Name_______________________________ Description of select fields PharmaSalesFact.POSTransactionNo: A unique number generated for each transaction by the point of sales register at the pharmacy. PharmaSalesFact.RegularUnitPrice: The price per unit of product PharmaSalesFact.ExtendedSalesDollar: The netprice of the product multiplied by quantity sold. PharmaSalesFact.ProfitMargin: The profit margin for the specific transaction line item calculated based on calculated based on the ExtendedSalesDollarAmount and ExtendedGrossProfit DoctorDim.DoctorID: A unique sequential integer created for each record inserted into the DoctorDim table DoctorDim.LicenseNo : The medical practice license number of the Physician issued by the Medical Board. The number is unique to each Physician licensed to practice. 3. For each of the attributes below, specify if it is one or more of Additive, Non-Additive, Semi-Additive, Degenerate Dimension, Surrogate Key, Business or Natural Key. Attribute Attribute Type(s) PharmaSalesFact.POSTransactionNo PharmaSalesFact.RegularUnitPrice PharmaSalesFact.ExtendedSalesDollar PharmaSalesFact.ProfitMargin DoctorDim.DoctorID DoctorDim.LicenseNo Solution: PharmaSalesFact.POSTransactionNo: Attribute Type: Surrogate Key (A unique number generated for each transaction) PharmaSalesFact.RegularUnitPrice:
3 Data Warehousing Name_______________________________ Attribute Type: Additive (The price per unit is additive when aggregating across transactions or other dimensions, as summing unit prices makes sense) PharmaSalesFact.ExtendedSalesDollar: Attribute Type: Additive (The extended sales dollars are additive as they represent the total sales amount, so aggregating them across transactions or dimensions is meaningful) PharmaSalesFact.ProfitMargin: Attribute Type: Non-Additive (Profit margin is not directly additive because the calculation involves a ratio of ExtendedSalesDollar to ExtendedGrossProfit. Summing profit margins across transactions or dimensions wouldn't yield a meaningful result) DoctorDim.DoctorID: Attribute Type: Surrogate Key (A unique sequential integer for each doctor) DoctorDim.LicenseNo: Attribute Type: Business Key (The medical practice license number is a business key as it uniquely identifies each licensed physician) Additive: Definition: Additive attributes are those that can be aggregated or summed up across different dimensions or levels. When you add up these attributes, the result makes sense. Example: Sales revenue is additive because you can add the revenue from different products, stores, or time periods to get the total sales revenue. Non-Additive: Definition: Non-additive attributes are not suitable for direct aggregation. Adding them across dimensions may not provide meaningful results. Example: Profit margin is non-additive because aggregating profit margins from different products or time periods wouldn't yield a meaningful overall profit margin. Semi-Additive: Definition: Semi-additive attributes can be aggregated across some dimensions but not others. They may be additive along one dimension but non-additive along another. Example: Inventory Quantity on Hand is semi-additive. It's additive across products and stores, but not additive across time periods. You can sum up quantities for different products and stores, but adding up quantities over time may not be meaningful. Degenerate Dimension: Definition: A degenerate dimension is a dimension that has been "flattened" into the fact table, usually because it has a unique identifier but no additional attributes. It serves as a grouping or identifier for facts without any descriptive attributes. Example: Transaction or Order Number can be a degenerate dimension in a sales fact table, where it uniquely identifies each sale but doesn't have any associated attributes. Surrogate Key: Definition: A surrogate key is a system-generated primary key used to uniquely identify records in a table. It's often an arbitrary number generated by the database management system. Example: An auto-incremented serial number assigned to each row in a table, such as an Employee ID in an employee database. Business Key: Definition: A business key is a natural, meaningful key that uniquely identifies a record based on real- world attributes. Business keys have a real-world significance, such as a customer's email address or a product's SKU. Example: Social Security Numbers (SSN) for individuals, ISBNs for books, or UPC codes for products are business keys because they uniquely identify entities based on their real-world characteristics. These concepts are essential in database design, particularly in data warehousing and dimensional modeling, to ensure data accuracy, query performance , and meaningful analysis. 4. Consider the Pharmacy Data Warehouse Dimensional Model. The company now wants to also track, total number of unique products(drugs) purchased in a transaction and customer wait time for each visit to the pharmacy. The wait time is measured as the difference between the time at which the customer checks in at the pharmacy with their prescription and the check-out time at which all products in the prescription are dispensed. There can be multiple products prescribed within one prescription. Option 1 is adding the number of products and wait time measure to the existing Fact table Option 2 is to create a new fact table Which option would you recommend and why? The decision between Option 1 (adding the number of products and wait time measure to the existing Fact table) and Option 2 (creating a new fact table) depends on several factors, including the scope and requirements of the data warehouse and the specific use cases. Option 1 (Adding to Existing Fact Table): Pros: Simplicity: Adding the new measures to the existing Fact table is a straightforward and simple solution, especially if the existing table structure can accommodate these measures. Efficiency: It can be more efficient in terms of query performance when you need to analyze these new measures in conjunction with the existing pharmacy data.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
4 Data Warehousing Name_______________________________ Data Integration: It keeps related measures within the same fact table, making it easier to maintain data consistency and integrity. Cons: Data Model Complexity: If the new measures significantly increase the complexity of the existing fact table, it might become harder to manage and query, especially if they don't naturally fit with the existing measures. Data Granularity: Adding new measures may impact data granularity, and it could lead to a less detailed or less granular fact table. Option 2 (Creating a New Fact Table): Pros: Isolation: Creating a new fact table specifically for the number of products and customer wait time isolates these measures and keeps the original fact table structure intact. This can help maintain a cleaner and more focused data model. Data Granularity: You can design the new fact table to handle the new measures with the desired granularity without affecting the existing data. Scalability: If these new measures will significantly increase in volume or complexity, a dedicated fact table can provide scalability without affecting the existing model. Cons: Query Complexity: Querying and analyzing data involving the new measures may require more complex joins and queries when data from both fact tables is needed. Data Integration: It might require additional effort to ensure data consistency and integrity between the existing fact table and the new fact table. Recommendation: The choice between Option 1 and Option 2 should be based on the specific requirements of the data warehouse and the expected impact of these new measures on the existing data model. If the new measures align well with the existing data and don't significantly complicate the model, Option 1 is a simple and efficient choice. However, if the new measures have a different granularity or significantly change the data structure, Option 2 provides isolation and scalability but may require more complex querying. Ultimately, the decision should be made based on the trade-offs between simplicity and data model complexity. A real estate brokerage agency helps individuals buy and sell homes. A customer approaches the agency to either buy or sell a home and an agent works with the customer to help with the process. A home sale transaction includes information on home buyer, home seller, buyer’s agent, sellers agent, home property, list date, sold date, list price, sold price, buyer agent commission, seller agent commission. A real estate agent can acts as a buyer’s agent in some transactions and as a sellers agent in others. The agency is also interested in selling quicker (reducing days between list date and sold date). 5. Assuming a grain of one row per home sale transaction, identify all the relevant dimensions. a. Home Buyer Dimension: b. Attributes: BuyerID (Primary Key), Name, Contact Information, Preferences, Creditworthiness, Buying History, and more. c. Purpose: To capture detailed information about individuals or entities purchasing homes through the agency. 6. Home Seller Dimension: a. Attributes: SellerID (Primary Key), Name, Contact Information, Property Features, Selling History, Motivations, and more. b. Purpose: To store comprehensive details about individuals or entities selling their homes through the agency. 7. Real Estate Agent Dimension: a. Attributes: AgentID (Primary Key), Name, Contact Information, License Details, Experience, Affiliations, and more. b. Purpose: To maintain information about real estate agents who work with the agency, whether they act as buyer's agents or seller's agents. 8. Home Property Dimension: a. Attributes: PropertyID (Primary Key), Address, Property Type, Size, Features, Previous Sales History, and more. b. Purpose: To provide a comprehensive overview of the properties being bought or sold, allowing for property valuation and market analysis. 9. Time Dimension: a. Attributes: DateID (Primary Key), Date, Month, Year, Season, and more. b. Purpose: To capture time-related information, including listing dates and sale dates, which is crucial for tracking the time between listing and selling properties. These dimensions collectively create a robust data model that helps the real estate brokerage agency track, analyze, and optimize its operations, including reducing the time between listing and selling properties. The dimensions allow for a deep understanding of buyer and seller behavior, agent performance, property characteristics, and temporal trends in the real estate market.
5 Data Warehousing Name_______________________________ Assuming a grain of one row per home sale transaction, identify all the relevant measures that can be included in the fact table. List Price: The price at which the property was initially listed for sale. Sold Price: The actual price at which the property was sold. Buyer Agent Commission: The commission earned by the buyer's agent for facilitating the purchase. Seller Agent Commission: The commission earned by the seller's agent for facilitating the sale. Days on Market: The number of days the property was listed on the market before it was sold. Total Unique Products Purchased: The total count of unique products or drugs purchased within the transaction. Customer Wait Time: The time elapsed between the customer's check-in with their prescription and the check-out time when all products in the prescription are dispensed. These measures provide valuable insights into the real estate transactions, financial aspects, and efficiency of the agency's operations. The addition of "Total Unique Products Purchased" and "Customer Wait Time" measures can help the agency understand and improve customer service and operational efficiency, as mentioned in the previous scenario. 10. A 4 bedroom home that was sold in 2014 is back on the market for sale again. However, a new bedroom has been added to the home since the last sale and it now has 5 bedrooms. The data warehouse includes information about the home and corresponding sales transaction in 2014. Specify whether bedroom information is best included in a fact table or a dimension table. Also describe how the data warehouse should be updated to reflect that the specific home now has 5 bedrooms. In this scenario, the information about the change in the number of bedrooms for the home should be best handled through a dimension table. Here's how it can be managed: 1. Dimension Table: Create a "Property Dimension" or "Home Dimension" that contains attributes related to the property, including the number of bedrooms. Include a unique property identifier (e.g., PropertyID) as the primary key for this dimension. 2. Fact Table: Continue to maintain a "Sales Transaction Fact Table" where each row represents a home sale transaction. In this fact table, include a foreign key reference to the "Property Dimension" based on the unique property identifier (PropertyID). 3. Update Process: When the specific home is back on the market for sale with an additional bedroom, update the "Property Dimension" to reflect the change. Change the number of bedrooms for this property from 4 to 5 within the "Property Dimension." By handling the bedroom information through a dimension table, you maintain data integrity and history. This approach allows you to track changes over time without affecting the fact table, which continues to reference the property through the consistent property identifier. The fact table remains focused on sales transactions, while the dimension table captures property attributes that can change over time. The reason for including the information about the number of bedrooms in a dimension table, rather than a fact table, is to maintain data integrity, history, and consistency while efficiently handling changes to property attributes. Here's why a dimension table is the preferred choice: Data Integrity: The dimension table serves as a repository for static and slowly changing attributes related to properties, such as the number of bedrooms. By keeping this information in a dimension, you ensure data integrity because it remains consistent across multiple transactions and time periods. Historical Tracking: Storing property attributes in a dimension table allows for historical tracking. In the scenario where a 4-bedroom home becomes a 5-bedroom home, the dimension can be updated to reflect this change while preserving the historical data. This is essential for auditing and tracking property changes over time. Consistency in Reporting: By using a dimension table, you ensure that the attribute (number of bedrooms) is consistent across all sales transactions associated with that property. This consistency is crucial for accurate reporting and analysis. Efficiency: It is more efficient to update a dimension table with property attributes that change slowly, such as the number of bedrooms, rather than modifying each related fact table row in multiple sales transactions. This approach reduces data maintenance efforts and potential errors.
6 Data Warehousing Name_______________________________ Isolation of Property Attributes: A dimension table isolates property-specific attributes, making it easier to manage changes without impacting the fact table. It separates the structural and descriptive attributes of properties from the transactional data in the fact table. Overall, using a dimension table for property attributes ensures that the data remains accurate, complete, and easy to manage, especially when properties undergo changes like the addition of a bedroom. It aligns with best practices in dimensional modeling for data warehousing, promoting data consistency and integrity while accommodating changes over time.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help

Browse Popular Homework Q&A

Q: MPG Data 32.8 35.9 37.8 38.5 40.1 42.1 34.2 36.4 38.0 38.7 40.7 42.4 34.6 37.4 38.2 39.6 41.4…
Q: 3, 4, 5, 6, 7 How many subsets of A are there? That is, find |P(A)|.[ Preview How many subsets of A…
Q: 4.4 Prepare a direct labor budget for each of the upcoming five years. Direct Labor Budget To…
Q: Using the first and second derivative conditions, find the relative maximum(s) and minimum(s) of the…
Q: f(x)={x^2−9x≤c, 2x−10 x>c } i
Q: Consider functions f: {1,2,3,4)→ {a,b,c). a. How many functions are there? b. How many functions are…
Q: Practicing Step 3. Given the following population param each of the following:
Q: What is the chi square test and give an example of it?
Q: #INSTRUCTIONS #Type the codes for each question #Include answers to ALL questions in the script as a…
Q: Find the Car D={a, b, c, d, ..., z), n(D)=? U=universal set n(U')=? B={p people in the USA | p is a…
Q: A standard deck of 52 cards consists of 4 suites (hearts, diamonds, spades and clubs) each…
Q: cleared
Q: An installation technician for a specialized communication system is dispatched to a city only when…
Q: Briefly describe the three forms clay can be found in with supporting examples.
Q: Construct a truth tab 10) (pg
Q: For a 1040 steel solid square bar, determine the endurance limit or Se based on the following…
Q: Wilson Company earned $4,000 of cash sales. Sales tax is 6%. Which of the following shows how this…
Q: 4 If x+y=3z=0 x - 3y +z = -7, then y - 3z = 0 x - y O x = 5-3z, y = 3 + 4z, and z = t. O x = 5, y =…
Q: Past experience has indicated that the breaking strength of yarn used in manufacturing drapery…
Q: An equation of a circle is x squared - 8x + 16 + y squared + 10y + 25 =81 a. What is the radius of…
Q: For the given position vectors r(t)r(t) compute the unit tangent vector T(t)T(t) for the given value…
Q: With the rise of big data, increased attention is being paid to legal and ethical issues. INFORMS…