DAT 325 Project One

School: Southern New Hampshire University

Course: DAT 325
Subject: Industrial Engineering
Date: Dec 6, 2023

Type

docx

Pages

2


Data Quality Plan

Purpose Statement:
Obtaining and maintaining high-quality data is vital to making data-driven decisions. Without high-quality data, the decisions we make and the processes we adopt based on data analysis may be flawed. Acting on incomplete or flawed data can lead to inefficiencies, lost revenue, and wasted time that could have been spent on other projects and initiatives with better results. This risk should be taken seriously, and it is within our control.

Organizational Goals:
Before integrating the data received from Wayne Enterprises into our system, we must ensure that it meets our data quality requirements. If we do not make sure that only high-quality data enters our system from the start, we may later discover incomplete or inaccurate data that has already influenced business decisions along the way. We must also establish a standard method for extracting, transforming, and loading (ETL) the data. This method must be reproducible so that the data entering our system meets the same quality standard on every load. Lastly, we must ensure that everyone handling the ETL process shares the same definition of high data quality. Put simply, we must:

1. Complete an initial data assessment: this will show us the state of the data before it joins our system, allowing the organization to locate obstacles and address areas of opportunity in source data quality.
2. Create a process for ETL of the source data: a standardized ETL procedure will ensure that data quality and integrity issues are addressed each time we load new data. A repeatable process will help us maintain high data quality standards throughout the ETL process and minimize data quality loss.
3. Align the organization with data quality expectations: aligning every stakeholder involved in the ETL process with data quality expectations will ensure that we continue to follow industry and organizational standards. The job of ensuring data quality does not begin and end at the analyst level; therefore, everyone who works with the data should also be involved in the data quality process. This alignment will be achieved through initial and follow-up training and through regular audits of the data at each step of the ETL process.

Data Quality Characteristics and Procedures:
Data quality can be measured by many characteristics; the typical measures are completeness, timeliness, validity, consistency, and integrity. Completeness is the amount of populated data measured against the total possible data entries for a specific category (Gawande, 2022); in essence, it is a check for missing records. Timeliness measures the time between when an event actually occurs and when its data is captured in the system and made available for use (Gawande, 2022). A significant lag in capturing data can cause downstream processes to suffer because the data they depend on is missing. Validity measures how closely data values conform to predetermined values or calculations (Gawande, 2022). Invalid data can cause errors in downstream calculations, for example when a value does not have the expected data type.
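Completeness, timeliness, and validity can each be expressed as a simple ratio, which makes them easy to report during the initial data assessment and on every ETL run. The following is a minimal Python sketch, not a prescribed implementation: the records, the field names (`order_id`, `amount`, `event_time`, `loaded_time`), the one-day timeliness threshold, and the "non-negative number" validity rule are all hypothetical assumptions standing in for the real Wayne Enterprises data and our actual requirements.

```python
from datetime import datetime, timedelta

# Hypothetical records standing in for a source extract; field names and
# values are illustrative assumptions, not the real source schema.
records = [
    {"order_id": 1, "amount": 19.99,
     "event_time": datetime(2023, 12, 1, 9, 0), "loaded_time": datetime(2023, 12, 1, 9, 30)},
    {"order_id": 2, "amount": None,
     "event_time": datetime(2023, 12, 1, 10, 0), "loaded_time": datetime(2023, 12, 3, 10, 0)},
    {"order_id": 3, "amount": "abc",
     "event_time": datetime(2023, 12, 1, 11, 0), "loaded_time": datetime(2023, 12, 1, 11, 15)},
]


def completeness(rows, field):
    """Share of rows in which `field` is populated (not None)."""
    populated = sum(1 for r in rows if r.get(field) is not None)
    return populated / len(rows) if rows else 0.0


def timeliness(rows, max_lag):
    """Share of rows made available within `max_lag` of the real-world event."""
    on_time = sum(1 for r in rows if r["loaded_time"] - r["event_time"] <= max_lag)
    return on_time / len(rows) if rows else 0.0


def validity(rows, field, rule):
    """Share of populated values that satisfy the predetermined `rule`."""
    values = [r[field] for r in rows if r.get(field) is not None]
    valid = sum(1 for v in values if rule(v))
    return valid / len(values) if values else 0.0


def is_valid_amount(value):
    # Hypothetical rule: an amount must be a non-negative number (booleans excluded).
    return isinstance(value, (int, float)) and not isinstance(value, bool) and value >= 0


if __name__ == "__main__":
    print(f"completeness(amount) = {completeness(records, 'amount'):.2f}")
    print(f"timeliness(<=1 day)  = {timeliness(records, timedelta(days=1)):.2f}")
    print(f"validity(amount)     = {validity(records, 'amount', is_valid_amount):.2f}")
```

Validity is computed only over populated values, so a missing amount lowers completeness without also being counted as invalid. With the sample records above, completeness and timeliness are each 2/3 and validity is 1/2.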
Consistency measures how closely your data aligns with another dataset or a reference dataset (Gawande, 2022). When adding data to an existing dataset, the data being loaded should be consistent with previous entries in the number of values and in data types. Integrity measures the degree to which a defined relational constraint is implemented between two datasets (Gawande, 2022). Cardinality and referential integrity should be considered when adding new data to existing datasets.

Security and Personnel Responsibility Plan:
Although many stakeholders are involved in data quality, and everyone involved in the data analysis is expected to take part in it, security standards and requirements limit how far each person can be involved in this process. Access to the data must be restricted according to specific business need in order to protect personal or sensitive data and to comply with industry regulations. This poses significant challenges. For example, data consumers need to understand what high-quality data means for the business functions that use it, yet their access to the source data may be limited by the sensitivity level of the data being analyzed. With the industry's shift to a cloud computing model, the security of our data is more important than ever.

References:
Gawande, S. (2022, February 22). A Guide for Data Quality (DQ) and 6 Data Quality Dimensions. ICEDQ. https://icedq.com/6-data-quality-dimensions#what_is_integrity_data_quality_dimension
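The consistency, cardinality, and referential integrity checks described under Data Quality Characteristics and Procedures could be run as a gate in the ETL process before a new batch is loaded. The following is a minimal Python sketch under stated assumptions: the `customers`, `existing_orders`, and `new_orders` tables, their keys, and the rule that `order_id` must be unique are all hypothetical, and using the first existing row as the reference schema is a simplification.

```python
from collections import Counter

# Hypothetical existing tables and a new batch to load; names, keys, and
# values are illustrative assumptions only.
customers = [
    {"customer_id": 100, "name": "Ada"},
    {"customer_id": 101, "name": "Grace"},
]
existing_orders = [
    {"order_id": 1, "customer_id": 100, "amount": 19.99},
]
new_orders = [
    {"order_id": 2, "customer_id": 101, "amount": 5.00},
    {"order_id": 3, "customer_id": 999, "amount": 7.50},    # no matching customer
    {"order_id": 4, "customer_id": 100, "amount": "7.50"},  # amount stored as text
    {"order_id": 1, "customer_id": 101, "amount": 12.00},   # reuses an existing key
]


def schema_of(row):
    """Map each field name to the Python type of its value."""
    return {field: type(value) for field, value in row.items()}


def inconsistent_rows(new_rows, existing_rows):
    """Consistency: new rows whose fields or types differ from prior entries.

    Simplification: the first existing row is treated as the reference schema.
    """
    expected = schema_of(existing_rows[0])
    return [row for row in new_rows if schema_of(row) != expected]


def orphaned_rows(child_rows, parent_rows, key):
    """Referential integrity: child rows whose foreign key has no parent row."""
    parent_keys = {row[key] for row in parent_rows}
    return [row for row in child_rows if row[key] not in parent_keys]


def duplicate_keys(rows, key):
    """Cardinality: keys occurring more than once where one row per key is expected."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


if __name__ == "__main__":
    combined = existing_orders + new_orders
    print("inconsistent:", [r["order_id"] for r in inconsistent_rows(new_orders, existing_orders)])
    print("orphaned:", [r["order_id"] for r in orphaned_rows(new_orders, customers, "customer_id")])
    print("duplicate order_id:", duplicate_keys(combined, "order_id"))
```

With the sample data, order 4 is flagged as inconsistent, order 3 as orphaned, and order ID 1 as a duplicate key. In practice, these checks would more likely be enforced as database constraints or as SQL queries in the ETL tool; the Python version only illustrates the logic.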