A user's perception of a dataset helps to determine the quality of the dataset and reflects user needs. It is good preparation to see how strongly a dataset is recommended by other users with regard to its data quality. To characterize this, the user metric has been categorized into six criteria.
1. Downloads: data scientists prefer to download datasets with higher download counts, assuming those datasets are of higher quality in terms of accuracy. For example, the GeneCards web source has 147,820 downloads, which suggests a higher level of trust in the quality of its datasets.
2. Feedback: feedback from other users gives a general judgement of satisfaction with the datasets. For example, the GeneCards data sources have some …
Completeness is based on Wand and Wang (1996), because that work is unique in the quality literature for its theoretical approach to defining quality criteria. The scope of their study is limited to an objective view of quality, based on how faithfully stored data represents the external world. Nevertheless, it serves as a basis for deriving the Completeness criterion for machine learning in this thesis.
1. Completeness: a good representation of the real world by a data source requires that the data is complete. For example, the tumor-cell-size attribute has no empty fields. Completeness can be divided into two data quality sub-criteria.
• Missing values: a common technique in the machine learning process is to replace missing values with the mean value of that attribute, or to remove them, depending on the proportion of missing values to the total number of records. This is not appropriate when there is a significant percentage of missing values, as it could lead to biased results.
• NULL values: a NULL value, as described by Redman (1997), may mean not applicable, applicable but unknown, or applicability unknown. Nulls in a dataset are potentially ambiguous unless their meaning is clearly defined.
2. Correctness: describes how meaningful and unambiguous the given data is. Correctness is further classified into two data quality sub-criteria.
• Cardinality:
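The missing-value handling described above can be sketched as a short routine. This is a minimal illustration in plain Python; the 20% cutoff is an assumed threshold, not one prescribed by the text, and in practice the acceptable proportion depends on the dataset.

```python
from statistics import mean

def handle_missing(values, max_missing_ratio=0.2):
    """Impute or reject a numeric attribute containing missing values (None).

    If the share of missing values is small, replace them with the
    attribute mean; if it is large, mean imputation would bias results,
    so return None to signal that the attribute (or the affected
    records) should be dropped instead.
    """
    missing = sum(1 for v in values if v is None)
    if missing / len(values) > max_missing_ratio:
        return None  # too many gaps: imputation would bias results
    avg = mean(v for v in values if v is not None)
    return [avg if v is None else v for v in values]
```

With a low proportion of gaps the attribute is imputed with its mean; with a high proportion the caller is told to remove the values instead, mirroring the two options the text describes.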
(TCO A) The quality of information that gives assurance that it is reasonably free of error and bias and is complete is
Identifies key facts in a range of data. Notices when data appear wrong or incomplete, or need verification. Distinguishes information that is not pertinent to a decision or
Verified the accuracy and integrity of clinical data by performing validation checks written in SAS, and cleaned the data by examining data-related errors and missing values.
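The validation checks mentioned above were written in SAS; as a hedged sketch of the same idea in Python (the field names and allowed ranges here are hypothetical, not taken from the source), a record can be checked for missing values and plausible ranges:

```python
def validate_record(record, rules):
    """Return a list of validation errors for one clinical record.

    rules maps a field name to its allowed (low, high) range; a field
    that is absent or None is reported as a missing value.
    """
    errors = []
    for field, (low, high) in rules.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing value")
        elif not (low <= value <= high):
            errors.append(f"{field}: {value} outside [{low}, {high}]")
    return errors

# Hypothetical range rules for illustration only.
RULES = {"age": (0, 120), "systolic_bp": (60, 250)}
```

Records with a non-empty error list would then be routed to data cleaning, mirroring the validation-then-cleaning workflow described above.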
Data completeness means the business should ensure that its clients' information is fully filled in so that it is complete. The business will likewise need to ensure that its clients' information is correct; otherwise, clients may be charged for something they did not commit to, because the business holds the wrong information for that individual.
The audience would be the data owners, data managers and IT personnel who would be responsible for the data quality (data administrator, operations manager or database admin).
The choice of a particular source is related to the type of data or information needed; such factors as ease of access, ease of processing the source, cost, availability, and the quantity and quality of information will possibly impact the selection (Wanderley
This webinar mainly deals with how to analyze, quantify, and monitor data quality conditions; how to create a dashboard and communicate results; what content to provide to the dashboard and its functions, such as business-rule metadata, a rules library, decision points, and repeat analysis; and how to connect the dashboard to the data.
Dirty data: inaccurate, inconsistent, incomplete, and duplicate data all fall into the category of dirty data. By the logic of garbage in, garbage out, dirty data as input will produce dirty data as output. Organizations usually realize very late in the product life cycle that their data is dirty. Reports generated from dirty data amplify the errors further, and once the data is in use these errors will be impossible to debug. Organizations might lose a lot of revenue because of this problem, and when the data is consumed by other systems, it contaminates their data as well. The impact of dirty data reaches all departments of an enterprise, from marketing to finance, human resources, customer relationships, and beyond.
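As one hedged sketch of how such dirty data might be surfaced early rather than late in the life cycle (the record layout and the `customer_id` key name are assumptions for illustration), duplicate and incomplete records can be flagged in a single pass:

```python
from collections import Counter

def audit_records(records, key="customer_id"):
    """Flag two common kinds of dirty data in a list of dict records:
    duplicate key values, and records with any empty or None field."""
    counts = Counter(r[key] for r in records)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    incomplete = [r[key] for r in records
                  if any(v is None or v == "" for v in r.values())]
    return {"duplicates": duplicates, "incomplete": incomplete}
```

An audit like this at data-entry time is far cheaper than debugging the amplified errors in downstream reports that the paragraph above warns about.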
However, labs and inspectors tend to be blamed for the varying levels of data quality, when the reality is that there are no clear definitions of what complete, accurate, and consistent data is. It is therefore difficult to ask for clean data when the directives are not in place and/or not well communicated. Furthermore, nobody at the enterprise level is accountable for all portions of the data, ensuring the data is of the required quality. The next factor is technology: in fact, labs and inspectors do not have the tools to be as efficient as possible.
Data integrity refers to the completeness, consistency, and accuracy of data. Complete, consistent, and accurate data should be Attributable, Legible, Contemporaneously recorded, Original or a true copy, and Accurate (ALCOA).
Must be a stand-alone data center – such an analytical filter removes outlier data points, or subsets of data, that may not be significant to the analysis. By making the process more consistent, this analytical filter resulted in higher significance for the regression equations.
Valid information is information that is correct and can be used for the purpose it was gathered for without any discrepancies. An example of valid information would be the attendance reports sent to or received at the office; it is important that this information is valid, otherwise it could cause a student to be removed for low attendance even if they have attended every lesson.
Reliability - information that is presented is truthful, accurate, complete and capable of being verified
• Are there current standards or levels for the data item that are needed? If not, these need to be established.
Information quality is often a key dimension of end-user satisfaction instruments (Ives et al., 1983; Baroudi & Orlikowski, 1988; Doll et al., 1994). Information quality is often not distinguished as a unique construct to measure success but is measured as a component of user satisfaction. Fraser & Salter (1995) developed a generic scale of information quality, and others have developed their own scales using the literature relevant to the type of information system under study (Coombs et al., 2001; Wixom & Watson, 2001; Gable et al., 2003), whereby the domain largely depends on the dimensions of user satisfaction: how users perceive the information quality and whether it increases their job effectiveness.