INTRODUCTION

Duplicate data is defined as the existence of the same data in several records, which is also known as redundancy; the definition has different interpretations. A data warehouse contains voluminous data that is mined and analysed to support a better decision-making process. In any data warehouse, data comes from a number of sources, and the result is growth in data and duplication of data. To clean the data, data preprocessing is performed, which includes data cleaning, data integration, data reduction, etc.; these steps attempt to clean the data and make the process of decision making much easier.

DUPLICATE DATA DETECTION

There are several ways to detect duplicate data. Two of them that were mentioned in the papers are:

1. Pre-duplicate record detection phase: Here, data is standardized. Data that is repeated across fields is converted to a specific format, so that duplicate entries in the warehouse are not erroneously designated as non-duplicate values because of formatting differences. This is an inexpensive stage for identifying duplicate entries, which are later used for comparison.

2. Detection using factors: The pre-duplicate record elimination stage is useful for removing data, but the aim is to retain only one copy of the duplicate data and remove the rest. For this purpose, a similarity value is calculated for the records, and a threshold value is used for elimination. All the possible pairs are selected from the clusters
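To make the two phases concrete, here is a minimal sketch (not taken from the papers surveyed): it assumes records are simple field dictionaries, uses Python's difflib ratio as a stand-in similarity measure, and picks an illustrative threshold of 0.9.

```python
# Illustrative sketch of the two phases above; field names and the 0.9
# threshold are assumptions, not values from the surveyed papers.
from difflib import SequenceMatcher

def standardize(record):
    """Pre-duplicate phase: bring repeated fields into one common format."""
    return {k: str(v).strip().lower() for k, v in record.items()}

def similarity(r1, r2):
    """Field-wise average similarity between two standardized records."""
    scores = [SequenceMatcher(None, r1[k], r2.get(k, "")).ratio() for k in r1]
    return sum(scores) / len(scores)

def detect_duplicates(records, threshold=0.9):
    """Detection phase: flag record pairs whose similarity exceeds the threshold."""
    std = [standardize(r) for r in records]
    pairs = []
    for i in range(len(std)):
        for j in range(i + 1, len(std)):
            if similarity(std[i], std[j]) >= threshold:
                pairs.append((i, j))
    return pairs

records = [
    {"name": "John Smith ", "city": "NEW YORK"},
    {"name": "john smith", "city": "New York"},
    {"name": "Jane Doe", "city": "Boston"},
]
print(detect_duplicates(records))  # [(0, 1)] -- only the standardized matches remain
```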
Our proposed approach is a single-database-scan approach in which all transactions are read only once. Initially, the SIL and PTable are empty. At the first time interval, the transaction $\left\{a,b,g,f\right\}$ is read; it updates the SIL with items $\left\{a\right\}$, $\left\{b\right\}$, $\left\{g\right\}$ and $\left\{f\right\}$ and sets their timeset (TS) value to 1, which represents the time of occurrence. The first row of Table \ref{Figure:example1} shows the SIL and PTable generated after the first timestamp. The SIL and PTable updated after the second timestamp are shown in the second row of Table \ref{Figure:example1}. At timestamp three, the transaction $\left\{a,b,c,e,f\right\}$ with time 3 updates the TS by adding time 3 and generates descriptors (D) in the SIL. For an item $\left\{a\right\}$
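As a rough sketch of the single-scan timeset (TS) update described above (the PTable and descriptor handling are omitted, and the second transaction is an assumed placeholder since it is not listed in the text):

```python
# Minimal sketch of the single-scan SIL update; only the timeset (TS) part
# is modelled: each item maps to the list of timestamps at which it occurred.
from collections import defaultdict

def scan_transactions(transactions):
    sil = defaultdict(list)  # item -> timeset (TS)
    for timestamp, transaction in enumerate(transactions, start=1):
        for item in transaction:
            sil[item].append(timestamp)  # record the time of occurrence
    return sil

transactions = [
    {"a", "b", "g", "f"},        # timestamp 1 (from the running example)
    {"b", "c", "d"},             # timestamp 2 (assumed placeholder, not from the text)
    {"a", "b", "c", "e", "f"},   # timestamp 3 (from the running example)
]
sil = scan_transactions(transactions)
print(sil["a"])  # [1, 3] -- item a occurred at timestamps 1 and 3
```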
Data Redundancy: Data redundancy is where a duplicate of information is stored in different tables/databases. Sometimes data redundancy is created on purpose, as a backup of the data, as a precaution in case something happens and the data gets deleted. Data redundancy creates a new piece of data so that any modification, addition of new data, or deletion of data is done on the new piece, so that you will always have the original.
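A small illustrative sketch of this idea, assuming the records are held in memory as dictionaries (the function name is hypothetical):

```python
# Changes are applied to a fresh copy, so the original record set is preserved
# as the redundant backup.
import copy

def apply_changes(original_records, changes):
    """Apply modifications to a new copy so the original data stays intact."""
    working_copy = copy.deepcopy(original_records)
    for index, new_values in changes.items():
        working_copy[index].update(new_values)
    return working_copy

records = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}]
updated = apply_changes(records, {0: {"amount": 120}})
print(records[0]["amount"], updated[0]["amount"])  # 100 120 -- original untouched
```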
One reason that organizations take duplicates of their documents and assets is that the copies can be used to restore a system in the case of a system failure. If the systems become corrupted, or are lost or stolen, the copies of the documents can be restored into the system so that the organization can carry on with its business. That said, an attack on a system that does not have backed-up records at an off-site location can be costly and even risky to an organization. If the data is lost entirely (because of not having backup systems)
Once DignityMatch receives records from the NDR, it will group records into blocks. This is done to reduce the number of comparisons that need to be made to find which pairs of records are duplicates, likely duplicates, or unrelated. The system will give the user the following options: "link records", "link records later", or "don't link records". If the user wants to link the duplicate records, they select the "link records" option.
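A rough sketch of the blocking step, assuming a surname field is used as the blocking key; this is an illustration under that assumption, not DignityMatch's actual implementation:

```python
# Blocking: records are grouped by a key so that only records within the same
# block are compared, reducing the number of pairwise comparisons.
from collections import defaultdict
from itertools import combinations

def block_records(records, key="surname"):
    """Group records into blocks keyed on a normalized field value."""
    blocks = defaultdict(list)
    for record in records:
        blocks[record[key].strip().lower()].append(record)
    return blocks

def candidate_pairs(blocks):
    """Yield only the pairs that share a block; other pairs are never compared."""
    for block in blocks.values():
        yield from combinations(block, 2)

records = [
    {"id": 1, "surname": "Okafor", "first": "Ada"},
    {"id": 2, "surname": "okafor", "first": "Ada"},
    {"id": 3, "surname": "Bello", "first": "Musa"},
]
pairs = list(candidate_pairs(block_records(records)))
print(len(pairs))  # 1 -- only the two Okafor records are compared
```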
After entering the data, it must be reviewed to catch any errors. Cleaning the data is done in a few basic steps. The data is, of course, imported from the source, but there should always be some backup involved in order to preserve the integrity of the data. In some cases the next step would be manipulation or spell checking; this depends on what kind of data is involved. In this case spell checking is not necessary, but manipulation is. For instance, with manipulation they may have to add a column or two, or add a zero where data is missing. Below are numbers 6-10 of the data set; the data has been cleaned up by changing the number 6 to the number 5, thus correcting the errors Sally made. Where a question wasn't answered, a zero was added to indicate missing data.
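A minimal pandas sketch of these two corrections (the column name and response values are illustrative, not the actual data set):

```python
import pandas as pd

# Illustrative responses for respondents 6-10; None marks an unanswered question.
df = pd.DataFrame({"respondent": [6, 7, 8, 9, 10],
                   "q1": [6, 3, None, 2, 6]})

df["q1"] = df["q1"].replace(6, 5)   # correct the erroneous 6s to 5
df["q1"] = df["q1"].fillna(0)       # a zero indicates missing data
print(df)
```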
Data management is vital to any business, as it is a key tool for an organisation's business improvement: you can refer back to data and compare it against benchmarks. Analysing data can provide evidence for possible future structure, such as identifying trends, as well as indicating where improvements can be made. However, there are strict procedures to be followed when collecting and storing data.
In the data lists table given above, for example, (1,2) are unique to y, (3,4) are common to both, and (5,6) are unique to z.
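Assuming y and z are the two data lists, the same comparison can be expressed with set operations:

```python
# Treat the two data lists as sets and compare them directly.
y = {1, 2, 3, 4}
z = {3, 4, 5, 6}
print(y - z)  # {1, 2} -- unique to y
print(y & z)  # {3, 4} -- common to both
print(z - y)  # {5, 6} -- unique to z
```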
One crucial thing that organizations need to consider in today's unstructured data world is how to successfully integrate data warehouses. For this, companies need to reconsider their enterprise data architecture and identify the governance strategy that can be achieved through such efforts. There lies a need for data managers
A data warehouse is a large database organized for reporting. It preserves history, integrates data from multiple sources, and is typically not updated in real time. The key components of data warehousing are the ability to access data from the operational systems, the data staging area, the data presentation area, and the data access tools (HIMSS, 2009). The goal of the data warehouse platform is to improve decision-making for clinical, financial, and operational purposes.
Before a data set can be mined, it first has to be "cleaned". This cleaning process removes errors, ensures consistency, and takes missing values into account. Next, computer algorithms are used to "mine" the clean data, looking for unusual patterns. Finally, the patterns are interpreted to produce new knowledge [3].
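As an illustrative sketch of this clean-mine-interpret sequence (the values and the simple z-score rule are assumptions, not from the cited work):

```python
import statistics

raw = [12.1, 11.8, None, 12.0, "12.3", 48.7, 11.9]

# 1. Clean: drop missing values and coerce inconsistent types.
clean = [float(x) for x in raw if x is not None]

# 2. Mine: flag unusual patterns (here, values more than two standard deviations from the mean).
mean, stdev = statistics.mean(clean), statistics.stdev(clean)
outliers = [x for x in clean if abs(x - mean) > 2 * stdev]

# 3. Interpret: the flagged values become candidates for new knowledge.
print(outliers)  # [48.7]
```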
There were 2,109 observations in the data set relating to this rule. Many of the duplicates identified in this data set were recorded as being paid out three times and then debited twice to undo the duplicate payment. By eliminating these types of transactions, the data set went from 2,109 observations to 1,159 total observations. From there, we counted every transaction to determine how many times it was duplicated. Most of the transactions were invoiced twice, while others were invoiced more than twice. After compiling the data to identify each single transaction and determining the number of times it was invoiced, we arrived at a total of 511 transactions that were duplicated at least twice.
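A hedged pandas sketch of this counting step, assuming a transaction is identified by vendor, invoice number, and amount (illustrative column names and values, not the actual data set):

```python
import pandas as pd

# Each row is one recorded payment; duplicates share vendor, invoice, and amount.
tx = pd.DataFrame({
    "vendor":  ["A", "A", "B", "B", "B", "C"],
    "invoice": [101, 101, 202, 202, 202, 303],
    "amount":  [500, 500, 75, 75, 75, 60],
})

# Count how many times each distinct transaction was invoiced.
counts = (tx.groupby(["vendor", "invoice", "amount"])
            .size()
            .reset_index(name="times_invoiced"))
duplicated = counts[counts["times_invoiced"] >= 2]
print(len(duplicated))  # number of transactions invoiced at least twice
```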
A data warehouse is made up of multiple databases that work together; in other words, a data warehouse integrates data from other databases. This provides a better understanding of the data. Its primary goal is not just to store data, but to give the business, in this case a higher education institution, a means to make decisions that can influence its success. This is accomplished by the data warehouse providing architecture and tools that organize and help make sense of the data.
Data quality has significant influence on the performance of the system [2], and appropriate data linkage techniques can improve data quality, enrich data, and reduce costs in data acquisition [3]. However, although real-world