
The Success Of The Database And Data Warehouse


The success of a database or data warehouse (DW) project depends heavily on the quality of its data. If data quality is poor, the information that business users retrieve from the database/DW environment will be unreliable. Good-quality data helps decision makers make the right decisions, builds trust, and makes the organization more efficient. In contrast, poor-quality data leads decision makers to wrong decisions. Debbarma, Nath and Das (2013) stated that “good quality of data will enable DW environment to provide right information in the right place at the right time with the right cost in order to support the right decision”. Thus, data quality needs to be maintained …

The most prominent recurring issue is duplicated records, “the records that represent the same real-world object in numerous ways” (Christie, Timothy, 2005). Such duplicates can cause many significant problems, so a data cleaning strategy is essential to eliminate these redundancies. However, duplicate elimination is a challenging task because duplicates arise from many different types of errors, such as typographical errors, null values, abbreviations, word transformations and different representations of the same word. Researchers and scholars have proposed many algorithms to detect and eliminate duplicate data. These include the standard duplication elimination algorithm (SDE), the adaptive duplication detection algorithm (ADD), the sorted neighborhood algorithm (SNA), the duplicate elimination sorted neighborhood algorithm (DE-SNA), etc. Most of these algorithms combine the following techniques in different ways to detect and eliminate duplicates: character-based similarity measures, phonetic similarity measures, numeric similarity measures, and semantic similarity measures. To measure the similarity between characters, numbers and semantic words, these techniques use standard string similarity functions such as edit distance, generalized edit distance, Hamming distance, the cosine metric, and the Jaccard coefficient. There are many approaches
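As an illustration of the similarity functions named above, the following is a minimal Python sketch of edit distance, Hamming distance, and the Jaccard coefficient, followed by a simplified sliding-window pass in the spirit of the sorted neighborhood idea. The function names, the `window` and `max_dist` parameters, and the sample records are assumptions made for this example; the windowed pass is not the SNA or DE-SNA algorithm as published, only a rough sketch of sorting records by a key and comparing near neighbors.

```python
from typing import Callable, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def jaccard(a: str, b: str) -> float:
    """Jaccard coefficient over lowercase word sets:
    size of the intersection divided by size of the union."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty strings are treated as identical
    return len(ta & tb) / len(ta | tb)


def windowed_duplicates(records: List[str],
                        key: Callable[[str], str],
                        window: int = 3,
                        max_dist: int = 2) -> List[Tuple[str, str]]:
    """Simplified sorted-neighborhood-style pass (hypothetical helper):
    sort by key, then compare each record only with the next
    window - 1 records and keep pairs within max_dist edits."""
    ordered = sorted(records, key=key)
    pairs = []
    for i, rec in enumerate(ordered):
        for other in ordered[i + 1:i + window]:
            if edit_distance(key(rec), key(other)) <= max_dist:
                pairs.append((rec, other))
    return pairs


if __name__ == "__main__":
    names = ["Jon Smith", "Mary Jones", "John Smith"]  # made-up sample data
    print(edit_distance("kitten", "sitting"))
    print(hamming_distance("karolin", "kathrin"))
    print(jaccard("data warehouse", "data quality"))
    print(windowed_duplicates(names, key=str.lower, window=2))
```

Sorting first and comparing only neighbors inside a small window avoids comparing every record with every other record, at the cost of missing duplicates whose keys sort far apart. The choice of key and threshold here is arbitrary and would need tuning on real data.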
