Introduction
The past thirty years have seen increasingly rapid advances in the field of databases. Moreover, the amount of data stored in electronic format has increased dramatically, and data now accumulates at a very quick rate. In addition, the volume of information in the world has been projected to double every two years. Health care and financial database systems, for example, are worth noting as instances of the types of data that are being collected in dramatically increasing amounts. In fact, we live in a world where vast amounts of data are collected daily, and we cannot avoid interacting with data because we are living in the age of data. There are Terabytes or
These necessities have prompted the conception of Data Mining, which has been moving us from the data age toward the coming information age. A considerable amount of literature has been published on Data Mining, and this survey is concerned with the ideas behind the processes, purpose, and techniques of Data Mining. [1][2]
1. What is Data Mining
In everyday life, the word ‘mining’ refers to a process that discovers a small set of valuable pieces in a great deal of raw material, as in mining gold from rocks or sand. According to [3], Data Mining, also known as Knowledge Discovery in Databases (KDD), is the process of extracting implicit, previously unknown, and potentially useful information from a database by using a number of different techniques, such as clustering, data summarization, classification learning, finding dependency networks, analyzing changes, and detecting anomalies. Data Mining thus refers to a variety of techniques that can be used to analyze and examine a database in order to find relationships or summarize the data in ways that can be put to use in areas such as decision making, prediction, and estimation. Doing so involves a sequence of steps [2], as shown in Figure (1.1).
(1) A petabyte is a unit of computer data storage equal to a thousand terabytes, or 1 million gigabytes.
1. Data cleaning: that is the process where noise
Data mining uses computer-based technology to evaluate data in a database and identify different trends. Effective data mining helps researchers predict economic trends and pinpoint sales prospects. The data being mined is stored in data warehouses, which are sophisticated customer databases that allow managers to combine data from several different organizational functions.
Data Mining. It is the process of discovering interesting knowledge and significant structures from large amounts of data gathered and stored in data warehouses or other information storage.
Data Mining is an analytical process that primarily involves searching through vast amounts of data to spot useful, but initially undiscovered, patterns. The data mining process typically involves three major steps: exploration, model building and validation, and finally deployment.
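The three steps named above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy records, the function names, and the one-threshold "model" are hypothetical and stand in for whatever data and algorithm a real project would use.

```python
from statistics import mean

# Hypothetical toy data: (monthly_visits, made_purchase) pairs.
records = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1),
           (2, 0), (8, 1)]

def explore(rows):
    """Step 1, exploration: summarize each class before modeling."""
    buyers = [visits for visits, label in rows if label == 1]
    others = [visits for visits, label in rows if label == 0]
    return {"mean_buyers": mean(buyers), "mean_others": mean(others)}

def build_model(rows):
    """Step 2a, model building: a threshold halfway between the class means."""
    summary = explore(rows)
    return (summary["mean_buyers"] + summary["mean_others"]) / 2

def validate(threshold, rows):
    """Step 2b, validation: accuracy on rows not used to build the model."""
    hits = sum((visits > threshold) == bool(label) for visits, label in rows)
    return hits / len(rows)

def deploy(threshold):
    """Step 3, deployment: expose the model as a function for new data."""
    return lambda visits: visits > threshold

# Keep the last two records aside for validation.
train, holdout = records[:8], records[8:]
threshold = build_model(train)
accuracy = validate(threshold, holdout)
predict = deploy(threshold)
```

Holding back part of the data for validation is the design point worth noting: a model scored only on the rows it was built from would say little about how it behaves once deployed.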
Data mining is another concept closely associated with large databases such as clinical data repositories and data warehouses. However, data mining, like several other IT concepts, means different things to different people. Health care application vendors may use the term data mining when referring to the user interface of the data warehouse or data repository. They may, for example, refer to the ability to drill down into data as data mining. Used more precisely, however, data mining refers to a sophisticated analysis tool that automatically discovers patterns among data in a data store. Data mining is an advanced form of decision support. Unlike passive query tools, the data mining analysis tool does not require the user to pose individual, specific questions to the database. Instead, the tool is programmed to look for and extract patterns, trends, and rules. True data mining is currently used in the business community for marketing and predictive analysis (Stair & Reynolds, 2012). This analytical data mining is, however, not currently widespread in the health care community.
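The contrast with passive query tools can be made concrete with a small sketch. Instead of asking about one specific item combination, the code below scans every pair of items and reports those that co-occur often. The transactions, the function name `frequent_pairs`, and the support threshold are hypothetical choices for illustration, and simple pair counting is only one of many pattern-discovery techniques.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "eggs"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs whose share of baskets is at least min_support."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: count / total
            for pair, count in counts.items()
            if count / total >= min_support}

# No question about a specific pair is posed; the pairs emerge from the data.
pairs = frequent_pairs(transactions, min_support=0.6)
```

With this toy data only bread and milk appear together in at least 60% of baskets, so that is the single pattern returned.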
What is data mining? Data mining is the derivation of new information from massive amounts of data in databases (Sauter, 2014, p. 148). Chowdhurry argues that data mining is part of KDD. KDD, or knowledge discovery in databases, is a process that includes data mining. In addition to data mining, KDD includes data preparation, modeling, and evaluation. KDD is at the heart of this research field, which is multidisciplinary and includes data visualization, machine learning, database technology, expert systems, and statistics. Overall, the use of case-based reasoning and data mining tools within an information system would create a CBR system that solves new problems with adapted solutions and could be used in many industries, such as education and healthcare (Chowdhurry,
Data mining is really just the next step in the process of analyzing data. Instead of answering queries on standard or user-specified relationships, data mining goes a step further by finding meaningful relationships in data: relationships that were thought not to exist, or ones that give a more insightful view of the
by determining their similarity, helping patterns to emerge. Supervised learning is used to classify
DATA MINING: searching and analyzing large masses of data to discover patterns and develop new information.
With the increased and widespread use of technology, interest in data mining has grown rapidly. Companies now use data mining techniques to examine their databases, looking for trends, relationships, and outcomes that can enhance their overall operations, and to discover new patterns that may allow them to better serve their customers. Data mining provides numerous benefits to businesses, government, and society, as well as to individuals. However, as with many technologies, data mining also has negative consequences, such as the invasion of privacy rights. This paper explores the advantages as well as the disadvantages of data mining. In addition, the ethical and global issues regarding the use of data mining
Data has always been analyzed within companies and used to benefit the future of businesses. However, the way data is stored, combined, analyzed, and used to predict the patterns and tendencies of consumers has evolved as technology has advanced throughout the past century. In the 1900s databases began as “computer hard disks” and in 1965, after many other discoveries including voice recognition, “the US Government plans the world’s first data center to store 742 million tax returns and 175 million sets of fingerprints on magnetic tape.” The evolution of data into large databases continued in 1991, when the internet began to emerge and “digital storage became more cost effective than paper.” With the constant increase in data supplied digitally, Hadoop was created in 2005, and from that point forward “14.7 Exabytes of new information are produced this year,” a number that is rapidly increasing with the many mobile devices people have today (Marr). The evolution of the internet and the subsequent expansion in the number of mobile devices have changed how data is produced, and companies now need large central database management systems in order to run an efficient and successful business.
Due to the growth of new technologies, businesses, communication channels, and devices, data has been produced on a large scale. About 90% of the data in today’s world was created in the last two years alone. The information retained in that data posed a big risk to many organizations, because the technology of the time managed data with a traditional approach consisting of a user, a centralized system, and a relational database. This approach had various drawbacks, including two key problems: limited storage capacity and slow data processing.
Data mining, an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets, involving methods at the intersection of artificial intelligence, machine learning,
Today, with the ever-growing use of computers in the world, information is constantly moving from one place to another. What this information is, who it is about, and who is using it will be discussed in this paper. The collection, interpretation, and determination of use of this information has come to be known as data mining. The term data mining has been around only for a short time, but the actual collection of data has been happening for centuries. The following paragraph gives a brief description of the history of data collection.
Background - One of the most promising developments in the field of computing and computer memory over the past few decades has been the ability to bring tremendously complex and large data sets into database management systems that are both affordable and workable for many organizations. Improvements in computing power have also allowed the field of artificial intelligence to evolve, which in turn improves the sifting of massive amounts of information for appropriate use in business, military, governmental, and academic venues. Essentially, data mining is taking as much information as possible from a variety of databases, sifting it intelligently, and coming up with usable information that will help with data prediction, customer service, what-if scenarios, and extrapolating trends for population groups (Ye, 2003; Therling, 2009).
After information is extracted from a large database, the data are analyzed and summarized into useful information. This process of analyzing and summarizing the extracted data is known as Data Mining (Maimon & Rokach, 2007). In fact, data mining is one of the important steps of the KDD process, which infers algorithms, explores data, develops models, and discovers previously unknown patterns (Maimon & Rokach). Hence, due to the accessibility and abundance of data, knowledge discovery and data mining have become considerably important in the healthcare industry (Maimon & Rokach).