Abstract
Data mining is the process of extracting hidden information from large data sets. Data mining techniques make it easier to discover hidden patterns in data. The most popular data mining techniques are classification, clustering, regression, association rules, time series analysis and summarization. Classification is a data mining task that examines the features of a newly presented object and assigns it to one of a predefined set of classes. In this research work, data mining classification techniques are applied to a disaster data set, which helps to categorize the disaster data based on the type of disaster that occurred worldwide over the past ten decades. An experimental comparison has been conducted between Bayes classification algorithms (BayesNet and NaiveBayes) and Rules classification algorithms (DecisionTable and JRip). The efficiency of these algorithms is measured using the performance factors classification accuracy, error rate and execution time. This work is carried out in the WEKA data mining tool. From the experimental results, it is observed that the Rules classification algorithm JRip produced better classification accuracy than the Bayes classification algorithms, while in terms of execution time the NaiveBayes classification algorithm required the least time.
Keywords: Disasters, Classification, BayesNet, NaiveBayes, DecisionTable, JRip.
I Introduction
Data mining is the process of extracting hidden information from large datasets.
Data mining uses computer-based technology to evaluate data in a database and identify different trends. Effective data mining helps researchers predict economic trends and pinpoint sales prospects. The data used for mining is stored in data warehouses, which are sophisticated customer databases that allow managers to combine data from several different organizational functions.
Technically speaking, data mining is a process that uses statistical, mathematical, and artificial intelligence techniques to extract and identify useful information and knowledge from large databases.
Data mining is the process of discovering interesting knowledge and significant structures from large amounts of data stored in data warehouses or other information repositories.
Data mining software allows users to analyze large databases to solve business decision problems. Data mining is, in some ways, an extension of statistics, with a few additional techniques drawn from artificial intelligence and machine learning.
Data mining is another concept closely associated with large databases such as clinical data repositories and data warehouses. However, data mining, like several other IT concepts, means different things to different people. Health care application vendors may use the term data mining when referring to the user interface of the data warehouse or data repository; they may, for example, refer to the ability to drill down into data as data mining. More precisely used, data mining refers to a sophisticated analysis tool that automatically discovers patterns among data in a data store. Data mining is an advanced form of decision support. Unlike passive query tools, the data mining analysis tool does not require the user to pose individual, specific questions to the database. Instead, this tool is programmed to look for and extract patterns, trends and rules. True data mining is currently used in the business community for marketing and predictive analysis (Stair & Reynolds, 2012). This analytical data mining is, however, not yet widespread in the health care community.
Classification has, for example, been applied to the RMS Titanic disaster: on the night of April 14, 1912, the ship collided with an iceberg and sank, and only 31.6% of those aboard survived. To understand which factors led to the highest chances of survival, one study used a Titanic passenger database and decision trees to classify passengers into two groups, survived and perished, and then ranked the most important factors regarding the likelihood of survival. It found that the most important factor in surviving the Titanic disaster was the passenger's title, and that a decision tree built around that variable as its highest-level split achieved a correct classification rate of 79.425%.
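A minimal sketch of how such a decision-tree classification could be reproduced with WEKA's J48 learner (a C4.5 implementation) is given below; the file name titanic.arff, the attribute layout and the evaluation settings are assumptions made only for illustration.

import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TitanicDecisionTree {
    public static void main(String[] args) throws Exception {
        // Load a passenger dataset; the file name and attributes are assumed for illustration.
        Instances data = new DataSource("titanic.arff").getDataSet();
        // Assume the nominal class attribute (survived / perished) is the last attribute.
        data.setClassIndex(data.numAttributes() - 1);

        J48 tree = new J48();                                     // C4.5-style decision tree learner
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));   // 10-fold cross-validation

        tree.buildClassifier(data);        // rebuild on the full data to inspect the learned tree
        System.out.println(tree);          // the root split indicates the most influential attribute
        System.out.printf("Correctly classified: %.3f%%%n", eval.pctCorrect());
    }
}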
With the increased and widespread use of technology, interest in data mining has grown rapidly. Companies now utilize data mining techniques to examine their databases, looking for trends, relationships and outcomes that enhance their overall operations and uncover new patterns that may allow them to better serve their customers. Data mining provides numerous benefits to businesses, government and society, as well as to individual persons. However, like many technologies, data mining also has negative consequences, such as the invasion of privacy rights. Several studies therefore explore the advantages as well as the disadvantages of data mining, in addition to the ethical and global issues surrounding its use.
Data mining is “[t]he process of finding significant, previously unknown, and potentially valuable knowledge hidden in data” (Gordon, 2007). Organizations use data mining to sift through massive quantities of raw data in order to find patterns and relationships that will ultimately be used for business purposes (Definition of: Data mining, 2016). Organizations mainly use data mining to get a better idea of their customers’ purchasing habits, product preferences, etc., in order to create sales tactics targeted at a certain customer demographic (Definition of: Data management, 2016).
Data mining, a sub-branch of computer science, involves statistics, methods and calculations for finding patterns in large data sets and database systems. Generally, data mining is the process of examining data from different aspects and summarizing it into meaningful information. Data mining techniques depict current actions and future trends, allowing any individual to make better, knowledge-driven decisions [1][2].
Data mining enables users to discover hidden patterns without a predetermined idea or hypothesis about what the pattern may be. The data mining process can be divided into two categories: discovering patterns and associations, and predicting future trends and behaviors using those patterns. The power of data mining is evident in its ability to bring forward patterns that the user never even considered searching for; hence, it can provide the answer to a question that was never asked. This is especially helpful when dealing with a large database in which there may be an enormous number of patterns to identify. It is interesting to note that “the more data in the warehouse, the more patterns there are, and the more data we analyze the fewer patterns we find.” What this means is that when there is a richness of data and data patterns, it may be best to mine different data segments separately, so that the influence of one pattern does not dilute the effect of another pattern in a large database.
Data mining is the procedure of extracting new patterns from large amounts of data; it is the process of finding useful information and patterns in huge data sets. It is also called knowledge discovery, knowledge mining from data, knowledge extraction, or data/pattern analysis. The main goal of data mining is to obtain patterns that were previously unknown. Once such useful patterns are found, they can be used to make decisions for the development of a business. Data mining aims to discover implicit, previously unknown, and potentially useful information that is embedded in data.
Data mining is the non-trivial extraction of potentially useful information from data. In other words, data mining extracts knowledge or interesting information from large sets of structured data drawn from different sources. There are various research domains in data mining, notably text mining, web mining, image mining, sequence mining, process mining and graph mining. Data mining applications are used in a range of areas, such as financial data analysis, the retail and telecommunication industries, banking, health care and medicine. In health care, data mining is mainly used for disease prediction, and several techniques have been developed for this purpose, including data preprocessing, classification, clustering, association rules and sequential patterns. One such study analysed the performance of two classification families on a hepatitis dataset: Bayesian classifiers (BayesNet and NaiveBayes) and Lazy classifiers (IBK and KStar), with the comparative analysis done using the WEKA tool. WEKA is open-source software consisting of a collection of machine learning algorithms for data mining tasks.
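A minimal sketch of how such a WEKA-based comparison can be set up programmatically is shown below, using the classifiers studied in this work (BayesNet, NaiveBayes, DecisionTable and JRip); the dataset file name, class-attribute position, fold count and random seed are illustrative assumptions rather than details taken from the experiments.

import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.BayesNet;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.rules.DecisionTable;
import weka.classifiers.rules.JRip;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ClassifierComparison {
    public static void main(String[] args) throws Exception {
        // Load the dataset (the file name is illustrative).
        Instances data = new DataSource("disasters.arff").getDataSet();
        // Assume the class attribute (disaster type) is the last attribute.
        data.setClassIndex(data.numAttributes() - 1);

        Classifier[] classifiers = {
            new BayesNet(), new NaiveBayes(),   // Bayes classifiers
            new DecisionTable(), new JRip()     // Rules classifiers
        };

        for (Classifier c : classifiers) {
            long start = System.nanoTime();
            Evaluation eval = new Evaluation(data);
            // 10-fold cross-validation; fold count and seed are illustrative.
            eval.crossValidateModel(c, data, 10, new Random(1));
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;

            System.out.printf("%-15s accuracy: %.2f%%  error rate: %.4f  time: %d ms%n",
                    c.getClass().getSimpleName(),
                    eval.pctCorrect(), eval.errorRate(), elapsedMs);
        }
    }
}

The same evaluation can equally be run from the WEKA Explorer interface; the programmatic form is shown only to make the measured quantities, classification accuracy, error rate and execution time, explicit.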
Data mining is the extraction of previously unknown knowledge from various databases (Musan & Hunyadi, 2010). It consists of using software that combines artificial intelligence, statistical analysis and systems management to extract facts and understanding from data stored in data warehouses, data marts and metadata (Giudici, 2005). Through its algorithms and learning capabilities, data mining software can analyze large amounts of data and give the management team intelligent and effective information to help them form their decisions. The intention of data mining is to analyze prevailing data and form new truths and new associations that were unknown prior to the analysis (Musan & Hunyadi, 2010).
In its infancy, data mining was as limited as the hardware being used; large amounts of data were difficult to analyze because the hardware simply could not handle them [1]. The term "data mining" first began appearing in the 1980s, largely within the research and computer science communities. In the 1990s it was considered a subset of a process called Knowledge Discovery in Databases, or KDD [1]. KDD analyzes data in search of patterns that may not normally be recognized with the naked eye. Today, however, data mining is no longer limited to databases.
Data, data everywhere. It is a precious resource that will outlast the systems that produce it. In this challenging world, there is a high demand to work efficiently without the risk of losing any tiny piece of information that might be very important in the future. Hence, there is a need to store and explore large volumes of data for future analysis. It is fascinating to see how this large amount of data is handled, stored in databases and manipulated to extract useful information. Raw data is like an unpolished diamond: its value is known only after it is polished. Similarly, the value of data is understood only after a proper meaning is brought out of it; this process is known as data mining.