Comparative Analysis of Data Mining Tools
Research Paper
11/16/2015
Dr. Kweku-Muata Osei-Bryson
1. Executive Summary
This research paper presents a comparative analysis of three data mining software tools, evaluated against four criteria: Performance, Functionality, Usability, and Ancillary Tasks Support. “Data Mining is a field of study that is gaining importance and is used to explore data in search of patterns or relationships between variables and is applied to new data used for predictions” (Statistics – Textbook, n.d., retrieved November 17, 2015). Selecting the appropriate data mining tool is critical to any research project or business, because the choice affects money, resources, and time. Data experts
Hence, different tools have to be used in different scenarios, since the suitability of each tool varies with the environment and with the type and nature of the problem. A comprehensive framework has been used to select the best tool; it gathers research findings through numerous questions about each tool. Each tool is then evaluated against the criteria and ranked on a Likert scale of 1 to 5, and an overall score is calculated as a weighted average, which indicates the credibility of the tool. This identifies the strengths and weaknesses of each tool. A sample case study using the same framework to find the best tool is also presented. The author of the research paper believes that this framework could help in identifying and selecting the best tools based on the given criteria. Selecting the right software supports better decision making and helps businesses sustain their resources.
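The scoring step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the tool names, criterion weights, and Likert ratings are hypothetical placeholders, not values from the paper, and the paper may compute its weighted average differently.

```python
# Hypothetical illustration of a weighted-average tool ranking.
# Weights, tool names, and ratings below are assumptions, not data from the paper.

CRITERIA_WEIGHTS = {
    "performance": 0.4,
    "functionality": 0.3,
    "usability": 0.2,
    "ancillary_tasks": 0.1,
}

# Each tool is rated on a Likert scale of 1 (worst) to 5 (best) per criterion.
RATINGS = {
    "Tool A": {"performance": 4, "functionality": 5, "usability": 3, "ancillary_tasks": 2},
    "Tool B": {"performance": 3, "functionality": 4, "usability": 5, "ancillary_tasks": 4},
    "Tool C": {"performance": 5, "functionality": 3, "usability": 2, "ancillary_tasks": 3},
}


def weighted_score(tool_ratings, weights):
    """Return the weighted average of the Likert ratings for one tool."""
    for criterion in weights:
        rating = tool_ratings[criterion]
        if not 1 <= rating <= 5:
            raise ValueError(f"{criterion} rating {rating} is outside the 1-5 scale")
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[c] * tool_ratings[c] for c in weights)
    return weighted_sum / total_weight


def rank_tools(ratings, weights):
    """Return (tool, score) pairs sorted from highest to lowest score."""
    scores = {tool: weighted_score(r, weights) for tool, r in ratings.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for tool, score in rank_tools(RATINGS, CRITERIA_WEIGHTS):
        print(f"{tool}: {score:.2f}")
```

Dividing by the total weight keeps the score on the same 1 to 5 scale as the ratings even when the weights do not sum exactly to one.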
2. Table of Contents
1. Executive Summary
3. Introduction
3.1 Objectives of the paper
3.2 List of Three Decision tree induction software
3.3 Limitations of the paper
4. Overview on DT induction
5. Evaluation criteria
5.1 Set of criteria
5.2 Definition for each criterion
6. Description of the DT induction software
6.1 DT
Well, it seems you have a complicated decision on your mind: “what tool for the job?” I have decided it is best for you to decide on your own which tools you prefer to work with, although I can ease the pain of the decision by providing insight into each of the tools you have at your fingertips.
Based on the comparisons in Table 1, Table 2 and Table 3, we identified the following set of categories on which to evaluate the above tools and computing paradigm in the subsequent sub-sections:
It is important to have information gathering techniques so that no information is overlooked. The information system that we are looking for must meet the requirements of the organization and of the employees who will be using the system. The first part of information gathering should consist of identifying information sources. The main sources of information in the company should be the employees who use the current system and will be using the new one, because they can tell you what works and what does not, or basically what is good about this system so that we can implement it in the new
The concept of reliability was included in the study. In fact, the reliability of each tool was tested and a value was included in the details about the study. The researchers assigned values to the score per individual to make the comparison process less
When a test of a tool is conducted, this method begins with a review of the documentation for the tool to be acquired. If there is no documentation, an analysis of the tool is conducted. This method is reviewed by both the vendor and the testing organization, but this process can be time-consuming.
The 7 tools of quality “provide a simple, yet strong method for collecting, analyzing and visualizing information from various views” (Schule, 2014). The tools consist of the following items:
Data mining is another concept closely associated with large databases such as clinical data repositories and data warehouses. However, data mining, like several other IT concepts, means different things to different people. Health care application vendors may use the term data mining when referring to the user interface of the data warehouse or data repository. They may refer to the ability to drill down into data as data mining, for example. However, more precisely used, data mining refers to a sophisticated analysis tool that automatically discovers patterns among data in a data store. Data mining is an advanced form of decision support. Unlike passive query tools, the data mining analysis tool does not require the user to pose individual specific questions to the database. Instead, this tool is programmed to look for and extract patterns, trends and rules. True data mining is currently used in the business community for marketing and predictive analysis (Stair & Reynolds, 2012). This analytical data mining is, however, not currently widespread in the health care community.
Decision making refers to the process of finding and selecting options according to the priorities and values of the person making the decision. Since there are many choices involved, it is important to identify as many options as possible so as to pick the option that best fits a company’s target, goals, values and vision. Due to the integral role of decision making in company growth and financial progress, many firms such as Amazon.com and eBay are investing heavily in business intelligence systems, which are made up of technological tools and applications created for the purpose of facilitating an improved decision making process in
Question 1: Assume a base cuboid of 10 dimensions contains only three base cells: (1) $(a_1, b_2, c_3, d_4, \ldots, d_9, d_{10})$, (2) $(a_1, c_2, b_3, d_4, \ldots, d_9, d_{10})$, and (3) $(b_1, c_2, b_3, d_4, \ldots, d_9, d_{10})$, where $a_i \neq b_i$, $b_i \neq c_i$, etc. The measure of the cube is count. 1. How many nonempty cuboids will a full data cube contain? Answer: $2^{10} = 1024$. 2. How many nonempty aggregate (i.e., non-base) cells will a full cube contain? Answer: There will be $3 \cdot 2^{10} - 6 \cdot 2^{7} - 3 = 2301$ nonempty aggregate cells in the full cube. The number of cells overlapping twice is $2^{7}$, while the number of cells overlapping once is $4 \cdot 2^{7}$. So the final calculation is $3 \cdot 2^{10} - 2 \cdot 2^{7} - 1 \cdot 4 \cdot 2^{7} - 3$, which yields the result. 3. How many
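The two counts in Question 1 can be checked by brute force. The sketch below assumes the three base cells exactly as printed in the question, uses `"*"` as the aggregated ("all") value, and treats every distinct symbol as a distinct attribute value; the function names are illustrative only.

```python
from itertools import product

N_DIMS = 10

# Dimensions 4..10 carry the same values d4..d10 in all three base cells.
SHARED_TAIL = tuple(f"d{i}" for i in range(4, N_DIMS + 1))

# Base cells as printed in the question (assumed to be transcribed correctly).
BASE_CELLS = [
    ("a1", "b2", "c3") + SHARED_TAIL,
    ("a1", "c2", "b3") + SHARED_TAIL,
    ("b1", "c2", "b3") + SHARED_TAIL,
]


def count_cuboids(n_dims):
    """Each dimension is either kept or aggregated, giving 2**n_dims cuboids."""
    return 2 ** n_dims


def generalizations(cell):
    """Yield every cell obtained by replacing any subset of dimensions with '*'."""
    for mask in product((False, True), repeat=len(cell)):
        yield tuple("*" if hide else value for value, hide in zip(cell, mask))


def count_nonempty_aggregate_cells(base_cells):
    """Count distinct cells covered by at least one base cell, excluding base cells."""
    covered = set()
    for cell in base_cells:
        covered.update(generalizations(cell))
    return len(covered - set(base_cells))


if __name__ == "__main__":
    print(count_cuboids(N_DIMS))                       # 1024
    print(count_nonempty_aggregate_cells(BASE_CELLS))  # 2301
```

Using a set removes the cells that are generated by more than one base cell, which is the same correction the closed-form answer makes by subtracting the overlaps.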
With the increased and widespread use of technologies, interest in data mining has increased rapidly. Companies now use data mining techniques to examine their databases, looking for trends, relationships, and outcomes to enhance their overall operations and discover new patterns that may allow them to better serve their customers. Data mining provides numerous benefits to businesses, government, society as well as individual persons. However, like many technologies, data mining also has negative effects, such as invasion of privacy rights. This paper tries to explore the advantages as well as the disadvantages of data mining. In addition, the ethical and global issues regarding the use of data mining
In our report we will discuss the capabilities of data mining techniques in the context of education and how they are used to evaluate student
Since higher education has blurred the lines with traditional businesses, it is important for institutions to have tools that provide valuable data and information to assist them in making decisions. Using data and having the right data mining tools can ensure the institute’s success in many forms, such as identifying market trends, precision marketing, new products, performance management, grants and funding management, student life cycle management and procurement, to mention a few. To get a better grasp of these benefits, it is important to understand data warehouses, data mining and the associated benefits.
The proliferation, ubiquity and increasing power of computer technology has increased the volume of data collection and its storage manifold. This led to continual growth in the size of data sets with consequent increase in complexity as well. Hands-on data analysis is being increasingly augmented with indirect, automated data processing techniques clustered together and known as DATA MINING.
Today's mobile technologies and social media have unleashed an exponential increase in information. Predictive analytics, a business intelligence technology, is one of the latest to take the future by storm with its immense potential for data mining and efficacy. Predictive analytics can be defined as any solution that supports the identification of meaningful patterns and
This project is a two-person group project that studies the business processes of a small company. In this study, our objective is to understand how a company can gain better insight into its business by incorporating a data warehouse and data mining into its current software environment. We will also explore which tool would be best for the company and how different data mining tools and techniques help users improve business processes at the operational and management levels.
A database is set up that assigns each attribute a fuzzy score on a scale of 1-10. Key performance indicators are identified based on cost evaluation factors, technical analytic factors and environmental factors and are stored in the database.