Chapter 1
INTRODUCTION
1.1 Background
It is well established that we live in a society driven by information technology, where knowledge is a priceless asset to any individual, organization, or government. Companies receive massive amounts of information on a daily basis, and they want to refine these data so as to obtain the most essential and useful information in their data warehouses. Technology to support this task has been under study and development for quite a few years now. Data in the real world is dirty: it may be incomplete (lacking attribute values, lacking certain attributes of interest, or containing only aggregate data), noisy (containing errors and outliers), or inconsistent (containing discrepancies in codes or names).
Data mining is a new technology that can be used to extract valuable information from the data warehouses and databases of companies and governments. It involves the extraction of hidden information from raw data. It helps in detecting inconsistencies in data and in predicting future patterns and behavior efficiently. Data mining is implemented using various algorithms and frameworks, and the automated analysis they provide goes beyond evaluating the dataset to provide solid evidence that human experts would not have been able to detect due to the fact that they
Before a data set can be mined, it first has to be "cleaned". This cleaning process removes errors, ensures consistency, and takes missing values into account. Next, computer algorithms are used to "mine" the clean data, looking for unusual patterns. Finally, the patterns are interpreted to produce new knowledge [3].
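The cleaning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`age`, `country`), the valid age range, and the `COUNTRY_CODES` mapping are hypothetical, and filling missing values with the median is just one of several possible strategies.

```python
from statistics import median

# Hypothetical mapping from inconsistent spellings to one canonical code.
COUNTRY_CODES = {"usa": "US", "u.s.": "US", "us": "US", "uk": "GB", "gb": "GB"}


def clean_records(records, min_age=0, max_age=120):
    """Return cleaned copies of the records; the input list is not modified."""
    cleaned = []
    for rec in records:
        row = dict(rec)

        # Consistency: map variant spellings onto one canonical code.
        # Unknown or missing spellings become None.
        raw = str(row.get("country") or "").strip().lower()
        row["country"] = COUNTRY_CODES.get(raw)

        # Errors: treat ages outside the plausible range as missing.
        age = row.get("age")
        if age is not None and not (min_age <= age <= max_age):
            age = None
        row["age"] = age
        cleaned.append(row)

    # Missing values: fill with the median of the remaining valid ages.
    known = [r["age"] for r in cleaned if r["age"] is not None]
    fill = median(known) if known else None
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill
    return cleaned


if __name__ == "__main__":
    sample = [
        {"age": 30, "country": "USA"},
        {"age": None, "country": "u.s."},
        {"age": 250, "country": "UK"},
        {"age": 40, "country": "gb"},
    ]
    for row in clean_records(sample):
        print(row)
```

Only after a pass of this kind would the mining algorithms be run on the data.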
Today, with the ever-growing use of computers in the world, information is constantly moving from one place to another. What this information is, whom it concerns, and who uses it are discussed in this paper. The collection, interpretation, and determination of use of this information has come to be known as data mining. The term has been around for only a short time, but the actual collection of data has been happening for centuries. The following paragraph gives a brief description of this history of data collection.
Abstract - In the data mining process, we can identify patterns in data that are hard to find using conventional analysis. Several mathematical and statistical algorithms are used in this approach to determine the probability of an event or scenario. In technical terms, the main aim of the process is to find correlations among the attributes. A great deal of research is being carried out in this field, creating considerable scope and many jobs in the area. Several data mining algorithms can identify features in the data that support prediction and future analysis. The main study report would consist of these algorithms that could help us predict and some sample data that we
The hunger for analyzing data to better serve needs and to better meet quality measures is spurring a revolution in industries such as healthcare, manufacturing, and insurance. In any industry, providers are demanding better IT systems that allow information management and data analytics professionals to filter through large amounts of data and turn it into "information" that can change the business and function of the industry.
Much research is being conducted on analyzing this data, and “Data Analytics” has become a subject in itself. Data mining possesses great potential, as the data has predictive power; the
Since the line between higher education and traditional business has blurred, it is important for institutions to have tools that supply valuable data and information for making decisions. Using data and having the right data mining tools can help ensure the institution’s success in many areas, such as identifying market trends, precision marketing, new products, performance management, grants and funding management, student life cycle management, and procurement, to mention a few. To get a better grasp of these benefits, it is important to understand data warehouses, data mining, and the associated benefits.
Data mining has become very important in most business domains, including marketing, finance, and telecommunications. This has become possible because of the development of database technology and systems over the past few years. Data mining techniques are used for data processing, which refers to operations performed on data such as collection, use, or management. A real-life example is a shopkeeper asking a customer to fill in a counter slip so that the data can be processed and the record kept for future use [5]. Association rule mining is a data mining technique that finds patterns or associations in very large data sets. For data to be valuable, it should have the following characteristics: accurate, complete, flexible, reliable, relevant, simple, timely, retrievable, and verifiable.
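To make the idea of association rule mining concrete, the following minimal sketch computes the two standard measures of a rule, support and confidence, over a toy set of shop transactions. The transactions and function names are made up for illustration; a full algorithm such as Apriori would additionally enumerate candidate itemsets, which is not shown here.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    if not transactions:
        return 0.0
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of consequent given antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


if __name__ == "__main__":
    # Hypothetical counter slips, one set of purchased items per customer.
    slips = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    print("support(bread, milk):", support(slips, {"bread", "milk"}))
    print("confidence(bread -> milk):", confidence(slips, {"bread"}, {"milk"}))
```

A rule such as "bread implies milk" is usually kept only if both measures exceed thresholds chosen by the analyst.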
Data mining is the non-trivial extraction of potentially useful information from data. In other words, data mining extracts knowledge or interesting information from large sets of structured data that come from different sources. There are various research domains in data mining, specifically text mining, web mining, image mining, sequence mining, process mining, graph mining, etc. Data mining applications are used in a range of areas, such as financial data analysis, the retail and telecommunication industries, banking, health care, and medicine. In health care, data mining is mainly used for disease prediction. In data mining, several techniques have been developed and used for predicting diseases
A predictive data mining model works by identifying patterns in historical information and using them to predict outcomes for new incoming data. Predictive modelling is particularly useful for decision making in business. A descriptive model, on the other hand, describes the data efficiently by grouping it, using data mining principles such as clustering and association rules.
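The contrast between the two model types can be illustrated with a small sketch. The predictive part uses a one-nearest-neighbour rule on labelled historical values, and the descriptive part groups unlabelled values with a one-dimensional k-means loop. Both techniques, the function names, and the sample numbers are chosen here for illustration only and are not prescribed by the text above.

```python
def nearest_neighbor_predict(history, x):
    """Predictive: give x the label of the closest historical value."""
    _, label = min(history, key=lambda pair: abs(pair[0] - x))
    return label


def kmeans_1d(values, centers, iterations=10):
    """Descriptive: group values around the given starting centers.

    Returns the final centers and the groups assigned in the last iteration.
    """
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest center.
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


if __name__ == "__main__":
    # Hypothetical history: (monthly spend, customer segment).
    history = [(20, "low"), (25, "low"), (60, "high"), (70, "high")]
    print(nearest_neighbor_predict(history, 58))

    centers, groups = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0, 5])
    print(centers, groups)
```

The first function needs labelled history to make a prediction, while the second needs no labels at all and only summarizes the structure already present in the data.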
Data mining is the result of a long process of study and research in the area of databases and product development. This evolution began when business data was first stored on computers, and it continued with improvements in data access and, more recently, with technologies that allow users to navigate through their data in real time. Data mining is an approach that helps to mine important data from a large database. It is the technique of sorting through huge amounts of data and picking out relevant information through the use of certain advanced algorithms. As more data is collected, with the amount of data doubling every year, data mining is becoming an increasingly important tool for converting this data into information. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Data mining is very useful and ready for application in the business
Data mining is the process of extracting knowledge from large data sets. It uses artificial intelligence methods to discover hidden relationships in the huge amounts of data that are collected. It has great potential to improve applications in many fields, such as healthcare systems, customer relationship management, financial banking, research analysis, bioinformatics, marketing analysis, education, manufacturing engineering, criminology, and many more. Criminology is the study of crime, and a criminologist’s job typically includes analyzing data to determine why a crime was committed and, more importantly, to predict and prevent criminal behavior in the future. It has become an interesting field in which to apply data mining techniques because of its large datasets and the complexity of the relationships within the data. This paper discusses some of the tools and techniques used in this field to uncover important information that will help and support police forces and reduce social nuisance.
In business, the data warehouse plays an important role: it combines business activities and is considered the foundation that supports decision making. Any kind of error in the data can cause drawbacks and difficulties for the business and lead to negative results. Errors usually have an underlying cause; some occur while data is collected from different sources, while others occur during transfer. One of the big challenges facing the data warehouse is therefore to ensure that data quality remains high. The process used to produce high-quality data is called data cleaning. Data cleaning is considered a new research area, and it is highly costly, especially for massive data; modern computers allow us to perform data
With the growth of IT (Information Technology), the size of the databases generated by organizations is also increasing, owing to the availability of low-cost storage and developments in data capture technologies. Data mining (DM), also called KDD (knowledge discovery in databases), helps to identify valuable information in such large databases. This valuable information can help decision makers make accurate future decisions. Figure 1 describes data mining techniques
Research on data mining in education is growing. This emerging field, called Educational Data Mining, is concerned with developing techniques that discover information from data originating in educational settings. The data can be collected from historical and operational data residing in the databases of educational institutions. Student data can be personal or academic. It can also be gathered from e-learning systems, which hold a vast amount of data used by most institutions. Educational data mining uses many techniques, for example decision trees, neural networks, k-nearest neighbor, Naive Bayes, support vector machines and numerous