New Frontiers in Applied Data Mining
PAKDD 2009 International Workshops, Bangkok, Thailand, April 27-30, 2009. Revised Selected Papers
Chapter
Rattle relies on an extensive collection of free and open source software. Some preliminary steps need to be followed in installing it. The latest installation instructions are maintained at http://rattle.toga...
Chapter
A support vector machine (SVM) searches for so-called support vectors, which are observations that are found to lie at the edge of an area in space which presents a boundary between one of these classes of observa...
Chapter
Once a model is developed and evaluated, and we have determined it to be suitable, we then need to deploy it. This is an often-overlooked issue in many data mining projects. It also seems to receive little att...
Chapter
There is more to exploring data than simply generating textual and statistical summaries and graphical plots. As we have begun to see, R has some very significant capabilities for generating graphics that assis...
Chapter
The following sections introduce the datasets that we use throughout the book to demonstrate data mining. R provides quite a collection of datasets. Each of the datasets we introduce here is available through ...
Chapter
Modelling is what we most often think of when we think of data mining. Modelling is the process of taking some data (usually) and building a simplified description of the processes that might have generated it...
Chapter
Data mining is the art and science of intelligent data analysis. The aim is to discover meaningful insights and knowledge from data. Discoveries are often expressed as models, and we often describe data mining...
Chapter
The Boosting meta-algorithm is an efficient, simple, and easy-to-use approach to building models. The popular variant called AdaBoost (an abbreviation for adaptive boosting) has been described as the "best off-t...
Chapter
Data is the starting point for all data mining—without it there is nothing to mine. In today's world, there is certainly no shortage of data, but turning that data into information, knowledge, and, eventually,...
Chapter
The preceding chapters presented a number of algorithms for building descriptive and predictive models. Before we can identify the best from amongst the different models, we must evaluate the performance of th...
Chapter
An interesting issue with the delivery of a data mining project is that in reality we spend more of our time working on and with the data than we do building actual models, as we suggested in Chapter 1. In bui...
Chapter
The clustering technique is one of the core tools that is used by the data miner. Clustering gives us the opportunity to group observations in a generally unguided fashion according to how similar they are. Th...
Chapter
Many years ago, a number of new Internet businesses were created to sell books on-line. Over time, they collected information about the books that each of their customers was buying. Using association analysi...
Chapter
New ideas are often most effectively understood and appreciated by actually doing something with them. So it is with data mining. Fundamentally, data mining is about practical application—application of the al...
Chapter
Data can come in many different formats from many different sources. By using R's extensive capabilities, Rattle provides direct access to such data. Indeed, we are fortunate with the R system in that it is an ...
Chapter
Decision trees (also referred to as classification and regression trees) are the traditional building blocks of data mining and the classic machine learning algorithm. Since their development in the 1980s, dec...
Chapter
As a data miner, we need to live and breathe our data. Even before we start building our data mining models, we can gain significant insights through exploring the data. Insights gained can deliver new discover...
Chapter
Building a single decision tree provides a simple model of the world, but it is often too simple or too specific. Over many years of experience in data mining, it has become clear that many models working toge...
Book and Conference Proceedings
PAKDD 2009 International Workshops, Bangkok, Thailand, April 27-30, 2009. Revised Selected Papers
Chapter and Conference Paper
Open source data mining software represents a new trend in data mining research, education and industrial applications, especially in small and medium enterprises (SMEs). With open source software an enterpris...