In the previous article (Tree Based Methods Part I), a decision tree was built on the Carseats data used in the chapter 8 lab of ISLR. That article did not take potentially asymmetric misclassification costs into account. When an imbalance between false positives and false negatives can have a significant impact, it can be adjusted explicitly, either by altering the prior (or empirical) probabilities or by adding a loss matrix.
A comprehensive summary of this topic, taken from Berk (2008), is quoted below.
... when the CART solution is determined solely by the data, the prior distribution is empirically determined, and the costs in the loss matrix of all classification errors are the same. Costs are being assigned even if the data analyst makes no conscious decision about them. Should the balance of false negatives to false positives that results be unsatisfactory, that balance can be changed. Either the costs in the loss matrix can be directly altered, leaving the prior distribution to be empirically determined, or the prior distribution can be altered leaving the default costs untouched. Much of the software currently available makes it easier to change the prior in the binary response case. When there are more than two response categories, it will usually be easier in practice to change the costs in the loss matrix directly.
In this article, cost-sensitive classification is implemented, assuming that misclassifying the High class is twice as expensive, both by altering the priors and by adjusting the loss matrix.
The following loss matrix is implemented.
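The matrix itself is not reproduced in this text, so the following is a minimal sketch of what it would look like under the stated 2:1 cost assumption. The layout (rows are actual classes, columns are predicted classes, zero diagonal) follows the convention rpart() is understood to use, and the class order High, Low is an assumption based on alphabetical factor levels.

```r
# Loss matrix sketch: rows are actual classes, columns are predicted classes.
# The diagonal (correct classification) costs nothing.
classes <- c("High", "Low")
loss <- matrix(c(0, 2,    # actual High: predicting Low costs 2
                 1, 0),   # actual Low:  predicting High costs 1
               nrow = 2, byrow = TRUE,
               dimnames = list(actual = classes, predicted = classes))
loss
```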
The corresponding altered priors can be obtained by
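The formula is not reproduced in this text. A common form, as described in the rpart documentation for the case where the cost of misclassifying class i does not depend on the predicted class, is sketched below. The symbols are assumptions of this sketch: the empirical prior of class i is written as pi_i and its misclassification cost as L_i.

```latex
\tilde{\pi}_i = \frac{\pi_i L_i}{\sum_j \pi_j L_j},
\qquad
\tilde{\pi}_{\text{High}} = \frac{2\,\pi_{\text{High}}}{2\,\pi_{\text{High}} + \pi_{\text{Low}}}
\quad \text{for } L_{\text{High}} = 2,\; L_{\text{Low}} = 1 .
```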
The sections of the caret package tutorial shown in bold are covered in this article.
Miscellaneous Model Functions
Model Training and Tuning
Using Custom Models
Feature Selection: RFE, Filters, GA, SA
Let’s get started.
The following packages are used.
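The original package list is not shown here. A minimal set that would cover the steps described in this article, given as an assumption rather than the author's exact list, is:

```r
library(ISLR)   # Carseats data
library(rpart)  # CART implementation
library(caret)  # data splitting, resampling and tuning
```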
The Carseats data is prepared as follows, with the response (Sales) converted into a binary variable.
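A sketch of this step is given below. The cut-off of 8 is an assumption borrowed from the ISLR chapter 8 lab, and the object name `carseats` is only illustrative.

```r
library(ISLR)

# Assumed cut-off (Sales is in thousands of units), as in the ISLR lab
cutoff <- 8

carseats <- Carseats
# Replace the numeric response with a two-level factor: High or Low
carseats$Sales <- factor(ifelse(carseats$Sales > cutoff, "High", "Low"),
                         levels = c("High", "Low"))
table(carseats$Sales)
```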
The train and test data sets are split using createDataPartition().
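The split could look like the sketch below. The 80/20 proportion, the seed and the object names are assumptions; the data preparation is repeated so that the block runs on its own.

```r
library(ISLR)
library(caret)

carseats <- Carseats
carseats$Sales <- factor(ifelse(carseats$Sales > 8, "High", "Low"),
                         levels = c("High", "Low"))  # assumed cut-off of 8

set.seed(1237)  # arbitrary seed for reproducibility
# createDataPartition() samples within each class, so the class balance
# is roughly preserved in both sets
train_idx  <- createDataPartition(carseats$Sales, p = 0.8, list = FALSE)
train_data <- carseats[train_idx, ]
test_data  <- carseats[-train_idx, ]
```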
Five repeats of 10-fold cross-validation are set up.
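In caret this resampling scheme is expressed with trainControl(); the object name `cv_control` is illustrative.

```r
library(caret)

# 10-fold cross-validation, repeated 5 times
cv_control <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
```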
Rather than tuning the complexity parameter (cp) with the built-in tuneLength, a grid is created. The initial plan was to build a grid with expand.grid() that combined cp with altered priors and pass it to caret, since rpart() has an argument named parms that accepts altered priors (prior) or a loss matrix (loss) as a list. It turned out, however, that the tuning grid does not accept an argument unless it is defined as a tuning parameter of the model. Therefore, cp is not re-tuned when the parms values are modified. (Although it is not considered in this article, the mlr package seems to support cost-sensitive classification by adding a loss matrix.)
The default model is fit below.
The model is refit with the tuned cp value.
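The tuning and refitting steps can be sketched together as follows. The grid range, the seed, the split proportion and all object names are assumptions, and the earlier preparation steps are repeated so that the block runs on its own.

```r
library(ISLR)
library(rpart)
library(caret)

carseats <- Carseats
carseats$Sales <- factor(ifelse(carseats$Sales > 8, "High", "Low"),
                         levels = c("High", "Low"))  # assumed cut-off of 8

set.seed(1237)
train_idx  <- createDataPartition(carseats$Sales, p = 0.8, list = FALSE)
train_data <- carseats[train_idx, ]

cv_control <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
cp_grid    <- expand.grid(cp = seq(0, 0.3, by = 0.01))  # assumed grid range

# Tune cp by cross-validated accuracy, with equal costs and empirical priors
fit_cv  <- train(Sales ~ ., data = train_data, method = "rpart",
                 trControl = cv_control, tuneGrid = cp_grid)
best_cp <- fit_cv$bestTune$cp

# Refit directly with rpart() at the selected cp
fit_default <- rpart(Sales ~ ., data = train_data,
                     control = rpart.control(cp = best_cp))
```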
Confusion matrices are obtained from both the training and test data sets. The matrices are transposed relative to the previous article to keep the same structure as in Berk (2008). The source of getUpdatedCM() can be found in this gist.
The model error measures how well each actual class is fitted or predicted, that is, the error rate conditional on the actual class; it shows that the High class is misclassified more often. The use error measures how reliable the model is given its fitted or predicted values, that is, the error rate conditional on the predicted class. Misclassification of the High class also becomes worse when the model is applied to the test data.
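The two error types can be computed from a plain confusion matrix. The helper below is a hypothetical stand-in, not the gist's getUpdatedCM(), and the label vectors are toy values for illustration only.

```r
# Hypothetical helper: rows are actual classes, columns are predicted classes
error_summary <- function(actual, predicted) {
  cm <- table(actual = actual, predicted = predicted)
  correct <- diag(cm)
  list(cm            = cm,
       model_error   = 1 - correct / rowSums(cm),  # conditional on actual class
       use_error     = 1 - correct / colSums(cm),  # conditional on predicted class
       overall_error = 1 - sum(correct) / sum(cm))
}

# Toy labels, not taken from the Carseats results
lv        <- c("High", "Low")
actual    <- factor(c("High", "High", "High", "Low", "Low", "Low", "Low", "Low"),
                    levels = lv)
predicted <- factor(c("High", "Low", "Low", "Low", "Low", "Low", "High", "Low"),
                    levels = lv)
res <- error_summary(actual, predicted)
res$model_error
res$use_error
```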
As mentioned earlier, either altered priors or a loss matrix can be entered into rpart(). They are created below.
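A minimal sketch of both objects follows. The empirical proportions used here are illustrative placeholders; in practice they would come from the class proportions of the training response. The class order High, Low is assumed to match the factor levels.

```r
# Cost of misclassifying each actual class (High is twice as expensive)
cost <- c(High = 2, Low = 1)

# Illustrative empirical class proportions (placeholder values)
emp_prior <- c(High = 0.4, Low = 0.6)

# Altered priors: weight each prior by its cost, then normalise to sum to 1
alt_prior <- emp_prior * cost / sum(emp_prior * cost)

# Equivalent loss matrix (rows: actual, columns: predicted)
loss <- matrix(c(0, cost[["High"]],
                 cost[["Low"]], 0),
               nrow = 2, byrow = TRUE,
               dimnames = list(actual = names(cost), predicted = names(cost)))

# Either list can be passed to rpart() through its parms argument
parms_prior <- list(prior = unname(alt_prior))
parms_loss  <- list(loss = loss)
```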
Both will deliver the same outcome.
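One way to check this is to fit both trees and cross-tabulate their fitted classes. The sketch below fits on the full data for brevity and uses a placeholder cp, whereas the article uses the training set and the tuned cp; the cut-off and object names are assumptions.

```r
library(ISLR)
library(rpart)

carseats <- Carseats
carseats$Sales <- factor(ifelse(carseats$Sales > 8, "High", "Low"),
                         levels = c("High", "Low"))  # assumed cut-off of 8

# Altered priors from the empirical proportions and a 2:1 cost (order: High, Low)
emp_prior <- as.numeric(prop.table(table(carseats$Sales)))
cost      <- c(2, 1)
alt_prior <- emp_prior * cost / sum(emp_prior * cost)
loss      <- matrix(c(0, 2, 1, 0), nrow = 2, byrow = TRUE)

cp_value <- 0.01  # placeholder, not the tuned value
fit_prior <- rpart(Sales ~ ., data = carseats,
                   parms = list(prior = alt_prior),
                   control = rpart.control(cp = cp_value))
fit_loss  <- rpart(Sales ~ ., data = carseats,
                   parms = list(loss = loss),
                   control = rpart.control(cp = cp_value))

# Cross-tabulate the fitted classes of the two trees
pred_prior <- predict(fit_prior, type = "class")
pred_loss  <- predict(fit_loss, type = "class")
table(prior = pred_prior, loss = pred_loss)
```

If the two specifications are equivalent, all counts in the resulting table fall on the diagonal.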
Confusion matrices are obtained again. They show that more observations are classified as the High class. Note that, although the overall misclassification error increases, that error does not reflect costs. When misclassifying the High class is more expensive, the cost-adjusted CART may be more beneficial.
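The difference between error rate and cost can be made concrete by weighting a confusion matrix with the loss matrix. The counts below are hypothetical and are not results from this article; the helper names are illustrative.

```r
# Total misclassification cost: counts weighted element-wise by the loss matrix
total_cost <- function(cm, loss) sum(cm * loss)
error_rate <- function(cm) 1 - sum(diag(cm)) / sum(cm)

classes <- c("High", "Low")
loss <- matrix(c(0, 2, 1, 0), nrow = 2, byrow = TRUE,
               dimnames = list(actual = classes, predicted = classes))

# Hypothetical confusion matrices (rows: actual, columns: predicted)
cm_default  <- matrix(c(20, 15,  5, 40), nrow = 2, byrow = TRUE,
                      dimnames = dimnames(loss))
cm_adjusted <- matrix(c(30,  5, 17, 28), nrow = 2, byrow = TRUE,
                      dimnames = dimnames(loss))

c(default = error_rate(cm_default), adjusted = error_rate(cm_adjusted))
c(default = total_cost(cm_default, loss), adjusted = total_cost(cm_adjusted, loss))
```

With these made-up counts the adjusted tree has the higher error rate (0.275 against 0.25) but the lower total cost (27 against 35).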