# 2015-02-08-Tree-Based-Methods-Part-II-Cost-Sensitive-Classification

In the previous article (Tree Based Methods Part I), a decision tree was created on the *Carseats* data, which is used in the chapter 8 lab of ISLR. That article did not take potentially asymmetric misclassification costs into account. When the balance between false positives and false negatives matters, it can be adjusted explicitly, either by altering the prior (by default, empirical) probabilities or by adding a loss matrix.

Berk (2008) summarizes this topic as follows.

```
... when the CART solution is determined solely by the data, the prior distribution is empirically determined, and the costs in the loss matrix of all classification errors are the same. Costs are being assigned even if the data analyst makes no conscious decision about them. Should the balance of false negatives to false positives that results be unsatisfactory, that balance can be changed. Either the costs in the loss matrix can be directly altered, leaving the prior distribution to be empirically determined, or the prior distribution can be altered leaving the default costs untouched. Much of the software currently available makes it easier to change the prior in the binary response case. When there are more than two response categories, it will usually be easier in practice to change the costs in the loss matrix directly.
```

In this article, cost-sensitive classification is implemented both by altering the priors and by adjusting the loss matrix, assuming that misclassifying the *High* class is twice as expensive as misclassifying the other class.

The following loss matrix is implemented.
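The matrix itself is not reproduced here. A minimal sketch that is consistent with the stated 2:1 cost ratio is given below. It assumes the `rpart` convention (rows are actual classes, columns are predicted classes, zeros on the diagonal); the class labels `High` and `Low` and their order are assumptions made for illustration.

```r
# Assumed class labels; the first factor level is taken to be "High".
classes <- c("High", "Low")

# Rows: actual class. Columns: predicted class.
# Predicting "Low" for an actual "High" costs 2;
# predicting "High" for an actual "Low" costs 1.
loss.mat <- matrix(c(0, 2,
                     1, 0),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(actual = classes, predicted = classes))
loss.mat
```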

The corresponding altered priors can be obtained by
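The expression is missing here. As a hedged sketch, the altered prior of class *i* can be written as the empirical prior times the misclassification cost of that class, rescaled to sum to one, which is the form described in the `rpart` introduction by Therneau and Atkinson. The helper name `altered_priors` and the class proportions (0.4 and 0.6) are hypothetical and are not the values from the *Carseats* training data.

```r
# altered prior_i = prior_i * L_i / sum_j(prior_j * L_j),
# where L_i is the cost of misclassifying class i (row sum of the loss matrix).
altered_priors <- function(prior, loss) {
  cost <- rowSums(loss)
  prior * cost / sum(prior * cost)
}

classes  <- c("High", "Low")                       # assumed labels
loss.mat <- matrix(c(0, 2, 1, 0), nrow = 2, byrow = TRUE,
                   dimnames = list(actual = classes, predicted = classes))

emp.prior <- c(High = 0.4, Low = 0.6)              # illustrative proportions
alt.prior <- altered_priors(emp.prior, loss.mat)
round(alt.prior, 3)
#  High   Low
# 0.571 0.429
```

With two classes this mapping is exact. With more classes it only applies when every wrong prediction for a given actual class has the same cost.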

The sections of the **caret** package tutorial shown in bold are covered in this article.

- Visualizations
- Pre-Processing
- **Data Splitting**
- Miscellaneous Model Functions
- **Model Training and Tuning**
- Using Custom Models
- Variable Importance
- Feature Selection: RFE, Filters, GA, SA
- Other Functions
- Parallel Processing
- Adaptive Resampling

Let's get started.

The following packages are used.

The *Carseats* data is loaded and the response (*Sales*) is converted into a binary variable.

The train and test data sets are split using `createDataPartition()`.
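The data preparation and split described above can be sketched as follows. This is not the original code: the threshold of 8 follows the ISLR lab, while the labels `High` and `Low`, the 80% training share, the seed and all object names are assumptions.

```r
library(ISLR)   # Carseats data
library(caret)  # createDataPartition()

data(Carseats)

# Assumption: Sales above 8 is labeled "High", everything else "Low".
carseats <- Carseats
carseats$High <- factor(ifelse(carseats$Sales > 8, "High", "Low"),
                        levels = c("High", "Low"))
carseats$Sales <- NULL   # drop the original numeric response

# Stratified split that keeps the class proportions similar in both sets.
set.seed(1237)           # arbitrary seed
train.idx  <- createDataPartition(carseats$High, p = 0.8, list = FALSE)
train.data <- carseats[train.idx, ]
test.data  <- carseats[-train.idx, ]

# Empirical class proportions in the training data (the default priors).
prop.table(table(train.data$High))
```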

Five repeats of 10-fold cross-validation are set up.
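In **caret**, this resampling scheme is typically expressed with `trainControl()`. A minimal sketch, where the object name `train.control` is an assumption:

```r
library(caret)

# 10-fold cross-validation, repeated 5 times
train.control <- trainControl(method = "repeatedcv",
                              number = 10,
                              repeats = 5)
```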

Rather than tuning the complexity parameter (*cp*) using the built-in `tuneLength`, a grid is created with `expand.grid()`. At first, it was intended to tune altered priors together with this grid through the **caret** package, as `rpart()` has an argument named *parms* that takes altered priors (*prior*) or a loss matrix (*loss*) as a list. Later, however, it was found that the tuning grid does not accept an argument that is not set as a tuning parameter of the model. Therefore *cp* is tuned once with the default *parms*, and it is not tuned again when the *parms* values are modified. (Although it is not considered in this article, the **mlr** package seems to support cost-sensitive classification by adding a loss matrix.)

The default model is fit below.

The model is refit with the tuned *cp* value.
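The two steps above (tuning *cp* over a grid, then refitting with the selected value) might look like the sketch below. The grid range, the seeds, the class labels and the object names are assumptions, so the selected *cp* will not necessarily match the original article.

```r
library(ISLR)
library(rpart)
library(caret)

# Data preparation (assumed labels "High"/"Low", threshold 8 as in the ISLR lab)
data(Carseats)
carseats <- transform(Carseats,
                      High = factor(ifelse(Sales > 8, "High", "Low"),
                                    levels = c("High", "Low")))
carseats$Sales <- NULL
set.seed(1237)
train.idx  <- createDataPartition(carseats$High, p = 0.8, list = FALSE)
train.data <- carseats[train.idx, ]

# Resampling scheme and cp grid (the range is illustrative)
train.control <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
cp.grid <- expand.grid(cp = seq(0, 0.3, by = 0.01))

# Tune cp with default priors and equal costs
set.seed(12357)
fit.cv <- train(High ~ ., data = train.data, method = "rpart",
                trControl = train.control, tuneGrid = cp.grid)
best.cp <- fit.cv$bestTune$cp

# Refit directly with rpart() at the selected cp
fit.default <- rpart(High ~ ., data = train.data,
                     control = rpart.control(cp = best.cp))
```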

Confusion matrices are obtained from both the training and test data sets. Here the matrices are transposed relative to the previous article, to keep the same structure as used in Berk (2008). The source of `getUpdatedCM()` can be found in this gist.

The **model error** measures how well each class is fitted or predicted given the actual class, and it shows that the *High* class is misclassified more often. The **use error** measures how useful the model is given the fitted or predicted values. Misclassification of the *High* class also becomes worse when the model is applied to the test data.
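To make the two error types concrete, here is a small base R sketch. It is not the `getUpdatedCM()` function from the gist; `cm_with_errors` is a hypothetical helper and the label vectors are made up.

```r
# Rows are actual classes and columns are predicted classes, as in Berk (2008).
cm_with_errors <- function(actual, predicted) {
  cm      <- table(actual = actual, predicted = predicted)
  correct <- diag(cm)
  list(cm            = cm,
       model.error   = 1 - correct / rowSums(cm),  # error given the actual class
       use.error     = 1 - correct / colSums(cm),  # error given the predicted class
       overall.error = 1 - sum(correct) / sum(cm))
}

lv        <- c("High", "Low")
actual    <- factor(c("High", "High", "High", "High",
                      "Low", "Low", "Low", "Low", "Low", "Low"), levels = lv)
predicted <- factor(c("High", "High", "Low", "Low",
                      "Low", "Low", "Low", "Low", "Low", "High"), levels = lv)

res <- cm_with_errors(actual, predicted)
res$cm
res$model.error   # High: 0.5, Low: about 0.167
res$use.error     # High: about 0.333, Low: about 0.286
```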

As mentioned earlier, either altered priors or a loss matrix can be entered into `rpart()`. They are created below.

Both will deliver the same outcome.
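A sketch of both routes is given below, passing either `prior` or `loss` through the `parms` argument of `rpart()`. The class labels, the split, the seed and the `cp.value` placeholder are assumptions; in the article's workflow the tuned *cp* would be used instead.

```r
library(ISLR)
library(rpart)
library(caret)

# Data preparation (assumed labels "High"/"Low", threshold 8 as in the ISLR lab)
data(Carseats)
carseats <- transform(Carseats,
                      High = factor(ifelse(Sales > 8, "High", "Low"),
                                    levels = c("High", "Low")))
carseats$Sales <- NULL
set.seed(1237)
train.idx  <- createDataPartition(carseats$High, p = 0.8, list = FALSE)
train.data <- carseats[train.idx, ]
test.data  <- carseats[-train.idx, ]

# Loss matrix: rows are actual classes, columns are predicted classes.
loss.mat <- matrix(c(0, 2,
                     1, 0), nrow = 2, byrow = TRUE)

# Altered priors from the empirical proportions and the 2:1 cost ratio.
# The second element is taken as the complement so the vector sums to one.
emp       <- prop.table(table(train.data$High))
p.high    <- 2 * emp[["High"]] / (2 * emp[["High"]] + emp[["Low"]])
alt.prior <- c(p.high, 1 - p.high)

cp.value <- 0.01  # placeholder, not the tuned value

fit.prior <- rpart(High ~ ., data = train.data,
                   parms = list(prior = alt.prior),
                   control = rpart.control(cp = cp.value))
fit.loss  <- rpart(High ~ ., data = train.data,
                   parms = list(loss = loss.mat),
                   control = rpart.control(cp = cp.value))

# Cross-tabulate the predicted classes of the two fits on the test data.
pred.prior <- predict(fit.prior, newdata = test.data, type = "class")
pred.loss  <- predict(fit.loss,  newdata = test.data, type = "class")
table(prior = pred.prior, loss = pred.loss)
```

If the two approaches agree as stated, all counts in the cross-tabulation fall on the diagonal.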

Confusion matrices are obtained again. They show that more values are classified as the *High* class. Note that, although the overall misclassification error increases, that error does not reflect costs. In a situation where misclassifying the *High* class is more costly, the cost-adjusted CART may be more beneficial.
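One way to see this point is to weight each cell of a confusion matrix by its cost. The sketch below uses made-up counts, not the results of this article, and the helper names `error_rate` and `total_cost` are hypothetical.

```r
lv <- c("High", "Low")   # assumed class labels
dn <- list(actual = lv, predicted = lv)

# Costs: an actual "High" predicted as "Low" costs 2, the reverse costs 1.
loss.mat <- matrix(c(0, 2, 1, 0), nrow = 2, byrow = TRUE, dimnames = dn)

# Made-up confusion matrices (rows: actual, columns: predicted)
cm.default  <- matrix(c(20, 12,
                         6, 42), nrow = 2, byrow = TRUE, dimnames = dn)
cm.adjusted <- matrix(c(27,  5,
                        15, 33), nrow = 2, byrow = TRUE, dimnames = dn)

error_rate <- function(cm) 1 - sum(diag(cm)) / sum(cm)
total_cost <- function(cm, loss) sum(cm * loss)   # element-wise weighting

c(default = error_rate(cm.default), adjusted = error_rate(cm.adjusted))
# default: 0.225, adjusted: 0.25
c(default  = total_cost(cm.default, loss.mat),
  adjusted = total_cost(cm.adjusted, loss.mat))
# default: 30, adjusted: 25
```

In this toy case the adjusted model makes more errors overall but incurs a lower total cost, because fewer of its errors are the expensive kind.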