# 2015-02-14-Tree-Based-Methods-Part-III-Regression

While classification tasks were covered in the last two articles (Link 1 and Link 2), a regression task is the topic of this article. While the **caret** package selects the tuning parameter (*cp*) that minimizes the error (*RMSE*), the **rpart** package recommends the *1-SE rule*, which selects the smallest tree within 1 standard error of the minimum cross-validation error (*xerror*). The models with the 2 complexity parameters suggested by the packages are compared.

The bold-faced sections of the caret package tutorial are covered in this article.

- Visualizations
- Pre-Processing
- **Data Splitting**
- Miscellaneous Model Functions
- **Model Training and Tuning**
- Using Custom Models
- Variable Importance
- Feature Selection: RFE, Filters, GA, SA
- Other Functions
- Parallel Processing
- Adaptive Resampling

Let’s get started.

The following packages are used.
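The original package-loading chunk is not shown; a minimal sketch, assuming the packages that the rest of the article relies on:

```r
# packages assumed from the article's context (the original chunk is not shown)
library(ISLR)   # Carseats data
library(caret)  # createDataPartition(), trainControl(), train()
library(rpart)  # rpart(), rpart.control(), prune()
```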

The *Carseats* data from the **ISLR** package is used - a dummy variable (*High*) is created from *Sales* for classification. Note that the order of the labels is changed from the previous articles for comparison with the regression model.
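A sketch of the dummy-variable step described above; the *8.0* cut-off comes from the text, while the exact label order is an assumption based on the remark that it was changed for comparison:

```r
# load the data and derive the binary response from Sales
data(Carseats, package = "ISLR")

# break at Sales = 8.0 (per the text); level order is an assumption
Carseats$High <- factor(ifelse(Carseats$Sales <= 8, "No", "Yes"),
                        levels = c("No", "Yes"))
table(Carseats$High)
```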

The data is split as follows.
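The splitting chunk is not shown; a sketch using caret's stratified partitioning. The seed and the 80/20 proportion are assumptions (the test-set tables later in the article suggest roughly 80 test records):

```r
set.seed(1237)  # seed value is an assumption, not from the original
# stratified 80/20 split on the binary response (proportion is an assumption)
idx       <- createDataPartition(Carseats$High, p = 0.8, list = FALSE)
trainData <- Carseats[idx, ]
testData  <- Carseats[-idx, ]
```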

With 10-fold cross-validation repeated 5 times set up above, CART is fit as both a classification and a regression task.
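The two fits can be sketched as below; `trainData` is assumed from the split step, and `tuneLength` is an assumption since the original tuning grid is not shown:

```r
# 10-fold cross-validation repeated 5 times, as described in the text
trControl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)

# classification task on the derived binary response
fit.cls <- train(High ~ . - Sales, data = trainData, method = "rpart",
                 tuneLength = 20, trControl = trControl)

# regression task on the raw response
fit.reg <- train(Sales ~ . - High, data = trainData, method = "rpart",
                 tuneLength = 20, trControl = trControl)
```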

Note that, according to the package developer, the function fits the training data without a problem in spite of the warning messages, so the outcome can be relied upon (Link). Note also that the criterion selects the *cp* with the lowest *RMSE*.

For comparison, the data is fit using `rpart()` with the *cp* value set to 0. This *cp* value results in an unpruned tree, and *cp* can then be selected by the *1-SE rule*, which selects the smallest tree within 1 standard error of the minimum cross-validation error (*xerror*), as recommended by the package. Note that a custom function (`bestParam()`) is used to search for the best parameter by the *1-SE rule*; its source can be found here.
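The source of `bestParam()` is linked above rather than reproduced here, but the unpruned fit and the idea behind the *1-SE rule* can be sketched directly from `rpart`'s cross-validation table:

```r
set.seed(1237)  # seed is an assumption
# cp = 0 grows the full, unpruned tree; xerror and xstd land in cptable
fit.rpart <- rpart(Sales ~ . - High, data = trainData,
                   control = rpart.control(cp = 0))

# 1-SE rule sketch: smallest tree whose xerror is within one standard
# error of the minimum xerror (bestParam() in the article does similar work)
cpTab   <- fit.rpart$cptable
minRow  <- which.min(cpTab[, "xerror"])
seLimit <- cpTab[minRow, "xerror"] + cpTab[minRow, "xstd"]
bestCp  <- cpTab[which(cpTab[, "xerror"] <= seLimit)[1], "cp"]
```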

If it is necessary to apply the *1-SE rule* to the result from the **caret** package, `bestParam()` can be used with *isDesc* set to *FALSE*. The result is shown below for reference.

A graphical display of the best *cp* is shown below.

The best *cp* values for each of the models are shown below.

The training data is refit with the best *cp* values.
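A sketch of the refit, assuming the selected values are held in `bestCpCaret` and `bestCpRpart` (hypothetical names; the originals are not shown) and `trControl`/`trainData` come from the earlier steps:

```r
# caret: refit at the single selected cp via a one-row tuning grid
fit.caret.best <- train(Sales ~ . - High, data = trainData, method = "rpart",
                        tuneGrid = data.frame(cp = bestCpCaret),
                        trControl = trControl)

# rpart: prune the unpruned tree back to the 1-SE cp
fit.rpart.best <- prune(fit.rpart, cp = bestCpRpart)
```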

Initially it was planned to compare the regression model to the classification model. Specifically, as the response is converted to a binary variable with the break at the value of *8.0*, it is possible to create a regression version of the confusion matrix by splitting the data at the equivalent percentile, which is about *0.59* in this data, and then compare the outcomes. However, it turns out that they cannot be compared directly, as the regression outcome is too good, as shown below. Note that `updateCM()` and `regCM()` are custom functions and their sources can be found here.

As it is not easy to compare the classification and regression models directly, only the 2 regression models are compared from now on. First, the regression versions of the confusion matrices are compared at every 20th percentile, followed by the root mean squared error (*RMSE*) values.
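The source of `regCM()` is linked above; the idea behind it can be sketched as binning both the actual and fitted values at the same percentiles of the actual response and cross-tabulating (a simplified stand-in, not the article's function):

```r
# sketch of the idea behind regCM(): same percentile breaks for both axes
regCM_sketch <- function(actual, fitted, probs = seq(0.2, 0.8, 0.2)) {
  breaks <- c(-Inf, quantile(actual, probs), Inf)
  labels <- c("20%-", "40%-", "60%-", "80%-", "80%+")
  table(actual = cut(actual, breaks, labels),
        fitted = cut(fitted, breaks, labels))
}
```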

| | Fitted: 20%- | Fitted: 40%- | Fitted: 60%- | Fitted: 80%- | Fitted: 80%+ | Model Error |
|---|---|---|---|---|---|---|
| actual: 20%- | 65.00 | 0 | 0.0 | 0.00 | 0 | 0.00 |
| actual: 40%- | 1.00 | 49 | 14.0 | 0.00 | 0 | 0.23 |
| actual: 60%- | 0.00 | 0 | 56.0 | 8.00 | 0 | 0.12 |
| actual: 80%- | 0.00 | 0 | 0.0 | 64.00 | 0 | 0.00 |
| actual: 80%+ | 0.00 | 0 | 0.0 | 16.00 | 48 | 0.25 |
| Use Error | 0.02 | 0 | 0.2 | 0.27 | 0 | 0.12 |

| | Fitted: 20%- | Fitted: 40%- | Fitted: 60%- | Fitted: 80%- | Fitted: 80%+ | Model Error |
|---|---|---|---|---|---|---|
| actual: 20%- | 59 | 6.00 | 0.00 | 0.00 | 0 | 0.09 |
| actual: 40%- | 0 | 50.00 | 14.00 | 0.00 | 0 | 0.22 |
| actual: 60%- | 0 | 0.00 | 64.00 | 0.00 | 0 | 0.00 |
| actual: 80%- | 0 | 0.00 | 24.00 | 40.00 | 0 | 0.38 |
| actual: 80%+ | 0 | 0.00 | 0.00 | 33.00 | 31 | 0.52 |
| Use Error | 0 | 0.11 | 0.37 | 0.45 | 0 | 0.24 |

It turns out that the model by the **caret** package produces a better fit, which can also be checked by the *RMSE* values. This is understandable, as the model by the **caret** package takes the *cp* that minimizes *RMSE*, while the one by the **rpart** package accepts some more error in favor of a smaller tree by the *1-SE rule*. Besides, their different resampling strategies can make it difficult to compare the outcomes directly.

As their performance on the test data is more important, they are compared on it.
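The test-set comparison can be sketched as below, assuming the refit model objects and `testData` from the earlier steps (names are assumptions):

```r
# predictions on the held-out data from both refit models
pred.caret <- predict(fit.caret.best, newdata = testData)
pred.rpart <- predict(fit.rpart.best, newdata = testData)

# caret's RMSE(pred, obs) helper for the error comparison
RMSE(pred.caret, testData$Sales)
RMSE(pred.rpart, testData$Sales)
```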

| | Pred: 20%- | Pred: 40%- | Pred: 60%- | Pred: 80%- | Pred: 80%+ | Model Error |
|---|---|---|---|---|---|---|
| actual: 20%- | 11 | 5.00 | 0.00 | 0 | 0.00 | 0.31 |
| actual: 40%- | 0 | 15.00 | 1.00 | 0 | 0.00 | 0.06 |
| actual: 60%- | 0 | 0.00 | 15.00 | 0 | 0.00 | 0.00 |
| actual: 80%- | 0 | 0.00 | 0.00 | 15 | 1.00 | 0.06 |
| actual: 80%+ | 0 | 0.00 | 0.00 | 0 | 16.00 | 0.00 |
| Use Error | 0 | 0.25 | 0.06 | 0 | 0.06 | 0.09 |

| | Pred: 20%- | Pred: 40%- | Pred: 60%- | Pred: 80%- | Pred: 80%+ | Model Error |
|---|---|---|---|---|---|---|
| actual: 20%- | 10 | 6.0 | 0.0 | 0 | 0.00 | 0.38 |
| actual: 40%- | 0 | 9.0 | 7.0 | 0 | 0.00 | 0.44 |
| actual: 60%- | 0 | 0.0 | 15.0 | 0 | 0.00 | 0.00 |
| actual: 80%- | 0 | 0.0 | 8.0 | 6 | 2.00 | 0.62 |
| actual: 80%+ | 0 | 0.0 | 0.0 | 0 | 16.00 | 0.00 |
| Use Error | 0 | 0.4 | 0.5 | 0 | 0.11 | 0.29 |

Even on the test data, the model by the **caret** package performs better, and it seems that the cost of selecting a smaller tree by the *1-SE rule* may be too high on this data set.

Some plots of the **caret** model's performance on the test data are shown below.
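The original plotting chunk is not shown; a minimal sketch of typical diagnostics, assuming `pred.caret` and `testData` from the test-set step:

```r
# observed vs predicted: points near the 45-degree line indicate a good fit
plot(testData$Sales, pred.caret, xlab = "Observed", ylab = "Predicted")
abline(0, 1, lty = 2)

# residuals vs predicted: look for structure around the zero line
plot(pred.caret, testData$Sales - pred.caret,
     xlab = "Predicted", ylab = "Residual")
abline(h = 0, lty = 2)
```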

Finally, the following shows the CART model tree fit on the training data.
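The article's exact tree-plotting code is not shown; one common way to render an `rpart` tree, assuming the pruned model object `fit.rpart.best` from earlier (the **rpart.plot** package is an assumption):

```r
library(rpart.plot)  # assumed helper package for a readable tree diagram
rpart.plot(fit.rpart.best)
```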