While the last three articles illustrated the CART model for both classification (with equal and unequal costs) and regression tasks, this article is more technical: it compares three packages, rpart, caret and mlr. For those who are not familiar with the last two packages, they are wrappers (or frameworks) that implement a range of models (or algorithms) in a unified way. For example, the CART implementation of the rpart package is also available in these packages as an integrated learner. As mentioned in an earlier article (Link), an inconsistent API can be a drawback of R (as of other open source tools), and it would be quite beneficial if different models could be implemented in a standardized way. In line with the earlier articles, the Carseats data is used for a classification task.
Before getting started, I should admit that the object names are not chosen very effectively. I hope the list below helps in following the script.
package: rpt - rpart, crt - caret, mlr - mlr
model fitting: ftd - fit on training data, ptd - fit on test data
cp selection: bst - best cp by the 1-SE rule (recommended by rpart), lst - best cp by the highest accuracy (caret) or the lowest mmce (mlr)
cost: eq - equal cost, uq - unequal cost (the uq case is not covered in this article)
etc: cp - complexity parameter, mmce - mean misclassification error, acc - Accuracy (caret), cm - confusion matrix
Also, the data is randomly split into trainData and testData. In practice the latter would not be observed; it is used here for evaluation.
Let’s get started.
The following packages are used.
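The original package list isn't shown here; based on what follows, the set below is a reasonable sketch (ISLR as the source of the Carseats data is an assumption).

```r
library(ISLR)    # Carseats data (assumed source)
library(rpart)   # CART implementation
library(caret)   # wrapper framework
library(mlr)     # wrapper framework
```

One thing to keep in mind: both caret and mlr export a function named train(), so loading mlr after caret masks caret::train(). Prefixing calls with caret:: or mlr:: avoids the ambiguity.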
The Sales column is converted into a binary variable.
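The exact cut-off isn't shown here; a common treatment of the Carseats data (e.g. in ISLR) splits Sales at 8, which is assumed in this sketch.

```r
data(Carseats, package = "ISLR")
# High = "Yes" when Sales exceeds 8 (thousand units), else "No"
Carseats$High  <- factor(ifelse(Carseats$Sales <= 8, "No", "Yes"))
Carseats$Sales <- NULL   # drop the original numeric column to avoid leakage
```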
Balanced splitting of data can be performed in either of the packages as shown below.
As in the previous articles, the split by the caret package is taken.
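A minimal sketch of a stratified split with caret; the split ratio and the seed value are assumptions.

```r
set.seed(1237)   # seed value is an assumption
# createDataPartition() keeps the class ratio of High in both partitions
idx <- caret::createDataPartition(Carseats$High, p = 0.8, list = FALSE)
trainData <- Carseats[idx, ]
testData  <- Carseats[-idx, ]
```

For the mlr route, makeResampleDesc("Holdout", stratify = TRUE) would give a comparable stratified holdout split.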
Note that two custom functions are used: bestParam() and updateCM(). The former searches for the cp values by the 1-SE rule (bst) and at the lowest xerror (lst) from the cp table of an rpart object. The latter produces a confusion matrix with model and use error added in the last column and row respectively. Their sources can be seen here.
At first, the model is fit using the rpart package, and the bst and lst cp values are obtained.
The selected cp values can be checked graphically below.
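As the source of bestParam() lives elsewhere, the selection logic it implements can be sketched directly from the cp table; the object names here (fit, bst, lst) and the seed are assumptions.

```r
set.seed(1237)
fit <- rpart(High ~ ., data = trainData, method = "class",
             control = rpart.control(cp = 0))   # grow a full tree first
tbl <- data.frame(fit$cptable)                  # CP, nsplit, rel.error, xerror, xstd

# lst: cp at the lowest cross-validation error
lst <- tbl$CP[which.min(tbl$xerror)]

# bst: 1-SE rule - the simplest (largest-cp) tree whose xerror stays
# within one standard error of the minimum
thres <- min(tbl$xerror) + tbl$xstd[which.min(tbl$xerror)]
bst <- tbl$CP[min(which(tbl$xerror <= thres))]

plotcp(fit)   # the dashed line marks the 1-SE threshold
```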
The original tree is pruned with the two cp values, resulting in two separate trees, which are then fit on the training data.
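Continuing from the fitted tree and the two cp values described above, the pruning and training-data fit might look like the following (the ftd/bst/lst naming follows the abbreviation list; the exact names are assumptions).

```r
fit.bst <- prune(fit, cp = bst)   # tree by the 1-SE rule
fit.lst <- prune(fit, cp = lst)   # tree at the lowest xerror

# fitted classes on the training data
ftd.bst <- predict(fit.bst, newdata = trainData, type = "class")
ftd.lst <- predict(fit.lst, newdata = trainData, type = "class")
```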
Details of the fitting are kept in a list (mmce).
pkg: package name
isTest: fit on test data?
isBest: cp by 1-SE rule?
isEq: equal cost?
cp: cp value used
mmce: mean misclassification error
The pruned trees are then fit on the test data, and the same details are added to the list (mmce).
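A sketch of one such update, assuming mmce was initialized as an empty list and using the pruned tree by the 1-SE rule from above (the record layout mirrors the fields listed earlier; names are assumptions).

```r
# predicted classes on the test data (ptd as in the abbreviation list)
ptd.bst <- predict(fit.bst, newdata = testData, type = "class")
err <- mean(ptd.bst != testData$High)   # mean misclassification error

# append one record of fitting details to the list
mmce[[length(mmce) + 1]] <- list(pkg = "rpt", isTest = TRUE, isBest = TRUE,
                                 isEq = TRUE, cp = bst, mmce = err)
```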
Secondly, the caret package is employed to implement the CART model.
Note that the caret package selects the best cp value as the one with the highest Accuracy. The best cp by this package is therefore labeled lst to be consistent with the rpart package, while the bst cp is selected by the 1-SE rule. Note that, as the standard error of Accuracy is relatively wide, an adjustment is made in selecting the best cp value; this can be checked in the graph below.
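A sketch of the caret fit and of one possible 1-SE-style selection. The article's exact adjustment to the wide Accuracy band is not shown, so the shrinking used below (dividing AccuracySD by the square root of the number of folds) is purely an assumption, as are the fold count, grid range and seed.

```r
trControl <- caret::trainControl(method = "cv", number = 10)  # folds assumed
grid <- expand.grid(cp = seq(0, 0.3, 0.01))                   # grid assumed
set.seed(1237)
crt.fit <- caret::train(High ~ ., data = trainData, method = "rpart",
                        trControl = trControl, tuneGrid = grid)
res <- crt.fit$results   # columns include cp, Accuracy, AccuracySD

# lst: caret's own pick - the cp with the highest cross-validated Accuracy
lst <- res$cp[which.max(res$Accuracy)]

# bst: largest cp (simplest tree) whose Accuracy stays within an
# adjusted (narrowed) band below the maximum
band <- res$AccuracySD[which.max(res$Accuracy)] / sqrt(10)
bst <- max(res$cp[res$Accuracy >= max(res$Accuracy) - band])
```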
Similar to above, two trees with the respective cp values are fit on the training and test data, and the details are kept in mmce. Below is the update from fitting on the training data.
Below is the update from fitting on the test data. The updated fitting details can then be checked.
Finally, the mlr package is employed.
At first, a task and a learner are set up.
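A minimal sketch of the setup; the task id and the object names are assumptions.

```r
# classification task on the training data, predicting High
task <- mlr::makeClassifTask(id = "Carseats", data = trainData, target = "High")
# rpart-backed CART learner registered in mlr
lrn  <- mlr::makeLearner("classif.rpart")
```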
Then a grid of cp values is generated, followed by tuning of the parameter. Note that, as the tuning optimization path does not include a standard-error-like variable, only the best cp value is taken into consideration.
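Assuming the task and learner set up above and the same grid as before, the tuning step might look like the following. Note that mlr::mmce (the performance measure) is distinct from the mmce list used to record fitting details in this article, hence the explicit namespace prefix.

```r
ps <- ParamHelpers::makeParamSet(
  ParamHelpers::makeDiscreteParam("cp", values = seq(0, 0.3, 0.01))  # grid assumed
)
ctrl  <- mlr::makeTuneControlGrid()
rdesc <- mlr::makeResampleDesc("CV", iters = 10)   # fold count assumed
set.seed(1237)
tuned <- mlr::tuneParams(lrn, task = task, resampling = rdesc,
                         par.set = ps, control = ctrl,
                         measures = mlr::mmce)
tuned$x$cp   # best cp by the lowest mean misclassification error
```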
Using the best cp value, the learner is updated, followed by training the model.
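Continuing with the tuning result from above, the update and training steps can be sketched as:

```r
lrn <- mlr::setHyperPars(lrn, cp = tuned$x$cp)  # fix cp at the tuned value
mdl <- mlr::train(lrn, task)                    # mlr::train, not caret::train
```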
Then the model is fit on the training and test data, and the fitting details are updated in mmce. The overall fitting results can be checked below.
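A sketch of the prediction and evaluation step on both data sets, continuing from the trained model above.

```r
p.trn <- predict(mdl, newdata = trainData)
p.tst <- predict(mdl, newdata = testData)

mlr::performance(p.trn, measures = mlr::mmce)   # training error
mlr::performance(p.tst, measures = mlr::mmce)   # test error
```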
It turns out that the mmce values are identical, which seems to be because the model is quite stable with respect to cp. This can be checked in the following graph.
An article like this, covering a single model, may not make a convincing case for using a wrapper. If there were multiple models with a variety of tuning parameters to compare, however, the benefit of having one would be considerable. The following articles will take a similar approach, comparing individual packages to the wrappers.