# Second Look on MLR

In the previous article, titled *First Look on MLR*, a quick comparison was made between the *stats* and *mlr* packages by fitting a logistic regression. However, the benefits of using *mlr* (**consistent API** and **extensibility**) cannot be demonstrated well by such a simple analysis, which involves a single learner and little or no resampling.

In this article, the comparison is extended to cover **Resampling** and **Benchmark** so as to illustrate how a more comprehensive analysis can be implemented - the data is ‘*resampled*’ with holdout validation and 10-fold cross-validation, and the following 4 learners are ‘*benchmarked*’: logistic regression (glm), linear discriminant analysis (lda), quadratic discriminant analysis (qda) and the k-nearest neighbors algorithm (knn). As in the first article, *Chapter 4 Lab* of ISLR is revisited.

- Imputation, Processing …
- Task
- Learner
- Train
- Predict
- Performance
- **Resampling**
- **Benchmark**

The following packages are used.
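The original package list is not shown here; judging from the fits below, something like the following is presumably loaded:

```r
library(ISLR)   # Smarket data used in the Chapter 4 Lab
library(MASS)   # lda(), qda()
library(class)  # knn()
library(mlr)    # tasks, learners, resampling and benchmark
```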

At first, logistic regression (glm), linear discriminant analysis (lda), quadratic discriminant analysis (qda) and the k-nearest neighbors algorithm (knn) are fit using the individual libraries, and their outcomes are consolidated at the end - the model name, hyperparameter and mean misclassification error (mmce) are kept. As in the ISLR lab, only *Lag1* and *Lag2* are used as predictors of the response *Direction* - the structure of the data is shown below. For holdout validation, the 2005 data is used as the *test* set.
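A sketch of the data structure and the holdout split, following the ISLR lab (the object names are assumptions):

```r
library(ISLR)

# predictors Lag1/Lag2 and response Direction
str(Smarket[, c("Lag1", "Lag2", "Direction")])

# holdout: years before 2005 for training, 2005 for testing
train <- Smarket$Year < 2005
Smarket.2005 <- Smarket[!train, ]
Direction.2005 <- Smarket$Direction[!train]
```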

## Fitting with individual libraries

### Logistic regression (glm)
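A sketch of the glm fit on the training years, with mmce computed on the 2005 test set:

```r
library(ISLR)
train <- Smarket$Year < 2005

# logistic regression on the training years
glm.fit <- glm(Direction ~ Lag1 + Lag2, data = Smarket,
               family = binomial, subset = train)

# predicted probabilities on the 2005 test set, turned into class labels
glm.probs <- predict(glm.fit, Smarket[!train, ], type = "response")
glm.pred  <- ifelse(glm.probs > 0.5, "Up", "Down")

# mean misclassification error (mmce)
glm.mmce <- mean(glm.pred != Smarket$Direction[!train])
```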

### Linear discriminant analysis (lda)
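The lda fit follows the same pattern, using `MASS::lda()`:

```r
library(ISLR)
library(MASS)
train <- Smarket$Year < 2005

lda.fit  <- lda(Direction ~ Lag1 + Lag2, data = Smarket, subset = train)
lda.pred <- predict(lda.fit, Smarket[!train, ])$class
lda.mmce <- mean(lda.pred != Smarket$Direction[!train])
```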

### Quadratic discriminant analysis (qda)
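The qda fit is analogous, using `MASS::qda()`:

```r
library(ISLR)
library(MASS)
train <- Smarket$Year < 2005

qda.fit  <- qda(Direction ~ Lag1 + Lag2, data = Smarket, subset = train)
qda.pred <- predict(qda.fit, Smarket[!train, ])$class
qda.mmce <- mean(qda.pred != Smarket$Direction[!train])
```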

### k-nearest neighbors algorithm (knn)
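For knn, `class::knn()` takes the predictor matrices directly; `k = 3` as stated below, and a seed is set since ties are broken at random:

```r
library(ISLR)
library(class)
train <- Smarket$Year < 2005

train.X <- Smarket[train, c("Lag1", "Lag2")]
test.X  <- Smarket[!train, c("Lag1", "Lag2")]
train.Direction <- Smarket$Direction[train]

set.seed(1)
knn.pred <- knn(train.X, test.X, train.Direction, k = 3)
knn.mmce <- mean(knn.pred != Smarket$Direction[!train])
```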

The outcomes are kept in **holdout.res**.
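The exact structure of **holdout.res** is not shown; a self-contained sketch that refits the four models and consolidates model name, hyperparameter and mmce might look like this (column names are assumptions):

```r
library(ISLR); library(MASS); library(class)

train <- Smarket$Year < 2005
test  <- Smarket[!train, ]
truth <- Smarket$Direction[!train]

glm.fit  <- glm(Direction ~ Lag1 + Lag2, data = Smarket,
                family = binomial, subset = train)
glm.pred <- ifelse(predict(glm.fit, test, type = "response") > 0.5, "Up", "Down")
lda.pred <- predict(lda(Direction ~ Lag1 + Lag2, data = Smarket, subset = train), test)$class
qda.pred <- predict(qda(Direction ~ Lag1 + Lag2, data = Smarket, subset = train), test)$class
set.seed(1)
knn.pred <- knn(Smarket[train, c("Lag1", "Lag2")], test[, c("Lag1", "Lag2")],
                Smarket$Direction[train], k = 3)

# model name, hyperparameter and mmce, consolidated
holdout.res <- data.frame(
  model = c("glm", "lda", "qda", "knn"),
  param = c(NA, NA, NA, 3),
  mmce  = sapply(list(glm.pred, lda.pred, qda.pred, knn.pred),
                 function(p) mean(p != truth))
)
```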

## Fitting with MLR

Note that the development version (v2.3) is necessary to fit logistic regression - see the previous article for installation information.

### Task

The task can be set up as follows.
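A sketch of the classification task on *Lag1*, *Lag2* and *Direction* (the task id and the choice of positive class are assumptions):

```r
library(mlr)
library(ISLR)

task <- makeClassifTask(id = "Smarket",
                        data = Smarket[, c("Lag1", "Lag2", "Direction")],
                        target = "Direction", positive = "Up")
```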

### Learner

#### Logistic regression (glm)
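The learner name below (`classif.binomial`, which wraps `stats::glm` with a binomial family) is an assumption, consistent with the note above that the development version is needed to fit logistic regression:

```r
library(mlr)
# wraps stats::glm with family = binomial
lrn.glm <- makeLearner("classif.binomial")
```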

#### Linear discriminant analysis (lda)
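The lda learner, presumably:

```r
library(mlr)
lrn.lda <- makeLearner("classif.lda")  # wraps MASS::lda
```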

#### Quadratic discriminant analysis (qda)
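Likewise for qda:

```r
library(mlr)
lrn.qda <- makeLearner("classif.qda")  # wraps MASS::qda
```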

#### k-nearest neighbors algorithm (knn)

The single hyper- or tuning parameter of knn, the number of neighbors *k*, is set to 3.
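A sketch of the knn learner with *k* fixed at 3:

```r
library(mlr)
# wraps class::knn; the number of neighbors k is fixed at 3
lrn.knn <- makeLearner("classif.knn", par.vals = list(k = 3))
```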

The three subsequent steps (**Train**, **Predict** and **Performance**) are not covered directly in this article, as the focus is on **Resampling** and **Benchmark**.

### Resampling

**mlr** supports the following resampling strategies and the first and last are chosen.

- **Cross-validation (“CV”)**
- Leave-one-out cross-validation (“LOO”)
- Repeated cross-validation (“RepCV”)
- Out-of-bag bootstrap (“Bootstrap”)
- Subsampling (“Subsample”)
- **Holdout (training/test) (“Holdout”)**

There are two ways to set up resampling strategies: the first is to create a resampling description, the second a resampling instance. A resampling instance is created for holdout validation while the more general resampling description is used for 10-fold cross-validation - I haven’t found a way to assign the 2005 records to the test set with a description, whereas a resampling instance allows it by adjusting its (row) indices.

#### Resampling instance
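A sketch of a holdout resampling instance whose indices are overwritten so that the 2005 records form the test set (the direct assignment to `train.inds`/`test.inds` is an assumption):

```r
library(mlr)
library(ISLR)

task <- makeClassifTask(data = Smarket[, c("Lag1", "Lag2", "Direction")],
                        target = "Direction")

rin <- makeResampleInstance("Holdout", task = task)
# overwrite the random split so that 2005 becomes the test set
rin$train.inds[[1]] <- which(Smarket$Year < 2005)
rin$test.inds[[1]]  <- which(Smarket$Year == 2005)
```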

#### Resampling description
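The 10-fold cross-validation description is simpler, as it needs no task or indices:

```r
library(mlr)
rdesc <- makeResampleDesc("CV", iters = 10)
```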

### Benchmark

For the benchmark, lists of tasks, learners, holdout resampling and CV resampling are created.
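A condensed sketch of the benchmark with the CV description (the logistic learner is omitted here, and a holdout instance could be passed in the same way):

```r
library(mlr)
library(ISLR)

task  <- makeClassifTask(id = "Smarket",
                         data = Smarket[, c("Lag1", "Lag2", "Direction")],
                         target = "Direction")
lrns  <- list(makeLearner("classif.lda"),
              makeLearner("classif.qda"),
              makeLearner("classif.knn", par.vals = list(k = 3)))
rdesc <- makeResampleDesc("CV", iters = 10)

set.seed(1)
res <- benchmark(lrns, task, rdesc, measures = mmce)
res  # prints aggregated mmce per learner and task
```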

Then benchmark measures (mean misclassification error) are obtained for each learner and resampling strategy.

### Outcomes from individual libraries - holdout validation

### Outcome from mlr - holdout validation

### Outcome from mlr - cross-validation

The k-nearest neighbors algorithm has a hyper- or tuning parameter (*k*), and it is set to 3 without justification - a limitation of this analysis. As **mlr** supports **nested resampling**, so that such parameters can be tuned as part of the evaluation, the package allows the analysis to be extended even further.
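As a hint of that extension, a nested-resampling sketch where *k* is tuned over a grid in an inner loop (the grid and fold counts are assumptions):

```r
library(mlr)
library(ISLR)

task <- makeClassifTask(data = Smarket[, c("Lag1", "Lag2", "Direction")],
                        target = "Direction")

# inner loop: grid search over k via 5-fold CV
ps    <- makeParamSet(makeDiscreteParam("k", values = 1:10))
inner <- makeResampleDesc("CV", iters = 5)
lrn   <- makeTuneWrapper("classif.knn", resampling = inner,
                         par.set = ps, control = makeTuneControlGrid())

# outer loop: 10-fold CV estimates performance of the tuned learner
outer <- makeResampleDesc("CV", iters = 10)
set.seed(1)
r <- resample(lrn, task, outer, measures = mmce)
```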