# 2015-01-24-Benchmark-Example-in-MLR-Part-I

This is an update of the second article, Second Look on MLR. While the hyper- (or tuning) parameter was either non-existent or given in the previous article, it is estimated here - specifically, the cost of constraints violation (*C*) of the support vector machine is estimated.

The *Credit Scoring* example in Chapter 4 of Applied Predictive Modeling is reimplemented using the **mlr** package. Details of the German Credit Data used here can be found here.

The bold-faced topics below are mainly covered.

- **Imputation, Processing …**
- Task
- Learner
- Train
- Predict
- Performance
- **Resampling**
- **Benchmark**

The following packages are used.
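The original package list was not preserved; the following is a plausible reconstruction based on the libraries referenced in this article (**mlr**, **caret** for the APM-style split, and **kernlab** for `sigest()`).

```r
library(mlr)      # tasks, learners, resampling and benchmarking
library(caret)    # createDataPartition(), to split data as in APM
library(kernlab)  # ksvm() backend and sigest()

# German Credit Data as shipped with caret, used in APM Chapter 4
data(GermanCredit, package = "caret")
```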

## Preprocessing

**mlr** has different methods of preprocessing and splitting data from **caret**. For comparisons that may be necessary in the future, these steps are performed in the same way.

80% of data is taken as the training set.
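The split itself was not shown; a sketch using **caret**'s `createDataPartition()`, which stratifies on the class label, might look as follows (the seed value is an assumption, not from the original).

```r
set.seed(100)  # assumed seed, for reproducibility only
inTrain <- createDataPartition(GermanCredit$Class, p = 0.8, list = FALSE)
trainData <- GermanCredit[inTrain, ]   # 80% training set
testData  <- GermanCredit[-inTrain, ]  # remaining 20%
```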

## Task

A *task* is set up using the training data and normalized as in the original example.
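A minimal sketch, assuming the training data frame is called `trainData` with target column `Class`; `normalizeFeatures()` with `method = "standardize"` centers and scales the numeric features.

```r
# classification task on the training data
task <- makeClassifTask(id = "GermanCredit", data = trainData,
                        target = "Class")
# standardize features as in the original example
task <- normalizeFeatures(task, method = "standardize")
```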

## Learner

The following two learners are set up for the benchmark: a *support vector machine* and *logistic regression*. Note that the development version (v2.3) is necessary to fit logistic regression - see this article for installation information.
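The learner setup might look as below. `classif.ksvm` wraps `kernlab::ksvm()`; the logistic-regression learner id is an assumption here (mlr exposes a `stats::glm`-based binomial learner in the 2.x series).

```r
lrn.svm <- makeLearner("classif.ksvm")      # SVM via kernlab::ksvm()
lrn.glm <- makeLearner("classif.binomial")  # logistic regression (assumed id)
```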

## Resampling

*Repeated cross-validation* is chosen, as in the original example.
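A sketch of the resample description; the fold and repeat counts below (10-fold, 5 repeats) are assumptions matching the usual APM setup, not values confirmed by the original text.

```r
# repeated 10-fold cross-validation, 5 repeats (assumed counts)
rdesc <- makeResampleDesc("RepCV", folds = 10, reps = 5)
```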

## Tuning

As in the original example, *sigma* (the inverse kernel width) is estimated first using `sigest()` in the **kernlab** package. Then a control grid is made by varying values of *C* only.

In `makeParamSet()`, *sigma* and *kernel* are fixed as discrete parameters while *C* is varied from *lower* to *upper* on the scale determined by the `trafo` argument. For numeric and integer parameters, it is possible to adjust the increment via *resolution*. Note that the above setup can be relaxed, for example by varying both *C* and *sigma*; in that case, it would be more flexible to set *sigma* as a numeric parameter.
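The parameter set described above might be sketched as follows, assuming the training data `trainData`. The bounds on *C* (2^-2 to 2^7) are assumptions taken from the usual APM grid, and the median `sigest()` estimate is used for *sigma*.

```r
# estimate sigma from the data; sigest() returns three quantiles,
# the second element is the median estimate
sig <- sigest(Class ~ ., data = trainData)[2]

ps <- makeParamSet(
  makeDiscreteParam("sigma", values = sig),       # fixed at the estimate
  makeDiscreteParam("kernel", values = "rbfdot"), # RBF kernel, fixed
  # C varied on the log2 scale via trafo: 2^-2 ... 2^7
  makeNumericParam("C", lower = -2, upper = 7, trafo = function(x) 2^x)
)
```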

The resulting grid can be checked using `generateGridDesign()`. The parameter can be tuned using `tuneParams()` as shown below.
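A sketch of the grid inspection and tuning call, assuming the task `task`, resample description `rdesc`, SVM learner `lrn.svm` and parameter set `ps` from the earlier steps; `resolution = 10` is an illustrative choice.

```r
# inspect the grid points on the transformed (2^x) scale
generateGridDesign(ps, resolution = 10, trafo = TRUE)

# grid search over the parameter set with repeated cross-validation
ctrl <- makeTuneControlGrid(resolution = 10)
res <- tuneParams(lrn.svm, task = task, resampling = rdesc,
                  par.set = ps, control = ctrl)
```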

The fitting details can be checked as follows.
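Assuming the tuning result is stored in `res`, the selected parameters and the trace of evaluated grid points can be inspected like this:

```r
res$x  # tuned parameter values (sigma, kernel, C)
res$y  # cross-validated performance at the optimum

# full trace of all evaluated grid points
as.data.frame(res$opt.path)
```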

## Benchmark

Once the hyper- (or tuning) parameter is determined, the learner can be updated using `setHyperPars()`.
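For instance, assuming the SVM learner `lrn.svm` and the tuning result `res` from the earlier steps:

```r
# fix the learner's hyperparameters at the tuned values
lrn.svm.tuned <- setHyperPars(lrn.svm, par.vals = res$x)
```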

The tuned SVM learner can be benchmarked against the logistic regression learner. This shows only a marginal difference.
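The benchmark call might look as follows, assuming the tuned learner `lrn.svm.tuned`, the logistic-regression learner `lrn.glm`, the task `task` and the resample description `rdesc` defined above.

```r
# run both learners on the same task with the same resampling scheme
bmr <- benchmark(list(lrn.svm.tuned, lrn.glm), task, rdesc)
bmr  # aggregated mean misclassification error per learner
```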

The *tuning* section of the mlr tutorial indicates that the above practice, in which optimization is undertaken over the same data while tuning the SVM parameter, may yield an optimistically biased estimate of the performance value. In order to handle this issue, **nested resampling** is necessary - a more detailed explanation of nested resampling can be found here. Moreover, this resampling strategy can be applied to **feature selection** - see the benchmark tutorial and this article. In this regard, the topic of the next article will be **nested resampling** for model selection.