In the previous post, a quick comparison is made between two useful functionals: lapply() and do.call(). Functionals are an example of higher-order functions, which can have a function as an argument and/or return a function as an output. As R supports functional programming, it supports other types of higher order functions and clever use of them can be beneficial, resulting in succinct code. In this post, closure and user-defined functional are illustrated by creating a generic bootstrapper.

Closure

Simply put, a closure is a function factory (Hadley, 2014). A function has three parts: formals(), body() and environment(). A closure returns a function and the function inherits the environment that is generated by the closure. Below is a quick example from Hadley, 2014. The anonymous function (function(x) { x ^ exponent }) inherits the value of exponent (2 or 3) when it is created and this value is used to exponent its argument (2).

## closure example
power <- function(exponent) {
  function(x) {
    x ^ exponent
  }
}

square <- power(2)
square(2)
## [1] 4
cube <- power(3)
cube(2)
## [1] 8

Functional

In R, a function can have another function as an argument, which is called as a functional, and a user-defined functional can be created as other ordinary functions.

In order to illustrate it with a practical example, a sample data set is generated. It has 5 random vectors of predictors (x1 to x5) and a sigle response vector (y) and all are uniform-distributed.

## data
set.seed(1237)
x <- matrix(runif(500), 100)
y <- runif(100)

data <- as.data.frame(cbind(y, x))
names(data) <- c("y","x1","x2","x3","x4","x5")

Let say multiple learners are to be compared and individual learners should be wrapped by another function. This case can be handled by a functional as shown below. The user-defined functional (functional()) has a function argument (f) and unspecified arguments (). As can be seen in the body, this funtional simply executes the input function with unspecified arguments (f(...)). Here the following three learners are considered: lm(), glm() and rpart().

## functional
functional <- function(f, ...) {
  f(...)
}

functional(f = lm, formula = y ~ ., data = data)
## 
## Call:
## f(formula = ..1, data = ..2)
## 
## Coefficients:
## (Intercept)           x1           x2           x3           x4  
##    0.490465    -0.082547    -0.090125     0.024990     0.193828  
##          x5  
##    0.007473
functional(f = glm, formula = y ~ ., data = data, family = gaussian)
## 
## Call:  f(formula = ..1, family = ..3, data = ..2)
## 
## Coefficients:
## (Intercept)           x1           x2           x3           x4  
##    0.490465    -0.082547    -0.090125     0.024990     0.193828  
##          x5  
##    0.007473  
## 
## Degrees of Freedom: 99 Total (i.e. Null);  94 Residual
## Null Deviance:	    8.573 
## Residual Deviance: 8.201 	AIC: 47.7
require(rpart)
functional(f = rpart, formula = y ~ ., data = data, control = rpart.control(cp = 0))
## n= 100 
## 
## node), split, n, deviance, yval
##       * denotes terminal node
## 
##   1) root 100 8.57286900 0.5164056  
##     2) x4< 0.2324552 19 1.32257300 0.3394754 *
##     3) x4>=0.2324552 81 6.51599700 0.5579078  
##       6) x5>=0.9340799 9 0.37043010 0.3140506 *
##       7) x5< 0.9340799 72 5.54347000 0.5883899  
##        14) x1>=0.1189346 64 5.01265200 0.5601459  
##          28) x5< 0.7090657 46 3.63206800 0.5147681  
##            56) x4>=0.8136216 11 0.36021190 0.3525337 *
##            57) x4< 0.8136216 35 2.89134400 0.5657561  
##             114) x1< 0.7937034 26 1.93447800 0.5081107  
##               228) x3>=0.4130735 15 0.95942230 0.4356593 *
##               229) x3< 0.4130735 11 0.78894740 0.6069080 *
##             115) x1>=0.7937034 9 0.62087430 0.7322873 *
##          29) x5>=0.7090657 18 1.04379900 0.6761114 *
##        15) x1< 0.1189346 8 0.07132895 0.8143421 *

Closure + Functional = Generic Bootstrapper

The above functional itself has no benefit but, if it is wrapped by another function, it can be quite useful. In the following example, an anonymous functional is created by a closure (bootstrapper()). The purpose of this closure is to configure settings of bootstrap sampling. For simplicity, only formula, data and ntrial are considered in this example.

The arguments of the closure as well as internally generated variables are accessible by the anonymous functional and it returns model objects and errors in a list recursively - note lapply(). Unlike the above simple example, formula may be necessary (eg later when stratification is implemented) and bootstrap samples are generated recursively so that those that are created outside the functional are used to execute the function. A difference of this functional to anther similar function is that it accepts multiple learners as the function argument (f), which makes it generic.

## closure + functional
ntrial <- 100
bootstrapper <- function(formula, data, ntrial) {
  # check if response is found
  res.name = gsub(" ","",unlist(strsplit(formula,split="~"))[[1]])
  res.ind = match(res.name, colnames(data))

  function(f, ...) {
    lapply(1:ntrial, function(i) {
      # do bootstrap
      bag <- sample(nrow(data), size = nrow(data), replace = TRUE)
      model <- f(formula = formula, data = data[bag,], ...)
      fitted <- predict(model)
      actual <- data[bag, res.ind]
      error <- if(class(actual) == "numeric") {
        sqrt(sum((fitted - actual)^2) / length(actual))
      } else {
        1 - diag(table(actual, fitted)) / sum(table(actual, fitted))
      }
      
      list(model = model, error = error)      
    })
  }
}

As mentioned above, the closure is used to configure bootstrap sampling and the following conditions are specified in boot_configure: formula, data and ntrial. Then, given configuration, individual learners are fit (boot_lm and boot_cart) and their bootstrap errors are obtained.

boot_configure <- bootstrapper(formula = "y ~ .", data = data, ntrial = ntrial)

set.seed(1237)
boot_lm <- boot_configure(lm)
mean(do.call(c, lapply(boot_lm, function(x) x$error)))
## [1] 0.2797097
set.seed(1237)
boot_cart <- boot_configure(rpart)
mean(do.call(c, lapply(boot_cart, function(x) x$error)))
## [1] 0.2020982