In the previous article, parallel processing on a single machine using the snow and parallel packages was introduced. The four functions covered there are extensions of lapply() that take an additional argument specifying a cluster object. Despite their effectiveness and ease of use, there may be cases where it is not easy to write a function that can be sent to a cluster, or where a loop is more natural. In this article, another way of implementing parallel processing on a single machine is introduced using the foreach and doParallel packages, with clusters created by the parallel package. Finally, the iterators package is briefly covered as it can make writing loops easier. The examples here are largely based on the individual packages’ vignettes, where further details can be found.
Let’s get started.
The following packages are loaded.
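The original listing is not reproduced here. As a minimal sketch, assuming the four packages named in this article are installed, they could be loaded as follows.

```r
# Packages discussed in this article (assumed to be installed)
library(parallel)    # makeCluster(), detectCores(), stopCluster()
library(iterators)   # iter(), nextElem()
library(foreach)     # foreach(), %do%, %dopar%, %:%
library(doParallel)  # registerDoParallel()
```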
A key difference between the for construct in base R and the foreach construct in the foreach package is as follows.
- for causes a side effect
  - A side effect means that the state of something is changed. For example, printing a variable, changing the value of a variable and writing data to disk are side effects.
- foreach returns a value
  - A list is created by default.
An equivalent outcome can be produced by the two as follows.
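A small sketch of the comparison, with illustrative variable names (squares_for, squares_foreach) that are not from the original: the for loop fills a pre-allocated list through assignment, while foreach returns the list directly.

```r
library(foreach)

# for: the result exists only through a side effect on squares_for
squares_for <- vector("list", 3)
for (i in 1:3) {
  squares_for[[i]] <- i^2
}

# foreach: the loop itself returns a list, one element per iteration
squares_foreach <- foreach(i = 1:3) %do% {
  i^2
}

# Both hold the same three values
all.equal(squares_for, squares_foreach)
```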
The foreach construct has two binary operators for executing a loop: %do% and %dopar%. The former executes a loop sequentially while the latter does so in parallel. By default, the doParallel package uses the functionality of the multicore package on Unix-like systems and that of the snow package on Windows. However, the default type value (PSOCK) of makeCluster() in the parallel package comes from the snow package, so the socket transport of that package is used in this example regardless of the operating system. The number of cores (or workers in the socket transport) is identified by detectCores(), which is provided by the parallel package. Note that, if no cluster is set up and registered, the loop is executed sequentially.
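The setup described above might look like the following sketch. Capping the worker count at 2 is an assumption made here to keep the demo light, and the names n_workers, seq_res and par_res are illustrative.

```r
library(parallel)
library(foreach)
library(doParallel)

# detectCores() can return NA, so guard against it; the cap of 2 is an assumption
n_workers <- max(1, min(2, detectCores(), na.rm = TRUE))

cl <- makeCluster(n_workers)   # type defaults to "PSOCK" (socket transport)
registerDoParallel(cl)         # make the cluster the backend for %dopar%

seq_res <- foreach(i = 1:4) %do% sqrt(i)      # sequential
par_res <- foreach(i = 1:4) %dopar% sqrt(i)   # parallel on the workers

stopCluster(cl)    # release the worker processes
registerDoSEQ()    # fall back to the sequential backend afterwards

all.equal(seq_res, par_res)
```

Stopping the cluster once the work is done avoids leaving idle R worker processes behind.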
Multiple iterators can be used and (1) more than one expression can be run by wrapping them in curly braces. (2) If more than one iterator is used, the number of iterations is the length of the shortest iterator. Some examples are shown below.
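A sketch of both points, using made-up inputs: a braced block returns the value of its last expression, and two iterators of unequal length stop at the shorter one.

```r
library(foreach)

# (1) Several expressions in curly braces; the last value is returned
res_block <- foreach(a = 1:3, b = c(10, 20, 30)) %do% {
  total <- a + b
  total * 2
}

# (2) Iterators of length 3 and 10: only 3 iterations are run
res_min <- foreach(a = 1:3, b = 1:10) %do% {
  a + b
}

length(res_min)
```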
There are some options to control a loop or to change the return data type. Some of them are
- .combine: A function can be specified to reduce the outcome variable. Arithmetic operators (+ - * / ...) and min/max are some useful built-in functions. A user-defined function can also be used.
- .multicombine, .maxcombine: These can be set to determine how the combine function is applied. Actually I don’t fully understand these options, so see the vignette for further details.
- .inorder: The order of the sequence is preserved if TRUE, which is relevant if a loop is executed in parallel. The default value is TRUE.
- .packages: By specifying one or more package names, the package(s) can be loaded on each worker. This is similar to initializing workers using clusterEvalQ() in the parallel package.
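The options above can be sketched as follows. The helper keep_larger and the choice of the tools package are illustrative assumptions, and the last example starts a two-worker cluster only to show .inorder and .packages with %dopar%.

```r
library(parallel)
library(foreach)
library(doParallel)

# .combine = c flattens the default list into a vector
v <- foreach(i = 1:4, .combine = c) %do% { i^2 }

# An arithmetic operator reduces the results to a single value
s <- foreach(i = 1:4, .combine = "+") %do% { i^2 }

# rbind stacks each result as a row of a matrix
m <- foreach(i = 1:3, .combine = rbind) %do% { c(i, i^2) }

# A user-defined combine function (hypothetical helper)
keep_larger <- function(a, b) if (a > b) a else b
mx <- foreach(i = c(3, 9, 4), .combine = keep_larger) %do% { i }

# .inorder and .packages matter when running in parallel
cl <- makeCluster(2)
registerDoParallel(cl)
ext <- foreach(i = 1:3, .combine = c, .inorder = TRUE,
               .packages = "tools") %dopar% {
  file_ext(paste0("f", i, ".csv"))   # file_ext() comes from tools
}
stopCluster(cl)
registerDoSEQ()
```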
The binary operator %:% can be used for list comprehension (filtering which elements to loop over with when()) and for nested looping.
An example of list comprehension is shown below. It returns a vector of even numbers.
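A minimal sketch, assuming the numbers 1 to 10 as input: when() keeps only the iterations whose condition is TRUE, and .combine = c collects them into a vector.

```r
library(foreach)

# Keep only the even values of i
evens <- foreach(i = 1:10, .combine = c) %:% when(i %% 2 == 0) %do% {
  i
}

evens
```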
An example of nested looping with for and foreach is shown below. According to the vignette, it is not necessary to decide which loop (inner or outer) to parallelize, as %:% turns multiple foreach loops into a single stream of tasks that can be parallelized.
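As a sketch with a made-up cell formula (i * 10 + j), the nested for loop fills a matrix by assignment, while the nested foreach builds each column in the inner loop and binds the columns in the outer loop. It uses %do% so that it runs without a cluster; with a registered backend, %dopar% could be substituted.

```r
library(foreach)

# Nested for: fill a 2 x 3 matrix cell by cell
m_for <- matrix(0, nrow = 2, ncol = 3)
for (i in 1:2) {
  for (j in 1:3) {
    m_for[i, j] <- i * 10 + j
  }
}

# Nested foreach: inner loop builds a column, outer loop cbinds the columns
m_foreach <- foreach(j = 1:3, .combine = cbind) %:%
  foreach(i = 1:2, .combine = c) %do% {
    i * 10 + j
  }

# Same values once the dimnames added by cbind are dropped
all.equal(m_for, unname(m_foreach))
```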
As a loop is constructed by foreach, the iterators package can be useful: it allows an iterator object to be created from conventional R objects such as vectors, data frames, matrices, lists and even functions. Some examples are shown below.
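A few sketches with small made-up objects: iter() creates the iterator and nextElem() pulls one element at a time.

```r
library(iterators)

# Vector: one element per call
it_vec <- iter(1:3)
first  <- nextElem(it_vec)
second <- nextElem(it_vec)

# Data frame: iterate by row
df <- data.frame(x = 1:2, y = c("a", "b"))
it_row <- iter(df, by = "row")
row1 <- nextElem(it_row)

# Matrix: iterate by column
m <- matrix(1:6, nrow = 2)
it_col <- iter(m, by = "column")
col1 <- nextElem(it_col)

# Function: each call to nextElem() calls the function again
it_fun <- iter(function() sample(1:6, 1))
roll <- nextElem(it_fun)
```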
The following code generates a StopIteration error at the end, as no further element is available for nextElem().
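A minimal sketch of that behaviour. The final call is wrapped in tryCatch() here, which is an addition for illustration, so that the error message can be captured instead of halting the script.

```r
library(iterators)

it <- iter(1:2)
nextElem(it)   # first element
nextElem(it)   # second element

# The iterator is now exhausted; the third call signals an error
msg <- tryCatch(nextElem(it), error = function(e) conditionMessage(e))
msg
```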
A quick example of using an iterator object with foreach is shown below.
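As a sketch with a small made-up matrix, an iterator over columns can be passed directly as a foreach argument, so each iteration receives one column.

```r
library(foreach)
library(iterators)

m <- matrix(1:6, nrow = 2)

# Each iteration receives one column of m
col_sums <- foreach(col = iter(m, by = "column"), .combine = c) %do% {
  sum(col)
}

col_sums
```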
Finally, this package provides wrappers around some built-in functions:
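The original list of wrappers is not reproduced here. As an illustration only, irnorm() and icount() are two functions from the iterators package that wrap rnorm() and a simple counter respectively; whether these are the ones the list named is an assumption.

```r
library(foreach)
library(iterators)

set.seed(1)

# irnorm(): each iteration yields a fresh vector of 3 normal draws, 2 times in total
draws <- foreach(x = irnorm(3, count = 2)) %do% {
  x
}

# icount(): counts upward from 1, here for 3 iterations
idx <- foreach(i = icount(3), .combine = c) %do% {
  i
}
```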
So far, two groups of approaches to parallel processing on a single machine have been introduced. The first group uses extensions of lapply() while the second is an extension of the for construct. In the next article, they will be compared using more realistic examples.