Lack of multi-threading and memory limitation are two outstanding weaknesses of base R. If the data is small enough to fit in RAM, however, the former can be handled relatively easily by parallel processing, provided that multiple processors are available. This article introduces a way of implementing parallel processing on a single machine using the snow and parallel packages - the examples are largely based on McCallum and Weston (2012).
snow and multicore are two of the packages for parallel processing, and the parallel package, which has been included in the base R distribution since R 2.14.0, provides the functionality of both (and more). Only the functions that originate from the snow package are covered in this article.
The functions are similar to lapply() and just have an additional argument for a cluster object (cl). The following four functions will be compared.
clusterApply(cl, x, fun, ...)
- pushes tasks to workers by distributing x elements
clusterApplyLB(cl, x, fun, ...)
- clusterApply + load balancing (workers pull tasks as needed)
- efficient if some tasks take longer or some workers are slower
parLapply(cl, x, fun, ...)
- clusterApply + task scheduling by splitting x into as many chunks as there are workers
- docall(c, clusterApply(cl, splitList(x, length(cl)), lapply, fun, ...))
parLapplyLB(cl = NULL, X, fun, ...)
- clusterApply + task scheduling + load balancing
- available only in the parallel package
Let’s get started.
As the packages share the same function names, the following utility function is used to reset the environment at the end of each example.
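One possible implementation of such a utility is a small helper that detaches the packages if they are loaded (the function name and the exact set of packages are assumptions, not taken from the original):

```r
# a sketch of a reset utility - detaches snow and parallel if loaded,
# so that the next example starts from a clean search path
resetEnv <- function() {
  for (pkg in c("package:snow", "package:parallel")) {
    if (pkg %in% search()) detach(pkg, unload = TRUE, character.only = TRUE)
  }
}
```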
Make and stop a cluster
In order to make a cluster, the socket transport is selected (type="SOCK" in snow, type="PSOCK" in parallel) and the number of workers is set manually in the snow package (spec=4). In the parallel package, the number of available cores can be detected by detectCores().
stopCluster() stops a cluster.
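A minimal sketch of creating and stopping a cluster in each package (the worker count of 4 follows the text; detectCores() may of course return a different number on another machine):

```r
library(snow)
cl <- makeCluster(spec = 4, type = "SOCK")  # 4 socket workers
stopCluster(cl)

# with the parallel package the worker count can be detected
library(parallel)
cl <- makeCluster(detectCores(), type = "PSOCK")
stopCluster(cl)
```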
CASE I - load balancing matters
As mentioned earlier, load balancing can be important when some tasks take longer or some workers are slower. In this example, system sleep time is assigned randomly so as to compare how the tasks are performed across workers using snow.time() in the snow package, and how long each function takes.
Both clusterApplyLB() and parLapply() take less time than clusterApply(). The efficiency of the former is due to load balancing (a worker pulls a task when necessary), while that of the latter comes from a lower number of I/O operations thanks to task scheduling, which allows a single I/O operation per chunk (or split) - this benefit is more pronounced when one or more arguments are sent to the workers, as shown in the next example. The scheduling can be checked with snow's splitList().
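The comparison can be sketched as follows (the number of tasks and the sleep times are illustrative assumptions):

```r
library(snow)
cl <- makeCluster(4, type = "SOCK")

set.seed(1237)
# random sleep times so that the tasks take unequal time
sleep <- sample(1:10 / 10, 20, replace = TRUE)

# snow.time() records communication and execution on each worker
t1 <- snow.time(clusterApply(cl, sleep, Sys.sleep))
t2 <- snow.time(clusterApplyLB(cl, sleep, Sys.sleep))
t3 <- snow.time(parLapply(cl, sleep, Sys.sleep))

plot(t1)  # likewise for t2 and t3
stopCluster(cl)
```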
The above functions can also be executed using the parallel package, which provides an additional function (parLapplyLB()). As snow.time() is not available in the parallel package, the functions' system time is compared below.
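A sketch of the same comparison with the parallel package, timed with system.time() since snow.time() is unavailable there:

```r
library(parallel)
cl <- makeCluster(4)

set.seed(1237)
sleep <- sample(1:10 / 10, 20, replace = TRUE)

system.time(clusterApply(cl, sleep, Sys.sleep))
system.time(clusterApplyLB(cl, sleep, Sys.sleep))
system.time(parLapply(cl, sleep, Sys.sleep))
system.time(parLapplyLB(cl, sleep, Sys.sleep))

stopCluster(cl)
```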
CASE II - I/O operation matters
In this example, a case where an argument is sent to the workers is considered. While the argument is passed to the workers once for each task by clusterApplyLB(), it is sent once per chunk by parLapplyLB(). Therefore the benefit of the latter group of functions can be outstanding in this example - it can be seen that the workers are idle in between tasks in the first two plots, while tasks are performed continuously in the last plot.
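A sketch of such a case (the matrix size and the toy function are assumptions, chosen only to make the transfer cost visible):

```r
library(snow)
cl <- makeCluster(4, type = "SOCK")

big <- matrix(rnorm(1e6), nrow = 1000)  # a large extra argument
f <- function(i, m) sum(m) + i          # toy function that uses it

# big is shipped once per task here ...
t1 <- snow.time(clusterApply(cl, 1:20, f, big))
t2 <- snow.time(clusterApplyLB(cl, 1:20, f, big))
# ... but only once per chunk here
t3 <- snow.time(parLapply(cl, 1:20, f, big))

plot(t1); plot(t2); plot(t3)
stopCluster(cl)
```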
While clusterApplyLB() shows some improvement over clusterApply(), it is parLapply() that takes the least amount of time. In fact, for the snow package, McCallum and Weston (2012) recommend parLapply(), and it would be better to use parLapplyLB() if the parallel package is used. The elapsed time of each function is shown below - the last two functions' elapsed times are identical as the individual tasks are assumed to take exactly the same amount of time.
Initialization of workers
Sometimes workers have to be initialized (e.g. by loading a library), and two functions can be used: clusterEvalQ() and clusterCall(). While the former just evaluates an expression, the latter makes it possible to send a variable as an argument. Note that it is recommended to let an expression or a function return NULL so as not to receive unnecessary data from the workers (McCallum and Weston (2012)). Only an example using the snow package is shown below.
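A minimal sketch of both initialization styles (the library being loaded and the variable being exported are made-up examples):

```r
library(snow)
cl <- makeCluster(4, type = "SOCK")

# clusterEvalQ() evaluates an expression on every worker
clusterEvalQ(cl, {
  library(MASS)   # e.g. load a library on each worker
  NULL            # return NULL to avoid shipping data back
})

# clusterCall() calls a function, so a variable can be sent as an argument
x <- 5
clusterCall(cl, function(y) { assign("x", y, envir = .GlobalEnv); NULL }, x)

stopCluster(cl)
```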
Random number generation
In order to ensure that each worker generates different random numbers, independent random number streams have to be set up. In the snow package, either the L'Ecuyer random number generator from the rlecuyer package or the SPRNG generator from the rsprng package can be used via clusterSetupRNG(). Only the former is implemented in the parallel package via clusterSetRNGStream() and, as it uses its own implementation, the rlecuyer package is not necessary. For reproducibility, a seed can be specified: the function in the snow package requires a vector of six integers (e.g. seed=rep(1237,6)) while the function in the parallel package takes a single integer (e.g. iseed=1237). An example of each is shown below.
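The two set-ups can be sketched as follows (the worker count and seed values follow the examples in the text):

```r
## snow - requires the rlecuyer package and a vector of six integers
library(snow)
library(rlecuyer)
cl <- makeCluster(4, type = "SOCK")
clusterSetupRNG(cl, type = "RNGstream", seed = rep(1237, 6))
unlist(clusterEvalQ(cl, rnorm(1)))  # a different value on each worker
stopCluster(cl)

## parallel - a single integer seed, no extra package needed
library(parallel)
cl <- makeCluster(4)
clusterSetRNGStream(cl, iseed = 1237)
unlist(clusterEvalQ(cl, rnorm(1)))
stopCluster(cl)
```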
A quick introduction to the snow and parallel packages has been made in this article. Sometimes it may not be easy to create a function that can be sent to a cluster, or looping may be more natural for a computation. In that case, the foreach package can be used, and an introduction to it will be made in the next article.