From a purely programming point of view, I think for-loops are better avoided in R because
- the script becomes more readable
- errors are easier to handle
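As a minimal sketch of the second point, the risky step can be wrapped in a small function and applied to every element, so that one failure does not stop the whole run. The input list and the helper name safe_sqrt are hypothetical and chosen only for illustration.

```r
# Hypothetical inputs: the second element cannot be processed by sqrt().
inputs <- list(4, "a", 9)

# Wrap the risky step so that an error yields NA instead of aborting.
safe_sqrt <- function(x) {
  tryCatch(sqrt(x), error = function(e) NA_real_)
}

# Apply the wrapped function to every element and collect a numeric vector.
results <- sapply(inputs, safe_sqrt)
print(results)
```

The error handling lives in one place (safe_sqrt), which tends to be easier to test than a tryCatch buried in the body of a loop.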
Some articles on the web indicate that looping functions (the apply family of functions) don’t guarantee faster execution and are sometimes even slower. Even assuming that those experiments are correct, in my opinion the gain in code readability alone is enough reason to avoid for-loops. R’s dynamic typing coupled with poor readability can have frustrating consequences as the code grows.
Also, one of the best IDEs for R (RStudio) doesn’t seem to provide incremental debugging, although I may be wrong as I’m new to it. It is therefore not easy to debug the pieces inside a for-loop. In this regard, it would be a good idea to refactor for-loops into smaller pieces.
If one has decided to avoid for-loops, the way of coding needs to change. With for-loops, the focus is ‘how to get the job done’. On the other hand, if they are replaced with looping functions, the focus should be ‘what does the outcome look like’. In other words, the way of thinking should be declarative rather than imperative.
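To make the contrast concrete, here is a small sketch that computes the same result both ways. The vector x and the variable names are made up for illustration.

```r
x <- c(1, 2, 3, 4)

# Imperative: spell out how to build the result, one step at a time.
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Declarative: describe the result as "the square of each element".
squares_apply <- sapply(x, function(v) v^2)

# Both approaches produce the same vector.
print(identical(squares_loop, squares_apply))
```

In the second version there is no index and no pre-allocated container to keep track of, only a description of what each element of the outcome should be.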
The two examples below are from The R Project for Statistical Computing on LinkedIn. Instead of for-loops, the apply family of functions or the plyr package is used for the repeated computation.
The following packages are used.
In this post, the goal is to write a function that generates a simulated vector with the following code.
The above for-loop can easily be replaced with mapply, as follows.
The function named sim is evaluated below.
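As an illustration of the loop-to-mapply pattern, here is a hypothetical sketch. It is not the code from the LinkedIn post: the function names sim_loop and sim_mapply and the parameters n, means and sds are assumptions of mine. Each (n, mean, sd) triple contributes a block of normal draws to the simulated vector.

```r
# Hypothetical parameters: block sizes, means and standard deviations.
n     <- c(2, 3, 1)
means <- c(0, 10, 100)
sds   <- c(1, 2, 3)

# For-loop version: grow the vector block by block.
sim_loop <- function(n, means, sds) {
  out <- numeric(0)
  for (i in seq_along(n)) {
    out <- c(out, rnorm(n[i], mean = means[i], sd = sds[i]))
  }
  out
}

# mapply version: one call per (n, mean, sd) triple, then flatten.
sim_mapply <- function(n, means, sds) {
  blocks <- mapply(function(k, m, s) rnorm(k, mean = m, sd = s),
                   n, means, sds, SIMPLIFY = FALSE)
  unlist(blocks)
}

# With the same seed both versions draw the same numbers in the same order.
set.seed(1)
v_loop <- sim_loop(n, means, sds)
set.seed(1)
v_mapply <- sim_mapply(n, means, sds)
print(identical(v_loop, v_mapply))
```

SIMPLIFY = FALSE keeps the blocks as a list, so blocks of different lengths can be flattened predictably with unlist.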
- I’m trying to extract subsets of values from my dataset which are grouped by a value (which could be any number). This column is set by another piece of software and so the code needs to be flexible enough to identify groups of identical numbers without the number being specified. I.e. if value in row10 = row11 then group. For that I have used:
- This seems to work. I then need to identify all of the groups which have >4 rows and separate those from each other.
Here the goal is to select records that have preset Date and BPM pairs. In addition, each selected group must contain more than 4 records.
The initial selection conditions are shown below. Note that, for simplicity, groups are selected only if their number of records is greater than or equal to 2.
Then a data frame is created.
At first, the data frame is converted into a list by Date and BPM using
Then the dimensions of each list element are checked using sapply. Then a Boolean vector is created so that the value will be TRUE only if the row dimension is greater than 2.
Finally, the list is subset with the Boolean vector and converted back into a data frame using ldply.
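The split, check and recombine steps can be sketched in base R as follows. This is only an approximation of the approach described above: split and do.call(rbind, ...) stand in for the plyr functions, the data values are made up, and the threshold min_rows is an assumption that can be adjusted.

```r
# Hypothetical data: the column names Date and BPM follow the text,
# the values and the extra column Val are invented.
df <- data.frame(
  Date = c("d1", "d1", "d1", "d2", "d2", "d3"),
  BPM  = c(60, 60, 60, 70, 70, 80),
  Val  = 1:6
)

# Split into a list of data frames, one per observed Date/BPM pair.
groups <- split(df, list(df$Date, df$BPM), drop = TRUE)

# Count the rows of each group and flag the groups to keep.
min_rows <- 2  # assumed threshold
n_rows <- sapply(groups, nrow)
keep <- n_rows >= min_rows

# Recombine the kept groups into a single data frame.
result <- do.call(rbind, groups[keep])
rownames(result) <- NULL
print(result)
```

Here the d3/80 group has a single row and is dropped, while the other two groups are kept.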