As mentioned in an earlier post, things that are not easy in R can be relatively simple in other languages. Connecting to Amazon Web Services is another example. For S3, there are a number of existing packages, but many of them seem to be deprecated, immature or platform-dependent. (The cloudyr project looks promising, though.)
If there isn’t a comprehensive R way of doing something yet, it may be necessary to build it from scratch. There are several options for doing so: using the AWS Command Line Interface, calling the AWS REST API, or wrapping the functionality of another language.
This post gives a quick summary of the last option, using Python, by introducing the rs3helper package.
The reasons why I’ve come up with a package are as follows.
- Firstly, Python is relatively easy to learn and it has quite a comprehensive interface to Amazon Web Services - boto.
- Secondly, the rPython package can be used to call Python from R, but only on UNIX-like platforms. For cross-platform functionality, a system command has to be executed instead.
- Finally, because of the previous point, relying on source files kept in local folders wouldn’t be stable, so they need to be kept in a package.
I use Python 2.7. The boto library can be installed with pip by executing
pip install boto
Using RStudio, it is not that complicated to develop a package (see R packages by Hadley Wickham). Even the folder structure and necessary files are generated if the project type is selected as R Package. R script files should be located in the R folder, while Python scripts should be in inst/python.
In the package, the S3-related R functions exist in R/s3utils.R, while the corresponding Python scripts are in inst/python; all Python functions are in inst/python/s3helper.py. As the Python function outputs have to be passed to R, each function returns a response variable, which is converted into a JSON string. The response variable is a Python list, dictionary or list of dictionaries, and thus it is parsed as an R vector, list or data frame.
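To make the three response shapes concrete, here is a minimal sketch of the serialization step. It is not the package's actual code: the helper name to_json and the sample values are made up for illustration, and only the standard json module is used.

```python
import json


def to_json(response):
    """Serialize a response (list, dict or list of dicts) to a JSON string."""
    # sort_keys keeps the printed string stable across runs
    return json.dumps(response, sort_keys=True)


# Hypothetical responses, one per shape described in the text.
bucket_names = ["bucket-a", "bucket-b"]              # list -> R vector
bucket_info = {"bucket": "bucket-a", "found": True}  # dict -> R list
key_info = [                                         # list of dicts -> R data frame
    {"key": "a.csv", "size": 10},
    {"key": "b.csv", "size": 20},
]

for response in (bucket_names, bucket_info, key_info):
    print(to_json(response))
```

Because every function returns one of these three shapes, the R side only needs a single JSON parsing step, whatever the function.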
An example of the wrapper functions, which looks up a bucket, is shown below.
lookup_bucket() is imported into inst/python/lookup_bucket.py from inst/python/s3helper.py. The script takes 4 arguments, some mandatory and some optional, and prints the response after converting it into a JSON string.
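A script of this kind could be sketched roughly as follows. The argument names (access_key_id, secret_access_key, bucket_name, region) and the shape of the response are assumptions made for illustration, not the package's actual interface. The fake_lookup function is a stand-in so that the sketch runs without boto or AWS credentials; in the package, the function imported from s3helper.py would take its place.

```python
import argparse
import json


def parse_args(argv):
    # Hypothetical arguments: three mandatory, one optional.
    parser = argparse.ArgumentParser(description="Look up an S3 bucket.")
    parser.add_argument("--access_key_id", required=True)
    parser.add_argument("--secret_access_key", required=True)
    parser.add_argument("--bucket_name", required=True)
    parser.add_argument("--region", default=None)
    return parser.parse_args(argv)


def run(argv, lookup):
    """Parse the arguments, call the lookup function, return a JSON string."""
    args = parse_args(argv)
    response = lookup(args.access_key_id, args.secret_access_key,
                      args.bucket_name, args.region)
    return json.dumps(response, sort_keys=True)


def fake_lookup(access_key_id, secret_access_key, bucket_name, region=None):
    # Stand-in for the real function that would be imported from s3helper.py.
    return {"bucket": bucket_name, "region": region, "found": True}


# Placeholder values; a real script would read sys.argv[1:] instead.
demo_argv = ["--access_key_id", "my-key-id",
             "--secret_access_key", "my-secret",
             "--bucket_name", "my-bucket"]
print(run(demo_argv, fake_lookup))
```

Printing the JSON string to standard output is what allows the calling process to capture it.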
The R function lookup_bucket() generates the path where inst/python/lookup_bucket.py exists and constructs the command to be executed in system(). The intern argument should be TRUE to grab the printed JSON string. It then parses the returned JSON string into an R object using the jsonlite package.
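The round trip (build a command, run it, capture what it prints, parse the JSON) can also be illustrated in Python. This is only an analogy to the R code, not a translation of it: subprocess.check_output plays the role of system() with intern set to TRUE, and json.loads plays the role of the jsonlite parser. The child command here is a made-up one-liner standing in for the real script.

```python
import json
import subprocess
import sys

# Stand-in for the real script: a child process that prints a JSON string.
child_code = (
    'import json; '
    'print(json.dumps({"bucket": "my-bucket", "found": True}))'
)


def run_and_parse(command):
    """Run a command, capture its standard output and parse it as JSON."""
    printed = subprocess.check_output(command)  # bytes written to stdout
    return json.loads(printed.decode("utf-8"))


# The command is built as a list: interpreter, then its arguments.
command = [sys.executable, "-c", child_code]
result = run_and_parse(command)
print(result)
```

If the output were not captured, the JSON string would simply be echoed to the console and nothing would be available to parse.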
A quick example of running this function is shown below.