While comparing the three packages (rpart, caret, mlr) in the previous article, I became concerned that the names of new variables are hard to manage. First, names separated by a full stop (.) are not effective, and it gets harder to keep a consistent naming convention as the number of variables increases. Second, although old variables may be replaced, some of them should be kept during analysis and, at worst, replacement can be unintentional. Also, as R is dynamically typed, things can easily get out of control. To keep the analysis under control, I looked into the S3 object system and found that, at least, key analysis outcomes can be encapsulated as members of a list, which makes it easier to perform analysis without inventing horrible names and to avoid errors caused by conflicting names.
This article begins with a simple example that illustrates what seems most relevant for encapsulating variables, so it is not complete - for those who are interested in more details, please see these slides. Then a class that holds CART analysis outcomes is discussed - this class aims to keep the same outcomes produced by the rpart package in the previous article.
The basic example creates two classes: employee and manager. The base class employee has a single property (name) and a method (print.employee()). Note that this method extends the S3 generic function print(), so the full method name doesn't need to be entered (ie print(obj) is enough). The manager class is-a employee and thus it can inherit and extend the functionality of the base class. It has an additional property (members) and, as the name suggests, this property keeps team members - each member is an instance of the employee class.
Although there are a number of ways to create a class, creating one with a constructor function is a good fit for S3 objects. Below, the employee and manager classes are created by emp() and man() respectively.
As employee has a single property, a character vector is assigned. If there are multiple properties, a list can be used, as in the case of manager.
Class names are assigned at the end and, as manager extends employee, both class names are assigned to it, with manager listed first so that its methods are tried before those of employee.
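The constructors described above can be sketched roughly as follows. This is a minimal reconstruction rather than the original code: the exact checks, the first error message and the choice to convert each member name with emp() are my assumptions.

```r
# Constructor of the base class: a single string tagged with the class "employee".
emp <- function(name) {
  if (!is.character(name) || length(name) != 1)
    stop("single string value of name necessary")
  class(name) <- "employee"
  name
}

# Constructor of the derived class: a list holding two properties.
# Assumption: members arrives as a character vector of names and
# each name is turned into an employee object.
man <- function(name, members) {
  if (!is.character(name) || length(name) != 1)
    stop("single string value of name necessary")
  if (!is.character(members) || length(members) == 0)
    stop("a vector of employees necessary")
  obj <- list(name = name, members = lapply(members, emp))
  # Both class names are assigned, the more specific one first.
  class(obj) <- c("manager", "employee")
  obj
}

tom <- emp("Tom")
john <- man("John", c("Tom", "Jane"))

class(tom)
class(john)
inherits(john, "employee")
```

As manager carries both class names, inherits(john, "employee") is TRUE and any method written for employee is available to manager objects as a fallback.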
A drawback of S3 objects is that their class attribute(s) can be changed without restriction, simply by replacing the existing one(s) or by unclass(). Therefore it is important to check the arguments of a constructor. if (condition) stop(message) can be used for this purpose. For example, man() checks whether the class of each element of members is character and throws an error if not. To illustrate this, a numeric vector of 1 (foo) is passed to this function, which causes an error and thereby prevents a potentially undesirable result.
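A small sketch of this check is shown here. The body of man() is a compact stand-in that I assume for illustration, and try() is used only so that the script keeps running after the error is reported.

```r
# Compact stand-in for the constructor (hypothetical body, check only).
man <- function(name, members) {
  if (!is.character(members) || length(members) == 0)
    stop("a vector of employees necessary")
  structure(list(name = name, members = members),
            class = c("manager", "employee"))
}

foo <- 1
# try() reports the error and returns an object of class "try-error"
# instead of halting the script.
res <- try(man("John", foo))
inherits(res, "try-error")
```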
This throws an error with the message Error in man("John", foo) : a vector of employees necessary.
Below, the S3 generic function print() is extended. Note that it is recommended not to include a full stop (.) in a class name. As methods are named in the form of generic.class, R may otherwise not be able to identify the correct method to execute.
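Extending print() for the two classes might look like the following sketch. The constructors are shortened versions without argument checks, and the output format is my own choice rather than the original one.

```r
# Shortened constructors (no argument checks) so that the block runs on its own.
emp <- function(name) structure(name, class = "employee")
man <- function(name, members)
  structure(list(name = name, members = lapply(members, emp)),
            class = c("manager", "employee"))

# Method for the base class: called by print(obj) when obj is an employee.
print.employee <- function(x, ...) {
  cat("Employee:", unclass(x), "\n")
  invisible(x)
}

# Method for the derived class: shows the manager and the team members.
print.manager <- function(x, ...) {
  cat("Manager:", x$name, "\n")
  members <- vapply(x$members, unclass, character(1))
  cat("Members:", paste(members, collapse = ", "), "\n")
  invisible(x)
}

print(emp("Tom"))
print(man("John", c("Tom", "Jane")))
```

If print.manager() were not defined, print() would fall back to print.employee() for a manager object, because employee is the second entry of its class attribute.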
Available methods can be listed by passing a generic function name or a class name to methods().
If the class attributes are changed, the new print methods are no longer dispatched.
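As a quick sketch, both kinds of search can be done with methods(). The print method is redefined here only to make the block self-contained.

```r
# A print method for the employee class, defined here so the search finds it.
print.employee <- function(x, ...) {
  cat("Employee:", unclass(x), "\n")
  invisible(x)
}

# Search by class name: methods available for objects of class employee.
methods(class = "employee")

# Search by generic name: the first few methods of print().
head(methods("print"))
```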
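The effect can be seen in the sketch below, where the class name person is a made-up replacement. Once the class attribute is replaced or removed, print() falls back to the default method.

```r
emp <- function(name) structure(name, class = "employee")
print.employee <- function(x, ...) {
  cat("Employee:", unclass(x), "\n")
  invisible(x)
}

tom <- emp("Tom")
print(tom)                 # dispatched to print.employee()

class(tom) <- "person"     # hypothetical class with no print method
print(tom)                 # handled by print.default()

jane <- unclass(emp("Jane"))
inherits(jane, "employee") # the class attribute is gone
```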
This class (rpartExt) repeats the previous analysis and produces a list of the following variables:
CART model object by the rpart package
cp values at the minimum xerror and by the 1-SE rule (the latter is now abbreviated as se rather than bst)
4 sets of performance results on the train and test data (2 cp values on 2 data sets)
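To give an idea of how such a list can be put together, a simplified sketch follows. It is not the source of rpartExt: the constructor name cartSketch, its arguments, the element names and the use of the iris data are all assumptions for illustration, and only classification is handled.

```r
library(rpart)

# Hypothetical, simplified constructor (classification only).
cartSketch <- function(formula, trainData, testData) {
  res <- all.vars(formula)[1]   # name of the response variable
  # Grow a full tree so that the cp table covers all candidate subtrees.
  mod <- rpart(formula, data = trainData, control = rpart.control(cp = 0))
  tab <- mod$cptable

  # cp at the minimum cross-validated error
  iMin <- which.min(tab[, "xerror"])
  # 1-SE rule: simplest tree whose xerror is within one xstd of the minimum
  thresh <- tab[iMin, "xerror"] + tab[iMin, "xstd"]
  iSe <- min(which(tab[, "xerror"] <= thresh))
  cp <- c(min = tab[iMin, "CP"], se = tab[iSe, "CP"])

  # Confusion matrix of the tree pruned at a given cp value
  confMat <- function(cpValue, data) {
    fitted <- predict(prune(mod, cp = cpValue), newdata = data, type = "class")
    table(actual = data[[res]], fitted = fitted)
  }
  perf <- list(train.min = confMat(cp[["min"]], trainData),
               train.se  = confMat(cp[["se"]], trainData),
               test.min  = confMat(cp[["min"]], testData),
               test.se   = confMat(cp[["se"]], testData))

  structure(list(mod = mod, cp = cp, perf = perf), class = "cartSketch")
}

set.seed(1237)
idx <- sample(nrow(iris), 100)
fit <- cartSketch(Species ~ ., iris[idx, ], iris[-idx, ])
fit$cp
fit$perf$test.se
```

All outcomes are reached through a single object (fit$mod, fit$cp, fit$perf), so no separate variable names are needed for each cp value and data set.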
In line with the above example, this class plays the same role as employee, that of a base class, and the outcomes of the caret and mlr packages can also be held in classes that extend it, like manager.
As this class encapsulates key outcomes in a list, variables can be kept more effectively. Moreover, as it can handle both classification and regression, it is easier to update or add variables, and far fewer variables are needed. The source of this class can be checked here. Note that a number of utility functions are necessary and they can be seen here.
Finally the S3 generic function plot() is extended and the plot method (plot.rpartExt()) shows the combination of xerror and cp. Note that params is exported into the global environment (.GlobalEnv$params = params). As far as I understand, after the second geom_abline(), the ggplot object (plot.obj) seems to be registered in the global environment. Therefore the two lines that add points (geom_point()) can't locate params, resulting in an error. The necessary variable is thus exported so that the plot object can get the values (eg params[1,2]) - the function would have to be updated to add lines rather than points in the future.
A quick illustration is shown below.
A classification task can be created as follows.
The confusion matrix on the test data is shown below.
The object can be plotted as follows.
In the next article, the other two classes will be discussed further.