I’ve used R6 classes in a lot of my R projects in recent years. They are easier to understand than S3 and S4 objects if you’ve ever done programming with objects in other languages. I also am still not sure how I can have functions act on data in an S3 object. I think it’s not possible, but I’m really not sure. And R6 classes are definitely better than RC objects.
Active bindings allow R6 fields to act like functions (or to have functions look like fields). My main reason for writing this is to see how slow accessing active binding fields of R6 classes is.
What is an R6 class?
Shortly, R6 classes allow you to create objects to store data and functions that act on the data. This allows you to create multiple instances of the same class that will have the same functions available.
For example, below I’ve defined a Circle
class, which I’ll explain below.
Circle <- R6::R6Class(
classname = "Circle",
public = list(
radius = NULL,
area = function() {
pi * self$radius ^ 2
},
initialize = function(radius) {
self$radius <- radius
}
)
)
You give the variable a name as usual, but you also give a class name as an argument. These don’t have to be the same. The first is what you’ll type to create a new one, the latter is what will show up when you call class
. Then the second argument is public
, a list of items or functions associated with the object. I have defined a variable radius
and a function area
. I set the radius to NULL
by default. The function area
calculates the area using the radius
associated with the object. Using self$
is how you can access these. Finally I also defined an initialize
function, which is what will be called when I create a new Circle
.
To create a new Circle I call Circle$new(<arguments passed to initialize>)
. Below I create a Circle
with radius 2 and print the radius and calculate the area.
c1 <- Circle$new(radius = 2)
print(c(c1$radius, c1$area()))
## [1] 2.00000 12.56637
I can change the radius by simply assigning it a new value.
c1$radius <- 4
print(c(c1$radius, c1$area()))
## [1] 4.00000 50.26548
You can also create private functions/variables. For a more in-depth look at how to use R6
, check the link above or other sites. This is not meant to be a comprehensive explanation.
How to use active bindings
Suppose we wanted to add diameter
to the Circle
class. I could add it as a function, just as I already have area
. I could even write the function to take a new value of diameter so that it would update the radius at the same time. But instead suppose I want to have it as a variable that can be set. Then I’d be able to change the radius or diameter. But then whenever I change one, I’d need to change the other so they are in agreement. Here’s where I can use active bindings so it still looks like they are variables despite them actually acting like functions.
Below I define the Circle2
class to do this. An active binding it actually a function. If it is assigned a value, then the corresponding function is evaluated, using the value given as the argument value
. If no value is given, i.e. the field is just called and not assigned, then the value
argument is missing. The function should be defined to act one way when assigned a value, and another when not.
Circle2 <- R6::R6Class(
classname = "Circle2",
public = list(
radius = NULL,
area = function() {
pi * self$radius ^ 2
},
initialize = function(radius) {
self$radius <- radius
}
),
active = list(
diameter = function(value) {
if (missing(value)) {2*self$radius}
else self$radius <- value / 2
}
)
)
Now if I create one of these objects, I can set the radius
, access the diameter
as a field, and then when I set the diameter
it will automatically change the radius
to match it.
c2 <- Circle2$new(radius=2)
print(c(c2$radius, c2$diameter, c2$area()))
## [1] 2.00000 4.00000 12.56637
c2$diameter <- 10
print(c(c2$radius, c2$diameter, c2$area()))
## [1] 5.00000 10.00000 78.53982
On my recent projects I’ve considered using active bindings when I have a variable that I want to store in two formats, e.g. as itself and on a log scale. My main concern with this is that using active bindings will be slow.
Are active bindings slow?
My goal of writing this is to find out how slow active bindings are to use. It seems like there should be an overhead cost to use something that is flexible and requires the machine determining which action to take. I will use microbenchmark
to time these and see how fast they are.
First I’ll just try to access the radius
, a normal field, from c2
, the diameter
, an active binding, of c2
, and compare these two just multiplying the radius
by 2.
microbenchmark::microbenchmark(
c2$radius,
c2$diameter,
{c2$radius * 2}
)
## Unit: nanoseconds
## expr min lq mean median uq max neval cld
## c2$radius 760 760 1144.27 1140 1141 18626 100 a
## c2$diameter 2280 2661 2984.01 2661 3041 11784 100 b
## { c2$radius * 2 } 760 1140 1239.29 1141 1520 2280 100 a
First we see by comparing the first and third line that the multiplication by 2 takes almost no time. Second by comparing these to the second line we see that there is a small cost to using the active binding. On my computer this is less than 2 microseconds, meaning that it is negligible unless calling these tons of times.
Next I want to compare how much longer it takes to assign an active binding. Here we set the radius
and diameter
to a random value. Recall that setting the active binding really is just calling a function to set the radius to half of the given value.
microbenchmark::microbenchmark(
{c2$radius <- runif(1)},
{c2$diameter <- runif(1)}
)
## Unit: microseconds
## expr min lq mean median uq max
## { c2$radius <- runif(1) } 2.660 3.041 3.63399 3.421 3.801 11.403
## { c2$diameter <- runif(1) } 5.321 5.702 7.50365 6.462 6.842 100.730
## neval cld
## 100 a
## 100 b
We see that it takes twice as long to set the active binding, but the difference on my computer is still only 3 microseconds. This tells me that I shouldn’t worry about the slowdown of active bindings unless I am calling them thousands or millions of times. And since the code I was considering using them on did lots of matrix equation solving, it wouldn’t hurt the performance in any noticeable way.
Conclusion
Active bindings are a useful part of R6
classes when the fields are entangled in some way. They can also be used when randomness is involved in each access, which I did not cover here. Accessing them is quite fast and should not cause any speed problems unless they are called millions of times.