Whether you are a veteran programmer with experience dating back to Fortran, or a new college grad with all the latest technologies, if you use R eventually you will have to worry about scoping!
Sure, we all start out ignoring scoping when we first begin using a new language. So what if all your variables and functions are global – you are the only one using them, right?!?! Unless you give up on R, you will eventually grow beyond your own system – either having to share your code with others, or deliver it to someone else – and that’s when you’ll start to need to pay attention to your code’s quality – starting with scoping!
Let’s start at the beginning of the R coding experience. When you run R interactively at the command line, everything you create is generally added to the global scope, which makes logical sense. Little changes when you put your code in a .R file: it is just a series of commands executed one by one. As your code grows more sophisticated, though, you will want and need functions for reusable pieces of code, and each function brings its own, more granular scope. That finer-grained scoping is ideal as your codebase grows!
Basic scoping rules in R
- By default, everything you define (variables and functions) is added to the global scope.
- Variables passed into a function as arguments are visible within the function. Variables defined in the calling function’s scope are not visible (unless the function was also defined there), but globally defined variables are visible. If the caller’s scope is the global scope, those variables will be visible!
- Variables created inside the function are local to that function and its sub-components, and NOT visible outside of the function.
- Each invocation of a function is independent, which means variables declared and manipulated inside a function do not retain their values between calls.
- Arguments are effectively passed by value: if you change an argument inside a function, you are actually changing a local copy, and the caller’s variable is left untouched. For ordinary values, R does not support “call by reference”, where the called function changes its arguments and the caller then sees the changed variables. This is a very important difference from other languages. In some ways it makes your code safer and easier to debug and trace; in other ways it can be inconvenient when you have to return several values of different types.
- Curly braces {} do not create reduced or isolated scopes in R.
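The rules above can be sketched in a few lines of R. The names here (add_one, count_calls, in_braces and so on) are made up for illustration, and the script assumes it is run at the top level of a fresh session.

```r
# Hypothetical names throughout; a sketch, not a definitive reference.
nums <- c(1, 2, 3)               # created at top level, so it is global

add_one <- function(v) {
  v <- v + 1                     # changes a local copy, not the caller's `nums`
  local_only <- "inside"         # exists only while this call is running
  v
}

result <- add_one(nums)
nums                             # still 1 2 3
result                           # 2 3 4
exists("local_only")             # FALSE: locals are not visible outside the function

count_calls <- function() {
  # `n` from a previous call is gone, so this branch runs every time
  if (!exists("n", envir = environment(), inherits = FALSE)) n <- 0
  n + 1
}
count_calls()                    # 1
count_calls()                    # 1 again: nothing is retained between calls

{
  in_braces <- "visible"         # braces alone do not open a new scope
}
in_braces                        # "visible"
```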
Seems straightforward! However there are two big gotchas – automatic searching and double-arrow assignment misuse.
1. Automatic Searching
R organizes environments as a nested tree: every environment has a parent. When a variable or function is not found in the current scope, R will automatically (like it or not) search the parent, then that parent’s parent, and so on. The search continues up to the top-level environment (usually the global environment) and then along the search list of attached packages until it reaches the empty environment, at which point, if no match has been found, your code will error.
This can be dangerously unexpected, especially if there is a critical typo or you like to reuse variables (like x). You can download and run the example code to see this in action.
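Here is a small sketch of how a typo can trigger automatic searching without any warning. The function sum_values and the leftover global total are hypothetical, invented only for this illustration.

```r
# Hypothetical example: a leftover global plus a typo inside a function.
total <- 100                     # left over from earlier work in the session

sum_values <- function(values) {
  totl <- 0                      # typo: this was meant to initialise `total`
  for (v in values) {
    # On the first pass `total` is not local, so R quietly reads the global one;
    # the assignment then creates a local `total` for the remaining passes.
    total <- total + v
  }
  total
}

sum_values(c(1, 2, 3))           # 106 rather than the intended 6
total                            # the global is still 100
```

No error and no warning is raised, which is exactly why this kind of bug is easy to miss.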
One of the best ways to double-check your functions for external searching is to get in the habit of using the codetools::findGlobals function. When you have created your function, and you’re pretty convinced it is working, call findGlobals on it to get a list of all external dependencies and double-check that there isn’t anything unexpected!
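As a rough sketch, assuming the codetools package is installed (it normally ships with R), a check might look like the following. The function scale_values and the variable scale_factor are hypothetical.

```r
# Hypothetical function that quietly depends on something outside itself.
scale_values <- function(values) {
  values * scale_factor          # `scale_factor` is neither an argument nor a local
}

# merge = FALSE splits the result into $functions and $variables
globals <- codetools::findGlobals(scale_values, merge = FALSE)
globals$variables                # includes "scale_factor"
globals$functions                # external functions used, operators included
```

Anything in the variables list that you did not intend to depend on is worth a second look.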
2. Double-Arrow Assignment
Another “gotcha” is the double-arrow assignment. Some users incorrectly assume that using <<- will always assign a value to a variable in the global environment. This is not completely correct. What <<- actually does is start walking up the environment tree from child to parent until it either finds a variable with a matching name or ends up in the global (top) environment, where the variable is created if no match was found. This initiates a tree-walk (like automatic searching) but with dire consequences, because you are making an assignment outside of the current scope! Only the first match it finds will get changed, whether or not it is in the global environment. If you truly need to assign a variable in the global environment (or any other non-local environment), you should use the assign function: assign("x", "y", envir = .GlobalEnv). Ideally, though, you should return the value from the function and deal with it in that other environment.
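A minimal sketch of the double-arrow (<<-) behaviour described above, using made-up names (outer_fn, inner_fn, set_flag) and assuming it runs at the top level of a fresh session:

```r
# Hypothetical names; shows that <<- changes the first match, not "the global".
value <- "global"

outer_fn <- function() {
  value <- "outer"               # local to outer_fn
  inner_fn <- function() {
    value <<- "changed by inner" # first match walking up is outer_fn's `value`
  }
  inner_fn()
  value
}

outer_fn()                       # "changed by inner"
value                            # still "global"

set_flag <- function() {
  brand_new_flag <<- TRUE        # no match anywhere, so it is created globally
}
set_flag()
brand_new_flag                   # TRUE

# Explicit alternative when a global assignment is really what you want
assign("value", "set explicitly", envir = .GlobalEnv)
value                            # "set explicitly"
```

Returning the value from the function and assigning it in the caller avoids all of this, which is why it is usually the cleaner choice.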
If you understand and follow the above you will be well on your way to ensuring correctly scoped variables and functions in your R code. Yes, there are mechanisms for hiding variables and getting around the standard scoping and restrictions in R. However, once you are comfortable with the basics you’ll be able to properly deal with these mechanisms – we’ll leave that set of topics for another day and another post.
I’ve written a commented R script if you would like to see examples of the above scoping rules as well as the gotchas in action. Feel free to download and use it as you see fit!