R comes with its own suite of built-in functions
Non-trivial applications require you to build your own functions
Create functions using function:
my_func <- function( formal_arguments ) body
The above statement creates a function called my_func
formal_arguments: comma-separated names describing the inputs my_func expects
body: a statement or a block describing what my_func does with inputs
my_add <- function(x,y) x+y
gauss_pdf <- function(ip, mn, vr, lg_pr) {
# Calculate the (log)-probability of Gaussian with mean mn and variance vr
rslt <- -((ip-mn)^2)/(2*vr)
rslt <- rslt - 0.5*log(2*pi*vr)
# Do we want the prob or the log-prob?
if(!lg_pr) rslt <- exp(rslt)
return(rslt)
}
print(gauss_pdf(1,0,1,F)); dnorm(1, log=F)
gauss_pdf is an object:
typeof(gauss_pdf)
class(gauss_pdf)
str(gauss_pdf)
Expects three numerics and a boolean input, and returns a numeric
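A minimal sketch of what "a function is an object" means in practice, using the my_add one-liner from earlier. The list name funs is a hypothetical introduced only for this illustration.

```r
my_add <- function(x, y) x + y

typeof(my_add)       # "closure"
class(my_add)        # "function"
is.function(my_add)  # TRUE

# Like any other object, a function can be stored in a list and used later
funs <- list(add = my_add, mul = function(x, y) x * y)
funs$mul(3, 4)       # 12

# Built-in functions are objects too, with a different internal type
typeof(sum)          # "builtin"
```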
A function can accept/return any object:
Can add some defaults and checks
gauss_pdf <- function(ip, mn=0, vr=1, lg_pr=T) {
# Calculate the (log)-probability of Gaussian with mean mn and variance vr
if(vr <= 0) {
warning("Expect a positive variance");
return(NULL)
}
rslt <- -((ip-mn)^2)/(2*vr)
rslt <- rslt - 0.5*log(2*pi*vr)
# Do we want the prob or the log-prob?
if(!lg_pr) rslt <- exp(rslt)
rslt
}
pr <- gauss_pdf(1,0,1); print(exp(pr))
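The sketch below restates gauss_pdf in compact form so it runs on its own, then shows how the defaults and the variance check behave when called. The variable name bad is a hypothetical used only here.

```r
gauss_pdf <- function(ip, mn = 0, vr = 1, lg_pr = TRUE) {
  if (vr <= 0) {
    warning("Expect a positive variance")
    return(NULL)
  }
  rslt <- -((ip - mn)^2) / (2 * vr) - 0.5 * log(2 * pi * vr)
  if (!lg_pr) rslt <- exp(rslt)
  rslt
}

gauss_pdf(1)                   # log-density, using mn = 0, vr = 1, lg_pr = TRUE
gauss_pdf(1, lg_pr = FALSE)    # naming an argument lets us skip mn and vr
all.equal(gauss_pdf(1, lg_pr = FALSE), dnorm(1))   # TRUE

# The check fires for a non-positive variance: a warning, and NULL is returned
bad <- suppressWarnings(gauss_pdf(1, vr = -1))
is.null(bad)                   # TRUE
```

Returning NULL with a warning lets the caller continue; calling stop() instead would halt execution, which may be preferable when a bad variance should never be ignored.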
my_add <- function(x,y) {return(x+y)}
my_mul <- function(x,y) x*y
my_gen <- function(ip_fun, x) function(z) ip_fun(x,z)
inc3 <- my_gen(my_add,3)
inc3(5)
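As a rough illustration of functions accepting and returning functions, the sketch below repeats the definitions above and adds a second generated function. The name dbl is a hypothetical introduced for this example.

```r
my_add <- function(x, y) x + y
my_mul <- function(x, y) x * y

# my_gen takes a function and a value, and returns a new one-argument function
my_gen <- function(ip_fun, x) function(z) ip_fun(x, z)

inc3 <- my_gen(my_add, 3)   # adds 3 to its input
dbl  <- my_gen(my_mul, 2)   # multiplies its input by 2

inc3(5)            # 8
dbl(5)             # 10

# A generated function can itself be passed to another function
sapply(1:3, dbl)   # 2 4 6
```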
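Argument matching proceeds by a three-pass process: exact matching on argument names, then partial matching on names, then positional matching
Any remaining unmatched arguments trigger an error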
mean(,TRUE,x=c(1:10,NA)) # From Advanced R, Hadley Wickham
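A small sketch of how arguments get matched to formals, assuming a hypothetical function show_args that simply echoes what it received.

```r
# Hypothetical function used only to display how arguments were matched
show_args <- function(alpha, beta_value, gamma) {
  c(alpha = alpha, beta_value = beta_value, gamma = gamma)
}

# Purely positional
show_args(1, 2, 3)

# gamma is matched by name first; 1 and 2 then fill alpha and beta_value by position
show_args(gamma = 3, 1, 2)

# Partial names work when unambiguous: gam -> gamma, beta -> beta_value
show_args(1, gam = 3, beta = 2)

# A leftover argument that matches nothing is an error
err <- tryCatch(show_args(1, 2, 3, 4), error = function(e) conditionMessage(e))
err
```

All three successful calls return the same named vector. Partial matching is convenient at the console but can make scripts harder to read, so full names are usually the safer choice.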
‘...’ allows any number of arguments
Useful when passing arguments to other functions:
pick_func <- function (two_arg, ...) {
# Function w/ 2 arguments
if(two_arg) two_arg_fun(...) else
# Function w/ 3 arguments
three_arg_fun(...)
}
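To make the pick_func pattern runnable, the sketch below supplies stand-in definitions for two_arg_fun and three_arg_fun. Their bodies are assumptions for illustration only, as is the helper count_args.

```r
# Stand-in definitions (hypothetical; the slides do not define these)
two_arg_fun   <- function(a, b) a + b
three_arg_fun <- function(a, b, c) a * b * c

pick_func <- function(two_arg, ...) {
  # Everything after two_arg is forwarded untouched
  if (two_arg) two_arg_fun(...) else three_arg_fun(...)
}

pick_func(TRUE, 1, 2)       # 3
pick_func(FALSE, 2, 3, 4)   # 24

# list(...) captures the extra arguments so they can be inspected
count_args <- function(...) length(list(...))
count_args(1, "a", NULL)    # 3
```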
Example: Recursive addition via functional programming
recurse_sum <- function(x = TRUE, ...) # Cute but inefficient
if(isTRUE(x)) 0 else x + recurse_sum(...)
recurse_sum(1,2,3,5,6,7) # Don’t include TRUE in the input!
Note the use of isTRUE() above
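One way to see why isTRUE() matters here: a plain == comparison coerces TRUE to 1, so a numeric input of 1 would be mistaken for the stopping value. The function naive_sum below is a hypothetical variant written only to show the failure.

```r
1 == TRUE      # TRUE: the logical is coerced to the number 1
isTRUE(1)      # FALSE: 1 is not the logical value TRUE
isTRUE(TRUE)   # TRUE

recurse_sum <- function(x = TRUE, ...)
  if (isTRUE(x)) 0 else x + recurse_sum(...)

# Hypothetical variant using == instead of isTRUE()
naive_sum <- function(x = TRUE, ...)
  if (x == TRUE) 0 else x + naive_sum(...)

recurse_sum(2, 1, 5)   # 8
naive_sum(2, 1, 5)     # 2: recursion stops early when it reaches the 1
```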
We saw a function recurse_sum() that called itself
This raises a few questions:
R decides this by following a set of scoping rules
R follows what is called lexical scoping
Function objects have attributes
body(recurse_sum)
formals(recurse_sum)
environment(recurse_sum)
environment: data structure that binds names to values
Determines scoping rules in R
An environment is a kind of named list of symbol-value pairs
x <- 5; env <- environment(); env
env$x
func1 <- function() {my_local <- 1; environment()}
(local_env <- func1())
local_env$my_local
parent.env(local_env) # Each environment has a parent environment
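Lexical scoping:
To find the value of a name, R searches the current environment, then each parent environment in turn
(names in parent environments can be assigned using <<-, the super-assignment operator)
Here, environments are those at time of definition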
Where the function is defined (rather than how it is called) determines which variables to use
Values of these variables at the time of calling are used
x <- 5
func1 <- function(x) {x + 1}
func1(1)
x <- 5; func2 <- function() {x + 1}
func2(); x
x <- 10; func2() # use new x or x at the time of definition?
x <- 1; y <- 10
func3 <- function() {x <- x + 1; y <<- y + 1; environment()}
env <- func3()
c(x, y, env$x)
func1 <- function(x) {x + 1}
func4 <- function(x) {func1(x)}
func4(2)
x <- 5; func2 <- function() {x + 1}
func5 <- function(x) {func2()}
func5(2) # func2 uses x from calling or global environment?
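The questions posed in the comments above can be checked directly. This sketch reruns the key cases with expected values in comments; bump is a hypothetical helper added to isolate the effect of <<-.

```r
x <- 5
func2 <- function() x + 1
func2()      # 6

# The value is looked up when func2 is called, so the new x is used
x <- 10
func2()      # 11

# func5 has its own x, but func2 was defined at the top level,
# so func2 still looks there and ignores func5's argument
func5 <- function(x) func2()
func5(2)     # 11

# <<- modifies a variable in an enclosing environment instead of creating a local one
y <- 10
bump <- function() { y <<- y + 1; invisible(NULL) }
bump()
y            # 11
```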
For more on scoping, see (Advanced R, Hadley Wickham)
The bottom line
`+` <- function(x,y) x*y # Open new RStudio session!
2 + 10
Lazy evaluation: R evaluates arguments only when needed
Can also cause confusion
func <- function(x,y) if(x) 2*x else x + 2*y
func(1, {print("Hello"); 5})
func(0, {print("Hello"); 5})
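Two further consequences of lazy evaluation, sketched with hypothetical helpers (noisy, use_twice, ignore): an argument is evaluated at most once even if used several times, and an argument that is never used is never evaluated at all.

```r
count_evals <- 0
# Hypothetical helper that records how many times it has been run
noisy <- function() { count_evals <<- count_evals + 1; 5 }

use_twice <- function(y) y + y
res <- use_twice(noisy())
res           # 10
count_evals   # 1: the argument was evaluated once and its value reused

# An unused argument is never evaluated, so the stop() below never fires
ignore <- function(y) "unused"
ignore(stop("never evaluated"))   # "unused"
```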
functions: modular blocks of code that map input arguments to output (and sometimes have side-effects e.g. plotting)
Should not use global variables!
Functions should produce same output for same input, irrespective of values of non-input variables
Some exceptions are functions (e.g. set.seed())
my_func <- function(mf_arg1, mf_arg2) {
# Stuff not involving information other than
# mf_arg's
}
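A short sketch of why relying on global variables is fragile. The names scale_factor, bad_scale and good_scale are hypothetical and chosen only for this illustration.

```r
scale_factor <- 2

# Depends on a global: same input can give different output
bad_scale <- function(v) v * scale_factor

# Everything the function needs comes in through its arguments
good_scale <- function(v, scale_factor) v * scale_factor

bad_scale(1:3)        # 2 4 6
scale_factor <- 10    # someone changes the global elsewhere in the script
bad_scale(1:3)        # 10 20 30

good_scale(1:3, 2)    # 2 4 6, regardless of the global
```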
If you’re going to change R datasets, make local copies
Bad:
USArrests['Indiana', ] <- USArrests['Indiana', ] + 1
# What happens next time you run your script?
# What if you want the original value?
Good:
my_USArrests <- USArrests
my_USArrests['Indiana', ] <- my_USArrests['Indiana', ] + 1
Bad: Hacking away at the console and later trying to reconstruct how you got your output
Good: Work with a text file in the editor
Use the console to explore outcome of one line, check help/syntax, but once successful, add line to editor
While working at the editor, <Ctrl><Enter>: executes the current line
<Ctrl><Shift><Enter>: executes all lines
<Ctrl><1> and <Ctrl><2>: Move cursor to editor or console
Also <Tab> autocompletes, <Up> moves through command history, and <Ctrl><Up> autocompletes from command history
For more: <ALT><SHIFT><k> (Tools>Keyboard Shortcuts Help)
Ideally, instead of submitting a script, wrap it up in a function
Assigning variables won’t mangle someone else’s namespace
# Homework 1A
# Lots of variable assignments
result <- ...
Better:
homework_1a <- function(ip_data) {
# Helpful comment
# Lots of variable assignments
result <- ...
}
homework_1a(USArrests)
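A concrete sketch of the pattern, assuming a made-up task (finding the column of USArrests with the largest mean). The function name homework_sketch and its internals are hypothetical and not part of any actual assignment.

```r
homework_sketch <- function(ip_data) {
  # Intermediate variables live only inside the function
  col_means <- colMeans(ip_data)
  top_col   <- names(which.max(col_means))
  # Return everything the caller needs in one object
  list(means = col_means, top = top_col)
}

res <- homework_sketch(USArrests)
res$top               # "Assault"

# The intermediates did not leak into the calling environment
exists("top_col")     # FALSE unless top_col was defined elsewhere in the session
```

Returning a list keeps all results together, and whoever runs the function decides what name to give the output.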