In [1]:
library(tidyverse)
Loading tidyverse: ggplot2
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
Conflicts with tidy packages ---------------------------------------------------
filter(): dplyr, stats
lag(): dplyr, stats
Custom Functions in R¶
What is a function?¶
A function, in the programming sense, is a bit of code that is organized and named so that it may be executed by simply executing the name (referred to as ‘calling’ a function). Often, functions take some values as input and return output values.
You have already met functions in R. The command ‘sum’ is a function. It takes as arguments a list of objects and adds them:
In [2]:
sum(1,2,4)
The sum
function takes a list of numbers as arguments (inputs), adds
them together and returns the result.
While there are many, many built-in functions in R, it is often desirable to write your own. It’s also helpful to be familiar with the structure of functions, to better understand how function calls work.
Anatomy of a function¶
The R function has the following structure
name <- function(arg1, arg2, ...) { # name of function and arguments (inputs) it takes
body_of_function # code that (presumably) manipulates arguments
return(value) # What the function returns (result of manipulations
}
A function is created using the function
keyword, followed by a
series of arguments in parentheses. The main work done by the function
is enclosed witin curly braces, and a return
function is used to
indicate the output of the function. Finally we can assign the funciton,
just like any other R object, to a named variable for later use.
Our first custom function¶
Let’s write a function to calculate the mean of a vector of numbers.
In [1]:
MyMean <- function(xs) {
n <- length(xs)
return(sum(xs/n))
}
In [2]:
MyMean(1:10)
There are a lot details to writing custom functions, such as specifying default values, position of arguments, etc.
In [3]:
## Another function example - with default arguments
WeightAvg <- function(xs, weights = rep(1,length(xs))) {
n <- length(xs)
weighted_vector <- xs*weights
return(sum(weighted_vector)/n)
}
In [4]:
xs <- c(1,2,3,4,5,6,7)
WeightAvg(xs)
In [8]:
weights <- c(2,3,2,3,2,3,1)
## Note that the order of arguments doesn't matter, because we have labeled the default.
WeightAvg(weights = weights, xs)
In [10]:
## Order matters when arguments aren't labeled.
ShowArgs <- function(xs, weights = rep(1,length(xs))) {
print(weights)
print(xs)
}
# Note that the above function does not return anything
In [11]:
ShowArgs(xs,weights)
[1] 2 3 2 3 2 3 1
[1] 1 2 3 4 5 6 7
In [12]:
ShowArgs(weights,xs)
[1] 1 2 3 4 5 6 7
[1] 2 3 2 3 2 3 1
In [13]:
ShowArgs(weights = weights,xs)
[1] 2 3 2 3 2 3 1
[1] 1 2 3 4 5 6 7
Examples¶
Write a function
MySd
that takes a vector of numbers and returns their standard deviation as calculated from the following formula:\[ \begin{align}\begin{aligned} \sqrt{\frac{\sum{(x - \bar{x})^2}}{n-1}}\\where :math:`x` is some vector of numbers, :math:`\bar{x}` is the\end{aligned}\end{align} \]mean of \(x\) and \(n\) is th number of elements in \(x\).
What is the standard deviation of
1:10
?Write a function that takes a matrix and multiplies it by a vector (for those who don’t know about matrix multiplication, an in-class example will be provided). Provide the identity matrix as a default: diag(rep(1,length(v)).
Mapping with apply¶
R provides tools for performing vectorized operations with nearly any function. We will first talk about the *apply group, but the more modern version of these are part of the ‘tidyverse’ in the package ‘purr’. More on those later...
Suppose I have a matrix, and I would like to obtain a sum of each
column, and store that in a vector. One way to do this is to simply loop
over the columns using a for
loop:
In [21]:
my_matrix <- matrix(rnorm(30), ncol = 6, nrow = 5)
sums <- rep(0,6)
for (i in 1:6){
sums[i] <- sum(my_matrix[,i])
}
print(sums)
[1] -2.0673368 3.3092049 -1.3901258 -5.1382279 0.8350368 -0.7088264
But there is another way, that is better for various reasons:
In [23]:
sums <- apply(FUN = sum, X = my_matrix, MARGIN = 2)
print(sums)
[1] -2.0673368 3.3092049 -1.3901258 -5.1382279 0.8350368 -0.7088264
Note that we can easily change this to a row sum:
In [27]:
sums <- apply(FUN = sum, X = my_matrix, MARGIN = 1)
print(sums)
[1] 0.6588709 -1.5304585 -3.0984142 0.6073535 -1.7976270
There are a number of things that make this version better than the
for
loop. There is less code, with fewer variables that need to be
specified.
- In the
for
loop, we need to somehow include the dimension, so that the loop knows how many interations to perform. - In the
apply
version, We can easily convert the column sum into a row sum and we do not need to specify dimension in either case. - The
apply
code can be easily parallelized (this is beyond our scope, but something to keep in mind). - The less code we write, the less chance for bugs to crop up.
Note that apply
works for user generated functions as well:
In [28]:
apply(MyMean, X = my_matrix, MARGIN = 1)
- 0.109811818585238
- -0.255076413886755
- -0.516402373949533
- 0.101225588404194
- -0.299604492013341
In [29]:
apply(MyMean, X = my_matrix, MARGIN = 2)
- -0.41346736204718
- 0.661840972673175
- -0.278025169001729
- -1.02764557513093
- 0.167007368691562
- -0.141765282617135
There are a number of variants of the apply
function that differ on
the data types they return or take as input. The lapply
function
works on lists (as opposed to 2 dimensional arrays such as matrices) and
returns a list object. The sapply
function is the same as
lapply
, but it attempts to simplify output to be the simplest
possible class (e.g. a list of numerics is returned as a vector, rather
than a list).
Examples¶
- Create a matrix using
rnorm
with 10 rows and 15 columns. Useapply
to obtain the median of rows and then over the columns. - Creat the following list: x <- list(a = rep(1,7), b = 1:3, c = 10:100)
Use lapply
and sapply
to get the length of each element of the
list.
In [37]:
# Your code here
Next, we will explore dataframes and something new called a ‘tibble’. We’ll see how to manipulate these structures, and then we will talk about the new generation of mapping functions.