{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Loading tidyverse: ggplot2\n",
"Loading tidyverse: tibble\n",
"Loading tidyverse: tidyr\n",
"Loading tidyverse: readr\n",
"Loading tidyverse: purrr\n",
"Loading tidyverse: dplyr\n",
"Conflicts with tidy packages ---------------------------------------------------\n",
"filter(): dplyr, stats\n",
"lag(): dplyr, stats\n"
]
}
],
"source": [
"library(tidyverse)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Custom Functions in R"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is a function?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A function, in the programming sense, is a bit of code that is organized and named so that it may be executed by simply executing the name (referred to as 'calling' a function). Often, functions take some values as input and return output values.\n",
"\n",
"You have already met functions in R. The command 'sum' is a function. It takes as arguments a list of objects and adds them:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"7"
],
"text/latex": [
"7"
],
"text/markdown": [
"7"
],
"text/plain": [
"[1] 7"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sum(1,2,4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ``sum`` function takes a list of numbers as arguments (inputs), adds them together and returns the result."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While there are many, many built-in functions in R, it is often desirable to write your own. It's also helpful to be familiar with the structure of functions, to better understand how function calls work. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Anatomy of a function"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The R function has the following structure\n",
"\n",
"```R\n",
"name <- function(arg1, arg2, ...) { # name of function and arguments (inputs) it takes\n",
" body_of_function # code that (presumably) manipulates arguments\n",
" return(value) # What the function returns (result of manipulations\n",
" }\n",
"```\n",
"\n",
"A function is created using the `function` keyword, followed by a series of arguments in parentheses. The main work done by the function is enclosed witin curly braces, and a `return` function is used to indicate the output of the function. Finally we can assign the funciton, just like any other R object, to a named variable for later use."
]
},
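{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of this structure, the function below takes two arguments, does its work in the body, and hands back the result with `return`. The name `AddTwo` is hypothetical and chosen only for illustration; it is not used elsewhere in this notebook.\n",
"\n",
"```R\n",
"# AddTwo is a hypothetical example name, chosen only for illustration\n",
"AddTwo <- function(a, b) {   # name of the function and its two arguments\n",
"    total <- a + b           # body: manipulate the arguments\n",
"    return(total)            # value handed back to the caller\n",
"}\n",
"\n",
"AddTwo(3, 4)   # returns 7\n",
"```\n",
"\n",
"If `return` is omitted, R returns the value of the last expression evaluated in the body."
]
},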
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Our first custom function\n",
"\n",
"Let's write a function to calculate the mean of a vector of numbers."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"MyMean <- function(xs) {\n",
" n <- length(xs)\n",
" return(sum(xs/n))\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"5.5"
],
"text/latex": [
"5.5"
],
"text/markdown": [
"5.5"
],
"text/plain": [
"[1] 5.5"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"MyMean(1:10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are a lot details to writing custom functions, such as specifying default values, position of arguments, etc. "
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"## Another function example - with default arguments\n",
"\n",
"WeightAvg <- function(xs, weights = rep(1,length(xs))) {\n",
" \n",
" n <- length(xs)\n",
" weighted_vector <- xs*weights\n",
" return(sum(weighted_vector)/n)\n",
" \n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"4"
],
"text/latex": [
"4"
],
"text/markdown": [
"4"
],
"text/plain": [
"[1] 4"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"xs <- c(1,2,3,4,5,6,7)\n",
"\n",
"WeightAvg(xs)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"8.71428571428571"
],
"text/latex": [
"8.71428571428571"
],
"text/markdown": [
"8.71428571428571"
],
"text/plain": [
"[1] 8.714286"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"weights <- c(2,3,2,3,2,3,1)\n",
"\n",
"## Note that the order of arguments doesn't matter, because we have labeled the default.\n",
"\n",
"WeightAvg(weights = weights, xs)"
]
},
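{
"cell_type": "markdown",
"metadata": {},
"source": [
"`WeightAvg` above divides the weighted sum by the number of elements. A weighted mean is more commonly defined by dividing by the sum of the weights instead. The sketch below shows that variant under a hypothetical name, `WeightedMean`, and compares it with base R's `weighted.mean`. It is meant as an illustration alongside the function above, not a replacement for it.\n",
"\n",
"```R\n",
"# WeightedMean is a hypothetical name; it normalizes by sum(weights), not length(xs)\n",
"WeightedMean <- function(xs, weights = rep(1, length(xs))) {\n",
"    return(sum(xs * weights) / sum(weights))\n",
"}\n",
"\n",
"# Same values as used in the surrounding cells\n",
"xs <- c(1, 2, 3, 4, 5, 6, 7)\n",
"weights <- c(2, 3, 2, 3, 2, 3, 1)\n",
"\n",
"WeightedMean(xs, weights)   # 61 / 16 = 3.8125\n",
"\n",
"# Compare against the built-in weighted.mean from the stats package\n",
"all.equal(WeightedMean(xs, weights), weighted.mean(xs, weights))\n",
"```\n",
"\n",
"With the default weights (all ones) the two definitions agree, since the sum of the weights then equals the number of elements."
]
},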
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"## Order matters when arguments aren't labeled.\n",
"\n",
"ShowArgs <- function(xs, weights = rep(1,length(xs))) {\n",
" \n",
" print(weights)\n",
" print(xs)\n",
" \n",
"}\n",
"\n",
"# Note that the above function does not return anything"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 2 3 2 3 2 3 1\n",
"[1] 1 2 3 4 5 6 7\n"
]
}
],
"source": [
"ShowArgs(xs,weights)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 1 2 3 4 5 6 7\n",
"[1] 2 3 2 3 2 3 1\n"
]
}
],
"source": [
"ShowArgs(weights,xs)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 2 3 2 3 2 3 1\n",
"[1] 1 2 3 4 5 6 7\n"
]
}
],
"source": [
"ShowArgs(weights = weights,xs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Examples"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Write a function `MySd` that takes a vector of numbers and returns their standard deviation as calculated from the following formula:\n",
"$$\n",
"\\sqrt{\\frac{\\sum{(x - \\bar{x})^2}}{n-1}}\n",
"$$\n",
"where $x$ is some vector of numbers, $\\bar{x}$ is the mean of $x$ and $n$ is th number of elements in $x$. \n",
"\n",
" What is the standard deviation of `1:10`?
\n",
" \n",
"2. Write a function that takes a matrix and multiplies it by a vector (for those who don't know about matrix multiplication, an in-class example will be provided). Provide the identity matrix as a default: diag(rep(1,length(v))."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# Mapping with apply\n",
"\n",
"R provides tools for performing vectorized operations with nearly any function. We will first talk about the *apply group, but the more modern version of these are part of the 'tidyverse' in the package 'purr'. More on those later...\n",
"\n",
"Suppose I have a matrix, and I would like to obtain a sum of each column, and store that in a vector. One way to do this is to simply loop over the columns using a ``for`` loop:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] -2.0673368 3.3092049 -1.3901258 -5.1382279 0.8350368 -0.7088264\n"
]
}
],
"source": [
"my_matrix <- matrix(rnorm(30), ncol = 6, nrow = 5)\n",
"\n",
"sums <- rep(0,6)\n",
"\n",
"for (i in 1:6){\n",
" sums[i] <- sum(my_matrix[,i])\n",
"}\n",
"\n",
"print(sums)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But there is another way, that is better for various reasons:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] -2.0673368 3.3092049 -1.3901258 -5.1382279 0.8350368 -0.7088264\n"
]
}
],
"source": [
"sums <- apply(FUN = sum, X = my_matrix, MARGIN = 2)\n",
"\n",
"print(sums)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that we can easily change this to a row sum:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 0.6588709 -1.5304585 -3.0984142 0.6073535 -1.7976270\n"
]
}
],
"source": [
"sums <- apply(FUN = sum, X = my_matrix, MARGIN = 1)\n",
"\n",
"print(sums)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are a number of things that make this version better than the ``for`` loop. There is less code, with fewer variables that need to be specified. \n",
"\n",
"- In the ``for`` loop, we need to somehow include the dimension, so that the loop knows how many interations to perform.\n",
"\n",
"- In the ``apply`` version, We can easily convert the column sum into a row sum and we do not need to specify dimension in either case.\n",
"\n",
"- The ``apply`` code can be easily parallelized (this is beyond our scope, but something to keep in mind).\n",
"\n",
"- The less code we write, the less chance for bugs to crop up."
]
},
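{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the comparison concrete, the sketch below computes column sums of a small matrix with known entries in three ways: a ``for`` loop, ``apply``, and base R's built-in ``colSums``. The names `small_matrix`, `loop_sums` and `apply_sums` are introduced here only for illustration.\n",
"\n",
"```R\n",
"# small_matrix is a hypothetical example; matrix() fills it column by column\n",
"small_matrix <- matrix(1:6, nrow = 2, ncol = 3)\n",
"\n",
"# for loop: the number of columns has to be supplied to the loop explicitly\n",
"loop_sums <- numeric(ncol(small_matrix))\n",
"for (i in seq_len(ncol(small_matrix))) {\n",
"    loop_sums[i] <- sum(small_matrix[, i])\n",
"}\n",
"\n",
"# apply: MARGIN = 2 selects columns; no dimension needs to be given\n",
"apply_sums <- apply(X = small_matrix, MARGIN = 2, FUN = sum)\n",
"\n",
"loop_sums   # 3 7 11\n",
"\n",
"# All three approaches should agree\n",
"all.equal(loop_sums, apply_sums)\n",
"all.equal(apply_sums, colSums(small_matrix))\n",
"```\n",
"\n",
"For the specific case of row and column sums, `rowSums` and `colSums` are the most direct choice; ``apply`` is the general tool when the function is something other than a sum."
]
},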
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that ``apply`` works for user generated functions as well:\n"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"