{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Loading tidyverse: ggplot2\n", "Loading tidyverse: tibble\n", "Loading tidyverse: tidyr\n", "Loading tidyverse: readr\n", "Loading tidyverse: purrr\n", "Loading tidyverse: dplyr\n", "Conflicts with tidy packages ---------------------------------------------------\n", "filter(): dplyr, stats\n", "lag(): dplyr, stats\n" ] } ], "source": [ "library(tidyverse)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Custom Functions in R" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What is a function?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A function, in the programming sense, is a bit of code that is organized and named so that it may be executed by simply executing the name (referred to as 'calling' a function). Often, functions take some values as input and return output values.\n", "\n", "You have already met functions in R. The command 'sum' is a function. It takes as arguments a list of objects and adds them:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "7" ], "text/latex": [ "7" ], "text/markdown": [ "7" ], "text/plain": [ "[1] 7" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sum(1,2,4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ``sum`` function takes a list of numbers as arguments (inputs), adds them together and returns the result." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While there are many, many built-in functions in R, it is often desirable to write your own. It's also helpful to be familiar with the structure of functions, to better understand how function calls work. 
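
To see how a call is put together, here is a minimal sketch that passes arguments to the built-in `sum` both by position and by name. The variable names `total` and `total_no_na` are chosen here purely for illustration.

```R
# Positional arguments: every value is collected by `...` and added up
total <- sum(1, 2, 4)

# Named argument: na.rm = TRUE asks sum() to drop missing values first
total_no_na <- sum(1, 2, NA, 4, na.rm = TRUE)

print(total)
print(total_no_na)

# args() displays the arguments a function accepts
args(sum)
```

Both totals come out to 7; without `na.rm = TRUE`, the second call would return `NA`.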
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Anatomy of a function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The R function has the following structure\n", "\n", "```R\n", "name <- function(arg1, arg2, ...) { # name of function and arguments (inputs) it takes\n", " body_of_function # code that (presumably) manipulates arguments\n", " return(value) # What the function returns (result of manipulations\n", " }\n", "```\n", "\n", "A function is created using the `function` keyword, followed by a series of arguments in parentheses. The main work done by the function is enclosed witin curly braces, and a `return` function is used to indicate the output of the function. Finally we can assign the funciton, just like any other R object, to a named variable for later use." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Our first custom function\n", "\n", "Let's write a function to calculate the mean of a vector of numbers." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "MyMean <- function(xs) {\n", " n <- length(xs)\n", " return(sum(xs/n))\n", "}" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "5.5" ], "text/latex": [ "5.5" ], "text/markdown": [ "5.5" ], "text/plain": [ "[1] 5.5" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "MyMean(1:10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are a lot details to writing custom functions, such as specifying default values, position of arguments, etc. 
" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Another function example - with default arguments\n", "\n", "WeightAvg <- function(xs, weights = rep(1,length(xs))) {\n", " \n", " n <- length(xs)\n", " weighted_vector <- xs*weights\n", " return(sum(weighted_vector)/n)\n", " \n", "}" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "4" ], "text/latex": [ "4" ], "text/markdown": [ "4" ], "text/plain": [ "[1] 4" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "xs <- c(1,2,3,4,5,6,7)\n", "\n", "WeightAvg(xs)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "8.71428571428571" ], "text/latex": [ "8.71428571428571" ], "text/markdown": [ "8.71428571428571" ], "text/plain": [ "[1] 8.714286" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "weights <- c(2,3,2,3,2,3,1)\n", "\n", "## Note that the order of arguments doesn't matter, because we have labeled the default.\n", "\n", "WeightAvg(weights = weights, xs)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [], "source": [ "## Order matters when arguments aren't labeled.\n", "\n", "ShowArgs <- function(xs, weights = rep(1,length(xs))) {\n", " \n", " print(weights)\n", " print(xs)\n", " \n", "}\n", "\n", "# Note that the above function does not return anything" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 2 3 2 3 2 3 1\n", "[1] 1 2 3 4 5 6 7\n" ] } ], "source": [ "ShowArgs(xs,weights)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 1 2 3 4 5 6 7\n", "[1] 2 3 2 3 2 3 1\n" ] } ], "source": [ 
"ShowArgs(weights,xs)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 2 3 2 3 2 3 1\n", "[1] 1 2 3 4 5 6 7\n" ] } ], "source": [ "ShowArgs(weights = weights,xs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Examples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Write a function `MySd` that takes a vector of numbers and returns their standard deviation as calculated from the following formula:\n", "$$\n", "\\sqrt{\\frac{\\sum{(x - \\bar{x})^2}}{n-1}}\n", "$$\n", "where $x$ is some vector of numbers, $\\bar{x}$ is the mean of $x$ and $n$ is th number of elements in $x$. \n", "\n", " What is the standard deviation of `1:10`?

\n", " \n", "2. Write a function that takes a matrix and multiplies it by a vector `v` (for those who don't know about matrix multiplication, an in-class example will be provided). Provide the identity matrix as a default: `diag(rep(1, length(v)))`." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Mapping with apply\n", "\n", "R provides tools for applying nearly any function across the rows, columns, or elements of a data structure. We will first talk about the `*apply` family, but more modern versions of these are part of the 'tidyverse', in the package 'purrr'. More on those later...\n", "\n", "Suppose I have a matrix, and I would like to obtain the sum of each column and store the results in a vector. One way to do this is to simply loop over the columns using a ``for`` loop:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] -2.0673368 3.3092049 -1.3901258 -5.1382279 0.8350368 -0.7088264\n" ] } ], "source": [ "my_matrix <- matrix(rnorm(30), ncol = 6, nrow = 5)\n", "\n", "sums <- rep(0,6)\n", "\n", "for (i in 1:6){\n", "  sums[i] <- sum(my_matrix[,i])\n", "}\n", "\n", "print(sums)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But there is another way that is better for several reasons:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] -2.0673368 3.3092049 -1.3901258 -5.1382279 0.8350368 -0.7088264\n" ] } ], "source": [ "sums <- apply(FUN = sum, X = my_matrix, MARGIN = 2)\n", "\n", "print(sums)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that we can easily change this to a row sum:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 0.6588709 -1.5304585 -3.0984142 0.6073535 -1.7976270\n" ] } ], "source": [ "sums <- apply(FUN = sum, X = my_matrix, MARGIN = 1)\n", "\n", "print(sums)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Several things make this version better than the ``for`` loop. There is less code, with fewer variables that need to be specified. \n", "\n", "- In the ``for`` loop, we need to somehow include the dimension, so that the loop knows how many iterations to perform.\n", "\n", "- In the ``apply`` version, we can easily convert the column sum into a row sum, and we do not need to specify the dimension in either case.\n", "\n", "- The ``apply`` code can be easily parallelized (this is beyond our scope, but something to keep in mind).\n", "\n", "- The less code we write, the less chance for bugs to crop up." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that ``apply`` works for user-defined functions as well:\n" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<ol class=list-inline>\n", "\t<li>0.109811818585238</li>\n", "\t<li>-0.255076413886755</li>\n", "\t<li>-0.516402373949533</li>\n", "\t<li>0.101225588404194</li>\n", "\t<li>-0.299604492013341</li>\n", "</ol>
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 0.109811818585238\n", "\\item -0.255076413886755\n", "\\item -0.516402373949533\n", "\\item 0.101225588404194\n", "\\item -0.299604492013341\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 0.109811818585238\n", "2. -0.255076413886755\n", "3. -0.516402373949533\n", "4. 0.101225588404194\n", "5. -0.299604492013341\n", "\n", "\n" ], "text/plain": [ "[1] 0.1098118 -0.2550764 -0.5164024 0.1012256 -0.2996045" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "apply(MyMean, X = my_matrix, MARGIN = 1)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<ol class=list-inline>\n", "\t<li>-0.41346736204718</li>\n", "\t<li>0.661840972673175</li>\n", "\t<li>-0.278025169001729</li>\n", "\t<li>-1.02764557513093</li>\n", "\t<li>0.167007368691562</li>\n", "\t<li>-0.141765282617135</li>\n", "</ol>
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item -0.41346736204718\n", "\\item 0.661840972673175\n", "\\item -0.278025169001729\n", "\\item -1.02764557513093\n", "\\item 0.167007368691562\n", "\\item -0.141765282617135\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. -0.41346736204718\n", "2. 0.661840972673175\n", "3. -0.278025169001729\n", "4. -1.02764557513093\n", "5. 0.167007368691562\n", "6. -0.141765282617135\n", "\n", "\n" ], "text/plain": [ "[1] -0.4134674 0.6618410 -0.2780252 -1.0276456 0.1670074 -0.1417653" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "apply(MyMean, X = my_matrix, MARGIN = 2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### lapply, sapply, et al.\n", "\n", "There are a number of variants of the ``apply`` function that differ in the data types they take as input and return. The ``lapply`` function works on lists (as opposed to two-dimensional arrays such as matrices) and returns a list object. The ``sapply`` function works like ``lapply``, but it attempts to simplify its output to the simplest possible structure (e.g. a list of single numbers is returned as a numeric vector, rather than a list)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Examples\n", "\n", "1. Create a matrix using ``rnorm`` with 10 rows and 15 columns. Use ``apply`` to obtain the median of each row and then of each column.\n", "\n", "2. Create the following list:\n", "   `x <- list(a = rep(1,7), b = 1:3, c = 10:100)` \n", " \n", "   Use ``lapply`` and ``sapply`` to get the length of each element of the list." ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we will explore data frames and something new called a 'tibble'. We'll see how to manipulate these structures, and then we will talk about the new generation of mapping functions." 
] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.3.1" } }, "nbformat": 4, "nbformat_minor": 0 }