{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Loading tidyverse: ggplot2\n",
      "Loading tidyverse: tibble\n",
      "Loading tidyverse: tidyr\n",
      "Loading tidyverse: readr\n",
      "Loading tidyverse: purrr\n",
      "Loading tidyverse: dplyr\n",
      "Conflicts with tidy packages ---------------------------------------------------\n",
      "filter(): dplyr, stats\n",
      "lag():    dplyr, stats\n"
     ]
    }
   ],
   "source": [
    "library(tidyverse)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Custom Functions in R"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What is a function?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A function, in the programming sense, is a named piece of code that can be run simply by using its name (this is referred to as 'calling' the function). Functions usually take some values as input and return output values.\n",
    "\n",
    "You have already met functions in R. For example, `sum` is a function: it takes any number of numeric arguments and adds them together:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "7"
      ],
      "text/latex": [
       "7"
      ],
      "text/markdown": [
       "7"
      ],
      "text/plain": [
       "[1] 7"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "sum(1,2,4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The ``sum`` function takes several numbers as arguments (inputs), adds them together and returns the result."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "While there are many, many built-in functions in R, it is often desirable to write your own. It's also helpful to be familiar with the structure of functions, to better understand how function calls work. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Anatomy of a function"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An R function has the following structure:\n",
    "\n",
    "```R\n",
    "name <- function(arg1, arg2, ...) { # name of the function and the arguments (inputs) it takes\n",
    "    body_of_function                # code that (presumably) manipulates the arguments\n",
    "    return(value)                   # what the function returns (the result of the manipulations)\n",
    "}\n",
    "```\n",
    "\n",
    "A function is created using the `function` keyword, followed by a list of arguments in parentheses. The main work done by the function is enclosed within curly braces, and the `return` function indicates the output of the function. Finally, like any other R object, the function can be assigned to a named variable for later use."
   ]
  },
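  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of this structure, here are two versions of a hypothetical function that adds two numbers (`AddTwo` and `AddTwoImplicit` are illustrative names, not part of R). The first uses an explicit `return`; the second omits it, in which case R returns the value of the last expression evaluated in the body.\n",
    "\n",
    "```R\n",
    "# Hypothetical example with an explicit return()\n",
    "AddTwo <- function(a, b) {   # two arguments (inputs)\n",
    "    total <- a + b           # body: manipulate the arguments\n",
    "    return(total)            # output of the function\n",
    "}\n",
    "\n",
    "# Hypothetical example without return(): the last expression's value is returned\n",
    "AddTwoImplicit <- function(a, b) {\n",
    "    a + b\n",
    "}\n",
    "\n",
    "AddTwo(2, 3)          # [1] 5\n",
    "AddTwoImplicit(2, 3)  # [1] 5\n",
    "```"
   ]
  },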
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Our first custom function\n",
    "\n",
    "Let's write a function to calculate the mean of a vector of numbers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "MyMean <- function(xs) {\n",
    "    n <- length(xs)\n",
    "    return(sum(xs/n))\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "5.5"
      ],
      "text/latex": [
       "5.5"
      ],
      "text/markdown": [
       "5.5"
      ],
      "text/plain": [
       "[1] 5.5"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "MyMean(1:10)"
   ]
  },
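  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check, `MyMean` can be compared with R's built-in `mean`. The sketch below repeats the definition of `MyMean` so that it runs on its own, and uses `all.equal`, which compares numbers up to a small floating-point tolerance (a plain `==` can fail for values that differ only by rounding error). The test values are arbitrary.\n",
    "\n",
    "```R\n",
    "# Same definition as above, repeated so this example runs on its own\n",
    "MyMean <- function(xs) {\n",
    "    n <- length(xs)\n",
    "    return(sum(xs/n))\n",
    "}\n",
    "\n",
    "test_values <- c(2.5, 4, 7.5, 10)\n",
    "MyMean(test_values)                                # [1] 6\n",
    "all.equal(MyMean(test_values), mean(test_values))  # [1] TRUE\n",
    "```"
   ]
  },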
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are many more details to writing custom functions, such as specifying default values for arguments and controlling whether arguments are matched by position or by name."
   ]
  },
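  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One detail worth knowing: default values are evaluated lazily, inside the function, when they are first needed. This means a default may refer to another argument of the same function. A minimal sketch with a hypothetical function `RepeatEach`:\n",
    "\n",
    "```R\n",
    "# Hypothetical example: the default for `times` depends on the argument `x`\n",
    "RepeatEach <- function(x, times = length(x)) {\n",
    "    rep(x, each = times)\n",
    "}\n",
    "\n",
    "RepeatEach(c('a', 'b'))     # default times = 2:  'a' 'a' 'b' 'b'\n",
    "RepeatEach(c('a', 'b'), 3)  # default overridden: 'a' 'a' 'a' 'b' 'b' 'b'\n",
    "```"
   ]
  },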
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "## Another function example - with default arguments\n",
    "\n",
    "WeightAvg <- function(xs, weights = rep(1,length(xs))) {\n",
    "    \n",
    "    # Weighted average: weighted sum divided by the total weight\n",
    "    weighted_vector <- xs*weights\n",
    "    return(sum(weighted_vector)/sum(weights))\n",
    "    \n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "4"
      ],
      "text/latex": [
       "4"
      ],
      "text/markdown": [
       "4"
      ],
      "text/plain": [
       "[1] 4"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "xs <- c(1,2,3,4,5,6,7)\n",
    "\n",
    "WeightAvg(xs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "3.8125"
      ],
      "text/latex": [
       "3.8125"
      ],
      "text/markdown": [
       "3.8125"
      ],
      "text/plain": [
       "[1] 3.8125"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "weights <- c(2,3,2,3,2,3,1)\n",
    "\n",
    "## Order doesn't matter here, because `weights` is passed by name; the unnamed xs is then matched by position to the remaining argument.\n",
    "\n",
    "WeightAvg(weights = weights, xs)"
   ]
  },
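  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A weighted average divides the weighted sum by the total weight. Base R's `stats` package provides `weighted.mean`, which is handy as a cross-check for a hand-written version such as `WeightAvg`. The sketch below computes the same quantity both ways for the vectors used above.\n",
    "\n",
    "```R\n",
    "xs <- c(1, 2, 3, 4, 5, 6, 7)\n",
    "weights <- c(2, 3, 2, 3, 2, 3, 1)\n",
    "\n",
    "# Weighted sum (61) divided by the total weight (16)\n",
    "sum(xs * weights) / sum(weights)  # [1] 3.8125\n",
    "\n",
    "# Built-in equivalent from the stats package (attached by default)\n",
    "weighted.mean(xs, w = weights)    # [1] 3.8125\n",
    "```"
   ]
  },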
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "## Order matters when arguments aren't labeled.\n",
    "\n",
    "ShowArgs <- function(xs, weights = rep(1,length(xs))) {\n",
    "    \n",
    "    print(weights)\n",
    "    print(xs)\n",
    "    \n",
    "}\n",
    "\n",
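    "# Note: this function is called for its side effect (printing). It has no explicit return(),\n",
    "# so it invisibly returns the value of its last expression, print(xs)."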
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 2 3 2 3 2 3 1\n",
      "[1] 1 2 3 4 5 6 7\n"
     ]
    }
   ],
   "source": [
    "ShowArgs(xs,weights)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 1 2 3 4 5 6 7\n",
      "[1] 2 3 2 3 2 3 1\n"
     ]
    }
   ],
   "source": [
    "ShowArgs(weights,xs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 2 3 2 3 2 3 1\n",
      "[1] 1 2 3 4 5 6 7\n"
     ]
    }
   ],
   "source": [
    "ShowArgs(weights = weights,xs)"
   ]
  },
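  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "R matches arguments in a call by name first, and then fills the remaining arguments by position. The hypothetical function `Describe` below makes this matching visible in its output:\n",
    "\n",
    "```R\n",
    "# Hypothetical example: `label` has a default value\n",
    "Describe <- function(value, label = 'x') {\n",
    "    paste(label, '=', value)\n",
    "}\n",
    "\n",
    "Describe(3)               # positional only:                    [1] \"x = 3\"\n",
    "Describe(3, 'z')          # both by position:                   [1] \"z = 3\"\n",
    "Describe(label = 'y', 3)  # label by name, 3 fills value:       [1] \"y = 3\"\n",
    "```"
   ]
  },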
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Examples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. Write a function `MySd` that takes a vector of numbers and returns their standard deviation as calculated from the following formula:\n",
    "$$\n",
    "\\sqrt{\\frac{\\sum{(x - \\bar{x})^2}}{n-1}}\n",
    "$$\n",
    "where $x$ is some vector of numbers, $\\bar{x}$ is the mean of $x$ and $n$ is the number of elements in $x$. \n",
    "\n",
    "    What is the standard deviation of `1:10`?<br><br>\n",
    "    \n",
    "2. Write a function that takes a matrix and a vector and multiplies the matrix by the vector (for those who don't know about matrix multiplication, an in-class example will be provided). Make the identity matrix the default for the matrix argument: `diag(rep(1, length(v)))`, where `v` is the vector argument."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# Mapping with apply\n",
    "\n",
    "R provides tools for applying nearly any function across the elements, rows or columns of a data structure. We will first talk about the `*apply` family of functions; more modern versions of these are part of the 'tidyverse', in the package 'purrr'. More on those later.\n",
    "\n",
    "Suppose I have a matrix, and I would like to obtain a sum of each column, and store that in a vector. One way to do this is to simply loop over the columns using a ``for`` loop:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] -2.0673368  3.3092049 -1.3901258 -5.1382279  0.8350368 -0.7088264\n"
     ]
    }
   ],
   "source": [
    "my_matrix <- matrix(rnorm(30), ncol = 6, nrow = 5)\n",
    "\n",
    "sums <- rep(0,6)\n",
    "\n",
    "for (i in 1:6){\n",
    "    sums[i] <- sum(my_matrix[,i])\n",
    "}\n",
    "\n",
    "print(sums)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But there is another way, which is better for several reasons:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] -2.0673368  3.3092049 -1.3901258 -5.1382279  0.8350368 -0.7088264\n"
     ]
    }
   ],
   "source": [
    "# MARGIN = 2: apply FUN to each column of X\n",
    "sums <- apply(FUN = sum, X = my_matrix, MARGIN = 2)\n",
    "\n",
    "print(sums)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that we can easily change this to a row sum:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1]  0.6588709 -1.5304585 -3.0984142  0.6073535 -1.7976270\n"
     ]
    }
   ],
   "source": [
    "# MARGIN = 1: apply FUN to each row of X\n",
    "sums <- apply(FUN = sum, X = my_matrix, MARGIN = 1)\n",
    "\n",
    "print(sums)"
   ]
  },
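  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the special case of sums and means, base R also provides `colSums`, `rowSums`, `colMeans` and `rowMeans`. The sketch below checks that they agree with the `apply` versions on a small matrix with known entries; `apply` remains the general tool for functions that have no such shortcut.\n",
    "\n",
    "```R\n",
    "m <- matrix(1:6, nrow = 2)  # columns: (1, 2), (3, 4), (5, 6)\n",
    "\n",
    "apply(X = m, MARGIN = 2, FUN = sum)  # [1]  3  7 11\n",
    "colSums(m)                           # [1]  3  7 11\n",
    "\n",
    "apply(X = m, MARGIN = 1, FUN = sum)  # [1]  9 12\n",
    "rowSums(m)                           # [1]  9 12\n",
    "```"
   ]
  },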
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are a number of things that make this version better than the ``for`` loop. There is less code, and fewer variables need to be set up by hand:\n",
    "\n",
    "-  In the ``for`` loop, we need to include the dimension explicitly, so that the loop knows how many iterations to perform (and we need to pre-allocate the `sums` vector).\n",
    "\n",
    "-  In the ``apply`` version, we can convert the column sum into a row sum by changing a single argument, and we do not need to specify the dimension in either case.\n",
    "\n",
    "- The ``apply`` code can be easily parallelized (this is beyond our scope, but something to keep in mind).\n",
    "\n",
    "- The less code we write, the less chance there is for bugs to crop up."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that ``apply`` works with user-defined functions, such as `MyMean`, as well:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<ol class=list-inline>\n",
       "\t<li>0.109811818585238</li>\n",
       "\t<li>-0.255076413886755</li>\n",
       "\t<li>-0.516402373949533</li>\n",
       "\t<li>0.101225588404194</li>\n",
       "\t<li>-0.299604492013341</li>\n",
       "</ol>\n"
      ],
      "text/latex": [
       "\\begin{enumerate*}\n",
       "\\item 0.109811818585238\n",
       "\\item -0.255076413886755\n",
       "\\item -0.516402373949533\n",
       "\\item 0.101225588404194\n",
       "\\item -0.299604492013341\n",
       "\\end{enumerate*}\n"
      ],
      "text/markdown": [
       "1. 0.109811818585238\n",
       "2. -0.255076413886755\n",
       "3. -0.516402373949533\n",
       "4. 0.101225588404194\n",
       "5. -0.299604492013341\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "[1]  0.1098118 -0.2550764 -0.5164024  0.1012256 -0.2996045"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "apply(X = my_matrix, MARGIN = 1, FUN = MyMean)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<ol class=list-inline>\n",
       "\t<li>-0.41346736204718</li>\n",
       "\t<li>0.661840972673175</li>\n",
       "\t<li>-0.278025169001729</li>\n",
       "\t<li>-1.02764557513093</li>\n",
       "\t<li>0.167007368691562</li>\n",
       "\t<li>-0.141765282617135</li>\n",
       "</ol>\n"
      ],
      "text/latex": [
       "\\begin{enumerate*}\n",
       "\\item -0.41346736204718\n",
       "\\item 0.661840972673175\n",
       "\\item -0.278025169001729\n",
       "\\item -1.02764557513093\n",
       "\\item 0.167007368691562\n",
       "\\item -0.141765282617135\n",
       "\\end{enumerate*}\n"
      ],
      "text/markdown": [
       "1. -0.41346736204718\n",
       "2. 0.661840972673175\n",
       "3. -0.278025169001729\n",
       "4. -1.02764557513093\n",
       "5. 0.167007368691562\n",
       "6. -0.141765282617135\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "[1] -0.4134674  0.6618410 -0.2780252 -1.0276456  0.1670074 -0.1417653"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "apply(X = my_matrix, MARGIN = 2, FUN = MyMean)"
   ]
  },
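  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`apply` also accepts an anonymous function written inline, and any extra named arguments given after `FUN` are passed on to that function. A minimal sketch on a small matrix with known entries:\n",
    "\n",
    "```R\n",
    "m <- matrix(1:6, nrow = 2)  # rows: (1, 3, 5) and (2, 4, 6)\n",
    "\n",
    "# Anonymous function: range (max - min) of each column\n",
    "apply(X = m, MARGIN = 2, FUN = function(col) max(col) - min(col))  # [1] 1 1 1\n",
    "\n",
    "# The extra argument `collapse` is passed through to paste()\n",
    "apply(X = m, MARGIN = 1, FUN = paste, collapse = '-')  # [1] \"1-3-5\" \"2-4-6\"\n",
    "```"
   ]
  },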
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### lapply, sapply, et al.\n",
    "\n",
    "There are a number of variants of the ``apply`` function that differ in the data types they take as input or return. The ``lapply`` function works on lists and vectors (as opposed to 2-dimensional arrays such as matrices) and always returns a list. The ``sapply`` function is the same as ``lapply``, but it tries to simplify the output to a simpler structure where possible (e.g. a list of single numbers is returned as a vector, rather than a list)."
   ]
  },
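  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A short sketch of the difference, using a hypothetical list `scores`. It also shows `vapply`, another base R variant that behaves like `sapply` but takes a template of the expected result for each element (here `numeric(1)`, a single number) and raises an error if any result does not match, which makes the output type predictable.\n",
    "\n",
    "```R\n",
    "scores <- list(a = c(1, 2, 3), b = c(10, 20))\n",
    "\n",
    "lapply(scores, sum)              # a list: $a is 6, $b is 30\n",
    "sapply(scores, sum)              # a named numeric vector: a = 6, b = 30\n",
    "vapply(scores, sum, numeric(1))  # same as sapply, but the result type is checked\n",
    "```"
   ]
  },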
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Examples\n",
    "\n",
    "1. Create a matrix using ``rnorm`` with 10 rows and 15 columns. Use ``apply`` to obtain the median of each row, and then of each column.\n",
    "\n",
    "2. Create the following list:\n",
    "\n",
    "   `x <- list(a = rep(1,7), b = 1:3, c = 10:100)`\n",
    "   \n",
    "   Use ``lapply`` and ``sapply`` to get the length of each element of the list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we will explore dataframes and something new called a 'tibble'. We'll see how to manipulate these structures, and then we will talk about the new generation of mapping functions."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.3.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}