{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Python is not R\n",
    "\n",
    "Despite superficial similarities, Python is a very different language than R, and the `pandas` DataFrame is not the R `data.frame`. Use Python idioms when coding Python, and R ones when coding R - if you first think of how you would do it in one language and then mentally \"port\" it to another language, you are likely to end up with inefficient code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%load_ext rpy2.ipython"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "df = %R Puromycin\n",
    "iris = %R iris"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Question 01**: In exercise A of Python09\n",
    " \n",
    "The question is to show the types of each column\n",
    "```python\n",
    "Puromycin.dtypes\n",
    "```\n",
    "conc     float64\n",
    "rate     float64\n",
    "state     object\n",
    "dtype: object\n",
    " \n",
    "At first, I tried the code as follow. The result is not what I expected, but I do not know the reason.\n",
    "```python\n",
    "Puromycin.apply(np.dtype)\n",
    "Puromycin.apply(lambda x:x.dtypes)\n",
    "```\n",
    "conc     object\n",
    "rate     object\n",
    "state    object\n",
    "dtype: object\n",
    "\n",
    "\n",
    "**Response**\n",
    "\n",
    "I am not sure why this is happening - something to do with how pandas guesses whether to return a Series or a DataFrame. \n",
    "\n",
    "```\n",
    "reduce : boolean or None, default None\n",
    "    Try to apply reduction procedures. If the DataFrame is empty,\n",
    "    apply will use reduce to determine whether the result should be a\n",
    "    Series or a DataFrame. If reduce is None (the default), apply's\n",
    "    return value will be guessed by calling func an empty Series (note:\n",
    "    while guessing, exceptions raised by func will be ignored). If\n",
    "    reduce is True a Series will always be returned, and if False a\n",
    "    DataFrame will always be returned.\n",
    "```\n",
    "\n",
    "The issue does not arise if the DataFrame does not contain string columns or if you set reduce=False."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "conc    float64\n",
       "rate    float64\n",
       "dtype: object"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[['conc', 'rate']].apply(np.dtype)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "conc     float64\n",
       "rate     float64\n",
       "state     object\n",
       "dtype: object"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.apply(np.dtype, reduce=False)"
   ]
  },
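  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For simply inspecting column types, no `apply` is needed. Here is a minimal sketch on a small made-up frame (`sample` and its values are illustrative stand-ins for `Puromycin`, not data from this notebook):\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Illustrative stand-in for Puromycin: two numeric columns and one string column\n",
    "sample = pd.DataFrame({\n",
    "    'conc': [0.02, 0.06, 0.11],\n",
    "    'rate': [76.0, 97.0, 123.0],\n",
    "    'state': ['treated', 'treated', 'untreated'],\n",
    "})\n",
    "\n",
    "# The dtypes attribute reports one dtype per column\n",
    "col_types = sample.dtypes\n",
    "\n",
    "# Equivalent explicit loop over (name, column) pairs\n",
    "by_hand = {name: col.dtype for name, col in sample.items()}\n",
    "```\n",
    "\n",
    "Both give the same dtype for every column. The attribute avoids the guessing that `apply` does about its return type."
   ]
  },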
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Question 02**: I am confuse the function applymap with the function pipe?\n",
    " \n",
    "For example, I have execute the following code and those function return the same results. It seems that those functions are just element-wise operations.\n",
    "```python\n",
    "df+1\n",
    "df.applymap(lambda x:x+1)\n",
    "df.pipe(lambda x:x+1)\n",
    "```\n",
    " \n",
    "**Response** \n",
    "\n",
    "`applymap` does perform an element-wise operation. It is useful when the element-wise operation is not one that can be accomplished using broadcasting.\n",
    "\n",
    "`pipe` just allows chaining of an arbitrary custom function. It need not be an element-wise function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>conc</th>\n",
       "      <th>rate</th>\n",
       "      <th>state</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.02</td>\n",
       "      <td>76.0</td>\n",
       "      <td>TREATED</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.02</td>\n",
       "      <td>47.0</td>\n",
       "      <td>TREATED</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.06</td>\n",
       "      <td>97.0</td>\n",
       "      <td>TREATED</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.06</td>\n",
       "      <td>107.0</td>\n",
       "      <td>TREATED</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.11</td>\n",
       "      <td>123.0</td>\n",
       "      <td>TREATED</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   conc   rate    state\n",
       "1  0.02   76.0  TREATED\n",
       "2  0.02   47.0  TREATED\n",
       "3  0.06   97.0  TREATED\n",
       "4  0.06  107.0  TREATED\n",
       "5  0.11  123.0  TREATED"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.applymap(lambda x: str(x).upper()).head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.pipe(lambda x: 1)"
   ]
  },
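  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the distinction concrete, here is a small sketch on a made-up numeric frame (`nums` and `scale` are hypothetical names). It assumes that newer `pandas` versions (2.1 and later) expose the element-wise method as `DataFrame.map`, with `applymap` as the older spelling, so the code picks whichever exists.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up numeric data for illustration\n",
    "nums = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})\n",
    "\n",
    "# Broadcasting: the preferred way to do simple arithmetic\n",
    "plus_one = nums + 1\n",
    "\n",
    "# Element-wise call of an arbitrary Python function on every cell\n",
    "elementwise = nums.map if hasattr(nums, 'map') else nums.applymap\n",
    "labels = elementwise(lambda v: f'<{v}>')\n",
    "\n",
    "\n",
    "def scale(frame, factor):\n",
    "    # Receives the whole DataFrame, not individual elements\n",
    "    return frame * factor\n",
    "\n",
    "\n",
    "# pipe passes the frame as the first argument, so calls can be chained\n",
    "total = nums.pipe(scale, factor=2).sum().sum()\n",
    "```\n",
    "\n",
    "Here `pipe` hands the entire frame to `scale`, whereas the element-wise call sees one value at a time."
   ]
  },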
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Question 03**: Is it able to perform melt and pivot using chain or pipeline?\n",
    " \n",
    "For example, in R, I could\n",
    "```\n",
    "# in R Kernel\n",
    "library(dplyr)\n",
    "library(tidyr)\n",
    " \n",
    "iris %>%\n",
    "    gather(Sample, Value, -Species) %>%   # wide -> long\n",
    "    group_by(Species) %>%                  # group by Species\n",
    "    summarise(Mean = mean(Value)) %>%     # summarise mean value\n",
    "select(-Species)                        # remove a column\n",
    "```\n",
    "Mean\n",
    "2.5355\n",
    "3.5730\n",
    "4.2850\n",
    " \n",
    "However, using python pandas to rewrite above R code, it is not as straightforward.\n",
    "```\n",
    "# in Python kernel\n",
    "df = %R iris\n",
    "df = pd.melt(df,\n",
    "var_name=\"Sample\",\n",
    "value_name=\"Value\",\n",
    "id_vars=[\"Species\"])\n",
    " \n",
    "df = (df\n",
    "     .groupby(\"Species\")\n",
    "     .apply(np.mean))\n",
    " \n",
    "df = df.reset_index()\n",
    "df.drop(“Species”)\n",
    "```\n",
    "\n",
    "**Response**\n",
    "\n",
    "In `pandas` `melt` is a function, not a method, so you cannot easily chain it. The operation above can be done much simpler with the following code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Species</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>setosa</th>\n",
       "      <td>2.5355</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>versicolor</th>\n",
       "      <td>3.5730</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>virginica</th>\n",
       "      <td>4.2850</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             value\n",
       "Species           \n",
       "setosa      2.5355\n",
       "versicolor  3.5730\n",
       "virginica   4.2850"
      ]
     },
     "execution_count": 101,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.melt(iris, 'Species').groupby('Species').mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is possible to code R in Python if you really want to, but the code is insane."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2.5355</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3.5730</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4.2850</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Value\n",
       "0  2.5355\n",
       "1  3.5730\n",
       "2  4.2850"
      ]
     },
     "execution_count": 102,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(\n",
    "    iris.\n",
    "    pipe(lambda x: pd.melt(x, id_vars ='Species', \n",
    "                           var_name='Sample', \n",
    "                           value_name='Value')).\n",
    "    groupby('Species').apply(np.mean).\n",
    "    reset_index().\n",
    "    drop('Species', axis=1)\n",
    ")"
   ]
  },
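  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For comparison, the same melt, group and summarise steps can be sketched as a single chain using the `DataFrame.melt` method, assuming `pandas` 0.20 or later. The frame `flowers` is made-up data standing in for `iris`:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up stand-in for iris with two measurement columns\n",
    "flowers = pd.DataFrame({\n",
    "    'Species': ['a', 'a', 'b', 'b'],\n",
    "    'Length': [1.0, 3.0, 5.0, 7.0],\n",
    "    'Width': [2.0, 4.0, 6.0, 8.0],\n",
    "})\n",
    "\n",
    "means = (\n",
    "    flowers\n",
    "    .melt(id_vars='Species', var_name='Sample', value_name='Value')  # wide -> long\n",
    "    .groupby('Species')['Value']                                     # group by Species\n",
    "    .mean()                                                          # mean per group\n",
    ")\n",
    "```\n",
    "\n",
    "Selecting the `Value` column before calling `mean` keeps the string column `Sample` out of the aggregation."
   ]
  },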
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Question 04**: apply to each column when the applied function returns different number of items\n",
    " \n",
    "For example, in R,\n",
    "```\n",
    "apply(Puromycin, 2, unique)\n",
    "```\n",
    "$conc\n",
    "[1] \"0.02\" \"0.06\" \"0.11\" \"0.22\" \"0.56\" \"1.10\"\n",
    " \n",
    "$rate\n",
    "[1] \" 76\" \" 47\" \" 97\" \"107\" \"123\" \"139\" \"159\" \"152\" \"191\" \"201\" \"207\" \"200\" \" 67\" \" 51\" \" 84\" \" 86\" \" 98\" \"115\"\n",
    "[19] \"131\" \"124\" \"144\" \"158\" \"160\"\n",
    " \n",
    "$state\n",
    "[1] \"treated\"   \"untreated\"\n",
    " \n",
    " \n",
    "However in python, it returns an error. Is there any way to perform the same R task using Pandas.\n",
    "```\n",
    "Puromycin.apply(pd.unique)\n",
    "```\n",
    "\n",
    "**Response**\n",
    "\n",
    "The return value should be a `pandas` DataFrame, which means each column must have the same length. You can coerce it to do this with the code shown below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>conc</th>\n",
       "      <th>rate</th>\n",
       "      <th>state</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.02</td>\n",
       "      <td>76.0</td>\n",
       "      <td>treated</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.06</td>\n",
       "      <td>47.0</td>\n",
       "      <td>untreated</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.11</td>\n",
       "      <td>97.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.22</td>\n",
       "      <td>107.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.56</td>\n",
       "      <td>123.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   conc   rate      state\n",
       "0  0.02   76.0    treated\n",
       "1  0.06   47.0  untreated\n",
       "2  0.11   97.0        NaN\n",
       "3  0.22  107.0        NaN\n",
       "4  0.56  123.0        NaN"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.apply(lambda x: pd.Series(pd.unique(x))).head()"
   ]
  },
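  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If a ragged result closer to R's list output is preferred over a padded DataFrame, a plain dictionary is one option. Here is a minimal sketch on made-up data (`sample` is an illustrative stand-in for `Puromycin`):\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up data with a different number of unique values per column\n",
    "sample = pd.DataFrame({\n",
    "    'conc': [0.02, 0.02, 0.06],\n",
    "    'state': ['treated', 'treated', 'untreated'],\n",
    "})\n",
    "\n",
    "# One list of unique values per column; lengths are free to differ\n",
    "uniques = {name: list(col.unique()) for name, col in sample.items()}\n",
    "```\n",
    "\n",
    "Because a dictionary places no constraint on the length of its values, no `NaN` padding is needed."
   ]
  },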
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Question 05**: import R packages in using rmagic\n",
    " \n",
    "If I change to R kernel, I am able to load r packages in the cell. However, if I use R magic, error occurs when trying the load r packages. I tried to search the web but did not find the solution to load r packages under R magic.\n",
    " \n",
    "Example:\n",
    "```\n",
    "%%R\n",
    "library(dplyr)\n",
    "```\n",
    "\n",
    "**Response**\n",
    "\n",
    "`rmagic` and ohter magics really only work wiht the Python kernels. Kernels such as R are not writtten by the Jyupyter team and typically do not offer the same set of magic functions. To run the above code, you need to make sure that the **cell** is running the Python kernel."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "library(dplyr)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}