{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Python is not R\n", "\n", "Despite superficial similarities, Python is a very different language than R, and the `pandas` DataFrame is not the R `data.frame`. Use Python idioms when coding Python, and R ones when coding R - if you first think of how you would do it in one language and then mentally \"port\" it to another language, you are likely to end up with inefficient code." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%load_ext rpy2.ipython" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df = %R Puromycin\n", "iris = %R iris" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 01**: In exercise A of Python09\n", " \n", "The question is to show the types of each column\n", "```python\n", "Puromycin.dtypes\n", "```\n", "conc float64\n", "rate float64\n", "state object\n", "dtype: object\n", " \n", "At first, I tried the code as follow. The result is not what I expected, but I do not know the reason.\n", "```python\n", "Puromycin.apply(np.dtype)\n", "Puromycin.apply(lambda x:x.dtypes)\n", "```\n", "conc object\n", "rate object\n", "state object\n", "dtype: object\n", "\n", "\n", "**Response**\n", "\n", "I am not sure why this is happening - something to do with how pandas guesses whether to return a Series or a DataFrame. \n", "\n", "```\n", "reduce : boolean or None, default None\n", " Try to apply reduction procedures. If the DataFrame is empty,\n", " apply will use reduce to determine whether the result should be a\n", " Series or a DataFrame. 
If reduce is None (the default), apply's\n", " return value will be guessed by calling func an empty Series (note:\n", " while guessing, exceptions raised by func will be ignored). If\n", " reduce is True a Series will always be returned, and if False a\n", " DataFrame will always be returned.\n", "```\n", "\n", "The issue does not arise if the DataFrame contains no string columns, or if you set `reduce=False`." ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "conc float64\n", "rate float64\n", "dtype: object" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[['conc', 'rate']].apply(np.dtype)" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "conc float64\n", "rate float64\n", "state object\n", "dtype: object" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.apply(np.dtype, reduce=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 02**: What is the difference between the functions `applymap` and `pipe`?\n", " \n", "For example, I executed the following code and all three expressions return the same result. It seems that these functions are just element-wise operations.\n", "```python\n", "df+1\n", "df.applymap(lambda x:x+1)\n", "df.pipe(lambda x:x+1)\n", "```\n", " \n", "**Response** \n", "\n", "`applymap` does perform an element-wise operation. It is useful when the element-wise operation is not one that can be accomplished using broadcasting.\n", "\n", "`pipe` simply passes the whole DataFrame to an arbitrary custom function so that the call can be chained. The function need not be element-wise; in your example the results agree only because `x+1` is itself broadcast over the DataFrame." ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " conc rate state\n", "1 0.02 76.0 TREATED\n", "2 0.02 47.0 TREATED\n", "3 0.06 97.0 TREATED\n", "4 0.06 107.0 TREATED\n", "5 0.11 123.0 TREATED" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.applymap(lambda x: str(x).upper()).head()" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.pipe(lambda x: 1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 03**: Is it able to perform melt and pivot using chain or pipeline?\n", " \n", "For example, in R, I could\n", "```\n", "# in R Kernel\n", "library(dplyr)\n", "library(tidyr)\n", " \n", "iris %>%\n", " gather(Sample, Value, -Species) %>% # wide -> long\n", " group_by(Species) %>% # group by Species\n", " summarise(Mean = mean(Value)) %>% # summarise mean value\n", "select(-Species) # remove a column\n", "```\n", "Mean\n", "2.5355\n", "3.5730\n", "4.2850\n", " \n", "However, using python pandas to rewrite above R code, it is not as straightforward.\n", "```\n", "# in Python kernel\n", "df = %R iris\n", "df = pd.melt(df,\n", "var_name=\"Sample\",\n", "value_name=\"Value\",\n", "id_vars=[\"Species\"])\n", " \n", "df = (df\n", " .groupby(\"Species\")\n", " .apply(np.mean))\n", " \n", "df = df.reset_index()\n", "df.drop(“Species”)\n", "```\n", "\n", "**Response**\n", "\n", "In `pandas` `melt` is a function, not a method, so you cannot easily chain it. The operation above can be done much simpler with the following code" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " value\n", "Species \n", "setosa 2.5355\n", "versicolor 3.5730\n", "virginica 4.2850" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.melt(iris, 'Species').groupby('Species').mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is possible to code R in Python if you really want to, but the code is insane." ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Value\n", "0 2.5355\n", "1 3.5730\n", "2 4.2850" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(\n", " iris.\n", " pipe(lambda x: pd.melt(x, id_vars ='Species', \n", " var_name='Sample', \n", " value_name='Value')).\n", " groupby('Species').apply(np.mean).\n", " reset_index().\n", " drop('Species', axis=1)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 04**: apply to each column when the applied function returns different number of items\n", " \n", "For example, in R,\n", "```\n", "apply(Puromycin, 2, unique)\n", "```\n", "$conc\n", "[1] \"0.02\" \"0.06\" \"0.11\" \"0.22\" \"0.56\" \"1.10\"\n", " \n", "$rate\n", "[1] \" 76\" \" 47\" \" 97\" \"107\" \"123\" \"139\" \"159\" \"152\" \"191\" \"201\" \"207\" \"200\" \" 67\" \" 51\" \" 84\" \" 86\" \" 98\" \"115\"\n", "[19] \"131\" \"124\" \"144\" \"158\" \"160\"\n", " \n", "$state\n", "[1] \"treated\" \"untreated\"\n", " \n", " \n", "However in python, it returns an error. Is there any way to perform the same R task using Pandas.\n", "```\n", "Puromycin.apply(pd.unique)\n", "```\n", "\n", "**Response**\n", "\n", "The return value should be a `pandas` DataFrame, which means each column must have the same length. You can coerce it to do this with the code shown below." ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " conc rate state\n", "0 0.02 76.0 treated\n", "1 0.06 47.0 untreated\n", "2 0.11 97.0 NaN\n", "3 0.22 107.0 NaN\n", "4 0.56 123.0 NaN" ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.apply(lambda x: pd.Series(pd.unique(x))).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 05**: import R packages in using rmagic\n", " \n", "If I change to R kernel, I am able to load r packages in the cell. However, if I use R magic, error occurs when trying the load r packages. I tried to search the web but did not find the solution to load r packages under R magic.\n", " \n", "Example:\n", "```\n", "%%R\n", "library(dplyr)\n", "```\n", "\n", "**Response**\n", "\n", "`rmagic` and ohter magics really only work wiht the Python kernels. Kernels such as R are not writtten by the Jyupyter team and typically do not offer the same set of magic functions. To run the above code, you need to make sure that the **cell** is running the Python kernel." ] }, { "cell_type": "code", "execution_count": 103, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%R\n", "library(dplyr)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 2 }