{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Python is not R\n",
"\n",
"Despite superficial similarities, Python is a very different language from R, and the `pandas` DataFrame is not the R `data.frame`. Use Python idioms when coding in Python and R idioms when coding in R. If you first think of how you would do something in one language and then mentally \"port\" it to the other, you are likely to end up with inefficient code."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%load_ext rpy2.ipython"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"df = %R Puromycin\n",
"iris = %R iris"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Question 01**: In exercise A of Python09\n",
" \n",
"The task is to show the type of each column.\n",
"```python\n",
"Puromycin.dtypes\n",
"```\n",
"conc float64\n",
"rate float64\n",
"state object\n",
"dtype: object\n",
" \n",
"At first, I tried the following code. The result is not what I expected, and I do not know why.\n",
"```python\n",
"Puromycin.apply(np.dtype)\n",
"Puromycin.apply(lambda x: x.dtypes)\n",
"```\n",
"conc object\n",
"rate object\n",
"state object\n",
"dtype: object\n",
"\n",
"\n",
"**Response**\n",
"\n",
"I am not certain why this happens, but it appears to be related to how `apply` guesses whether to return a Series or a DataFrame. One plausible explanation is that, when a reduction is attempted on a DataFrame with mixed column types, the columns are handed to the function as `object` arrays, so every column reports `object`. The relevant part of the `apply` documentation reads:\n",
"\n",
"```\n",
"reduce : boolean or None, default None\n",
" Try to apply reduction procedures. If the DataFrame is empty,\n",
" apply will use reduce to determine whether the result should be a\n",
" Series or a DataFrame. If reduce is None (the default), apply's\n",
" return value will be guessed by calling func an empty Series (note:\n",
" while guessing, exceptions raised by func will be ignored). If\n",
" reduce is True a Series will always be returned, and if False a\n",
" DataFrame will always be returned.\n",
"```\n",
"\n",
"The issue does not arise if the DataFrame contains no string columns, or if you set `reduce=False`. Note that `reduce` was deprecated in pandas 0.23 in favor of `result_type`, so the second workaround only applies to older versions. The `dtypes` attribute remains the most direct way to get the column types."
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"conc float64\n",
"rate float64\n",
"dtype: object"
]
},
"execution_count": 80,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[['conc', 'rate']].apply(np.dtype)"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"conc float64\n",
"rate float64\n",
"state object\n",
"dtype: object"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.apply(np.dtype, reduce=False)"
]
},
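{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the discussion above, the snippet below compares the `dtypes` attribute with a column-wise `apply`. It uses a small made-up DataFrame (the name `toy` and its values are hypothetical, not the `Puromycin` data) and assumes a recent pandas version, where `reduce` is no longer needed:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical data: two numeric columns and one string column\n",
"toy = pd.DataFrame({\n",
"    'conc': [0.02, 0.06],\n",
"    'rate': [76.0, 97.0],\n",
"    'state': ['treated', 'untreated'],\n",
"})\n",
"\n",
"# Direct route: one dtype per column, no function application involved\n",
"print(toy.dtypes)\n",
"\n",
"# Column-wise apply: each column is passed to the function as a Series\n",
"print(toy.apply(lambda col: col.dtype))\n",
"```\n",
"\n",
"On recent versions the two results are expected to agree. The exact name reported for the string column depends on the pandas version, so no output is shown here."
]
},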
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Question 02**: I am confused about the difference between the functions `applymap` and `pipe`.\n",
" \n",
"For example, I executed the following code and all three lines return the same result. It seems that both functions just perform element-wise operations.\n",
"```python\n",
"df+1\n",
"df.applymap(lambda x:x+1)\n",
"df.pipe(lambda x:x+1)\n",
"```\n",
" \n",
"**Response** \n",
"\n",
"`applymap` does perform an element-wise operation. It is useful when the element-wise operation is not one that can be accomplished using broadcasting.\n",
"\n",
"`pipe` just allows chaining of an arbitrary custom function. It need not be an element-wise function."
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" conc rate state\n",
"1 0.02 76.0 TREATED\n",
"2 0.02 47.0 TREATED\n",
"3 0.06 97.0 TREATED\n",
"4 0.06 107.0 TREATED\n",
"5 0.11 123.0 TREATED"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.applymap(lambda x: str(x).upper()).head()"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.pipe(lambda x: 1)"
]
},
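{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the contrast concrete, here is a small sketch on made-up numeric data. The names `toy` and `standardize` are hypothetical. It also assumes that `DataFrame.map` is the newer name for `applymap` (renamed in pandas 2.1), so it picks whichever of the two is available:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical numeric data\n",
"toy = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})\n",
"\n",
"# Element-wise: the function sees one scalar at a time\n",
"elementwise = toy.map if hasattr(toy, 'map') else toy.applymap\n",
"labels = elementwise(lambda v: 'big' if v > 5 else 'small')\n",
"\n",
"# pipe: the function receives the whole DataFrame, so it can use\n",
"# column-level information such as the mean and standard deviation\n",
"def standardize(frame):\n",
"    return (frame - frame.mean()) / frame.std()\n",
"\n",
"scaled = toy.pipe(standardize)\n",
"\n",
"print(labels)\n",
"print(scaled)\n",
"```\n",
"\n",
"The labelling step cannot be written as simple arithmetic broadcasting, which is where an element-wise function helps. The standardization cannot be written element-wise at all, because each result depends on the whole column."
]
},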
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Question 03**: Is it possible to perform melt and pivot using method chaining or a pipeline?\n",
" \n",
"For example, in R, I could write\n",
"```\n",
"# in R Kernel\n",
"library(dplyr)\n",
"library(tidyr)\n",
" \n",
"iris %>%\n",
" gather(Sample, Value, -Species) %>% # wide -> long\n",
" group_by(Species) %>% # group by Species\n",
" summarise(Mean = mean(Value)) %>% # summarise mean value\n",
" select(-Species) # remove a column\n",
"```\n",
"Mean\n",
"2.5355\n",
"3.5730\n",
"4.2850\n",
" \n",
"However, rewriting the above R code with Python and pandas is not as straightforward.\n",
"```\n",
"# in Python kernel\n",
"df = %R iris\n",
"df = pd.melt(df,\n",
"var_name=\"Sample\",\n",
"value_name=\"Value\",\n",
"id_vars=[\"Species\"])\n",
" \n",
"df = (df\n",
" .groupby(\"Species\")\n",
" .apply(np.mean))\n",
" \n",
"df = df.reset_index()\n",
"df = df.drop(\"Species\", axis=1)\n",
"```\n",
"\n",
"**Response**\n",
"\n",
"In older versions of `pandas`, `melt` is only available as the function `pd.melt`, not as a DataFrame method, so it cannot be chained directly (a `DataFrame.melt` method was added in version 0.20). The operation above can be done much more simply with the following code."
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" value\n",
"Species \n",
"setosa 2.5355\n",
"versicolor 3.5730\n",
"virginica 4.2850"
]
},
"execution_count": 101,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.melt(iris, 'Species').groupby('Species').mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is possible to code R in Python if you really want to, but the code is insane."
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Value\n",
"0 2.5355\n",
"1 3.5730\n",
"2 4.2850"
]
},
"execution_count": 102,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
" iris.\n",
" pipe(lambda x: pd.melt(x, id_vars ='Species', \n",
" var_name='Sample', \n",
" value_name='Value')).\n",
" groupby('Species').apply(np.mean).\n",
" reset_index().\n",
" drop('Species', axis=1)\n",
")"
]
},
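{
"cell_type": "markdown",
"metadata": {},
"source": [
"For comparison, here is a sketch of the same pipeline written as a single method chain. It assumes pandas 0.20 or later, where `DataFrame.melt` exists as a method, and it runs on a small made-up stand-in for the iris data (the name `toy` and its values are hypothetical):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in for iris: two measurements, two species\n",
"toy = pd.DataFrame({\n",
"    'Sepal.Length': [5.0, 5.2, 6.0, 6.4],\n",
"    'Sepal.Width': [3.0, 3.4, 2.8, 3.0],\n",
"    'Species': ['setosa', 'setosa', 'virginica', 'virginica'],\n",
"})\n",
"\n",
"result = (\n",
"    toy\n",
"    .melt(id_vars='Species', var_name='Sample', value_name='Value')  # wide -> long\n",
"    .groupby('Species')['Value']   # group the long values by Species\n",
"    .mean()                        # mean of all values per group\n",
"    .reset_index(drop=True)        # discard the Species labels\n",
"    .to_frame('Mean')              # back to a one-column DataFrame\n",
")\n",
"print(result)\n",
"```\n",
"\n",
"Each step maps onto one line of the `dplyr` version, without needing `pipe` or a lambda."
]
},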
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Question 04**: Applying a function to each column when it returns a different number of items per column\n",
" \n",
"For example, in R,\n",
"```\n",
"apply(Puromycin, 2, unique)\n",
"```\n",
"$conc\n",
"[1] \"0.02\" \"0.06\" \"0.11\" \"0.22\" \"0.56\" \"1.10\"\n",
" \n",
"$rate\n",
"[1] \" 76\" \" 47\" \" 97\" \"107\" \"123\" \"139\" \"159\" \"152\" \"191\" \"201\" \"207\" \"200\" \" 67\" \" 51\" \" 84\" \" 86\" \" 98\" \"115\"\n",
"[19] \"131\" \"124\" \"144\" \"158\" \"160\"\n",
" \n",
"$state\n",
"[1] \"treated\" \"untreated\"\n",
" \n",
" \n",
"However, in Python this raises an error. Is there a way to perform the same task using pandas?\n",
"```\n",
"Puromycin.apply(pd.unique)\n",
"```\n",
"\n",
"**Response**\n",
"\n",
"`apply` needs to assemble the results into a `pandas` DataFrame, which means every column must have the same length. You can coerce the results into that shape by wrapping each one in a `pd.Series`, as in the code shown below; the shorter columns are then padded with `NaN`."
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" conc rate state\n",
"0 0.02 76.0 treated\n",
"1 0.06 47.0 untreated\n",
"2 0.11 97.0 NaN\n",
"3 0.22 107.0 NaN\n",
"4 0.56 123.0 NaN"
]
},
"execution_count": 100,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.apply(lambda x: pd.Series(pd.unique(x))).head()"
]
},
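{
"cell_type": "markdown",
"metadata": {},
"source": [
"If a rectangular result is not required, an alternative sketch is to skip `apply` and collect the unique values in a plain dictionary, which is closer to the list that R returns. The names `toy` and `uniques` and the data are hypothetical:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical data in which the columns have different numbers of unique values\n",
"toy = pd.DataFrame({\n",
"    'conc': [0.02, 0.02, 0.06],\n",
"    'state': ['treated', 'treated', 'treated'],\n",
"})\n",
"\n",
"# One array of unique values per column; the lengths are free to differ\n",
"uniques = {name: toy[name].unique() for name in toy.columns}\n",
"\n",
"for name, values in uniques.items():\n",
"    print(name, list(values))\n",
"```\n",
"\n",
"This avoids the `NaN` padding, at the cost of no longer having a DataFrame to chain further operations on."
]
},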
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Question 05**: Importing R packages using rmagic\n",
" \n",
"If I switch to the R kernel, I am able to load R packages in a cell. However, if I use R magic, an error occurs when I try to load R packages. I searched the web but did not find a way to load R packages under R magic.\n",
" \n",
"Example:\n",
"```\n",
"%%R\n",
"library(dplyr)\n",
"```\n",
"\n",
"**Response**\n",
"\n",
"`rmagic` and other magics really only work with the Python kernel. Kernels such as R are not written by the Jupyter team and typically do not offer the same set of magic functions. To run the above code, make sure that the notebook is running the Python kernel and that the extension has been loaded with `%load_ext rpy2.ipython`."
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%%R\n",
"library(dplyr)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}