Python is not R
Despite superficial similarities, Python is a very different language from R, and the pandas DataFrame is not the R data.frame. Use Python idioms when coding Python, and R idioms when coding R. If you first think of how you would do it in one language and then mentally "port" it to the other, you are likely to end up with inefficient code.
In [1]:
%load_ext rpy2.ipython
In [2]:
import pandas as pd
import numpy as np
In [83]:
df = %R Puromycin
iris = %R iris
Question 01: In exercise A of Python09
The question asks for the type of each column.
Puromycin.dtypes
conc     float64
rate     float64
state     object
dtype: object
At first, I tried the code below. The result is not what I expected, and I do not know why.
Puromycin.apply(np.dtype)
Puromycin.apply(lambda x:x.dtypes)
conc     object
rate     object
state    object
dtype: object
Response
I am not sure why this is happening. It probably has something to do with how pandas guesses whether to return a Series or a DataFrame. The documentation for the reduce argument of apply says:
reduce : boolean or None, default None
Try to apply reduction procedures. If the DataFrame is empty,
apply will use reduce to determine whether the result should be a
Series or a DataFrame. If reduce is None (the default), apply's
return value will be guessed by calling func an empty Series (note:
while guessing, exceptions raised by func will be ignored). If
reduce is True a Series will always be returned, and if False a
DataFrame will always be returned.
The issue does not arise if the DataFrame does not contain string columns or if you set reduce=False.
In [80]:
df[['conc', 'rate']].apply(np.dtype)
Out[80]:
conc float64
rate float64
dtype: object
In [43]:
df.apply(np.dtype, reduce=False)
Out[43]:
conc float64
rate float64
state object
dtype: object
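For simply inspecting column types, the dtypes attribute used in the question is the direct route and avoids apply altogether. Below is a minimal sketch on a small made-up frame (the name toy and its values are illustrative stand-ins for Puromycin, not the real data). It also shows select_dtypes for picking columns by type.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Puromycin data (values are illustrative only)
toy = pd.DataFrame({
    "conc": [0.02, 0.06, 0.11],
    "rate": [76.0, 97.0, 123.0],
    "state": ["treated", "treated", "untreated"],
})

# dtypes is an attribute: one dtype per column, no apply needed
col_types = toy.dtypes
print(col_types)

# select_dtypes picks columns by type, here only the numeric ones
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

Because dtypes reads the stored column types directly, it does not depend on how apply decides what shape of result to return.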
Question 02: I am confused about the difference between the functions applymap and pipe.
For example, I executed the following code, and all three lines return the same result. It seems that these functions are just element-wise operations.
df+1
df.applymap(lambda x:x+1)
df.pipe(lambda x:x+1)
Response
applymap does perform an element-wise operation. It is useful when the element-wise operation is not one that can be accomplished using broadcasting.
pipe just allows chaining of an arbitrary custom function. It need not be an element-wise function.
In [48]:
df.applymap(lambda x: str(x).upper()).head()
Out[48]:
| | conc | rate | state |
|---|---|---|---|
| 1 | 0.02 | 76.0 | TREATED |
| 2 | 0.02 | 47.0 | TREATED |
| 3 | 0.06 | 97.0 | TREATED |
| 4 | 0.06 | 107.0 | TREATED |
| 5 | 0.11 | 123.0 | TREATED |
In [49]:
df.pipe(lambda x: 1)
Out[49]:
1
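To make the point about pipe concrete, here is a small sketch in which the piped functions operate on whole columns or reduce the frame, so they are clearly not element-wise. The frame toy and the helpers add_ratio and column_ranges are hypothetical names invented for this illustration.

```python
import pandas as pd

# Hypothetical numeric frame (values are illustrative only)
toy = pd.DataFrame({"conc": [0.02, 0.06, 0.11], "rate": [76.0, 97.0, 123.0]})


def add_ratio(frame, scale=1.0):
    """Hypothetical helper: adds a column computed from two whole columns."""
    return frame.assign(ratio=scale * frame["rate"] / frame["conc"])


def column_ranges(frame):
    """Hypothetical helper: reduces each column to its max minus its min."""
    return frame.max() - frame.min()


# pipe(f, *args, **kwargs) is f(df, *args, **kwargs) written left to right
piped = toy.pipe(add_ratio, scale=2.0).pipe(column_ranges)

# The same computation written as nested function calls
nested = column_ranges(add_ratio(toy, scale=2.0))

print(piped)
```

The piped and nested forms compute the same thing; pipe only changes the reading order, which is why it can wrap any function that takes a DataFrame as its first argument.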
Question 03: Is it possible to perform melt and pivot using chaining or a pipeline?
For example, in R, I could write
# in R Kernel
library(dplyr)
library(tidyr)
iris %>%
    gather(Sample, Value, -Species) %>%  # wide -> long
    group_by(Species) %>%                # group by Species
    summarise(Mean = mean(Value)) %>%    # summarise mean value
    select(-Species)                     # remove a column
Mean
2.5355
3.5730
4.2850
However, rewriting the above R code with Python pandas is not as straightforward.
# in Python kernel
df = %R iris
df = pd.melt(df,
             var_name="Sample",
             value_name="Value",
             id_vars=["Species"])
df = (df
      .groupby("Species")
      .apply(np.mean))
df = df.reset_index()
df.drop("Species", axis=1)
Response
In pandas, melt is called here as a top-level function (pd.melt) rather than as a method, so it cannot easily be chained. The operation above can be done much more simply with the following code.
In [101]:
pd.melt(iris, 'Species').groupby('Species').mean()
Out[101]:
| Species | value |
|---|---|
| setosa | 2.5355 |
| versicolor | 3.5730 |
| virginica | 4.2850 |
It is possible to code R in Python if you really want to, but the code is insane.
In [102]:
(
iris.
pipe(lambda x: pd.melt(x, id_vars ='Species',
var_name='Sample',
value_name='Value')).
groupby('Species').apply(np.mean).
reset_index().
drop('Species', axis=1)
)
Out[102]:
| | Value |
|---|---|
| 0 | 2.5355 |
| 1 | 3.5730 |
| 2 | 4.2850 |
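As far as I know, pandas releases from 0.20 onward also expose melt as a DataFrame method, which makes a chained version read much closer to the R pipeline. The sketch below assumes such a version is installed and uses a tiny made-up table (wide, with illustrative values) instead of iris.

```python
import pandas as pd

# Hypothetical wide table standing in for iris (values are illustrative only)
wide = pd.DataFrame({
    "Species": ["a", "a", "b"],
    "x": [1.0, 2.0, 3.0],
    "y": [3.0, 4.0, 5.0],
})

means = (
    wide
    .melt(id_vars="Species", var_name="Sample", value_name="Value")  # wide -> long
    .groupby("Species")["Value"]   # group by Species, keep only Value
    .mean()                        # mean per group
    .reset_index(drop=True)        # discard the Species labels
)

print(means.tolist())  # [2.5, 4.0]
```

Selecting the Value column before calling mean keeps the aggregation on numeric data only, so no lambda or pipe wrapper is needed.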
Question 04: Applying a function to each column when it returns a different number of items per column
For example, in R,
apply(Puromycin, 2, unique)
$conc
[1] "0.02" "0.06" "0.11" "0.22" "0.56" "1.10"
$rate
 [1] " 76" " 47" " 97" "107" "123" "139" "159" "152" "191" "201" "207" "200" " 67" " 51" " 84" " 86" " 98" "115"
[19] "131" "124" "144" "158" "160"
$state
[1] "treated" "untreated"
However, in Python the equivalent call returns an error. Is there any way to perform the same task using pandas?
Puromycin.apply(pd.unique)
Response
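The return value should be a pandas DataFrame, which means each column must have the same length. You can coerce the result into that shape by wrapping each column's unique values in a Series, so that shorter columns are padded with NaN, as in the code shown below.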
In [100]:
df.apply(lambda x: pd.Series(pd.unique(x))).head()
Out[100]:
| | conc | rate | state |
|---|---|---|---|
| 0 | 0.02 | 76.0 | treated |
| 1 | 0.06 | 47.0 | untreated |
| 2 | 0.11 | 97.0 | NaN |
| 3 | 0.22 | 107.0 | NaN |
| 4 | 0.56 | 123.0 | NaN |
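If the NaN padding is unwanted, one alternative is to skip the DataFrame result and collect the unique values in a plain dictionary, which is closer in spirit to the list that R returns. This is a sketch on a made-up frame (toy, with illustrative values), not the Puromycin data itself.

```python
import pandas as pd

# Hypothetical stand-in for the Puromycin data (values are illustrative only)
toy = pd.DataFrame({
    "conc": [0.02, 0.06, 0.11, 0.02],
    "state": ["treated", "untreated", "treated", "treated"],
})

# One entry per column; each entry keeps its own length, with values in
# order of first appearance
uniques = {col: list(toy[col].unique()) for col in toy.columns}

for col, values in uniques.items():
    print(col, values)
```

A dictionary has no requirement that its values share a length, so nothing needs to be padded.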
Question 05: Importing R packages using rmagic
If I switch to the R kernel, I can load R packages in a cell. However, if I use the R magic, an error occurs when I try to load R packages. I searched the web but did not find a way to load R packages under the R magic.
Example:
%%R
library(dplyr)
Response
rmagic and other magics really only work with the Python kernels. Kernels such as R are not written by the Jupyter team and typically do not offer the same set of magic functions. To run the above code, you need to make sure that the cell is running the Python kernel.
In [103]:
%%R
library(dplyr)