Python is not R

Despite superficial similarities, Python is a very different language from R, and the pandas DataFrame is not the R data.frame. Use Python idioms when coding Python, and R idioms when coding R. If you first think of how you would do it in one language and then mentally “port” it to the other, you are likely to end up with inefficient code.

In [1]:
%load_ext rpy2.ipython
In [2]:
import pandas as pd
import numpy as np
In [83]:
df = %R Puromycin
iris = %R iris

Question 01: In exercise A of Python09

The question asks for the type of each column.

Puromycin.dtypes

conc     float64
rate     float64
state     object
dtype: object

At first, I tried the following code. The result is not what I expected, and I do not know why.

Puromycin.apply(np.dtype)
Puromycin.apply(lambda x:x.dtypes)

conc     object
rate     object
state    object
dtype: object

Response

I am not sure why this is happening, but it appears to be related to how pandas guesses whether apply should return a Series or a DataFrame. From the documentation of apply:

reduce : boolean or None, default None
    Try to apply reduction procedures. If the DataFrame is empty,
    apply will use reduce to determine whether the result should be a
    Series or a DataFrame. If reduce is None (the default), apply's
    return value will be guessed by calling func an empty Series (note:
    while guessing, exceptions raised by func will be ignored). If
    reduce is True a Series will always be returned, and if False a
    DataFrame will always be returned.

The issue does not arise if the DataFrame does not contain string columns or if you set reduce=False.

In [80]:
df[['conc', 'rate']].apply(np.dtype)
Out[80]:
conc    float64
rate    float64
dtype: object
In [43]:
df.apply(np.dtype, reduce=False)
Out[43]:
conc     float64
rate     float64
state     object
dtype: object
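
For readers on a pandas release where the reduce argument is no longer accepted (it was deprecated in later versions), the same table can be built without apply at all. The sketch below is only an illustration: toy is a made-up stand-in for the Puromycin data, and col_types is a name chosen here, not something from the exercise.

```python
import pandas as pd

# Hypothetical stand-in for the Puromycin data (illustrative rows only)
toy = pd.DataFrame({
    "conc": [0.02, 0.06],
    "rate": [76.0, 97.0],
    "state": ["treated", "untreated"],
})

# Look up each column's dtype directly instead of going through apply,
# so no guessing about the shape of the result is involved
col_types = pd.Series({col: toy[col].dtype for col in toy.columns})
print(col_types)
```

This should agree with toy.dtypes, which remains the most direct way to answer the original question.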

Question 02: What is the difference between the function applymap and the function pipe?

For example, I executed the following code and all three expressions return the same result. It seems that these functions are all just element-wise operations.

df+1
df.applymap(lambda x:x+1)
df.pipe(lambda x:x+1)

Response

applymap does perform an element-wise operation. It is useful when the element-wise operation is not one that can be accomplished using broadcasting.

pipe just allows chaining of an arbitrary custom function. It need not be an element-wise function.

In [48]:
df.applymap(lambda x: str(x).upper()).head()
Out[48]:
conc rate state
1 0.02 76.0 TREATED
2 0.02 47.0 TREATED
3 0.06 97.0 TREATED
4 0.06 107.0 TREATED
5 0.11 123.0 TREATED
In [49]:
df.pipe(lambda x: 1)
Out[49]:
1
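
To make the contrast more concrete, here is a small sketch of pipe applied to a function that needs the whole DataFrame and so cannot be written as an element-wise operation. The helper center_columns and the toy data are hypothetical, introduced only for this illustration.

```python
import pandas as pd

def center_columns(frame, columns):
    """Hypothetical helper: subtract each listed column's mean from that column."""
    out = frame.copy()
    out[columns] = out[columns] - out[columns].mean()
    return out

# Made-up numeric data, loosely modeled on the conc and rate columns
toy = pd.DataFrame({"conc": [0.02, 0.06, 0.10], "rate": [76.0, 97.0, 121.0]})

# pipe passes the whole DataFrame as the first argument of the function;
# extra keyword arguments are forwarded unchanged
centered = toy.pipe(center_columns, columns=["rate"])
print(centered)
```

The call is equivalent to center_columns(toy, columns=["rate"]); pipe only changes where the call sits in a chain, not what is computed.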

Question 03: Is it possible to perform melt and pivot using method chaining or a pipeline?

For example, in R, I could write

# in R Kernel
library(dplyr)
library(tidyr)

iris %>%
    gather(Sample, Value, -Species) %>%   # wide -> long
    group_by(Species) %>%                  # group by Species
    summarise(Mean = mean(Value)) %>%     # summarise mean value
    select(-Species)                      # remove a column

Mean
2.5355
3.5730
4.2850

However, rewriting the above R code with Python pandas is not as straightforward.

# in Python kernel
df = %R iris
df = pd.melt(df,
             var_name="Sample",
             value_name="Value",
             id_vars=["Species"])

df = (df
      .groupby("Species")
      .apply(np.mean))

df = df.reset_index()
df.drop("Species", axis=1)

Response

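Here melt is called as a module-level function (pd.melt) rather than as a DataFrame method, so it does not chain directly. Note that pandas 0.20 and later also provide a DataFrame.melt method. The operation above can be done much more simply with the following code: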

In [101]:
pd.melt(iris, 'Species').groupby('Species').mean()
Out[101]:
value
Species
setosa 2.5355
versicolor 3.5730
virginica 4.2850

It is possible to code R in Python if you really want to, but the code is insane.

In [102]:
(
    iris.
    pipe(lambda x: pd.melt(x, id_vars ='Species',
                           var_name='Sample',
                           value_name='Value')).
    groupby('Species').apply(np.mean).
    reset_index().
    drop('Species', axis=1)
)
Out[102]:
Value
0 2.5355
1 3.5730
2 4.2850
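
On pandas 0.20 or later, where DataFrame.melt is available as a method, the same pipeline can be written as a plain method chain without pipe. The sketch below uses a small made-up frame named toy in place of iris so that it is self-contained; selecting the Value column before taking the mean is a choice made here so the result does not depend on how a given pandas version treats the non-numeric Sample column.

```python
import pandas as pd

# Small stand-in for iris (made-up values, only to keep the sketch self-contained)
toy = pd.DataFrame({
    "Species": ["a", "a", "b", "b"],
    "Sepal.Length": [1.0, 3.0, 2.0, 4.0],
    "Petal.Length": [2.0, 4.0, 6.0, 8.0],
})

species_means = (
    toy
    .melt(id_vars="Species", var_name="Sample", value_name="Value")  # wide -> long
    .groupby("Species")["Value"]                                     # group by Species
    .mean()                                                          # mean per group
    .reset_index(drop=True)                                          # drop the Species labels
)
print(species_means)
```

The result is a Series named Value with one mean per species, which corresponds to the single Mean column produced by the R pipeline.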

Question 04: apply to each column when the applied function returns a different number of items

For example, in R,

apply(Puromycin, 2, unique)

$conc
[1] "0.02" "0.06" "0.11" "0.22" "0.56" "1.10"

$rate
 [1] " 76" " 47" " 97" "107" "123" "139" "159" "152" "191" "201" "207" "200" " 67" " 51" " 84" " 86" " 98" "115"
[19] "131" "124" "144" "158" "160"

$state
[1] "treated" "untreated"

However, in Python it returns an error. Is there any way to perform the same task using pandas?

Puromycin.apply(pd.unique)

Response

The return value should be a pandas DataFrame, which means each column must have the same length. You can coerce it to do this with the code shown below.

In [100]:
df.apply(lambda x: pd.Series(pd.unique(x))).head()
Out[100]:
conc rate state
0 0.02 76.0 treated
1 0.06 47.0 untreated
2 0.11 97.0 NaN
3 0.22 107.0 NaN
4 0.56 123.0 NaN
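
If a ragged result like the R list is acceptable, one alternative is to skip apply and collect the unique values in a plain dictionary, which places no constraint on the lengths. This is only a sketch on a made-up frame named toy; the variable name uniques is chosen here for illustration.

```python
import pandas as pd

# Made-up rows standing in for the Puromycin data
toy = pd.DataFrame({
    "conc": [0.02, 0.02, 0.06],
    "state": ["treated", "treated", "untreated"],
})

# One list of unique values per column; the lists are free to differ in length
uniques = {col: toy[col].unique().tolist() for col in toy.columns}
print(uniques)
```

Unlike the DataFrame version, nothing is padded with NaN, but the result is an ordinary dict rather than a pandas object.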

Question 05: importing R packages when using rmagic

If I change to the R kernel, I am able to load R packages in a cell. However, if I use R magic, an error occurs when I try to load R packages. I searched the web but did not find a way to load R packages under R magic.

Example:

%%R
library(dplyr)

Response

rmagic and other magics really only work with the Python kernels. Kernels such as R are not written by the Jupyter team and typically do not offer the same set of magic functions. To run the above code, you need to make sure that the cell is running the Python kernel.

In [103]:
%%R
library(dplyr)