Exercises with pandas
1
We will work with the Puromycin data set (available in R) in this exercise.
Reaction Velocity of an Enzymatic Reaction
Description:
The ‘Puromycin’ data frame has 23 rows and 3 columns of the
reaction velocity versus substrate concentration in an enzymatic
reaction involving untreated cells or cells treated with
Puromycin.
Usage:
Puromycin
Format:
This data frame contains the following columns:
‘conc’ a numeric vector of substrate concentrations (ppm)
‘rate’ a numeric vector of instantaneous reaction rates
(counts/min/min)
‘state’ a factor with levels ‘treated’ ‘untreated’
Details:
Data on the velocity of an enzymatic reaction were obtained by
Treloar (1974). The number of counts per minute of radioactive
product from the reaction was measured as a function of substrate
concentration in parts per million (ppm) and from these counts the
initial rate (or velocity) of the reaction was calculated
(counts/min/min). The experiment was conducted once with the
enzyme treated with Puromycin, and once with the enzyme untreated.
Source:
Bates, D.M. and Watts, D.G. (1988), _Nonlinear Regression Analysis
and Its Applications_, Wiley, Appendix A1.3.
Treloar, M. A. (1974), _Effects of Puromycin on
Galactosyltransferase in Golgi Membranes_, M.Sc. Thesis, U. of
Toronto.
Load the Puromycin data set into a pandas DataFrame
In [ ]:
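One possible way to get the data into pandas without leaving Python is to type the 23 rows in directly. The values below were copied by hand from R's `datasets::Puromycin` and are worth checking against the R copy. With network access and statsmodels installed, `statsmodels.api.datasets.get_rdataset("Puromycin").data` is another common route. The name `df` is just a choice made for this sketch.

```python
import pandas as pd

# Values typed in by hand from R's datasets::Puromycin (assumption: no typos).
conc = [0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56, 1.10, 1.10,
        0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56, 1.10]
rate = [76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200,
        67, 51, 84, 86, 98, 115, 131, 124, 144, 158, 160]
state = ["treated"] * 12 + ["untreated"] * 11

df = pd.DataFrame({"conc": conc, "rate": rate, "state": state})

# R stores state as a factor; a pandas categorical is the closest equivalent.
df["state"] = df["state"].astype("category")

print(df.head())
```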
How many rows and columns are there?
In [ ]:
What is the type of each column?
In [ ]:
Show all unique values for the state column
In [ ]:
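The three inspection questions above map onto `shape`, `dtypes` and `Series.unique`. The sketch below uses an eight-row stand-in copied by hand from Puromycin rather than the full data, so the row count it reports is 8, not 23.

```python
import pandas as pd

# Eight rows copied by hand from R's Puromycin as a small stand-in.
df = pd.DataFrame({
    "conc": [0.02, 0.06, 0.11, 0.56, 0.02, 0.11, 0.22, 1.10],
    "rate": [76, 97, 123, 191, 67, 98, 131, 160],
    "state": ["treated"] * 4 + ["untreated"] * 4,
})

n_rows, n_cols = df.shape        # (rows, columns)
col_types = df.dtypes            # one dtype per column
states = df["state"].unique()    # distinct values, in order of appearance

print(n_rows, n_cols)
print(col_types)
print(states)
```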
Show the first 5 rows
In [ ]:
Show the last 5 rows
In [ ]:
Show 5 randomly sampled rows
In [ ]:
Show rows 5 to 10 (inclusive)
In [ ]:
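For the row-viewing questions above, `head`, `tail`, `sample` and slicing are the usual tools. "Rows 5 to 10" is ambiguous: the sketch assumes 0-based positions, which is the pandas default; if the R-style 1-based rows are meant, the slice would be `iloc[4:10]`. The stand-in frame holds only the twelve treated rows, copied by hand from Puromycin.

```python
import pandas as pd

# The twelve treated rows, copied by hand from R's Puromycin as a stand-in.
df = pd.DataFrame({
    "conc": [0.02, 0.02, 0.06, 0.06, 0.11, 0.11,
             0.22, 0.22, 0.56, 0.56, 1.10, 1.10],
    "rate": [76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200],
    "state": ["treated"] * 12,
})

first = df.head(5)
last = df.tail(5)
rand = df.sample(n=5, random_state=0)  # random_state makes the draw repeatable

# iloc is position-based and excludes the end point, so 5..10 needs 5:11.
rows_pos = df.iloc[5:11]
# loc is label-based and includes both end points.
rows_lab = df.loc[5:10]

print(rows_pos)
```

With the default integer index the two slices select the same rows; they diverge as soon as the frame has been sorted or filtered without resetting the index.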
Show only rows where the state is untreated
In [ ]:
Show only rows where the conc is 0.11
In [ ]:
Show only rows where the conc is less than 0.1
In [ ]:
Show only rows where the state is treated and the rate is more than 100
In [ ]:
Show only rows where the conc is less than 0.1 or the rate is more than 200
In [ ]:
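The filtering questions above can all be answered with boolean masks. A sketch on an eight-row stand-in copied by hand from Puromycin follows; the variable names are made up for illustration. Each condition needs its own parentheses when combined with `&` or `|`, and `np.isclose` is used for the 0.11 test as a cautious alternative to `==` on floats.

```python
import numpy as np
import pandas as pd

# Eight rows copied by hand from R's Puromycin as a small stand-in.
df = pd.DataFrame({
    "conc": [0.02, 0.06, 0.11, 0.56, 0.02, 0.11, 0.22, 1.10],
    "rate": [76, 97, 123, 191, 67, 98, 131, 160],
    "state": ["treated"] * 4 + ["untreated"] * 4,
})

untreated = df[df["state"] == "untreated"]
conc_011 = df[np.isclose(df["conc"], 0.11)]
low_conc = df[df["conc"] < 0.1]

# & is element-wise "and", | is element-wise "or".
treated_fast = df[(df["state"] == "treated") & (df["rate"] > 100)]
low_or_fast = df[(df["conc"] < 0.1) | (df["rate"] > 200)]

print(treated_fast)
```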
Show only the conc and rate columns
In [ ]:
Show only the columns whose type is numeric
In [ ]:
Show only the columns whose names end with the letter e
In [ ]:
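Column selection by name, by dtype and by a pattern in the name can be sketched as follows, again on a hand-copied eight-row stand-in. `df.filter(regex="e$")` would be an equivalent way to pick the names ending in e.

```python
import pandas as pd

# Eight rows copied by hand from R's Puromycin as a small stand-in.
df = pd.DataFrame({
    "conc": [0.02, 0.06, 0.11, 0.56, 0.02, 0.11, 0.22, 1.10],
    "rate": [76, 97, 123, 191, 67, 98, 131, 160],
    "state": ["treated"] * 4 + ["untreated"] * 4,
})

# A list of names inside [] returns a DataFrame with just those columns.
conc_rate = df[["conc", "rate"]]

# "number" matches every integer and float dtype.
numeric = df.select_dtypes(include="number")

# Boolean mask over the column labels, applied on the column axis.
ends_e = df.loc[:, df.columns.str.endswith("e")]

print(list(ends_e.columns))
```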
Convert all column names to UPPERCASE
In [ ]:
Rearrange the columns in the order state, conc, rate
In [ ]:
Drop the state column
In [ ]:
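Renaming, reordering and dropping columns each return a new frame and leave the original untouched, which the following sketch illustrates on a hand-copied eight-row stand-in.

```python
import pandas as pd

# Eight rows copied by hand from R's Puromycin as a small stand-in.
df = pd.DataFrame({
    "conc": [0.02, 0.06, 0.11, 0.56, 0.02, 0.11, 0.22, 1.10],
    "rate": [76, 97, 123, 191, 67, 98, 131, 160],
    "state": ["treated"] * 4 + ["untreated"] * 4,
})

# rename accepts a function that is applied to every column label.
upper = df.rename(columns=str.upper)

# Listing the columns in the desired order reorders them.
reordered = df[["state", "conc", "rate"]]

no_state = df.drop(columns="state")

print(list(upper.columns), list(reordered.columns), list(no_state.columns))
```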
Create a new column rate2 that is the square of rate
In [ ]:
Create a new data frame that only has the 3 columns with conc, conc^2 and conc^3 values. Name them conc, conc2 and conc3
In [ ]:
Replace each value of all numeric columns with the square root of the value
In [ ]:
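Derived columns can be built with `assign` or by constructing a fresh frame. The sketch below is one possible approach on a hand-copied eight-row stand-in; `with_sq`, `powers` and `rooted` are names invented for the example.

```python
import numpy as np
import pandas as pd

# Eight rows copied by hand from R's Puromycin as a small stand-in.
df = pd.DataFrame({
    "conc": [0.02, 0.06, 0.11, 0.56, 0.02, 0.11, 0.22, 1.10],
    "rate": [76, 97, 123, 191, 67, 98, 131, 160],
    "state": ["treated"] * 4 + ["untreated"] * 4,
})

# New column holding the square of rate.
with_sq = df.assign(rate2=df["rate"] ** 2)

# New frame with conc and its second and third powers.
powers = pd.DataFrame({
    "conc": df["conc"],
    "conc2": df["conc"] ** 2,
    "conc3": df["conc"] ** 3,
})

# Replace every numeric column by its square root; state is left alone.
num_cols = df.select_dtypes(include="number").columns
rooted = df.assign(**{col: np.sqrt(df[col]) for col in num_cols})

print(rooted.head())
```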
Sort in ascending rate order
In [ ]:
Sort in descending rate order
In [ ]:
Sort first on conc in ascending order, then on rate in ascending order
In [ ]:
Sort in ascending order of the number of characters in the state column
In [ ]:
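All four sorting questions above can be handled by `sort_values`. The sketch assumes pandas 1.1 or newer, where `sort_values` accepts a `key` function, and uses a hand-copied eight-row stand-in. A stable sort (`kind="mergesort"`) is chosen for the last case so that rows with equal string length keep their original order.

```python
import pandas as pd

# Eight rows copied by hand from R's Puromycin as a small stand-in.
df = pd.DataFrame({
    "conc": [0.02, 0.06, 0.11, 0.56, 0.02, 0.11, 0.22, 1.10],
    "rate": [76, 97, 123, 191, 67, 98, 131, 160],
    "state": ["treated"] * 4 + ["untreated"] * 4,
})

asc = df.sort_values("rate")
desc = df.sort_values("rate", ascending=False)

# Ties in conc are broken by rate.
by_two = df.sort_values(["conc", "rate"])

# key receives the whole column and must return a same-length Series.
by_len = df.sort_values("state", key=lambda s: s.str.len(), kind="mergesort")

print(by_two)
```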
Find the mean value of numeric columns
In [ ]:
Find the mean length (number of characters) of the values in the state column
In [ ]:
Find the min, median and max of the rate column
In [ ]:
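The summary questions above can be sketched with `mean`, the `.str` accessor and `agg`. The numbers printed come from a hand-copied eight-row stand-in, so they will not match the full data set.

```python
import pandas as pd

# Eight rows copied by hand from R's Puromycin as a small stand-in.
df = pd.DataFrame({
    "conc": [0.02, 0.06, 0.11, 0.56, 0.02, 0.11, 0.22, 1.10],
    "rate": [76, 97, 123, 191, 67, 98, 131, 160],
    "state": ["treated"] * 4 + ["untreated"] * 4,
})

# numeric_only=True skips the string column instead of raising an error.
means = df.mean(numeric_only=True)

# Length of each string, then the average of those lengths.
mean_state_len = df["state"].str.len().mean()

# Several statistics for one column in a single call.
rate_stats = df["rate"].agg(["min", "median", "max"])

print(means)
print(mean_state_len)
print(rate_stats)
```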
Find the average rate for each state
In [ ]:
Count the rows for each state (treated and untreated) and store the result in a new column count
In [ ]:
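Per-group averages and per-group counts are both `groupby` operations. The difference is that `mean` collapses each group to one row, while `transform` broadcasts the group result back to every original row, which is what a new column needs. A sketch on a hand-copied eight-row stand-in:

```python
import pandas as pd

# Eight rows copied by hand from R's Puromycin as a small stand-in.
df = pd.DataFrame({
    "conc": [0.02, 0.06, 0.11, 0.56, 0.02, 0.11, 0.22, 1.10],
    "rate": [76, 97, 123, 191, 67, 98, 131, 160],
    "state": ["treated"] * 4 + ["untreated"] * 4,
})

# One value per state.
mean_rate = df.groupby("state")["rate"].mean()

# transform("count") returns one value per row: the number of non-missing
# rate values in that row's state group (equal to the row count here).
counted = df.assign(count=df.groupby("state")["rate"].transform("count"))

print(mean_rate)
print(counted)
```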
Count the rows that share the same conc and state, store the result in a new column count, and show only the rows where count is an even number
In [ ]:
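Grouping on two columns works the same way as grouping on one. The sketch below uses a five-row stand-in copied by hand from Puromycin, picked so that one conc/state pair occurs only once and is therefore filtered out by the even-count test.

```python
import pandas as pd

# Five rows copied by hand from R's Puromycin; the last pair is unreplicated.
df = pd.DataFrame({
    "conc": [0.02, 0.02, 0.56, 0.56, 1.10],
    "rate": [76, 47, 144, 158, 160],
    "state": ["treated", "treated", "untreated", "untreated", "untreated"],
})

# Number of rows sharing each (conc, state) pair, repeated on every row.
counted = df.assign(
    count=df.groupby(["conc", "state"])["rate"].transform("count")
)

# Keep only the rows whose group size is even.
even = counted[counted["count"] % 2 == 0]

print(even)
```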
Find the mean and standard deviation of rate for each state and conc. Remove any rows with an NA value for the rate standard deviation.
In [ ]:
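For the last question, `agg` can compute several statistics per group at once. The sample standard deviation of a single observation is undefined, so groups with one row get `NaN`, which `dropna` then removes. A sketch on a five-row stand-in copied by hand from Puromycin:

```python
import pandas as pd

# Five rows copied by hand from R's Puromycin; the last pair is unreplicated.
df = pd.DataFrame({
    "conc": [0.02, 0.02, 0.56, 0.56, 1.10],
    "rate": [76, 47, 144, 158, 160],
    "state": ["treated", "treated", "untreated", "untreated", "untreated"],
})

summary = (
    df.groupby(["state", "conc"])["rate"]
      .agg(["mean", "std"])      # std uses ddof=1, so a lone row yields NaN
      .reset_index()             # turn the group keys back into columns
      .dropna(subset=["std"])    # drop groups without a defined std
)

print(summary)
```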