Solutions to Warm-up exercises
These problem sets will introduce you to data frames and the basics of
data frame manipulation with the dplyr package.
Warning message:
“package ‘dplyr’ was built under R version 3.4.1”
Use of piping (%>%)
2. Using the Puromycin data set,
- Show the first 20 rows using piping
- Show the last 10 rows using piping
- Show rows 11 to 20 using piping
| conc | rate | state |
| 0.02 | 76 | treated |
| 0.02 | 47 | treated |
| 0.06 | 97 | treated |
| 0.06 | 107 | treated |
| 0.11 | 123 | treated |
| 0.11 | 139 | treated |
| 0.22 | 159 | treated |
| 0.22 | 152 | treated |
| 0.56 | 191 | treated |
| 0.56 | 201 | treated |
| 1.10 | 207 | treated |
| 1.10 | 200 | treated |
| 0.02 | 67 | untreated |
| 0.02 | 51 | untreated |
| 0.06 | 84 | untreated |
| 0.06 | 86 | untreated |
| 0.11 | 98 | untreated |
| 0.11 | 115 | untreated |
| 0.22 | 131 | untreated |
| 0.22 | 124 | untreated |
| conc | rate | state |
| 14 | 0.02 | 51 | untreated |
| 15 | 0.06 | 84 | untreated |
| 16 | 0.06 | 86 | untreated |
| 17 | 0.11 | 98 | untreated |
| 18 | 0.11 | 115 | untreated |
| 19 | 0.22 | 131 | untreated |
| 20 | 0.22 | 124 | untreated |
| 21 | 0.56 | 144 | untreated |
| 22 | 0.56 | 158 | untreated |
| 23 | 1.10 | 160 | untreated |
| conc | rate | state |
| 11 | 1.10 | 207 | treated |
| 12 | 1.10 | 200 | treated |
| 13 | 0.02 | 67 | untreated |
| 14 | 0.02 | 51 | untreated |
| 15 | 0.06 | 84 | untreated |
| 16 | 0.06 | 86 | untreated |
| 17 | 0.11 | 98 | untreated |
| 18 | 0.11 | 115 | untreated |
| 19 | 0.22 | 131 | untreated |
| 20 | 0.22 | 124 | untreated |
Use of filter
3. Using the Puromycin data set,togehter with piping and
filter
- Show only rows where the
state is untreated
- Show only rows where the
conc is 0.11
- Show only rows where the
conc is less than 0.1
- Show only rows where the
state is treated and the rate is
more than 100
- Show only rows where the
conc is less than 0.1 or the rate is
more than 200
| conc | rate | state |
| 0.02 | 67 | untreated |
| 0.02 | 51 | untreated |
| 0.06 | 84 | untreated |
| 0.06 | 86 | untreated |
| 0.11 | 98 | untreated |
| 0.11 | 115 | untreated |
| 0.22 | 131 | untreated |
| 0.22 | 124 | untreated |
| 0.56 | 144 | untreated |
| 0.56 | 158 | untreated |
| 1.10 | 160 | untreated |
| conc | rate | state |
| 0.11 | 123 | treated |
| 0.11 | 139 | treated |
| 0.11 | 98 | untreated |
| 0.11 | 115 | untreated |
| conc | rate | state |
| 0.02 | 76 | treated |
| 0.02 | 47 | treated |
| 0.06 | 97 | treated |
| 0.06 | 107 | treated |
| 0.02 | 67 | untreated |
| 0.02 | 51 | untreated |
| 0.06 | 84 | untreated |
| 0.06 | 86 | untreated |
| conc | rate | state |
| 0.06 | 107 | treated |
| 0.11 | 123 | treated |
| 0.11 | 139 | treated |
| 0.22 | 159 | treated |
| 0.22 | 152 | treated |
| 0.56 | 191 | treated |
| 0.56 | 201 | treated |
| 1.10 | 207 | treated |
| 1.10 | 200 | treated |
| conc | rate | state |
| 0.02 | 76 | treated |
| 0.02 | 47 | treated |
| 0.06 | 97 | treated |
| 0.06 | 107 | treated |
| 0.56 | 201 | treated |
| 1.10 | 207 | treated |
| 0.02 | 67 | untreated |
| 0.02 | 51 | untreated |
| 0.06 | 84 | untreated |
| 0.06 | 86 | untreated |
Use of select
4. Using the Puromycin data set, together with piping,
head and select, select_if and select_all
- Show only the
conc and rate columns
- Show only the columns whose type is numeric
- Show only the columns whose names end with the letter
e
- Convert all column names to UPPERCASE
- Rearrange the columns in the order
state, conc, rate
- Drop the
state column
Limit to only the first 3 rows in each case.
| conc | rate |
| 0.02 | 76 |
| 0.02 | 47 |
| 0.06 | 97 |
| conc | rate |
| 0.02 | 76 |
| 0.02 | 47 |
| 0.06 | 97 |
| rate | state |
| 76 | treated |
| 47 | treated |
| 97 | treated |
| CONC | RATE | STATE |
| 0.02 | 76 | treated |
| 0.02 | 47 | treated |
| 0.06 | 97 | treated |
| state | conc | rate |
| treated | 0.02 | 76 |
| treated | 0.02 | 47 |
| treated | 0.06 | 97 |
| conc | rate |
| 0.02 | 76 |
| 0.02 | 47 |
| 0.06 | 97 |
Use of mutate and transmute
5. Using the Puromycin data set, together with mutate or
transmutate and any other operation necessary
- Create a new column
rate2 that is the square of rate
- Create a new data frame that only has the 3 columns with
conc,
conc^2 and conc^3 values. Name them conc, conc2 and
conc3
- Replace each value of all numeric columns with the square root of the
value
Show only the first 5 rows in each case
| conc | rate | state | rate2 |
| 0.02 | 76 | treated | 5776 |
| 0.02 | 47 | treated | 2209 |
| 0.06 | 97 | treated | 9409 |
| 0.06 | 107 | treated | 11449 |
| 0.11 | 123 | treated | 15129 |
| conc | conc2 | conc3 |
| 0.02 | 0.0004 | 0.000008 |
| 0.02 | 0.0004 | 0.000008 |
| 0.06 | 0.0036 | 0.000216 |
| 0.06 | 0.0036 | 0.000216 |
| 0.11 | 0.0121 | 0.001331 |
| conc | rate | state |
| 0.1414214 | 8.717798 | treated |
| 0.1414214 | 6.855655 | treated |
| 0.2449490 | 9.848858 | treated |
| 0.2449490 | 10.344080 | treated |
| 0.3316625 | 11.090537 | treated |
Use of arrange
6. Using the Puromycin data set, together with arrange and
any other operation necessary
- Sort in ascending
rate order
- Sort in descending
rate order
- Sort first on
conc i ascending order, then rate in ascending
order
- Sort in ascending order of the number of characters in the
state
column
In each case show only the first 5 rows.
| conc | rate | state |
| 0.02 | 47 | treated |
| 0.02 | 51 | untreated |
| 0.02 | 67 | untreated |
| 0.02 | 76 | treated |
| 0.06 | 84 | untreated |
| conc | rate | state |
| 1.10 | 207 | treated |
| 0.56 | 201 | treated |
| 1.10 | 200 | treated |
| 0.56 | 191 | treated |
| 1.10 | 160 | untreated |
| conc | rate | state |
| 1.10 | 207 | treated |
| 0.56 | 201 | treated |
| 1.10 | 200 | treated |
| 0.56 | 191 | treated |
| 1.10 | 160 | untreated |
| conc | rate | state |
| 0.02 | 47 | treated |
| 0.02 | 51 | untreated |
| 0.02 | 67 | untreated |
| 0.02 | 76 | treated |
| 0.06 | 84 | untreated |
| conc | rate | state |
| 0.02 | 76 | treated |
| 0.02 | 47 | treated |
| 0.06 | 97 | treated |
| 0.06 | 107 | treated |
| 0.11 | 123 | treated |
Use of summarize
7. Using the Puromycin data set, together with summarize and
any other operation necessary
- Find the mean value of numeric columns
- Find the mean length of the
state column
- Find the min, median and max of the
rate column
| conc | rate |
| 0.3121739 | 126.8261 |
| rate.min | rate.median | rate.max |
| 47 | 124 | 207 |
Use of group_by
8. Using the Puromycin data set, together with group_by and
any other operation necessary
- Find the average rate for each
state
- Find the number of treated and untreated states in a new column
count
- Find the number of rows with the same
conc and state in a new
column count and only show rows where the count is an even
number.
- Find the mean and standard deviation of rate for each
state and
conc. Remove any rows with an NA value for the rate standard
deviation.
Hint: group_by is often combined with summarize, and n()
returns the count.
| state | conc | rate |
| treated | 0.3450000 | 141.5833 |
| untreated | 0.2763636 | 110.7273 |
| state | count |
| treated | 12 |
| untreated | 11 |
| conc | state | count |
| 0.02 | treated | 2 |
| 0.02 | untreated | 2 |
| 0.06 | treated | 2 |
| 0.06 | untreated | 2 |
| 0.11 | treated | 2 |
| 0.11 | untreated | 2 |
| 0.22 | treated | 2 |
| 0.22 | untreated | 2 |
| 0.56 | treated | 2 |
| 0.56 | untreated | 2 |
| 1.10 | treated | 2 |
| state | conc | rate.mean | rate.sd |
| treated | 0.02 | 61.5 | 20.506097 |
| treated | 0.06 | 102.0 | 7.071068 |
| treated | 0.11 | 131.0 | 11.313708 |
| treated | 0.22 | 155.5 | 4.949747 |
| treated | 0.56 | 196.0 | 7.071068 |
| treated | 1.10 | 203.5 | 4.949747 |
| untreated | 0.02 | 59.0 | 11.313708 |
| untreated | 0.06 | 85.0 | 1.414214 |
| untreated | 0.11 | 106.5 | 12.020815 |
| untreated | 0.22 | 127.5 | 4.949747 |
| untreated | 0.56 | 151.0 | 9.899495 |
Use of gather
9. Using the iris data set, together with spread and any
other operation necessary
- Create a new data frame
df that has only 3 columns (Species,
Measure, Value) where Measure takes on the values
Sepal.Length, Sepal.Width, Petal.Length or
Petal.Width. Show the first 5 rows.
- Show the mean value and counts for each Species and Measure of
df
| Species | Measure | Value |
| setosa | Sepal.Length | 5.1 |
| setosa | Sepal.Length | 4.9 |
| setosa | Sepal.Length | 4.7 |
| setosa | Sepal.Length | 4.6 |
| setosa | Sepal.Length | 5.0 |
| Species | Measure | mean | count |
| setosa | Petal.Length | 1.462 | 50 |
| setosa | Petal.Width | 0.246 | 50 |
| setosa | Sepal.Length | 5.006 | 50 |
| setosa | Sepal.Width | 3.428 | 50 |
| versicolor | Petal.Length | 4.260 | 50 |
| versicolor | Petal.Width | 1.326 | 50 |
| versicolor | Sepal.Length | 5.936 | 50 |
| versicolor | Sepal.Width | 2.770 | 50 |
| virginica | Petal.Length | 5.552 | 50 |
| virginica | Petal.Width | 2.026 | 50 |
| virginica | Sepal.Length | 6.588 | 50 |
| virginica | Sepal.Width | 2.974 | 50 |
Use of spread
This is the opposite of gather - it takes a key and value column, and
makes new columns out of the keys.
10. Using the df data set, apply spread to
give each different treatement its own column.
| subject | treatment | value |
| 1 | A | 0.1765537 |
| 2 | A | 2.0216923 |
| 3 | A | 0.1613333 |
| 4 | A | 1.0256221 |
| 1 | B | 0.8115127 |
| 2 | B | -1.0966100 |
| 3 | B | -2.0626682 |
| 4 | B | 0.2882717 |
| 1 | C | 1.7624858 |
| 2 | C | 0.5935407 |
| 3 | C | -0.5302786 |
| 4 | C | 0.5366085 |
| subject | A | B | C |
| 1 | 0.1765537 | 0.8115127 | 1.7624858 |
| 2 | 2.0216923 | -1.0966100 | 0.5935407 |
| 3 | 0.1613333 | -2.0626682 | -0.5302786 |
| 4 | 1.0256221 | 0.2882717 | 0.5366085 |
Use of separate
| name | bp |
| 1-A | 102.33299 |
| 2-A | 92.67872 |
| 3-A | 148.08022 |
| 4-A | 156.18250 |
| 1-B | 138.82081 |
| 2-B | 113.52586 |
| 3-B | 152.55624 |
| 4-B | 123.05604 |
| 1-C | 129.22529 |
| 2-C | 131.28523 |
| 3-C | 74.97073 |
| 4-C | 159.81344 |
11. Using the expt data set, together with separate and any
other operation necessary
- Find the average blood pressure for each treatment group (A, B or C).
Note: You are assumed not to have access to the pid and treat
values separate.y.
| treat | mean(bp) |
| A | 123.3292 |
| B | 111.1280 |
| C | 138.1923 |