Practice TWO
Working with dataframes
Practice TWO is meant to help you get comfortable working with dataframes and the basic ways you can slice and dice them.
- How many rows and columns are there in the `mtcars` dataframe?
In [1]:
(nrow(mtcars))
(ncol(mtcars))
Out[1]:
32
Out[1]:
11
In [2]:
dim(mtcars)
Out[2]:
- 32
- 11
- Show the last 6 rows of `mtcars`.
In [3]:
tail(mtcars, 6)
Out[3]:
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
---|---|---|---|---|---|---|---|---|---|---|---|
Porsche 914-2 | 26 | 4 | 120.3 | 91 | 4.43 | 2.14 | 16.7 | 0 | 1 | 5 | 2 |
Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.9 | 1 | 1 | 5 | 2 |
Ford Pantera L | 15.8 | 8 | 351 | 264 | 4.22 | 3.17 | 14.5 | 0 | 1 | 5 | 4 |
Ferrari Dino | 19.7 | 6 | 145 | 175 | 3.62 | 2.77 | 15.5 | 0 | 1 | 5 | 6 |
Maserati Bora | 15 | 8 | 301 | 335 | 3.54 | 3.57 | 14.6 | 0 | 1 | 5 | 8 |
Volvo 142E | 21.4 | 4 | 121 | 109 | 4.11 | 2.78 | 18.6 | 1 | 1 | 4 | 2 |
- Show 6 rows at random (no duplicates) from `mtcars`.
In [4]:
ridx <- sample(1:nrow(mtcars), 6, replace=FALSE)
mtcars[ridx,]
Out[4]:
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
---|---|---|---|---|---|---|---|---|---|---|---|
Cadillac Fleetwood | 10.4 | 8 | 472 | 205 | 2.93 | 5.25 | 17.98 | 0 | 0 | 3 | 4 |
Dodge Challenger | 15.5 | 8 | 318 | 150 | 2.76 | 3.52 | 16.87 | 0 | 0 | 3 | 2 |
Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.46 | 20.22 | 1 | 0 | 3 | 1 |
Volvo 142E | 21.4 | 4 | 121 | 109 | 4.11 | 2.78 | 18.6 | 1 | 1 | 4 | 2 |
Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.9 | 1 | 1 | 5 | 2 |
Merc 450SL | 17.3 | 8 | 275.8 | 180 | 3.07 | 3.73 | 17.6 | 0 | 0 | 3 | 3 |
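Because `sample` draws at random, the rows shown above will differ from run to run. A minimal sketch of how to make the draw repeatable, assuming base R's `set.seed`; the seed value 42 and the names `first_draw` and `second_draw` are arbitrary choices for illustration:

```r
# Fix the random seed so the same 6 rows are drawn on every run.
# The seed value 42 is arbitrary.
set.seed(42)
ridx <- sample(nrow(mtcars), 6)   # sample(n, k) draws k values from 1:n without replacement
first_draw <- mtcars[ridx, ]

# Resetting the seed and sampling again reproduces the same rows
set.seed(42)
second_draw <- mtcars[sample(nrow(mtcars), 6), ]

identical(first_draw, second_draw)
```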
- Display information only for the subset of cars with manual transmission (`am == 1`; in `mtcars`, `am` is coded 0 = automatic, 1 = manual).
In [5]:
mtcars[mtcars$am == 1,]
Out[5]:
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
---|---|---|---|---|---|---|---|---|---|---|---|
Mazda RX4 | 21 | 6 | 160 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
Mazda RX4 Wag | 21 | 6 | 160 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
Fiat 128 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.2 | 19.47 | 1 | 1 | 4 | 1 |
Honda Civic | 30.4 | 4 | 75.7 | 52 | 4.93 | 1.615 | 18.52 | 1 | 1 | 4 | 2 |
Toyota Corolla | 33.9 | 4 | 71.1 | 65 | 4.22 | 1.835 | 19.9 | 1 | 1 | 4 | 1 |
Fiat X1-9 | 27.3 | 4 | 79 | 66 | 4.08 | 1.935 | 18.9 | 1 | 1 | 4 | 1 |
Porsche 914-2 | 26 | 4 | 120.3 | 91 | 4.43 | 2.14 | 16.7 | 0 | 1 | 5 | 2 |
Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.9 | 1 | 1 | 5 | 2 |
Ford Pantera L | 15.8 | 8 | 351 | 264 | 4.22 | 3.17 | 14.5 | 0 | 1 | 5 | 4 |
Ferrari Dino | 19.7 | 6 | 145 | 175 | 3.62 | 2.77 | 15.5 | 0 | 1 | 5 | 6 |
Maserati Bora | 15 | 8 | 301 | 335 | 3.54 | 3.57 | 14.6 | 0 | 1 | 5 | 8 |
Volvo 142E | 21.4 | 4 | 121 | 109 | 4.11 | 2.78 | 18.6 | 1 | 1 | 4 | 2 |
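The `am` column is stored as a number, which makes it easy to mix up the two transmission types. According to the `mtcars` help page (`?mtcars`), 0 means automatic and 1 means manual. A small sketch that attaches readable labels and counts each type; `transmission` and `counts` are names introduced here for illustration:

```r
# Label the numeric am codes (0 = automatic, 1 = manual, per ?mtcars).
# transmission is a new vector; mtcars itself is left unchanged.
transmission <- factor(mtcars$am, levels = c(0, 1),
                       labels = c("automatic", "manual"))

# Count the cars of each type
counts <- table(transmission)
counts
```

The count for `am == 1` should match the 13 rows listed in the output above.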
- Display information only for the subset of cars with weight (`wt`, in units of 1000 lbs) strictly between 2 and 3.
In [6]:
mtcars[(2 < mtcars$wt) & (mtcars$wt < 3),]
Out[6]:
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
---|---|---|---|---|---|---|---|---|---|---|---|
Mazda RX4 | 21 | 6 | 160 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
Mazda RX4 Wag | 21 | 6 | 160 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
Fiat 128 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.2 | 19.47 | 1 | 1 | 4 | 1 |
Toyota Corona | 21.5 | 4 | 120.1 | 97 | 3.7 | 2.465 | 20.01 | 1 | 0 | 3 | 1 |
Porsche 914-2 | 26 | 4 | 120.3 | 91 | 4.43 | 2.14 | 16.7 | 0 | 1 | 5 | 2 |
Ferrari Dino | 19.7 | 6 | 145 | 175 | 3.62 | 2.77 | 15.5 | 0 | 1 | 5 | 6 |
Volvo 142E | 21.4 | 4 | 121 | 109 | 4.11 | 2.78 | 18.6 | 1 | 1 | 4 | 2 |
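The same filter can also be written with base R's `subset`, which some readers find easier to scan. This is only an alternative sketch, not a replacement for the bracket form; `by_bracket` and `by_subset` are illustrative names:

```r
# Bracket indexing, as in the cell above
by_bracket <- mtcars[(2 < mtcars$wt) & (mtcars$wt < 3), ]

# subset() evaluates the condition inside the dataframe,
# so columns can be named without the mtcars$ prefix
by_subset <- subset(mtcars, wt > 2 & wt < 3)

# Both approaches select the same rows
all.equal(by_bracket, by_subset)
```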
- What is the mean weight of all cars?
In [7]:
mean(mtcars$wt)
Out[7]:
3.21725
- What is the mean weight of cars with `mpg` greater than 20?
In [9]:
mean(mtcars[mtcars$mpg > 20, "wt"])
Out[9]:
2.41807142857143
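To see the filtered mean next to the mean for the remaining cars, one option is `tapply`, which splits a vector by a grouping value and applies a function to each piece. A minimal sketch; `wt_by_mpg` is a name introduced here:

```r
# Split wt by whether mpg exceeds 20, then take the mean of each group.
# The result has one entry named "FALSE" and one named "TRUE".
wt_by_mpg <- tapply(mtcars$wt, mtcars$mpg > 20, mean)
wt_by_mpg
```

The `TRUE` entry should reproduce the value computed above (about 2.418).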
- Add a column `kpl` showing the number of kilometers per liter (1 mile = 1.609344 kilometers, and 1 gallon = 3.78541178 liters).
In [10]:
# Convert miles per gallon to kilometers per liter
mpg.to.kpl <- function(mpg) {
    return(mpg * 1.609344 / 3.78541178)
}
In [11]:
mtcars$kpl <- mpg.to.kpl(mtcars$mpg)
head(mtcars)
Out[11]:
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb | kpl |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Mazda RX4 | 21 | 6 | 160 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 | 8.928018 |
Mazda RX4 Wag | 21 | 6 | 160 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 | 8.928018 |
Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 | 9.693277 |
Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 | 9.098075 |
Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 | 7.950187 |
Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.46 | 20.22 | 1 | 0 | 3 | 1 | 7.695101 |
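As a quick check on the arithmetic, the two constants collapse into a single factor of roughly 0.425 km/L per mpg. The sketch below recomputes the first row of the table by hand; `mpg_to_kpl_factor` and the inverse helper `kpl_to_mpg` are hypothetical names that are not part of the original notebook:

```r
# Conversion constants quoted in the exercise
km_per_mile       <- 1.609344
liters_per_gallon <- 3.78541178

# Multiplying mpg by this single factor gives kilometers per liter
mpg_to_kpl_factor <- km_per_mile / liters_per_gallon

# Hypothetical inverse helper, useful for a round-trip check
kpl_to_mpg <- function(kpl) kpl / mpg_to_kpl_factor

kpl <- 21 * mpg_to_kpl_factor   # Mazda RX4: 21 mpg
round(kpl, 6)                   # 8.928018, matching the kpl column above
kpl_to_mpg(kpl)                 # converts back to 21 mpg
```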
- Make a new dataframe `mtcars.1` with only the `mpg` and `kpl` columns.
In [12]:
mtcars.1 <- mtcars[, c("mpg", "kpl")]
head(mtcars.1)
Out[12]:
| | mpg | kpl |
---|---|---|
Mazda RX4 | 21 | 8.928018 |
Mazda RX4 Wag | 21 | 8.928018 |
Datsun 710 | 22.8 | 9.693277 |
Hornet 4 Drive | 21.4 | 9.098075 |
Hornet Sportabout | 18.7 | 7.950187 |
Valiant | 18.1 | 7.695101 |
- Fit a linear regression model of `mpg` against `wt`. Plot the model fit.
In [13]:
fit <- lm(mpg ~ wt, data=mtcars)
summary(fit)
Out[13]:
Call:
lm(formula = mpg ~ wt, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.5432 -2.3647 -0.1252 1.4096 6.8727
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.2851 1.8776 19.858 < 2e-16 ***
wt -5.3445 0.5591 -9.559 1.29e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10
In [14]:
# Scatter plot of the data with semi-transparent blue points
plot(mtcars$wt, mtcars$mpg, col=rgb(0,0,1,0.5), pch=16, cex=2.0,
     xlab="Weight", ylab="Miles per gallon",
     main="Linear regression of MPG against wt")
# Overlay the fitted regression line
abline(fit, col="red", lwd=2)
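The fitted line can also be read numerically. With the rounded coefficients from the summary, a car with `wt = 3` is predicted to get about 37.2851 - 5.3445 × 3 ≈ 21.25 mpg. A minimal sketch using `predict`; the weights in `new_cars` are hypothetical values chosen for illustration, and the model is refit here so the block runs on its own:

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Hypothetical new cars with wt = 2, 3 and 4 (units of 1000 lbs)
new_cars <- data.frame(wt = c(2, 3, 4))

# predict() applies intercept + slope * wt to each row
pred <- predict(fit, newdata = new_cars)

# The same numbers computed by hand from the coefficients
by_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["wt"]] * new_cars$wt

all.equal(unname(pred), by_hand)
```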
- Print 10 rows at random from the `iris` dataframe.
In [15]:
ridx <- sample(1:nrow(iris), 10, replace = FALSE)
iris[ridx,]
Out[15]:
| | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|---|
103 | 7.1 | 3 | 5.9 | 2.1 | virginica |
117 | 6.5 | 3 | 5.5 | 1.8 | virginica |
60 | 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
131 | 7.4 | 2.8 | 6.1 | 1.9 | virginica |
78 | 6.7 | 3 | 5 | 1.7 | versicolor |
137 | 6.3 | 3.4 | 5.6 | 2.4 | virginica |
66 | 6.7 | 3.1 | 4.4 | 1.4 | versicolor |
115 | 5.8 | 2.8 | 5.1 | 2.4 | virginica |
7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
116 | 6.4 | 3.2 | 5.3 | 2.3 | virginica |
- Find the mean `Sepal.Length`, `Sepal.Width`, `Petal.Length`, and `Petal.Width` for each iris species using the `aggregate` command.
In [16]:
aggregate(iris[,1:4], by=list(iris$Species), FUN=mean)
Out[16]:
| | Group.1 | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width |
---|---|---|---|---|---|
1 | setosa | 5.006 | 3.428 | 1.462 | 0.246 |
2 | versicolor | 5.936 | 2.77 | 4.26 | 1.326 |
3 | virginica | 6.588 | 2.974 | 5.552 | 2.026 |
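`aggregate` also has a formula interface, which labels the grouping column with the variable name instead of `Group.1`. This is one possible alternative to the call above, sketched with the illustrative name `species_means`:

```r
# "." stands for every column other than the grouping variable Species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
species_means
```

The means are the same as in the table above; only the name of the first column changes.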