Practice TWO

Working with dataframes

Practice TWO is meant to help you get comfortable working with dataframes and with the basic ways you can slice and dice them.

How many rows and columns are there in the mtcars dataframe?

In [1]:

(nrow(mtcars))
(ncol(mtcars))

Out[1]:

32

Out[1]:

11

In [2]:

dim(mtcars)

Out[2]:

32
11
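dim() returns both counts at once as a length-2 vector (rows first, then columns), so either number can be pulled out by position. A minimal sketch; the variable names are only illustrative:

```r
# dim() gives c(rows, columns) for a data frame
d <- dim(mtcars)
n.rows <- d[1]  # same value as nrow(mtcars)
n.cols <- d[2]  # same value as ncol(mtcars)
print(d)        # [1] 32 11
```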

Show the last 6 rows of mtcars.

In [3]:

tail(mtcars, 6)

Out[3]:

	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
Porsche 914-2	26	4	120.3	91	4.43	2.14	16.7	0	1	5	2
Lotus Europa	30.4	4	95.1	113	3.77	1.513	16.9	1	1	5	2
Ford Pantera L	15.8	8	351	264	4.22	3.17	14.5	0	1	5	4
Ferrari Dino	19.7	6	145	175	3.62	2.77	15.5	0	1	5	6
Maserati Bora	15	8	301	335	3.54	3.57	14.6	0	1	5	8
Volvo 142E	21.4	4	121	109	4.11	2.78	18.6	1	1	4	2
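tail() is essentially shorthand for indexing the final rows by position. As a sketch, the same six rows can be selected explicitly from the row count:

```r
# The last 6 row positions of mtcars: (n - 5), ..., n
n <- nrow(mtcars)
last6 <- mtcars[(n - 5):n, ]
# Same rows, same order and same row names as tail(mtcars, 6)
print(all.equal(last6, tail(mtcars, 6)))  # TRUE
```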

Show 6 rows at random (no duplicates) from mtcars.

In [4]:

ridx <- sample(1:nrow(mtcars), 6, replace=FALSE)
mtcars[ridx,]

Out[4]:

	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
Cadillac Fleetwood	10.4	8	472	205	2.93	5.25	17.98	0	0	3	4
Dodge Challenger	15.5	8	318	150	2.76	3.52	16.87	0	0	3	2
Valiant	18.1	6	225	105	2.76	3.46	20.22	1	0	3	1
Volvo 142E	21.4	4	121	109	4.11	2.78	18.6	1	1	4	2
Lotus Europa	30.4	4	95.1	113	3.77	1.513	16.9	1	1	5	2
Merc 450SL	17.3	8	275.8	180	3.07	3.73	17.6	0	0	3	3
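Because sample() draws differently on every run, the rows shown above will change each time the cell is executed. A sketch that fixes the random seed (the value 42 is arbitrary) and uses the shorthand sample(n, size), which draws from 1:n without replacement by default:

```r
set.seed(42)                     # arbitrary seed, only to make the draw repeatable
ridx <- sample(nrow(mtcars), 6)  # 6 distinct row positions from 1:32
sampled <- mtcars[ridx, ]
# Sampling without replacement never repeats a row position
print(anyDuplicated(ridx) == 0)  # TRUE
print(rownames(sampled))
```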

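Display information only for the subset of cars with automatic transmission. Note that mtcars codes am as 0 = automatic and 1 = manual, so the filter am == 1 used below actually selects the 13 manual-transmission cars.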

In [5]:

mtcars[mtcars$am == 1,]

Out[5]:

	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
Mazda RX4	21	6	160	110	3.9	2.62	16.46	0	1	4	4
Mazda RX4 Wag	21	6	160	110	3.9	2.875	17.02	0	1	4	4
Datsun 710	22.8	4	108	93	3.85	2.32	18.61	1	1	4	1
Fiat 128	32.4	4	78.7	66	4.08	2.2	19.47	1	1	4	1
Honda Civic	30.4	4	75.7	52	4.93	1.615	18.52	1	1	4	2
Toyota Corolla	33.9	4	71.1	65	4.22	1.835	19.9	1	1	4	1
Fiat X1-9	27.3	4	79	66	4.08	1.935	18.9	1	1	4	1
Porsche 914-2	26	4	120.3	91	4.43	2.14	16.7	0	1	5	2
Lotus Europa	30.4	4	95.1	113	3.77	1.513	16.9	1	1	5	2
Ford Pantera L	15.8	8	351	264	4.22	3.17	14.5	0	1	5	4
Ferrari Dino	19.7	6	145	175	3.62	2.77	15.5	0	1	5	6
Maserati Bora	15	8	301	335	3.54	3.57	14.6	0	1	5	8
Volvo 142E	21.4	4	121	109	4.11	2.78	18.6	1	1	4	2
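In mtcars the am column is coded 0 = automatic and 1 = manual, so selecting the automatic cars that the question asks for means comparing against 0. A minimal sketch, with a count of each group as a sanity check:

```r
# am: 0 = automatic, 1 = manual
automatic <- mtcars[mtcars$am == 0, ]
print(nrow(automatic))   # 19 automatic cars
print(table(mtcars$am))  # 0: 19, 1: 13
```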

Display information only for the subset of cars with weight (wt, in 1000 lbs) strictly between 2 and 3.

In [6]:

mtcars[(2 < mtcars$wt) & (mtcars$wt < 3),]

Out[6]:

	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
Mazda RX4	21	6	160	110	3.9	2.62	16.46	0	1	4	4
Mazda RX4 Wag	21	6	160	110	3.9	2.875	17.02	0	1	4	4
Datsun 710	22.8	4	108	93	3.85	2.32	18.61	1	1	4	1
Fiat 128	32.4	4	78.7	66	4.08	2.2	19.47	1	1	4	1
Toyota Corona	21.5	4	120.1	97	3.7	2.465	20.01	1	0	3	1
Porsche 914-2	26	4	120.3	91	4.43	2.14	16.7	0	1	5	2
Ferrari Dino	19.7	6	145	175	3.62	2.77	15.5	0	1	5	6
Volvo 142E	21.4	4	121	109	4.11	2.78	18.6	1	1	4	2
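The same filter can be written with subset(), which evaluates the condition inside the data frame so the columns do not need the mtcars$ prefix. A minimal sketch:

```r
# Cars with weight strictly between 2 and 3 (thousand lbs)
light <- subset(mtcars, wt > 2 & wt < 3)
print(rownames(light))
print(nrow(light))  # 8
```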

What is the mean weight of all cars?

In [7]:

mean(mtcars$wt)

Out[7]:

3.21725

What is the mean weight of cars with mpg greater than 20?

In [9]:

mean(mtcars[mtcars$mpg > 20, "wt"])

Out[9]:

2.41807142857143
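An equivalent one-liner uses with(), which lets the expression refer to columns by bare name; indexing the wt vector with the logical condition on mpg picks out the same 14 cars. A sketch:

```r
# Mean weight (1000 lbs) of cars whose mpg exceeds 20
m <- with(mtcars, mean(wt[mpg > 20]))
print(m)  # same value as the cell above
```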

Add a column kpl giving fuel economy in kilometers per liter (1 mile = 1.609344 kilometers and 1 US gallon = 3.78541178 liters).

In [10]:

mpg.to.kpl <- function(mpg) {
    # miles per US gallon -> kilometers per liter
    return(mpg * 1.609344 / 3.78541178)
}
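Since the conversion is a single multiplication, every kpl value is the mpg value times 1.609344 / 3.78541178, roughly 0.4251. A quick check against the Mazda RX4 row (21 mpg), redefining the function so the snippet runs on its own:

```r
mpg.to.kpl <- function(mpg) mpg * 1.609344 / 3.78541178
conv <- mpg.to.kpl(1)            # km per liter corresponding to 1 mpg
print(round(conv, 4))            # 0.4251
print(round(mpg.to.kpl(21), 6))  # 8.928018
```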

In [11]:

mtcars$kpl <- mpg.to.kpl(mtcars$mpg)
head(mtcars)

Out[11]:

	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb	kpl
Mazda RX4	21	6	160	110	3.9	2.62	16.46	0	1	4	4	8.928018
Mazda RX4 Wag	21	6	160	110	3.9	2.875	17.02	0	1	4	4	8.928018
Datsun 710	22.8	4	108	93	3.85	2.32	18.61	1	1	4	1	9.693277
Hornet 4 Drive	21.4	6	258	110	3.08	3.215	19.44	1	0	3	1	9.098075
Hornet Sportabout	18.7	8	360	175	3.15	3.44	17.02	0	0	3	2	7.950187
Valiant	18.1	6	225	105	2.76	3.46	20.22	1	0	3	1	7.695101

Make a new dataframe mtcars.1 with only the mpg and kpl columns.

In [12]:

mtcars.1 <- mtcars[, c("mpg", "kpl")]
head(mtcars.1)

Out[12]:

	mpg	kpl
Mazda RX4	21	8.928018
Mazda RX4 Wag	21	8.928018
Datsun 710	22.8	9.693277
Hornet 4 Drive	21.4	9.098075
Hornet Sportabout	18.7	7.950187
Valiant	18.1	7.695101
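Selecting columns with [ , ] behaves differently when only one column is requested: R simplifies the result to a plain vector unless drop = FALSE is passed. A small sketch of the difference:

```r
one.vec <- mtcars[, "mpg"]                # simplified to a numeric vector
one.df  <- mtcars[, "mpg", drop = FALSE]  # stays a one-column data frame
two.df  <- mtcars[, c("mpg", "wt")]       # two or more columns: always a data frame
print(class(one.vec))  # "numeric"
print(class(one.df))   # "data.frame"
```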

Fit a linear regression model of mpg against wt. Plot the model fit.

In [13]:

fit <- lm(mpg ~ wt, data=mtcars)
summary(fit)

Out[13]:

Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5432 -2.3647 -0.1252  1.4096  6.8727

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528,        Adjusted R-squared:  0.7446
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

In [14]:

plot(mtcars$wt, mtcars$mpg, col=rgb(0,0,1,0.5), pch=16, cex=2.0,
     xlab="Weight (1000 lbs)", ylab="Miles per gallon",
     main="Linear regression of MPG against wt")
abline(fit, col="red", lwd=2)
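Reading the coefficients from the summary above, the fitted line is mpg = 37.285 - 5.345 * wt, i.e. roughly 5.3 fewer miles per gallon for each additional 1000 lbs. A sketch of using the model for a prediction at an illustrative weight of 3 (3000 lbs):

```r
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))  # (Intercept) about 37.285, wt about -5.345
# Predicted mpg for a car weighing 3000 lbs (wt = 3)
pred <- predict(fit, newdata = data.frame(wt = 3))
print(pred)       # about 21.25
```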

Print 10 rows at random (no duplicates) from the iris dataframe.

In [15]:

ridx <- sample(1:nrow(iris), 10, replace = FALSE)
iris[ridx,]

Out[15]:

	Sepal.Length	Sepal.Width	Petal.Length	Petal.Width	Species
103	7.1	3	5.9	2.1	virginica
117	6.5	3	5.5	1.8	virginica
60	5.2	2.7	3.9	1.4	versicolor
131	7.4	2.8	6.1	1.9	virginica
78	6.7	3	5	1.7	versicolor
137	6.3	3.4	5.6	2.4	virginica
66	6.7	3.1	4.4	1.4	versicolor
115	5.8	2.8	5.1	2.4	virginica
7	4.6	3.4	1.4	0.3	setosa
116	6.4	3.2	5.3	2.3	virginica

Find the mean Sepal.Length, Sepal.Width, Petal.Length and Petal.Width for each iris species using the aggregate function.

In [16]:

aggregate(iris[,1:4], by=list(iris$Species), FUN=mean)

Out[16]:

	Group.1	Sepal.Length	Sepal.Width	Petal.Length	Petal.Width
1	setosa	5.006	3.428	1.462	0.246
2	versicolor	5.936	2.77	4.26	1.326
3	virginica	6.588	2.974	5.552	2.026
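aggregate() also has a formula interface; with . ~ Species it averages every other column by species and names the grouping column Species instead of Group.1. A minimal sketch:

```r
# Mean of each measurement column, grouped by Species
species.means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species.means)
```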
