Practice TWO

Working with dataframes

Practice TWO is meant to help you get comfortable working with dataframes and the basic ways you can slice and dice them.

  1. How many rows and columns are there in the mtcars dataframe?
In [1]:
(nrow(mtcars))
(ncol(mtcars))
Out[1]:
32
Out[1]:
11
In [2]:
dim(mtcars)
Out[2]:
32 11
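`dim()` returns both counts at once as a length-2 integer vector (rows first, then columns), so the two answers above can be unpacked from a single call. A minimal sketch; the names `d`, `n_rows` and `n_cols` are only illustrative:

```r
# dim() gives c(rows, columns) for a data frame
d <- dim(mtcars)
n_rows <- d[1]  # same value as nrow(mtcars)
n_cols <- d[2]  # same value as ncol(mtcars)
cat("mtcars has", n_rows, "rows and", n_cols, "columns\n")
```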
  2. Show the last 6 rows of mtcars.
In [3]:
tail(mtcars, 6)
Out[3]:
                mpg cyl  disp  hp drat    wt qsec vs am gear carb
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2
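`tail()` is shorthand for positional indexing on the last rows. The sketch below takes the same six rows with an explicit index range, which can be adapted when you need rows from the middle of the frame instead; `last6` is an illustrative name:

```r
# Indices of the last 6 rows: (n - 5), ..., n
n <- nrow(mtcars)
last6 <- mtcars[(n - 5):n, ]
# Should be the same rows, in the same order, as tail(mtcars, 6)
print(identical(rownames(last6), rownames(tail(mtcars, 6))))
```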
  3. Show 6 rows at random (no duplicates) from mtcars.
In [4]:
ridx <- sample(1:nrow(mtcars), 6, replace=FALSE)
mtcars[ridx,]
Out[4]:
                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Cadillac Fleetwood 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Dodge Challenger   15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
Valiant            18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Volvo 142E         21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
Lotus Europa       30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Merc 450SL         17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
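Because `sample()` draws a new permutation on every run, the six rows above will differ each time the cell is executed. If you need a reproducible draw (for a report or for grading), one option is to fix the random seed first; the seed value `42` and the name `rand6` are arbitrary choices for this sketch:

```r
set.seed(42)  # any fixed integer makes the draw repeatable
# sample(n, k) with a single number n draws k values from 1:n, without replacement by default
ridx <- sample(nrow(mtcars), 6)
rand6 <- mtcars[ridx, ]
print(rand6)
```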
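  4. Display information only for the subset of cars with automatic transmission. Note that in mtcars, `am` is coded 0 = automatic and 1 = manual, so the filter `am == 1` used below actually selects the manual cars; the automatic subset is `mtcars[mtcars$am == 0,]`.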
In [5]:
mtcars[mtcars$am == 1,]
Out[5]:
                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
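Since `am` is coded 0 = automatic and 1 = manual (see `?mtcars`), the automatic-transmission subset the question asks for uses `am == 0`. A sketch using `subset()`, which evaluates the condition inside the data frame so the `mtcars$` prefix can be dropped; `automatic` is an illustrative name:

```r
# am: 0 = automatic, 1 = manual
automatic <- subset(mtcars, am == 0)
print(nrow(automatic))  # number of automatic cars
print(head(automatic))
```

Together with the manual cars shown above, the two subsets should account for every row of mtcars.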
  5. Display information only for the subset of cars with weight between 2 and 3 (`wt` is measured in units of 1000 lbs).
In [6]:
mtcars[(2 < mtcars$wt) & (mtcars$wt < 3),]
Out[6]:
               mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4     21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710    22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Fiat 128      32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Porsche 914-2 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Ferrari Dino  19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Volvo 142E    21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
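The condition above uses strict inequalities, so a car weighing exactly 2.000 or 3.000 (thousand lbs) would be excluded. A sketch of the same filter with `subset()`, plus an inclusive variant using `>=` and `<=`; in the built-in data no car sits exactly on either boundary, so both should return the same 8 rows:

```r
# Strict bounds: 2 < wt < 3 (wt is in units of 1000 lbs)
strict <- subset(mtcars, wt > 2 & wt < 3)
# Inclusive bounds: 2 <= wt <= 3
inclusive <- subset(mtcars, wt >= 2 & wt <= 3)
print(c(strict = nrow(strict), inclusive = nrow(inclusive)))
```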
  6. What is the mean weight of all cars?
In [7]:
mean(mtcars$wt)
Out[7]:
3.21725
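`mean()` works on one column at a time. To get the means of several columns in one call, `colMeans()` can be used; the sketch below picks `mpg` and `wt` as an example. Because `wt` is measured in 1000 lbs, the value 3.21725 corresponds to about 3,217 lbs:

```r
# Means of several numeric columns at once
means <- colMeans(mtcars[, c("mpg", "wt")])
print(means)
# wt is in units of 1000 lbs; convert the mean to pounds
print(means[["wt"]] * 1000)
```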
  7. What is the mean weight of cars with `mpg` greater than 20?
In [9]:
mean(mtcars[mtcars$mpg > 20, "wt"])
Out[9]:
2.41807142857143
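The bracket expression above first selects the matching rows and then the `wt` column. An equivalent, often easier-to-read form indexes the `wt` vector directly with a logical vector; `tapply()` goes one step further and returns the mean weight for both groups (`mpg > 20` FALSE and TRUE) at once. The names `m_high` and `by_group` are illustrative:

```r
# Mean weight of cars with mpg > 20: index the wt vector with a logical vector
m_high <- with(mtcars, mean(wt[mpg > 20]))
print(m_high)
# Mean weight for both groups at once (FALSE = mpg <= 20, TRUE = mpg > 20)
by_group <- tapply(mtcars$wt, mtcars$mpg > 20, mean)
print(by_group)
```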
  8. Add a column kpl showing the number of kilometers per liter (1 mile = 1.609344 kilometers, and 1 gallon = 3.78541178 liters).
In [10]:
mpg.to.kpl <- function(mpg) {
    # miles/gallon -> kilometers/liter: multiply by km per mile, divide by liters per gallon
    return(mpg * 1.609344 / 3.78541178)
}
In [11]:
mtcars$kpl <- mpg.to.kpl(mtcars$mpg)
head(mtcars)
Out[11]:
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb      kpl
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 8.928018
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 8.928018
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 9.693277
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 9.098075
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 7.950187
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 7.695101
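`mpg.to.kpl()` multiplies by kilometers per mile and divides by liters per gallon, so it scales `mpg` by a single constant of about 0.425. A sketch of an alternative that adds the column without a helper function, using `transform()`, which returns a modified copy and leaves `mtcars` itself untouched; `mtcars_kpl` and the two constant names are illustrative:

```r
# km per liter = miles per gallon * (km per mile) / (liters per gallon)
km_per_mile <- 1.609344
liters_per_gallon <- 3.78541178
mtcars_kpl <- transform(mtcars, kpl = mpg * km_per_mile / liters_per_gallon)
print(head(mtcars_kpl[, c("mpg", "kpl")]))
```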
  9. Make a new dataframe mtcars.1 with only the mpg and kpl columns.
In [12]:
mtcars.1 <- mtcars[, c("mpg", "kpl")]
head(mtcars.1)
Out[12]:
                   mpg      kpl
Mazda RX4         21.0 8.928018
Mazda RX4 Wag     21.0 8.928018
Datsun 710        22.8 9.693277
Hornet 4 Drive    21.4 9.098075
Hornet Sportabout 18.7 7.950187
Valiant           18.1 7.695101
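Selecting columns by a character vector of names, as in `mtcars.1` above, keeps the result a data frame. With a single name, however, `[ , ]` simplifies the result to a plain vector unless `drop = FALSE` is given, which is a common source of surprises. A sketch (using `mpg` and `wt` so it runs on the unmodified dataset):

```r
# Two columns: the result is still a data frame
two_cols <- mtcars[, c("mpg", "wt")]
# One column: simplified to a numeric vector by default
one_vec <- mtcars[, "mpg"]
# One column kept as a one-column data frame
one_df <- mtcars[, "mpg", drop = FALSE]
print(class(two_cols))
print(class(one_vec))
print(class(one_df))
```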
  10. Fit a linear regression model of mpg against wt, and plot the model fit.
In [13]:
fit <- lm(mpg ~ wt, data=mtcars)
summary(fit)
Out[13]:

Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5432 -2.3647 -0.1252  1.4096  6.8727

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528,        Adjusted R-squared:  0.7446
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

In [14]:
plot(mtcars$wt, mtcars$mpg, col=rgb(0,0,1,0.5), pch=16, cex=2.0,
     xlab="Weight (1000 lbs)", ylab="Miles per gallon",
     main="Linear regression of MPG against wt")
abline(fit, col="red", lwd=2)
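Once the model is fitted, the numbers in the summary can also be extracted programmatically. A sketch using `coef()`, `summary(fit)$r.squared`, and `predict()`; the weight of 3 (i.e. 3,000 lbs) is only an example input, and `b` and `pred` are illustrative names:

```r
fit <- lm(mpg ~ wt, data = mtcars)
# Intercept and slope (about 37.29 and -5.34, as in the summary)
b <- coef(fit)
print(b)
# Proportion of the variance in mpg explained by wt
print(summary(fit)$r.squared)
# Predicted mpg for a car weighing 3 (thousand lbs)
pred <- predict(fit, newdata = data.frame(wt = 3))
print(pred)
```

For a single predictor, the prediction is just intercept + slope * wt, which is a quick way to sanity-check the output.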
  11. Print 10 rows at random from the iris dataframe.
In [15]:
ridx <- sample(1:nrow(iris), 10, replace = FALSE)
iris[ridx,]
Out[15]:
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
103          7.1         3.0          5.9         2.1  virginica
117          6.5         3.0          5.5         1.8  virginica
60           5.2         2.7          3.9         1.4 versicolor
131          7.4         2.8          6.1         1.9  virginica
78           6.7         3.0          5.0         1.7 versicolor
137          6.3         3.4          5.6         2.4  virginica
66           6.7         3.1          4.4         1.4 versicolor
115          5.8         2.8          5.1         2.4  virginica
7            4.6         3.4          1.4         0.3     setosa
116          6.4         3.2          5.3         2.3  virginica
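The same two-step pattern (build a random index with `sample()`, then index the rows) was used for both mtcars and iris, so it could be wrapped in a small helper. `sample_rows()` below is a hypothetical name introduced for illustration, not a base R function:

```r
# Hypothetical helper: draw n distinct rows at random from a data frame
sample_rows <- function(df, n) {
  stopifnot(n <= nrow(df))  # cannot draw more distinct rows than exist
  df[sample(nrow(df), n), , drop = FALSE]
}
set.seed(1)  # optional: makes the draw reproducible
iris10 <- sample_rows(iris, 10)
print(iris10)
```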
  12. Find the mean Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width for each iris species using the `aggregate` function.
In [16]:
aggregate(iris[,1:4], by=list(iris$Species), FUN=mean)
Out[16]:
     Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026
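`aggregate()` also has a formula interface. Writing `. ~ Species` means "every other column, grouped by `Species`", and it names the grouping column `Species` instead of `Group.1`. A sketch; `species_means` is an illustrative name:

```r
# '.' on the left-hand side stands for all columns not used on the right
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)
```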