Working with Data¶
Scalars¶
In [1]:
n <- 3.14
s <- 'c'
b <- TRUE
In [2]:
typeof(n)
Out[2]:
'double'
In [3]:
typeof(s)
Out[3]:
'character'
In [4]:
typeof(b)
Out[4]:
'logical'
Vectors¶
Vectors are 1D collections of the same scalar type.
In [5]:
xs <- c(1, 0.5, 0.25)
ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')
bs <- c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)
In [6]:
xs
Out[6]:
- 1
- 0.5
- 0.25
In [7]:
ss
Out[7]:
- 'G'
- 'A'
- 'T'
- 'T'
- 'A'
- 'C'
- 'A'
In [8]:
bs
Out[8]:
- TRUE
- TRUE
- FALSE
- FALSE
- TRUE
- TRUE
- FALSE
- FALSE
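Because a vector holds a single scalar type, combining values of different types with c() coerces them to the most general type present. The short sketch below illustrates this; the names mixed_num and mixed_chr are made up for the example and do not appear elsewhere in this notebook.

```r
# Mixing types in c() coerces every element to one common type.
mixed_num <- c(1, TRUE, FALSE)   # logicals become doubles (1 and 0)
mixed_chr <- c(1, 'a', TRUE)     # everything becomes a character string

typeof(mixed_num)   # "double"
typeof(mixed_chr)   # "character"
mixed_chr           # "1" "a" "TRUE"
```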
Extracting elements with a logical condition¶
In [12]:
ss[ss %in% c('A', 'T')]
Out[12]:
- 'A'
- 'T'
- 'T'
- 'A'
- 'A'
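It can help to look at the logical condition on its own before using it as an index. The sketch below stores the condition in a variable (mask is a name chosen for this example) and shows a few things that can be done with it.

```r
ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')

# The condition is itself a logical vector, one value per element of ss.
mask <- ss %in% c('A', 'T')
mask          # FALSE TRUE TRUE TRUE TRUE FALSE TRUE

which(mask)   # positions where the condition holds: 2 3 4 5 7
sum(mask)     # number of matches (TRUE counts as 1): 5
ss[!mask]     # negate the condition to keep the rest: "G" "C"
```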
Matrices and Arrays¶
Like vectors, but in 2D (matrices) or more dimensions (arrays).
In [13]:
m <- matrix(1:12, ncol=4)
In [14]:
m
Out[14]:
1 | 4 | 7 | 10 |
2 | 5 | 8 | 11 |
3 | 6 | 9 | 12 |
In [15]:
m[6:10]
Out[15]:
- 6
- 7
- 8
- 9
- 10
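Indexing a matrix with a single vector, as in m[6:10], treats it as one long vector stored column by column. The sketch below makes that ordering visible; m_byrow is a name introduced only for this illustration.

```r
m <- matrix(1:12, ncol = 4)

# R fills (and stores) matrices column by column.
as.vector(m)    # 1 2 3 ... 12
m[6]            # 6, the same element as m[3, 2]
m[3, 2]         # 6

# Filling by row changes the layout, so the same linear index
# picks out different values.
m_byrow <- matrix(1:12, ncol = 4, byrow = TRUE)
m_byrow[1, ]    # 1 2 3 4
m_byrow[6:10]   # 10 3 7 11 4
```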
In [16]:
m[m < 10]
Out[16]:
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
In [17]:
m[2,]
Out[17]:
- 2
- 5
- 8
- 11
In [18]:
m[,2]
Out[18]:
- 4
- 5
- 6
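Note that extracting a single row or column returns a plain vector rather than a matrix. As a minimal sketch (the variable names row2 and row2_m are made up here), the drop = FALSE argument keeps the result two-dimensional, and both indices can be given at once to pick a single element or a block.

```r
m <- matrix(1:12, ncol = 4)
dim(m)                            # 3 4

row2 <- m[2, ]                    # a plain vector: dimensions are dropped
is.null(dim(row2))                # TRUE

row2_m <- m[2, , drop = FALSE]    # still a matrix with one row
dim(row2_m)                       # 1 4

m[2, 3]                           # single element: 8
m[1:2, c(1, 4)]                   # 2 x 2 block from rows 1-2, columns 1 and 4
```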
Work!¶
Try to solve the following problems without searching the web. You can use the built-in help() function.
Create the following \(3 \times 3\) matrix and save it in a variable called A.
- Row 1 = 4, 5, 6
- Row 2 = 1, 2, 3
- Row 3 = 7, 8, 9
In [ ]:
What is the sum of all the numbers in A?
In [ ]:
Create a vector of the column sums in A using the colSums function.
In [ ]:
Create a vector of the row sums in A using the apply function.
In [ ]:
What is the sum of the numbers in the bottom-right \(2 \times 2\) block (i.e., the numbers 2, 3, 8, 9)?
In [ ]:
Lists¶
In [19]:
ls <- list(dna=ss, ispurine=ss %in% c('A', 'G'))
In [20]:
ls
Out[20]:
- $dna
- 'G'
- 'A'
- 'T'
- 'T'
- 'A'
- 'C'
- 'A'
- $ispurine
- TRUE
- TRUE
- FALSE
- FALSE
- TRUE
- FALSE
- TRUE
Extracting a sublist from a list¶
In [21]:
ls[1]
Out[21]:
- $dna
- 'G'
- 'A'
- 'T'
- 'T'
- 'A'
- 'C'
- 'A'
In [22]:
class(ls[1])
Out[22]:
'list'
Extracting an element from a list¶
In [23]:
ls$dna
Out[23]:
- 'G'
- 'A'
- 'T'
- 'T'
- 'A'
- 'C'
- 'A'
In [24]:
class(ls$dna)
Out[24]:
'character'
In [25]:
ls[[1]]
Out[25]:
- 'G'
- 'A'
- 'T'
- 'T'
- 'A'
- 'C'
- 'A'
In [26]:
class(ls[[1]])
Out[26]:
'character'
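The difference between single and double brackets is easy to miss, so here is a compact side-by-side sketch. It rebuilds the same list as above; sub and elem are names introduced for this example.

```r
ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')
ls <- list(dna = ss, ispurine = ss %in% c('A', 'G'))

# Single brackets return a (shorter) list that still carries the name.
sub <- ls[1]
class(sub)     # "list"
names(sub)     # "dna"
length(sub)    # 1

# Double brackets (or $) return the element itself.
elem <- ls[[1]]
class(elem)    # "character"
length(elem)   # 7

# Elements can be looked up by name as well as by position.
identical(ls[['dna']], ls$dna)   # TRUE
```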
Data frames¶
A data frame is a special list of vectors where all the vectors have the same length. Because all the vectors have the same length, it can also be thought of as a 2D table or matrix and manipulated in the same way.
In [27]:
df <- as.data.frame(ls)
In [28]:
class(ls)
Out[28]:
'list'
In [29]:
class(df)
Out[29]:
'data.frame'
In [30]:
df
Out[30]:
| | dna | ispurine |
---|---|---|
1 | G | TRUE |
2 | A | TRUE |
3 | T | FALSE |
4 | T | FALSE |
5 | A | TRUE |
6 | C | FALSE |
7 | A | TRUE |
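The two views of a data frame, as a list of columns and as a 2D table, can both be checked directly. The sketch below rebuilds df with data.frame() and passes stringsAsFactors = FALSE explicitly; that argument is an addition for this example, used so the dna column stays a character vector regardless of the R version.

```r
ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')
df <- data.frame(dna = ss, ispurine = ss %in% c('A', 'G'),
                 stringsAsFactors = FALSE)

# List view: each column is one element of the list.
is.list(df)         # TRUE
length(df)          # 2 (number of columns)
names(df)           # "dna" "ispurine"
sapply(df, class)   # unlike a matrix, columns may differ in type

# Table view: rows and columns, indexed like a matrix.
dim(df)             # 7 2
df[3, 'dna']        # "T"
df[3, 1]            # same cell, addressed by column position
```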
In [31]:
df[4:6, ]
Out[31]:
| | dna | ispurine |
---|---|---|
4 | T | FALSE |
5 | A | TRUE |
6 | C | FALSE |
In [32]:
df$ispurine
Out[32]:
- TRUE
- TRUE
- FALSE
- FALSE
- TRUE
- FALSE
- TRUE
In [33]:
df[df$ispurine, ]
Out[33]:
| | dna | ispurine |
---|---|---|
1 | G | TRUE |
2 | A | TRUE |
5 | A | TRUE |
7 | A | TRUE |
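Row selection with a logical vector extends naturally to negated and combined conditions. The following is a minimal sketch on the same data; the result names (non_purines, purine_a, purines) are made up for illustration, and subset() is shown as one possible alternative to bracket indexing.

```r
ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')
df <- data.frame(dna = ss, ispurine = ss %in% c('A', 'G'),
                 stringsAsFactors = FALSE)

# Negate the condition to keep the non-purines (rows 3, 4 and 6).
non_purines <- df[!df$ispurine, ]

# Combine conditions with & (and) or | (or): purines that are 'A'.
purine_a <- df[df$ispurine & df$dna == 'A', ]

# subset() evaluates the condition inside the data frame,
# so the df$ prefix is not needed.
purines <- subset(df, ispurine)

nrow(non_purines)    # 3
rownames(purine_a)   # "2" "5" "7"
nrow(purines)        # 4
```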
Creating a data frame from scratch¶
In [34]:
gender <- c('M', 'M', 'F', 'F', 'M', 'F', 'M')
height <- c(1.65, 1.82, 1.56, 1.66, 1.72, 1.6, 1.8)
weight <- c(65, 102, 55, 46, 78, 60, 72)
bods <- data.frame(gender, height, weight)
In [35]:
bods
Out[35]:
| | gender | height | weight |
---|---|---|---|
1 | M | 1.65 | 65 |
2 | M | 1.82 | 102 |
3 | F | 1.56 | 55 |
4 | F | 1.66 | 46 |
5 | M | 1.72 | 78 |
6 | F | 1.6 | 60 |
7 | M | 1.8 | 72 |
We can add a new calculated column easily. Let’s include the body mass index (bmi).
In [36]:
bods$bmi <- bods$weight/bods$height^2
In [37]:
bods
Out[37]:
| | gender | height | weight | bmi |
---|---|---|---|---|
1 | M | 1.65 | 65 | 23.87511 |
2 | M | 1.82 | 102 | 30.79338 |
3 | F | 1.56 | 55 | 22.60026 |
4 | F | 1.66 | 46 | 16.69328 |
5 | M | 1.72 | 78 | 26.3656 |
6 | F | 1.6 | 60 | 23.4375 |
7 | M | 1.8 | 72 | 22.22222 |
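The same calculation can be written without repeating the bods$ prefix. As a sketch of two base R alternatives: with() evaluates an expression using the columns as variables, and transform() returns a modified copy, leaving the original data frame unchanged. The names bmi_vec and bods2 are introduced only for this example.

```r
gender <- c('M', 'M', 'F', 'F', 'M', 'F', 'M')
height <- c(1.65, 1.82, 1.56, 1.66, 1.72, 1.6, 1.8)   # metres
weight <- c(65, 102, 55, 46, 78, 60, 72)              # kilograms
bods <- data.frame(gender, height, weight, stringsAsFactors = FALSE)

# with(): compute BMI as a standalone vector.
bmi_vec <- with(bods, weight / height^2)

# transform(): return a copy of bods with the extra column.
bods2 <- transform(bods, bmi = weight / height^2)

names(bods2)   # "gender" "height" "weight" "bmi"
names(bods)    # the original is unchanged
```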
Let’s get rid of the bmi column.
In [38]:
bods$bmi <- NULL
In [39]:
bods
Out[39]:
| | gender | height | weight |
---|---|---|---|
1 | M | 1.65 | 65 |
2 | M | 1.82 | 102 |
3 | F | 1.56 | 55 |
4 | F | 1.66 | 46 |
5 | M | 1.72 | 78 |
6 | F | 1.6 | 60 |
7 | M | 1.8 | 72 |
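Assigning NULL removes the column from bods itself. When the original should stay intact, a column can instead be left out of a copy. The sketch below shows two ways of doing that in base R; no_bmi and no_bmi2 are hypothetical names for this example.

```r
gender <- c('M', 'M', 'F', 'F', 'M', 'F', 'M')
height <- c(1.65, 1.82, 1.56, 1.66, 1.72, 1.6, 1.8)
weight <- c(65, 102, 55, 46, 78, 60, 72)
bods <- data.frame(gender, height, weight, stringsAsFactors = FALSE)
bods$bmi <- bods$weight / bods$height^2

# Keep every column whose name is not 'bmi'.
no_bmi <- bods[, names(bods) != 'bmi']

# subset() with a negative selection does the same thing.
no_bmi2 <- subset(bods, select = -bmi)

names(no_bmi)            # "gender" "height" "weight"
'bmi' %in% names(bods)   # TRUE: bods still has the column
```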
Work!¶
How many males are there?
In [ ]:
What is the mean height?
In [ ]:
What is the mean weight for females?
In [ ]:
A person is classified as obese if their BMI exceeds 30. Add the BMI column back into the data frame, as well as a new logical column is.obese indicating whether a person is obese.
In [ ]: