Working with Data

Scalars

In [1]:
n <- 3.14
s <- 'c'
b <- TRUE
In [2]:
typeof(n)
Out[2]:
'double'
In [3]:
typeof(s)
Out[3]:
'character'
In [4]:
typeof(b)
Out[4]:
'logical'

Vectors

Vectors are 1D collections of the same scalar type.
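
A short sketch of what "the same scalar type" means in practice: when values of different types are combined with c(), R coerces them to a single common type. The variable names mixed_num and mixed_chr are made up for this illustration.

```r
# Mixing types in c() coerces every element to one common type.
mixed_num <- c(1, TRUE, FALSE)   # logical values become doubles (1 and 0)
mixed_chr <- c(1, 'a', TRUE)     # everything becomes a character string

typeof(mixed_num)   # 'double'
typeof(mixed_chr)   # 'character'
mixed_chr           # '1' 'a' 'TRUE'
```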

In [5]:
xs <- c(1, 0.5, 0.25)
ss <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')
bs <- c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)
In [6]:
xs
Out[6]:
  1. 1
  2. 0.5
  3. 0.25
In [7]:
ss
Out[7]:
  1. 'G'
  2. 'A'
  3. 'T'
  4. 'T'
  5. 'A'
  6. 'C'
  7. 'A'
In [8]:
bs
Out[8]:
  1. TRUE
  2. TRUE
  3. FALSE
  4. FALSE
  5. TRUE
  6. TRUE
  7. FALSE
  8. FALSE

Extracting a single element

In [9]:
xs[1]
Out[9]:
1

Extracting elements with a position vector

In [10]:
ss[2:5]
Out[10]:
  1. 'A'
  2. 'T'
  3. 'T'
  4. 'A'

Extracting elements with a logical vector

In [11]:
ss[bs]
Out[11]:
  1. 'G'
  2. 'A'
  3. 'A'
  4. 'C'

Extracting elements with a logical condition

In [12]:
ss[ss %in% c('A', 'T')]
Out[12]:
  1. 'A'
  2. 'T'
  3. 'T'
  4. 'A'
  5. 'A'
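
It may help to see that a logical condition is just a logical vector built on the fly. The sketch below splits the step in two; dna is a stand-in copy of the values in ss, and is_at is a name chosen for this example.

```r
# Same values as ss above, repeated so the block runs on its own.
dna <- c('G', 'A', 'T', 'T', 'A', 'C', 'A')

# The condition evaluates to one TRUE/FALSE per element.
is_at <- dna %in% c('A', 'T')
is_at          # FALSE TRUE TRUE TRUE TRUE FALSE TRUE

# which() converts the logical vector to positions.
which(is_at)   # 2 3 4 5 7

# Indexing with the logical vector keeps the TRUE positions.
dna[is_at]     # 'A' 'T' 'T' 'A' 'A'
```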

Matrices and Arrays

Like vectors, only in 2D (matrices) or more dimensions (arrays).

In [13]:
m <- matrix(1:12, ncol=4)
In [14]:
m
Out[14]:
1 4 7 10
2 5 8 11
3 6 9 12
In [15]:
m[6:10]
Out[15]:
  1. 6
  2. 7
  3. 8
  4. 9
  5. 10
In [16]:
m[m < 10]
Out[16]:
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
In [17]:
m[2,]
Out[17]:
  1. 2
  2. 5
  3. 8
  4. 11
In [18]:
m[,2]
Out[18]:
  1. 4
  2. 5
  3. 6
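
The single-index form m[6:10] works because R stores a matrix column by column. A small sketch of how the two indexing styles relate, using a separate variable mat so that it does not depend on the cells above:

```r
mat <- matrix(1:12, ncol = 4)   # filled column by column
dim(mat)                        # 3 4 (rows, columns)

# Row 2, column 3 ...
mat[2, 3]                       # 8

# ... is the same element as linear position (3 - 1) * 3 + 2 = 8.
mat[8]                          # 8

# Extracting one row drops to a plain vector by default;
# drop = FALSE keeps the result as a 1 x 4 matrix.
dim(mat[2, , drop = FALSE])     # 1 4
```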

Work!

Try to solve the following problems without searching the web. You can use the built-in help() function.

Create the following \(3 \times 3\) matrix and save it in a variable called A.

  • Row 1 = 4, 5, 6
  • Row 2 = 1, 2, 3
  • Row 3 = 7, 8, 9
In [ ]:





What is the sum of all the numbers in A?

In [ ]:




Create a vector of the column sums in A using the colSums function.

In [ ]:




Create a vector of the row sums in A using the apply function.

In [ ]:




What is the sum of the numbers in the bottom-right \(2 \times 2\) block (i.e., the numbers 2, 3, 8, 9)?

In [ ]:




Lists

In [19]:
ls <- list(dna=ss, ispurine=ss %in% c('A', 'G'))
In [20]:
ls
Out[20]:
$dna
  1. 'G'
  2. 'A'
  3. 'T'
  4. 'T'
  5. 'A'
  6. 'C'
  7. 'A'
$ispurine
  1. TRUE
  2. TRUE
  3. FALSE
  4. FALSE
  5. TRUE
  6. FALSE
  7. TRUE

Extracting a sublist from a list

In [21]:
ls[1]
Out[21]:
$dna =
  1. 'G'
  2. 'A'
  3. 'T'
  4. 'T'
  5. 'A'
  6. 'C'
  7. 'A'
In [22]:
class(ls[1])
Out[22]:
'list'

Extracting an element from a list

In [23]:
ls$dna
Out[23]:
  1. 'G'
  2. 'A'
  3. 'T'
  4. 'T'
  5. 'A'
  6. 'C'
  7. 'A'
In [24]:
class(ls$dna)
Out[24]:
'character'
In [25]:
ls[[1]]
Out[25]:
  1. 'G'
  2. 'A'
  3. 'T'
  4. 'T'
  5. 'A'
  6. 'C'
  7. 'A'
In [26]:
class(ls[[1]])
Out[26]:
'character'
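
To summarize the difference between the two extraction styles, here is a minimal sketch on a made-up list called rec (not part of the notebook's data): single brackets return a smaller list, while double brackets or $ return the element itself.

```r
# Hypothetical list used only for this illustration.
rec <- list(dna = c('G', 'A', 'T'), n = 3)

sub  <- rec['dna']     # single brackets: still a list, of length 1
elem <- rec[['dna']]   # double brackets: the character vector itself

class(sub)             # 'list'
class(elem)            # 'character'
length(sub)            # 1
length(elem)           # 3

# Single brackets can select several elements at once.
names(rec[c('dna', 'n')])   # 'dna' 'n'
```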

Data frames

A data frame is a special list of vectors where all the vectors have the same length. Because all the vectors have the same length, it can also be thought of as a 2D table or matrix and manipulated in the same way.
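
The sketch below illustrates this dual nature on a small made-up data frame called tbl: the same object answers to both list-style and matrix-style operations. stringsAsFactors = FALSE is set explicitly so the behavior does not depend on the R version.

```r
# Hypothetical data frame used only for this illustration.
tbl <- data.frame(base = c('G', 'A', 'T'),
                  purine = c(TRUE, TRUE, FALSE),
                  stringsAsFactors = FALSE)

# List view: each column is one list element.
is.list(tbl)      # TRUE
length(tbl)       # 2 (number of columns)
tbl[[2]]          # TRUE TRUE FALSE

# Matrix view: rows and columns.
dim(tbl)          # 3 2
tbl[2, 'base']    # 'A'
```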

In [27]:
df <- as.data.frame(ls)
In [28]:
class(ls)
Out[28]:
'list'
In [29]:
class(df)
Out[29]:
'data.frame'
In [30]:
df
Out[30]:
  dna ispurine
1   G     TRUE
2   A     TRUE
3   T    FALSE
4   T    FALSE
5   A     TRUE
6   C    FALSE
7   A     TRUE
In [31]:
df[4:6, ]
Out[31]:
  dna ispurine
4   T    FALSE
5   A     TRUE
6   C    FALSE
In [32]:
df$ispurine
Out[32]:
  1. TRUE
  2. TRUE
  3. FALSE
  4. FALSE
  5. TRUE
  6. FALSE
  7. TRUE
In [33]:
df[df$ispurine, ]
Out[33]:
  dna ispurine
1   G     TRUE
2   A     TRUE
5   A     TRUE
7   A     TRUE

Creating a data frame from scratch

In [34]:
gender <- c('M', 'M', 'F', 'F', 'M', 'F', 'M')
height <- c(1.65, 1.82, 1.56, 1.66, 1.72, 1.6, 1.8)
weight <- c(65, 102, 55, 46, 78, 60, 72)

bods <- data.frame(gender, height, weight)
In [35]:
bods
Out[35]:
  gender height weight
1      M   1.65     65
2      M   1.82    102
3      F   1.56     55
4      F   1.66     46
5      M   1.72     78
6      F   1.60     60
7      M   1.80     72

We can add a new calculated column easily. Let’s include the body mass index (bmi).

In [36]:
bods$bmi <- bods$weight/bods$height^2
In [37]:
bods
Out[37]:
  gender height weight      bmi
1      M   1.65     65 23.87511
2      M   1.82    102 30.79338
3      F   1.56     55 22.60026
4      F   1.66     46 16.69328
5      M   1.72     78 26.3656
6      F   1.60     60 23.4375
7      M   1.80     72 22.22222

Let’s get rid of the bmi column.

In [38]:
bods$bmi <- NULL
In [39]:
bods
Out[39]:
  gender height weight
1      M   1.65     65
2      M   1.82    102
3      F   1.56     55
4      F   1.66     46
5      M   1.72     78
6      F   1.60     60
7      M   1.80     72

Work!

How many males are there?

In [ ]:




What is the mean height?

What is the mean weight for females?

In [ ]:




A person is classified as obese if their BMI exceeds 30. Add the BMI column back into the data frame, as well as a new logical column is.obese indicating if a person is obese or not.

In [ ]:




Reading data from files or URLs to dataframes

See Examples from the Quick-R website
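
As a minimal sketch of the idea, the block below writes a tiny CSV to a temporary file and reads it back with read.csv. The file contents and the name people are invented for this example; read.csv also accepts a URL in place of a file path.

```r
# Write a small CSV to a temporary file so the example is self-contained.
path <- tempfile(fileext = '.csv')
writeLines(c('gender,height,weight',
             'M,1.65,65',
             'F,1.56,55'), path)

# The first line is used as column names (header = TRUE is the default).
people <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)

class(people)    # 'data.frame'
names(people)    # 'gender' 'height' 'weight'
nrow(people)     # 2

unlink(path)     # remove the temporary file
```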

In [ ]: