Duke HTS Summer Course 2017 - Computational Bootcamp
The first thing you should do with this notebook is make a copy! Go to the File menu and choose ‘Make a Copy’.
Great! Now you should be working in a file called ‘IntroToCompBoot-Copy1’.
Briefly go back to the other tab and choose ‘Close and Halt’ from the File menu. Why did we do this? Because the lecture materials are kept in something called a ‘git repository’. We want whatever changes you make to be saved under a different filename, so that they don’t get overwritten.
Purpose of this component
In this course, you will learn how to generate and analyze RNAseq data. Roughly, we have the following components:
- Experimental Design: How do we design experiments so that results are easily interpretable and answer the question(s) we are interested in? (Statistics!!!)
- Analysis of Data: How do we properly analyze experimental data, so that results are correct? (More statistics!!!)
- Computational Procedure: The analysis pipeline. (Bioinformatics)
Finally, we want to do all of the above in a REPRODUCIBLE fashion.
This ‘computational bootcamp’ component is designed to give you the tools you need to carry out the steps above - and to do so within the context of reproducible research.
The analysis pipeline consists of several different applications. Some are written in R, and so require knowledge of the R programming language (minimal proficiency is enough, but some is needed). Others have components written in Python, and some are binaries. Running these last two types of applications requires moderate proficiency in the ‘bash shell’ or ‘Unix command line’. Therefore, we will cover the following topics:
- Basic R
- Basic Unix/Linux commands
Additionally, we will use the bootcamp to reinforce the statistical lecture materials by walking you through some of the examples using R, and we will cover some ‘data visualization’ techniques that include graphics in R.
All of this will be done within the Jupyter notebook tool, which allows for what is called ‘literate programming’ and reproducible pipelines.
How this is all set up

[Image: overview of the course setup; not reproduced here]
The Notebook
The Jupyter notebook is an interactive tool that allows the integration of text, code and graphics in one unified environment.
Markdown Cells
The notebook is built around the concept of ‘cells’. There are two main types of cells: code and markdown. What you are reading now is in a cell, and its cell type is ‘markdown’ (essentially just text, but you can use some special character sequences to get different effects). Markdown is a lightweight markup language that is converted to HTML for display (if that means anything to you).
Give it a try! Double click on this cell to get into edit mode. Then, see how *this word* got italicized and how **this one** is bold. Here is a bulleted list:
- item 1
- item 2
    - sub item 2a
    - sub item 2b
How about a bit of math?
That last one is something called LaTeX, and if you aren’t familiar with it, don’t worry. It’s just to show you that you can do some very pretty things in these kinds of cells.
Code Cells
Code cells rely on something called a ‘kernel’. The kernel is simply the programming environment that a particular notebook file is running. This one is running an R kernel. That means that cells which are not markdown cells (the code cells) are interpreted as R code (as opposed to shell commands, Python, or one of a rapidly growing list of other Jupyter kernels). The following cell is a code cell. Note the ‘In [ ]:’ label that marks it:
In [2]:
print("This is a code cell")
[1] "This is a code cell"
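One way to confirm which kernel a notebook is running is to ask the language itself. The cell below is a minimal sketch, assuming the R kernel described above. `R.version.string` is a built-in R variable; the variable name `version_info` is made up for illustration, and the exact text printed depends on the R installation behind your notebook.

```r
# Built-in character string describing the R installation behind this kernel.
version_info <- R.version.string
print(version_info)

# On released versions of R this string begins with "R version".
# (Development builds use different wording, so this may be FALSE there.)
startsWith(version_info, "R version")
```

If this cell produces an error instead of a version string, the notebook is probably running a different kernel.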
We execute code cells (and render markdown cells) by pressing Shift+Enter. Take a moment to play around a bit and get comfortable. Click on the ‘Help’ menu and look at the keyboard shortcuts (they are very useful).
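If you are not sure what to type while playing around, here is a small sketch of the kind of thing an R code cell can hold. It uses only base R operators and functions; the variable name `read_count` is hypothetical and chosen just for illustration.

```r
# Basic arithmetic: R evaluates each expression and displays the result.
2 + 3 * 4        # multiplication is applied before addition
(2 + 3) * 4      # parentheses change the order of evaluation
7 / 2            # ordinary division
7 %/% 2          # integer division
7 %% 2           # remainder (modulo)

# Store a value in a variable (hypothetical name) and reuse it.
read_count <- 10^3
read_count / 4
sqrt(read_count)
```

Run the cell with Shift+Enter, then change a number and run it again to see the output update.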
Examples
In [1]:
# Do some arithmetic here using the R kernel. Insert another R cell below and try some more.
In [2]:
# Make the next cell into a markdown cell (using the keyboard shortcut) and type some text.
# Can you make it bold? Italicized?
# Can you figure out how to make headings?
In [ ]: