Run this once in a code cell
pip install version_information
In [1]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
%load_ext version_information
%load_ext rpy2.ipython
Getting Started with Python¶
Live Demo of Jupyter Features¶
- Administration interface
- Files
- Running
- Uploading notebooks
- New notebook
- Notebook interface
- Menu
- Cells
- Keyboard shortcuts
- Getting help
- Notebook magics
  - bash
  - R
- Other
In [ ]:
Elements of Python¶
Code and comments¶
In [46]:
# The code below makes a list of numbers
list(range(3, 10, 3))
Out[46]:
[3, 6, 9]
Types¶
None¶
In [92]:
None
Strings¶
In [32]:
'a', 'hello', 'spaces are OK', "double quotes are OK too"
Out[32]:
('a', 'hello', 'spaces are OK', 'double quotes are OK too')
In [30]:
'''triple quoted strings
can span
multiple lines'''
Out[30]:
'triple quoted strings\ncan span\nmultiple lines'
String interpolation¶
Old style
In [40]:
'There are %d planets in our %s system' % (8, 'solar')
Out[40]:
'There are 8 planets in our solar system'
New style
In [41]:
'There are {} planets in our {} system'.format(3.14, 'lunar')
Out[41]:
'There are 3.14 planets in our lunar system'
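The new-style format method also accepts named fields and format specifications. A minimal sketch follows; the variable names (template, sentence, approx, repeated) are illustrative and not part of the original notebook.

```python
# Named fields make long templates easier to read
template = 'There are {count} planets in our {name} system'
sentence = template.format(count=8, name='solar')
print(sentence)

# A format specification after the colon controls number formatting
approx = 'pi is approximately {:.2f}'.format(3.14159)
print(approx)

# Positional indices let a value be reused in several places
repeated = '{0} and {0} again, then {1}'.format('a', 'b')
print(repeated)
```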
Operators¶
Arithmetic¶
In [3]:
-1, 2+3, 7%3, 7/2, 7//2, 2**4
Out[3]:
(-1, 5, 1, 3.5, 3, 16)
Logical¶
In [39]:
True and True, True & False, True | False, 3 <= 4, 3 == 4, 3 != 4, 3 > 4
Out[39]:
(True, False, True, True, False, True, False)
Containers (Collections)¶
In [6]:
a_tuple = (1, 2, 3, 4)
a_list = ['a', 'b', 'c', 'd']
a_set = {1, 2, 2, 3, 3, 3}
a_dict = {'c': 1, 'b': 2, 'a': 3}
In [7]:
a_tuple
Out[7]:
(1, 2, 3, 4)
In [8]:
a_list
Out[8]:
['a', 'b', 'c', 'd']
In [9]:
a_set
Out[9]:
{1, 2, 3}
In [10]:
a_dict
Out[10]:
{'a': 3, 'b': 2, 'c': 1}
In [11]:
from collections import OrderedDict
In [12]:
a_ordereddict = OrderedDict([('c', 1), ('b', 2), ('a', 3)])
a_ordereddict
Out[12]:
OrderedDict([('c', 1), ('b', 2), ('a', 3)])
Indexing a container¶
In [13]:
a_tuple[0]
Out[13]:
1
In [14]:
a_list[1:4]
Out[14]:
['b', 'c', 'd']
In [15]:
a_dict['b']
Out[15]:
2
In [16]:
a_ordereddict['c']
Out[16]:
1
Conversion between types¶
In [85]:
x = 123
x, type(x)
Out[85]:
(123, int)
In [86]:
x = str(x)
x, type(x)
Out[86]:
('123', str)
In [87]:
x = float(x)
x, type(x)
Out[87]:
(123.0, float)
In [89]:
d = {'a': 1, 'b': 2}
type(d)
Out[89]:
dict
In [90]:
list(d)
Out[90]:
['a', 'b']
In [91]:
list(d.items())
Out[91]:
[('a', 1), ('b', 2)]
Generator objects¶
A generator is like a container that yields its items one at a time, on demand. Because the items are computed lazily rather than stored, generators use very little memory, which lets us work with sequences far too large to hold in a list.
In [112]:
gen = (i**2 for i in range(10**40))
In [113]:
gen
Out[113]:
<generator object <genexpr> at 0x11af5b360>
In [114]:
next(gen)
Out[114]:
0
In [115]:
next(gen)
Out[115]:
1
In [116]:
next(gen)
Out[116]:
4
In [121]:
for i in range(3):
    print(next(gen))
9
16
25
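Generators can also be written as functions that use yield. The sketch below is one possible way to do this; the names squares, sq, first_three and next_three are hypothetical and chosen only for illustration.

```python
from itertools import islice

def squares():
    """Yield 0, 1, 4, 9, ... forever, computing one value at a time."""
    i = 0
    while True:
        yield i**2
        i += 1

sq = squares()
first_three = [next(sq) for _ in range(3)]   # consumes 0, 1, 4
next_three = list(islice(sq, 3))              # continues with 9, 16, 25
print(first_three, next_three)
```

Because the generator remembers where it stopped, islice picks up at the fourth value instead of starting again from zero.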
In [124]:
r = range(1,15)
r
Out[124]:
range(1, 15)
In [125]:
# Looping through a range (a lazy sequence, much like a generator) also works
for item in r:
    if item % 5 == 0:
        print(item)
5
10
In [126]:
# Be careful about converting to a list unless you know the size is small
list(r)
Out[126]:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
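Strictly speaking, range returns a lazy sequence rather than a generator. Both avoid building a full list in memory, but they differ in one practical way, sketched below with illustrative variable names: a range can be traversed repeatedly, while a generator is used up after one pass.

```python
r = range(1, 15)

# A range knows its length and supports indexing without building a list
length = len(r)
last = r[-1]

# A range can be looped over as many times as needed
first_pass = [x for x in r if x % 5 == 0]
second_pass = [x for x in r if x % 5 == 0]

# A generator expression is exhausted after a single pass
gen = (x for x in r if x % 5 == 0)
gen_first = list(gen)
gen_second = list(gen)   # nothing left to yield

print(length, last, first_pass, second_pass, gen_first, gen_second)
```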
Controlling program flow¶
In [17]:
'a' if 3 < 4 else 'b'
Out[17]:
'a'
In [18]:
'a' if 4 < 3 else 'b'
Out[18]:
'b'
In [19]:
score = np.random.uniform(60, 100)
if score > 90:
    print('A')
elif score > 80:
    print('B')
else:
    print('C')
A
Looping¶
In [20]:
list(range(10, 20, 2))
Out[20]:
[10, 12, 14, 16, 18]
In [21]:
for i in range(10, 20, 2):
    print(i, i**2)
10 100
12 144
14 196
16 256
18 324
In [22]:
max_count = 5
count = 0
while count < max_count:
    print(count)
    count += 1
0
1
2
3
4
3 Ways to Populate a List¶
List Comprehension¶
In [23]:
[x**2 for x in range(5)]
Out[23]:
[0, 1, 4, 9, 16]
In [24]:
[x**2 for x in range(5) if x % 2 == 0]
Out[24]:
[0, 4, 16]
Looping¶
In [1]:
xs = []
for x in range(5):
    xs.append(x**2)
xs
Out[1]:
[0, 1, 4, 9, 16]
In [3]:
xs = []
for x in range(5):
    if x % 2 == 0:
        xs.append(x**2)
xs
Out[3]:
[0, 4, 16]
Map and filter¶
In [6]:
list(map(lambda x: x**2, range(5)))
Out[6]:
[0, 1, 4, 9, 16]
In [7]:
list(map(lambda x: x**2, filter(lambda x: x % 2 == 0, range(5))))
Out[7]:
[0, 4, 16]
User-defined functions¶
In [44]:
def f(x):
    """Say something about the function here."""
    return x
In [26]:
f(3.14)
Out[26]:
3.14
In [27]:
def g(a, b):
    """Calculate the sum of a and b."""
    return a + b
In [28]:
g(3, 4)
Out[28]:
7
Default arguments¶
In [29]:
def h(a=0, b=1, c=2):
    """Calculates some complicated mathematical function."""
    return a + 2*b + 3*c
In [30]:
h()
Out[30]:
8
In [31]:
h(1)
Out[31]:
9
In [32]:
h(1, 2)
Out[32]:
11
In [33]:
h(1, 2, 3)
Out[33]:
14
In [34]:
h(c = 1, b = 2, a = 3)
Out[34]:
10
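Positional and keyword arguments can be mixed, which is handy for overriding one default while leaving the others alone. A small sketch, redefining h so the block runs on its own (skip_b and only_b are illustrative names):

```python
def h(a=0, b=1, c=2):
    """Same function as above: a + 2*b + 3*c."""
    return a + 2*b + 3*c

skip_b = h(1, c=3)   # a=1, b keeps its default of 1, c=3
only_b = h(b=5)      # a and c keep their defaults of 0 and 2
print(skip_b, only_b)
```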
Using Libraries¶
In [35]:
import math
math.pi
Out[35]:
3.141592653589793
In [36]:
import numpy as np
np.linspace(0, 1, 11)
Out[36]:
array([ 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
In [37]:
from numpy.random import rand
In [38]:
rand(4)
Out[38]:
array([ 0.12365436, 0.57294035, 0.08970509, 0.70075327])
Built-in functions¶
Many functions are automatically imported into the main namespace. That is why we can use functions such as range or list without importing them first.
In [97]:
', '.join(dir(__builtin__))
Out[97]:
'ArithmeticError, AssertionError, AttributeError, BaseException, BlockingIOError, BrokenPipeError, BufferError, BytesWarning, ChildProcessError, ConnectionAbortedError, ConnectionError, ConnectionRefusedError, ConnectionResetError, DeprecationWarning, EOFError, Ellipsis, EnvironmentError, Exception, False, FileExistsError, FileNotFoundError, FloatingPointError, FutureWarning, GeneratorExit, IOError, ImportError, ImportWarning, IndentationError, IndexError, InterruptedError, IsADirectoryError, KeyError, KeyboardInterrupt, LookupError, MemoryError, NameError, None, NotADirectoryError, NotImplemented, NotImplementedError, OSError, OverflowError, PendingDeprecationWarning, PermissionError, ProcessLookupError, RecursionError, ReferenceError, ResourceWarning, RuntimeError, RuntimeWarning, StopAsyncIteration, StopIteration, SyntaxError, SyntaxWarning, SystemError, SystemExit, TabError, TimeoutError, True, TypeError, UnboundLocalError, UnicodeDecodeError, UnicodeEncodeError, UnicodeError, UnicodeTranslateError, UnicodeWarning, UserWarning, ValueError, Warning, ZeroDivisionError, __IPYTHON__, __build_class__, __debug__, __doc__, __import__, __loader__, __name__, __package__, __spec__, abs, all, any, ascii, bin, bool, bytearray, bytes, callable, chr, classmethod, compile, complex, copyright, credits, delattr, dict, dir, divmod, dreload, enumerate, eval, exec, filter, float, format, frozenset, get_ipython, getattr, globals, hasattr, hash, help, hex, id, input, int, isinstance, issubclass, iter, len, license, list, locals, map, max, memoryview, min, next, object, oct, open, ord, pow, print, property, range, repr, reversed, round, set, setattr, slice, sorted, staticmethod, str, sum, super, tuple, type, vars, zip'
In [98]:
help(zip)
Help on class zip in module builtins:
class zip(object)
| zip(iter1 [,iter2 [...]]) --> zip object
|
| Return a zip object whose .__next__() method returns a tuple where
| the i-th element comes from the i-th iterable argument. The .__next__()
| method continues until the shortest iterable in the argument sequence
| is exhausted and then it raises StopIteration.
|
| Methods defined here:
|
| __getattribute__(self, name, /)
| Return getattr(self, name).
|
| __iter__(self, /)
| Implement iter(self).
|
| __new__(*args, **kwargs) from builtins.type
| Create and return a new object. See help(type) for accurate signature.
|
| __next__(self, /)
| Implement next(self).
|
| __reduce__(...)
| Return state information for pickling.
In [103]:
list(zip(['a', 'b', 'c'], range(10)))
Out[103]:
[('a', 0), ('b', 1), ('c', 2)]
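Outside of IPython, the same listing can be obtained from the standard builtins module. The sketch below assumes nothing beyond the standard library; public_names, has_core and pairs are illustrative names.

```python
import builtins

# Names that do not start with an underscore: functions, types and exceptions
public_names = [name for name in dir(builtins) if not name.startswith('_')]

# range, list and zip are all available without any import in user code
has_core = all(name in public_names for name in ('range', 'list', 'zip'))
print(has_core)

# zip stops at the shortest input, as the help text describes
pairs = list(zip('abc', range(10)))
print(pairs)
```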
Working with vectors and arrays¶
In [39]:
A = np.random.random((3,4))
A
Out[39]:
array([[ 0.72558757, 0.62773131, 0.84824601, 0.23886442],
[ 0.52841628, 0.71965435, 0.32039652, 0.29125637],
[ 0.81117304, 0.96326656, 0.63946624, 0.51574475]])
Indexing a matrix¶
In [40]:
A[0,0]
Out[40]:
0.7255875679496957
In [41]:
A[2,3]
Out[41]:
0.5157447477346595
In [42]:
A[1]
Out[42]:
array([ 0.52841628, 0.71965435, 0.32039652, 0.29125637])
In [43]:
A[1, :]
Out[43]:
array([ 0.52841628, 0.71965435, 0.32039652, 0.29125637])
In [44]:
A[:, 2]
Out[44]:
array([ 0.84824601, 0.32039652, 0.63946624])
In [45]:
A[:2, 1:]
Out[45]:
array([[ 0.62773131, 0.84824601, 0.23886442],
[ 0.71965435, 0.32039652, 0.29125637]])
In [46]:
A[1:3, 1:3]
Out[46]:
array([[ 0.71965435, 0.32039652],
[ 0.96326656, 0.63946624]])
Vectorized functions¶
In [47]:
A * 10
Out[47]:
array([[ 7.25587568, 6.27731313, 8.48246012, 2.38864421],
[ 5.28416285, 7.19654349, 3.20396517, 2.9125637 ],
[ 8.11173037, 9.63266563, 6.39466244, 5.15744748]])
In [48]:
A.sum()
Out[48]:
7.2298034265026363
In [49]:
A.sum(axis = 0)
Out[49]:
array([ 2.06517689, 2.31065222, 1.80810877, 1.04586554])
In [50]:
A.sum(axis = 1)
Out[50]:
array([ 2.44042931, 1.85972352, 2.92965059])
In [51]:
A.max(axis = 0)
Out[51]:
array([ 0.81117304, 0.96326656, 0.84824601, 0.51574475])
In [52]:
A.T @ A
Out[52]:
array([[ 1.46370278, 1.61712698, 1.30349727, 0.7455799 ],
[ 1.61712698, 1.83983145, 1.37902178, 0.85634626],
[ 1.30349727, 1.37902178, 1.2310923 , 0.62573468],
[ 0.7455799 , 0.85634626, 0.62573468, 0.40787913]])
Input and output¶
We will mostly be using the pandas library to read in tabular data files, so this section is just for completeness.
In [59]:
%%file test1.csv
1,2,3
4,5,6
Writing test1.csv
The open function returns a file object that behaves like a generator: iterating over it yields one line at a time, allowing us to loop through each line of potentially massive files without using much memory.
In [66]:
with open('test1.csv', 'r') as f:
    for line in f:
        print(line, end='')
1,2,3
4,5,6
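The same line-by-line loop can do real work on each row. The following sketch, which writes its own small file to a temporary directory so that it is self-contained (the path and the names row_sums and values are assumptions for illustration), sums the numbers on each line.

```python
import os
import tempfile

# Write a small CSV file to a temporary directory (illustrative path)
path = os.path.join(tempfile.mkdtemp(), 'test1.csv')
with open(path, 'w') as f:
    f.write('1,2,3\n4,5,6\n')

# Only one line of the file is held in memory at a time
row_sums = []
with open(path, 'r') as f:
    for line in f:
        values = [int(v) for v in line.strip().split(',')]
        row_sums.append(sum(values))

print(row_sums)
```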
In [80]:
s = ['to be', 'or not to be']
with open('test2.txt', 'w') as f:
    f.write('\n'.join(s))
In [ ]:
with open('test2.txt', 'a') as f:
    f.write('\nthat is the question')
In [76]:
! cat 'test2.txt'
to be
or not to be
that is the question
Warning: Reading the whole file at once with read may use a large amount of memory. The line-by-line approach shown above is recommended.
In [77]:
with open('test2.txt', 'r') as f:
    s = f.read()
s
Out[77]:
'to be\nor not to be\nthat is the question'
Warning: Like read, readlines loads the whole file and may use a large amount of memory. The line-by-line approach shown above is recommended.
In [78]:
with open('test2.txt', 'r') as f:
    s = f.readlines()
s
Out[78]:
['to be\n', 'or not to be\n', 'that is the question']
Getting comfortable with error messages¶
In [49]:
foo
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-49-d3b07384d113> in <module>()
----> 1 foo
NameError: name 'foo' is not defined
In [47]:
Sort([2,3,1])
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-47-dd11fece4740> in <module>()
----> 1 Sort([2,3,1])
NameError: name 'Sort' is not defined
In [1]:
for i in range(3):
print(i)
File "<ipython-input-1-e9b0282dd71e>", line 2
print(i)
^
IndentationError: expected an indented block
In [48]:
3 + '1'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-48-08faeb277c1e> in <module>()
----> 1 3 + '1'
TypeError: unsupported operand type(s) for +: 'int' and 'str'
In [50]:
numbers = [1,2,3]
numbers[3]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-50-f377571de57c> in <module>()
1 numbers = [1,2,3]
----> 2 numbers[3]
IndexError: list index out of range
In [52]:
contacts = {'bart': 'ann@fox.cartoons.org', 'bob': 'bob@pinapple.under.thesea'}
contacts['homer']
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-52-2aff8d06b7b7> in <module>()
1 contacts = {'bart': 'ann@fox.cartoons.org', 'bob': 'bob@pinapple.under.thesea'}
----> 2 contacts['homer']
KeyError: 'homer'
In [53]:
x = 1 // 3
y = 3 // x
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-53-95c864b76059> in <module>()
1 x = 1 // 3
----> 2 y = 3 // x
ZeroDivisionError: integer division or modulo by zero
In [54]:
range(1,2,3,4)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-54-98baec868967> in <module>()
----> 1 range(1,2,3,4)
TypeError: range expected at most 3 arguments, got 4
In [55]:
open('spongebob.txt')
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-55-defe1ddacd33> in <module>()
----> 1 open('spongebob.txt')
FileNotFoundError: [Errno 2] No such file or directory: 'spongebob.txt'
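Once an error message is familiar, the error can be anticipated in code. The sketch below shows one common pattern, try/except with a specific exception type; the contacts data and the names email, message, fallback and result are illustrative assumptions.

```python
contacts = {'bart': 'bart@example.org'}   # illustrative data

# Catch a specific exception type rather than everything
try:
    email = contacts['homer']
except KeyError as err:
    email = None
    message = 'no entry for {}'.format(err)

# dict.get avoids the exception entirely by returning a default
fallback = contacts.get('homer', 'unknown')

# The same pattern works for the other errors shown above
try:
    result = 3 // 0
except ZeroDivisionError:
    result = None

print(email, message, fallback, result)
```

Naming the exception type keeps unrelated bugs, such as a misspelled variable raising NameError, from being silently swallowed.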
Exercises¶
1a. Create a list of the integers from 1 to 9. The solution is
[1, 2, 3, 4, 5, 6, 7, 8, 9]
1b. Create a list of the odd integers from 1 to 9. The solution is
[1, 3, 5, 7, 9]
1c. Create a list of the cubes of the odd integers from 1 to 9. The solution is
[1, 27, 125, 343, 729]
In [ ]:
2. Strings have many useful methods. Here we will learn how to read the standard Python documentation for strings to understand what some of these string methods do. We will work with the following DNA sequence (gi|568815592:31575567-31578336 Homo sapiens chromosome 6, GRCh38.p7 Primary Assembly)
CAGACGCTCCCTCAGCAAGGACAGCAGAGGACCAGCTAAGAGGGAGAGAAGCAACTACAGACCCCCCCTG
AAAACAACCCTCAGACGCCACATCCCCTGACAAGCTGCCAGGCAGGTTCTCTTCCTCTCACATACTGACC
CACGGCTCCACCCTCTCTCCCCTGGAAAGGACACCATGAGCACTGAAAGCATGATCCGGGACGTGGAGCT
GGCCGAGGAGGCGCTCCCCAAGAAGACAGGGGGGCCCCAGGGCTCCAGGCGGTGCTTGTTCCTCAGCCTC
TTCTCCTTCCTGATCGTGGCAGGCGCCACCACGCTCTTCTGCCTGCTGCACTTTGGAGTGATCGGCCCCC
AGAGGGAAGAGGTGAGTGCCTGGCCAGCCTTCATCCACTCTCCCACCCAAGGGGAAATGGAGACGCAAGA
GAGGGAGAGAGATGGGATGGGTGAAAGATGTGCGCTGATAGGGAGGGATGGAGAGAAAAAAACGTGGAGA
AAGACGGGGATGCAGAAAGAGATGTGGCAAGAGATGGGGAAGAGAGAGAGAGAAAGATGGAGAGACAGGA
TGTCTGGCACATGGAAGGTGCTCACTAAGTGTGTATGGAGTGAATGAATGAATGAATGAATGAACAAGCA
GATATATAAATAAGATATGGAGACAGATGTGGGGTGTGAGAAGAGAGATGGGGGAAGAAACAAGTGATAT
GAATAAAGATGGTGAGACAGAAAGAGCGGGAAATATGACAGCTAAGGAGAGAGATGGGGGAGATAAGGAG
AGAAGAAGATAGGGTGTCTGGCACACAGAAGACACTCAGGGAAAGAGCTGTTGAATGCCTGGAAGGTGAA
TACACAGATGAATGGAGAGAGAAAACCAGACACCTCAGGGCTAAGAGCGCAGGCCAGACAGGCAGCCAGC
TGTTCCTCCTTTAAGGGTGACTCCCTCGATGTTAACCATTCTCCTTCTCCCCAACAGTTCCCCAGGGACC
TCTCTCTAATCAGCCCTCTGGCCCAGGCAGTCAGTAAGTGTCTCCAAACCTCTTTCCTAATTCTGGGTTT
GGGTTTGGGGGTAGGGTTAGTACCGGTATGGAAGCAGTGGGGGAAATTTAAAGTTTTGGTCTTGGGGGAG
GATGGATGGAGGTGAAAGTAGGGGGGTATTTTCTAGGAAGTTTAAGGGTCTCAGCTTTTTCTTTTCTCTC
TCCTCTTCAGGATCATCTTCTCGAACCCCGAGTGACAAGCCTGTAGCCCATGTTGTAGGTAAGAGCTCTG
AGGATGTGTCTTGGAACTTGGAGGGCTAGGATTTGGGGATTGAAGCCCGGCTGATGGTAGGCAGAACTTG
GAGACAATGTGAGAAGGACTCGCTGAGCTCAAGGGAAGGGTGGAGGAACAGCACAGGCCTTAGTGGGATA
CTCAGAACGTCATGGCCAGGTGGGATGTGGGATGACAGACAGAGAGGACAGGAACCGGATGTGGGGTGGG
CAGAGCTCGAGGGCCAGGATGTGGAGAGTGAACCGACATGGCCACACTGACTCTCCTCTCCCTCTCTCCC
TCCCTCCAGCAAACCCTCAAGCTGAGGGGCAGCTCCAGTGGCTGAACCGCCGGGCCAATGCCCTCCTGGC
CAATGGCGTGGAGCTGAGAGATAACCAGCTGGTGGTGCCATCAGAGGGCCTGTACCTCATCTACTCCCAG
GTCCTCTTCAAGGGCCAAGGCTGCCCCTCCACCCATGTGCTCCTCACCCACACCATCAGCCGCATCGCCG
TCTCCTACCAGACCAAGGTCAACCTCCTCTCTGCCATCAAGAGCCCCTGCCAGAGGGAGACCCCAGAGGG
GGCTGAGGCCAAGCCCTGGTATGAGCCCATCTATCTGGGAGGGGTCTTCCAGCTGGAGAAGGGTGACCGA
CTCAGCGCTGAGATCAATCGGCCCGACTATCTCGACTTTGCCGAGTCTGGGCAGGTCTACTTTGGGATCA
TTGCCCTGTGAGGAGGACGAACATCCAACCTTCCCAAACGCCTCCCCTGCCCCAATCCCTTTATTACCCC
CTCCTTCAGACACCCTCAACCTCTTCTGGCTCAAAAAGAGAATTGGGGGCTTAGGGTCGGAACCCAAGCT
TAGAACTTTAAGCAACAAGACCACCACTTCGAAACCTGGGATTCAGGAATGTGTGGCCTGCACAGTGAAG
TGCTGGCAACCACTAAGAATTCAAACTGGGGCCTCCAGAACTCACTGGGGCCTACAGCTTTGATCCCTGA
CATCTGGAATCTGGAGACCAGGGAGCCTTTGGTTCTGGCCAGAATGCTGCAGGACTTGAGAAGACCTCAC
CTAGAAATTGACACAAGTGGACCTTAGGCCTTCCTCTCTCCAGATGTTTCCAGACTTCCTTGAGACACGG
AGCCCAGCCCTCCCCATGGAGCCAGCTCCCTCTATTTATGTTTGCACTTGTGATTATTTATTATTTATTT
ATTATTTATTTATTTACAGATGAATGTATTTATTTGGGAGACCGGGGTATCCTGGGGGACCCAATGTAGG
AGCTGCCTTGGCTCAGACATGTTTTCCGTGAAAACGGAGCTGAACAATAGGCTGTTCCCATGTAGCCCCC
TGGCCTCTGTGCCTTCTTTTGATTATGTTTTTTAAAATATTTATCTGATTAAGTTGTCTAAACAATGCTG
ATTTGGTGACCAACTGTCACTCATTGCTGAGCCTCTGCTCCCCAGGGGAGTTGTGTCTGTAATCGCCCTA
CTATTCAGTGGCGAGAAATAAAGTTTGCTTAGAAAAGAAA
2a. Assign the sequence to a string variable called tnf. (Hint: This string spans multiple lines)
2b. Calculate the GC content, that is, the fraction of bases in the sequence that are G or C.
2c. Find the RNA transcript using the mapping A to U, T to A, G to C, C to G.
In [ ]:
3. You have 5 kids in your household, whose behavior has been as follows:
- Ann, Good
- Bob, Bad
- Charlie, Good
- David, Good
- Ella, Bad
On Christmas Eve, Santa will give good kids an iPhone 7 and bad kids a lump of coal.
3a. Store the kids' names and behavior in a dictionary called santa_dict.
3b. On Christmas Eve Eve, David threw a tantrum and kicked his sister Ann. Change the dictionary entry for David to Bad.
3c. Write a loop that prints the name of each child, followed by ‘Coal’ or ‘iPhone’. The output should be
David Coal
Ann iPhone
Ella Coal
Charlie iPhone
Bob Coal
In [ ]:
4a. Write a function (call the function collatz) of a positive integer that returns the following result:
- If the number is even, divide it by two
- If the number is odd, triple it and add one
4b. Write a loop that repeatedly calls n = collatz(n) given some start value n while n is not equal to 1. At each iteration in the loop, print the current value of n.
4c. Write a function collatz_sequence that takes a positive integer argument n and returns the list of numbers generated by the while loop from 4b starting with the given value of n. For example, collatz_sequence(6) should give the following output:
[6, 3, 10, 5, 16, 8, 4, 2, 1]
In [17]:
Version information¶
In [53]:
%load_ext version_information
%version_information
The version_information extension is already loaded. To reload it, use:
%reload_ext version_information
Out[53]:
Software | Version |
---|---|
Python | 3.5.2 64bit [GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] |
IPython | 5.0.0 |
OS | Darwin 15.6.0 x86_64 i386 64bit |
Tue Aug 16 09:04:41 2016 EDT |
In [ ]: