Python review of concepts
Mainly to point out useful aspects of Python you may have glossed over. Assumes you already know Python fairly well.
Python as a language
Why Python?
Huge community - especially in data science and ML
Easy to learn
Batteries included (a large standard library)
Extensive third-party libraries
Widely used in both industry and academia
Most important “glue” language bridging multiple communities
[1]:
import __hello__
Hello world!
Versions
Use only Python 3 (the current release at the time of writing is 3.8; the container uses 3.7)
Do not use Python 2 (it reached end of life in January 2020)
[2]:
import sys
[3]:
sys.version
[3]:
'3.8.5 (default, Jul 21 2020, 10:48:26) \n[Clang 11.0.3 (clang-1103.0.32.62)]'
Multi-paradigm
Object-oriented
[6]:
class Robot:
    def __init__(self, name, function):
        self.name = name
        self.function = function

    def greet(self):
        return f"I am {self.name}, a {self.function} robot!"
[7]:
fido = Robot('roomba', 'vacuum cleaner')
[8]:
fido.name
[8]:
'roomba'
[9]:
fido.function
[9]:
'vacuum cleaner'
[10]:
fido.greet()
[10]:
'I am roomba, a vacuum cleaner robot!'
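Python is multi-paradigm, so the same kind of task can also be written in a functional style. A small illustrative sketch (not part of the original notes; `greet` and `robots` are hypothetical names) using plain functions with `map` and `functools.reduce` instead of a class:

```python
from functools import reduce

def greet(name, function):
    """Pure function: builds the greeting from its arguments, no object state."""
    return f"I am {name}, a {function} robot!"

robots = [('roomba', 'vacuum cleaner'), ('marvin', 'paranoid android')]

# map applies greet to every (name, function) pair
greetings = list(map(lambda r: greet(*r), robots))

# reduce folds the sequence into one value: here the total length of all names
total_chars = reduce(lambda acc, r: acc + len(r[0]), robots, 0)
```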
Dynamic typing
What a + b means depends on the types of a and b
[11]:
1 + 2.3
[11]:
3.3
[12]:
type(1), type(2.3)
[12]:
(int, float)
[13]:
'hello' + ' world'
[13]:
'hello world'
[14]:
[1,2,3] + [4,5,6]
[14]:
[1, 2, 3, 4, 5, 6]
[15]:
import numpy as np
np.arange(3) + 10
[15]:
array([10, 11, 12])
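Under the hood, `a + b` first calls `a.__add__(b)`; if that returns `NotImplemented`, Python tries `b.__radd__(a)`, and if both fail it raises `TypeError`. A sketch of that dispatch, with a hypothetical `Meters` class that opts in to `+`:

```python
# a + b asks a first: a.__add__(b)
print((1).__add__(2.3))    # NotImplemented: int does not know how to add a float
print((2.3).__radd__(1))   # 3.3: Python falls back to b.__radd__(a)

class Meters:
    """Hypothetical example class supporting + with numbers and other Meters."""
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        if isinstance(other, (int, float)):
            return Meters(self.value + other)
        return NotImplemented   # let Python try other.__radd__, then raise TypeError

    __radd__ = __add__          # addition is symmetric here, so reuse __add__

    def __repr__(self):
        return f"Meters({self.value})"

print(Meters(2) + 3)   # Meters(5)
print(3 + Meters(2))   # Meters(5), via Meters.__radd__

try:
    1 + 'a'
except TypeError as err:
    print(err)         # unsupported operand type(s) for +: 'int' and 'str'
```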
Several Python implementations!
CPython
PyPy
IronPython
Jython
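To check which implementation is running your code, the standard library can tell you. A minimal sketch:

```python
import platform
import sys

# e.g. 'CPython' for the reference interpreter, 'PyPy' for PyPy
impl = platform.python_implementation()
print(impl, sys.implementation.name, platform.python_version())
```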
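Global interpreter lock (GIL)
Applies to CPython (PyPy has one too; Jython and IronPython do not)
Threads vs processes: only one thread per process can execute Python bytecode at a time
Avoid threads in general; they do not speed up CPU-bound Python code
Performance not predictable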
[16]:
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
[17]:
def f(n):
    x = np.random.uniform(0,1,n)
    y = np.random.uniform(0,1,n)
    count = 0
    for i in range(n):
        if x[i]**2 + y[i]**2 < 1:
            count += 1
    return count*4/n
[18]:
n = 100000
niter = 4
[19]:
%%time
[f(n) for i in range(niter)]
CPU times: user 525 ms, sys: 3.21 ms, total: 528 ms
Wall time: 528 ms
[19]:
[3.1392, 3.153, 3.14876, 3.14132]
[20]:
%%time
with ThreadPoolExecutor(4) as pool:
    xs = list(pool.map(f, [n]*niter))
xs
CPU times: user 549 ms, sys: 6.37 ms, total: 556 ms
Wall time: 546 ms
[20]:
[3.14536, 3.1468, 3.13868, 3.14756]
[21]:
%%time
with ProcessPoolExecutor(4) as pool:
    xs = list(pool.map(f, [n]*niter))
xs
---------------------------------------------------------------------------
BrokenProcessPool Traceback (most recent call last)
<timed exec> in <module>
/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/process.py in _chain_from_iterable_of_lists(iterable)
482 careful not to keep references to yielded objects.
483 """
--> 484 for element in iterable:
485 element.reverse()
486 while element:
/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/_base.py in result_iterator()
609 # Careful not to keep a reference to the popped future
610 if timeout is None:
--> 611 yield fs.pop().result()
612 else:
613 yield fs.pop().result(end_time - time.monotonic())
/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/_base.py in result(self, timeout)
437 raise CancelledError()
438 elif self._state == FINISHED:
--> 439 return self.__get_result()
440 else:
441 raise TimeoutError()
/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/_base.py in __get_result(self)
386 def __get_result(self):
387 if self._exception:
--> 388 raise self._exception
389 else:
390 return self._result
BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
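The `BrokenProcessPool` error above is most likely caused by running in a notebook rather than by `f` itself: since Python 3.8, macOS starts workers with the *spawn* method by default, and spawned workers cannot import a function that was defined interactively in a notebook cell. A common workaround, sketched here under that assumption, is to run the code as a script (or import the worker from a `.py` module) and create the pool under an `if __name__ == '__main__':` guard. The names `estimate_pi` and `main` are illustrative; the sketch swaps in the standard-library `random` module and gives each task its own seed, so results are reproducible per task (with NumPy's global generator, forked workers would otherwise start from identical copies of the parent's random state).

```python
import random
from concurrent.futures import ProcessPoolExecutor

def estimate_pi(n, seed):
    """Monte Carlo estimate of pi; lives at module level so workers can import it."""
    rng = random.Random(seed)        # independent, reproducible stream per task
    count = 0
    for _ in range(n):               # deliberately CPU-bound pure-Python loop
        x, y = rng.random(), rng.random()
        if x*x + y*y < 1:
            count += 1
    return count * 4 / n

def main():
    n, niter = 100_000, 4
    with ProcessPoolExecutor(4) as pool:
        # one seed per task, passed as a second iterable to map
        estimates = list(pool.map(estimate_pi, [n] * niter, range(niter)))
    print(estimates)

if __name__ == '__main__':   # with spawn, workers re-import this file; the guard stops recursion
    main()
```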
Coding in Python
[22]:
import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Coding conventions
PEP 8 (the standard Python style guide)
Avoid magic numbers
Avoid copy and paste
extract common functionality into functions
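A small sketch of the last two points, with hypothetical names: give the magic number `86400` a meaningful name, and put the conversion in one reusable function instead of copy-pasting the arithmetic.

```python
SECONDS_PER_MINUTE = 60
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24
SECONDS_PER_DAY = SECONDS_PER_MINUTE * MINUTES_PER_HOUR * HOURS_PER_DAY

def days_to_seconds(days):
    """One reusable function instead of repeating `days * 86400` everywhere."""
    return days * SECONDS_PER_DAY

# Before: elapsed = 3 * 86400   (what is 86400?)
elapsed = days_to_seconds(3)
```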
Data types
Integers
Arbitrary precision
Integer (floor) division operator //
Base conversion
Check if integer
[23]:
import math
[24]:
n = math.factorial(100)
[25]:
n
[25]:
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
[26]:
f'{n:,}'
[26]:
'93,326,215,443,944,152,681,699,238,856,266,700,490,715,968,264,381,621,468,592,963,895,217,599,993,229,915,608,941,463,976,156,518,286,253,697,920,827,223,758,251,185,210,916,864,000,000,000,000,000,000,000,000'
[27]:
h = math.sqrt(3**2 + 4**2)
[28]:
h
[28]:
5.0
[29]:
h.is_integer()
[29]:
True
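The other integer points in the list, written as self-checking assertions: floor division and modulo, base conversion, and the difference between "is an `int`" and "has an integer value".

```python
# Arbitrary precision: no overflow, results are exact
assert 2**100 == 1267650600228229401496703205376

# Floor division // and modulo % round toward negative infinity
assert 7 // 2 == 3 and -7 // 2 == -4
assert -7 % 2 == 1
assert divmod(17, 5) == (3, 2)          # (quotient, remainder) in one call

# Base conversion: to strings with bin/hex or format specs, back with int(s, base)
assert bin(10) == '0b1010' and f'{10:b}' == '1010'
assert hex(255) == '0xff' and int('ff', 16) == 255
assert int('1010', 2) == 10

# Checking for "integer": type check vs value check
assert isinstance(3, int) and not isinstance(3.0, int)
assert (3.0).is_integer()
assert isinstance(True, int)            # bool is a subclass of int
```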
Floats
Checking for equality
Catastrophic cancellation
Complex
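Complex numbers are a built-in type, written with a `j` suffix; the `cmath` module provides math functions that accept and return them. A short sketch:

```python
import cmath

z = 3 + 4j
print(type(z), z.real, z.imag)   # <class 'complex'> 3.0 4.0
print(abs(z))                    # 5.0, the modulus
print(z.conjugate())             # (3-4j)
print(cmath.sqrt(-1))            # 1j; math.sqrt(-1) would raise ValueError
```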
[30]:
x = np.arange(9).reshape(3,3)
x = x / x.sum(axis=0)  # each column now sums to 1, so mathematically 1 is an eigenvalue
λ = np.linalg.eigvals(x)
[31]:
λ[0]
[31]:
0.9999999999999993
[32]:
λ[0] == 1
[32]:
False
[33]:
math.isclose(λ[0], 1)
[33]:
True
[34]:
def var(xs):
    """Returns variance of sample data."""
    n = 0
    s = 0
    ss = 0
    for x in xs:
        n += 1
        s += x
        ss += x*x
    # naive one-pass formula: subtracts two huge, nearly equal numbers
    v = (ss - (s*s)/n)/(n-1)
    return v
[35]:
xs = np.random.normal(1e9, 1, int(1e6))
[36]:
var(xs)
[36]:
13287.56835956836
[37]:
np.var(xs)
[37]:
1.0003007438816822
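The naive formula subtracts two huge, nearly equal numbers (`ss` and `s*s/n`, both around 10^24 here), so almost all significant digits cancel. One standard fix, swapped in here as an illustration rather than taken from the original notes, is Welford's online algorithm: it tracks a running mean and a running sum of squared deviations, so no large intermediate values are ever subtracted. (`var` above divides by n - 1 while `np.var` defaults to n; at n = 10^6 that difference is negligible.)

```python
def var_welford(xs):
    """Sample variance (n - 1 denominator) via Welford's online algorithm."""
    n = 0
    mean = 0.0
    m2 = 0.0                      # running sum of squared deviations from the mean
    for x in xs:
        n += 1
        delta = x - mean          # deviation from the old mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses both the old and the updated mean
    return m2 / (n - 1)
```

On the data above, `var_welford(xs)` should come out close to `np.var(xs)`, although the pure-Python loop is slow for 10^6 values.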
Boolean
What evaluates as False?
[38]:
stuff = [[], [1], {}, '', 'hello', 0, 1, 1==1, 1==2]
for s in stuff:
    if s:
        print(f'{s!r} evaluates as True')
    else:
        print(f'{s!r} evaluates as False')
[] evaluates as False
[1] evaluates as True
{} evaluates as False
'' evaluates as False
'hello' evaluates as True
0 evaluates as False
1 evaluates as True
True evaluates as True
False evaluates as False
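In general, `None`, `False`, numeric zeros and empty containers are falsy. For your own classes, `bool(obj)` calls `__bool__` if it exists, otherwise falls back to `__len__`, and an object that defines neither is always truthy. A sketch with hypothetical classes:

```python
class Basket:
    """Hypothetical container: truthiness comes from __len__ since __bool__ is absent."""
    def __init__(self, items=()):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

class Plain:
    """Defines neither __bool__ nor __len__, so every instance is truthy."""

print(bool(Basket()), bool(Basket(['apple'])), bool(Plain()))  # False True True
print(bool(None), bool(0.0), bool(()), bool(set()))            # all False
```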
String
Unicode by default
b (bytes), r (raw) and f (formatted) string prefixes
[39]:
u'\u732b'
[39]:
'猫'
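A sketch of the three prefixes: `str` holds Unicode code points, `b'...'` literals and `.encode()` give raw bytes, `r'...'` keeps backslashes literally, and `f'...'` interpolates expressions. (The `u` prefix above is accepted but redundant in Python 3.)

```python
s = '猫'                       # str: a sequence of Unicode code points
b = s.encode('utf-8')          # bytes: the UTF-8 encoding, b'\xe7\x8c\xab'
print(len(s), len(b))          # 1 3
print(b.decode('utf-8') == s)  # True

print(len(r'\n'), len('\n'))   # 2 1: r keeps the backslash, '\n' is one newline

code_point = ord(s)
print(f'{s} is U+{code_point:04X}')  # 猫 is U+732B
```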
String formatting
Learn to use the f-string.
[40]:
import string
[41]:
char = 'e'
pos = string.ascii_lowercase.index(char) + 1
f"The letter {char} has position {pos} in the alphabet"
[41]:
'The letter e has position 5 in the alphabet'
[42]:
n = int(1e9)
f"{n:,}"
[42]:
'1,000,000,000'
[43]:
x = math.pi
[44]:
f"{x:8.2f}"
[44]:
'    3.14'
[45]:
import datetime
now = datetime.datetime.now()
now
[45]:
datetime.datetime(2020, 11, 11, 19, 23, 45, 578067)
[46]:
f"{now:%Y-%m-%d %H:%M}"
[46]:
'2020-11-11 19:23'
Data structures
Immutable - string, tuple
Mutable - list, set, dictionary
collections module
heapq module
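What mutability means in practice: immutable objects cannot be changed in place and can serve as dictionary keys, while mutable objects can change under every name that refers to them. A short sketch:

```python
t = (1, 2, 3)
try:
    t[0] = 99                   # tuples (and strings) cannot be changed in place
except TypeError as err:
    print(err)                  # 'tuple' object does not support item assignment

a = [1, 2]
b = a                           # b is another name for the same list object
b.append(3)
print(a)                        # [1, 2, 3]: the change is visible through both names

locations = {(0, 0): 'origin'}  # immutable tuples can be dict keys
try:
    {[0, 0]: 'origin'}          # mutable lists cannot
except TypeError as err:
    print(err)                  # unhashable type: 'list'
```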
[47]:
import collections
[x for x in dir(collections) if not x.startswith('_')]
[47]:
['ChainMap',
'Counter',
'OrderedDict',
'UserDict',
'UserList',
'UserString',
'abc',
'defaultdict',
'deque',
'namedtuple']
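A quick tour of a few of these, plus `heapq`, on small made-up data:

```python
import heapq
from collections import Counter, defaultdict, deque

words = 'the cat and the hat and the bat'.split()

counts = Counter(words)          # tally hashable items
print(counts.most_common(2))     # [('the', 3), ('and', 2)]

by_letter = defaultdict(list)    # missing keys get a fresh empty list
for w in words:
    by_letter[w[0]].append(w)
print(by_letter['t'])            # ['the', 'the', 'the']

window = deque(maxlen=3)         # fast appends and pops at both ends
for i in range(5):
    window.append(i)             # oldest items fall off the left
print(window)                    # deque([2, 3, 4], maxlen=3)

nums = [5, 1, 8, 3, 9, 2]
heapq.heapify(nums)              # rearrange in place into a min-heap
print(heapq.heappop(nums))       # 1, the smallest item
print(heapq.nlargest(2, nums))   # [9, 8]
```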
Functions
*args, **kwargs
Take care with mutable default values
First class objects
Anonymous functions
Decorators
[48]:
def f(*args, **kwargs):
    print(f"args = {args}")  # from Python 3.8 on, f'{args = }' prints the same thing
    print(f"kwargs = {kwargs}")
[49]:
f(1,2,3,a=4,b=5,c=6)
args = (1, 2, 3)
kwargs = {'a': 4, 'b': 5, 'c': 6}
[50]:
def g(a, xs=[]):
    xs.append(a)
    return xs
[51]:
g(1)
[51]:
[1]
[52]:
g(2)
[52]:
[1, 2]
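The default list is created once, when `def` runs, and is shared by every call that omits `xs`. The usual fix is a `None` sentinel; `g_fixed` is a hypothetical name for the corrected version:

```python
def g_fixed(a, xs=None):
    """Append a to xs, creating a new list on each call that omits xs."""
    if xs is None:
        xs = []        # fresh list per call, not shared between calls
    xs.append(a)
    return xs

print(g_fixed(1))      # [1]
print(g_fixed(2))      # [2], not [1, 2]
```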
[53]:
h = lambda x, y, z: x**2 + y**2 + z**2
[54]:
h(1,2,3)
[54]:
14
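"First class" means functions (named or anonymous) are ordinary objects: they can be passed as arguments, stored in containers, and returned from other functions. A sketch with hypothetical helper names:

```python
def square(x):
    return x * x

def apply_twice(func, x):
    """Takes a function as an argument and calls it twice."""
    return func(func(x))

operations = {'square': square, 'negate': lambda x: -x}  # functions stored in a dict

def make_adder(n):
    """Returns a new function that remembers n (a closure)."""
    return lambda x: x + n

print(apply_twice(square, 3))                       # 81
print(operations['negate'](5))                      # -5
print(make_adder(10)(1))                            # 11
print(sorted(['bob', 'Al', 'cy'], key=str.lower))   # ['Al', 'bob', 'cy']
```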
[55]:
from functools import lru_cache
[56]:
def fib(n):
    print(n, end=', ')
    if n <= 1:
        return n
    else:
        return fib(n-2) + fib(n-1)
[57]:
fib(10)
10, 8, 6, 4, 2, 0, 1, 3, 1, 2, 0, 1, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 7, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 6, 4, 2, 0, 1, 3, 1, 2, 0, 1, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 9, 7, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 6, 4, 2, 0, 1, 3, 1, 2, 0, 1, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 8, 6, 4, 2, 0, 1, 3, 1, 2, 0, 1, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 7, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 6, 4, 2, 0, 1, 3, 1, 2, 0, 1, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1,
[57]:
55
[58]:
@lru_cache(maxsize=100)
def fib_cache(n):
    print(n, end=', ')
    if n <= 1:
        return n
    else:
        return fib_cache(n-2) + fib_cache(n-1)
[59]:
fib_cache(10)
10, 8, 6, 4, 2, 0, 1, 3, 5, 7, 9,
[59]:
55
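A decorator is just a function that takes a function and returns a replacement; `@lru_cache(maxsize=100)` is applied the same way. A minimal hand-written decorator (the name `count_calls` is hypothetical) shows the mechanics and confirms how many calls the uncached recursion makes for n = 10:

```python
import functools

def count_calls(func):
    """Hypothetical decorator that counts how many times func is called."""
    @functools.wraps(func)          # keep func's __name__ and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls                        # same as: fib_counted = count_calls(fib_counted)
def fib_counted(n):
    return n if n <= 1 else fib_counted(n - 2) + fib_counted(n - 1)

print(fib_counted(10), fib_counted.calls)   # 55 177
```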
Classes
Key idea is encapsulation into objects
Everything in Python is an object
Attributes and methods
What is self?
Special methods - double underscore methods
Avoid complex inheritance schemes - prefer composition
Learn “design patterns” if interested in OOP
[60]:
(3.0).is_integer()
[60]:
True
[61]:
'hello world'.title()
[61]:
'Hello World'
[62]:
class Student:
    def __init__(self, first, last):
        self.first = first
        self.last = last

    @property
    def name(self):
        return f'{self.first} {self.last}'
[63]:
s = Student('Santa', 'Claus')
[64]:
s.name
[64]:
'Santa Claus'
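Special (double underscore) methods let your objects work with operators and built-ins, and `self` is simply the instance passed as the first argument. A sketch with a hypothetical `Vector` class:

```python
class Vector:
    """Hypothetical 2-D vector showing a few special methods."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):              # used by repr() and the interactive prompt
        return f'Vector({self.x}, {self.y})'

    def __add__(self, other):        # v + w
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other):         # v == w
        if not isinstance(other, Vector):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __abs__(self):               # abs(v)
        return (self.x**2 + self.y**2) ** 0.5

    def scaled(self, k):             # an ordinary method
        return Vector(self.x * k, self.y * k)

v = Vector(3, 4)
print(v + Vector(1, 1))                    # Vector(4, 5)
print(abs(v))                              # 5.0
# self is just the first argument: these two calls are equivalent
print(v.scaled(2) == Vector.scaled(v, 2))  # True
```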
Enums
Use enums for readability when you have a discrete set of constants.
[65]:
from enum import Enum
[66]:
class Day(Enum):
    MON = 1
    TUE = 2
    WED = 3
    THU = 4
    FRI = 5
    SAT = 6
    SUN = 7
[67]:
for day in Day:
    print(day)
Day.MON
Day.TUE
Day.WED
Day.THU
Day.FRI
Day.SAT
Day.SUN
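Enum members can also be looked up by value or by name, and each carries its `.name` and `.value`. The sketch repeats the `Day` definition so it runs on its own; `Color` is a hypothetical second example showing `auto()`:

```python
from enum import Enum, auto

class Day(Enum):
    MON = 1
    TUE = 2
    WED = 3
    THU = 4
    FRI = 5
    SAT = 6
    SUN = 7

print(Day(3))                         # Day.WED, lookup by value
print(Day['FRI'])                     # Day.FRI, lookup by name
print(Day.MON.name, Day.MON.value)    # MON 1
print(Day.SAT in (Day.SAT, Day.SUN))  # True; members compare by identity
print(Day.MON == 1)                   # False: a plain Enum member is not its value

class Color(Enum):
    RED = auto()                      # auto() assigns 1, 2, 3, ...
    GREEN = auto()
    BLUE = auto()
```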
NamedTuple
[68]:
from collections import namedtuple
[69]:
Student = namedtuple('Student', ['name', 'email', 'age', 'gpa', 'species'])
[70]:
abe = Student('Abraham Lincoln', 'abe.lincoln@gmail.com', 23, 3.4, 'Human')
[71]:
abe.species
[71]:
'Human'
[72]:
abe[1:4]
[72]:
('abe.lincoln@gmail.com', 23, 3.4)
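Because named tuples are immutable, "changing" a field means building a modified copy with `_replace`; they also convert to dicts and still unpack like ordinary tuples. A sketch reusing the record above:

```python
from collections import namedtuple

Student = namedtuple('Student', ['name', 'email', 'age', 'gpa', 'species'])
abe = Student('Abraham Lincoln', 'abe.lincoln@gmail.com', 23, 3.4, 'Human')

older = abe._replace(age=24)   # a new tuple; abe itself is unchanged
print(older.age, abe.age)      # 24 23
print(abe._asdict()['gpa'])    # 3.4
name, email, *rest = abe       # still a tuple, so unpacking works

try:
    abe.age = 24
except AttributeError as err:
    print('immutable:', err)
```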
Data Classes
Simplifies creation and use of classes for data records.
Note: named tuples serve a similar purpose but are immutable, whereas data classes are mutable by default.
[73]:
from dataclasses import dataclass
[74]:
@dataclass
class Student:
name: str
email: str
age: int
gpa: float
species: str = 'Human'
[75]:
abe = Student('Abraham Lincoln', 'abe.lincoln@gmail.com', age=23, gpa=3.4)
[76]:
abe
[76]:
Student(name='Abraham Lincoln', email='abe.lincoln@gmail.com', age=23, gpa=3.4, species='Human')
[77]:
abe.email
[77]:
'abe.lincoln@gmail.com'
[78]:
abe.species
[78]:
'Human'
Note
The type annotations are informative only; Python does not enforce them at runtime.
[79]:
Student(*'abcde')
[79]:
Student(name='a', email='b', age='c', gpa='d', species='e')
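Two related options, sketched with hypothetical classes: `field(default_factory=list)` gives each instance its own mutable default (a bare `[]` default raises `ValueError` in a data class), and `frozen=True` makes instances immutable, much like named tuples.

```python
from dataclasses import dataclass, field, FrozenInstanceError

@dataclass
class Course:
    title: str
    students: list = field(default_factory=list)  # new list per instance

@dataclass(frozen=True)
class Point:
    x: float
    y: float

c1, c2 = Course('Python'), Course('Stats')
c1.students.append('abe')
print(c2.students)            # []: not shared with c1

p = Point(1.0, 2.0)
try:
    p.x = 5.0                 # frozen data classes reject assignment
except FrozenInstanceError as err:
    print('frozen:', err)
print(p == Point(1.0, 2.0))   # True: __eq__ is generated from the fields
```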
Imports, modules and namespaces
A namespace is basically just a dictionary
LEGB rule: names are looked up in the Local, Enclosing, Global, then Built-in scope
Avoid polluting the global namespace
[80]:
import builtins
[x for x in dir(builtins) if x[0].islower()][:8]
[80]:
['abs', 'all', 'any', 'ascii', 'bin', 'bool', 'breakpoint', 'bytearray']
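To see that a namespace really is a dictionary, look at a module's `__dict__` through `vars()`, or at the current global namespace through `globals()`. A short sketch (`answer` is a hypothetical name):

```python
import builtins
import math

print(vars(math)['pi'] == math.pi)   # True: attribute lookup reads the module's dict
print('len' in vars(builtins))       # True: the B in LEGB

answer = 42
print(globals()['answer'])           # 42: the global namespace is a dict too

# Prefer "import math" or "from math import sqrt" over "from math import *",
# which dumps every public name into the global namespace.
```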
[81]:
x1 = 23
def f1(x2):
    print(locals())
    # x1 is global (G), x2 is enclosing (E), x3 is local (L)
    def g(x3):
        print(locals())
        return x3 + x2 + x1
    return g
[82]:
x = 23
def f2(x):
    print(locals())
    def g(x):
        print(locals())
        return x
    return g
[83]:
g1 = f1(3)
g1(2)
{'x2': 3}
{'x3': 2, 'x2': 3}
[83]:
28
[84]:
g2 = f2(3)
g2(2)
{'x': 3}
{'x': 2}
[84]:
2
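Reading an outer name just works, as `f1` shows, but assigning to it inside a function creates a new local name unless you declare otherwise; without the declaration, `count += 1` below would raise `UnboundLocalError`. A sketch of `global` and `nonlocal` with hypothetical names:

```python
counter = 0

def increment_global():
    global counter          # rebind the module-level (G) name
    counter += 1

def make_counter():
    count = 0
    def increment():
        nonlocal count      # rebind the enclosing (E) name
        count += 1
        return count
    return increment

increment_global()
tick = make_counter()
tick()
print(counter, tick())      # 1 2
```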
Loops
Prefer vectorization unless using numba
Difference between continue and break
Avoid infinite loops
Comprehensions and generator expressions
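To illustrate the first point, here is the Monte Carlo estimate of π from the GIL section written twice, with hypothetical names `pi_loop` and `pi_vectorized`: once with an explicit Python loop and once letting NumPy do the comparison and counting in compiled code. With the same seed both see the same random points; the vectorized version is typically much faster, though exact timings depend on your machine.

```python
import numpy as np

def pi_loop(n, rng):
    """Monte Carlo estimate of pi with an explicit Python loop."""
    x = rng.uniform(0, 1, n)
    y = rng.uniform(0, 1, n)
    count = 0
    for i in range(n):
        if x[i]**2 + y[i]**2 < 1:
            count += 1
    return count * 4 / n

def pi_vectorized(n, rng):
    """Same estimate; the comparison and the count run inside NumPy."""
    x = rng.uniform(0, 1, n)
    y = rng.uniform(0, 1, n)
    inside = x**2 + y**2 < 1          # boolean array, one entry per point
    return np.count_nonzero(inside) * 4 / n

print(pi_loop(100_000, np.random.default_rng(0)))
print(pi_vectorized(100_000, np.random.default_rng(0)))
```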
[85]:
import string
[86]:
{char: ord(char) for char in string.ascii_lowercase}
[86]:
{'a': 97,
'b': 98,
'c': 99,
'd': 100,
'e': 101,
'f': 102,
'g': 103,
'h': 104,
'i': 105,
'j': 106,
'k': 107,
'l': 108,
'm': 109,
'n': 110,
'o': 111,
'p': 112,
'q': 113,
'r': 114,
's': 115,
't': 116,
'u': 117,
'v': 118,
'w': 119,
'x': 120,
'y': 121,
'z': 122}
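The remaining loop points in one sketch: `continue` skips to the next iteration while `break` leaves the loop entirely, and a generator expression is a lazy comprehension that never builds the full list in memory.

```python
for n in range(10):
    if n % 2 == 0:
        continue          # skip even numbers
    if n > 7:
        break             # stop once we pass 7
    print(n, end=' ')     # prints: 1 3 5 7
print()

total = sum(n * n for n in range(1_000_000))   # generator expression: lazy
squares_list = [n * n for n in range(5)]       # list comprehension: [0, 1, 4, 9, 16]
odd_set = {n for n in range(10) if n % 2}      # set comprehension: {1, 3, 5, 7, 9}
```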
Iterators and generators
The iterator protocol: __iter__ and __next__
iter() and next()
What happens in a for loop
Generators with yield and yield from
[87]:
class Iterator:
    """A silly class that implements the Iterator protocol and Strategy pattern.

    start = start of the range (inclusive)
    stop = end of the range (exclusive)
    func = function applied to each value (the "strategy")
    """
    def __init__(self, start, stop, func):
        self.start = start
        self.stop = stop
        self.func = func

    def __iter__(self):
        self.n = self.start
        return self

    def __next__(self):
        if self.n >= self.stop:
            raise StopIteration
        else:
            x = self.func(self.n)
            self.n += 1
            return x
[88]:
sq = Iterator(0, 5, lambda x: x*x)
[89]:
list(sq)
[89]:
[0, 1, 4, 9, 16]
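Roughly, a `for` loop calls `iter()` on the iterable once, then `next()` repeatedly until `StopIteration` is raised. A simplified sketch of that desugaring (the helper name `manual_for` is hypothetical; a real `for` loop also handles `else` clauses):

```python
def manual_for(iterable, body):
    """Roughly what `for x in iterable: body(x)` does under the hood."""
    iterator = iter(iterable)       # calls iterable.__iter__()
    while True:
        try:
            x = next(iterator)      # calls iterator.__next__()
        except StopIteration:       # the protocol's "no more items" signal
            break
        body(x)

seen = []
manual_for([10, 20, 30], seen.append)
print(seen)                         # [10, 20, 30]

chars = iter('ab')
print(next(chars), next(chars))     # a b
print(next(chars, 'done'))          # done: the default suppresses StopIteration
```

The same helper works with the `Iterator` class above, e.g. `manual_for(Iterator(0, 5, lambda x: x*x), print)`.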
Generators
Written like functions, but they use yield to produce values lazily, one at a time.
[90]:
def cycle1(xs, n):
    """Cycles through values in xs n times."""
    for i in range(n):
        for x in xs:
            yield x
[91]:
list(cycle1([1,2,3], 4))
[91]:
[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
[92]:
for x in cycle1(['ann', 'bob', 'stop', 'charles'], 1000):
    if x == 'stop':
        break
    else:
        print(x)
ann
bob
[93]:
def cycle2(xs, n):
    """Cycles through values in xs n times."""
    for i in range(n):
        yield from xs
[94]:
list(cycle2([1,2,3], 4))
[94]:
[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
Because they are lazy, generators can be used for infinite streams.
[95]:
def fib():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b
[96]:
for n in fib():
    if n > 100:
        break
    print(n, end=', ')
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89,
You can even slice infinite generators with itertools.islice. More on this when we cover functional programming.
[97]:
import itertools as it
[98]:
list(it.islice(fib(), 5, 10))
[98]:
[8, 13, 21, 34, 55]
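Lazy stages can also be chained into a pipeline where nothing is computed until the final consumer pulls values through. A sketch that repeats `fib` so it runs on its own, and uses `itertools.takewhile` in place of the explicit `break` above:

```python
import itertools as it

def fib():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b

small = it.takewhile(lambda n: n < 1000, fib())  # stop at the first value >= 1000
evens = (n for n in small if n % 2 == 0)         # generator expression as a filter
squared = map(lambda n: n * n, evens)            # still lazy
print(list(squared))                             # [4, 64, 1156, 20736, 372100]
```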