Python review of concepts

This review points out useful aspects of Python that you may have glossed over. It assumes you already know Python fairly well.

Python as a language

Why Python?

  • Huge community - especially in data science and ML

  • Easy to learn

  • Batteries included

  • Extensive 3rd party libraries

  • Widely used in both industry and academia

  • Most important “glue” language bridging multiple communities

[1]:
import __hello__
Hello world!

Versions

  • Only use Python 3 (current release version is 3.8, container is 3.7)

  • Do not use Python 2

[2]:
import sys
[3]:
sys.version
[3]:
'3.8.5 (default, Jul 21 2020, 10:48:26) \n[Clang 11.0.3 (clang-1103.0.32.62)]'

Multi-paradigm

Procedural

[4]:
x = []
for i in range(5):
    x.append(i*i)
x
[4]:
[0, 1, 4, 9, 16]

Functional

[5]:
list(map(lambda x: x*x, range(5)))
[5]:
[0, 1, 4, 9, 16]

Object-oriented

[6]:
class Robot:
    def __init__(self, name, function):
        self.name = name
        self.function = function

    def greet(self):
        return f"I am {self.name}, a {self.function} robot!"
[7]:
fido = Robot('roomba', 'vacuum cleaner')
[8]:
fido.name
[8]:
'roomba'
[9]:
fido.function
[9]:
'vacuum cleaner'
[10]:
fido.greet()
[10]:
'I am roomba, a vacuum cleaner robot!'

Dynamic typing

Complexity of a + b

[11]:
1 + 2.3
[11]:
3.3
[12]:
type(1), type(2.3)
[12]:
(int, float)
[13]:
'hello' + ' world'
[13]:
'hello world'
[14]:
[1,2,3] + [4,5,6]
[14]:
[1, 2, 3, 4, 5, 6]
[15]:
import numpy as np

np.arange(3) + 10
[15]:
array([10, 11, 12])

Several Python implementations!

  • CPython

  • PyPy

  • IronPython

  • Jython

Global interpreter lock (GIL)

  • Only applies to CPython

  • Threads vs processes

  • Avoid threads in general

  • Performance not predictable

[16]:
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
[17]:
def f(n):
    x = np.random.uniform(0,1,n)
    y = np.random.uniform(0,1,n)
    count = 0
    for i in range(n):
        if x[i]**2 + y[i]**2 < 1:
            count += 1
    return count*4/n
[18]:
n = 100000
niter = 4
[19]:
%%time

[f(n) for i in range(niter)]
CPU times: user 525 ms, sys: 3.21 ms, total: 528 ms
Wall time: 528 ms
[19]:
[3.1392, 3.153, 3.14876, 3.14132]
[20]:
%%time

with ThreadPoolExecutor(4) as pool:
    xs = list(pool.map(f, [n]*niter))
xs
CPU times: user 549 ms, sys: 6.37 ms, total: 556 ms
Wall time: 546 ms
[20]:
[3.14536, 3.1468, 3.13868, 3.14756]
[21]:
%%time

with ProcessPoolExecutor(4) as pool:
    xs = list(pool.map(f, [n]*niter))
xs
---------------------------------------------------------------------------
BrokenProcessPool                         Traceback (most recent call last)
<timed exec> in <module>

/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/process.py in _chain_from_iterable_of_lists(iterable)
    482     careful not to keep references to yielded objects.
    483     """
--> 484     for element in iterable:
    485         element.reverse()
    486         while element:

/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/_base.py in result_iterator()
    609                     # Careful not to keep a reference to the popped future
    610                     if timeout is None:
--> 611                         yield fs.pop().result()
    612                     else:
    613                         yield fs.pop().result(end_time - time.monotonic())

/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/_base.py in result(self, timeout)
    437                 raise CancelledError()
    438             elif self._state == FINISHED:
--> 439                 return self.__get_result()
    440             else:
    441                 raise TimeoutError()

/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/_base.py in __get_result(self)
    386     def __get_result(self):
    387         if self._exception:
--> 388             raise self._exception
    389         else:
    390             return self._result

BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
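
The CPU-bound function `f` gains nothing from threads because only one thread can run Python bytecode at a time. For contrast, here is a minimal sketch of a task that waits rather than computes, where threads do help. The function `slow_io` is a hypothetical stand-in that uses `time.sleep` to simulate waiting on a network or disk; sleeping releases the GIL.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_io(i):
    """Hypothetical I/O-bound task: the thread waits instead of computing."""
    time.sleep(0.2)
    return i * i

start = time.perf_counter()
with ThreadPoolExecutor(4) as pool:
    results = list(pool.map(slow_io, range(4)))
elapsed = time.perf_counter() - start

print(results)
print(f"{elapsed:.2f} s with 4 threads; the four sleeps add up to 0.8 s if run serially")
```

The four waits overlap, so the wall time should be close to a single 0.2 s sleep rather than the sum.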

Coding in Python

[22]:
import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Coding conventions

  • PEP 8

  • Avoid magic numbers

  • Avoid copy and paste

  • Extract common functionality into functions

PEP 8 is the Style Guide for Python Code.

Data types

  • Integers

    • Arbitrary precision

    • Integer division operator

    • Base conversion

    • Check if integer

[23]:
import math
[24]:
n = math.factorial(100)
[25]:
n
[25]:
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
[26]:
f'{n:,}'
[26]:
'93,326,215,443,944,152,681,699,238,856,266,700,490,715,968,264,381,621,468,592,963,895,217,599,993,229,915,608,941,463,976,156,518,286,253,697,920,827,223,758,251,185,210,916,864,000,000,000,000,000,000,000,000'
[27]:
h = math.sqrt(3**2 + 4**2)
[28]:
h
[28]:
5.0
[29]:
h.is_integer()
[29]:
True
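
The integer division and base conversion bullets can be illustrated with a short sketch that uses only built-ins. The variable names are arbitrary.

```python
# Floor division rounds toward negative infinity, not toward zero
q_pos = 7 // 2        # 3
q_neg = -7 // 2       # -4
q, r = divmod(7, 2)   # (3, 1)

# Base conversion: integer -> string
as_bin = bin(255)     # '0b11111111'
as_hex = hex(255)     # '0xff'
as_oct = f"{255:o}"   # '377' (format spec, no prefix)

# Base conversion: string -> integer
from_hex = int('ff', 16)      # 255
from_bin = int('0b1010', 2)   # 10

# Checking for an integer type versus an integer value
is_int_type = isinstance(5, int)     # True
is_int_value = (5.0).is_integer()    # True, although 5.0 is a float
```
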
  • Floats

    • Checking for equality

    • Catastrophic cancellation

  • Complex

[30]:
x = np.arange(9).reshape(3,3)
x = x / x.sum(axis=0)
λ = np.linalg.eigvals(x)
[31]:
λ[0]
[31]:
0.9999999999999993
[32]:
λ[0] == 1
[32]:
False
[33]:
math.isclose(λ[0], 1)
[33]:
True
[34]:
def var(xs):
    """Returns variance of sample data."""

    n = 0
    s = 0
    ss = 0

    for x in xs:
        n +=1
        s += x
        ss += x*x

    v = (ss - (s*s)/n)/(n-1)
    return v
[35]:
xs = np.random.normal(1e9, 1, int(1e6))
[36]:
var(xs)
[36]:
13287.56835956836
[37]:
np.var(xs)
[37]:
1.0003007438816822
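
The `var` function fails because it subtracts two huge, nearly equal numbers (`ss` and `s*s/n`), which is catastrophic cancellation. One common remedy is Welford's one-pass update, sketched below under the assumption that a single pass over the data is still wanted. The name `var_welford` is hypothetical, and the standard library's `statistics.variance` serves as the reference.

```python
import random
import statistics

def var_welford(xs):
    """Sample variance via Welford's one-pass update."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)

random.seed(0)
data = [random.gauss(1e9, 1) for _ in range(10_000)]
print(var_welford(data), statistics.variance(data))
```

The update only ever works with deviations from the running mean, which stay small, so no large quantities are subtracted from each other.
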
  • Boolean

    • What evaluates as False?

[38]:
stuff = [[], [1], {},'', 'hello', 0, 1, 1==1, 1==2]
for s in stuff:
    if s:
        print(f'{s} evaluates as True')
    else:
        print(f'{s} evaluates as False')
[] evaluates as False
[1] evaluates as True
{} evaluates as False
 evaluates as False
hello evaluates as True
0 evaluates as False
1 evaluates as True
True evaluates as True
False evaluates as False
  • String

    • Unicode by default

    • b, r, f strings

[39]:
u'\u732b'
[39]:
'猫'
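
A brief sketch of the three string prefixes mentioned above: `r` for raw strings, `b` for bytes, and `f` for formatted strings. The variable names are arbitrary.

```python
raw = r'C:\new\table'         # backslashes kept literally, no escape processing
cooked = 'a\tb'               # \t is a single tab character

data = '猫'.encode('utf-8')   # str -> bytes
text = data.decode('utf-8')   # bytes -> str
header = b'abc'               # bytes literal; indexing gives integers
first_byte = header[0]        # 97

name = 'world'
greeting = f'hello {name.upper()}'   # expressions are allowed inside the braces

print(len(raw), len(cooked))  # 12 3
print(data, len(data))        # b'\xe7\x8c\xab' 3
print(greeting)               # hello WORLD
```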

String formatting

  • Learn to use the f-string.

[40]:
import string
[41]:
char = 'e'
pos = string.ascii_lowercase.index(char) + 1
f"The letter {char} has position {pos} in the alphabet"
[41]:
'The letter e has position 5 in the alphabet'
[42]:
n = int(1e9)
f"{n:,}"
[42]:
'1,000,000,000'
[43]:
x = math.pi
[44]:
f"{x:8.2f}"
[44]:
'    3.14'
[45]:
import datetime
now = datetime.datetime.now()
now
[45]:
datetime.datetime(2020, 11, 11, 19, 23, 45, 578067)
[46]:
f"{now:%Y-%m-%d %H:%M}"
[46]:
'2020-11-11 19:23'

Data structures

  • Immutable - string, tuple

  • Mutable - list, set, dictionary

  • Collections module

  • heapq

[47]:
import collections

[x for x in dir(collections) if not x.startswith('_')]
[47]:
['ChainMap',
 'Counter',
 'OrderedDict',
 'UserDict',
 'UserList',
 'UserString',
 'abc',
 'defaultdict',
 'deque',
 'namedtuple']
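
A short sketch of the containers used most often from `collections`, together with `heapq`. The sample data is made up for illustration.

```python
import heapq
from collections import Counter, defaultdict, deque

words = ['spam', 'egg', 'spam', 'ham', 'spam', 'egg']

# Counter: tally hashable items
counts = Counter(words)
top = counts.most_common(1)          # [('spam', 3)]

# defaultdict: missing keys get a default value built on demand
by_length = defaultdict(list)
for w in words:
    by_length[len(w)].append(w)

# deque: fast appends and pops at both ends; maxlen keeps a sliding window
window = deque(maxlen=3)
for i in range(5):
    window.append(i)                 # ends as deque([2, 3, 4], maxlen=3)

# heapq: a min-heap stored in a plain list
heap = [5, 1, 4, 2]
heapq.heapify(heap)
smallest = heapq.heappop(heap)                  # 1
two_largest = heapq.nlargest(2, [5, 1, 4, 2])   # [5, 4]
```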

Functions

  • *args, **kwargs

  • Care with mutable default values

  • First class objects

  • Anonymous functions

  • Decorators

[48]:
def f(*args, **kwargs):
    print(f"args = {args}") # in Python 3.8, you can just write f'{args = }'
    print(f"kwargs = {kwargs}")
[49]:
f(1,2,3,a=4,b=5,c=6)
args = (1, 2, 3)
kwargs = {'a': 4, 'b': 5, 'c': 6}
[50]:
def g(a, xs=[]):
    xs.append(a)
    return xs
[51]:
g(1)
[51]:
[1]
[52]:
g(2)
[52]:
[1, 2]
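
The list persists between calls to `g` because the default value is created once, when the function is defined. The usual workaround is a `None` sentinel, sketched here with the hypothetical name `g_safe`.

```python
def g_safe(a, xs=None):
    """Like g, but builds a fresh list on each call unless one is passed in."""
    if xs is None:
        xs = []
    xs.append(a)
    return xs

first = g_safe(1)            # [1]
second = g_safe(2)           # [2], not [1, 2]
extended = g_safe(3, [1, 2]) # [1, 2, 3]
```
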
[53]:
h = lambda x, y, z: x**2 + y**2 + z**2
[54]:
h(1,2,3)
[54]:
14
[55]:
from functools import lru_cache
[56]:
def fib(n):
    print(n, end=', ')
    if n <= 1:
        return n
    else:
        return fib(n-2) + fib(n-1)
[57]:
fib(10)
10, 8, 6, 4, 2, 0, 1, 3, 1, 2, 0, 1, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 7, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 6, 4, 2, 0, 1, 3, 1, 2, 0, 1, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 9, 7, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 6, 4, 2, 0, 1, 3, 1, 2, 0, 1, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 8, 6, 4, 2, 0, 1, 3, 1, 2, 0, 1, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 7, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 6, 4, 2, 0, 1, 3, 1, 2, 0, 1, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1,
[57]:
55
[58]:
@lru_cache(maxsize=100)
def fib_cache(n):
    print(n, end=', ')
    if n <= 1:
        return n
    else:
        return fib_cache(n-2) + fib_cache(n-1)
[59]:
fib_cache(10)
10, 8, 6, 4, 2, 0, 1, 3, 5, 7, 9,
[59]:
55
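
`lru_cache` is a ready-made decorator. To show what a decorator is underneath, here is a minimal hand-written one. The names `count_calls` and `square` are hypothetical and chosen only for illustration.

```python
from functools import wraps

def count_calls(func):
    """Decorator that records how many times func has been called."""
    @wraps(func)                     # keep func's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls                         # same as: square = count_calls(square)
def square(x):
    """Returns x squared."""
    return x * x

values = [square(i) for i in range(5)]
print(values, square.calls, square.__name__)
```

A decorator is just a function that takes a function and returns a replacement, which works because functions are first-class objects.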

Classes

  • Key idea is encapsulation into objects

  • Everything in Python is an object

  • Attributes and methods

  • What is self?

  • Special methods - double underscore methods

  • Avoid complex inheritance schemes - prefer composition

  • Learn “design patterns” if interested in OOP

[60]:
(3.0).is_integer()
[60]:
True
[61]:
'hello world'.title()
[61]:
'Hello World'
[62]:
class Student:
    def __init__(self, first, last):
        self.first = first
        self.last = last

    @property
    def name(self):
        return f'{self.first} {self.last}'
[63]:
s = Student('Santa', 'Claus')
[64]:
s.name
[64]:
'Santa Claus'
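
A small sketch of special (double underscore) methods and of what `self` refers to. The `Vector` class is hypothetical and exists only for illustration.

```python
class Vector:
    """Hypothetical 2-D vector used only to illustrate special methods."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):              # used by repr() and the interactive prompt
        return f'Vector({self.x}, {self.y})'

    def __eq__(self, other):         # used by ==
        if not isinstance(other, Vector):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __add__(self, other):        # used by +
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(self.x + other.x, self.y + other.y)

    def __len__(self):               # used by len()
        return 2

v = Vector(1, 2) + Vector(3, 4)
print(v)                             # Vector(4, 6)

# self is simply the instance: v.__repr__() is shorthand for Vector.__repr__(v)
same = Vector.__repr__(v) == repr(v)
```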

Enums

Use enums for readability when you have a discrete set of constants.

[65]:
from enum import Enum
[66]:
class Day(Enum):
    MON = 1
    TUE = 2
    WED = 3
    THU = 4
    FRI = 5
    SAT = 6
    SUN = 7
[67]:
for day in Day:
    print(day)
Day.MON
Day.TUE
Day.WED
Day.THU
Day.FRI
Day.SAT
Day.SUN
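
Besides iteration, enum members can be looked up by value or by name. A minimal sketch with a hypothetical `Color` enum:

```python
from enum import Enum

class Color(Enum):   # hypothetical enum, used only for illustration
    RED = 1
    GREEN = 2

by_value = Color(1)          # Color.RED
by_name = Color['GREEN']     # Color.GREEN

print(by_value.name, by_value.value)   # RED 1
print(Color.RED is by_value)           # True: members are singletons
print(Color.RED == 1)                  # False: a plain Enum member is not its value
```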

NamedTuple

[68]:
from collections import namedtuple
[69]:
Student = namedtuple('Student', ['name', 'email', 'age', 'gpa', 'species'])
[70]:
abe = Student('Abraham Lincoln', 'abe.lincoln@gmail.com', 23, 3.4, 'Human')
[71]:
abe.species
[71]:
'Human'
[72]:
abe[1:4]
[72]:
('abe.lincoln@gmail.com', 23, 3.4)
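
Because a namedtuple is a tuple, its fields cannot be reassigned. A short sketch using a hypothetical `Point` type shows the error and the `_replace` method that returns a modified copy instead.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])   # hypothetical record type
p = Point(1, 2)

try:
    p.x = 10
    mutated = True
except AttributeError:
    mutated = False          # fields cannot be reassigned

q = p._replace(x=10)         # returns a new tuple; p is unchanged
x, y = q                     # unpacks like an ordinary tuple
as_dict = q._asdict()        # maps field names to values
```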

Data Classes

Simplifies creation and use of classes for data records.

Note: A namedtuple serves a similar function but is immutable.

[73]:
from dataclasses import dataclass
[74]:
@dataclass
class Student:
    name: str
    email: str
    age: int
    gpa: float
    species: str = 'Human'
[75]:
abe = Student('Abraham Lincoln', 'abe.lincoln@gmail.com', age=23, gpa=3.4)
[76]:
abe
[76]:
Student(name='Abraham Lincoln', email='abe.lincoln@gmail.com', age=23, gpa=3.4, species='Human')
[77]:
abe.email
[77]:
'abe.lincoln@gmail.com'
[78]:
abe.species
[78]:
'Human'

Note

The type annotations are informative only. Python does not enforce them.

[79]:
Student(*'abcde')
[79]:
Student(name='a', email='b', age='c', gpa='d', species='e')
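
If a check is wanted at construction time, one option is to validate in `__post_init__`, which a dataclass calls right after its generated `__init__`. This is only a sketch: `CheckedStudent` is a hypothetical, cut-down variant of `Student`, and it checks a single field.

```python
from dataclasses import dataclass

@dataclass
class CheckedStudent:            # hypothetical variant of Student
    name: str
    age: int

    def __post_init__(self):
        # Annotations alone check nothing, so test the type explicitly
        if not isinstance(self.age, int):
            raise TypeError(f'age must be an int, got {type(self.age).__name__}')

ok = CheckedStudent('Abraham Lincoln', 23)

try:
    CheckedStudent('Abraham Lincoln', 'c')
    error = None
except TypeError as err:
    error = str(err)

print(error)
```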

Imports, modules and namespaces

  • A namespace is basically just a dictionary

  • LEGB

  • Avoid polluting the global namespace

[80]:
import builtins  # portable name; __builtin__ is only predefined inside IPython

[x for x in dir(builtins) if x[0].islower()][:8]
[80]:
['abs', 'all', 'any', 'ascii', 'bin', 'bool', 'breakpoint', 'bytearray']
[81]:
x1 = 23

def f1(x2):
    print(locals())
    # x1 is global (G), x2 is enclosing (E), x3 is local (L)
    def g(x3):
        print(locals())
        return x3 + x2 + x1
    return g
[82]:
x = 23

def f2(x):
    print(locals())
    def g(x):
        print(locals())
        return x
    return g
[83]:
g1 = f1(3)
g1(2)
{'x2': 3}
{'x3': 2, 'x2': 3}
[83]:
28
[84]:
g2 = f2(3)
g2(2)
{'x': 3}
{'x': 2}
[84]:
2
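
Two short illustrations of the bullets above. First, a module's namespace really is a dictionary, which `vars` exposes. Second, the enclosing (E) scope can be rebound from an inner function with `nonlocal`. The name `make_counter` is hypothetical.

```python
import math

# A module namespace is a dict from names to objects
namespace = vars(math)
same_object = namespace['pi'] is math.pi    # True

def make_counter():
    count = 0                # lives in the enclosing (E) scope of step

    def step():
        nonlocal count       # rebind the enclosing name instead of creating a local
        count += 1
        return count

    return step

counter = make_counter()
ticks = [counter(), counter(), counter()]   # [1, 2, 3]
```

Keeping `count` inside the closure avoids a global variable, which is one way to avoid polluting the global namespace.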

Loops

  • Prefer vectorization unless using numba

  • Difference between continue and break

  • Avoid infinite loops

  • Comprehensions and generator expressions

[85]:
import string
[86]:
{char: ord(char) for char in string.ascii_lowercase}
[86]:
{'a': 97,
 'b': 98,
 'c': 99,
 'd': 100,
 'e': 101,
 'f': 102,
 'g': 103,
 'h': 104,
 'i': 105,
 'j': 106,
 'k': 107,
 'l': 108,
 'm': 109,
 'n': 110,
 'o': 111,
 'p': 112,
 'q': 113,
 'r': 114,
 's': 115,
 't': 116,
 'u': 117,
 'v': 118,
 'w': 119,
 'x': 120,
 'y': 121,
 'z': 122}
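
A short sketch of the difference between `continue` and `break`, and between a list comprehension and a generator expression. The variable names are arbitrary.

```python
seen = []
for n in range(10):
    if n % 2 == 0:
        continue      # skip the rest of this iteration, move to the next n
    if n > 6:
        break         # leave the loop entirely
    seen.append(n)    # seen ends as [1, 3, 5]

squares_list = [n * n for n in range(5)]   # built in memory at once
squares_gen = (n * n for n in range(5))    # lazy: values produced on demand
total = sum(squares_gen)                   # 30
leftover = list(squares_gen)               # []: a generator is consumed only once
```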

Iterations and generators

  • The iterator protocol

    • __iter__ and __next__

    • iter()

    • next()

  • What happens in a for loop

  • Generators with yield and yield from

[87]:
class Iterator:
    """A silly class that implements the Iterator protocol and Strategy pattern.

    start = start of range (inclusive)
    stop = end of range (exclusive)
    func = function applied to each value in the range
    """
    def __init__(self, start, stop, func):
        self.start = start
        self.stop = stop
        self.func = func

    def __iter__(self):
        self.n = self.start
        return self

    def __next__(self):
        if self.n >= self.stop:
            raise StopIteration
        else:
            x = self.func(self.n)
            self.n += 1
            return x
[88]:
sq = Iterator(0, 5, lambda x: x*x)
[89]:
list(sq)
[89]:
[0, 1, 4, 9, 16]
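
What happens in a `for` loop can be written out by hand with `iter()` and `next()`. The sketch below is roughly equivalent to `for x in xs: collected.append(x)`; the variable names are arbitrary.

```python
xs = ['a', 'b', 'c']

collected = []
iterator = iter(xs)          # calls xs.__iter__()
while True:
    try:
        x = next(iterator)   # calls iterator.__next__()
    except StopIteration:
        break                # a for loop catches this and ends quietly
    collected.append(x)

print(collected)             # ['a', 'b', 'c']
```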

Generators

Like functions, but lazy.

[90]:
def cycle1(xs, n):
    """Cycles through values in xs n times."""

    for i in range(n):
        for x in xs:
            yield x
[91]:
list(cycle1([1,2,3], 4))
[91]:
[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
[92]:
for x in cycle1(['ann', 'bob', 'stop', 'charles'], 1000):
    if x == 'stop':
        break
    else:
        print(x)
ann
bob
[93]:
def cycle2(xs, n):
    """Cycles through values in xs n times."""

    for i in range(n):
        yield from xs
[94]:
list(cycle2([1,2,3], 4))
[94]:
[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]

Because they are lazy, generators can be used for infinite streams.

[95]:
def fib():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b
[96]:
for n in fib():
    if n > 100:
        break
    print(n, end=', ')
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89,

You can even slice infinite generators. More when we cover functional programming.

[97]:
import itertools as it
[98]:
list(it.islice(fib(), 5, 10))
[98]:
[8, 13, 21, 34, 55]