%matplotlib inline
import matplotlib.pyplot as plt

Bonus Material: The Humble For Loop

Much of numerical computing consists of populating an array of some sort. The general concept is looping, and there is a lot to discover about how to write loops in Python.

(1) Filling a list

Using a for loop

This is often the most flexible and simplest method to populate a list or array for beginners in Python . Traditionally, the for loop is discouraged in interpreted languages because it is slow. However, the for loop is amenable to JIT compilation that in many cases out-performs the vectorized version.

n = 10
xs = []
for i in range(n):
    xs.append(i**2)
xs
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Using a list comprehension

This is just syntactic sugar for the for loop. If you are writing a pure python program, list comprehensions and the related generator expressions, set comprehension, and dictionary comprehension are very slick. The big disadvantage is that these forms are currently not supported by the numba JIT compiler.

n = 10
xs = [i**2 for i in range(n)]
xs
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Using map

This is a “functional” style of writing the for loop, and provides a higher level of abstraction. As a result, the map function is generic and can be used in different parallel or concurrent applications. We will see quite a lot of this functional style being used in the course for this reason, especially in the parallel and distributed computing sections.

n = 10
xs = list(map(lambda x: x**2, range(n)))
xs
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Python guidelines encourage use of named functions rather than lambdas

def square(x):
    return x**2

xs = list(map(square, range(n)))
xs
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Use vectorization

Traditionally, vectorization with numpy universal functions is the way to write fast numerical programs in interpreted languages such as Python. We see a lot of vectorized code in the course too.

import numpy as np
n = 10
xs = np.arange(n)**2
xs
array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

(2) Filling a list conditionally

Using a for loop

n = 10
xs = []
for i in range(n):
    if i % 2 ==0:
        xs.append(i**2)
xs
[0, 4, 16, 36, 64]

Using a list comprehension

n = 10
xs = [i**2 for i in range(n) if i%2 == 0]
xs
[0, 4, 16, 36, 64]

Using map

n = 10
xs = list(map(lambda x: x**2,
              filter(lambda x: x%2==0, range(n))))
xs
[0, 4, 16, 36, 64]

Python guidelines encourage use of named functions rather than lambdas

def square(x):
    return x**2

def is_even(x):
    return x%2 == 0

xs = list(map(square,
              filter(is_even, range(n))))
xs
[0, 4, 16, 36, 64]

Use vectorization

import numpy as np
n = 10
xs = np.arange(n)
xs = xs[xs % 2 == 0]
xs = xs**2
xs
array([ 0,  4, 16, 36, 64])

(3) Writing as functions

def make_squares_for(n):
    xs = []
    for i in range(n):
        xs.append(i**2)
    return xs
make_squares_for(10)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
def make_squares_lc(n):
    return [x**2 for x in range(n)]
make_squares_lc(10)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
def make_squares_map(n):
    return list(map(square, range(n)))
make_squares_map(10)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
def make_squares_vec(n):
    return np.arange(n)**2
make_squares_vec(10)
array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

(4) Timing

n = 1000000
%timeit make_squares_for(n)
%timeit make_squares_lc(n)
%timeit make_squares_map(n)
%timeit make_squares_vec(n)
1 loop, best of 3: 657 ms per loop
1 loop, best of 3: 611 ms per loop
1 loop, best of 3: 811 ms per loop
100 loops, best of 3: 9.82 ms per loop

(5) Compilation

from numba import njit
make_squares_for_numba = njit()(make_squares_for)
make_squares_for_numba(10)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
make_squares_vec_numba = njit()(make_squares_vec)
make_squares_vec_numba(10)
array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])
n = 1000000
%timeit make_squares_for_numba(n)
%timeit make_squares_vec_numba(n)
10 loops, best of 3: 103 ms per loop
100 loops, best of 3: 10 ms per loop

Here we see the benefits to pre-allocating storage.

@njit
def make_squares_for_numba_prealloc(n):
    xs = np.empty(n, dtype=np.int64)
    for i in range(n):
        xs[i] = i**2
    return xs
make_squares_for_numba_prealloc(10)
array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])
n = 1000000
%timeit make_squares_for_numba(n)
%timeit make_squares_for_numba_prealloc(n)
10 loops, best of 3: 104 ms per loop
100 loops, best of 3: 2.66 ms per loop

(6) Vectorizing scalar functions

@np.vectorize
def square_numpy(x):
    return x**2
square_numpy(range(10))
array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])
from numba import vectorize
@vectorize('int64(int64)')
def square_numba(x):
    return x**2
square_numba(range(10))
array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])
n = 1000000
%timeit square_numpy(np.arange(n))
%timeit square_numba(np.arange(n))
1 loop, best of 3: 902 ms per loop
100 loops, best of 3: 12.3 ms per loop

(7) With numba and JIT compilation, we need to reconsider the old taboo against looping.

def cabs2(z):
    return z.real**2 + z.imag**2

def f(z, c, max_iter):
    for i in range(max_iter):
        z = z*z + c
        if cabs2(z) > 4:
            return i
    return i

def julia(xs, ys, c, max_iter= 1000):
    grid = np.empty((len(ys), len(xs)))
    for a, x in enumerate(ys):
        for b, y in enumerate(xs):
            z = x + y*1.0j
            grid[a, b] = f(z, c, max_iter)
    return grid
c = 0.285+0.01j
xs = np.linspace(-1.4, 1.4, 400)
ys = np.linspace(-1, 1, 400)
%time grid = julia(xs, ys, c)
plt.imshow(grid, cmap='hot', interpolation='nearest')
pass
CPU times: user 8.01 s, sys: 13.9 ms, total: 8.03 s
Wall time: 8.02 s
_images/Extras_01_The_Humble_For_Loop_55_1.png
@njit
def cabs2(z):
    return z.real**2 + z.imag**2

@njit
def f(z, c, max_iter):
    for i in range(max_iter):
        z = z*z + c
        if cabs2(z) > 4:
            return i
    return i

@njit
def julia(xs, ys, c, max_iter= 1000):
    grid = np.empty((len(ys), len(xs)))
    for a, x in enumerate(ys):
        for b, y in enumerate(xs):
            z = x + y*1.0j
            grid[a, b] = f(z, c, max_iter)
    return grid
c = 0.285+0.01j
xs = np.linspace(-1.4, 1.4, 400)
ys = np.linspace(-1, 1, 400)
%time grid = julia(xs, ys, c)
plt.imshow(grid, cmap='hot', interpolation='nearest')
pass
CPU times: user 310 ms, sys: 5.7 ms, total: 315 ms
Wall time: 313 ms
_images/Extras_01_The_Humble_For_Loop_57_1.png