Native Code Compilation

Exercise 1 (100 points)

The code given below performs a stochastic gradient descent to fit a quadratic polynomila to \(n\) data points. Maake the code run faster by:

  1. Using numba JIT (20 points)
  2. Using Cython (30 poits)
  3. Rewrite the sgd function in C and wrap for use in Python using Cython (50 points)

Replace the code stubs below with your otpimized code. Reprot the ratio original_time/optimized_time for each of the four parts.

In [4]:
def sgd(b, x, y, max_iter, alpha):
    n = x.shape[0]
    for i in range(max_iter):
        for j in range(n):
            b[0] -= alpha * (2*(b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
            b[1] -= alpha * (2*x[j] * (b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
            b[2] -= alpha * (2*x[j]**2 * (b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
    return b
In [5]:
n = 10000
x = np.linspace(0, 10, n)
y = 2*x**2 + 6*x + 3 + np.random.normal(0, 5, n)
k = 100
alpha = 0.0001
b0 = np.random.random(3)
In [6]:
%%time
np.random.seed(123)
b = sgd(b0, x, y, k, alpha)
print(b)
[ 7.90234588 -0.01733983  2.5184669 ]
CPU times: user 8.12 s, sys: 9.8 ms, total: 8.13 s
Wall time: 8.13 s
In [7]:
yhat = b[0] + b[1]*x+ b[2]*x**2
idx = sorted(np.random.choice(n, 100))
plt.scatter(x[idx], y[idx])
plt.plot(x[idx], yhat[idx], c='red')
pass
homework/../_build/doctrees/nbsphinx/homework_Homework09_5_0.png

Using numba JIT

In [105]:
def sgd_numba(b, x, y, max_iter, alpha):
    return np.zeros(3)
In [106]:
%%time
b = sgd_numba(b0, x, y, k, alpha)
print(b)
[ 0.  0.  0.]
CPU times: user 652 µs, sys: 119 µs, total: 771 µs
Wall time: 729 µs

Speed-up ratio

In [ ]:

Using Cython

In [107]:
def sgd_cython(b, x, y, max_iter, alpha):
    return np.zeros(3)
In [108]:
%%time
np.random.seed(123)
b = sgd_cython(b0, x, y, k, alpha)
print(b)
[ 0.  0.  0.]
CPU times: user 660 µs, sys: 89 µs, total: 749 µs
Wall time: 697 µs

Speed-up ratio

In [ ]:

Using C and wrapping with Cython

In [109]:
def sgd_wrap(b, x, y, max_iter, alpha):
    return np.zeros(3)
In [110]:
%%time
np.random.seed(123)
b = sgd_wrap(b0, x, y, k, alpha)
print(b)
[ 0.  0.  0.]
CPU times: user 662 µs, sys: 80 µs, total: 742 µs
Wall time: 689 µs

Speed-up ratio

In [ ]:

Hints for Part 3

The mechanics for how you can wrap the C sgd function are illustrated below.

In [9]:
%%file f.h

#pragma once
void foo(double *x, double *res, int n);
Overwriting f.h
In [14]:
%%file f.c
#include "f.h"

void foo(double *x, double *res, int n) {
    for (int i=0; i<n; i++) {
        res[i] = x[i]*x[i];
    }
}
Overwriting f.c
In [26]:
%%file f_wrap.pyx

cdef extern from 'f.h':
    void foo(double *x, double *res, int n)

def foo_wrap(double[:] x, double[:] res):
    foo(&x[0], &res[0], len(x))
    return res
Overwriting f_wrap.pyx
In [2]:
%%file setup.py
from distutils.core import setup, Extension
from Cython.Build import cythonize

ext = Extension("f_wrap",
                sources=["f_wrap.pyx", "f.c"],
                libraries=[],
                extra_compile_args=["-w", "-std=c99"])

setup(
    ext_modules = cythonize(
            ext
    ))
Overwriting setup.py
In [3]:
! python setup.py -q build_ext --inplace
In [4]:
from f_wrap import foo_wrap
In [7]:
x = np.arange(10).astype('float')
res = np.zeros_like(x)
np.array(foo_wrap(x, res))
Out[7]:
array([  0.,   1.,   4.,   9.,  16.,  25.,  36.,  49.,  64.,  81.])
In [ ]: