{ "cells": [
{ "cell_type": "markdown", "metadata": {}, "source": [ "# Assignment 9: Improving performance\n", "\n", "This homework provides practice in making Python code faster. Note that we start with functions that already use idiomatic `numpy` (which are about two orders of magnitude faster than the pure Python versions)." ] },
{ "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "%load_ext Cython" ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import math\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from sklearn.datasets import make_blobs\n", "from numba import jit, vectorize, float64, int64" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "sns.set_context('notebook', font_scale=1.5)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "**Functions to optimize**" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def logistic(x):\n", "    \"\"\"Logistic function.\"\"\"\n", "    return np.exp(x)/(1 + np.exp(x))\n", "\n", "def gd(X, y, beta, alpha, niter):\n", "    \"\"\"Gradient descent algorithm.\"\"\"\n", "    n, p = X.shape\n", "    Xt = X.T\n", "    for i in range(niter):\n", "        y_pred = logistic(X @ beta)\n", "        epsilon = y - y_pred\n", "        grad = Xt @ epsilon / n\n", "        beta += alpha * grad\n", "    return beta" ] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "x = np.linspace(-6, 6, 100)\n", "plt.plot(x, logistic(x))\n", "pass" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "**Data set for classification**" ] },
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "n = 10000\n", "p = 2\n", "X, y = make_blobs(n_samples=n, n_features=p, centers=2, cluster_std=1.05, random_state=23)\n", "X = np.c_[np.ones(len(X)), X]\n", "y = y.astype('float')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "**Using gradient descent for classification by logistic regression**" ] },
{ "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# initial parameters\n", "niter = 1000\n", "α = 0.01\n", "β = np.zeros(p+1)\n", "\n", "# call gradient descent\n", "β = gd(X, y, β, α, niter)\n", "\n", "# assign labels to points based on prediction\n", "y_pred = logistic(X @ β)\n", "labels = y_pred > 0.5\n", "\n", "# calculate separating line (decision boundary): β0 + β1*x1 + β2*x2 = 0\n", "sep = (-β[0] - β[1] * X[:, 1])/β[2]\n", "\n", "plt.scatter(X[:, 1], X[:, 2], c=labels, cmap='winter')\n", "plt.plot(X[:, 1], sep, 'r-')\n", "pass" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "**1**. Rewrite the `logistic` function so it only makes one `np.exp` call. Compare the time of both versions with the input `x` given below using the `%timeit` magic. (10 points)" ] },
{ "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "np.random.seed(123)\n", "n = int(1e7)\n", "x = np.random.normal(0, 1, n)" ] },
{ "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "**2**. (20 points) Use `numba` to compile the gradient descent function.\n", "\n", "- Use the `@vectorize` decorator to create a ufunc version of the logistic function called `logistic_numba_cpu`, with the function signature `float64(float64)`. Create another function called `logistic_numba_parallel` by passing the extra argument `target='parallel'` to the decorator. (5 points)\n", "- For each function, check that the answers are the same as with the original logistic function using `np.testing`. Use `%timeit` to compare the three logistic functions. (5 points)\n", "- Now use `@jit` to create JIT-compiled versions of the `logistic` and `gd` functions, calling them `logistic_numba` and `gd_numba`. Provide appropriate function signatures to the decorator in each case. (5 points)\n", "- Compare the two gradient descent functions `gd` and `gd_numba` for correctness and performance. (5 points)" ] },
{ "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "**3**. (30 points) Use `cython` to compile the gradient descent function.\n", "\n", "- Cythonize the logistic function as `logistic_cython`. Use the `--annotate` (`-a`) argument to the `%%cython` magic to find slow regions. Compare accuracy and performance. The final performance should be comparable to the `numba` CPU version. (10 points)\n", "- Now cythonize the `gd` function as `gd_cython`. This function should call the cythonized `logistic_cython` as a C function. Compare accuracy and performance. The final performance should be comparable to the `numba` CPU version. (20 points)\n", "\n", "Hints:\n", "\n", "- Give static types to all variables\n", "- Know how to use `def`, `cdef` and `cpdef`\n", "- Use Typed MemoryViews\n", "- Find out how to transpose a Typed MemoryView to store the transpose of X\n", "- Typed MemoryViews are not `numpy` arrays, so you often have to write explicit loops to operate on them\n", "- Use the Cython `boundscheck`, `wraparound`, and `cdivision` compiler directives" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "**4**. (40 points) Wrapping modules in C++.\n", "\n", "Rewrite the `logistic` and `gd` functions in C++, using `pybind11` to create Python wrappers. Compare accuracy and performance as usual. Replicate the plotted example using the C++-wrapped functions for `logistic` and `gd`.\n", "\n", "- Writing a vectorized `logistic` function callable from both C++ and Python (10 points)\n", "- Writing the `gd` function callable from Python (25 points)\n", "- Checking accuracy, benchmarking and creating diagnostic plots (5 points)\n", "\n", "Hints:\n", "\n", "- Use the C++ `Eigen` library to do vector and matrix operations (include path is `../notebooks/eigen3`)\n", "- When calling the exponential function on an Eigen dynamic-size matrix or vector `m`, you have to use `exp(m.array())` instead of `exp(m)`.\n", "- Use `cppimport` to simplify the wrapping for Python\n", "- See [`pybind11` docs](http://pybind11.readthedocs.io/en/latest/index.html)\n", "- See my [examples](http://people.duke.edu/~ccc14/cspy/18G_C++_Python_pybind11.html#) for help" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "\n" ] }
], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" }, "latex_envs": { "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 0 } }, "nbformat": 4, "nbformat_minor": 2 }