{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import cython\n", "import timeit\n", "import math" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%load_ext cython" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Native code compilation\n", "\n", "We will see how to convert Python code to native compiled code. We will use the example of calculating the pairwise distance between a set of vectors, a $O(n^2)$ operation. \n", "\n", "For native code compilation, it is usually preferable to use explicit for loops and minimize the use of `numpy` vectorization and broadcasting because\n", "\n", "- It makes it easier for the `numba` JIT to optimize\n", "- It is easier to \"cythonize\"\n", "- It is easier to port to C++\n", "\n", "However, use of vectors and matrices is fine especially if you will be porting to use a C++ library such as Eigen." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Timing code" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Manual" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import time\n", "\n", "def f(n=1):\n", " start = time.time()\n", " time.sleep(n)\n", " elapsed = time.time() - start\n", " return elapsed" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.0016651153564453" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Clock time" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 634 µs, sys: 1.11 ms, total: 1.74 ms\n", "Wall time: 1 s\n" ] } ], "source": [ "%%time\n", "\n", "time.sleep(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using `timeit`\n", "\n", "The `-r` argument says how many runs to average over, and `-n` says how many times to run the function in a loop per run." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "11.3 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" ] } ], "source": [ "%timeit time.sleep(0.01)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "11.1 ms ± 124 µs per loop (mean ± std. dev. of 3 runs, 100 loops each)\n" ] } ], "source": [ "%timeit -r3 time.sleep(0.01)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "11.2 ms ± 215 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" ] } ], "source": [ "%timeit -n10 time.sleep(0.01)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "11 ms ± 99.5 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)\n" ] } ], "source": [ "%timeit -r3 -n10 time.sleep(0.01)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Time unit conversions\n", "\n", "```\n", "1 s = 1,000 ms\n", "1 ms = 1,000 µs\n", "1 µs = 1,000 ns\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Profiling\n", "\n", "If you want to identify bottlenecks in a Python script, do the following:\n", " \n", "- First make sure that the script is modular - i.e. it consists mainly of function calls\n", "- Each function should be fairly small and only do one thing\n", "- Then run a profiler to identify the bottleneck function(s) and optimize them\n", "\n", "See the Python docs on [profiling Python code](https://docs.python.org/3/library/profile.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Profiling can be done in a notebook with %prun, with the following readouts as column headers:\n", "\n", "- ncalls\n", " - for the number of calls,\n", "- tottime\n", " - for the total time spent in the given function (and excluding time made in calls to sub-functions),\n", "- percall\n", " - is the quotient of tottime divided by ncalls\n", "- cumtime\n", " - is the total time spent in this and all subfunctions (from invocation till exit). This figure is accurate even for recursive functions.\n", "- percall\n", " - is the quotient of cumtime divided by primitive calls\n", "- filename:lineno(function)\n", " - provides the respective data of each function " ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "def foo1(n):\n", " return np.sum(np.square(np.arange(n)))\n", "\n", "def foo2(n):\n", " return sum(i*i for i in range(n))\n", "\n", "def foo3(n):\n", " [foo1(n) for i in range(10)]\n", " foo2(n)\n", "\n", "def foo4(n):\n", " return [foo2(n) for i in range(100)]\n", " \n", "def work(n):\n", " foo1(n)\n", " foo2(n)\n", " foo3(n)\n", " foo4(n)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 1.33 s, sys: 4.25 ms, total: 1.33 s\n", "Wall time: 1.34 s\n" ] } ], "source": [ "%%time\n", "\n", "work(int(1e5))" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " \n", "*** Profile stats marshalled to file 'work.prof'. \n" ] } ], "source": [ "%prun -q -D work.prof work(int(1e5))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Fri Mar 30 15:17:26 2018 work.prof\n", "\n", " 10200380 function calls in 2.535 seconds\n", "\n", " Random listing order was used\n", "\n", " ncalls tottime percall cumtime percall filename:lineno(function)\n", " 1 0.000 0.000 2.535 2.535 {built-in method builtins.exec}\n", " 11 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}\n", " 102 1.088 0.011 2.531 0.025 {built-in method builtins.sum}\n", " 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}\n", " 1 0.000 0.000 0.003 0.003 :8()\n", " 11 0.001 0.000 0.001 0.000 {built-in method numpy.core.multiarray.arange}\n", " 11 0.001 0.000 0.001 0.000 {method 'reduce' of 'numpy.ufunc' objects}\n", " 11 0.000 0.000 0.001 0.000 /usr/local/lib/python3.6/site-packages/numpy/core/fromnumeric.py:1778(sum)\n", " 11 0.000 0.000 0.001 0.000 /usr/local/lib/python3.6/site-packages/numpy/core/_methods.py:31(_sum)\n", " 102 0.000 0.000 2.531 0.025 :4(foo2)\n", " 1 0.000 0.000 2.480 2.480 :12()\n", " 1 0.000 0.000 0.028 0.028 :7(foo3)\n", " 11 0.002 0.000 0.003 0.000 :1(foo1)\n", " 10200102 1.443 0.000 1.443 0.000 :5()\n", " 1 0.000 0.000 2.535 2.535 :14(work)\n", " 1 0.000 0.000 2.480 2.480 :11(foo4)\n", " 1 0.000 0.000 2.535 2.535 :1()\n", "\n", "\n" ] } ], "source": [ "import pstats\n", "p = pstats.Stats('work.prof')\n", "p.print_stats()\n", "pass" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Fri Mar 30 15:17:26 2018 work.prof\n", "\n", " 10200380 function calls in 2.535 seconds\n", "\n", " Ordered by: internal time, cumulative time\n", " List reduced from 17 to 4 due to restriction <'foo'>\n", "\n", " ncalls tottime percall cumtime percall filename:lineno(function)\n", " 11 0.002 0.000 0.003 0.000 :1(foo1)\n", " 102 0.000 0.000 2.531 0.025 :4(foo2)\n", " 1 0.000 0.000 0.028 0.028 :7(foo3)\n", " 1 0.000 0.000 2.480 2.480 :11(foo4)\n", "\n", "\n" ] } ], "source": [ "p.sort_stats('time', 'cumulative').print_stats('foo')\n", "pass" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Fri Mar 30 15:17:26 2018 work.prof\n", "\n", " 10200380 function calls in 2.535 seconds\n", "\n", " Ordered by: call count\n", " List reduced from 17 to 5 due to restriction <5>\n", "\n", " ncalls tottime percall cumtime percall filename:lineno(function)\n", " 10200102 1.443 0.000 1.443 0.000 :5()\n", " 102 1.088 0.011 2.531 0.025 {built-in method builtins.sum}\n", " 102 0.000 0.000 2.531 0.025 :4(foo2)\n", " 11 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}\n", " 11 0.001 0.000 0.001 0.000 {built-in method numpy.core.multiarray.arange}\n", "\n", "\n" ] } ], "source": [ "p.sort_stats('ncalls').print_stats(5)\n", "pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Optimizing a function\n", "\n", "Our example will be to optimize a function that calculates the pairwise distance between a set of vectors.\n", "\n", "We first use a built-in function from`scipy` to check that our answers are right and also to benchmark how our code compares in speed to an optimized compiled routine." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "from scipy.spatial.distance import squareform, pdist" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "n = 100\n", "p = 100\n", "xs = np.random.random((n, p))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "sol = squareform(pdist(xs))" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "492 µs ± 14.1 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)\n" ] } ], "source": [ "%timeit -r3 -n10 squareform(pdist(xs))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simple version" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "def pdist_py(xs):\n", " \"\"\"Unvectorized Python.\"\"\"\n", " n, p = xs.shape\n", " A = np.zeros((n, n))\n", " for i in range(n):\n", " for j in range(n):\n", " for k in range(p):\n", " A[i,j] += (xs[i, k] - xs[j, k])**2\n", " A[i,j] = np.sqrt(A[i,j])\n", " return A" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that we \n", "\n", "- first check that the output is **right**\n", "- then check how fast the code is" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "1.17 s ± 2.98 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)\n" ] } ], "source": [ "func = pdist_py\n", "print(np.allclose(func(xs), sol))\n", "%timeit -r3 -n10 func(xs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exploiting symmetry" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "def pdist_sym(xs):\n", " \"\"\"Unvectorized Python.\"\"\"\n", " n, p = xs.shape\n", " A = np.zeros((n, n))\n", " for i in range(n):\n", " for j in range(i+1, n):\n", " for k in range(p):\n", " A[i,j] += (xs[i, k] - xs[j, k])**2\n", " A[i,j] = np.sqrt(A[i,j])\n", " A += A.T\n", " return A" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "573 ms ± 2.35 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)\n" ] } ], "source": [ "func = pdist_sym\n", "print(np.allclose(func(xs), sol))\n", "%timeit -r3 -n10 func(xs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Vectorizing inner loop" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "def pdist_vec(xs): \n", " \"\"\"Vectorize inner loop.\"\"\"\n", " n, p = xs.shape\n", " A = np.zeros((n, n))\n", " for i in range(n):\n", " for j in range(i+1, n):\n", " A[i,j] = np.sqrt(np.sum((xs[i] - xs[j])**2))\n", " A += A.T\n", " return A" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "70.8 ms ± 275 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)\n" ] } ], "source": [ "func = pdist_vec\n", "print(np.allclose(func(xs), sol))\n", "%timeit -r3 -n10 func(xs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Broadcasting and vectorizing\n", "\n", "Note that the broadcast version does twice as much work as it does not exploit symmetry." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "def pdist_numpy(xs):\n", " \"\"\"Fully vectroized version.\"\"\"\n", " return np.sqrt(np.square(xs[:, None] - xs[None, :]).sum(axis=-1))" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "4.23 ms ± 165 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)\n" ] } ], "source": [ "func = pdist_numpy\n", "print(np.allclose(func(xs), sol))\n", "%timeit -r3 -n10 squareform(func(xs))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## JIT with `numba`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use the `numba.jit` decorator which will trigger generation and execution of compiled code when the function is first called." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "from numba import jit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using `jit` as a function" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "pdist_numba_py = jit(pdist_py, nopython=True, cache=True)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "1.8 ms ± 25 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)\n" ] } ], "source": [ "func = pdist_numba_py\n", "print(np.allclose(func(xs), sol))\n", "%timeit -r3 -n10 func(xs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using `jit` as a decorator" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "@jit(nopython=True, cache=True)\n", "def pdist_numba_py_1(xs):\n", " \"\"\"Unvectorized Python.\"\"\"\n", " n, p = xs.shape\n", " A = np.zeros((n, n))\n", " for i in range(n):\n", " for j in range(n):\n", " for k in range(p):\n", " A[i,j] += (xs[i, k] - xs[j, k])**2\n", " A[i,j] = np.sqrt(A[i,j])\n", " return A" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "1.75 ms ± 34.8 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)\n" ] } ], "source": [ "func = pdist_numba_py_1\n", "print(np.allclose(func(xs), sol))\n", "%timeit -r3 -n10 func(xs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Can we make the code faster?\n", "\n", "Note that in the inner loop, we are updating a matrix when we only need to update a scalar. Let's fix this." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "@jit(nopython=True, cache=True)\n", "def pdist_numba_py_2(xs):\n", " \"\"\"Unvectorized Python.\"\"\"\n", " n, p = xs.shape\n", " A = np.zeros((n, n))\n", " for i in range(n):\n", " for j in range(n):\n", " d = 0.0\n", " for k in range(p):\n", " d += (xs[i, k] - xs[j, k])**2\n", " A[i,j] = np.sqrt(d)\n", " return A" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "954 µs ± 25.2 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)\n" ] } ], "source": [ "func = pdist_numba_py_2\n", "print(np.allclose(func(xs), sol))\n", "%timeit -r3 -n10 func(xs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Can we make the code even faster?\n", "\n", "We can also try to exploit symmetry." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "@jit(nopython=True, cache=True)\n", "def pdist_numba_py_sym(xs):\n", " \"\"\"Unvectorized Python.\"\"\"\n", " n, p = xs.shape\n", " A = np.zeros((n, n))\n", " for i in range(n):\n", " for j in range(i+1, n):\n", " d = 0.0\n", " for k in range(p):\n", " d += (xs[i, k] - xs[j, k])**2\n", " A[i,j] = np.sqrt(d)\n", " A += A.T\n", " return A" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "521 µs ± 7.12 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)\n" ] } ], "source": [ "func = pdist_numba_py_sym\n", "print(np.allclose(func(xs), sol))\n", "%timeit -r3 -n10 func(xs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Does `jit` work with vectorized code?" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "pdist_numba_vec = jit(pdist_vec, nopython=True, cache=True)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "70.1 ms ± 454 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)\n" ] } ], "source": [ "%timeit -r3 -n10 pdist_vec(xs)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "1.58 ms ± 17.8 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)\n" ] } ], "source": [ "func = pdist_numba_vec\n", "print(np.allclose(func(xs), sol))\n", "%timeit -r3 -n10 func(xs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Does `jit` work with broadcasting?" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "pdist_numba_numpy = jit(pdist_numpy, nopython=True, cache=True)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4.01 ms ± 159 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)\n" ] } ], "source": [ "%timeit -r3 -n10 pdist_numpy(xs)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Failed at nopython (nopython frontend)\n", "Internal error at :\n", "--%<-----------------------------------------------------------------\n", "Traceback (most recent call last):\n", " File \"/usr/local/lib/python3.6/site-packages/numba/errors.py\", line 259, in new_error_context\n", " yield\n", " File \"/usr/local/lib/python3.6/site-packages/numba/typeinfer.py\", line 503, in __call__\n", " self.resolve(typeinfer, typeinfer.typevars, fnty=self.func)\n", " File \"/usr/local/lib/python3.6/site-packages/numba/typeinfer.py\", line 441, in resolve\n", " sig = typeinfer.resolve_call(fnty, pos_args, kw_args, literals=literals)\n", " File \"/usr/local/lib/python3.6/site-packages/numba/typeinfer.py\", line 1115, in resolve_call\n", " literals=literals)\n", " File \"/usr/local/lib/python3.6/site-packages/numba/typing/context.py\", line 191, in resolve_function_type\n", " res = defn.apply(args, kws)\n", " File \"/usr/local/lib/python3.6/site-packages/numba/typing/templates.py\", line 207, in apply\n", " sig = generic(args, kws)\n", " File \"/usr/local/lib/python3.6/site-packages/numba/typing/arraydecl.py\", line 165, in generic\n", " out = get_array_index_type(ary, idx)\n", " File \"/usr/local/lib/python3.6/site-packages/numba/typing/arraydecl.py\", line 71, in get_array_index_type\n", " % (ty, idx))\n", "TypeError: unsupported array index type none in (slice, none)\n", "\n", "During handling of the above exception, another exception occurred:\n", "\n", "Traceback (most recent call last):\n", " File \"/usr/local/lib/python3.6/site-packages/numba/typeinfer.py\", line 137, in propagate\n", " constraint(typeinfer)\n", " File \"/usr/local/lib/python3.6/site-packages/numba/typeinfer.py\", line 341, in __call__\n", " self.fallback(typeinfer)\n", " File \"/usr/local/lib/python3.6/site-packages/numba/typeinfer.py\", line 503, in __call__\n", " self.resolve(typeinfer, typeinfer.typevars, fnty=self.func)\n", " File \"/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/contextlib.py\", line 99, in __exit__\n", " self.gen.throw(type, value, traceback)\n", " File \"/usr/local/lib/python3.6/site-packages/numba/errors.py\", line 265, in new_error_context\n", " six.reraise(type(newerr), newerr, sys.exc_info()[2])\n", " File \"/usr/local/lib/python3.6/site-packages/numba/six.py\", line 658, in reraise\n", " raise value.with_traceback(tb)\n", " File \"/usr/local/lib/python3.6/site-packages/numba/errors.py\", line 259, in new_error_context\n", " yield\n", " File \"/usr/local/lib/python3.6/site-packages/numba/typeinfer.py\", line 503, in __call__\n", " self.resolve(typeinfer, typeinfer.typevars, fnty=self.func)\n", " File \"/usr/local/lib/python3.6/site-packages/numba/typeinfer.py\", line 441, in resolve\n", " sig = typeinfer.resolve_call(fnty, pos_args, kw_args, literals=literals)\n", " File \"/usr/local/lib/python3.6/site-packages/numba/typeinfer.py\", line 1115, in resolve_call\n", " literals=literals)\n", " File \"/usr/local/lib/python3.6/site-packages/numba/typing/context.py\", line 191, in resolve_function_type\n", " res = defn.apply(args, kws)\n", " File \"/usr/local/lib/python3.6/site-packages/numba/typing/templates.py\", line 207, in apply\n", " sig = generic(args, kws)\n", " File \"/usr/local/lib/python3.6/site-packages/numba/typing/arraydecl.py\", line 165, in generic\n", " out = get_array_index_type(ary, idx)\n", " File \"/usr/local/lib/python3.6/site-packages/numba/typing/arraydecl.py\", line 71, in get_array_index_type\n", " % (ty, idx))\n", "numba.errors.InternalError: unsupported array index type none in (slice, none)\n", "[1] During: typing of intrinsic-call at (3)\n", "[2] During: typing of static-get-item at (3)\n", "--%<-----------------------------------------------------------------\n", "\n", "File \"\", line 3\n" ] } ], "source": [ "func = pdist_numba_numpy\n", "try:\n", " print(np.allclose(func(xs), sol))\n", " %timeit -r3 -n10 func(xs)\n", "except Exception as e:\n", " print(e)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### We need to use `reshape` to broadcast" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "def pdist_numpy_(xs):\n", " \"\"\"Fully vectroized version.\"\"\"\n", " return np.sqrt(np.square(xs.reshape(n,1,p) - xs.reshape(1,n,p)).sum(axis=-1))" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "pdist_numba_numpy_ = jit(pdist_numpy_, nopython=True, cache=True)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4.17 ms ± 310 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)\n" ] } ], "source": [ "%timeit -r3 -n10 pdist_numpy_(xs)" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "6.02 ms ± 65.9 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)\n" ] } ], "source": [ "func = pdist_numba_numpy_\n", "print(np.allclose(func(xs), sol))\n", "%timeit -r3 -n10 func(xs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summary\n", "\n", "- `numba` appears to work best with converting fairly explicit Python code\n", "- This might change in the future as the `numba` JIT compiler becomes more sophisticated\n", "- Always check optimized code for correctness\n", "- We can use `timeit` magic as a simple way to benchmark functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cython\n", "\n", "Cython is an Ahead Of Time (AOT) compiler. It compiles the code and replaces the function invoked with the compiled version.\n", "\n", "In the notebook, calling `%cython -a` magic shows code colored by how many Python C API calls are being made. You want to reduce the yellow as much as possible." ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " Cython: _cython_magic_a0e00db7609c25c75f864fdfa1538fee.pyx\n", " \n", "\n", "\n", "

Generated by Cython 0.28

\n", "

\n", " Yellow lines hint at Python interaction.
\n", " Click on a line that starts with a \"+\" to see the C code that Cython generated for it.\n", "

\n", "
 01: 
\n", "
+02: import numpy as np
\n", "
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 2, __pyx_L1_error)\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "
 03: 
\n", "
+04: def pdist_cython_1(xs):
\n", "
/* Python wrapper */\n",
       "static PyObject *__pyx_pw_46_cython_magic_a0e00db7609c25c75f864fdfa1538fee_1pdist_cython_1(PyObject *__pyx_self, PyObject *__pyx_v_xs); /*proto*/\n",
       "static PyMethodDef __pyx_mdef_46_cython_magic_a0e00db7609c25c75f864fdfa1538fee_1pdist_cython_1 = {\"pdist_cython_1\", (PyCFunction)__pyx_pw_46_cython_magic_a0e00db7609c25c75f864fdfa1538fee_1pdist_cython_1, METH_O, 0};\n",
       "static PyObject *__pyx_pw_46_cython_magic_a0e00db7609c25c75f864fdfa1538fee_1pdist_cython_1(PyObject *__pyx_self, PyObject *__pyx_v_xs) {\n",
       "  PyObject *__pyx_r = 0;\n",
       "  __Pyx_RefNannyDeclarations\n",
       "  __Pyx_RefNannySetupContext(\"pdist_cython_1 (wrapper)\", 0);\n",
       "  __pyx_r = __pyx_pf_46_cython_magic_a0e00db7609c25c75f864fdfa1538fee_pdist_cython_1(__pyx_self, ((PyObject *)__pyx_v_xs));\n",
       "\n",
       "  /* function exit code */\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return __pyx_r;\n",
       "}\n",
       "\n",
       "static PyObject *__pyx_pf_46_cython_magic_a0e00db7609c25c75f864fdfa1538fee_pdist_cython_1(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_xs) {\n",
       "  PyObject *__pyx_v_n = NULL;\n",
       "  PyObject *__pyx_v_p = NULL;\n",
       "  PyObject *__pyx_v_A = NULL;\n",
       "  PyObject *__pyx_v_i = NULL;\n",
       "  PyObject *__pyx_v_j = NULL;\n",
       "  PyObject *__pyx_v_d = NULL;\n",
       "  PyObject *__pyx_v_k = NULL;\n",
       "  PyObject *__pyx_r = NULL;\n",
       "  __Pyx_RefNannyDeclarations\n",
       "  __Pyx_RefNannySetupContext(\"pdist_cython_1\", 0);\n",
       "/* … */\n",
       "  /* function exit code */\n",
       "  __pyx_L1_error:;\n",
       "  __Pyx_XDECREF(__pyx_t_1);\n",
       "  __Pyx_XDECREF(__pyx_t_2);\n",
       "  __Pyx_XDECREF(__pyx_t_3);\n",
       "  __Pyx_XDECREF(__pyx_t_4);\n",
       "  __Pyx_XDECREF(__pyx_t_6);\n",
       "  __Pyx_XDECREF(__pyx_t_13);\n",
       "  __Pyx_AddTraceback(\"_cython_magic_a0e00db7609c25c75f864fdfa1538fee.pdist_cython_1\", __pyx_clineno, __pyx_lineno, __pyx_filename);\n",
       "  __pyx_r = NULL;\n",
       "  __pyx_L0:;\n",
       "  __Pyx_XDECREF(__pyx_v_n);\n",
       "  __Pyx_XDECREF(__pyx_v_p);\n",
       "  __Pyx_XDECREF(__pyx_v_A);\n",
       "  __Pyx_XDECREF(__pyx_v_i);\n",
       "  __Pyx_XDECREF(__pyx_v_j);\n",
       "  __Pyx_XDECREF(__pyx_v_d);\n",
       "  __Pyx_XDECREF(__pyx_v_k);\n",
       "  __Pyx_XGIVEREF(__pyx_r);\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return __pyx_r;\n",
       "}\n",
       "/* … */\n",
       "  __pyx_tuple_ = PyTuple_Pack(8, __pyx_n_s_xs, __pyx_n_s_n, __pyx_n_s_p, __pyx_n_s_A, __pyx_n_s_i, __pyx_n_s_j, __pyx_n_s_d, __pyx_n_s_k); if (unlikely(!__pyx_tuple_)) __PYX_ERR(0, 4, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_tuple_);\n",
       "  __Pyx_GIVEREF(__pyx_tuple_);\n",
       "/* … */\n",
       "  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_a0e00db7609c25c75f864fdfa1538fee_1pdist_cython_1, NULL, __pyx_n_s_cython_magic_a0e00db7609c25c75f); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 4, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_pdist_cython_1, __pyx_t_1) < 0) __PYX_ERR(0, 4, __pyx_L1_error)\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "
+05:     n, p = xs.shape
\n", "
  __pyx_t_1 = __Pyx_PyObject_GetAttrStr(__pyx_v_xs, __pyx_n_s_shape); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 5, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if ((likely(PyTuple_CheckExact(__pyx_t_1))) || (PyList_CheckExact(__pyx_t_1))) {\n",
       "    PyObject* sequence = __pyx_t_1;\n",
       "    Py_ssize_t size = __Pyx_PySequence_SIZE(sequence);\n",
       "    if (unlikely(size != 2)) {\n",
       "      if (size > 2) __Pyx_RaiseTooManyValuesError(2);\n",
       "      else if (size >= 0) __Pyx_RaiseNeedMoreValuesError(size);\n",
       "      __PYX_ERR(0, 5, __pyx_L1_error)\n",
       "    }\n",
       "    #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS\n",
       "    if (likely(PyTuple_CheckExact(sequence))) {\n",
       "      __pyx_t_2 = PyTuple_GET_ITEM(sequence, 0); \n",
       "      __pyx_t_3 = PyTuple_GET_ITEM(sequence, 1); \n",
       "    } else {\n",
       "      __pyx_t_2 = PyList_GET_ITEM(sequence, 0); \n",
       "      __pyx_t_3 = PyList_GET_ITEM(sequence, 1); \n",
       "    }\n",
       "    __Pyx_INCREF(__pyx_t_2);\n",
       "    __Pyx_INCREF(__pyx_t_3);\n",
       "    #else\n",
       "    __pyx_t_2 = PySequence_ITEM(sequence, 0); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 5, __pyx_L1_error)\n",
       "    __Pyx_GOTREF(__pyx_t_2);\n",
       "    __pyx_t_3 = PySequence_ITEM(sequence, 1); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 5, __pyx_L1_error)\n",
       "    __Pyx_GOTREF(__pyx_t_3);\n",
       "    #endif\n",
       "    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  } else {\n",
       "    Py_ssize_t index = -1;\n",
       "    __pyx_t_4 = PyObject_GetIter(__pyx_t_1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 5, __pyx_L1_error)\n",
       "    __Pyx_GOTREF(__pyx_t_4);\n",
       "    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "    __pyx_t_5 = Py_TYPE(__pyx_t_4)->tp_iternext;\n",
       "    index = 0; __pyx_t_2 = __pyx_t_5(__pyx_t_4); if (unlikely(!__pyx_t_2)) goto __pyx_L3_unpacking_failed;\n",
       "    __Pyx_GOTREF(__pyx_t_2);\n",
       "    index = 1; __pyx_t_3 = __pyx_t_5(__pyx_t_4); if (unlikely(!__pyx_t_3)) goto __pyx_L3_unpacking_failed;\n",
       "    __Pyx_GOTREF(__pyx_t_3);\n",
       "    if (__Pyx_IternextUnpackEndCheck(__pyx_t_5(__pyx_t_4), 2) < 0) __PYX_ERR(0, 5, __pyx_L1_error)\n",
       "    __pyx_t_5 = NULL;\n",
       "    __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "    goto __pyx_L4_unpacking_done;\n",
       "    __pyx_L3_unpacking_failed:;\n",
       "    __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "    __pyx_t_5 = NULL;\n",
       "    if (__Pyx_IterFinish() == 0) __Pyx_RaiseNeedMoreValuesError(index);\n",
       "    __PYX_ERR(0, 5, __pyx_L1_error)\n",
       "    __pyx_L4_unpacking_done:;\n",
       "  }\n",
       "  __pyx_v_n = __pyx_t_2;\n",
       "  __pyx_t_2 = 0;\n",
       "  __pyx_v_p = __pyx_t_3;\n",
       "  __pyx_t_3 = 0;\n",
       "
+06:     A = np.zeros((n, n))
\n", "
  __pyx_t_3 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 6, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_3);\n",
       "  __pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_t_3, __pyx_n_s_zeros); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 6, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;\n",
       "  __pyx_t_3 = PyTuple_New(2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 6, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_3);\n",
       "  __Pyx_INCREF(__pyx_v_n);\n",
       "  __Pyx_GIVEREF(__pyx_v_n);\n",
       "  PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_v_n);\n",
       "  __Pyx_INCREF(__pyx_v_n);\n",
       "  __Pyx_GIVEREF(__pyx_v_n);\n",
       "  PyTuple_SET_ITEM(__pyx_t_3, 1, __pyx_v_n);\n",
       "  __pyx_t_4 = NULL;\n",
       "  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_2))) {\n",
       "    __pyx_t_4 = PyMethod_GET_SELF(__pyx_t_2);\n",
       "    if (likely(__pyx_t_4)) {\n",
       "      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_2);\n",
       "      __Pyx_INCREF(__pyx_t_4);\n",
       "      __Pyx_INCREF(function);\n",
       "      __Pyx_DECREF_SET(__pyx_t_2, function);\n",
       "    }\n",
       "  }\n",
       "  if (!__pyx_t_4) {\n",
       "    __pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_t_2, __pyx_t_3); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 6, __pyx_L1_error)\n",
       "    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;\n",
       "    __Pyx_GOTREF(__pyx_t_1);\n",
       "  } else {\n",
       "    #if CYTHON_FAST_PYCALL\n",
       "    if (PyFunction_Check(__pyx_t_2)) {\n",
       "      PyObject *__pyx_temp[2] = {__pyx_t_4, __pyx_t_3};\n",
       "      __pyx_t_1 = __Pyx_PyFunction_FastCall(__pyx_t_2, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 6, __pyx_L1_error)\n",
       "      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "      __Pyx_GOTREF(__pyx_t_1);\n",
       "      __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;\n",
       "    } else\n",
       "    #endif\n",
       "    #if CYTHON_FAST_PYCCALL\n",
       "    if (__Pyx_PyFastCFunction_Check(__pyx_t_2)) {\n",
       "      PyObject *__pyx_temp[2] = {__pyx_t_4, __pyx_t_3};\n",
       "      __pyx_t_1 = __Pyx_PyCFunction_FastCall(__pyx_t_2, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 6, __pyx_L1_error)\n",
       "      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "      __Pyx_GOTREF(__pyx_t_1);\n",
       "      __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;\n",
       "    } else\n",
       "    #endif\n",
       "    {\n",
       "      __pyx_t_6 = PyTuple_New(1+1); if (unlikely(!__pyx_t_6)) __PYX_ERR(0, 6, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_6);\n",
       "      __Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_6, 0, __pyx_t_4); __pyx_t_4 = NULL;\n",
       "      __Pyx_GIVEREF(__pyx_t_3);\n",
       "      PyTuple_SET_ITEM(__pyx_t_6, 0+1, __pyx_t_3);\n",
       "      __pyx_t_3 = 0;\n",
       "      __pyx_t_1 = __Pyx_PyObject_Call(__pyx_t_2, __pyx_t_6, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 6, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_1);\n",
       "      __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;\n",
       "    }\n",
       "  }\n",
       "  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "  __pyx_v_A = __pyx_t_1;\n",
       "  __pyx_t_1 = 0;\n",
       "
+07:     for i in range(n):
\n", "
  __pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_builtin_range, __pyx_v_n); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (likely(PyList_CheckExact(__pyx_t_1)) || PyTuple_CheckExact(__pyx_t_1)) {\n",
       "    __pyx_t_2 = __pyx_t_1; __Pyx_INCREF(__pyx_t_2); __pyx_t_7 = 0;\n",
       "    __pyx_t_8 = NULL;\n",
       "  } else {\n",
       "    __pyx_t_7 = -1; __pyx_t_2 = PyObject_GetIter(__pyx_t_1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)\n",
       "    __Pyx_GOTREF(__pyx_t_2);\n",
       "    __pyx_t_8 = Py_TYPE(__pyx_t_2)->tp_iternext; if (unlikely(!__pyx_t_8)) __PYX_ERR(0, 7, __pyx_L1_error)\n",
       "  }\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  for (;;) {\n",
       "    if (likely(!__pyx_t_8)) {\n",
       "      if (likely(PyList_CheckExact(__pyx_t_2))) {\n",
       "        if (__pyx_t_7 >= PyList_GET_SIZE(__pyx_t_2)) break;\n",
       "        #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS\n",
       "        __pyx_t_1 = PyList_GET_ITEM(__pyx_t_2, __pyx_t_7); __Pyx_INCREF(__pyx_t_1); __pyx_t_7++; if (unlikely(0 < 0)) __PYX_ERR(0, 7, __pyx_L1_error)\n",
       "        #else\n",
       "        __pyx_t_1 = PySequence_ITEM(__pyx_t_2, __pyx_t_7); __pyx_t_7++; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)\n",
       "        __Pyx_GOTREF(__pyx_t_1);\n",
       "        #endif\n",
       "      } else {\n",
       "        if (__pyx_t_7 >= PyTuple_GET_SIZE(__pyx_t_2)) break;\n",
       "        #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS\n",
       "        __pyx_t_1 = PyTuple_GET_ITEM(__pyx_t_2, __pyx_t_7); __Pyx_INCREF(__pyx_t_1); __pyx_t_7++; if (unlikely(0 < 0)) __PYX_ERR(0, 7, __pyx_L1_error)\n",
       "        #else\n",
       "        __pyx_t_1 = PySequence_ITEM(__pyx_t_2, __pyx_t_7); __pyx_t_7++; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)\n",
       "        __Pyx_GOTREF(__pyx_t_1);\n",
       "        #endif\n",
       "      }\n",
       "    } else {\n",
       "      __pyx_t_1 = __pyx_t_8(__pyx_t_2);\n",
       "      if (unlikely(!__pyx_t_1)) {\n",
       "        PyObject* exc_type = PyErr_Occurred();\n",
       "        if (exc_type) {\n",
       "          if (likely(__Pyx_PyErr_GivenExceptionMatches(exc_type, PyExc_StopIteration))) PyErr_Clear();\n",
       "          else __PYX_ERR(0, 7, __pyx_L1_error)\n",
       "        }\n",
       "        break;\n",
       "      }\n",
       "      __Pyx_GOTREF(__pyx_t_1);\n",
       "    }\n",
       "    __Pyx_XDECREF_SET(__pyx_v_i, __pyx_t_1);\n",
       "    __pyx_t_1 = 0;\n",
       "/* … */\n",
       "  }\n",
       "  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "
+08:         for j in range(i+1, n):
\n", "
    __pyx_t_1 = __Pyx_PyInt_AddObjC(__pyx_v_i, __pyx_int_1, 1, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 8, __pyx_L1_error)\n",
       "    __Pyx_GOTREF(__pyx_t_1);\n",
       "    __pyx_t_6 = PyTuple_New(2); if (unlikely(!__pyx_t_6)) __PYX_ERR(0, 8, __pyx_L1_error)\n",
       "    __Pyx_GOTREF(__pyx_t_6);\n",
       "    __Pyx_GIVEREF(__pyx_t_1);\n",
       "    PyTuple_SET_ITEM(__pyx_t_6, 0, __pyx_t_1);\n",
       "    __Pyx_INCREF(__pyx_v_n);\n",
       "    __Pyx_GIVEREF(__pyx_v_n);\n",
       "    PyTuple_SET_ITEM(__pyx_t_6, 1, __pyx_v_n);\n",
       "    __pyx_t_1 = 0;\n",
       "    __pyx_t_1 = __Pyx_PyObject_Call(__pyx_builtin_range, __pyx_t_6, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 8, __pyx_L1_error)\n",
       "    __Pyx_GOTREF(__pyx_t_1);\n",
       "    __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;\n",
       "    if (likely(PyList_CheckExact(__pyx_t_1)) || PyTuple_CheckExact(__pyx_t_1)) {\n",
       "      __pyx_t_6 = __pyx_t_1; __Pyx_INCREF(__pyx_t_6); __pyx_t_9 = 0;\n",
       "      __pyx_t_10 = NULL;\n",
       "    } else {\n",
       "      __pyx_t_9 = -1; __pyx_t_6 = PyObject_GetIter(__pyx_t_1); if (unlikely(!__pyx_t_6)) __PYX_ERR(0, 8, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_6);\n",
       "      __pyx_t_10 = Py_TYPE(__pyx_t_6)->tp_iternext; if (unlikely(!__pyx_t_10)) __PYX_ERR(0, 8, __pyx_L1_error)\n",
       "    }\n",
       "    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "    for (;;) {\n",
       "      if (likely(!__pyx_t_10)) {\n",
       "        if (likely(PyList_CheckExact(__pyx_t_6))) {\n",
       "          if (__pyx_t_9 >= PyList_GET_SIZE(__pyx_t_6)) break;\n",
       "          #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS\n",
       "          __pyx_t_1 = PyList_GET_ITEM(__pyx_t_6, __pyx_t_9); __Pyx_INCREF(__pyx_t_1); __pyx_t_9++; if (unlikely(0 < 0)) __PYX_ERR(0, 8, __pyx_L1_error)\n",
       "          #else\n",
       "          __pyx_t_1 = PySequence_ITEM(__pyx_t_6, __pyx_t_9); __pyx_t_9++; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 8, __pyx_L1_error)\n",
       "          __Pyx_GOTREF(__pyx_t_1);\n",
       "          #endif\n",
       "        } else {\n",
       "          if (__pyx_t_9 >= PyTuple_GET_SIZE(__pyx_t_6)) break;\n",
       "          #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS\n",
       "          __pyx_t_1 = PyTuple_GET_ITEM(__pyx_t_6, __pyx_t_9); __Pyx_INCREF(__pyx_t_1); __pyx_t_9++; if (unlikely(0 < 0)) __PYX_ERR(0, 8, __pyx_L1_error)\n",
       "          #else\n",
       "          __pyx_t_1 = PySequence_ITEM(__pyx_t_6, __pyx_t_9); __pyx_t_9++; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 8, __pyx_L1_error)\n",
       "          __Pyx_GOTREF(__pyx_t_1);\n",
       "          #endif\n",
       "        }\n",
       "      } else {\n",
       "        __pyx_t_1 = __pyx_t_10(__pyx_t_6);\n",
       "        if (unlikely(!__pyx_t_1)) {\n",
       "          PyObject* exc_type = PyErr_Occurred();\n",
       "          if (exc_type) {\n",
       "            if (likely(__Pyx_PyErr_GivenExceptionMatches(exc_type, PyExc_StopIteration))) PyErr_Clear();\n",
       "            else __PYX_ERR(0, 8, __pyx_L1_error)\n",
       "          }\n",
       "          break;\n",
       "        }\n",
       "        __Pyx_GOTREF(__pyx_t_1);\n",
       "      }\n",
       "      __Pyx_XDECREF_SET(__pyx_v_j, __pyx_t_1);\n",
       "      __pyx_t_1 = 0;\n",
       "/* … */\n",
       "    }\n",
       "    __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;\n",
       "
+09:             d = 0.0
\n", "
      __Pyx_INCREF(__pyx_float_0_0);\n",
       "      __Pyx_XDECREF_SET(__pyx_v_d, __pyx_float_0_0);\n",
       "
+10:             for k in range(p):
\n", "
      __pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_builtin_range, __pyx_v_p); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 10, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_1);\n",
       "      if (likely(PyList_CheckExact(__pyx_t_1)) || PyTuple_CheckExact(__pyx_t_1)) {\n",
       "        __pyx_t_3 = __pyx_t_1; __Pyx_INCREF(__pyx_t_3); __pyx_t_11 = 0;\n",
       "        __pyx_t_12 = NULL;\n",
       "      } else {\n",
       "        __pyx_t_11 = -1; __pyx_t_3 = PyObject_GetIter(__pyx_t_1); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 10, __pyx_L1_error)\n",
       "        __Pyx_GOTREF(__pyx_t_3);\n",
       "        __pyx_t_12 = Py_TYPE(__pyx_t_3)->tp_iternext; if (unlikely(!__pyx_t_12)) __PYX_ERR(0, 10, __pyx_L1_error)\n",
       "      }\n",
       "      __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "      for (;;) {\n",
       "        if (likely(!__pyx_t_12)) {\n",
       "          if (likely(PyList_CheckExact(__pyx_t_3))) {\n",
       "            if (__pyx_t_11 >= PyList_GET_SIZE(__pyx_t_3)) break;\n",
       "            #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS\n",
       "            __pyx_t_1 = PyList_GET_ITEM(__pyx_t_3, __pyx_t_11); __Pyx_INCREF(__pyx_t_1); __pyx_t_11++; if (unlikely(0 < 0)) __PYX_ERR(0, 10, __pyx_L1_error)\n",
       "            #else\n",
       "            __pyx_t_1 = PySequence_ITEM(__pyx_t_3, __pyx_t_11); __pyx_t_11++; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 10, __pyx_L1_error)\n",
       "            __Pyx_GOTREF(__pyx_t_1);\n",
       "            #endif\n",
       "          } else {\n",
       "            if (__pyx_t_11 >= PyTuple_GET_SIZE(__pyx_t_3)) break;\n",
       "            #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS\n",
       "            __pyx_t_1 = PyTuple_GET_ITEM(__pyx_t_3, __pyx_t_11); __Pyx_INCREF(__pyx_t_1); __pyx_t_11++; if (unlikely(0 < 0)) __PYX_ERR(0, 10, __pyx_L1_error)\n",
       "            #else\n",
       "            __pyx_t_1 = PySequence_ITEM(__pyx_t_3, __pyx_t_11); __pyx_t_11++; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 10, __pyx_L1_error)\n",
       "            __Pyx_GOTREF(__pyx_t_1);\n",
       "            #endif\n",
       "          }\n",
       "        } else {\n",
       "          __pyx_t_1 = __pyx_t_12(__pyx_t_3);\n",
       "          if (unlikely(!__pyx_t_1)) {\n",
       "            PyObject* exc_type = PyErr_Occurred();\n",
       "            if (exc_type) {\n",
       "              if (likely(__Pyx_PyErr_GivenExceptionMatches(exc_type, PyExc_StopIteration))) PyErr_Clear();\n",
       "              else __PYX_ERR(0, 10, __pyx_L1_error)\n",
       "            }\n",
       "            break;\n",
       "          }\n",
       "          __Pyx_GOTREF(__pyx_t_1);\n",
       "        }\n",
       "        __Pyx_XDECREF_SET(__pyx_v_k, __pyx_t_1);\n",
       "        __pyx_t_1 = 0;\n",
       "/* … */\n",
       "      }\n",
       "      __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;\n",
       "
+11:                 d += (xs[i,k] - xs[j,k])**2
\n", "
        __pyx_t_1 = PyTuple_New(2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 11, __pyx_L1_error)\n",
       "        __Pyx_GOTREF(__pyx_t_1);\n",
       "        __Pyx_INCREF(__pyx_v_i);\n",
       "        __Pyx_GIVEREF(__pyx_v_i);\n",
       "        PyTuple_SET_ITEM(__pyx_t_1, 0, __pyx_v_i);\n",
       "        __Pyx_INCREF(__pyx_v_k);\n",
       "        __Pyx_GIVEREF(__pyx_v_k);\n",
       "        PyTuple_SET_ITEM(__pyx_t_1, 1, __pyx_v_k);\n",
       "        __pyx_t_4 = __Pyx_PyObject_GetItem(__pyx_v_xs, __pyx_t_1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 11, __pyx_L1_error)\n",
       "        __Pyx_GOTREF(__pyx_t_4);\n",
       "        __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "        __pyx_t_1 = PyTuple_New(2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 11, __pyx_L1_error)\n",
       "        __Pyx_GOTREF(__pyx_t_1);\n",
       "        __Pyx_INCREF(__pyx_v_j);\n",
       "        __Pyx_GIVEREF(__pyx_v_j);\n",
       "        PyTuple_SET_ITEM(__pyx_t_1, 0, __pyx_v_j);\n",
       "        __Pyx_INCREF(__pyx_v_k);\n",
       "        __Pyx_GIVEREF(__pyx_v_k);\n",
       "        PyTuple_SET_ITEM(__pyx_t_1, 1, __pyx_v_k);\n",
       "        __pyx_t_13 = __Pyx_PyObject_GetItem(__pyx_v_xs, __pyx_t_1); if (unlikely(!__pyx_t_13)) __PYX_ERR(0, 11, __pyx_L1_error)\n",
       "        __Pyx_GOTREF(__pyx_t_13);\n",
       "        __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "        __pyx_t_1 = PyNumber_Subtract(__pyx_t_4, __pyx_t_13); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 11, __pyx_L1_error)\n",
       "        __Pyx_GOTREF(__pyx_t_1);\n",
       "        __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "        __Pyx_DECREF(__pyx_t_13); __pyx_t_13 = 0;\n",
       "        __pyx_t_13 = PyNumber_Power(__pyx_t_1, __pyx_int_2, Py_None); if (unlikely(!__pyx_t_13)) __PYX_ERR(0, 11, __pyx_L1_error)\n",
       "        __Pyx_GOTREF(__pyx_t_13);\n",
       "        __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "        __pyx_t_1 = PyNumber_InPlaceAdd(__pyx_v_d, __pyx_t_13); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 11, __pyx_L1_error)\n",
       "        __Pyx_GOTREF(__pyx_t_1);\n",
       "        __Pyx_DECREF(__pyx_t_13); __pyx_t_13 = 0;\n",
       "        __Pyx_DECREF_SET(__pyx_v_d, __pyx_t_1);\n",
       "        __pyx_t_1 = 0;\n",
       "
+12:             A[i,j] = np.sqrt(d)
\n", "
      __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 12, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_1);\n",
       "      __pyx_t_13 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_sqrt); if (unlikely(!__pyx_t_13)) __PYX_ERR(0, 12, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_13);\n",
       "      __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "      __pyx_t_1 = NULL;\n",
       "      if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_13))) {\n",
       "        __pyx_t_1 = PyMethod_GET_SELF(__pyx_t_13);\n",
       "        if (likely(__pyx_t_1)) {\n",
       "          PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_13);\n",
       "          __Pyx_INCREF(__pyx_t_1);\n",
       "          __Pyx_INCREF(function);\n",
       "          __Pyx_DECREF_SET(__pyx_t_13, function);\n",
       "        }\n",
       "      }\n",
       "      if (!__pyx_t_1) {\n",
       "        __pyx_t_3 = __Pyx_PyObject_CallOneArg(__pyx_t_13, __pyx_v_d); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 12, __pyx_L1_error)\n",
       "        __Pyx_GOTREF(__pyx_t_3);\n",
       "      } else {\n",
       "        #if CYTHON_FAST_PYCALL\n",
       "        if (PyFunction_Check(__pyx_t_13)) {\n",
       "          PyObject *__pyx_temp[2] = {__pyx_t_1, __pyx_v_d};\n",
       "          __pyx_t_3 = __Pyx_PyFunction_FastCall(__pyx_t_13, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 12, __pyx_L1_error)\n",
       "          __Pyx_XDECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "          __Pyx_GOTREF(__pyx_t_3);\n",
       "        } else\n",
       "        #endif\n",
       "        #if CYTHON_FAST_PYCCALL\n",
       "        if (__Pyx_PyFastCFunction_Check(__pyx_t_13)) {\n",
       "          PyObject *__pyx_temp[2] = {__pyx_t_1, __pyx_v_d};\n",
       "          __pyx_t_3 = __Pyx_PyCFunction_FastCall(__pyx_t_13, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 12, __pyx_L1_error)\n",
       "          __Pyx_XDECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "          __Pyx_GOTREF(__pyx_t_3);\n",
       "        } else\n",
       "        #endif\n",
       "        {\n",
       "          __pyx_t_4 = PyTuple_New(1+1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 12, __pyx_L1_error)\n",
       "          __Pyx_GOTREF(__pyx_t_4);\n",
       "          __Pyx_GIVEREF(__pyx_t_1); PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_1); __pyx_t_1 = NULL;\n",
       "          __Pyx_INCREF(__pyx_v_d);\n",
       "          __Pyx_GIVEREF(__pyx_v_d);\n",
       "          PyTuple_SET_ITEM(__pyx_t_4, 0+1, __pyx_v_d);\n",
       "          __pyx_t_3 = __Pyx_PyObject_Call(__pyx_t_13, __pyx_t_4, NULL); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 12, __pyx_L1_error)\n",
       "          __Pyx_GOTREF(__pyx_t_3);\n",
       "          __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "        }\n",
       "      }\n",
       "      __Pyx_DECREF(__pyx_t_13); __pyx_t_13 = 0;\n",
       "      __pyx_t_13 = PyTuple_New(2); if (unlikely(!__pyx_t_13)) __PYX_ERR(0, 12, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_13);\n",
       "      __Pyx_INCREF(__pyx_v_i);\n",
       "      __Pyx_GIVEREF(__pyx_v_i);\n",
       "      PyTuple_SET_ITEM(__pyx_t_13, 0, __pyx_v_i);\n",
       "      __Pyx_INCREF(__pyx_v_j);\n",
       "      __Pyx_GIVEREF(__pyx_v_j);\n",
       "      PyTuple_SET_ITEM(__pyx_t_13, 1, __pyx_v_j);\n",
       "      if (unlikely(PyObject_SetItem(__pyx_v_A, __pyx_t_13, __pyx_t_3) < 0)) __PYX_ERR(0, 12, __pyx_L1_error)\n",
       "      __Pyx_DECREF(__pyx_t_13); __pyx_t_13 = 0;\n",
       "      __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;\n",
       "
+13:     A += A.T
\n", "
  __pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_v_A, __pyx_n_s_T); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __pyx_t_6 = PyNumber_InPlaceAdd(__pyx_v_A, __pyx_t_2); if (unlikely(!__pyx_t_6)) __PYX_ERR(0, 13, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_6);\n",
       "  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "  __Pyx_DECREF_SET(__pyx_v_A, __pyx_t_6);\n",
       "  __pyx_t_6 = 0;\n",
       "
+14:     return A
\n", "
  __Pyx_XDECREF(__pyx_r);\n",
       "  __Pyx_INCREF(__pyx_v_A);\n",
       "  __pyx_r = __pyx_v_A;\n",
       "  goto __pyx_L0;\n",
       "
" ], "text/plain": [ "" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%cython -a \n", "\n", "import numpy as np\n", "\n", "def pdist_cython_1(xs): \n", " n, p = xs.shape\n", " A = np.zeros((n, n))\n", " for i in range(n):\n", " for j in range(i+1, n):\n", " d = 0.0\n", " for k in range(p):\n", " d += (xs[i,k] - xs[j,k])**2\n", " A[i,j] = np.sqrt(d)\n", " A += A.T\n", " return A" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "def pdist_base(xs): \n", " n, p = xs.shape\n", " A = np.zeros((n, n))\n", " for i in range(n):\n", " for j in range(i+1, n):\n", " d = 0.0\n", " for k in range(p):\n", " d += (xs[i,k] - xs[j,k])**2\n", " A[i,j] = np.sqrt(d)\n", " A += A.T\n", " return A" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "424 ms ± 1.52 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)\n" ] } ], "source": [ "%timeit -r3 -n1 pdist_base(xs)" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "399 ms ± 7.19 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)\n" ] } ], "source": [ "func = pdist_cython_1\n", "print(np.allclose(func(xs), sol))\n", "%timeit -r3 -n1 func(xs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cython with static types\n", "\n", "- We provide types for all variables so that Cython can optimize their compilation to C code.\n", "- Note `numpy` functions are optimized for working with `ndarrays` and have unnecessary overhead for scalars. We therefor replace them with math functions from the C `math` library." ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " Cython: _cython_magic_e68147339654c54aed8e40a1749baa8e.pyx\n", " \n", "\n", "\n", "

Generated by Cython 0.28

\n", "

\n", " Yellow lines hint at Python interaction.
\n", " Click on a line that starts with a \"+\" to see the C code that Cython generated for it.\n", "

\n", "
 01: 
\n", "
+02: import cython
\n", "
  __pyx_t_1 = __Pyx_PyDict_NewPresized(0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 2, __pyx_L1_error)\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "
+03: import numpy as np
\n", "
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 3, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 3, __pyx_L1_error)\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "
 04: cimport numpy as np
\n", "
 05: from libc.math cimport sqrt, pow
\n", "
 06: 
\n", "
 07: @cython.boundscheck(False)
\n", "
 08: @cython.wraparound(False)
\n", "
+09: def pdist_cython_2(double[:, :] xs):
\n", "
/* Python wrapper */\n",
       "static PyObject *__pyx_pw_46_cython_magic_e68147339654c54aed8e40a1749baa8e_1pdist_cython_2(PyObject *__pyx_self, PyObject *__pyx_arg_xs); /*proto*/\n",
       "static PyMethodDef __pyx_mdef_46_cython_magic_e68147339654c54aed8e40a1749baa8e_1pdist_cython_2 = {\"pdist_cython_2\", (PyCFunction)__pyx_pw_46_cython_magic_e68147339654c54aed8e40a1749baa8e_1pdist_cython_2, METH_O, 0};\n",
       "static PyObject *__pyx_pw_46_cython_magic_e68147339654c54aed8e40a1749baa8e_1pdist_cython_2(PyObject *__pyx_self, PyObject *__pyx_arg_xs) {\n",
       "  __Pyx_memviewslice __pyx_v_xs = { 0, 0, { 0 }, { 0 }, { 0 } };\n",
       "  PyObject *__pyx_r = 0;\n",
       "  __Pyx_RefNannyDeclarations\n",
       "  __Pyx_RefNannySetupContext(\"pdist_cython_2 (wrapper)\", 0);\n",
       "  assert(__pyx_arg_xs); {\n",
       "    __pyx_v_xs = __Pyx_PyObject_to_MemoryviewSlice_dsds_double(__pyx_arg_xs, PyBUF_WRITABLE); if (unlikely(!__pyx_v_xs.memview)) __PYX_ERR(0, 9, __pyx_L3_error)\n",
       "  }\n",
       "  goto __pyx_L4_argument_unpacking_done;\n",
       "  __pyx_L3_error:;\n",
       "  __Pyx_AddTraceback(\"_cython_magic_e68147339654c54aed8e40a1749baa8e.pdist_cython_2\", __pyx_clineno, __pyx_lineno, __pyx_filename);\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return NULL;\n",
       "  __pyx_L4_argument_unpacking_done:;\n",
       "  __pyx_r = __pyx_pf_46_cython_magic_e68147339654c54aed8e40a1749baa8e_pdist_cython_2(__pyx_self, __pyx_v_xs);\n",
       "\n",
       "  /* function exit code */\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return __pyx_r;\n",
       "}\n",
       "\n",
       "static PyObject *__pyx_pf_46_cython_magic_e68147339654c54aed8e40a1749baa8e_pdist_cython_2(CYTHON_UNUSED PyObject *__pyx_self, __Pyx_memviewslice __pyx_v_xs) {\n",
       "  int __pyx_v_n;\n",
       "  int __pyx_v_p;\n",
       "  int __pyx_v_i;\n",
       "  int __pyx_v_j;\n",
       "  int __pyx_v_k;\n",
       "  __Pyx_memviewslice __pyx_v_A = { 0, 0, { 0 }, { 0 }, { 0 } };\n",
       "  double __pyx_v_d;\n",
       "  PyObject *__pyx_r = NULL;\n",
       "  __Pyx_RefNannyDeclarations\n",
       "  __Pyx_RefNannySetupContext(\"pdist_cython_2\", 0);\n",
       "/* … */\n",
       "  /* function exit code */\n",
       "  __pyx_L1_error:;\n",
       "  __Pyx_XDECREF(__pyx_t_1);\n",
       "  __Pyx_XDECREF(__pyx_t_2);\n",
       "  __Pyx_XDECREF(__pyx_t_3);\n",
       "  __Pyx_XDECREF(__pyx_t_4);\n",
       "  __Pyx_XDECREF(__pyx_t_5);\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_t_6, 1);\n",
       "  __Pyx_AddTraceback(\"_cython_magic_e68147339654c54aed8e40a1749baa8e.pdist_cython_2\", __pyx_clineno, __pyx_lineno, __pyx_filename);\n",
       "  __pyx_r = NULL;\n",
       "  __pyx_L0:;\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_v_xs, 1);\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_v_A, 1);\n",
       "  __Pyx_XGIVEREF(__pyx_r);\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return __pyx_r;\n",
       "}\n",
       "/* … */\n",
       "  __pyx_tuple__31 = PyTuple_Pack(9, __pyx_n_s_xs, __pyx_n_s_xs, __pyx_n_s_n, __pyx_n_s_p, __pyx_n_s_i, __pyx_n_s_j, __pyx_n_s_k, __pyx_n_s_A, __pyx_n_s_d); if (unlikely(!__pyx_tuple__31)) __PYX_ERR(0, 9, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_tuple__31);\n",
       "  __Pyx_GIVEREF(__pyx_tuple__31);\n",
       "/* … */\n",
       "  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_e68147339654c54aed8e40a1749baa8e_1pdist_cython_2, NULL, __pyx_n_s_cython_magic_e68147339654c54aed); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_pdist_cython_2, __pyx_t_1) < 0) __PYX_ERR(0, 9, __pyx_L1_error)\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  __pyx_codeobj__32 = (PyObject*)__Pyx_PyCode_New(1, 0, 9, 0, CO_OPTIMIZED|CO_NEWLOCALS, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__31, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_Users_cliburn_ipython_cython__c, __pyx_n_s_pdist_cython_2, 9, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__32)) __PYX_ERR(0, 9, __pyx_L1_error)\n",
       "
 10:     cdef int n, p
\n", "
 11:     cdef int i, j, k
\n", "
 12:     cdef double[:, :] A
\n", "
 13:     cdef double d
\n", "
 14: 
\n", "
+15:     n = xs.shape[0]
\n", "
  __pyx_v_n = (__pyx_v_xs.shape[0]);\n",
       "
+16:     p = xs.shape[1]
\n", "
  __pyx_v_p = (__pyx_v_xs.shape[1]);\n",
       "
+17:     A = np.zeros((n, n))
\n", "
  __pyx_t_2 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 17, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_2, __pyx_n_s_zeros); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 17, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_3);\n",
       "  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "  __pyx_t_2 = __Pyx_PyInt_From_int(__pyx_v_n); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 17, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __pyx_t_4 = __Pyx_PyInt_From_int(__pyx_v_n); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 17, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_4);\n",
       "  __pyx_t_5 = PyTuple_New(2); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 17, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_5);\n",
       "  __Pyx_GIVEREF(__pyx_t_2);\n",
       "  PyTuple_SET_ITEM(__pyx_t_5, 0, __pyx_t_2);\n",
       "  __Pyx_GIVEREF(__pyx_t_4);\n",
       "  PyTuple_SET_ITEM(__pyx_t_5, 1, __pyx_t_4);\n",
       "  __pyx_t_2 = 0;\n",
       "  __pyx_t_4 = 0;\n",
       "  __pyx_t_4 = NULL;\n",
       "  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_3))) {\n",
       "    __pyx_t_4 = PyMethod_GET_SELF(__pyx_t_3);\n",
       "    if (likely(__pyx_t_4)) {\n",
       "      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);\n",
       "      __Pyx_INCREF(__pyx_t_4);\n",
       "      __Pyx_INCREF(function);\n",
       "      __Pyx_DECREF_SET(__pyx_t_3, function);\n",
       "    }\n",
       "  }\n",
       "  if (!__pyx_t_4) {\n",
       "    __pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_t_3, __pyx_t_5); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 17, __pyx_L1_error)\n",
       "    __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;\n",
       "    __Pyx_GOTREF(__pyx_t_1);\n",
       "  } else {\n",
       "    #if CYTHON_FAST_PYCALL\n",
       "    if (PyFunction_Check(__pyx_t_3)) {\n",
       "      PyObject *__pyx_temp[2] = {__pyx_t_4, __pyx_t_5};\n",
       "      __pyx_t_1 = __Pyx_PyFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 17, __pyx_L1_error)\n",
       "      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "      __Pyx_GOTREF(__pyx_t_1);\n",
       "      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;\n",
       "    } else\n",
       "    #endif\n",
       "    #if CYTHON_FAST_PYCCALL\n",
       "    if (__Pyx_PyFastCFunction_Check(__pyx_t_3)) {\n",
       "      PyObject *__pyx_temp[2] = {__pyx_t_4, __pyx_t_5};\n",
       "      __pyx_t_1 = __Pyx_PyCFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 17, __pyx_L1_error)\n",
       "      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "      __Pyx_GOTREF(__pyx_t_1);\n",
       "      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;\n",
       "    } else\n",
       "    #endif\n",
       "    {\n",
       "      __pyx_t_2 = PyTuple_New(1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 17, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_2);\n",
       "      __Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_4); __pyx_t_4 = NULL;\n",
       "      __Pyx_GIVEREF(__pyx_t_5);\n",
       "      PyTuple_SET_ITEM(__pyx_t_2, 0+1, __pyx_t_5);\n",
       "      __pyx_t_5 = 0;\n",
       "      __pyx_t_1 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_2, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 17, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_1);\n",
       "      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "    }\n",
       "  }\n",
       "  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;\n",
       "  __pyx_t_6 = __Pyx_PyObject_to_MemoryviewSlice_dsds_double(__pyx_t_1, PyBUF_WRITABLE); if (unlikely(!__pyx_t_6.memview)) __PYX_ERR(0, 17, __pyx_L1_error)\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  __pyx_v_A = __pyx_t_6;\n",
       "  __pyx_t_6.memview = NULL;\n",
       "  __pyx_t_6.data = NULL;\n",
       "
+18:     for i in range(n):
\n", "
  __pyx_t_7 = __pyx_v_n;\n",
       "  __pyx_t_8 = __pyx_t_7;\n",
       "  for (__pyx_t_9 = 0; __pyx_t_9 < __pyx_t_8; __pyx_t_9+=1) {\n",
       "    __pyx_v_i = __pyx_t_9;\n",
       "
+19:         for j in range(i+1, n):
\n", "
    __pyx_t_10 = __pyx_v_n;\n",
       "    __pyx_t_11 = __pyx_t_10;\n",
       "    for (__pyx_t_12 = (__pyx_v_i + 1); __pyx_t_12 < __pyx_t_11; __pyx_t_12+=1) {\n",
       "      __pyx_v_j = __pyx_t_12;\n",
       "
+20:             d = 0.0
\n", "
      __pyx_v_d = 0.0;\n",
       "
+21:             for k in range(p):
\n", "
      __pyx_t_13 = __pyx_v_p;\n",
       "      __pyx_t_14 = __pyx_t_13;\n",
       "      for (__pyx_t_15 = 0; __pyx_t_15 < __pyx_t_14; __pyx_t_15+=1) {\n",
       "        __pyx_v_k = __pyx_t_15;\n",
       "
+22:                 d += pow(xs[i,k] - xs[j,k],2)
\n", "
        __pyx_t_16 = __pyx_v_i;\n",
       "        __pyx_t_17 = __pyx_v_k;\n",
       "        __pyx_t_18 = __pyx_v_j;\n",
       "        __pyx_t_19 = __pyx_v_k;\n",
       "        __pyx_v_d = (__pyx_v_d + pow(((*((double *) ( /* dim=1 */ (( /* dim=0 */ (__pyx_v_xs.data + __pyx_t_16 * __pyx_v_xs.strides[0]) ) + __pyx_t_17 * __pyx_v_xs.strides[1]) ))) - (*((double *) ( /* dim=1 */ (( /* dim=0 */ (__pyx_v_xs.data + __pyx_t_18 * __pyx_v_xs.strides[0]) ) + __pyx_t_19 * __pyx_v_xs.strides[1]) )))), 2.0));\n",
       "      }\n",
       "
+23:             A[i,j] = sqrt(d)
\n", "
      __pyx_t_20 = __pyx_v_i;\n",
       "      __pyx_t_21 = __pyx_v_j;\n",
       "      *((double *) ( /* dim=1 */ (( /* dim=0 */ (__pyx_v_A.data + __pyx_t_20 * __pyx_v_A.strides[0]) ) + __pyx_t_21 * __pyx_v_A.strides[1]) )) = sqrt(__pyx_v_d);\n",
       "    }\n",
       "  }\n",
       "
+24:     for i in range(1, n):
\n", "
  __pyx_t_7 = __pyx_v_n;\n",
       "  __pyx_t_8 = __pyx_t_7;\n",
       "  for (__pyx_t_9 = 1; __pyx_t_9 < __pyx_t_8; __pyx_t_9+=1) {\n",
       "    __pyx_v_i = __pyx_t_9;\n",
       "
+25:         for j in range(i):
\n", "
    __pyx_t_10 = __pyx_v_i;\n",
       "    __pyx_t_11 = __pyx_t_10;\n",
       "    for (__pyx_t_12 = 0; __pyx_t_12 < __pyx_t_11; __pyx_t_12+=1) {\n",
       "      __pyx_v_j = __pyx_t_12;\n",
       "
+26:             A[i, j] = A[j, i]
\n", "
      __pyx_t_22 = __pyx_v_j;\n",
       "      __pyx_t_23 = __pyx_v_i;\n",
       "      __pyx_t_24 = __pyx_v_i;\n",
       "      __pyx_t_25 = __pyx_v_j;\n",
       "      *((double *) ( /* dim=1 */ (( /* dim=0 */ (__pyx_v_A.data + __pyx_t_24 * __pyx_v_A.strides[0]) ) + __pyx_t_25 * __pyx_v_A.strides[1]) )) = (*((double *) ( /* dim=1 */ (( /* dim=0 */ (__pyx_v_A.data + __pyx_t_22 * __pyx_v_A.strides[0]) ) + __pyx_t_23 * __pyx_v_A.strides[1]) )));\n",
       "    }\n",
       "  }\n",
       "
+27:     return A
\n", "
  __Pyx_XDECREF(__pyx_r);\n",
       "  __pyx_t_1 = __pyx_memoryview_fromslice(__pyx_v_A, 2, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 27, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  __pyx_r = __pyx_t_1;\n",
       "  __pyx_t_1 = 0;\n",
       "  goto __pyx_L0;\n",
       "
" ], "text/plain": [ "" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%cython -a \n", "\n", "import cython\n", "import numpy as np\n", "cimport numpy as np\n", "from libc.math cimport sqrt, pow\n", "\n", "@cython.boundscheck(False)\n", "@cython.wraparound(False)\n", "def pdist_cython_2(double[:, :] xs):\n", " cdef int n, p\n", " cdef int i, j, k\n", " cdef double[:, :] A\n", " cdef double d\n", " \n", " n = xs.shape[0]\n", " p = xs.shape[1]\n", " A = np.zeros((n, n))\n", " for i in range(n):\n", " for j in range(i+1, n):\n", " d = 0.0\n", " for k in range(p):\n", " d += pow(xs[i,k] - xs[j,k],2)\n", " A[i,j] = sqrt(d)\n", " for i in range(1, n):\n", " for j in range(i):\n", " A[i, j] = A[j, i] \n", " return A" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "693 µs ± 342 µs per loop (mean ± std. dev. of 3 runs, 1 loop each)\n" ] } ], "source": [ "func = pdist_cython_2\n", "print(np.allclose(func(xs), sol))\n", "%timeit -r3 -n1 func(xs)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Wrapping C++ cdoe" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Function to port" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", "def pdist_base(xs): \n", " n, p = xs.shape\n", " A = np.zeros((n, n))\n", " for i in range(n):\n", " for j in range(i+1, n):\n", " d = 0.0\n", " for k in range(p):\n", " d += (xs[i,k] - xs[j,k])**2\n", " A[i,j] = np.sqrt(d)\n", " A += A.T\n", " return A\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### First check that the function works as expected" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting main.cpp\n" ] } ], "source": [ "%%file main.cpp\n", "#include \n", "#include \n", "#include \n", "\n", "using std::cout;\n", "\n", "// takes numpy array as input and returns another numpy array\n", "Eigen::MatrixXd pdist(Eigen::MatrixXd xs) {\n", " int n = xs.rows() ;\n", " int p = xs.cols();\n", " \n", " Eigen::MatrixXd A = Eigen::MatrixXd::Zero(n, n);\n", " for (int i=0; i\n", "\n", "#include \n", "#include \n", "\n", "// takes numpy array as input and returns another numpy array\n", "Eigen::MatrixXd pdist(Eigen::MatrixXd xs) {\n", " int n = xs.rows() ;\n", " int p = xs.cols();\n", " \n", " Eigen::MatrixXd A = Eigen::MatrixXd::Zero(n, n);\n", " for (int i=0; i