{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from timeit import timeit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Enhancing performance\n", "\n", "We first look at native code compilation. Here we show 3 common methods for doing this using `numba` JIT compilation, `cython` AOT compilation, and direct wrapping of C++ code using `pybind11`. In general, `numba` is the simplest to use, while you have the most flexibility with `pybind11`. Which approach gives the best performance generally requires some experimentation.\n", "\n", "Then we review common methods for concurrent execution of embarrassingly parallel code using `multiprocessing`, `concurrent.futures` and `joblib`. Comparison of performance using processes and threads is made, with a brief explanation of the Global Interpreter Lock (GIL).\n", "\n", "More details for each of the libraries used to improve performance is provided in the course notebooks." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Python" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def cdist(xs, ys):\n", " \"\"\"Returns pairwise distance between row vectors in xs and ys.\n", " \n", " xs has shape (m, p)\n", " ys has shape (n, p)\n", " \n", " Return value has shape (m, n) \n", " \"\"\"\n", " \n", " m, p = xs.shape\n", " n, p = ys.shape\n", " \n", " res = np.empty((m, n))\n", " for i in range(m):\n", " for j in range(n):\n", " res[i, j] = np.sqrt(np.sum((ys[j] - xs[i])**2))\n", " return res" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sanity check" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "xs = np.arange(6).reshape(-1,2).astype('float')\n", "ys = np.arange(4).reshape(-1, 2).astype('float')\n", "zs = cdist(xs, ys)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0. , 2.82842712],\n", " [2.82842712, 0. ],\n", " [5.65685425, 2.82842712]])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "zs" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "60.7 µs ± 1.39 µs per loop (mean ± std. dev. of 3 runs, 10 loops each)\n" ] } ], "source": [ "%timeit -r 3 -n 10 cdist(xs, ys)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "m = 1000\n", "n = 1000\n", "p = 100\n", "\n", "X = np.random.random((m, p))\n", "Y = np.random.random((n, p))" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 9.46 s, sys: 20 ms, total: 9.48 s\n", "Wall time: 9.48 s\n" ] } ], "source": [ "%%time\n", "\n", "Z = cdist(X, Y)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "t0 = timeit(lambda : cdist(X, Y), number=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using `numba`" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "from numba import jit, njit" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "@njit\n", "def cdist_numba(xs, ys):\n", " \"\"\"Returns pairwise distance between row vectors in xs and ys.\n", " \n", " xs has shape (m, p)\n", " ys has shape (n, p)\n", " \n", " Return value has shape (m, n) \n", " \"\"\"\n", " \n", " m, p = xs.shape\n", " n, p = ys.shape\n", " \n", " res = np.empty((m, n))\n", " for i in range(m):\n", " for j in range(n):\n", " res[i, j] = np.sqrt(np.sum((ys[j] - xs[i])**2))\n", " return res" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "assert(np.allclose(cdist(xs, ys), cdist_numba(xs, ys)))" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 280 ms, sys: 8 ms, total: 288 ms\n", "Wall time: 287 ms\n" ] } ], "source": [ "%%time\n", "\n", "Z = cdist_numba(X, Y)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "t_numba = timeit(lambda : cdist_numba(X, Y), number=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Unrolling\n", "\n", "We can help `numba` by unrolling the code." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "@njit\n", "def cdist_numba1(xs, ys):\n", " \"\"\"Returns pairwise distance between row vectors in xs and ys.\n", " \n", " xs has shape (m, p)\n", " ys has shape (n, p)\n", " \n", " Return value has shape (m, n) \n", " \"\"\"\n", " \n", " m, p = xs.shape\n", " n, p = ys.shape\n", " \n", " res = np.empty((m, n))\n", " for i in range(m):\n", " for j in range(n):\n", " s = 0\n", " for k in range(p):\n", " s += (ys[j,k] - xs[i,k])**2\n", " res[i, j] = np.sqrt(s)\n", " return res" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "assert(np.allclose(cdist(xs, ys), cdist_numba1(xs, ys)))" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 112 ms, sys: 0 ns, total: 112 ms\n", "Wall time: 111 ms\n" ] } ], "source": [ "%%time\n", "\n", "Z = cdist_numba1(X, Y)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "t_numba1 = timeit(lambda : cdist_numba1(X, Y), number=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using `cython`" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "%load_ext cython" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " Cython: _cython_magic_e0260da0fe5bce3ea805711e620fd9f5.pyx\n", " \n", "\n", "\n", "

Generated by Cython 0.28.5

\n", "

\n", " Yellow lines hint at Python interaction.
\n", " Click on a line that starts with a \"+\" to see the C code that Cython generated for it.\n", "

\n", "
 01: 
\n", "
+02: import numpy as np
\n", "
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 2, __pyx_L1_error)\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "
 03: 
\n", "
+04: def cdist_cython(xs, ys):
\n", "
/* Python wrapper */\n",
       "static PyObject *__pyx_pw_46_cython_magic_e0260da0fe5bce3ea805711e620fd9f5_1cdist_cython(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/\n",
       "static char __pyx_doc_46_cython_magic_e0260da0fe5bce3ea805711e620fd9f5_cdist_cython[] = \"Returns pairwise distance between row vectors in xs and ys.\\n    \\n    xs has shape (m, p)\\n    ys has shape (n, p)\\n    \\n    Return value has shape (m, n)    \\n    \";\n",
       "static PyMethodDef __pyx_mdef_46_cython_magic_e0260da0fe5bce3ea805711e620fd9f5_1cdist_cython = {\"cdist_cython\", (PyCFunction)__pyx_pw_46_cython_magic_e0260da0fe5bce3ea805711e620fd9f5_1cdist_cython, METH_VARARGS|METH_KEYWORDS, __pyx_doc_46_cython_magic_e0260da0fe5bce3ea805711e620fd9f5_cdist_cython};\n",
       "static PyObject *__pyx_pw_46_cython_magic_e0260da0fe5bce3ea805711e620fd9f5_1cdist_cython(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {\n",
       "  PyObject *__pyx_v_xs = 0;\n",
       "  PyObject *__pyx_v_ys = 0;\n",
       "  PyObject *__pyx_r = 0;\n",
       "  __Pyx_RefNannyDeclarations\n",
       "  __Pyx_RefNannySetupContext(\"cdist_cython (wrapper)\", 0);\n",
       "  {\n",
       "    static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_xs,&__pyx_n_s_ys,0};\n",
       "    PyObject* values[2] = {0,0};\n",
       "    if (unlikely(__pyx_kwds)) {\n",
       "      Py_ssize_t kw_args;\n",
       "      const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args);\n",
       "      switch (pos_args) {\n",
       "        case  2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1);\n",
       "        CYTHON_FALLTHROUGH;\n",
       "        case  1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0);\n",
       "        CYTHON_FALLTHROUGH;\n",
       "        case  0: break;\n",
       "        default: goto __pyx_L5_argtuple_error;\n",
       "      }\n",
       "      kw_args = PyDict_Size(__pyx_kwds);\n",
       "      switch (pos_args) {\n",
       "        case  0:\n",
       "        if (likely((values[0] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_xs)) != 0)) kw_args--;\n",
       "        else goto __pyx_L5_argtuple_error;\n",
       "        CYTHON_FALLTHROUGH;\n",
       "        case  1:\n",
       "        if (likely((values[1] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_ys)) != 0)) kw_args--;\n",
       "        else {\n",
       "          __Pyx_RaiseArgtupleInvalid(\"cdist_cython\", 1, 2, 2, 1); __PYX_ERR(0, 4, __pyx_L3_error)\n",
       "        }\n",
       "      }\n",
       "      if (unlikely(kw_args > 0)) {\n",
       "        if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, \"cdist_cython\") < 0)) __PYX_ERR(0, 4, __pyx_L3_error)\n",
       "      }\n",
       "    } else if (PyTuple_GET_SIZE(__pyx_args) != 2) {\n",
       "      goto __pyx_L5_argtuple_error;\n",
       "    } else {\n",
       "      values[0] = PyTuple_GET_ITEM(__pyx_args, 0);\n",
       "      values[1] = PyTuple_GET_ITEM(__pyx_args, 1);\n",
       "    }\n",
       "    __pyx_v_xs = values[0];\n",
       "    __pyx_v_ys = values[1];\n",
       "  }\n",
       "  goto __pyx_L4_argument_unpacking_done;\n",
       "  __pyx_L5_argtuple_error:;\n",
       "  __Pyx_RaiseArgtupleInvalid(\"cdist_cython\", 1, 2, 2, PyTuple_GET_SIZE(__pyx_args)); __PYX_ERR(0, 4, __pyx_L3_error)\n",
       "  __pyx_L3_error:;\n",
       "  __Pyx_AddTraceback(\"_cython_magic_e0260da0fe5bce3ea805711e620fd9f5.cdist_cython\", __pyx_clineno, __pyx_lineno, __pyx_filename);\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return NULL;\n",
       "  __pyx_L4_argument_unpacking_done:;\n",
       "  __pyx_r = __pyx_pf_46_cython_magic_e0260da0fe5bce3ea805711e620fd9f5_cdist_cython(__pyx_self, __pyx_v_xs, __pyx_v_ys);\n",
       "\n",
       "  /* function exit code */\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return __pyx_r;\n",
       "}\n",
       "\n",
       "static PyObject *__pyx_pf_46_cython_magic_e0260da0fe5bce3ea805711e620fd9f5_cdist_cython(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_xs, PyObject *__pyx_v_ys) {\n",
       "  PyObject *__pyx_v_m = NULL;\n",
       "  CYTHON_UNUSED PyObject *__pyx_v_p = NULL;\n",
       "  PyObject *__pyx_v_n = NULL;\n",
       "  PyObject *__pyx_v_res = NULL;\n",
       "  PyObject *__pyx_v_i = NULL;\n",
       "  PyObject *__pyx_v_j = NULL;\n",
       "  PyObject *__pyx_r = NULL;\n",
       "  __Pyx_RefNannyDeclarations\n",
       "  __Pyx_RefNannySetupContext(\"cdist_cython\", 0);\n",
       "/* … */\n",
       "  /* function exit code */\n",
       "  __pyx_L1_error:;\n",
       "  __Pyx_XDECREF(__pyx_t_1);\n",
       "  __Pyx_XDECREF(__pyx_t_2);\n",
       "  __Pyx_XDECREF(__pyx_t_3);\n",
       "  __Pyx_XDECREF(__pyx_t_4);\n",
       "  __Pyx_XDECREF(__pyx_t_6);\n",
       "  __Pyx_XDECREF(__pyx_t_11);\n",
       "  __Pyx_XDECREF(__pyx_t_12);\n",
       "  __Pyx_XDECREF(__pyx_t_13);\n",
       "  __Pyx_XDECREF(__pyx_t_14);\n",
       "  __Pyx_AddTraceback(\"_cython_magic_e0260da0fe5bce3ea805711e620fd9f5.cdist_cython\", __pyx_clineno, __pyx_lineno, __pyx_filename);\n",
       "  __pyx_r = NULL;\n",
       "  __pyx_L0:;\n",
       "  __Pyx_XDECREF(__pyx_v_m);\n",
       "  __Pyx_XDECREF(__pyx_v_p);\n",
       "  __Pyx_XDECREF(__pyx_v_n);\n",
       "  __Pyx_XDECREF(__pyx_v_res);\n",
       "  __Pyx_XDECREF(__pyx_v_i);\n",
       "  __Pyx_XDECREF(__pyx_v_j);\n",
       "  __Pyx_XGIVEREF(__pyx_r);\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return __pyx_r;\n",
       "}\n",
       "/* … */\n",
       "  __pyx_tuple_ = PyTuple_Pack(8, __pyx_n_s_xs, __pyx_n_s_ys, __pyx_n_s_m, __pyx_n_s_p, __pyx_n_s_n, __pyx_n_s_res, __pyx_n_s_i, __pyx_n_s_j); if (unlikely(!__pyx_tuple_)) __PYX_ERR(0, 4, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_tuple_);\n",
       "  __Pyx_GIVEREF(__pyx_tuple_);\n",
       "/* … */\n",
       "  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_e0260da0fe5bce3ea805711e620fd9f5_1cdist_cython, NULL, __pyx_n_s_cython_magic_e0260da0fe5bce3ea8); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 4, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_cdist_cython, __pyx_t_1) < 0) __PYX_ERR(0, 4, __pyx_L1_error)\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "
 05:     """Returns pairwise distance between row vectors in xs and ys.
\n", "
 06:     
\n", "
 07:     xs has shape (m, p)
\n", "
 08:     ys has shape (n, p)
\n", "
 09:     
\n", "
 10:     Return value has shape (m, n)    
\n", "
 11:     """
\n", "
 12: 
\n", "
+13:     m, p = xs.shape
\n", "
  __pyx_t_1 = __Pyx_PyObject_GetAttrStr(__pyx_v_xs, __pyx_n_s_shape); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if ((likely(PyTuple_CheckExact(__pyx_t_1))) || (PyList_CheckExact(__pyx_t_1))) {\n",
       "    PyObject* sequence = __pyx_t_1;\n",
       "    Py_ssize_t size = __Pyx_PySequence_SIZE(sequence);\n",
       "    if (unlikely(size != 2)) {\n",
       "      if (size > 2) __Pyx_RaiseTooManyValuesError(2);\n",
       "      else if (size >= 0) __Pyx_RaiseNeedMoreValuesError(size);\n",
       "      __PYX_ERR(0, 13, __pyx_L1_error)\n",
       "    }\n",
       "    #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS\n",
       "    if (likely(PyTuple_CheckExact(sequence))) {\n",
       "      __pyx_t_2 = PyTuple_GET_ITEM(sequence, 0); \n",
       "      __pyx_t_3 = PyTuple_GET_ITEM(sequence, 1); \n",
       "    } else {\n",
       "      __pyx_t_2 = PyList_GET_ITEM(sequence, 0); \n",
       "      __pyx_t_3 = PyList_GET_ITEM(sequence, 1); \n",
       "    }\n",
       "    __Pyx_INCREF(__pyx_t_2);\n",
       "    __Pyx_INCREF(__pyx_t_3);\n",
       "    #else\n",
       "    __pyx_t_2 = PySequence_ITEM(sequence, 0); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)\n",
       "    __Pyx_GOTREF(__pyx_t_2);\n",
       "    __pyx_t_3 = PySequence_ITEM(sequence, 1); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)\n",
       "    __Pyx_GOTREF(__pyx_t_3);\n",
       "    #endif\n",
       "    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  } else {\n",
       "    Py_ssize_t index = -1;\n",
       "    __pyx_t_4 = PyObject_GetIter(__pyx_t_1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 13, __pyx_L1_error)\n",
       "    __Pyx_GOTREF(__pyx_t_4);\n",
       "    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "    __pyx_t_5 = Py_TYPE(__pyx_t_4)->tp_iternext;\n",
       "    index = 0; __pyx_t_2 = __pyx_t_5(__pyx_t_4); if (unlikely(!__pyx_t_2)) goto __pyx_L3_unpacking_failed;\n",
       "    __Pyx_GOTREF(__pyx_t_2);\n",
       "    index = 1; __pyx_t_3 = __pyx_t_5(__pyx_t_4); if (unlikely(!__pyx_t_3)) goto __pyx_L3_unpacking_failed;\n",
       "    __Pyx_GOTREF(__pyx_t_3);\n",
       "    if (__Pyx_IternextUnpackEndCheck(__pyx_t_5(__pyx_t_4), 2) < 0) __PYX_ERR(0, 13, __pyx_L1_error)\n",
       "    __pyx_t_5 = NULL;\n",
       "    __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "    goto __pyx_L4_unpacking_done;\n",
       "    __pyx_L3_unpacking_failed:;\n",
       "    __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "    __pyx_t_5 = NULL;\n",
       "    if (__Pyx_IterFinish() == 0) __Pyx_RaiseNeedMoreValuesError(index);\n",
       "    __PYX_ERR(0, 13, __pyx_L1_error)\n",
       "    __pyx_L4_unpacking_done:;\n",
       "  }\n",
       "  __pyx_v_m = __pyx_t_2;\n",
       "  __pyx_t_2 = 0;\n",
       "  __pyx_v_p = __pyx_t_3;\n",
       "  __pyx_t_3 = 0;\n",
       "
+14:     n, p = ys.shape
\n", "
  __pyx_t_1 = __Pyx_PyObject_GetAttrStr(__pyx_v_ys, __pyx_n_s_shape); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 14, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if ((likely(PyTuple_CheckExact(__pyx_t_1))) || (PyList_CheckExact(__pyx_t_1))) {\n",
       "    PyObject* sequence = __pyx_t_1;\n",
       "    Py_ssize_t size = __Pyx_PySequence_SIZE(sequence);\n",
       "    if (unlikely(size != 2)) {\n",
       "      if (size > 2) __Pyx_RaiseTooManyValuesError(2);\n",
       "      else if (size >= 0) __Pyx_RaiseNeedMoreValuesError(size);\n",
       "      __PYX_ERR(0, 14, __pyx_L1_error)\n",
       "    }\n",
       "    #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS\n",
       "    if (likely(PyTuple_CheckExact(sequence))) {\n",
       "      __pyx_t_3 = PyTuple_GET_ITEM(sequence, 0); \n",
       "      __pyx_t_2 = PyTuple_GET_ITEM(sequence, 1); \n",
       "    } else {\n",
       "      __pyx_t_3 = PyList_GET_ITEM(sequence, 0); \n",
       "      __pyx_t_2 = PyList_GET_ITEM(sequence, 1); \n",
       "    }\n",
       "    __Pyx_INCREF(__pyx_t_3);\n",
       "    __Pyx_INCREF(__pyx_t_2);\n",
       "    #else\n",
       "    __pyx_t_3 = PySequence_ITEM(sequence, 0); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 14, __pyx_L1_error)\n",
       "    __Pyx_GOTREF(__pyx_t_3);\n",
       "    __pyx_t_2 = PySequence_ITEM(sequence, 1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 14, __pyx_L1_error)\n",
       "    __Pyx_GOTREF(__pyx_t_2);\n",
       "    #endif\n",
       "    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  } else {\n",
       "    Py_ssize_t index = -1;\n",
       "    __pyx_t_4 = PyObject_GetIter(__pyx_t_1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 14, __pyx_L1_error)\n",
       "    __Pyx_GOTREF(__pyx_t_4);\n",
       "    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "    __pyx_t_5 = Py_TYPE(__pyx_t_4)->tp_iternext;\n",
       "    index = 0; __pyx_t_3 = __pyx_t_5(__pyx_t_4); if (unlikely(!__pyx_t_3)) goto __pyx_L5_unpacking_failed;\n",
       "    __Pyx_GOTREF(__pyx_t_3);\n",
       "    index = 1; __pyx_t_2 = __pyx_t_5(__pyx_t_4); if (unlikely(!__pyx_t_2)) goto __pyx_L5_unpacking_failed;\n",
       "    __Pyx_GOTREF(__pyx_t_2);\n",
       "    if (__Pyx_IternextUnpackEndCheck(__pyx_t_5(__pyx_t_4), 2) < 0) __PYX_ERR(0, 14, __pyx_L1_error)\n",
       "    __pyx_t_5 = NULL;\n",
       "    __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "    goto __pyx_L6_unpacking_done;\n",
       "    __pyx_L5_unpacking_failed:;\n",
       "    __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "    __pyx_t_5 = NULL;\n",
       "    if (__Pyx_IterFinish() == 0) __Pyx_RaiseNeedMoreValuesError(index);\n",
       "    __PYX_ERR(0, 14, __pyx_L1_error)\n",
       "    __pyx_L6_unpacking_done:;\n",
       "  }\n",
       "  __pyx_v_n = __pyx_t_3;\n",
       "  __pyx_t_3 = 0;\n",
       "  __Pyx_DECREF_SET(__pyx_v_p, __pyx_t_2);\n",
       "  __pyx_t_2 = 0;\n",
       "
 15: 
\n", "
+16:     res = np.empty((m, n))
\n", "
  __pyx_t_2 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 16, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_2, __pyx_n_s_empty); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 16, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_3);\n",
       "  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "  __pyx_t_2 = PyTuple_New(2); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 16, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __Pyx_INCREF(__pyx_v_m);\n",
       "  __Pyx_GIVEREF(__pyx_v_m);\n",
       "  PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_v_m);\n",
       "  __Pyx_INCREF(__pyx_v_n);\n",
       "  __Pyx_GIVEREF(__pyx_v_n);\n",
       "  PyTuple_SET_ITEM(__pyx_t_2, 1, __pyx_v_n);\n",
       "  __pyx_t_4 = NULL;\n",
       "  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_3))) {\n",
       "    __pyx_t_4 = PyMethod_GET_SELF(__pyx_t_3);\n",
       "    if (likely(__pyx_t_4)) {\n",
       "      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);\n",
       "      __Pyx_INCREF(__pyx_t_4);\n",
       "      __Pyx_INCREF(function);\n",
       "      __Pyx_DECREF_SET(__pyx_t_3, function);\n",
       "    }\n",
       "  }\n",
       "  if (!__pyx_t_4) {\n",
       "    __pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_t_3, __pyx_t_2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 16, __pyx_L1_error)\n",
       "    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "    __Pyx_GOTREF(__pyx_t_1);\n",
       "  } else {\n",
       "    #if CYTHON_FAST_PYCALL\n",
       "    if (PyFunction_Check(__pyx_t_3)) {\n",
       "      PyObject *__pyx_temp[2] = {__pyx_t_4, __pyx_t_2};\n",
       "      __pyx_t_1 = __Pyx_PyFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 16, __pyx_L1_error)\n",
       "      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "      __Pyx_GOTREF(__pyx_t_1);\n",
       "      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "    } else\n",
       "    #endif\n",
       "    #if CYTHON_FAST_PYCCALL\n",
       "    if (__Pyx_PyFastCFunction_Check(__pyx_t_3)) {\n",
       "      PyObject *__pyx_temp[2] = {__pyx_t_4, __pyx_t_2};\n",
       "      __pyx_t_1 = __Pyx_PyCFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 16, __pyx_L1_error)\n",
       "      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "      __Pyx_GOTREF(__pyx_t_1);\n",
       "      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "    } else\n",
       "    #endif\n",
       "    {\n",
       "      __pyx_t_6 = PyTuple_New(1+1); if (unlikely(!__pyx_t_6)) __PYX_ERR(0, 16, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_6);\n",
       "      __Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_6, 0, __pyx_t_4); __pyx_t_4 = NULL;\n",
       "      __Pyx_GIVEREF(__pyx_t_2);\n",
       "      PyTuple_SET_ITEM(__pyx_t_6, 0+1, __pyx_t_2);\n",
       "      __pyx_t_2 = 0;\n",
       "      __pyx_t_1 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_6, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 16, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_1);\n",
       "      __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;\n",
       "    }\n",
       "  }\n",
       "  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;\n",
       "  __pyx_v_res = __pyx_t_1;\n",
       "  __pyx_t_1 = 0;\n",
       "
+17:     for i in range(m):
\n", "
  __pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_builtin_range, __pyx_v_m); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 17, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (likely(PyList_CheckExact(__pyx_t_1)) || PyTuple_CheckExact(__pyx_t_1)) {\n",
       "    __pyx_t_3 = __pyx_t_1; __Pyx_INCREF(__pyx_t_3); __pyx_t_7 = 0;\n",
       "    __pyx_t_8 = NULL;\n",
       "  } else {\n",
       "    __pyx_t_7 = -1; __pyx_t_3 = PyObject_GetIter(__pyx_t_1); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 17, __pyx_L1_error)\n",
       "    __Pyx_GOTREF(__pyx_t_3);\n",
       "    __pyx_t_8 = Py_TYPE(__pyx_t_3)->tp_iternext; if (unlikely(!__pyx_t_8)) __PYX_ERR(0, 17, __pyx_L1_error)\n",
       "  }\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  for (;;) {\n",
       "    if (likely(!__pyx_t_8)) {\n",
       "      if (likely(PyList_CheckExact(__pyx_t_3))) {\n",
       "        if (__pyx_t_7 >= PyList_GET_SIZE(__pyx_t_3)) break;\n",
       "        #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS\n",
       "        __pyx_t_1 = PyList_GET_ITEM(__pyx_t_3, __pyx_t_7); __Pyx_INCREF(__pyx_t_1); __pyx_t_7++; if (unlikely(0 < 0)) __PYX_ERR(0, 17, __pyx_L1_error)\n",
       "        #else\n",
       "        __pyx_t_1 = PySequence_ITEM(__pyx_t_3, __pyx_t_7); __pyx_t_7++; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 17, __pyx_L1_error)\n",
       "        __Pyx_GOTREF(__pyx_t_1);\n",
       "        #endif\n",
       "      } else {\n",
       "        if (__pyx_t_7 >= PyTuple_GET_SIZE(__pyx_t_3)) break;\n",
       "        #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS\n",
       "        __pyx_t_1 = PyTuple_GET_ITEM(__pyx_t_3, __pyx_t_7); __Pyx_INCREF(__pyx_t_1); __pyx_t_7++; if (unlikely(0 < 0)) __PYX_ERR(0, 17, __pyx_L1_error)\n",
       "        #else\n",
       "        __pyx_t_1 = PySequence_ITEM(__pyx_t_3, __pyx_t_7); __pyx_t_7++; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 17, __pyx_L1_error)\n",
       "        __Pyx_GOTREF(__pyx_t_1);\n",
       "        #endif\n",
       "      }\n",
       "    } else {\n",
       "      __pyx_t_1 = __pyx_t_8(__pyx_t_3);\n",
       "      if (unlikely(!__pyx_t_1)) {\n",
       "        PyObject* exc_type = PyErr_Occurred();\n",
       "        if (exc_type) {\n",
       "          if (likely(__Pyx_PyErr_GivenExceptionMatches(exc_type, PyExc_StopIteration))) PyErr_Clear();\n",
       "          else __PYX_ERR(0, 17, __pyx_L1_error)\n",
       "        }\n",
       "        break;\n",
       "      }\n",
       "      __Pyx_GOTREF(__pyx_t_1);\n",
       "    }\n",
       "    __Pyx_XDECREF_SET(__pyx_v_i, __pyx_t_1);\n",
       "    __pyx_t_1 = 0;\n",
       "/* … */\n",
       "  }\n",
       "  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;\n",
       "
+18:         for j in range(n):
\n", "
    __pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_builtin_range, __pyx_v_n); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 18, __pyx_L1_error)\n",
       "    __Pyx_GOTREF(__pyx_t_1);\n",
       "    if (likely(PyList_CheckExact(__pyx_t_1)) || PyTuple_CheckExact(__pyx_t_1)) {\n",
       "      __pyx_t_6 = __pyx_t_1; __Pyx_INCREF(__pyx_t_6); __pyx_t_9 = 0;\n",
       "      __pyx_t_10 = NULL;\n",
       "    } else {\n",
       "      __pyx_t_9 = -1; __pyx_t_6 = PyObject_GetIter(__pyx_t_1); if (unlikely(!__pyx_t_6)) __PYX_ERR(0, 18, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_6);\n",
       "      __pyx_t_10 = Py_TYPE(__pyx_t_6)->tp_iternext; if (unlikely(!__pyx_t_10)) __PYX_ERR(0, 18, __pyx_L1_error)\n",
       "    }\n",
       "    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "    for (;;) {\n",
       "      if (likely(!__pyx_t_10)) {\n",
       "        if (likely(PyList_CheckExact(__pyx_t_6))) {\n",
       "          if (__pyx_t_9 >= PyList_GET_SIZE(__pyx_t_6)) break;\n",
       "          #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS\n",
       "          __pyx_t_1 = PyList_GET_ITEM(__pyx_t_6, __pyx_t_9); __Pyx_INCREF(__pyx_t_1); __pyx_t_9++; if (unlikely(0 < 0)) __PYX_ERR(0, 18, __pyx_L1_error)\n",
       "          #else\n",
       "          __pyx_t_1 = PySequence_ITEM(__pyx_t_6, __pyx_t_9); __pyx_t_9++; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 18, __pyx_L1_error)\n",
       "          __Pyx_GOTREF(__pyx_t_1);\n",
       "          #endif\n",
       "        } else {\n",
       "          if (__pyx_t_9 >= PyTuple_GET_SIZE(__pyx_t_6)) break;\n",
       "          #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS\n",
       "          __pyx_t_1 = PyTuple_GET_ITEM(__pyx_t_6, __pyx_t_9); __Pyx_INCREF(__pyx_t_1); __pyx_t_9++; if (unlikely(0 < 0)) __PYX_ERR(0, 18, __pyx_L1_error)\n",
       "          #else\n",
       "          __pyx_t_1 = PySequence_ITEM(__pyx_t_6, __pyx_t_9); __pyx_t_9++; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 18, __pyx_L1_error)\n",
       "          __Pyx_GOTREF(__pyx_t_1);\n",
       "          #endif\n",
       "        }\n",
       "      } else {\n",
       "        __pyx_t_1 = __pyx_t_10(__pyx_t_6);\n",
       "        if (unlikely(!__pyx_t_1)) {\n",
       "          PyObject* exc_type = PyErr_Occurred();\n",
       "          if (exc_type) {\n",
       "            if (likely(__Pyx_PyErr_GivenExceptionMatches(exc_type, PyExc_StopIteration))) PyErr_Clear();\n",
       "            else __PYX_ERR(0, 18, __pyx_L1_error)\n",
       "          }\n",
       "          break;\n",
       "        }\n",
       "        __Pyx_GOTREF(__pyx_t_1);\n",
       "      }\n",
       "      __Pyx_XDECREF_SET(__pyx_v_j, __pyx_t_1);\n",
       "      __pyx_t_1 = 0;\n",
       "/* … */\n",
       "    }\n",
       "    __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;\n",
       "
+19:             res[i, j] = np.sqrt(np.sum((ys[j] - xs[i])**2))
\n", "
      __pyx_t_2 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 19, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_2);\n",
       "      __pyx_t_4 = __Pyx_PyObject_GetAttrStr(__pyx_t_2, __pyx_n_s_sqrt); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 19, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_4);\n",
       "      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "      __pyx_t_11 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_11)) __PYX_ERR(0, 19, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_11);\n",
       "      __pyx_t_12 = __Pyx_PyObject_GetAttrStr(__pyx_t_11, __pyx_n_s_sum); if (unlikely(!__pyx_t_12)) __PYX_ERR(0, 19, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_12);\n",
       "      __Pyx_DECREF(__pyx_t_11); __pyx_t_11 = 0;\n",
       "      __pyx_t_11 = __Pyx_PyObject_GetItem(__pyx_v_ys, __pyx_v_j); if (unlikely(!__pyx_t_11)) __PYX_ERR(0, 19, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_11);\n",
       "      __pyx_t_13 = __Pyx_PyObject_GetItem(__pyx_v_xs, __pyx_v_i); if (unlikely(!__pyx_t_13)) __PYX_ERR(0, 19, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_13);\n",
       "      __pyx_t_14 = PyNumber_Subtract(__pyx_t_11, __pyx_t_13); if (unlikely(!__pyx_t_14)) __PYX_ERR(0, 19, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_14);\n",
       "      __Pyx_DECREF(__pyx_t_11); __pyx_t_11 = 0;\n",
       "      __Pyx_DECREF(__pyx_t_13); __pyx_t_13 = 0;\n",
       "      __pyx_t_13 = PyNumber_Power(__pyx_t_14, __pyx_int_2, Py_None); if (unlikely(!__pyx_t_13)) __PYX_ERR(0, 19, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_13);\n",
       "      __Pyx_DECREF(__pyx_t_14); __pyx_t_14 = 0;\n",
       "      __pyx_t_14 = NULL;\n",
       "      if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_12))) {\n",
       "        __pyx_t_14 = PyMethod_GET_SELF(__pyx_t_12);\n",
       "        if (likely(__pyx_t_14)) {\n",
       "          PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_12);\n",
       "          __Pyx_INCREF(__pyx_t_14);\n",
       "          __Pyx_INCREF(function);\n",
       "          __Pyx_DECREF_SET(__pyx_t_12, function);\n",
       "        }\n",
       "      }\n",
       "      if (!__pyx_t_14) {\n",
       "        __pyx_t_2 = __Pyx_PyObject_CallOneArg(__pyx_t_12, __pyx_t_13); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 19, __pyx_L1_error)\n",
       "        __Pyx_DECREF(__pyx_t_13); __pyx_t_13 = 0;\n",
       "        __Pyx_GOTREF(__pyx_t_2);\n",
       "      } else {\n",
       "        #if CYTHON_FAST_PYCALL\n",
       "        if (PyFunction_Check(__pyx_t_12)) {\n",
       "          PyObject *__pyx_temp[2] = {__pyx_t_14, __pyx_t_13};\n",
       "          __pyx_t_2 = __Pyx_PyFunction_FastCall(__pyx_t_12, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 19, __pyx_L1_error)\n",
       "          __Pyx_XDECREF(__pyx_t_14); __pyx_t_14 = 0;\n",
       "          __Pyx_GOTREF(__pyx_t_2);\n",
       "          __Pyx_DECREF(__pyx_t_13); __pyx_t_13 = 0;\n",
       "        } else\n",
       "        #endif\n",
       "        #if CYTHON_FAST_PYCCALL\n",
       "        if (__Pyx_PyFastCFunction_Check(__pyx_t_12)) {\n",
       "          PyObject *__pyx_temp[2] = {__pyx_t_14, __pyx_t_13};\n",
       "          __pyx_t_2 = __Pyx_PyCFunction_FastCall(__pyx_t_12, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 19, __pyx_L1_error)\n",
       "          __Pyx_XDECREF(__pyx_t_14); __pyx_t_14 = 0;\n",
       "          __Pyx_GOTREF(__pyx_t_2);\n",
       "          __Pyx_DECREF(__pyx_t_13); __pyx_t_13 = 0;\n",
       "        } else\n",
       "        #endif\n",
       "        {\n",
       "          __pyx_t_11 = PyTuple_New(1+1); if (unlikely(!__pyx_t_11)) __PYX_ERR(0, 19, __pyx_L1_error)\n",
       "          __Pyx_GOTREF(__pyx_t_11);\n",
       "          __Pyx_GIVEREF(__pyx_t_14); PyTuple_SET_ITEM(__pyx_t_11, 0, __pyx_t_14); __pyx_t_14 = NULL;\n",
       "          __Pyx_GIVEREF(__pyx_t_13);\n",
       "          PyTuple_SET_ITEM(__pyx_t_11, 0+1, __pyx_t_13);\n",
       "          __pyx_t_13 = 0;\n",
       "          __pyx_t_2 = __Pyx_PyObject_Call(__pyx_t_12, __pyx_t_11, NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 19, __pyx_L1_error)\n",
       "          __Pyx_GOTREF(__pyx_t_2);\n",
       "          __Pyx_DECREF(__pyx_t_11); __pyx_t_11 = 0;\n",
       "        }\n",
       "      }\n",
       "      __Pyx_DECREF(__pyx_t_12); __pyx_t_12 = 0;\n",
       "      __pyx_t_12 = NULL;\n",
       "      if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_4))) {\n",
       "        __pyx_t_12 = PyMethod_GET_SELF(__pyx_t_4);\n",
       "        if (likely(__pyx_t_12)) {\n",
       "          PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_4);\n",
       "          __Pyx_INCREF(__pyx_t_12);\n",
       "          __Pyx_INCREF(function);\n",
       "          __Pyx_DECREF_SET(__pyx_t_4, function);\n",
       "        }\n",
       "      }\n",
       "      if (!__pyx_t_12) {\n",
       "        __pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_t_4, __pyx_t_2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 19, __pyx_L1_error)\n",
       "        __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "        __Pyx_GOTREF(__pyx_t_1);\n",
       "      } else {\n",
       "        #if CYTHON_FAST_PYCALL\n",
       "        if (PyFunction_Check(__pyx_t_4)) {\n",
       "          PyObject *__pyx_temp[2] = {__pyx_t_12, __pyx_t_2};\n",
       "          __pyx_t_1 = __Pyx_PyFunction_FastCall(__pyx_t_4, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 19, __pyx_L1_error)\n",
       "          __Pyx_XDECREF(__pyx_t_12); __pyx_t_12 = 0;\n",
       "          __Pyx_GOTREF(__pyx_t_1);\n",
       "          __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "        } else\n",
       "        #endif\n",
       "        #if CYTHON_FAST_PYCCALL\n",
       "        if (__Pyx_PyFastCFunction_Check(__pyx_t_4)) {\n",
       "          PyObject *__pyx_temp[2] = {__pyx_t_12, __pyx_t_2};\n",
       "          __pyx_t_1 = __Pyx_PyCFunction_FastCall(__pyx_t_4, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 19, __pyx_L1_error)\n",
       "          __Pyx_XDECREF(__pyx_t_12); __pyx_t_12 = 0;\n",
       "          __Pyx_GOTREF(__pyx_t_1);\n",
       "          __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "        } else\n",
       "        #endif\n",
       "        {\n",
       "          __pyx_t_11 = PyTuple_New(1+1); if (unlikely(!__pyx_t_11)) __PYX_ERR(0, 19, __pyx_L1_error)\n",
       "          __Pyx_GOTREF(__pyx_t_11);\n",
       "          __Pyx_GIVEREF(__pyx_t_12); PyTuple_SET_ITEM(__pyx_t_11, 0, __pyx_t_12); __pyx_t_12 = NULL;\n",
       "          __Pyx_GIVEREF(__pyx_t_2);\n",
       "          PyTuple_SET_ITEM(__pyx_t_11, 0+1, __pyx_t_2);\n",
       "          __pyx_t_2 = 0;\n",
       "          __pyx_t_1 = __Pyx_PyObject_Call(__pyx_t_4, __pyx_t_11, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 19, __pyx_L1_error)\n",
       "          __Pyx_GOTREF(__pyx_t_1);\n",
       "          __Pyx_DECREF(__pyx_t_11); __pyx_t_11 = 0;\n",
       "        }\n",
       "      }\n",
       "      __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "      __pyx_t_4 = PyTuple_New(2); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 19, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_4);\n",
       "      __Pyx_INCREF(__pyx_v_i);\n",
       "      __Pyx_GIVEREF(__pyx_v_i);\n",
       "      PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_v_i);\n",
       "      __Pyx_INCREF(__pyx_v_j);\n",
       "      __Pyx_GIVEREF(__pyx_v_j);\n",
       "      PyTuple_SET_ITEM(__pyx_t_4, 1, __pyx_v_j);\n",
       "      if (unlikely(PyObject_SetItem(__pyx_v_res, __pyx_t_4, __pyx_t_1) < 0)) __PYX_ERR(0, 19, __pyx_L1_error)\n",
       "      __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "      __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "
+20:     return res
\n", "
  __Pyx_XDECREF(__pyx_r);\n",
       "  __Pyx_INCREF(__pyx_v_res);\n",
       "  __pyx_r = __pyx_v_res;\n",
       "  goto __pyx_L0;\n",
       "
" ], "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%cython -a\n", "\n", "import numpy as np\n", "\n", "def cdist_cython(xs, ys):\n", " \"\"\"Returns pairwise distance between row vectors in xs and ys.\n", " \n", " xs has shape (m, p)\n", " ys has shape (n, p)\n", " \n", " Return value has shape (m, n) \n", " \"\"\"\n", " \n", " m, p = xs.shape\n", " n, p = ys.shape\n", " \n", " res = np.empty((m, n))\n", " for i in range(m):\n", " for j in range(n):\n", " res[i, j] = np.sqrt(np.sum((ys[j] - xs[i])**2))\n", " return res" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "assert(np.allclose(cdist(xs, ys), cdist_cython(xs, ys)))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 9.88 s, sys: 8 ms, total: 9.88 s\n", "Wall time: 9.88 s\n" ] } ], "source": [ "%%time\n", "\n", "Z = cdist_cython(X, Y)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "t_cython = timeit(lambda : cdist_cython(X, Y), number=1)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " Cython: _cython_magic_c23979ecdfeb0fdf6011e4df6437b314.pyx\n", " \n", "\n", "\n", "

Generated by Cython 0.28.5

\n", "

\n", " Yellow lines hint at Python interaction.
\n", " Click on a line that starts with a \"+\" to see the C code that Cython generated for it.\n", "

\n", "
 01: 
\n", "
+02: import cython
\n", "
  __pyx_t_1 = __Pyx_PyDict_NewPresized(0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 2, __pyx_L1_error)\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "
+03: import numpy as np
\n", "
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 3, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 3, __pyx_L1_error)\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "
 04: from libc.math cimport sqrt, pow
\n", "
 05: 
\n", "
 06: @cython.boundscheck(False)
\n", "
 07: @cython.wraparound(False)
\n", "
+08: def cdist_cython1(double[:, :] xs, double[:, :] ys):
\n", "
/* Python wrapper */\n",
       "static PyObject *__pyx_pw_46_cython_magic_c23979ecdfeb0fdf6011e4df6437b314_1cdist_cython1(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/\n",
       "static char __pyx_doc_46_cython_magic_c23979ecdfeb0fdf6011e4df6437b314_cdist_cython1[] = \"Returns pairwise distance between row vectors in xs and ys.\\n    \\n    xs has shape (m, p)\\n    ys has shape (n, p)\\n    \\n    Return value has shape (m, n)    \\n    \";\n",
       "static PyMethodDef __pyx_mdef_46_cython_magic_c23979ecdfeb0fdf6011e4df6437b314_1cdist_cython1 = {\"cdist_cython1\", (PyCFunction)__pyx_pw_46_cython_magic_c23979ecdfeb0fdf6011e4df6437b314_1cdist_cython1, METH_VARARGS|METH_KEYWORDS, __pyx_doc_46_cython_magic_c23979ecdfeb0fdf6011e4df6437b314_cdist_cython1};\n",
       "static PyObject *__pyx_pw_46_cython_magic_c23979ecdfeb0fdf6011e4df6437b314_1cdist_cython1(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {\n",
       "  __Pyx_memviewslice __pyx_v_xs = { 0, 0, { 0 }, { 0 }, { 0 } };\n",
       "  __Pyx_memviewslice __pyx_v_ys = { 0, 0, { 0 }, { 0 }, { 0 } };\n",
       "  PyObject *__pyx_r = 0;\n",
       "  __Pyx_RefNannyDeclarations\n",
       "  __Pyx_RefNannySetupContext(\"cdist_cython1 (wrapper)\", 0);\n",
       "  {\n",
       "    static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_xs,&__pyx_n_s_ys,0};\n",
       "    PyObject* values[2] = {0,0};\n",
       "    if (unlikely(__pyx_kwds)) {\n",
       "      Py_ssize_t kw_args;\n",
       "      const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args);\n",
       "      switch (pos_args) {\n",
       "        case  2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1);\n",
       "        CYTHON_FALLTHROUGH;\n",
       "        case  1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0);\n",
       "        CYTHON_FALLTHROUGH;\n",
       "        case  0: break;\n",
       "        default: goto __pyx_L5_argtuple_error;\n",
       "      }\n",
       "      kw_args = PyDict_Size(__pyx_kwds);\n",
       "      switch (pos_args) {\n",
       "        case  0:\n",
       "        if (likely((values[0] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_xs)) != 0)) kw_args--;\n",
       "        else goto __pyx_L5_argtuple_error;\n",
       "        CYTHON_FALLTHROUGH;\n",
       "        case  1:\n",
       "        if (likely((values[1] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_ys)) != 0)) kw_args--;\n",
       "        else {\n",
       "          __Pyx_RaiseArgtupleInvalid(\"cdist_cython1\", 1, 2, 2, 1); __PYX_ERR(0, 8, __pyx_L3_error)\n",
       "        }\n",
       "      }\n",
       "      if (unlikely(kw_args > 0)) {\n",
       "        if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, \"cdist_cython1\") < 0)) __PYX_ERR(0, 8, __pyx_L3_error)\n",
       "      }\n",
       "    } else if (PyTuple_GET_SIZE(__pyx_args) != 2) {\n",
       "      goto __pyx_L5_argtuple_error;\n",
       "    } else {\n",
       "      values[0] = PyTuple_GET_ITEM(__pyx_args, 0);\n",
       "      values[1] = PyTuple_GET_ITEM(__pyx_args, 1);\n",
       "    }\n",
       "    __pyx_v_xs = __Pyx_PyObject_to_MemoryviewSlice_dsds_double(values[0], PyBUF_WRITABLE); if (unlikely(!__pyx_v_xs.memview)) __PYX_ERR(0, 8, __pyx_L3_error)\n",
       "    __pyx_v_ys = __Pyx_PyObject_to_MemoryviewSlice_dsds_double(values[1], PyBUF_WRITABLE); if (unlikely(!__pyx_v_ys.memview)) __PYX_ERR(0, 8, __pyx_L3_error)\n",
       "  }\n",
       "  goto __pyx_L4_argument_unpacking_done;\n",
       "  __pyx_L5_argtuple_error:;\n",
       "  __Pyx_RaiseArgtupleInvalid(\"cdist_cython1\", 1, 2, 2, PyTuple_GET_SIZE(__pyx_args)); __PYX_ERR(0, 8, __pyx_L3_error)\n",
       "  __pyx_L3_error:;\n",
       "  __Pyx_AddTraceback(\"_cython_magic_c23979ecdfeb0fdf6011e4df6437b314.cdist_cython1\", __pyx_clineno, __pyx_lineno, __pyx_filename);\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return NULL;\n",
       "  __pyx_L4_argument_unpacking_done:;\n",
       "  __pyx_r = __pyx_pf_46_cython_magic_c23979ecdfeb0fdf6011e4df6437b314_cdist_cython1(__pyx_self, __pyx_v_xs, __pyx_v_ys);\n",
       "\n",
       "  /* function exit code */\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return __pyx_r;\n",
       "}\n",
       "\n",
       "static PyObject *__pyx_pf_46_cython_magic_c23979ecdfeb0fdf6011e4df6437b314_cdist_cython1(CYTHON_UNUSED PyObject *__pyx_self, __Pyx_memviewslice __pyx_v_xs, __Pyx_memviewslice __pyx_v_ys) {\n",
       "  int __pyx_v_m;\n",
       "  int __pyx_v_n;\n",
       "  int __pyx_v_p;\n",
       "  __Pyx_memviewslice __pyx_v_res = { 0, 0, { 0 }, { 0 }, { 0 } };\n",
       "  int __pyx_v_i;\n",
       "  int __pyx_v_j;\n",
       "  double __pyx_v_s;\n",
       "  int __pyx_v_k;\n",
       "  PyObject *__pyx_r = NULL;\n",
       "  __Pyx_RefNannyDeclarations\n",
       "  __Pyx_RefNannySetupContext(\"cdist_cython1\", 0);\n",
       "/* … */\n",
       "  /* function exit code */\n",
       "  __pyx_L1_error:;\n",
       "  __Pyx_XDECREF(__pyx_t_1);\n",
       "  __Pyx_XDECREF(__pyx_t_2);\n",
       "  __Pyx_XDECREF(__pyx_t_3);\n",
       "  __Pyx_XDECREF(__pyx_t_4);\n",
       "  __Pyx_XDECREF(__pyx_t_5);\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_t_6, 1);\n",
       "  __Pyx_AddTraceback(\"_cython_magic_c23979ecdfeb0fdf6011e4df6437b314.cdist_cython1\", __pyx_clineno, __pyx_lineno, __pyx_filename);\n",
       "  __pyx_r = NULL;\n",
       "  __pyx_L0:;\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_v_res, 1);\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_v_xs, 1);\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_v_ys, 1);\n",
       "  __Pyx_XGIVEREF(__pyx_r);\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return __pyx_r;\n",
       "}\n",
       "/* … */\n",
       "  __pyx_tuple__22 = PyTuple_Pack(10, __pyx_n_s_xs, __pyx_n_s_ys, __pyx_n_s_m, __pyx_n_s_n, __pyx_n_s_p, __pyx_n_s_res, __pyx_n_s_i, __pyx_n_s_j, __pyx_n_s_s, __pyx_n_s_k); if (unlikely(!__pyx_tuple__22)) __PYX_ERR(0, 8, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_tuple__22);\n",
       "  __Pyx_GIVEREF(__pyx_tuple__22);\n",
       "/* … */\n",
       "  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_c23979ecdfeb0fdf6011e4df6437b314_1cdist_cython1, NULL, __pyx_n_s_cython_magic_c23979ecdfeb0fdf60); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 8, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_cdist_cython1, __pyx_t_1) < 0) __PYX_ERR(0, 8, __pyx_L1_error)\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  __pyx_codeobj__23 = (PyObject*)__Pyx_PyCode_New(2, 0, 10, 0, CO_OPTIMIZED|CO_NEWLOCALS, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__22, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_home_jovyan_cache_ipython_cytho, __pyx_n_s_cdist_cython1, 8, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__23)) __PYX_ERR(0, 8, __pyx_L1_error)\n",
       "
 09:     """Returns pairwise distance between row vectors in xs and ys.
\n", "
 10:     
\n", "
 11:     xs has shape (m, p)
\n", "
 12:     ys has shape (n, p)
\n", "
 13:     
\n", "
 14:     Return value has shape (m, n)    
\n", "
 15:     """
\n", "
 16: 
\n", "
 17:     cdef int m, n, p
\n", "
 18: 
\n", "
+19:     m = xs.shape[0]
\n", "
  __pyx_v_m = (__pyx_v_xs.shape[0]);\n",
       "
+20:     n = ys.shape[0]
\n", "
  __pyx_v_n = (__pyx_v_ys.shape[0]);\n",
       "
+21:     p = xs.shape[1]
\n", "
  __pyx_v_p = (__pyx_v_xs.shape[1]);\n",
       "
 22: 
\n", "
+23:     cdef double[:, :] res = np.empty((m, n))
\n", "
  __pyx_t_2 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 23, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_2, __pyx_n_s_empty); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 23, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_3);\n",
       "  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "  __pyx_t_2 = __Pyx_PyInt_From_int(__pyx_v_m); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 23, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __pyx_t_4 = __Pyx_PyInt_From_int(__pyx_v_n); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 23, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_4);\n",
       "  __pyx_t_5 = PyTuple_New(2); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 23, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_5);\n",
       "  __Pyx_GIVEREF(__pyx_t_2);\n",
       "  PyTuple_SET_ITEM(__pyx_t_5, 0, __pyx_t_2);\n",
       "  __Pyx_GIVEREF(__pyx_t_4);\n",
       "  PyTuple_SET_ITEM(__pyx_t_5, 1, __pyx_t_4);\n",
       "  __pyx_t_2 = 0;\n",
       "  __pyx_t_4 = 0;\n",
       "  __pyx_t_4 = NULL;\n",
       "  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_3))) {\n",
       "    __pyx_t_4 = PyMethod_GET_SELF(__pyx_t_3);\n",
       "    if (likely(__pyx_t_4)) {\n",
       "      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);\n",
       "      __Pyx_INCREF(__pyx_t_4);\n",
       "      __Pyx_INCREF(function);\n",
       "      __Pyx_DECREF_SET(__pyx_t_3, function);\n",
       "    }\n",
       "  }\n",
       "  if (!__pyx_t_4) {\n",
       "    __pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_t_3, __pyx_t_5); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 23, __pyx_L1_error)\n",
       "    __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;\n",
       "    __Pyx_GOTREF(__pyx_t_1);\n",
       "  } else {\n",
       "    #if CYTHON_FAST_PYCALL\n",
       "    if (PyFunction_Check(__pyx_t_3)) {\n",
       "      PyObject *__pyx_temp[2] = {__pyx_t_4, __pyx_t_5};\n",
       "      __pyx_t_1 = __Pyx_PyFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 23, __pyx_L1_error)\n",
       "      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "      __Pyx_GOTREF(__pyx_t_1);\n",
       "      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;\n",
       "    } else\n",
       "    #endif\n",
       "    #if CYTHON_FAST_PYCCALL\n",
       "    if (__Pyx_PyFastCFunction_Check(__pyx_t_3)) {\n",
       "      PyObject *__pyx_temp[2] = {__pyx_t_4, __pyx_t_5};\n",
       "      __pyx_t_1 = __Pyx_PyCFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 23, __pyx_L1_error)\n",
       "      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "      __Pyx_GOTREF(__pyx_t_1);\n",
       "      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;\n",
       "    } else\n",
       "    #endif\n",
       "    {\n",
       "      __pyx_t_2 = PyTuple_New(1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 23, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_2);\n",
       "      __Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_4); __pyx_t_4 = NULL;\n",
       "      __Pyx_GIVEREF(__pyx_t_5);\n",
       "      PyTuple_SET_ITEM(__pyx_t_2, 0+1, __pyx_t_5);\n",
       "      __pyx_t_5 = 0;\n",
       "      __pyx_t_1 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_2, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 23, __pyx_L1_error)\n",
       "      __Pyx_GOTREF(__pyx_t_1);\n",
       "      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "    }\n",
       "  }\n",
       "  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;\n",
       "  __pyx_t_6 = __Pyx_PyObject_to_MemoryviewSlice_dsds_double(__pyx_t_1, PyBUF_WRITABLE); if (unlikely(!__pyx_t_6.memview)) __PYX_ERR(0, 23, __pyx_L1_error)\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  __pyx_v_res = __pyx_t_6;\n",
       "  __pyx_t_6.memview = NULL;\n",
       "  __pyx_t_6.data = NULL;\n",
       "
 24: 
\n", "
 25:     cdef int i, j
\n", "
 26: 
\n", "
 27:     cdef double s
\n", "
+28:     for i in range(m):
\n", "
  __pyx_t_7 = __pyx_v_m;\n",
       "  __pyx_t_8 = __pyx_t_7;\n",
       "  for (__pyx_t_9 = 0; __pyx_t_9 < __pyx_t_8; __pyx_t_9+=1) {\n",
       "    __pyx_v_i = __pyx_t_9;\n",
       "
+29:         for j in range(n):
\n", "
    __pyx_t_10 = __pyx_v_n;\n",
       "    __pyx_t_11 = __pyx_t_10;\n",
       "    for (__pyx_t_12 = 0; __pyx_t_12 < __pyx_t_11; __pyx_t_12+=1) {\n",
       "      __pyx_v_j = __pyx_t_12;\n",
       "
+30:             s = 0.0
\n", "
      __pyx_v_s = 0.0;\n",
       "
+31:             for k in range(p):
\n", "
      __pyx_t_13 = __pyx_v_p;\n",
       "      __pyx_t_14 = __pyx_t_13;\n",
       "      for (__pyx_t_15 = 0; __pyx_t_15 < __pyx_t_14; __pyx_t_15+=1) {\n",
       "        __pyx_v_k = __pyx_t_15;\n",
       "
+32:                 s += pow(ys[j,k] - xs[i,k], 2)
\n", "
        __pyx_t_16 = __pyx_v_j;\n",
       "        __pyx_t_17 = __pyx_v_k;\n",
       "        __pyx_t_18 = __pyx_v_i;\n",
       "        __pyx_t_19 = __pyx_v_k;\n",
       "        __pyx_v_s = (__pyx_v_s + pow(((*((double *) ( /* dim=1 */ (( /* dim=0 */ (__pyx_v_ys.data + __pyx_t_16 * __pyx_v_ys.strides[0]) ) + __pyx_t_17 * __pyx_v_ys.strides[1]) ))) - (*((double *) ( /* dim=1 */ (( /* dim=0 */ (__pyx_v_xs.data + __pyx_t_18 * __pyx_v_xs.strides[0]) ) + __pyx_t_19 * __pyx_v_xs.strides[1]) )))), 2.0));\n",
       "      }\n",
       "
+33:             res[i, j] = sqrt(s)
\n", "
      __pyx_t_20 = __pyx_v_i;\n",
       "      __pyx_t_21 = __pyx_v_j;\n",
       "      *((double *) ( /* dim=1 */ (( /* dim=0 */ (__pyx_v_res.data + __pyx_t_20 * __pyx_v_res.strides[0]) ) + __pyx_t_21 * __pyx_v_res.strides[1]) )) = sqrt(__pyx_v_s);\n",
       "    }\n",
       "  }\n",
       "
+34:     return res
\n", "
  __Pyx_XDECREF(__pyx_r);\n",
       "  __pyx_t_1 = __pyx_memoryview_fromslice(__pyx_v_res, 2, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 34, __pyx_L1_error)\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  __pyx_r = __pyx_t_1;\n",
       "  __pyx_t_1 = 0;\n",
       "  goto __pyx_L0;\n",
       "
" ], "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%cython -a\n", "\n", "import cython\n", "import numpy as np\n", "from libc.math cimport sqrt, pow\n", "\n", "@cython.boundscheck(False)\n", "@cython.wraparound(False)\n", "def cdist_cython1(double[:, :] xs, double[:, :] ys):\n", " \"\"\"Returns pairwise distance between row vectors in xs and ys.\n", " \n", " xs has shape (m, p)\n", " ys has shape (n, p)\n", " \n", " Return value has shape (m, n) \n", " \"\"\"\n", " \n", " cdef int m, n, p\n", " \n", " m = xs.shape[0]\n", " n = ys.shape[0]\n", " p = xs.shape[1]\n", " \n", " cdef double[:, :] res = np.empty((m, n))\n", " \n", " cdef int i, j\n", " \n", " cdef double s\n", " for i in range(m):\n", " for j in range(n):\n", " s = 0.0\n", " for k in range(p):\n", " s += pow(ys[j,k] - xs[i,k], 2) \n", " res[i, j] = sqrt(s)\n", " return res" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "assert(np.allclose(cdist(xs, ys), cdist_cython(xs, ys)))" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 112 ms, sys: 8 ms, total: 120 ms\n", "Wall time: 120 ms\n" ] } ], "source": [ "%%time\n", "\n", "Z = cdist_cython1(X, Y)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "t_cython1 = timeit(lambda : cdist_cython1(X, Y), number=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using `pybind11`" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting funcs.cpp\n" ] } ], "source": [ "%%file funcs.cpp\n", "<%\n", "cfg['compiler_args'] = ['-std=c++11']\n", "cfg['include_dirs'] = ['/usr/include/eigen3']\n", "setup_pybind11(cfg)\n", "%>\n", "\n", "#include \n", "#include \n", "\n", "#include \n", "#include \n", "\n", "namespace py = pybind11;\n", "\n", "using Eigen::MatrixXd;\n", "\n", "MatrixXd cdist(MatrixXd xs, MatrixXd ys) {\n", " int m = xs.rows();\n", " int n = ys.rows();\n", " int p = ys.cols();\n", " \n", " MatrixXd res(m, n);\n", " \n", " double s;\n", " for (int i=0; i\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
methodstimesspeed-up
0python9.3404641.0
1numba0.27799733.6
2numba10.11179383.6
3cython9.9112340.9
4cython10.11933278.3
5pybind110.15422460.6
\n", "" ], "text/plain": [ " methods times speed-up\n", "0 python 9.340464 1.0\n", "1 numba 0.277997 33.6\n", "2 numba1 0.111793 83.6\n", "3 cython 9.911234 0.9\n", "4 cython1 0.119332 78.3\n", "5 pybind11 0.154224 60.6" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "perf['speed-up'] = np.around(perf['times'][0]/perf['times'], 1)\n", "perf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using multiple cores\n", "\n", "The standard implementation of Python uses a Global Interpreter Lock (GIL). This means that only one thread can be run at any one time, and multiple threads work by time-slicing. Hence multi-threaded code with lots of latency can result in speed-ups, but multi-threaded code which is computationally intensive will not see any speed-up. For numerically intensive code, parallel code needs to be run in separate processes to see speed-ups." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First we see how to split the computation into pieces using a loop." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0., 1.],\n", " [2., 3.],\n", " [4., 5.]])" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xs" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0., 1.],\n", " [2., 3.]])" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ys" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0. , 2.82842712],\n", " [2.82842712, 0. ],\n", " [5.65685425, 2.82842712]])" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cdist(xs, ys)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0. , 2.82842712],\n", " [2.82842712, 0. ],\n", " [5.65685425, 2.82842712]])" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res = np.concatenate([cdist(x, ys) for x in np.split(xs, 3, 0)])\n", "res" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 9.98 s, sys: 8 ms, total: 9.99 s\n", "Wall time: 9.99 s\n" ] } ], "source": [ "%%time\n", "\n", "Z = cdist(X, Y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using `multiprocessing`" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "from multiprocessing import Pool" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 32 ms, sys: 44 ms, total: 76 ms\n", "Wall time: 2.94 s\n" ] } ], "source": [ "%%time\n", "\n", "with Pool(processes=4) as p:\n", " Z1 = p.starmap(cdist, [(X_, Y) for X_ in np.split(X, 100, 0)])\n", " Z1 = np.concatenate(Z1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "np.testing.assert_allclose(Z, Z1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using `concurrent.futures" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "def cdist_(args):\n", " return cdist(*args)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 136 ms, sys: 88 ms, total: 224 ms\n", "Wall time: 2.7 s\n" ] } ], "source": [ "%%time\n", "\n", "with ProcessPoolExecutor(max_workers=4) as pool:\n", " Z2 = list(pool.map(cdist_, [(X_, Y) for X_ in np.split(X, 100, 0)]))\n", " Z2 = np.concatenate(Z2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "np.testing.assert_allclose(Z, Z2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using `joblib`\n", "\n", "`joblib` provides parallel processing using a comprehension syntax." ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "from joblib import Parallel, delayed" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 144 ms, sys: 60 ms, total: 204 ms\n", "Wall time: 2.68 s\n" ] } ], "source": [ "%%time\n", "\n", "Z3 = Parallel(n_jobs=4)(delayed(cdist)(X_, Y) for X_ in np.split(X, 100, 0))\n", "Z3 = np.concatenate(Z3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "np.testing.assert_allclose(Z, Z3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using threads\n", "\n", "Note that there is no gain with using multiple threads for computationally intensive tasks because of the GIL." ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 10.3 s, sys: 184 ms, total: 10.4 s\n", "Wall time: 10.2 s\n" ] } ], "source": [ "%%time\n", "\n", "with ThreadPoolExecutor(max_workers=4) as pool:\n", " Z4 = list(pool.map(cdist_, [(X_, Y) for X_ in np.split(X, 100, 0)]))\n", " Z4 = np.concatenate(Z4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "np.testing.assert_allclose(Z, Z4)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.4" } }, "nbformat": 4, "nbformat_minor": 2 }