{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Code Optimization\n",
    "====\n",
    "\n",
    "Here we will look briefly at how to time and profile your code, and then at an approach to making your code run faster. There is a sequence of mini-gaols that is applicable to nearly every programming problem:\n",
    "\n",
    "1. Make it run\n",
    "2. Make it right\n",
    "3. Make it fast\n",
    "\n",
    "Note that the list does not start with `Make it fast`. Testing, debugging and optimization are a set of strategies and practices to achieve those goals. Only optimization will be covered in these notes - pointers to resources for testing and debugging are provided but not covered."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Testing code\n",
    "\n",
    "- For an introduction to standard testing in Python, see [Testing Your Code](http://docs.python-guide.org/en/latest/writing/tests/)\n",
    "- For automated generation of tests, see the [Hypothesis package](https://github.com/DRMacIver/hypothesis)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Writing distance.py\n"
     ]
    }
   ],
   "source": [
    "%%file distance.py\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "def euclidean_dist(u, v):\n",
    "    \"\"\"Returns Euclidean distance betwen numpy vectors u and v.\"\"\"\n",
    "    w = u - v\n",
    "    return np.sqrt(np.sum(w**2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting test_distance.py\n"
     ]
    }
   ],
   "source": [
    "%%file test_distance.py\n",
    "import numpy as np\n",
    "from numpy.testing import assert_almost_equal\n",
    "from distance import euclidean_dist\n",
    "\n",
    "def test_non_negativity():\n",
    "    for i in range(10):\n",
    "        u = np.random.normal(3)\n",
    "        v = np.random.normal(3)\n",
    "        assert euclidean_dist(u, v) >= 0\n",
    "\n",
    "def test_coincidence_when_zero():\n",
    "    u = np.zeros(3)\n",
    "    v = np.zeros(3)\n",
    "    assert euclidean_dist(u, v) == 0\n",
    "\n",
    "def test_coincidence_when_not_zero():\n",
    "     for i in range(10):\n",
    "        u = np.random.random(3)\n",
    "        v = np.zeros(3)\n",
    "        assert euclidean_dist(u, v) != 0\n",
    "\n",
    "def test_symmetry():\n",
    "    for i in range(10):\n",
    "        u = np.random.random(3)\n",
    "        v = np.random.random(3)\n",
    "        assert euclidean_dist(u, v) == euclidean_dist(v, u)\n",
    "\n",
    "def test_triangle():\n",
    "    u = np.random.random(3)\n",
    "    v = np.random.random(3)\n",
    "    w = np.random.random(3)\n",
    "    assert euclidean_dist(u, w) <= euclidean_dist(u, v) + euclidean_dist(v, w)\n",
    "\n",
    "def test_known1():\n",
    "    u = np.array([0])\n",
    "    v = np.array([3])\n",
    "    assert_almost_equal(euclidean_dist(u, v), 3)\n",
    "\n",
    "def test_known2():\n",
    "    u = np.array([0,0])\n",
    "    v = np.array([3, 4])\n",
    "    assert_almost_equal(euclidean_dist(u, v), 5)\n",
    "\n",
    "def test_known3():\n",
    "    u = np.array([0,0])\n",
    "    v = np.array([-3, -4])\n",
    "    assert_almost_equal(euclidean_dist(u, v), 5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1m============================= test session starts ==============================\u001b[0m\n",
      "platform darwin -- Python 2.7.11, pytest-2.8.5, py-1.4.31, pluggy-0.3.1\n",
      "rootdir: /Users/cliburn/git/sta-663-2016/lectures, inifile: \n",
      "collected 8 items \n",
      "\u001b[0m\n",
      "test_distance.py ........\n",
      "\n",
      "\u001b[32m\u001b[1m=========================== 8 passed in 0.68 seconds ===========================\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "! py.test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Debugging\n",
    "\n",
    "Tools within Jupyter from the [official tutorial](https://ipython.org/ipython-doc/1/interactive/tutorial.html#debugging) \n",
    "```\n",
    "After an exception occurs, you can call %debug to jump into the Python debugger (pdb) and examine the problem. Alternatively, if you call %pdb, IPython will automatically start the debugger on any uncaught exception. You can print variables, see code, execute statements and even walk up and down the call stack to track down the true source of the problem. This can be an efficient way to develop and debug code, in many cases eliminating the need for print statements or external debugging tools.\n",
    "\n",
    "You can also step through a program from the beginning by calling %run -d theprogram.py.\n",
    "```\n",
    "\n",
    "- See the [Scipy tutorial](http://www.scipy-lectures.org/advanced/debugging/)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Timing and profiling code\n",
    "----\n",
    "\n",
    "Install profiling tools:\n",
    "```bash\n",
    "pip install --pre line-profiler\n",
    "pip install psutil\n",
    "pip install memory_profiler\n",
    "```\n",
    "\n",
    "References:\n",
    "\n",
    "1. <http://scipy-lectures.github.com/advanced/optimizing/index.html>\n",
    "2. <http://pynash.org/2013/03/06/timing-and-profiling.html>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Timing code\n",
    "\n",
    "- 1s = 1000 ms \n",
    "- 1 ms = 1000 $\\mu$s \n",
    "- 1 $\\mu$s = 1000 ns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Simple approach"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.0024928400525823"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import time\n",
    "import timeit\n",
    "\n",
    "def f(nsec=1.0):\n",
    "    \"\"\"Function sleeps for nsec seconds.\"\"\"\n",
    "    time.sleep(nsec) \n",
    "    \n",
    "start = timeit.default_timer()\n",
    "f()\n",
    "elapsed = timeit.default_timer() - start\n",
    "elapsed"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### We can make a decorator for convenience"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def process_time(f, *args, **kwargs):\n",
    "    def func(*args, **kwargs):\n",
    "        import timeit\n",
    "        start = timeit.default_timer()\n",
    "        f(*args, **kwargs)\n",
    "        print(timeit.default_timer() - start)\n",
    "    return func"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "@process_time\n",
    "def f1(nsec=1.0):\n",
    "    \"\"\"Function sleeps for nsec seconds.\"\"\"\n",
    "    time.sleep(nsec)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.000329414033331\n"
     ]
    }
   ],
   "source": [
    "f1()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Within the Jupyter notebook, use the timeit magic function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100 loops, best of 3: 11.2 ms per loop\n"
     ]
    }
   ],
   "source": [
    "%timeit f(0.01)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10 loops, best of 3: 11.3 ms per loop\n"
     ]
    }
   ],
   "source": [
    "%timeit -n10 f(0.01)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100 loops, best of 10: 11.2 ms per loop\n"
     ]
    }
   ],
   "source": [
    "%timeit -r10 f(0.01)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10 loops, best of 3: 11.4 ms per loop\n"
     ]
    }
   ],
   "source": [
    "%timeit -n10 -r3 f(0.01)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Profiling code\n",
    "\n",
    "This can be done in a notebook with %prun, with the following readouts as column headers:\n",
    "\n",
    "- ncalls\n",
    "    - for the number of calls,\n",
    "- tottime\n",
    "    - for the total time spent in the given function (and excluding time made in calls to sub-functions),\n",
    "- percall\n",
    "    - is the quotient of tottime divided by ncalls\n",
    "- cumtime\n",
    "    - is the total time spent in this and all subfunctions (from invocation till exit). This figure is accurate even for recursive functions.\n",
    "- percall\n",
    "    - is the quotient of cumtime divided by primitive calls\n",
    "- filename:lineno(function)\n",
    "    - provides the respective data of each function "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def foo1(n):\n",
    "    return sum(i**2 for i in range(n))\n",
    "\n",
    "def foo2(n):\n",
    "    return sum(i*i for i in range(n))\n",
    "\n",
    "def foo3(n):\n",
    "    [foo1(n) for i in range(10)]\n",
    "    foo2(n)\n",
    "\n",
    "def bar(n):\n",
    "    return sum(i**3 for i in range(n))\n",
    "\n",
    "def work(n):\n",
    "    foo1(n)\n",
    "    foo2(n)\n",
    "    foo3(n)\n",
    "    bar(n)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " \n",
      "*** Profile stats marshalled to file 'work.prof'. \n"
     ]
    }
   ],
   "source": [
    "%prun -q -D work.prof work(int(1e6))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Mon Mar  7 11:18:30 2016    work.prof\n",
      "\n",
      "         14000048 function calls in 10.525 seconds\n",
      "\n",
      "   Random listing order was used\n",
      "\n",
      "   ncalls  tottime  percall  cumtime  percall filename:lineno(function)\n",
      "        2    0.000    0.000    0.699    0.350 <ipython-input-9-48ed291f45d5>:4(foo2)\n",
      "       14    2.233    0.159   10.525    0.752 {built-in method builtins.sum}\n",
      "        1    0.000    0.000    1.008    1.008 <ipython-input-9-48ed291f45d5>:11(bar)\n",
      "        1    0.000    0.000   10.525   10.525 <string>:1(<module>)\n",
      "        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}\n",
      "  1000001    0.803    0.000    0.803    0.000 <ipython-input-9-48ed291f45d5>:12(<genexpr>)\n",
      "       11    0.000    0.000    8.818    0.802 <ipython-input-9-48ed291f45d5>:1(foo1)\n",
      "        1    0.000    0.000   10.525   10.525 <ipython-input-9-48ed291f45d5>:14(work)\n",
      "        1    0.000    0.000    8.362    8.362 <ipython-input-9-48ed291f45d5>:7(foo3)\n",
      "        1    0.000    0.000    8.011    8.011 <ipython-input-9-48ed291f45d5>:8(<listcomp>)\n",
      "        1    0.000    0.000   10.525   10.525 {built-in method builtins.exec}\n",
      "  2000002    0.410    0.000    0.410    0.000 <ipython-input-9-48ed291f45d5>:5(<genexpr>)\n",
      " 11000011    7.079    0.000    7.079    0.000 <ipython-input-9-48ed291f45d5>:2(<genexpr>)\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "import pstats\n",
    "p = pstats.Stats('work.prof')\n",
    "p.print_stats()\n",
    "pass"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Mon Mar  7 11:18:30 2016    work.prof\n",
      "\n",
      "         14000048 function calls in 10.525 seconds\n",
      "\n",
      "   Ordered by: internal time, cumulative time\n",
      "   List reduced from 13 to 3 due to restriction <'foo'>\n",
      "\n",
      "   ncalls  tottime  percall  cumtime  percall filename:lineno(function)\n",
      "       11    0.000    0.000    8.818    0.802 <ipython-input-9-48ed291f45d5>:1(foo1)\n",
      "        2    0.000    0.000    0.699    0.350 <ipython-input-9-48ed291f45d5>:4(foo2)\n",
      "        1    0.000    0.000    8.362    8.362 <ipython-input-9-48ed291f45d5>:7(foo3)\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "p.sort_stats('time', 'cumulative').print_stats('foo')\n",
    "pass"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Mon Mar  7 11:18:30 2016    work.prof\n",
      "\n",
      "         14000048 function calls in 10.525 seconds\n",
      "\n",
      "   Ordered by: call count\n",
      "   List reduced from 13 to 5 due to restriction <5>\n",
      "\n",
      "   ncalls  tottime  percall  cumtime  percall filename:lineno(function)\n",
      " 11000011    7.079    0.000    7.079    0.000 <ipython-input-9-48ed291f45d5>:2(<genexpr>)\n",
      "  2000002    0.410    0.000    0.410    0.000 <ipython-input-9-48ed291f45d5>:5(<genexpr>)\n",
      "  1000001    0.803    0.000    0.803    0.000 <ipython-input-9-48ed291f45d5>:12(<genexpr>)\n",
      "       14    2.233    0.159   10.525    0.752 {built-in method builtins.sum}\n",
      "       11    0.000    0.000    8.818    0.802 <ipython-input-9-48ed291f45d5>:1(foo1)\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "p.sort_stats('ncalls').print_stats(5)\n",
    "pass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Checking memory usage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%load_ext memory_profiler"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting foo.py\n"
     ]
    }
   ],
   "source": [
    "%%file foo.py\n",
    "\n",
    "def foo(n):\n",
    "    phrase = 'repeat me'\n",
    "    pmul = phrase * n\n",
    "    pjoi = ''.join([phrase for x in range(n)])\n",
    "    pinc = ''\n",
    "    for x in range(n):\n",
    "        pinc += phrase\n",
    "    del pmul, pjoi, pinc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "# mprun requires the code be in a file \n",
    "# funcions declared interactively in python will not work\n",
    "\n",
    "from foo import foo\n",
    "\n",
    "%mprun -f foo foo(100000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "peak memory: 136.37 MiB, increment: 17.44 MiB\n"
     ]
    }
   ],
   "source": [
    "# However, memit does work with interactive functions\n",
    "# Unlike mprun which gives a line by line analysis\n",
    "# memit gives the total amount of memory used\n",
    "\n",
    "def gobble(n):\n",
    "    x = [i*i for i in range(n)]\n",
    "    \n",
    "%memit -r 3 gobble(1000000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Data structures and algorithms\n",
    "----\n",
    "\n",
    "There are many ways to speed up slow code. However, the first thing that should come to mind (after profiling to identify the bottlenecks) is whether there is a more appropriate data structure or algorithm that can be used. The reason is that this is the only approach that makes a difference to the big O complexity, and this makes all the difference for scalability. A few examples are shown here; a large collection of classic data structures and algorithms in Python with detailed explanations is available at [Problem Solving wiht Algorihms and Data Structures](http://interactivepython.org/runestone/static/pythonds/index.html#)\n",
    "\n",
    "You are highly encouraged to take an algoorithms class, where you will discover strategies such as:\n",
    "\n",
    "- adaptive methods (e.g. adaptive quadrature, adaprive Runge-Kutta)\n",
    "- divide and conquer (e.g. Barnes-Hut, Fast Fourier Transform)\n",
    "- tabling and dynamic programming (e.g. Viterbi algorithm for Hidden Markov Models)\n",
    "- graphs and network algorihtms (e.g. shortest path, max flow min cut)\n",
    "- hashing (e.g. locality senstive hashing, Bloom filters)\n",
    "- probabilistic algorithms (e.g. randomized projections, Monte Carlo integration)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Example 1 - finding common elements in two data collections"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "xs = np.random.randint(0, 1000, 100)\n",
    "ys = np.random.randint(0, 1000, 100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Using lists**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def common1(xs, ys):\n",
    "    \"\"\"Using lists.\"\"\"\n",
    "    zs = set([])\n",
    "    for x in xs:\n",
    "        for y in ys:\n",
    "            if x==y:\n",
    "                zs.add(x)\n",
    "    return zs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3 loops, best of 3: 1.58 ms per loop\n"
     ]
    }
   ],
   "source": [
    "%timeit -n3 -r3 common1(xs, ys)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Using sets**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3 loops, best of 3: 44.7 µs per loop\n"
     ]
    }
   ],
   "source": [
    "%timeit -n3 -r3 set(xs) & set(ys)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Example 2 - Find minimum item in a list each time a new item is inserted"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Using lists**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "alist = list(np.random.randint(1000, 100000, 1000))\n",
    "blist = alist[:]\n",
    "entries = np.random.randint(1, 10000, 10000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def f1(alist, entries):\n",
    "    \"\"\"Using repeated sorts.\"\"\"\n",
    "    zs = []\n",
    "    for entry in entries:\n",
    "        alist.append(entry)\n",
    "        alist.sort(reverse=True)\n",
    "        zs.append(alist.pop())\n",
    "    return zs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 loops, best of 3: 398 ms per loop\n"
     ]
    }
   ],
   "source": [
    "%timeit f1(alist, entries)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Using a heap (priority queue)**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from heapq import heappushpop, heapify"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def f2(alist, entries):\n",
    "    \"\"\"Using a priority queue.\"\"\"\n",
    "    heapify(alist)\n",
    "    zs = []\n",
    "    for entry in entries:\n",
    "        zs.append(heappushpop(alist, entry))\n",
    "    return zs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100 loops, best of 3: 3.61 ms per loop\n"
     ]
    }
   ],
   "source": [
    "%timeit f2(blist, entries) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Python idioms for speed"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### String concatenation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3 loops, best of 3: 352 ms per loop\n",
      "3 loops, best of 3: 19.3 ms per loop\n"
     ]
    }
   ],
   "source": [
    "def concat1(alist):\n",
    "    \"\"\"Using string concatenation.\"\"\"\n",
    "    s = alist[0]\n",
    "    for item in alist[1:]:\n",
    "        s += \" \" + item\n",
    "    return s\n",
    "    \n",
    "def concat2(alist):\n",
    "    \"\"\"Using join.\"\"\"\n",
    "    return \" \".join(alist)\n",
    "\n",
    "alist = ['abcde'] * 1000000\n",
    "%timeit -r3 -n3 concat1(alist)\n",
    "%timeit -r3 -n3 concat2(alist)    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Avoiding loops"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 loops, best of 1: 476 ms per loop\n",
      "1 loops, best of 1: 373 ms per loop\n",
      "1 loops, best of 1: 286 ms per loop\n",
      "1 loops, best of 1: 268 ms per loop\n",
      "1 loops, best of 1: 111 ms per loop\n"
     ]
    }
   ],
   "source": [
    "\"\"\"Avoiding loops.\"\"\"\n",
    "\n",
    "import math\n",
    "\n",
    "def loop1(n):\n",
    "    \"\"\"Using for loop with function call.\"\"\"\n",
    "    z = []\n",
    "    for i in range(n):\n",
    "        z.append(math.sin(i))\n",
    "    return z\n",
    "\n",
    "def loop2(n):\n",
    "    \"\"\"Using local version of function.\"\"\"\n",
    "    z = []\n",
    "    sin = math.sin\n",
    "    for i in range(n):\n",
    "        z.append(sin(i))\n",
    "    return z\n",
    "\n",
    "def loop3(n):\n",
    "    \"\"\"Using list comprehension.\"\"\"\n",
    "    sin = math.sin\n",
    "    return [sin(i) for i in range(n)]\n",
    "\n",
    "def loop4(n):\n",
    "    \"\"\"Using map.\"\"\"\n",
    "    sin = math.sin\n",
    "    return list(map(sin, range(n)))\n",
    "\n",
    "def loop5(n):\n",
    "    \"\"\"Using numpy.\"\"\"\n",
    "    return np.sin(np.arange(n)).tolist()\n",
    "\n",
    "n = 1000000\n",
    "%timeit -r1 -n1 loop1(n)\n",
    "%timeit -r1 -n1 loop2(n)\n",
    "%timeit -r1 -n1 loop3(n)\n",
    "%timeit -r1 -n1 loop4(n)\n",
    "%timeit -r1 -n1 loop5(n)\n",
    "\n",
    "assert(np.all(loop1(n) == loop2(n)))\n",
    "assert(np.all(loop1(n) == loop3(n)))\n",
    "assert(np.all(loop1(n) == loop4(n)))\n",
    "assert(np.all(loop1(n) == loop5(n)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using in-place operations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100 loops, best of 3: 5.6 ms per loop\n",
      "100 loops, best of 3: 2.96 ms per loop\n"
     ]
    }
   ],
   "source": [
    "a = np.arange(1e6)\n",
    "    \n",
    "%timeit global a; a = a * 0\n",
    "%timeit global a; a *= 0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using appropriate indexing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3 loops, best of 3: 431 ms per loop\n",
      "3 loops, best of 3: 10.3 ms per loop\n"
     ]
    }
   ],
   "source": [
    "def idx1(xs):\n",
    "    \"\"\"Using loops.\"\"\"\n",
    "    s = 0\n",
    "    for x in xs:\n",
    "        if (x > 10) and (x < 20):\n",
    "            s += x\n",
    "    return s\n",
    "\n",
    "def idx2(xs):\n",
    "    \"\"\"Using logical indexing.\"\"\"\n",
    "    return np.sum(xs[(xs > 10) & (xs < 20)])\n",
    "\n",
    "n = 1000000\n",
    "xs = np.random.randint(0, 100, n)\n",
    "%timeit -r3 -n3 idx1(xs)\n",
    "%timeit -r3 -n3 idx2(xs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using views to implement stencils"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3 loops, best of 3: 4.68 ms per loop\n",
      "3 loops, best of 3: 107 µs per loop\n"
     ]
    }
   ],
   "source": [
    "def average1(xs):\n",
    "    \"\"\"Using loops.\"\"\"\n",
    "    ys = xs.copy()\n",
    "    rows, cols = xs.shape\n",
    "    for i in range(rows):\n",
    "        for j in range(cols):\n",
    "            s = 0\n",
    "            for u in range(i-1, i+2):\n",
    "                if u < 0 or u >= rows:\n",
    "                    continue\n",
    "                for v in range(j-1, j+2):\n",
    "                    if v < 0 or v >= cols:\n",
    "                        continue\n",
    "                    s += xs[u, v]\n",
    "            ys[i, j] = s/9.0\n",
    "    return ys\n",
    "\n",
    "def average2(xs):\n",
    "    \"\"\"Using shifted array views and border to avoid out of bounds checks.\"\"\"\n",
    "    rows, cols = xs.shape\n",
    "    xs1 = np.zeros((rows+2, cols+2))\n",
    "    xs1[1:-1, 1:-1] = xs[:]\n",
    "    ys = (xs1[:-2, :-2]  + xs1[1:-1, :-2]  + xs1[2:, :-2] +\n",
    "          xs1[:-2, 1:-1] + xs1[1:-1, 1:-1] + xs1[2:, 1:-1] +\n",
    "          xs1[:-2, 2:]   + xs1[1:-1, 2:]   + xs1[2:, 2:])/9.0\n",
    "    return ys\n",
    "\n",
    "n = 25\n",
    "xs = np.random.uniform(0,10,(n, n))\n",
    "%timeit -r3 -n3 average1(xs)\n",
    "%timeit -r3 -n3 average2(xs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using generalized universal functions (gufuncs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0.0239949 ,  0.9448899 ,  0.38954008, ...,  0.90376854,\n",
       "         0.00426179,  0.85283056],\n",
       "       [ 0.84696008,  0.41795534,  0.33089656, ...,  0.77224093,\n",
       "         0.13336724,  0.37291434],\n",
       "       [ 0.96967546,  0.24677243,  0.91441873, ...,  0.6430914 ,\n",
       "         0.1975462 ,  0.91088953],\n",
       "       ..., \n",
       "       [ 0.70033398,  0.23787021,  0.36570841, ...,  0.07397977,\n",
       "         0.64451552,  0.13583896],\n",
       "       [ 0.34292311,  0.95505168,  0.8044513 , ...,  0.98800589,\n",
       "         0.43128007,  0.67242465],\n",
       "       [ 0.36700419,  0.84937765,  0.44672394, ...,  0.82128124,\n",
       "         0.1343562 ,  0.11249669]])"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "xs = np.random.random((1000, 10))\n",
    "xs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0.02427837,  0.59096979,  0.19144668, ...,  0.30528168,\n",
       "         0.73472361,  0.52060017],\n",
       "       [ 0.89185624,  0.41738057,  0.35344497, ...,  0.15926106,\n",
       "         0.56084192,  0.85950004],\n",
       "       [ 0.56798758,  0.42511275,  0.83825657, ...,  0.04916259,\n",
       "         0.94247933,  0.46567012],\n",
       "       ..., \n",
       "       [ 0.19913379,  0.62601032,  0.47914341, ...,  0.19906258,\n",
       "         0.49500519,  0.6781382 ],\n",
       "       [ 0.36574487,  0.25007863,  0.92439174, ...,  0.03072802,\n",
       "         0.35768397,  0.06059906],\n",
       "       [ 0.51692658,  0.37195484,  0.59856346, ...,  0.25166055,\n",
       "         0.48383847,  0.93378644]])"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ys = np.random.random((1000, 10))\n",
    "ys"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3 loops, best of 3: 2.05 ms per loop\n",
      "3 loops, best of 3: 21.7 µs per loop\n"
     ]
    }
   ],
   "source": [
    "from numpy.core.umath_tests import inner1d\n",
    "\n",
    "%timeit -n3 -r3 np.array([x @ y for x, y in zip(xs, ys)])\n",
    "%timeit -n3 -r3 inner1d(xs, ys)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from numpy.core.umath_tests import matrix_multiply"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "xs = np.random.randint(0, 10, (500, 2, 2))\n",
    "ys = np.random.randint(0, 10, (500, 2, 2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3 loops, best of 3: 3.89 ms per loop\n",
      "3 loops, best of 3: 18.7 µs per loop\n"
     ]
    }
   ],
   "source": [
    "%timeit -n3 -r3 np.array([x @ y for x, y in zip(xs, ys)])\n",
    "%timeit -r3 -n3 matrix_multiply(xs, ys)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Memoization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from functools import lru_cache"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10 loops, best of 1: 430 ms per loop\n",
      "10 loops, best of 1: 42.4 ms per loop\n",
      "10 loops, best of 1: 43.6 ms per loop\n"
     ]
    }
   ],
   "source": [
    "def fib(n):\n",
    "    if n <= 2:\n",
    "        return 1\n",
    "    else:\n",
    "        return fib(n-1) + fib(n-2)\n",
    "\n",
    "# A simple example of memoization - in practice, use `lru_cache` from functools\n",
    "def memoize(f):\n",
    "    store = {}\n",
    "    def func(n):\n",
    "        if n not in store:\n",
    "            store[n] = f(n)\n",
    "        return store[n]\n",
    "    return func\n",
    "\n",
    "@memoize\n",
    "def mfib(n):\n",
    "    return fib(n)\n",
    "\n",
    "@lru_cache()\n",
    "def lfib(n):\n",
    "    return fib(n)\n",
    "\n",
    "assert(fib(10) == mfib(10))\n",
    "assert(fib(10) == lfib(10))\n",
    "\n",
    "%timeit -r1 -n10 fib(30)\n",
    "%timeit -r1 -n10 mfib(30)\n",
    "%timeit -r1 -n10 lfib(30)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}