{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Python review of concepts\n", "\n", "This notebook mainly points out useful aspects of Python that you may have glossed over. It assumes you already know Python fairly well." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Python as a language" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Why Python? \n", "\n", "- Huge community, especially in data science and ML \n", "- Easy to learn \n", "- Batteries included \n", "- Extensive 3rd party libraries \n", "- Widely used in both industry and academia \n", "- Most important “glue” language bridging multiple communities" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello world!\n" ] } ], "source": [ "import __hello__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Versions \n", "\n", "- Only use Python 3 (the current release is 3.8; the container runs 3.7) \n", "- Do not use Python 2" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import sys" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'3.8.5 (default, Jul 21 2020, 10:48:26) \\n[Clang 11.0.3 (clang-1103.0.32.62)]'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sys.version" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Multi-paradigm " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Procedural" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0, 1, 4, 9, 16]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = []\n", "for i in range(5):\n", "    x.append(i*i)\n", "x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Functional" ] }, { "cell_type": "code", "execution_count": 
5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0, 1, 4, 9, 16]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(map(lambda x: x*x, range(5)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Object-oriented " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "class Robot:\n", "    def __init__(self, name, function):\n", "        self.name = name\n", "        self.function = function\n", "\n", "    def greet(self):\n", "        return f\"I am {self.name}, a {self.function} robot!\"" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "fido = Robot('roomba', 'vacuum cleaner')" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'roomba'" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fido.name" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'vacuum cleaner'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fido.function" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'I am roomba, a vacuum cleaner robot!'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fido.greet()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dynamic typing " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Complexity of a + b " ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.3" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "1 + 2.3" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "type(1), type(2.3)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'hello world'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'hello' + ' world'" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 4, 5, 6]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[1,2,3] + [4,5,6]" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([10, 11, 12])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "\n", "np.arange(3) + 10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Several Python implementations! \n", "\n", "- CPython \n", "- PyPy \n", "- IronPython \n", "- Jython" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Global interpreter lock (GIL) \n", "\n", "- Applies to CPython (PyPy also has a GIL; Jython and IronPython do not)\n", "- Threads vs processes: in CPython, only one thread per process can execute Python bytecode at a time \n", "- Avoid threads for CPU-bound Python code \n", "- Threaded performance is not predictable" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "def f(n):\n", "    \"\"\"Estimate pi as 4 times the fraction of n random points in the unit square that fall inside the quarter circle.\"\"\"\n", "    x = np.random.uniform(0,1,n)\n", "    y = np.random.uniform(0,1,n)\n", "    count = 0\n", "    # Deliberately a pure-Python loop (not vectorized), so the GIL is held while it runs\n", "    for i in range(n):\n", "        if x[i]**2 + y[i]**2 < 1:\n", "            count += 1\n", "    return count*4/n" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "n = 100000\n", "niter = 4" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 525 ms, sys: 3.21 ms, total: 528 ms\n", "Wall 
time: 528 ms\n" ] }, { "data": { "text/plain": [ "[3.1392, 3.153, 3.14876, 3.14132]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%time\n", "\n", "[f(n) for i in range(niter)]" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 549 ms, sys: 6.37 ms, total: 556 ms\n", "Wall time: 546 ms\n" ] }, { "data": { "text/plain": [ "[3.14536, 3.1468, 3.13868, 3.14756]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%time\n", "\n", "with ThreadPoolExecutor(4) as pool:\n", "    xs = list(pool.map(f, [n]*niter))\n", "xs" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "ename": "BrokenProcessPool", "evalue": "A process in the process pool was terminated abruptly while the future was running or pending.", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mBrokenProcessPool\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n", "\u001b[0;32m/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/process.py\u001b[0m in \u001b[0;36m_chain_from_iterable_of_lists\u001b[0;34m(iterable)\u001b[0m\n\u001b[1;32m 482\u001b[0m \u001b[0mcareful\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mto\u001b[0m \u001b[0mkeep\u001b[0m \u001b[0mreferences\u001b[0m \u001b[0mto\u001b[0m \u001b[0myielded\u001b[0m \u001b[0mobjects\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 483\u001b[0m \"\"\"\n\u001b[0;32m--> 484\u001b[0;31m \u001b[0;32mfor\u001b[0m \u001b[0melement\u001b[0m \u001b[0;32min\u001b[0m \u001b[0miterable\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 
485\u001b[0m \u001b[0melement\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreverse\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 486\u001b[0m \u001b[0;32mwhile\u001b[0m \u001b[0melement\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/_base.py\u001b[0m in \u001b[0;36mresult_iterator\u001b[0;34m()\u001b[0m\n\u001b[1;32m 609\u001b[0m \u001b[0;31m# Careful not to keep a reference to the popped future\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 610\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mtimeout\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 611\u001b[0;31m \u001b[0;32myield\u001b[0m \u001b[0mfs\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpop\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresult\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 612\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 613\u001b[0m \u001b[0;32myield\u001b[0m \u001b[0mfs\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpop\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresult\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mend_time\u001b[0m \u001b[0;34m-\u001b[0m \u001b[0mtime\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmonotonic\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/_base.py\u001b[0m in 
\u001b[0;36mresult\u001b[0;34m(self, timeout)\u001b[0m\n\u001b[1;32m 437\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mCancelledError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 438\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_state\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mFINISHED\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 439\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__get_result\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 440\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 441\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mTimeoutError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/_base.py\u001b[0m in \u001b[0;36m__get_result\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 386\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__get_result\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 387\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_exception\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 388\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_exception\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 389\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 390\u001b[0m 
\u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_result\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mBrokenProcessPool\u001b[0m: A process in the process pool was terminated abruptly while the future was running or pending." ] } ], "source": [ "%%time\n", "\n", "with ProcessPoolExecutor(4) as pool:\n", "    xs = list(pool.map(f, [n]*niter))\n", "xs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The process pool most likely breaks here because this notebook was run on macOS, where Python 3.8 starts worker processes with `spawn` by default. Each worker starts a fresh interpreter that cannot find `f`, because `f` was defined interactively in the notebook rather than in an importable module, so the worker dies. Defining `f` in a separate `.py` file and importing it avoids the problem. On Linux, where the default start method is `fork`, the cell as written should work." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Coding in Python" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The Zen of Python, by Tim Peters\n", "\n", "Beautiful is better than ugly.\n", "Explicit is better than implicit.\n", "Simple is better than complex.\n", "Complex is better than complicated.\n", "Flat is better than nested.\n", "Sparse is better than dense.\n", "Readability counts.\n", "Special cases aren't special enough to break the rules.\n", "Although practicality beats purity.\n", "Errors should never pass silently.\n", "Unless explicitly silenced.\n", "In the face of ambiguity, refuse the temptation to guess.\n", "There should be one-- and preferably only one --obvious way to do it.\n", "Although that way may not be obvious at first unless you're Dutch.\n", "Now is better than never.\n", "Although never is often better than *right* now.\n", "If the implementation is hard to explain, it's a bad idea.\n", "If the implementation is easy to explain, it may be a good idea.\n", "Namespaces are one honking great idea -- let's do more of those!\n" ] } ], "source": [ "import this" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Coding conventions \n", "\n", "- PEP 8 \n", "- Avoid magic numbers \n", "- Avoid copy and paste \n", "- Extract common functionality into functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/)" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "### Data types \n", "\n", "- Integers \n", "    - Arbitrary precision \n", "    - Integer division operator \n", "    - Base conversion \n", "    - Check if integer " ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "import math" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "n = math.factorial(100)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'93,326,215,443,944,152,681,699,238,856,266,700,490,715,968,264,381,621,468,592,963,895,217,599,993,229,915,608,941,463,976,156,518,286,253,697,920,827,223,758,251,185,210,916,864,000,000,000,000,000,000,000,000'" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f'{n:,}'" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "h = math.sqrt(3**2 + 4**2)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5.0" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h.is_integer()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Floats \n", "    - Checking for equality \n", "    - Catastrophic cancellation \n", "- Complex" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, 
"outputs": [], "source": [ "x = np.arange(9).reshape(3,3)\n", "x = x / x.sum(axis=0)\n", "λ = np.linalg.eigvals(x)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9999999999999993" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "λ[0]" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "λ[0] == 1" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "math.isclose(λ[0], 1)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "def var(xs):\n", "    \"\"\"Returns variance of sample data.\"\"\"\n", "\n", "    n = 0\n", "    s = 0\n", "    ss = 0\n", "\n", "    for x in xs:\n", "        n += 1\n", "        s += x\n", "        ss += x*x\n", "\n", "    # Naive one-pass formula: when the data have a large mean, ss and s*s/n are huge,\n", "    # nearly equal numbers, so their difference suffers catastrophic cancellation\n", "    v = (ss - (s*s)/n)/(n-1)\n", "    return v" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "xs = np.random.normal(1e9, 1, int(1e6))" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "13287.56835956836" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var(xs)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.0003007438816822" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.var(xs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Boolean \n", "    - What evaluates as False? 
" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[] evaluates as False\n", "[1] evaluates as True\n", "{} evaluates as False\n", " evaluates as False\n", "hello evaluates as True\n", "0 evaluates as False\n", "1 evaluates as True\n", "True evaluates as True\n", "False evaluates as False\n" ] } ], "source": [ "stuff = [[], [1], {}, '', 'hello', 0, 1, 1==1, 1==2]\n", "for s in stuff:\n", "    if s:\n", "        print(f'{s} evaluates as True')\n", "    else:\n", "        print(f'{s} evaluates as False')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- String \n", "    - Unicode by default \n", "    - b, r, f strings" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'猫'" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "u'\\u732b'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "String formatting\n", "\n", "- Learn to use the f-string." 
] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "import string" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'The letter e has position 5 in the alphabet'" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "char = 'e'\n", "pos = string.ascii_lowercase.index(char) + 1\n", "f\"The letter {char} has position {pos} in the alphabet\"" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'1,000,000,000'" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n = int(1e9)\n", "f\"{n:,}\"" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "x = math.pi" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'    3.14'" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f\"{x:8.2f}\"" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "datetime.datetime(2020, 11, 11, 19, 23, 45, 578067)" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import datetime\n", "now = datetime.datetime.now()\n", "now" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'2020-11-11 19:23'" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f\"{now:%Y-%m-%d %H:%M}\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data structures \n", "\n", "- Immutable: string, tuple \n", "- Mutable: list, set, dictionary \n", "- `collections` module \n", "- `heapq` module " ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['ChainMap',\n", " 'Counter',\n", 
'OrderedDict',\n", " 'UserDict',\n", " 'UserList',\n", " 'UserString',\n", " 'abc',\n", " 'defaultdict',\n", " 'deque',\n", " 'namedtuple']" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import collections\n", "\n", "[x for x in dir(collections) if not x.startswith('_')]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Functions \n", "\n", "- \\*args, \\*\\*kwargs \n", "- Be careful with mutable default values \n", "- First-class objects \n", "- Anonymous functions \n", "- Decorators" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "def f(*args, **kwargs):\n", "    print(f\"args = {args}\")  # since Python 3.8, you can just write f'{args = }'\n", "    print(f\"kwargs = {kwargs}\")" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "args = (1, 2, 3)\n", "kwargs = {'a': 4, 'b': 5, 'c': 6}\n" ] } ], "source": [ "f(1,2,3,a=4,b=5,c=6)" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "def g(a, xs=[]):\n", "    # Pitfall: the default list is created once, when g is defined, and shared by every call\n", "    xs.append(a)\n", "    return xs" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1]" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g(1)" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2]" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g(2)" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "h = lambda x, y, z: x**2 + y**2 + z**2" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "14" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h(1,2,3)" ] }, { "cell_type": "code", 
"execution_count": 55, "metadata": {}, "outputs": [], "source": [ "from functools import lru_cache" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "def fib(n):\n", "    print(n, end=', ')\n", "    if n <= 1:\n", "        return n\n", "    else:\n", "        return fib(n-2) + fib(n-1)" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10, 8, 6, 4, 2, 0, 1, 3, 1, 2, 0, 1, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 7, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 6, 4, 2, 0, 1, 3, 1, 2, 0, 1, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 9, 7, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 6, 4, 2, 0, 1, 3, 1, 2, 0, 1, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 8, 6, 4, 2, 0, 1, 3, 1, 2, 0, 1, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 7, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, 6, 4, 2, 0, 1, 3, 1, 2, 0, 1, 5, 3, 1, 2, 0, 1, 4, 2, 0, 1, 3, 1, 2, 0, 1, " ] }, { "data": { "text/plain": [ "55" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fib(10)" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [], "source": [ "@lru_cache(maxsize=100)\n", "def fib_cache(n):\n", "    print(n, end=', ')\n", "    if n <= 1:\n", "        return n\n", "    else:\n", "        return fib_cache(n-2) + fib_cache(n-1)" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10, 8, 6, 4, 2, 0, 1, 3, 5, 7, 9, " ] }, { "data": { "text/plain": [ "55" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fib_cache(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Classes \n", "\n", "- Key idea is encapsulation into objects \n", "- Everything in Python is an object \n", "- Attributes and methods \n", "- What is `self`? 
\n", "- Special methods: double-underscore methods such as `__init__` and `__repr__` \n", "- Avoid complex inheritance schemes; prefer composition \n", "- Learn “design patterns” if interested in OOP" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(3.0).is_integer()" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Hello World'" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'hello world'.title()" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [], "source": [ "class Student:\n", "    def __init__(self, first, last):\n", "        self.first = first\n", "        self.last = last\n", "\n", "    @property\n", "    def name(self):\n", "        return f'{self.first} {self.last}'" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [], "source": [ "s = Student('Santa', 'Claus')" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Santa Claus'" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Enums\n", "\n", "Use enums for readability when you have a discrete set of constants." 
] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [], "source": [ "from enum import Enum" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "class Day(Enum):\n", "    MON = 1\n", "    TUE = 2\n", "    WED = 3\n", "    THU = 4\n", "    FRI = 5\n", "    SAT = 6\n", "    SUN = 7" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Day.MON\n", "Day.TUE\n", "Day.WED\n", "Day.THU\n", "Day.FRI\n", "Day.SAT\n", "Day.SUN\n" ] } ], "source": [ "for day in Day:\n", "    print(day)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### NamedTuple" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [], "source": [ "from collections import namedtuple" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [], "source": [ "Student = namedtuple('Student', ['name', 'email', 'age', 'gpa', 'species'])" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [], "source": [ "abe = Student('Abraham Lincoln', 'abe.lincoln@gmail.com', 23, 3.4, 'Human')" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Human'" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "abe.species" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('abe.lincoln@gmail.com', 23, 3.4)" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "abe[1:4]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Classes\n", "\n", "Data classes simplify the creation and use of classes for data records. \n", "\n", "Note: named tuples serve a similar purpose but are immutable." 
] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [], "source": [ "from dataclasses import dataclass" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [], "source": [ "@dataclass\n", "class Student:\n", "    name: str\n", "    email: str\n", "    age: int\n", "    gpa: float\n", "    species: str = 'Human'" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [], "source": [ "abe = Student('Abraham Lincoln', 'abe.lincoln@gmail.com', age=23, gpa=3.4)" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Student(name='Abraham Lincoln', email='abe.lincoln@gmail.com', age=23, gpa=3.4, species='Human')" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "abe" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'abe.lincoln@gmail.com'" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "abe.email" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Human'" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "abe.species" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**\n", "\n", "The type annotations are informative only. Python does *not* enforce them." 
] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Student(name='a', email='b', age='c', gpa='d', species='e')" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Student(*'abcde')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Imports, modules and namespaces \n", "\n", "- A namespace is basically just a dictionary \n", "- LEGB \n", "- Avoid polluting the global namespace" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['abs', 'all', 'any', 'ascii', 'bin', 'bool', 'breakpoint', 'bytearray']" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[x for x in dir(__builtin__) if x[0].islower()][:8]" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [], "source": [ "x1 = 23\n", "\n", "def f1(x2):\n", " print(locals())\n", " # x1 is global (G), x2 is enclosing (E), x3 is local\n", " def g(x3):\n", " print(locals())\n", " return x3 + x2 + x1 \n", " return g" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [], "source": [ "x = 23\n", "\n", "def f2(x):\n", " print(locals())\n", " def g(x):\n", " print(locals())\n", " return x \n", " return g" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'x2': 3}\n", "{'x3': 2, 'x2': 3}\n" ] }, { "data": { "text/plain": [ "28" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g1 = f1(3)\n", "g1(2)" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'x': 3}\n", "{'x': 2}\n" ] }, { "data": { "text/plain": [ "2" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g2 = f2(3)\n", "g2(2)" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "### Loops \n", "\n", "- Prefer vectorization unless using numba \n", "- Difference between continue and break \n", "- Avoid infinite loops \n", "- Comprehensions and generator expressions" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [], "source": [ "import string" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'a': 97,\n", " 'b': 98,\n", " 'c': 99,\n", " 'd': 100,\n", " 'e': 101,\n", " 'f': 102,\n", " 'g': 103,\n", " 'h': 104,\n", " 'i': 105,\n", " 'j': 106,\n", " 'k': 107,\n", " 'l': 108,\n", " 'm': 109,\n", " 'n': 110,\n", " 'o': 111,\n", " 'p': 112,\n", " 'q': 113,\n", " 'r': 114,\n", " 's': 115,\n", " 't': 116,\n", " 'u': 117,\n", " 'v': 118,\n", " 'w': 119,\n", " 'x': 120,\n", " 'y': 121,\n", " 'z': 122}" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "{char: ord(char) for char in string.ascii_lowercase}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Iterations and generators \n", "\n", "- The iterator protocol\n", " - `__iter__` and `__next__`\n", " - iter()\n", " - next()\n", "- What happens in a for loop\n", "- Generators with `yield` and `yield from`" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [], "source": [ "class Iterator:\n", " \"\"\"A silly class that implements the Iterator protocol and Strategy pattern.\n", " \n", " start = start of range to square\n", " stop = end of range to square\n", " \"\"\"\n", " def __init__(self, start, stop, func):\n", " self.start = start\n", " self.stop = stop\n", " self.func = func\n", " \n", " def __iter__(self):\n", " self.n = self.start\n", " return self\n", " \n", " def __next__(self):\n", " if self.n >= self.stop:\n", " raise StopIteration\n", " else:\n", " x = self.func(self.n)\n", " self.n += 1\n", " return x" ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, 
"outputs": [], "source": [ "sq = Iterator(0, 5, lambda x: x*x)" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0, 1, 4, 9, 16]" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(sq)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generators\n", "\n", "Generators are written like functions, but they are lazy: each `yield` hands back one value on demand and pauses the function until the next value is requested." ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [], "source": [ "def cycle1(xs, n):\n", "    \"\"\"Cycles through the values in xs n times.\"\"\"\n", "    \n", "    for i in range(n):\n", "        for x in xs:\n", "            yield x" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(cycle1([1,2,3], 4))" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ann\n", "bob\n" ] } ], "source": [ "for x in cycle1(['ann', 'bob', 'stop', 'charles'], 1000):\n", "    if x == 'stop':\n", "        break\n", "    else:\n", "        print(x)" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [], "source": [ "def cycle2(xs, n):\n", "    \"\"\"Cycles through the values in xs n times.\"\"\"\n", "    \n", "    for i in range(n):\n", "        yield from xs" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(cycle2([1,2,3], 4))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because they are lazy, generators can be used for infinite streams." 
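] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see the laziness directly, here is a small sketch (not from the original notes) using a hypothetical generator `noisy_naturals` that records each value in a list at the moment it is produced. Creating the generators runs none of their body; each `next()` call runs just enough of it to produce one value." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"def noisy_naturals(log):\n",
"    \"\"\"Infinite generator of 0, 1, 2, ... that logs each value as it is produced.\"\"\"\n",
"    n = 0\n",
"    while True:\n",
"        log.append(n)  # runs only when the next value is actually requested\n",
"        yield n\n",
"        n += 1\n",
"\n",
"log = []\n",
"squares = (n * n for n in noisy_naturals(log))  # a lazy generator expression\n",
"print(log)    # []: nothing has been produced yet\n",
"first = [next(squares) for _ in range(3)]\n",
"print(first)  # [0, 1, 4]\n",
"print(log)    # [0, 1, 2]: only three values were ever produced"
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Fibonacci sequence is a classic example of an infinite stream." 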
] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [], "source": [ "def fib():\n", "    a, b = 1, 1\n", "    while True:\n", "        yield a\n", "        a, b = b, a + b" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, " ] } ], "source": [ "for n in fib():\n", "    if n > 100:\n", "        break\n", "    print(n, end=', ')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can even slice infinite generators lazily with `itertools.islice`. More on this when we cover functional programming." ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [], "source": [ "import itertools as it" ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[8, 13, 21, 34, 55]" ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(it.islice(fib(), 5, 10))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }