{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Functional programming in Python (operator, functional, itertoools, toolz)\n", "\n", "- Pure functions\n", "- Recursive functions\n", "- Anonymous functions\n", "- Lazy evaluation\n", "- Higher-order functions\n", "- Decorators\n", "- Partial application\n", "- Using `operator`\n", "- Using `functional`\n", "- Using `itertools`\n", "- Pipelines with `toolz`" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pure functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Deterministic" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pure" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(148.4131591025766, 148.4131591025766)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.exp(5), np.exp(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Not pure" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0.1539109537552415, 0.25636728267414016)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.random.randn(), np.random.randn()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### No side effects" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pure" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def f(xs):\n", " '''Modify value at first index.'''\n", " if len(xs) > 0:\n", " xs = list(xs)\n", " xs[0] = '@'\n", " return xs" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(['@', 2, 3], [1, 2, 3])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xs = [1,2,3]\n", "f(xs), xs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Not pure" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def g(xs):\n", " '''Modify value at first index.'''\n", " if len(xs) > 0:\n", " xs[0] = '@'\n", " return xs" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(['@', 2, 3], ['@', 2, 3])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xs = [1,2,3]\n", "g(xs), xs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Is the function `h` pure or impure?" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def h(n, xs=[]):\n", " for i in range(n):\n", " xs.append(i)\n", " return xs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function is not deterministic, and it has side effects!" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "n = 5\n", "xs = [1,2,3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Non-deterministic" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0, 1, 2, 3, 4]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h(n)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h(n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To avoid non-determinism, do not set default mutable arguments. The usaal Python idiom is\n", "\n", "```python\n", "def func(xs=None):\n", " \"\"\"Docstring.\"\"\"\n", " \n", " if xs is None:\n", " xs = []\n", " do_something(xs)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Side effects" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "xs = [1,2,3]" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 0, 1, 2, 3, 4]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h(n, xs)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 0, 1, 2, 3, 4]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Recursive functions\n", "\n", "A recursive function is one that calls itself. Python supports recursion, but recursive functions in Python are not efficient and iterative algorithms are preferred in general." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "def rec_sum(xs):\n", " \"\"\"Recursive sum.\"\"\"\n", " \n", " if len(xs) == 0:\n", " return 0\n", " else:\n", " return xs[0] + rec_sum(xs[1:])" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rec_sum([1,2,3,4])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Anonymous functions" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(x, y)>" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lambda x, y: x + y" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "add = lambda x, y: x + y" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "7" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "add(3, 4)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(x, y)>" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lambda x, y: x if x < y else y" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "smaller = lambda x, y: x if x < y else y" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "smaller(9,1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lazy evaluation" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "range(0, 10)" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "range(10)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(range(10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generators\n", "\n", "Generators behave like functions that retain the last state and can be re-entered. Results are returned with `yield` rather than `return`. Generators are used extensively in Python, and almost exclusively in the `itertools` and `toolz` packages we will review later in this notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Differences between a function and a generator" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "def fib_eager(n):\n", " \"\"\"Eager Fibonacci function.\"\"\"\n", " \n", " xs = []\n", " a, b = 1,1\n", " for i in range(n):\n", " xs.append(a)\n", " a, b = b, a + b\n", " return xs" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fib_eager(10)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "def fib_lazy(n):\n", " \"\"\"Lazy Fibonacci generator.\"\"\"\n", " \n", " a, b = 1,1\n", " for i in range(n):\n", " yield a\n", " a, b = b, a + b" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fib_lazy(10)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(fib_lazy(10))" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "fibs1 = fib_eager(10)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1,1,2,3,5,8,13,21,34,55," ] } ], "source": [ "for i in fibs1:\n", " print(i, end=',')" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1,1,2,3,5,8,13,21,34,55," ] } ], "source": [ "for i in fibs1:\n", " print(i, end=',')" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "fibs2 = fib_lazy(10)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1,1,2,3,5,8,13,21,34,55," ] } ], "source": [ "for i in fibs2:\n", " print(i, end=',')" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "for i in fibs2:\n", " print(i, end=',')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Generators can return infinite sequences" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "def iota(start = 1):\n", " \"\"\"An infinite incrementing genrator.\"\"\"\n", " \n", " while True:\n", " yield start\n", " start += 1 " ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26," ] } ], "source": [ "for i in iota():\n", " print(i, end=',')\n", " if i > 25:\n", " break" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Higher order functions\n", "\n", "- Take a function as an argument \n", "- Returns a function" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "def dist(x, y, measure):\n", " \"\"\"Returns distance between x and y using given measure.\n", " \n", " measure is a function that takes two arguments x and y.\n", " \"\"\"\n", " \n", " return measure(x, y)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "def euclid(x, y):\n", " \"\"\"Returns Euclidean distance between x and y.\"\"\"\n", " \n", " return np.sqrt(np.sum(x**2 + y**2))" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5.0" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.array([0,0])\n", "y = np.array([3,4])\n", "\n", "dist(x, y, measure=euclid)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Standard HOFs" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "from functools import reduce" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(map(lambda x: x**2, range(10)))" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0, 2, 4, 6, 8]" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(filter(lambda x: x % 2 == 0, range(10)))" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "45" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reduce(lambda x, y: x + y, range(10))" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "55" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reduce(lambda x, y: x + y, range(10), 10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: Flattening a nested list" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[['the', 'quick', 'brown', 'fox'], ['jumps', 'over', 'the', 'dog']]" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s1 = 'the quick brown fox'\n", "s2 = 'jumps over the dog'\n", "xs = [s.split() for s in [s1, s2]]\n", "xs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using a nested for loop" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'dog']" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ys = []\n", "for x in xs:\n", " for y in x:\n", " ys.append(y)\n", "ys" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using a list comprehension" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'dog']" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[y for x in xs for y in x]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using `reduce`" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'dog']" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reduce(lambda x, y: x + y, xs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Closure" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "def f(a):\n", " \"\"\"The argument given to f is visible in g when g is called.\"\"\"\n", " \n", " def g(b):\n", " return a + b\n", " return g" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "g3 = f(3)\n", "g5 = f(5)" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(7, 9)" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g3(4), g5(4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Decorators\n", "\n", "A decorator is a higher-order function that takes a function as input and returns a *decorated* version of the original function with additional capabilities. There is syntactical sugar for decorators in Python using the `@decorator_function` notation as shown below." ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "import time" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "def timer(f):\n", " \"\"\"Times how long f takes.\"\"\"\n", " \n", " def g(*args, **kwargs):\n", " \n", " start = time.time() \n", " res = f(*args, **kwargs)\n", " elapsed = time.time() - start\n", " return res, elapsed\n", " return g" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "def slow(n=1):\n", " time.sleep(n)\n", " return n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use as a function" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "slow_t1 = timer(slow)" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0.5, 0.5037732124328613)" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "slow_t1(0.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use as a decorator" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [], "source": [ "@timer\n", "def slow_t2(n=1):\n", " time.sleep(n)\n", " return n" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0.5, 0.5038459300994873)" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "slow_t2(0.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Partial application\n", "\n", "Partial application takes a function with two or more parameters, and returns the function with some parameters *filled in*. Partial application is very useful when constructing pipelines that transform data in stages." ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "from functools import partial" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [], "source": [ "def add(a, b):\n", " \"\"\"Add a and b.\"\"\"\n", " \n", " return a + b" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[10, 11, 12, 13, 14]" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(map(partial(add, b=10), range(5)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using `operator`\n", "\n", "Operator provides named version of Python operators, and is typically used in place of anonymous functions in higher order functions to improve readability." ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [], "source": [ "import operator as op" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [], "source": [ "xs = [('a', 3), ('b', 1), ('c', 2)]" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('b', 1), ('c', 2), ('a', 3)]" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted(xs, key=op.itemgetter(1))" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('b', 1), ('c', 2), ('a', 3)]" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted(xs, key=lambda x: x[1])" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reduce(op.add, range(1,5))" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reduce(lambda a, b: a + b, range(1,5))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using `functional`\n", "\n", "We have already seen the use of `reduce` and `partial` from `functools`. Another useful function from the package is the `lrucache` decorator." ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [], "source": [ "def fib(n):\n", " \"\"\"Recursive version of Fibonacci.\"\"\"\n", " \n", " print('Call fib(%d)' % n)\n", " \n", " if n == 0:\n", " return 0\n", " elif n == 1:\n", " return 1\n", " else:\n", " return fib(n-2) + fib(n-1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice the inefficiency from repeatedly calling the function with the same arguments." ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Call fib(6)\n", "Call fib(4)\n", "Call fib(2)\n", "Call fib(0)\n", "Call fib(1)\n", "Call fib(3)\n", "Call fib(1)\n", "Call fib(2)\n", "Call fib(0)\n", "Call fib(1)\n", "Call fib(5)\n", "Call fib(3)\n", "Call fib(1)\n", "Call fib(2)\n", "Call fib(0)\n", "Call fib(1)\n", "Call fib(4)\n", "Call fib(2)\n", "Call fib(0)\n", "Call fib(1)\n", "Call fib(3)\n", "Call fib(1)\n", "Call fib(2)\n", "Call fib(0)\n", "Call fib(1)\n" ] }, { "data": { "text/plain": [ "8" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fib(6)" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [], "source": [ "from functools import lru_cache" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [], "source": [ "@lru_cache(maxsize=None)\n", "def fib_cache(n):\n", " \"\"\"Recursive version of Fibonacci.\"\"\"\n", " \n", " print('Call fib(%d)' % n)\n", " \n", " if n == 0:\n", " return 0\n", " elif n == 1:\n", " return 1\n", " else:\n", " return fib_cache(n-2) + fib_cache(n-1)" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Call fib(6)\n", "Call fib(4)\n", "Call fib(2)\n", "Call fib(0)\n", "Call fib(1)\n", "Call fib(3)\n", "Call fib(5)\n" ] }, { "data": { "text/plain": [ "8" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fib_cache(6)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using `itertools`\n", "\n", "The `itertools` package provides tools for efficient looping." ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [], "source": [ "import itertools as it" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generators" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(it.islice(it.count(), 10))" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 1, 2, 3, 1, 2, 3, 1]" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(it.islice(it.cycle([1,2,3]), 10))" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[[1, 2, 3],\n", " [1, 2, 3],\n", " [1, 2, 3],\n", " [1, 2, 3],\n", " [1, 2, 3],\n", " [1, 2, 3],\n", " [1, 2, 3],\n", " [1, 2, 3],\n", " [1, 2, 3],\n", " [1, 2, 3]]" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(it.islice(it.repeat([1,2,3]), 10))" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(0, 'a'), (1, 'b'), (2, 'c'), (3, None), (4, None)]" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(it.zip_longest(range(5), 'abc'))" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'out of donuts'), (4, 'out of donuts')]" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(it.zip_longest(range(5), 'abc', fillvalue='out of donuts'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Permutations and combinations" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('a', 'b', 'c'),\n", " ('a', 'c', 'b'),\n", " ('b', 'a', 'c'),\n", " ('b', 'c', 'a'),\n", " ('c', 'a', 'b'),\n", " ('c', 'b', 'a')]" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(it.permutations('abc'))" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(it.permutations('abc', 2))" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('a', 'b'), ('a', 'c'), ('b', 'c')]" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(it.combinations('abc', 2))" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'b'), ('b', 'c'), ('c', 'c')]" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(it.combinations_with_replacement('abc', 2))" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('a', 'a'),\n", " ('a', 'b'),\n", " ('a', 'c'),\n", " ('b', 'a'),\n", " ('b', 'b'),\n", " ('b', 'c'),\n", " ('c', 'a'),\n", " ('c', 'b'),\n", " ('c', 'c')]" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(it.product('abc', repeat=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Miscellaneous" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 4, 5, 6, 7, 8, 9]" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(it.chain([1,2,3], [4,5,6], [7,8,9]))" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 4, 5, 6, 7, 8, 9]" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(it.chain.from_iterable([[1,2,3], [4,5,6], [7,8,9]]))" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [], "source": [ "nums = [1,3,5,7,2,4,6,1,3,5,7,9,2,2,2]" ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "False [1, 3, 5, 7]\n", "True [2, 4, 6]\n", "False [1, 3, 5, 7, 9]\n", "True [2, 2, 2]\n" ] } ], "source": [ "for i, g in it.groupby(nums, key=lambda x: x % 2 ==0):\n", " print(i, list(g))" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 3, 5, 7]" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(it.takewhile(lambda x: x % 2 == 1, nums))" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[2, 4, 6, 1, 3, 5, 7, 9, 2, 2, 2]" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(it.dropwhile(lambda x: x % 2 == 1, nums))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: Classic word-count using map-reduce\n", "\n", "There are two basic steps - first we create a tuple (word, 1), then group by word, then reduce the grouping to sum up the ones. " ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [], "source": [ "rhyme = 'humpty dumpty sat on a wall humpty dumpty had a great fall'\n", "words = rhyme.split()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Map to create key-value pair" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [], "source": [ "x1 = map(lambda x: (x, 1), words)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Group similar keys together" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [], "source": [ "x2 = it.groupby(sorted(x1), key=lambda x: x[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Reduce on value of key-value pair" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [], "source": [ "x3 = map(lambda x: (x[0], reduce(lambda a, b: a[1] + b[1], x[1])), x2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Clean-up because the nested reduce stops when the list has only one element" ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [], "source": [ "x4 = map(lambda x: (x[0], x[1] if isinstance(x[1], int) else x[1][1]), x3)" ] }, { "cell_type": "code", "execution_count": 96, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "[('a', 2),\n", " ('dumpty', 2),\n", " ('fall', 1),\n", " ('great', 1),\n", " ('had', 1),\n", " ('humpty', 2),\n", " ('on', 1),\n", " ('sat', 1),\n", " ('wall', 1)]" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(x4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using `toolz`\n", "\n", "The `toolz` package provides a very rich set of functional operators, and is recommended if you want to program in the functional style using Python." ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [], "source": [ "import toolz" ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [], "source": [ "from toolz import sliding_window, pipe, frequencies, concat" ] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [], "source": [ "import toolz.curried as c" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example word count revisited" ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'humpty': 2,\n", " 'dumpty': 2,\n", " 'sat': 1,\n", " 'on': 1,\n", " 'a': 2,\n", " 'wall': 1,\n", " 'had': 1,\n", " 'great': 1,\n", " 'fall': 1}" ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "frequencies(words)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example\n", "\n", "Read in some documents, remove punctuation, split into words and then into individual characters, and find the most commonly occurring sliding windows containing 3 characters.\n", "\n", "This is most naturally done by piping in a data set through a series of transforms." ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [], "source": [ "d1 = 'the doo doo doo the dah dah dah'\n", "d2 = 'every breath she takes, every move she makes'\n", "d3 = 'another brick in the wall'\n", "d4 = 'another one bites the dust and another one gone'\n", "docs = [d1,d2,d3,d4]" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [], "source": [ "import string" ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [], "source": [ "triple_freqs = pipe(\n", " docs,\n", " c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),\n", " c.map(lambda x: x.split()),\n", " concat,\n", " concat,\n", " c.sliding_window(3),\n", " frequencies,\n", ")" ] }, { "cell_type": "code", "execution_count": 104, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(('t', 'h', 'e'), 7),\n", " (('o', 't', 'h'), 4),\n", " (('h', 'e', 'd'), 3),\n", " (('d', 'o', 'o'), 3),\n", " (('d', 'a', 'h'), 3)]" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted(triple_freqs.items(), key=lambda x: x[1], reverse=True)[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step by step" ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{('t', 'h', 'e'): 7,\n", " ('h', 'e', 'd'): 3,\n", " ('e', 'd', 'o'): 1,\n", " ('d', 'o', 'o'): 3,\n", " ('o', 'o', 'd'): 2,\n", " ('o', 'd', 'o'): 2,\n", " ('o', 'o', 't'): 1,\n", " ('o', 't', 'h'): 4,\n", " ('e', 'd', 'a'): 1,\n", " ('d', 'a', 'h'): 3,\n", " ('a', 'h', 'd'): 2,\n", " ('h', 'd', 'a'): 2,\n", " ('a', 'h', 'e'): 1,\n", " ('h', 'e', 'v'): 1,\n", " ('e', 'v', 'e'): 2,\n", " ('v', 'e', 'r'): 2,\n", " ('e', 'r', 'y'): 2,\n", " ('r', 'y', 'b'): 1,\n", " ('y', 'b', 'r'): 1,\n", " ('b', 'r', 'e'): 1,\n", " ('r', 'e', 'a'): 1,\n", " ('e', 'a', 't'): 1,\n", " ('a', 't', 'h'): 1,\n", " ('t', 'h', 's'): 1,\n", " ('h', 's', 'h'): 1,\n", " ('s', 'h', 'e'): 2,\n", " ('h', 'e', 't'): 1,\n", " ('e', 't', 'a'): 1,\n", " ('t', 'a', 'k'): 1,\n", " ('a', 'k', 'e'): 2,\n", " ('k', 'e', 's'): 2,\n", " ('e', 's', 'e'): 1,\n", " ('s', 'e', 'v'): 1,\n", " ('r', 'y', 'm'): 1,\n", " ('y', 'm', 'o'): 1,\n", " ('m', 'o', 'v'): 1,\n", " ('o', 'v', 'e'): 1,\n", " ('v', 'e', 's'): 1,\n", " ('e', 's', 'h'): 1,\n", " ('h', 'e', 'm'): 1,\n", " ('e', 'm', 'a'): 1,\n", " ('m', 'a', 'k'): 1,\n", " ('e', 's', 'a'): 1,\n", " ('s', 'a', 'n'): 1,\n", " ('a', 'n', 'o'): 3,\n", " ('n', 'o', 't'): 3,\n", " ('h', 'e', 'r'): 3,\n", " ('e', 'r', 'b'): 1,\n", " ('r', 'b', 'r'): 1,\n", " ('b', 'r', 'i'): 1,\n", " ('r', 'i', 'c'): 1,\n", " ('i', 'c', 'k'): 1,\n", " ('c', 'k', 'i'): 1,\n", " ('k', 'i', 'n'): 1,\n", " ('i', 'n', 't'): 1,\n", " ('n', 't', 'h'): 1,\n", " ('h', 'e', 'w'): 1,\n", " ('e', 'w', 'a'): 1,\n", " ('w', 'a', 'l'): 1,\n", " ('a', 'l', 'l'): 1,\n", " ('l', 'l', 'a'): 1,\n", " ('l', 'a', 'n'): 1,\n", " ('e', 'r', 'o'): 2,\n", " ('r', 'o', 'n'): 2,\n", " ('o', 'n', 'e'): 3,\n", " ('n', 'e', 'b'): 1,\n", " ('e', 'b', 'i'): 1,\n", " ('b', 'i', 't'): 1,\n", " ('i', 't', 'e'): 1,\n", " ('t', 'e', 's'): 1,\n", " ('e', 's', 't'): 1,\n", " ('s', 't', 'h'): 1,\n", " ('e', 'd', 'u'): 1,\n", " ('d', 'u', 's'): 1,\n", " ('u', 's', 't'): 1,\n", " ('s', 't', 'a'): 1,\n", " ('t', 'a', 'n'): 1,\n", " ('a', 'n', 'd'): 1,\n", " ('n', 'd', 'a'): 1,\n", " ('d', 'a', 'n'): 1,\n", " ('n', 'e', 'g'): 1,\n", " ('e', 'g', 'o'): 1,\n", " ('g', 'o', 'n'): 1}" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pipe(\n", " docs,\n", " c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),\n", " c.map(lambda x: x.split()),\n", " concat,\n", " concat,\n", " c.sliding_window(3),\n", " frequencies,\n", ")" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{('t', 'h', 'e'): 7,\n", " ('h', 'e', 'd'): 3,\n", " ('e', 'd', 'o'): 1,\n", " ('d', 'o', 'o'): 3,\n", " ('o', 'o', 'd'): 2,\n", " ('o', 'd', 'o'): 2,\n", " ('o', 'o', 't'): 1,\n", " ('o', 't', 'h'): 4,\n", " ('e', 'd', 'a'): 1,\n", " ('d', 'a', 'h'): 3,\n", " ('a', 'h', 'd'): 2,\n", " ('h', 'd', 'a'): 2,\n", " ('a', 'h', 'e'): 1,\n", " ('h', 'e', 'v'): 1,\n", " ('e', 'v', 'e'): 2,\n", " ('v', 'e', 'r'): 2,\n", " ('e', 'r', 'y'): 2,\n", " ('r', 'y', 'b'): 1,\n", " ('y', 'b', 'r'): 1,\n", " ('b', 'r', 'e'): 1,\n", " ('r', 'e', 'a'): 1,\n", " ('e', 'a', 't'): 1,\n", " ('a', 't', 'h'): 1,\n", " ('t', 'h', 's'): 1,\n", " ('h', 's', 'h'): 1,\n", " ('s', 'h', 'e'): 2,\n", " ('h', 'e', 't'): 1,\n", " ('e', 't', 'a'): 1,\n", " ('t', 'a', 'k'): 1,\n", " ('a', 'k', 'e'): 2,\n", " ('k', 'e', 's'): 2,\n", " ('e', 's', 'e'): 1,\n", " ('s', 'e', 'v'): 1,\n", " ('r', 'y', 'm'): 1,\n", " ('y', 'm', 'o'): 1,\n", " ('m', 'o', 'v'): 1,\n", " ('o', 'v', 'e'): 1,\n", " ('v', 'e', 's'): 1,\n", " ('e', 's', 'h'): 1,\n", " ('h', 'e', 'm'): 1,\n", " ('e', 'm', 'a'): 1,\n", " ('m', 'a', 'k'): 1,\n", " ('e', 's', 'a'): 1,\n", " ('s', 'a', 'n'): 1,\n", " ('a', 'n', 'o'): 3,\n", " ('n', 'o', 't'): 3,\n", " ('h', 'e', 'r'): 3,\n", " ('e', 'r', 'b'): 1,\n", " ('r', 'b', 'r'): 1,\n", " ('b', 'r', 'i'): 1,\n", " ('r', 'i', 'c'): 1,\n", " ('i', 'c', 'k'): 1,\n", " ('c', 'k', 'i'): 1,\n", " ('k', 'i', 'n'): 1,\n", " ('i', 'n', 't'): 1,\n", " ('n', 't', 'h'): 1,\n", " ('h', 'e', 'w'): 1,\n", " ('e', 'w', 'a'): 1,\n", " ('w', 'a', 'l'): 1,\n", " ('a', 'l', 'l'): 1,\n", " ('l', 'l', 'a'): 1,\n", " ('l', 'a', 'n'): 1,\n", " ('e', 'r', 'o'): 2,\n", " ('r', 'o', 'n'): 2,\n", " ('o', 'n', 'e'): 3,\n", " ('n', 'e', 'b'): 1,\n", " ('e', 'b', 'i'): 1,\n", " ('b', 'i', 't'): 1,\n", " ('i', 't', 'e'): 1,\n", " ('t', 'e', 's'): 1,\n", " ('e', 's', 't'): 1,\n", " ('s', 't', 'h'): 1,\n", " ('e', 'd', 'u'): 1,\n", " ('d', 'u', 's'): 1,\n", " ('u', 's', 't'): 1,\n", " ('s', 't', 'a'): 1,\n", " ('t', 'a', 'n'): 1,\n", " ('a', 'n', 'd'): 1,\n", " ('n', 'd', 'a'): 1,\n", " ('d', 'a', 'n'): 1,\n", " ('n', 'e', 'g'): 1,\n", " ('e', 'g', 'o'): 1,\n", " ('g', 'o', 'n'): 1}" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pipe(\n", " docs,\n", " c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),\n", " c.map(lambda x: x.split()),\n", " concat,\n", " concat,\n", " c.sliding_window(3),\n", " frequencies,\n", ")" ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{('t', 'h', 'e'): 7,\n", " ('h', 'e', 'd'): 3,\n", " ('e', 'd', 'o'): 1,\n", " ('d', 'o', 'o'): 3,\n", " ('o', 'o', 'd'): 2,\n", " ('o', 'd', 'o'): 2,\n", " ('o', 'o', 't'): 1,\n", " ('o', 't', 'h'): 4,\n", " ('e', 'd', 'a'): 1,\n", " ('d', 'a', 'h'): 3,\n", " ('a', 'h', 'd'): 2,\n", " ('h', 'd', 'a'): 2,\n", " ('a', 'h', 'e'): 1,\n", " ('h', 'e', 'v'): 1,\n", " ('e', 'v', 'e'): 2,\n", " ('v', 'e', 'r'): 2,\n", " ('e', 'r', 'y'): 2,\n", " ('r', 'y', 'b'): 1,\n", " ('y', 'b', 'r'): 1,\n", " ('b', 'r', 'e'): 1,\n", " ('r', 'e', 'a'): 1,\n", " ('e', 'a', 't'): 1,\n", " ('a', 't', 'h'): 1,\n", " ('t', 'h', 's'): 1,\n", " ('h', 's', 'h'): 1,\n", " ('s', 'h', 'e'): 2,\n", " ('h', 'e', 't'): 1,\n", " ('e', 't', 'a'): 1,\n", " ('t', 'a', 'k'): 1,\n", " ('a', 'k', 'e'): 2,\n", " ('k', 'e', 's'): 2,\n", " ('e', 's', 'e'): 1,\n", " ('s', 'e', 'v'): 1,\n", " ('r', 'y', 'm'): 1,\n", " ('y', 'm', 'o'): 1,\n", " ('m', 'o', 'v'): 1,\n", " ('o', 'v', 'e'): 1,\n", " ('v', 'e', 's'): 1,\n", " ('e', 's', 'h'): 1,\n", " ('h', 'e', 'm'): 1,\n", " ('e', 'm', 'a'): 1,\n", " ('m', 'a', 'k'): 1,\n", " ('e', 's', 'a'): 1,\n", " ('s', 'a', 'n'): 1,\n", " ('a', 'n', 'o'): 3,\n", " ('n', 'o', 't'): 3,\n", " ('h', 'e', 'r'): 3,\n", " ('e', 'r', 'b'): 1,\n", " ('r', 'b', 'r'): 1,\n", " ('b', 'r', 'i'): 1,\n", " ('r', 'i', 'c'): 1,\n", " ('i', 'c', 'k'): 1,\n", " ('c', 'k', 'i'): 1,\n", " ('k', 'i', 'n'): 1,\n", " ('i', 'n', 't'): 1,\n", " ('n', 't', 'h'): 1,\n", " ('h', 'e', 'w'): 1,\n", " ('e', 'w', 'a'): 1,\n", " ('w', 'a', 'l'): 1,\n", " ('a', 'l', 'l'): 1,\n", " ('l', 'l', 'a'): 1,\n", " ('l', 'a', 'n'): 1,\n", " ('e', 'r', 'o'): 2,\n", " ('r', 'o', 'n'): 2,\n", " ('o', 'n', 'e'): 3,\n", " ('n', 'e', 'b'): 1,\n", " ('e', 'b', 'i'): 1,\n", " ('b', 'i', 't'): 1,\n", " ('i', 't', 'e'): 1,\n", " ('t', 'e', 's'): 1,\n", " ('e', 's', 't'): 1,\n", " ('s', 't', 'h'): 1,\n", " ('e', 'd', 'u'): 1,\n", " ('d', 'u', 's'): 1,\n", " ('u', 's', 't'): 1,\n", " ('s', 't', 'a'): 1,\n", " ('t', 'a', 'n'): 1,\n", " ('a', 'n', 'd'): 1,\n", " ('n', 'd', 'a'): 1,\n", " ('d', 'a', 'n'): 1,\n", " ('n', 'e', 'g'): 1,\n", " ('e', 'g', 'o'): 1,\n", " ('g', 'o', 'n'): 1}" ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pipe(\n", " docs,\n", " c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),\n", " c.map(lambda x: x.split()),\n", " concat,\n", " concat,\n", " c.sliding_window(3),\n", " frequencies,\n", ")" ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['the doo doo doo the dah dah dah',\n", " 'every breath she takes, every move she makes',\n", " 'another brick in the wall',\n", " 'another one bites the dust and another one gone']" ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pipe(\n", " docs,\n", " list\n", ")" ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['the doo doo doo the dah dah dah',\n", " 'every breath she takes every move she makes',\n", " 'another brick in the wall',\n", " 'another one bites the dust and another one gone']" ] }, "execution_count": 109, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pipe(\n", " docs,\n", " c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),\n", " list\n", ")" ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[['the', 'doo', 'doo', 'doo', 'the', 'dah', 'dah', 'dah'],\n", " ['every', 'breath', 'she', 'takes', 'every', 'move', 'she', 'makes'],\n", " ['another', 'brick', 'in', 'the', 'wall'],\n", " ['another', 'one', 'bites', 'the', 'dust', 'and', 'another', 'one', 'gone']]" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pipe(\n", " docs,\n", " c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),\n", " c.map(lambda x: x.split()),\n", " list\n", ")" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['the',\n", " 'doo',\n", " 'doo',\n", " 'doo',\n", " 'the',\n", " 'dah',\n", " 'dah',\n", " 'dah',\n", " 'every',\n", " 'breath',\n", " 'she',\n", " 'takes',\n", " 'every',\n", " 'move',\n", " 'she',\n", " 'makes',\n", " 'another',\n", " 'brick',\n", " 'in',\n", " 'the',\n", " 'wall',\n", " 'another',\n", " 'one',\n", " 'bites',\n", " 'the',\n", " 'dust',\n", " 'and',\n", " 'another',\n", " 'one',\n", " 'gone']" ] }, "execution_count": 111, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pipe(\n", " docs,\n", " c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),\n", " c.map(lambda x: x.split()),\n", " concat,\n", " list\n", ")" ] }, { "cell_type": "code", "execution_count": 112, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "['t',\n", " 'h',\n", " 'e',\n", " 'd',\n", " 'o',\n", " 'o',\n", " 'd',\n", " 'o',\n", " 'o',\n", " 'd',\n", " 'o',\n", " 'o',\n", " 't',\n", " 'h',\n", " 'e',\n", " 'd',\n", " 'a',\n", " 'h',\n", " 'd',\n", " 'a',\n", " 'h',\n", " 'd',\n", " 'a',\n", " 'h',\n", " 'e',\n", " 'v',\n", " 'e',\n", " 'r',\n", " 'y',\n", " 'b',\n", " 'r',\n", " 'e',\n", " 'a',\n", " 't',\n", " 'h',\n", " 's',\n", " 'h',\n", " 'e',\n", " 't',\n", " 'a',\n", " 'k',\n", " 'e',\n", " 's',\n", " 'e',\n", " 'v',\n", " 'e',\n", " 'r',\n", " 'y',\n", " 'm',\n", " 'o',\n", " 'v',\n", " 'e',\n", " 's',\n", " 'h',\n", " 'e',\n", " 'm',\n", " 'a',\n", " 'k',\n", " 'e',\n", " 's',\n", " 'a',\n", " 'n',\n", " 'o',\n", " 't',\n", " 'h',\n", " 'e',\n", " 'r',\n", " 'b',\n", " 'r',\n", " 'i',\n", " 'c',\n", " 'k',\n", " 'i',\n", " 'n',\n", " 't',\n", " 'h',\n", " 'e',\n", " 'w',\n", " 'a',\n", " 'l',\n", " 'l',\n", " 'a',\n", " 'n',\n", " 'o',\n", " 't',\n", " 'h',\n", " 'e',\n", " 'r',\n", " 'o',\n", " 'n',\n", " 'e',\n", " 'b',\n", " 'i',\n", " 't',\n", " 'e',\n", " 's',\n", " 't',\n", " 'h',\n", " 'e',\n", " 'd',\n", " 'u',\n", " 's',\n", " 't',\n", " 'a',\n", " 'n',\n", " 'd',\n", " 'a',\n", " 'n',\n", " 'o',\n", " 't',\n", " 'h',\n", " 'e',\n", " 'r',\n", " 'o',\n", " 'n',\n", " 'e',\n", " 'g',\n", " 'o',\n", " 'n',\n", " 'e']" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pipe(\n", " docs,\n", " c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),\n", " c.map(lambda x: x.split()),\n", " concat,\n", " concat,\n", " list\n", ")" ] }, { "cell_type": "code", "execution_count": 113, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "[('t', 'h', 'e'),\n", " ('h', 'e', 'd'),\n", " ('e', 'd', 'o'),\n", " ('d', 'o', 'o'),\n", " ('o', 'o', 'd'),\n", " ('o', 'd', 'o'),\n", " ('d', 'o', 'o'),\n", " ('o', 'o', 'd'),\n", " ('o', 'd', 'o'),\n", " ('d', 'o', 'o'),\n", " ('o', 'o', 't'),\n", " ('o', 't', 'h'),\n", " ('t', 'h', 'e'),\n", " ('h', 'e', 'd'),\n", " ('e', 'd', 'a'),\n", " ('d', 'a', 'h'),\n", " ('a', 'h', 'd'),\n", " ('h', 'd', 'a'),\n", " ('d', 'a', 'h'),\n", " ('a', 'h', 'd'),\n", " ('h', 'd', 'a'),\n", " ('d', 'a', 'h'),\n", " ('a', 'h', 'e'),\n", " ('h', 'e', 'v'),\n", " ('e', 'v', 'e'),\n", " ('v', 'e', 'r'),\n", " ('e', 'r', 'y'),\n", " ('r', 'y', 'b'),\n", " ('y', 'b', 'r'),\n", " ('b', 'r', 'e'),\n", " ('r', 'e', 'a'),\n", " ('e', 'a', 't'),\n", " ('a', 't', 'h'),\n", " ('t', 'h', 's'),\n", " ('h', 's', 'h'),\n", " ('s', 'h', 'e'),\n", " ('h', 'e', 't'),\n", " ('e', 't', 'a'),\n", " ('t', 'a', 'k'),\n", " ('a', 'k', 'e'),\n", " ('k', 'e', 's'),\n", " ('e', 's', 'e'),\n", " ('s', 'e', 'v'),\n", " ('e', 'v', 'e'),\n", " ('v', 'e', 'r'),\n", " ('e', 'r', 'y'),\n", " ('r', 'y', 'm'),\n", " ('y', 'm', 'o'),\n", " ('m', 'o', 'v'),\n", " ('o', 'v', 'e'),\n", " ('v', 'e', 's'),\n", " ('e', 's', 'h'),\n", " ('s', 'h', 'e'),\n", " ('h', 'e', 'm'),\n", " ('e', 'm', 'a'),\n", " ('m', 'a', 'k'),\n", " ('a', 'k', 'e'),\n", " ('k', 'e', 's'),\n", " ('e', 's', 'a'),\n", " ('s', 'a', 'n'),\n", " ('a', 'n', 'o'),\n", " ('n', 'o', 't'),\n", " ('o', 't', 'h'),\n", " ('t', 'h', 'e'),\n", " ('h', 'e', 'r'),\n", " ('e', 'r', 'b'),\n", " ('r', 'b', 'r'),\n", " ('b', 'r', 'i'),\n", " ('r', 'i', 'c'),\n", " ('i', 'c', 'k'),\n", " ('c', 'k', 'i'),\n", " ('k', 'i', 'n'),\n", " ('i', 'n', 't'),\n", " ('n', 't', 'h'),\n", " ('t', 'h', 'e'),\n", " ('h', 'e', 'w'),\n", " ('e', 'w', 'a'),\n", " ('w', 'a', 'l'),\n", " ('a', 'l', 'l'),\n", " ('l', 'l', 'a'),\n", " ('l', 'a', 'n'),\n", " ('a', 'n', 'o'),\n", " ('n', 'o', 't'),\n", " ('o', 't', 'h'),\n", " ('t', 'h', 'e'),\n", " ('h', 'e', 'r'),\n", " ('e', 'r', 'o'),\n", " ('r', 'o', 'n'),\n", " ('o', 'n', 'e'),\n", " ('n', 'e', 'b'),\n", " ('e', 'b', 'i'),\n", " ('b', 'i', 't'),\n", " ('i', 't', 'e'),\n", " ('t', 'e', 's'),\n", " ('e', 's', 't'),\n", " ('s', 't', 'h'),\n", " ('t', 'h', 'e'),\n", " ('h', 'e', 'd'),\n", " ('e', 'd', 'u'),\n", " ('d', 'u', 's'),\n", " ('u', 's', 't'),\n", " ('s', 't', 'a'),\n", " ('t', 'a', 'n'),\n", " ('a', 'n', 'd'),\n", " ('n', 'd', 'a'),\n", " ('d', 'a', 'n'),\n", " ('a', 'n', 'o'),\n", " ('n', 'o', 't'),\n", " ('o', 't', 'h'),\n", " ('t', 'h', 'e'),\n", " ('h', 'e', 'r'),\n", " ('e', 'r', 'o'),\n", " ('r', 'o', 'n'),\n", " ('o', 'n', 'e'),\n", " ('n', 'e', 'g'),\n", " ('e', 'g', 'o'),\n", " ('g', 'o', 'n'),\n", " ('o', 'n', 'e')]" ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pipe(\n", " docs,\n", " c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),\n", " c.map(lambda x: x.split()),\n", " concat,\n", " concat,\n", " c.sliding_window(3),\n", " list\n", ")" ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{('t', 'h', 'e'): 7,\n", " ('h', 'e', 'd'): 3,\n", " ('e', 'd', 'o'): 1,\n", " ('d', 'o', 'o'): 3,\n", " ('o', 'o', 'd'): 2,\n", " ('o', 'd', 'o'): 2,\n", " ('o', 'o', 't'): 1,\n", " ('o', 't', 'h'): 4,\n", " ('e', 'd', 'a'): 1,\n", " ('d', 'a', 'h'): 3,\n", " ('a', 'h', 'd'): 2,\n", " ('h', 'd', 'a'): 2,\n", " ('a', 'h', 'e'): 1,\n", " ('h', 'e', 'v'): 1,\n", " ('e', 'v', 'e'): 2,\n", " ('v', 'e', 'r'): 2,\n", " ('e', 'r', 'y'): 2,\n", " ('r', 'y', 'b'): 1,\n", " ('y', 'b', 'r'): 1,\n", " ('b', 'r', 'e'): 1,\n", " ('r', 'e', 'a'): 1,\n", " ('e', 'a', 't'): 1,\n", " ('a', 't', 'h'): 1,\n", " ('t', 'h', 's'): 1,\n", " ('h', 's', 'h'): 1,\n", " ('s', 'h', 'e'): 2,\n", " ('h', 'e', 't'): 1,\n", " ('e', 't', 'a'): 1,\n", " ('t', 'a', 'k'): 1,\n", " ('a', 'k', 'e'): 2,\n", " ('k', 'e', 's'): 2,\n", " ('e', 's', 'e'): 1,\n", " ('s', 'e', 'v'): 1,\n", " ('r', 'y', 'm'): 1,\n", " ('y', 'm', 'o'): 1,\n", " ('m', 'o', 'v'): 1,\n", " ('o', 'v', 'e'): 1,\n", " ('v', 'e', 's'): 1,\n", " ('e', 's', 'h'): 1,\n", " ('h', 'e', 'm'): 1,\n", " ('e', 'm', 'a'): 1,\n", " ('m', 'a', 'k'): 1,\n", " ('e', 's', 'a'): 1,\n", " ('s', 'a', 'n'): 1,\n", " ('a', 'n', 'o'): 3,\n", " ('n', 'o', 't'): 3,\n", " ('h', 'e', 'r'): 3,\n", " ('e', 'r', 'b'): 1,\n", " ('r', 'b', 'r'): 1,\n", " ('b', 'r', 'i'): 1,\n", " ('r', 'i', 'c'): 1,\n", " ('i', 'c', 'k'): 1,\n", " ('c', 'k', 'i'): 1,\n", " ('k', 'i', 'n'): 1,\n", " ('i', 'n', 't'): 1,\n", " ('n', 't', 'h'): 1,\n", " ('h', 'e', 'w'): 1,\n", " ('e', 'w', 'a'): 1,\n", " ('w', 'a', 'l'): 1,\n", " ('a', 'l', 'l'): 1,\n", " ('l', 'l', 'a'): 1,\n", " ('l', 'a', 'n'): 1,\n", " ('e', 'r', 'o'): 2,\n", " ('r', 'o', 'n'): 2,\n", " ('o', 'n', 'e'): 3,\n", " ('n', 'e', 'b'): 1,\n", " ('e', 'b', 'i'): 1,\n", " ('b', 'i', 't'): 1,\n", " ('i', 't', 'e'): 1,\n", " ('t', 'e', 's'): 1,\n", " ('e', 's', 't'): 1,\n", " ('s', 't', 'h'): 1,\n", " ('e', 'd', 'u'): 1,\n", " ('d', 'u', 's'): 1,\n", " ('u', 's', 't'): 1,\n", " ('s', 't', 'a'): 1,\n", " ('t', 'a', 'n'): 1,\n", " ('a', 'n', 'd'): 1,\n", " ('n', 'd', 'a'): 1,\n", " ('d', 'a', 'n'): 1,\n", " ('n', 'e', 'g'): 1,\n", " ('e', 'g', 'o'): 1,\n", " ('g', 'o', 'n'): 1}" ] }, "execution_count": 114, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pipe(\n", " docs,\n", " c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),\n", " c.map(lambda x: x.split()),\n", " concat,\n", " concat,\n", " c.sliding_window(3),\n", " frequencies\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 2 }