Functional programming in Python (operator, functools, itertools, toolz)

  • Pure functions

  • Recursive functions

  • Anonymous functions

  • Lazy evaluation

  • Higher-order functions

  • Decorators

  • Partial application

  • Using operator

  • Using functools

  • Using itertools

  • Pipelines with toolz

[1]:
import numpy as np

Pure functions

Deterministic

Pure

[2]:
np.exp(5), np.exp(5)
[2]:
(148.4131591025766, 148.4131591025766)

Not pure

[3]:
np.random.randn(), np.random.randn()
[3]:
(0.1539109537552415, 0.25636728267414016)

No side effects

Pure

[4]:
def f(xs):
    '''Return a copy of xs with the value at the first index replaced.'''
    if len(xs) > 0:
        xs = list(xs)
        xs[0] = '@'
    return xs
[5]:
xs = [1,2,3]
f(xs), xs
[5]:
(['@', 2, 3], [1, 2, 3])

Not pure

[6]:
def g(xs):
    '''Modify value at first index in place.'''
    if len(xs) > 0:
        xs[0] = '@'
    return xs
[7]:
xs = [1,2,3]
g(xs), xs
[7]:
(['@', 2, 3], ['@', 2, 3])

Exercise

Is the function h pure or impure?

[8]:
def h(n, xs=[]):
    for i in range(n):
        xs.append(i)
    return xs

The function is not deterministic, and it has side effects!

[9]:
n = 5
xs = [1,2,3]

Non-deterministic

[10]:
h(n)
[10]:
[0, 1, 2, 3, 4]
[11]:
h(n)
[11]:
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]

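To avoid this non-determinism, do not use mutable default arguments. A default value is evaluated once, when the function is defined, so every call that omits xs appends to the same shared list. The usual Python idiom is to default to None and create a fresh list inside the function, as in this corrected version of h (it still mutates an xs that is passed in explicitly, which is the side effect shown next):

def h_fixed(n, xs=None):
    """Append 0, ..., n-1 to xs; use a new list if xs is not given."""
    if xs is None:
        xs = []  # a fresh list on every call
    for i in range(n):
        xs.append(i)
    return xs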

Side effects

[12]:
xs = [1,2,3]
[13]:
h(n, xs)
[13]:
[1, 2, 3, 0, 1, 2, 3, 4]
[14]:
xs
[14]:
[1, 2, 3, 0, 1, 2, 3, 4]

Recursive functions

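A recursive function is one that calls itself. Python supports recursion, but CPython does not optimize tail calls and limits the recursion depth (1000 frames by default, see sys.getrecursionlimit()), so deep recursion raises RecursionError. Iterative algorithms are generally preferred.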

[15]:
def rec_sum(xs):
    """Recursive sum."""

    if len(xs) == 0:
        return 0
    else:
        return xs[0] + rec_sum(xs[1:])
[16]:
rec_sum([1,2,3,4])
[16]:
10
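To make the recursion limit concrete, here is a small sketch that calls the same rec_sum on a list longer than the current recursion limit and compares it with an iterative loop. The name iter_sum is a hypothetical helper introduced only for this comparison.

```python
import sys

def rec_sum(xs):
    """Recursive sum (same as above)."""
    if len(xs) == 0:
        return 0
    return xs[0] + rec_sum(xs[1:])  # xs[1:] also copies the list on every call

def iter_sum(xs):
    """Iterative sum (hypothetical helper): uses constant stack depth."""
    total = 0
    for x in xs:
        total += x
    return total

# A list longer than the recursion limit allows
big = list(range(sys.getrecursionlimit() + 10))

try:
    rec_sum(big)
    hit_limit = False
except RecursionError:
    hit_limit = True

print(hit_limit)                  # True
print(iter_sum(big) == sum(big))  # True
```

The loop needs one stack frame no matter how long the list is, whereas rec_sum needs one frame per element.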

Anonymous functions

[17]:
lambda x, y: x + y
[17]:
<function __main__.<lambda>(x, y)>
[18]:
add = lambda x, y: x + y
[19]:
add(3, 4)
[19]:
7
[20]:
lambda x, y: x if x < y else y
[20]:
<function __main__.<lambda>(x, y)>
[21]:
smaller = lambda x, y: x if x < y else y
[22]:
smaller(9,1)
[22]:
1

Lazy evaluation

[23]:
range(10)
[23]:
range(0, 10)
[24]:
list(range(10))
[24]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
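range(10) is displayed as range(0, 10) rather than as a list because range produces its values on demand. The sketch below makes the timing of the work visible with a generator expression; noisy_square is a hypothetical helper that prints when it runs.

```python
def noisy_square(x):
    """Square x, printing a message so we can see when it runs."""
    print('computing', x)
    return x * x

# A generator expression: nothing is computed yet
lazy = (noisy_square(x) for x in range(3))
print('created generator')

first = next(lazy)  # "computing 0" is printed only now
print(first)        # 0

# range is lazy too: no trillion-element list is ever built
r = range(10**12)
print(len(r), r[10**11])  # 1000000000000 100000000000
```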

Generators

Generators behave like functions that remember their state between calls and can be resumed where they left off. Values are produced with yield rather than return. Generators, and lazy iterators in general, are used extensively in Python; most functions in the itertools and toolz packages reviewed later in this notebook consume and return them.

Differences between a function and a generator

[25]:
def fib_eager(n):
    """Eager Fibonacci function."""

    xs = []
    a, b = 1,1
    for i in range(n):
        xs.append(a)
        a, b = b, a + b
    return xs
[26]:
fib_eager(10)
[26]:
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
[27]:
def fib_lazy(n):
    """Lazy Fibonacci generator."""

    a, b = 1,1
    for i in range(n):
        yield a
        a, b = b, a + b
[28]:
fib_lazy(10)
[28]:
<generator object fib_lazy at 0x116874c80>
[29]:
list(fib_lazy(10))
[29]:
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
[30]:
fibs1 = fib_eager(10)
[31]:
for i in fibs1:
    print(i, end=',')
1,1,2,3,5,8,13,21,34,55,
[32]:
for i in fibs1:
    print(i, end=',')
1,1,2,3,5,8,13,21,34,55,
[33]:
fibs2 = fib_lazy(10)
[34]:
for i in fibs2:
    print(i, end=',')
1,1,2,3,5,8,13,21,34,55,
[35]:
for i in fibs2:
    print(i, end=',')
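The second loop over fibs2 prints nothing because the first loop exhausted the generator: a generator object yields each value once and then stays empty, while the list fibs1 can be traversed any number of times. The sketch below shows the same behaviour explicitly with next; the variable names are illustrative.

```python
def fib_lazy(n):
    """Lazy Fibonacci generator (same as above)."""
    a, b = 1, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

gen = fib_lazy(3)
print(next(gen), next(gen), next(gen))  # 1 1 2

# No values are left: next raises StopIteration,
# and a for loop or list() simply sees an empty sequence.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted, list(gen))  # True []

# Calling the generator function again creates a fresh generator
print(list(fib_lazy(3)))  # [1, 1, 2]
```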

Generators can return infinite sequences

[36]:
def iota(start=1):
    """An infinite incrementing generator."""

    while True:
        yield start
        start += 1
[37]:
for i in iota():
    print(i, end=',')
    if i > 25:
        break
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,
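Instead of breaking out of the loop by hand, itertools.islice can take a fixed number of items from an infinite generator. This sketch reuses iota and also chains a lazy filtering stage onto the infinite source; only the items actually requested are ever computed.

```python
import itertools as it

def iota(start=1):
    """An infinite incrementing generator (same as above)."""
    while True:
        yield start
        start += 1

# islice stops after 26 items, so no explicit break is needed
first_26 = list(it.islice(iota(), 26))
print(first_26[0], first_26[-1], len(first_26))  # 1 26 26

# A lazy stage on top of an infinite source
evens_squared = (x * x for x in iota() if x % 2 == 0)
print(list(it.islice(evens_squared, 5)))  # [4, 16, 36, 64, 100]
```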

Higher-order functions

A higher-order function does at least one of the following:

  • Takes a function as an argument

  • Returns a function

[38]:
def dist(x, y, measure):
    """Returns distance between x and y using given measure.

    measure is a function that takes two arguments x and y.
    """

    return measure(x, y)
[39]:
def euclid(x, y):
    """Returns Euclidean distance between x and y."""

    return np.sqrt(np.sum((x - y)**2))
[40]:
x = np.array([0,0])
y = np.array([3,4])

dist(x, y, measure=euclid)
[40]:
5.0
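Because dist takes the measure as an argument, swapping in a different notion of distance requires no change to dist itself. The sketch below uses the Euclidean distance sqrt(sum((x - y)**2)) alongside manhattan, a hypothetical extra measure added for illustration, and a lambda computing the Chebyshev (maximum-coordinate) distance.

```python
import numpy as np

def dist(x, y, measure):
    """Returns distance between x and y using given measure (as above)."""
    return measure(x, y)

def euclid(x, y):
    """Euclidean distance."""
    return np.sqrt(np.sum((x - y)**2))

def manhattan(x, y):
    """Manhattan (L1) distance; hypothetical extra measure."""
    return np.sum(np.abs(x - y))

x = np.array([0, 0])
y = np.array([3, 4])

print(dist(x, y, measure=euclid))     # 5.0
print(dist(x, y, measure=manhattan))  # 7
# Any two-argument callable works, including a lambda (Chebyshev distance)
print(dist(x, y, measure=lambda a, b: np.max(np.abs(a - b))))  # 4
```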

Standard HOFs

[41]:
from functools import reduce
[42]:
list(map(lambda x: x**2, range(10)))
[42]:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
[43]:
list(filter(lambda x: x % 2 == 0, range(10)))
[43]:
[0, 2, 4, 6, 8]
[44]:
reduce(lambda x, y: x + y, range(10))
[44]:
45
[45]:
reduce(lambda x, y: x + y, range(10), 10)
[45]:
55
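reduce is a left fold: reduce(f, [x0, x1, x2], init) computes f(f(f(init, x0), x1), x2), and the optional third argument is the initial value. That is why the last cell gives 10 + 45 = 55. The sketch below makes the evaluation order visible by building a string, and shows the role of the initial value for an empty iterable.

```python
from functools import reduce

# Record the order in which reduce combines the elements
trace = reduce(lambda acc, x: f'({acc}+{x})', [1, 2, 3], 0)
print(trace)  # (((0+1)+2)+3)

# The initial value is returned as-is for an empty iterable ...
print(reduce(lambda x, y: x + y, [], 0))  # 0

# ... while without one, reducing an empty iterable raises TypeError
try:
    reduce(lambda x, y: x + y, [])
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```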

Example: Flattening a nested list

[46]:
s1 = 'the quick brown fox'
s2 = 'jumps over the dog'
xs = [s.split() for s in [s1, s2]]
xs
[46]:
[['the', 'quick', 'brown', 'fox'], ['jumps', 'over', 'the', 'dog']]

Using a nested for loop

[47]:
ys = []
for x in xs:
    for y in x:
        ys.append(y)
ys
[47]:
['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'dog']

Using a list comprehension

[48]:
[y for x in xs for y in x]
[48]:
['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'dog']

Using reduce

[49]:
reduce(lambda x, y: x + y, xs)
[49]:
['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'dog']

Closure

[50]:
def f(a):
    """The argument given to f is visible in g when g is called."""

    def g(b):
        return a + b
    return g
[51]:
g3 = f(3)
g5 = f(5)
[52]:
g3(4), g5(4)
[52]:
(7, 9)
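The value of a captured in the closure is stored in a cell attached to the returned function, which can be inspected through the __closure__ attribute. A closure can also carry mutable state using nonlocal; make_counter below is a hypothetical example, and note that the function it returns is no longer pure, since each call gives a different result.

```python
def f(a):
    """The argument given to f is visible in g when g is called (as above)."""
    def g(b):
        return a + b
    return g

g3 = f(3)
# The captured value lives in a cell attached to the returned function
print(g3.__closure__[0].cell_contents)  # 3

def make_counter():
    """Return a function that counts how many times it has been called."""
    count = 0
    def counter():
        nonlocal count  # rebind the enclosing variable instead of creating a local
        count += 1
        return count
    return counter

c1, c2 = make_counter(), make_counter()
print(c1(), c1(), c2())  # 1 2 1
```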

Decorators

A decorator is a higher-order function that takes a function as input and returns a decorated version of the original function with additional capabilities. Python provides syntactic sugar for this: writing @decorator_function on the line above a def is equivalent to func = decorator_function(func), as shown below.

[53]:
import time
[54]:
def timer(f):
    """Times how long f takes."""

    def g(*args, **kwargs):

        start = time.time()
        res = f(*args, **kwargs)
        elapsed = time.time() - start
        return res, elapsed
    return g
[55]:
def slow(n=1):
    time.sleep(n)
    return n

Use as a function

[56]:
slow_t1 = timer(slow)
[57]:
slow_t1(0.5)
[57]:
(0.5, 0.5037732124328613)

Use as a decorator

[58]:
@timer
def slow_t2(n=1):
    time.sleep(n)
    return n
[59]:
slow_t2(0.5)
[59]:
(0.5, 0.5038459300994873)
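One side effect of the timer above is that the decorated function takes on the name and docstring of the inner function g (slow_t2.__name__ is 'g'). The standard fix is functools.wraps, sketched below. This version also swaps in time.perf_counter, which is intended for measuring short intervals; slow_t3 is a hypothetical name used for the demonstration.

```python
import functools
import time

def timer(f):
    """Times how long f takes; functools.wraps keeps f's metadata."""
    @functools.wraps(f)
    def g(*args, **kwargs):
        start = time.perf_counter()
        res = f(*args, **kwargs)
        elapsed = time.perf_counter() - start
        return res, elapsed
    return g

@timer
def slow_t3(n=1):
    """Sleep for n seconds and return n."""
    time.sleep(n)
    return n

print(slow_t3.__name__)  # slow_t3 (without wraps this would be 'g')
print(slow_t3.__doc__)   # Sleep for n seconds and return n.
res, elapsed = slow_t3(0.01)
print(res)               # 0.01
```

wraps also sets slow_t3.__wrapped__ to the original undecorated function.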

Partial application

Partial application takes a function with two or more parameters and returns a new function with some of those parameters already filled in. It is very useful when constructing pipelines that transform data in stages, because each stage can then be a function of a single argument.

[60]:
from functools import partial
[61]:
def add(a, b):
    """Add a and b."""

    return a + b
[62]:
list(map(partial(add, b=10), range(5)))
[62]:
[10, 11, 12, 13, 14]
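partial can fix keyword arguments, as with b=10 above, or leading positional arguments, and the resulting object records what it wraps. The sketch below uses a hypothetical power function and then chains partials as one-argument stages of a small pipeline.

```python
from functools import partial

def power(base, exp):
    """Raise base to exp (hypothetical example function)."""
    return base ** exp

square = partial(power, exp=2)  # fix a keyword argument
two_to = partial(power, 2)      # fix the leading positional argument

print(square(5), two_to(10))  # 25 1024
print(square.func is power, square.args, square.keywords)  # True () {'exp': 2}

# Partials as one-argument pipeline stages: square, keep odd values, sum
stages = [partial(map, square), partial(filter, lambda v: v % 2 == 1), sum]
data = range(5)
for stage in stages:
    data = stage(data)
print(data)  # 10  (1 + 9)
```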

Using operator

The operator module provides named function versions of Python's operators (op.add for +, op.itemgetter for indexing, and so on). These are typically used in place of anonymous functions as arguments to higher-order functions, to improve readability.

[63]:
import operator as op
[64]:
xs = [('a', 3), ('b', 1), ('c', 2)]
[65]:
sorted(xs, key=op.itemgetter(1))
[65]:
[('b', 1), ('c', 2), ('a', 3)]
[66]:
sorted(xs, key=lambda x: x[1])
[66]:
[('b', 1), ('c', 2), ('a', 3)]
[67]:
reduce(op.add, range(1,5))
[67]:
10
[68]:
reduce(lambda a, b: a + b, range(1,5))
[68]:
10
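A few more operator helpers that often replace lambdas, sketched on small made-up data: itemgetter with several indices gives a multi-key sort, op.mul turns reduce into a product, methodcaller calls a named method, and attrgetter reads an attribute.

```python
import operator as op
from functools import reduce

xs = [('a', 3), ('b', 1), ('c', 2), ('d', 1)]

# Sort by the number, then by the letter
print(sorted(xs, key=op.itemgetter(1, 0)))
# [('b', 1), ('d', 1), ('c', 2), ('a', 3)]

# Product of 1..5, i.e. 5! = 120
print(reduce(op.mul, range(1, 6)))  # 120

# Call .upper() on each element
print(list(map(op.methodcaller('upper'), ['x', 'y'])))  # ['X', 'Y']

# Sort complex numbers by their real part
print(sorted([3 + 1j, 1 + 5j, 2 + 0j], key=op.attrgetter('real')))
```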

Using functools

We have already seen reduce and partial from functools. Another useful function from the module is the lru_cache decorator, which memoizes a function by storing the results of previous calls.

[69]:
def fib(n):
    """Recursive version of Fibonacci."""

    print('Call fib(%d)' % n)

    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-2) + fib(n-1)

Notice the inefficiency from repeatedly calling the function with the same arguments.

[70]:
fib(6)
Call fib(6)
Call fib(4)
Call fib(2)
Call fib(0)
Call fib(1)
Call fib(3)
Call fib(1)
Call fib(2)
Call fib(0)
Call fib(1)
Call fib(5)
Call fib(3)
Call fib(1)
Call fib(2)
Call fib(0)
Call fib(1)
Call fib(4)
Call fib(2)
Call fib(0)
Call fib(1)
Call fib(3)
Call fib(1)
Call fib(2)
Call fib(0)
Call fib(1)
[70]:
8
[71]:
from functools import lru_cache
[72]:
@lru_cache(maxsize=None)
def fib_cache(n):
    """Recursive version of Fibonacci."""

    print('Call fib(%d)' % n)

    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib_cache(n-2) + fib_cache(n-1)
[73]:
fib_cache(6)
Call fib(6)
Call fib(4)
Call fib(2)
Call fib(0)
Call fib(1)
Call fib(3)
Call fib(5)
[73]:
8
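Caching is only safe for pure functions: a cached call skips the function body entirely, which is why the print side effect above appears once per distinct argument rather than once per call. The decorated function also exposes cache_info() and cache_clear(), sketched below on a print-free variant. Since Python 3.9, functools.cache is equivalent to lru_cache(maxsize=None).

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_cache(n):
    """Memoized recursive Fibonacci (without the print calls)."""
    if n < 2:
        return n
    return fib_cache(n - 2) + fib_cache(n - 1)

print(fib_cache(6))            # 8
# 7 distinct arguments (0..6) were computed once each; 4 calls hit the cache
print(fib_cache.cache_info())  # CacheInfo(hits=4, misses=7, maxsize=None, currsize=7)

fib_cache.cache_clear()        # empty the cache, e.g. between timing runs
print(fib_cache.cache_info())  # CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
```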

Using itertools

The itertools module provides building blocks for efficient looping.

[74]:
import itertools as it

Generators

[75]:
list(it.islice(it.count(), 10))
[75]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[76]:
list(it.islice(it.cycle([1,2,3]), 10))
[76]:
[1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
[77]:
list(it.islice(it.repeat([1,2,3]), 10))
[77]:
[[1, 2, 3],
 [1, 2, 3],
 [1, 2, 3],
 [1, 2, 3],
 [1, 2, 3],
 [1, 2, 3],
 [1, 2, 3],
 [1, 2, 3],
 [1, 2, 3],
 [1, 2, 3]]
[78]:
list(it.zip_longest(range(5), 'abc'))
[78]:
[(0, 'a'), (1, 'b'), (2, 'c'), (3, None), (4, None)]
[79]:
list(it.zip_longest(range(5), 'abc', fillvalue='out of donuts'))
[79]:
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'out of donuts'), (4, 'out of donuts')]

Permutations and combinations

[80]:
list(it.permutations('abc'))
[80]:
[('a', 'b', 'c'),
 ('a', 'c', 'b'),
 ('b', 'a', 'c'),
 ('b', 'c', 'a'),
 ('c', 'a', 'b'),
 ('c', 'b', 'a')]
[81]:
list(it.permutations('abc', 2))
[81]:
[('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]
[82]:
list(it.combinations('abc', 2))
[82]:
[('a', 'b'), ('a', 'c'), ('b', 'c')]
[83]:
list(it.combinations_with_replacement('abc', 2))
[83]:
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'b'), ('b', 'c'), ('c', 'c')]
[84]:
list(it.product('abc', repeat=2))
[84]:
[('a', 'a'),
 ('a', 'b'),
 ('a', 'c'),
 ('b', 'a'),
 ('b', 'b'),
 ('b', 'c'),
 ('c', 'a'),
 ('c', 'b'),
 ('c', 'c')]

Miscellaneous

[85]:
list(it.chain([1,2,3], [4,5,6], [7,8,9]))
[85]:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[86]:
list(it.chain.from_iterable([[1,2,3], [4,5,6], [7,8,9]]))
[86]:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[87]:
nums = [1,3,5,7,2,4,6,1,3,5,7,9,2,2,2]
[88]:
for i, g in it.groupby(nums, key=lambda x: x % 2 ==0):
    print(i, list(g))
False [1, 3, 5, 7]
True [2, 4, 6]
False [1, 3, 5, 7, 9]
True [2, 2, 2]
[89]:
list(it.takewhile(lambda x: x % 2 == 1, nums))
[89]:
[1, 3, 5, 7]
[90]:
list(it.dropwhile(lambda x: x % 2 == 1, nums))
[90]:
[2, 4, 6, 1, 3, 5, 7, 9, 2, 2, 2]
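As the groupby output shows (False appears twice), groupby only merges runs of consecutive items with the same key. To get exactly one group per key, sort by that key first, which is what the word-count example below does before grouping. A minimal sketch on made-up data:

```python
import itertools as it

words = 'b a b a'.split()

# Consecutive runs only: four groups
print([(k, len(list(g))) for k, g in it.groupby(words)])
# [('b', 1), ('a', 1), ('b', 1), ('a', 1)]

# Sorted first: one group per distinct word
print([(k, len(list(g))) for k, g in it.groupby(sorted(words))])
# [('a', 2), ('b', 2)]
```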

Example: Classic word-count using map-reduce

There are three basic steps: first map each word to a (word, 1) tuple, then group the tuples by word, and finally reduce each group by summing the ones.

[91]:
rhyme = 'humpty dumpty sat on a wall humpty dumpty had a great fall'
words = rhyme.split()

Map to create key-value pair

[92]:
x1 = map(lambda x: (x, 1), words)

Group similar keys together

[93]:
x2 = it.groupby(sorted(x1), key=lambda x: x[0])

Reduce on value of key-value pair

[94]:
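x3 = map(lambda x: (x[0], reduce(lambda acc, kv: acc + kv[1], x[1], 0)), x2)

Starting the reduction from 0 means every group sums to an int whatever its size, so no clean-up step is needed. Without the initial value, a single-element group would reduce to the (word, 1) tuple itself, and a reducer such as lambda a, b: a[1] + b[1] would fail on any group of three or more, because after the first step the running total is an int.

[96]:
list(x3)
[96]: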
[('a', 2),
 ('dumpty', 2),
 ('fall', 1),
 ('great', 1),
 ('had', 1),
 ('humpty', 2),
 ('on', 1),
 ('sat', 1),
 ('wall', 1)]

Using toolz

The toolz package provides a very rich set of functional operators, and is recommended if you want to program in the functional style using Python.

[97]:
import toolz
[98]:
from toolz import sliding_window, pipe, frequencies, concat
[99]:
import toolz.curried as c

Example word count revisited

[100]:
frequencies(words)
[100]:
{'humpty': 2,
 'dumpty': 2,
 'sat': 1,
 'on': 1,
 'a': 2,
 'wall': 1,
 'had': 1,
 'great': 1,
 'fall': 1}

Example

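Read in some documents, remove punctuation, split into words and then into individual characters, and find the most commonly occurring sliding windows of 3 characters. Because the characters of all the words are concatenated, windows can span word boundaries (for example ('h', 'e', 'd') from "the doo").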

This is most naturally done by piping in a data set through a series of transforms.
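pipe(data, f, g, h) evaluates h(g(f(data))), and the functions in toolz.curried (such as c.map) return a partially applied function when called with only their leading arguments, which is what lets them serve as pipe stages. To make the mechanics concrete, here is a standard-library-only sketch of the same kind of pipeline on a shorter, made-up pair of documents. my_pipe and windows are hypothetical stand-ins for toolz.pipe and toolz.sliding_window, functools.partial plays the role of the curried functions, itertools.chain.from_iterable replaces concat, and collections.Counter replaces frequencies; the toolz implementations may differ in detail.

```python
import itertools as it
import string
from collections import Counter
from functools import partial, reduce

def my_pipe(data, *funcs):
    """Hypothetical stand-in for toolz.pipe: apply funcs left to right."""
    return reduce(lambda acc, f: f(acc), funcs, data)

def windows(n, seq):
    """Hypothetical sliding window of size n over seq, yielding tuples."""
    seq = list(seq)
    return (tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

strip_punct = str.maketrans('', '', string.punctuation)
docs_demo = ['the doo doo', 'every breath, she takes']

counts = my_pipe(
    docs_demo,
    partial(map, lambda s: s.translate(strip_punct)),  # like c.map(...)
    partial(map, str.split),                           # split into words
    it.chain.from_iterable,                            # like concat: words
    it.chain.from_iterable,                            # like concat: characters
    partial(windows, 3),                               # like c.sliding_window(3)
    Counter,                                           # like frequencies
)
print(counts[('d', 'o', 'o')])  # 2
```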

[101]:
d1 = 'the doo doo doo the dah dah dah'
d2 = 'every breath she takes, every move she makes'
d3 = 'another brick in the wall'
d4 = 'another one bites the dust and another one gone'
docs = [d1,d2,d3,d4]
[102]:
import string
[103]:
triple_freqs = pipe(
    docs,
    c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),
    c.map(lambda x: x.split()),
    concat,
    concat,
    c.sliding_window(3),
    frequencies,
)
[104]:
sorted(triple_freqs.items(), key=lambda x: x[1], reverse=True)[:5]
[104]:
[(('t', 'h', 'e'), 7),
 (('o', 't', 'h'), 4),
 (('h', 'e', 'd'), 3),
 (('d', 'o', 'o'), 3),
 (('d', 'a', 'h'), 3)]

Step by step

[108]:
pipe(
    docs,
    list
)
[108]:
['the doo doo doo the dah dah dah',
 'every breath she takes, every move she makes',
 'another brick in the wall',
 'another one bites the dust and another one gone']
[109]:
pipe(
    docs,
    c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),
    list
)
[109]:
['the doo doo doo the dah dah dah',
 'every breath she takes every move she makes',
 'another brick in the wall',
 'another one bites the dust and another one gone']
[110]:
pipe(
    docs,
    c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),
    c.map(lambda x: x.split()),
    list
)
[110]:
[['the', 'doo', 'doo', 'doo', 'the', 'dah', 'dah', 'dah'],
 ['every', 'breath', 'she', 'takes', 'every', 'move', 'she', 'makes'],
 ['another', 'brick', 'in', 'the', 'wall'],
 ['another', 'one', 'bites', 'the', 'dust', 'and', 'another', 'one', 'gone']]
[111]:
pipe(
    docs,
    c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),
    c.map(lambda x: x.split()),
    concat,
    list
)
[111]:
['the',
 'doo',
 'doo',
 'doo',
 'the',
 'dah',
 'dah',
 'dah',
 'every',
 'breath',
 'she',
 'takes',
 'every',
 'move',
 'she',
 'makes',
 'another',
 'brick',
 'in',
 'the',
 'wall',
 'another',
 'one',
 'bites',
 'the',
 'dust',
 'and',
 'another',
 'one',
 'gone']
[112]:
pipe(
    docs,
    c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),
    c.map(lambda x: x.split()),
    concat,
    concat,
    list
)
[112]:
['t',
 'h',
 'e',
 'd',
 'o',
 'o',
 'd',
 'o',
 'o',
 'd',
 'o',
 'o',
 't',
 'h',
 'e',
 'd',
 'a',
 'h',
 'd',
 'a',
 'h',
 'd',
 'a',
 'h',
 'e',
 'v',
 'e',
 'r',
 'y',
 'b',
 'r',
 'e',
 'a',
 't',
 'h',
 's',
 'h',
 'e',
 't',
 'a',
 'k',
 'e',
 's',
 'e',
 'v',
 'e',
 'r',
 'y',
 'm',
 'o',
 'v',
 'e',
 's',
 'h',
 'e',
 'm',
 'a',
 'k',
 'e',
 's',
 'a',
 'n',
 'o',
 't',
 'h',
 'e',
 'r',
 'b',
 'r',
 'i',
 'c',
 'k',
 'i',
 'n',
 't',
 'h',
 'e',
 'w',
 'a',
 'l',
 'l',
 'a',
 'n',
 'o',
 't',
 'h',
 'e',
 'r',
 'o',
 'n',
 'e',
 'b',
 'i',
 't',
 'e',
 's',
 't',
 'h',
 'e',
 'd',
 'u',
 's',
 't',
 'a',
 'n',
 'd',
 'a',
 'n',
 'o',
 't',
 'h',
 'e',
 'r',
 'o',
 'n',
 'e',
 'g',
 'o',
 'n',
 'e']
[113]:
pipe(
    docs,
    c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),
    c.map(lambda x: x.split()),
    concat,
    concat,
    c.sliding_window(3),
    list
)
[113]:
[('t', 'h', 'e'),
 ('h', 'e', 'd'),
 ('e', 'd', 'o'),
 ('d', 'o', 'o'),
 ('o', 'o', 'd'),
 ('o', 'd', 'o'),
 ('d', 'o', 'o'),
 ('o', 'o', 'd'),
 ('o', 'd', 'o'),
 ('d', 'o', 'o'),
 ('o', 'o', 't'),
 ('o', 't', 'h'),
 ('t', 'h', 'e'),
 ('h', 'e', 'd'),
 ('e', 'd', 'a'),
 ('d', 'a', 'h'),
 ('a', 'h', 'd'),
 ('h', 'd', 'a'),
 ('d', 'a', 'h'),
 ('a', 'h', 'd'),
 ('h', 'd', 'a'),
 ('d', 'a', 'h'),
 ('a', 'h', 'e'),
 ('h', 'e', 'v'),
 ('e', 'v', 'e'),
 ('v', 'e', 'r'),
 ('e', 'r', 'y'),
 ('r', 'y', 'b'),
 ('y', 'b', 'r'),
 ('b', 'r', 'e'),
 ('r', 'e', 'a'),
 ('e', 'a', 't'),
 ('a', 't', 'h'),
 ('t', 'h', 's'),
 ('h', 's', 'h'),
 ('s', 'h', 'e'),
 ('h', 'e', 't'),
 ('e', 't', 'a'),
 ('t', 'a', 'k'),
 ('a', 'k', 'e'),
 ('k', 'e', 's'),
 ('e', 's', 'e'),
 ('s', 'e', 'v'),
 ('e', 'v', 'e'),
 ('v', 'e', 'r'),
 ('e', 'r', 'y'),
 ('r', 'y', 'm'),
 ('y', 'm', 'o'),
 ('m', 'o', 'v'),
 ('o', 'v', 'e'),
 ('v', 'e', 's'),
 ('e', 's', 'h'),
 ('s', 'h', 'e'),
 ('h', 'e', 'm'),
 ('e', 'm', 'a'),
 ('m', 'a', 'k'),
 ('a', 'k', 'e'),
 ('k', 'e', 's'),
 ('e', 's', 'a'),
 ('s', 'a', 'n'),
 ('a', 'n', 'o'),
 ('n', 'o', 't'),
 ('o', 't', 'h'),
 ('t', 'h', 'e'),
 ('h', 'e', 'r'),
 ('e', 'r', 'b'),
 ('r', 'b', 'r'),
 ('b', 'r', 'i'),
 ('r', 'i', 'c'),
 ('i', 'c', 'k'),
 ('c', 'k', 'i'),
 ('k', 'i', 'n'),
 ('i', 'n', 't'),
 ('n', 't', 'h'),
 ('t', 'h', 'e'),
 ('h', 'e', 'w'),
 ('e', 'w', 'a'),
 ('w', 'a', 'l'),
 ('a', 'l', 'l'),
 ('l', 'l', 'a'),
 ('l', 'a', 'n'),
 ('a', 'n', 'o'),
 ('n', 'o', 't'),
 ('o', 't', 'h'),
 ('t', 'h', 'e'),
 ('h', 'e', 'r'),
 ('e', 'r', 'o'),
 ('r', 'o', 'n'),
 ('o', 'n', 'e'),
 ('n', 'e', 'b'),
 ('e', 'b', 'i'),
 ('b', 'i', 't'),
 ('i', 't', 'e'),
 ('t', 'e', 's'),
 ('e', 's', 't'),
 ('s', 't', 'h'),
 ('t', 'h', 'e'),
 ('h', 'e', 'd'),
 ('e', 'd', 'u'),
 ('d', 'u', 's'),
 ('u', 's', 't'),
 ('s', 't', 'a'),
 ('t', 'a', 'n'),
 ('a', 'n', 'd'),
 ('n', 'd', 'a'),
 ('d', 'a', 'n'),
 ('a', 'n', 'o'),
 ('n', 'o', 't'),
 ('o', 't', 'h'),
 ('t', 'h', 'e'),
 ('h', 'e', 'r'),
 ('e', 'r', 'o'),
 ('r', 'o', 'n'),
 ('o', 'n', 'e'),
 ('n', 'e', 'g'),
 ('e', 'g', 'o'),
 ('g', 'o', 'n'),
 ('o', 'n', 'e')]
[114]:
pipe(
    docs,
    c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),
    c.map(lambda x: x.split()),
    concat,
    concat,
    c.sliding_window(3),
    frequencies
)
[114]:
{('t', 'h', 'e'): 7,
 ('h', 'e', 'd'): 3,
 ('e', 'd', 'o'): 1,
 ('d', 'o', 'o'): 3,
 ('o', 'o', 'd'): 2,
 ('o', 'd', 'o'): 2,
 ('o', 'o', 't'): 1,
 ('o', 't', 'h'): 4,
 ('e', 'd', 'a'): 1,
 ('d', 'a', 'h'): 3,
 ('a', 'h', 'd'): 2,
 ('h', 'd', 'a'): 2,
 ('a', 'h', 'e'): 1,
 ('h', 'e', 'v'): 1,
 ('e', 'v', 'e'): 2,
 ('v', 'e', 'r'): 2,
 ('e', 'r', 'y'): 2,
 ('r', 'y', 'b'): 1,
 ('y', 'b', 'r'): 1,
 ('b', 'r', 'e'): 1,
 ('r', 'e', 'a'): 1,
 ('e', 'a', 't'): 1,
 ('a', 't', 'h'): 1,
 ('t', 'h', 's'): 1,
 ('h', 's', 'h'): 1,
 ('s', 'h', 'e'): 2,
 ('h', 'e', 't'): 1,
 ('e', 't', 'a'): 1,
 ('t', 'a', 'k'): 1,
 ('a', 'k', 'e'): 2,
 ('k', 'e', 's'): 2,
 ('e', 's', 'e'): 1,
 ('s', 'e', 'v'): 1,
 ('r', 'y', 'm'): 1,
 ('y', 'm', 'o'): 1,
 ('m', 'o', 'v'): 1,
 ('o', 'v', 'e'): 1,
 ('v', 'e', 's'): 1,
 ('e', 's', 'h'): 1,
 ('h', 'e', 'm'): 1,
 ('e', 'm', 'a'): 1,
 ('m', 'a', 'k'): 1,
 ('e', 's', 'a'): 1,
 ('s', 'a', 'n'): 1,
 ('a', 'n', 'o'): 3,
 ('n', 'o', 't'): 3,
 ('h', 'e', 'r'): 3,
 ('e', 'r', 'b'): 1,
 ('r', 'b', 'r'): 1,
 ('b', 'r', 'i'): 1,
 ('r', 'i', 'c'): 1,
 ('i', 'c', 'k'): 1,
 ('c', 'k', 'i'): 1,
 ('k', 'i', 'n'): 1,
 ('i', 'n', 't'): 1,
 ('n', 't', 'h'): 1,
 ('h', 'e', 'w'): 1,
 ('e', 'w', 'a'): 1,
 ('w', 'a', 'l'): 1,
 ('a', 'l', 'l'): 1,
 ('l', 'l', 'a'): 1,
 ('l', 'a', 'n'): 1,
 ('e', 'r', 'o'): 2,
 ('r', 'o', 'n'): 2,
 ('o', 'n', 'e'): 3,
 ('n', 'e', 'b'): 1,
 ('e', 'b', 'i'): 1,
 ('b', 'i', 't'): 1,
 ('i', 't', 'e'): 1,
 ('t', 'e', 's'): 1,
 ('e', 's', 't'): 1,
 ('s', 't', 'h'): 1,
 ('e', 'd', 'u'): 1,
 ('d', 'u', 's'): 1,
 ('u', 's', 't'): 1,
 ('s', 't', 'a'): 1,
 ('t', 'a', 'n'): 1,
 ('a', 'n', 'd'): 1,
 ('n', 'd', 'a'): 1,
 ('d', 'a', 'n'): 1,
 ('n', 'e', 'g'): 1,
 ('e', 'g', 'o'): 1,
 ('g', 'o', 'n'): 1}