Functional programming in Python (operator, functools, itertools, toolz)¶
Pure functions
Recursive functions
Anonymous functions
Lazy evaluation
Higher-order functions
Decorators
Partial application
Using operator
Using functools
Using itertools
Pipelines with toolz
In [1]:
import numpy as np
Pure functions¶
Deterministic¶
Pure
In [2]:
np.exp(5), np.exp(5)
Out[2]:
(148.4131591025766, 148.4131591025766)
Not pure
In [3]:
np.random.randn(), np.random.randn()
Out[3]:
(0.6256954394060825, 1.4702462450747522)
No side effects¶
Pure
In [4]:
def f(xs):
    '''Modify value at first index.'''
    if len(xs) > 0:
        xs = list(xs)
        xs[0] = '@'
    return xs
In [5]:
xs = [1,2,3]
f(xs), xs
Out[5]:
(['@', 2, 3], [1, 2, 3])
Not pure
In [6]:
def g(xs):
    '''Modify value at first index.'''
    if len(xs) > 0:
        xs[0] = '@'
    return xs
In [7]:
xs = [1,2,3]
g(xs), xs
Out[7]:
(['@', 2, 3], ['@', 2, 3])
Exercise¶
Is the function h pure or impure?
In [8]:
def h(n, xs=[]):
    for i in range(n):
        xs.append(i)
    return xs
The function is not deterministic, and it has side effects!
In [9]:
n = 5
xs = [1,2,3]
Non-deterministic
In [10]:
h(n)
Out[10]:
[0, 1, 2, 3, 4]
In [11]:
h(n)
Out[11]:
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
To avoid non-determinism, do not use mutable default arguments. The usual Python idiom is
def func(xs=None):
    """Docstring."""
    if xs is None:
        xs = []
    do_something(xs)
Side effects
In [12]:
xs = [1,2,3]
In [13]:
h(n, xs)
Out[13]:
[1, 2, 3, 0, 1, 2, 3, 4]
In [14]:
xs
Out[14]:
[1, 2, 3, 0, 1, 2, 3, 4]
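Both problems with h can be avoided together. The following is a minimal sketch, not part of the original notebook: h_pure is a hypothetical name, and it combines the None-default idiom with a defensive copy so that the caller's list is never modified.

```python
def h_pure(n, xs=None):
    """Return a new list: the items of xs followed by 0..n-1.

    Hypothetical pure variant of h: a None default avoids the shared
    mutable default, and list(xs) copies the caller's list.
    """
    base = [] if xs is None else list(xs)
    return base + list(range(n))


xs = [1, 2, 3]
print(h_pure(5))      # same result on every call
print(h_pure(5))
print(h_pure(5, xs))  # new list built from a copy of xs
print(xs)             # the original list is unchanged
```

Because h_pure never appends to an object it did not create, repeated calls with the same arguments give the same result.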
Recursive functions¶
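A recursive function is one that calls itself. Python supports recursion, but CPython does not perform tail-call optimization and limits recursion depth (1000 frames by default), so deep recursion is slow and can raise RecursionError. Iterative algorithms are generally preferred.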
In [15]:
def rec_sum(xs):
    """Recursive sum."""
    if len(xs) == 0:
        return 0
    else:
        return xs[0] + rec_sum(xs[1:])
In [16]:
rec_sum([1,2,3,4])
Out[16]:
10
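To illustrate why iteration is usually preferred, here is a small sketch comparing rec_sum with an iterative version. iter_sum is a hypothetical name introduced for this example, and the list length is chosen from sys.getrecursionlimit() on the assumption that the interpreter's limit has not been raised to a very large value.

```python
import sys


def rec_sum(xs):
    """Recursive sum, as in the cell above."""
    if len(xs) == 0:
        return 0
    return xs[0] + rec_sum(xs[1:])


def iter_sum(xs):
    """Iterative sum: a single loop, no extra stack frames or list copies."""
    total = 0
    for x in xs:
        total += x
    return total


# A list slightly longer than the interpreter's recursion limit.
n = sys.getrecursionlimit() + 100
big = list(range(n))

try:
    rec_sum(big)
    hit_limit = False
except RecursionError:
    hit_limit = True

print('recursive version hit the recursion limit:', hit_limit)
print('iterative version is correct:', iter_sum(big) == n * (n - 1) // 2)
```

The recursive version also copies the tail of the list on every call (xs[1:]), so it does quadratic work even when it stays within the limit.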
Anonymous functions¶
In [17]:
lambda x, y: x + y
Out[17]:
<function __main__.<lambda>(x, y)>
In [18]:
add = lambda x, y: x + y
In [19]:
add(3, 4)
Out[19]:
7
In [20]:
lambda x, y: x if x < y else y
Out[20]:
<function __main__.<lambda>(x, y)>
In [21]:
smaller = lambda x, y: x if x < y else y
In [22]:
smaller(9,1)
Out[22]:
1
Lazy evaluation¶
In [23]:
range(10)
Out[23]:
range(0, 10)
In [24]:
list(range(10))
Out[24]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Generators¶
Generators behave like functions that retain their last state and can be re-entered. Results are returned with yield rather than return.
Generators are used extensively in Python, and the itertools and toolz packages we will review later in this notebook are built almost exclusively around this kind of lazy iteration.
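The "retain the last state and can be re-entered" behavior can be seen by stepping a generator by hand with next(). This is a minimal sketch; countdown is a hypothetical example function, not part of the original notebook.

```python
def countdown(n):
    """Yield n, n-1, ..., 1, pausing after each yield."""
    while n > 0:
        yield n   # execution pauses here and resumes on the next request
        n -= 1


gen = countdown(3)
first = next(gen)    # runs the body up to the first yield
second = next(gen)   # resumes after the yield, with n remembered
rest = list(gen)     # consumes whatever is left
again = list(gen)    # the generator is now exhausted

print(first, second, rest, again)
```

Once exhausted, a generator yields nothing more; a fresh call to countdown is needed to iterate again.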
Differences between a function and a generator¶
In [25]:
def fib_eager(n):
    """Eager Fibonacci function."""
    xs = []
    a, b = 1,1
    for i in range(n):
        xs.append(a)
        a, b = b, a + b
    return xs
In [26]:
fib_eager(10)
Out[26]:
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
In [27]:
def fib_lazy(n):
    """Lazy Fibonacci generator."""
    a, b = 1,1
    for i in range(n):
        yield a
        a, b = b, a + b
In [28]:
fib_lazy(10)
Out[28]:
<generator object fib_lazy at 0x103b668b8>
In [29]:
list(fib_lazy(10))
Out[29]:
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
In [30]:
fibs1 = fib_eager(10)
In [31]:
for i in fibs1:
    print(i, end=',')
1,1,2,3,5,8,13,21,34,55,
In [32]:
for i in fibs1:
    print(i, end=',')
1,1,2,3,5,8,13,21,34,55,
In [33]:
fibs2 = fib_lazy(10)
In [34]:
for i in fibs2:
    print(i, end=',')
1,1,2,3,5,8,13,21,34,55,
In [35]:
for i in fibs2:
    print(i, end=',')
Generators can return infinite sequences¶
In [36]:
def iota(start=1):
    """An infinite incrementing generator."""
    while True:
        yield start
        start += 1
In [37]:
for i in iota():
    print(i, end=',')
    if i > 25:
        break
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,
Higher-order functions¶
A higher-order function does at least one of the following:
Takes a function as an argument
Returns a function
In [38]:
def dist(x, y, measure):
    """Returns distance between x and y using given measure.

    measure is a function that takes two arguments x and y.
    """
    return measure(x, y)
In [39]:
def euclid(x, y):
    """Returns Euclidean distance between x and y."""
    return np.sqrt(np.sum((x - y)**2))
In [40]:
x = np.array([0,0])
y = np.array([3,4])
dist(x, y, measure=euclid)
Out[40]:
5.0
Standard HOFs¶
In [41]:
from functools import reduce
In [42]:
list(map(lambda x: x**2, range(10)))
Out[42]:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
In [43]:
list(filter(lambda x: x % 2 == 0, range(10)))
Out[43]:
[0, 2, 4, 6, 8]
In [44]:
reduce(lambda x, y: x + y, range(10))
Out[44]:
45
In [45]:
reduce(lambda x, y: x + y, range(10), 10)
Out[45]:
55
Example: Flattening a nested list¶
In [46]:
s1 = 'the quick brown fox'
s2 = 'jumps over the dog'
xs = [s.split() for s in [s1, s2]]
xs
Out[46]:
[['the', 'quick', 'brown', 'fox'], ['jumps', 'over', 'the', 'dog']]
Using a nested for loop
In [47]:
ys = []
for x in xs:
for y in x:
ys.append(y)
ys
Out[47]:
['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'dog']
Using a list comprehension
In [48]:
[y for x in xs for y in x]
Out[48]:
['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'dog']
Using reduce
In [49]:
reduce(lambda x, y: x + y, xs)
Out[49]:
['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'dog']
Closure¶
In [50]:
def f(a):
    """The argument given to f is visible in g when g is called."""
    def g(b):
        return a + b
    return g
In [51]:
g3 = f(3)
g5 = f(5)
In [52]:
g3(4), g5(4)
Out[52]:
(7, 9)
Decorators¶
A decorator is a higher-order function that takes a function as input and returns a decorated version of the original function with additional capabilities. Python provides syntactic sugar for decorators using the @decorator_function notation, as shown below.
In [53]:
import time
In [54]:
def timer(f):
    """Times how long f takes."""
    def g(*args, **kwargs):
        start = time.time()
        res = f(*args, **kwargs)
        elapsed = time.time() - start
        return res, elapsed
    return g
In [55]:
def slow(n=1):
    time.sleep(n)
    return n
Use as a function
In [56]:
slow_t1 = timer(slow)
In [57]:
slow_t1(0.5)
Out[57]:
(0.5, 0.5016078948974609)
Use as a decorator
In [58]:
@timer
def slow_t2(n=1):
    time.sleep(n)
    return n
In [59]:
slow_t2(0.5)
Out[59]:
(0.5, 0.5036330223083496)
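The timer decorator above returns the inner function g, so the decorated function's name and docstring are replaced by those of g. The following is a minimal sketch of one way to preserve that metadata. It assumes the standard-library functools.wraps and time.perf_counter, neither of which is used in the original cells.

```python
import time
from functools import wraps


def timer(f):
    """Times how long f takes, keeping f's name and docstring."""
    @wraps(f)  # copies __name__, __doc__, etc. from f onto g
    def g(*args, **kwargs):
        start = time.perf_counter()  # monotonic clock intended for timing
        res = f(*args, **kwargs)
        elapsed = time.perf_counter() - start
        return res, elapsed
    return g


@timer
def slow(n=1):
    """Sleep for n seconds and return n."""
    time.sleep(n)
    return n


res, elapsed = slow(0.01)
print(slow.__name__, res, elapsed > 0)
```

Without @wraps(f), slow.__name__ would be 'g' and the docstring would be lost, which makes debugging and generated documentation harder to read.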
Partial application¶
Partial application takes a function with two or more parameters and returns a new function with some of those parameters already filled in. It is very useful when constructing pipelines that transform data in stages.
In [60]:
from functools import partial
In [61]:
def add(a, b):
    """Add a and b."""
    return a + b
In [62]:
list(map(partial(add, b=10), range(5)))
Out[62]:
[10, 11, 12, 13, 14]
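Arguments can be fixed either by position or by keyword. A short sketch follows; power, square, and two_to_the are hypothetical names chosen for illustration.

```python
from functools import partial


def power(base, exponent):
    """Return base raised to exponent."""
    return base ** exponent


square = partial(power, exponent=2)  # fix the second parameter by keyword
two_to_the = partial(power, 2)       # fix the first parameter by position

squares = list(map(square, range(5)))
powers_of_two = list(map(two_to_the, range(5)))

print(squares)
print(powers_of_two)
```

Each partial object is a one-argument function, which is the shape that map and pipeline stages expect.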
Using operator¶
The operator module provides named function versions of Python's operators. It is typically used in place of anonymous functions passed to higher-order functions, to improve readability.
In [63]:
import operator as op
In [64]:
xs = [('a', 3), ('b', 1), ('c', 2)]
In [65]:
sorted(xs, key=op.itemgetter(1))
Out[65]:
[('b', 1), ('c', 2), ('a', 3)]
In [66]:
sorted(xs, key=lambda x: x[1])
Out[66]:
[('b', 1), ('c', 2), ('a', 3)]
In [67]:
reduce(op.add, range(1,5))
Out[67]:
10
In [68]:
reduce(lambda a, b: a + b, range(1,5))
Out[68]:
10
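A few more functions from the operator module, sketched with made-up data (the records list and variable names are assumptions for illustration only):

```python
import operator as op
from functools import reduce

# op.mul as the reducing function: the product 1 * 2 * 3 * 4 * 5
factorial_5 = reduce(op.mul, range(1, 6))

# itemgetter also works with dictionary keys
records = [{'name': 'a', 'n': 3}, {'name': 'b', 'n': 1}]
by_n = sorted(records, key=op.itemgetter('n'))

# methodcaller builds a function that calls the named method on its argument
shouted = list(map(op.methodcaller('upper'), ['ab', 'cd']))

print(factorial_5)
print(by_n)
print(shouted)
```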
Using functools¶
We have already seen the use of reduce and partial from functools. Another useful function from the module is the lru_cache decorator.
In [69]:
def fib(n):
    """Recursive version of Fibonacci."""
    print('Call fib(%d)' % n)
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-2) + fib(n-1)
Notice the inefficiency from repeatedly calling the function with the same arguments.
In [70]:
fib(6)
Call fib(6)
Call fib(4)
Call fib(2)
Call fib(0)
Call fib(1)
Call fib(3)
Call fib(1)
Call fib(2)
Call fib(0)
Call fib(1)
Call fib(5)
Call fib(3)
Call fib(1)
Call fib(2)
Call fib(0)
Call fib(1)
Call fib(4)
Call fib(2)
Call fib(0)
Call fib(1)
Call fib(3)
Call fib(1)
Call fib(2)
Call fib(0)
Call fib(1)
Out[70]:
8
In [71]:
from functools import lru_cache
In [72]:
@lru_cache(maxsize=None)
def fib_cache(n):
    """Recursive version of Fibonacci."""
    print('Call fib(%d)' % n)
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib_cache(n-2) + fib_cache(n-1)
In [73]:
fib_cache(6)
Call fib(6)
Call fib(4)
Call fib(2)
Call fib(0)
Call fib(1)
Call fib(3)
Call fib(5)
Out[73]:
8
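The saving can also be checked without print statements: functions wrapped by lru_cache expose a cache_info() method reporting hits and misses. The sketch below redefines fib_cache without the print call so that only the counters show the calls.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib_cache(n):
    """Memoized recursive Fibonacci."""
    if n < 2:
        return n
    return fib_cache(n - 2) + fib_cache(n - 1)


result = fib_cache(6)
info = fib_cache.cache_info()  # named tuple: hits, misses, maxsize, currsize

print(result)
print(info)
```

Each distinct argument from 0 to 6 is computed once (a miss), and every repeated request is served from the cache (a hit).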
Using itertools¶
The itertools module provides tools for efficient looping.
In [74]:
import itertools as it
Generators¶
In [75]:
list(it.islice(it.count(), 10))
Out[75]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [76]:
list(it.islice(it.cycle([1,2,3]), 10))
Out[76]:
[1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
In [77]:
list(it.islice(it.repeat([1,2,3]), 10))
Out[77]:
[[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3]]
In [78]:
list(it.zip_longest(range(5), 'abc'))
Out[78]:
[(0, 'a'), (1, 'b'), (2, 'c'), (3, None), (4, None)]
In [79]:
list(it.zip_longest(range(5), 'abc', fillvalue='out of donuts'))
Out[79]:
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'out of donuts'), (4, 'out of donuts')]
Permutations and combinations¶
In [80]:
list(it.permutations('abc'))
Out[80]:
[('a', 'b', 'c'),
('a', 'c', 'b'),
('b', 'a', 'c'),
('b', 'c', 'a'),
('c', 'a', 'b'),
('c', 'b', 'a')]
In [81]:
list(it.permutations('abc', 2))
Out[81]:
[('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]
In [82]:
list(it.combinations('abc', 2))
Out[82]:
[('a', 'b'), ('a', 'c'), ('b', 'c')]
In [83]:
list(it.combinations_with_replacement('abc', 2))
Out[83]:
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'b'), ('b', 'c'), ('c', 'c')]
In [84]:
list(it.product('abc', repeat=2))
Out[84]:
[('a', 'a'),
('a', 'b'),
('a', 'c'),
('b', 'a'),
('b', 'b'),
('b', 'c'),
('c', 'a'),
('c', 'b'),
('c', 'c')]
Miscellaneous¶
In [85]:
list(it.chain([1,2,3], [4,5,6], [7,8,9]))
Out[85]:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
In [86]:
list(it.chain.from_iterable([[1,2,3], [4,5,6], [7,8,9]]))
Out[86]:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
In [87]:
nums = [1,3,5,7,2,4,6,1,3,5,7,9,2,2,2]
In [88]:
for i, g in it.groupby(nums, key=lambda x: x % 2 == 0):
    print(i, list(g))
False [1, 3, 5, 7]
True [2, 4, 6]
False [1, 3, 5, 7, 9]
True [2, 2, 2]
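As the output above shows, groupby only groups consecutive items with the same key, so a key can appear more than once. A minimal sketch of getting one group per key by sorting on the same key first (is_even and grouped are names introduced for this example):

```python
import itertools as it

nums = [1, 3, 5, 7, 2, 4, 6, 1, 3, 5, 7, 9, 2, 2, 2]


def is_even(x):
    """Grouping key: True for even numbers, False for odd ones."""
    return x % 2 == 0


# sorted is stable, so items keep their original order within each key
ordered = sorted(nums, key=is_even)
grouped = {k: list(g) for k, g in it.groupby(ordered, key=is_even)}

print(grouped)
```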
In [89]:
list(it.takewhile(lambda x: x % 2 == 1, nums))
Out[89]:
[1, 3, 5, 7]
In [90]:
list(it.dropwhile(lambda x: x % 2 == 1, nums))
Out[90]:
[2, 4, 6, 1, 3, 5, 7, 9, 2, 2, 2]
Using toolz¶
The toolz package provides a very rich set of functional operators, and is recommended if you want to program in the functional style using Python.
In [91]:
import toolz
In [92]:
from toolz import sliding_window, pipe, frequencies, concat
In [93]:
import toolz.curried as c
Example: Read in some documents, remove punctuation, split into words and then into individual characters, and find the most commonly occurring sliding windows containing 3 characters.
This is most naturally done by piping in a data set through a series of transforms.
In [94]:
d1 = 'the doo doo doo the dah dah dah'
d2 = 'every breath she takes, every move she makes'
d3 = 'another brick in the wall'
d4 = 'another one bites the dust and another one gone'
docs = [d1,d2,d3,d4]
In [95]:
import string
In [96]:
triple_freqs = pipe(
docs,
c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),
c.map(lambda x: x.split()),
concat,
concat,
c.sliding_window(3),
frequencies,
)
In [97]:
sorted(triple_freqs.items(), key=lambda x: x[1], reverse=True)[:5]
Out[97]:
[(('t', 'h', 'e'), 7),
(('o', 't', 'h'), 4),
(('h', 'e', 'd'), 3),
(('d', 'o', 'o'), 3),
(('d', 'a', 'h'), 3)]
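To make the idea of piping concrete, here is a standard-library-only sketch of the same computation. pipe_sketch and windows are hypothetical stand-ins written for this example; they are not toolz's implementation, and they assume only that a pipeline threads a value through a sequence of one-argument functions.

```python
import string
from collections import Counter
from functools import reduce
from itertools import chain


def pipe_sketch(data, *funcs):
    """Feed data through funcs from left to right (stand-in for a pipe)."""
    return reduce(lambda acc, f: f(acc), funcs, data)


def windows(seq, n):
    """Return all length-n sliding windows over seq as tuples."""
    seq = list(seq)
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


docs = [
    'the doo doo doo the dah dah dah',
    'every breath she takes, every move she makes',
    'another brick in the wall',
    'another one bites the dust and another one gone',
]
strip_punct = str.maketrans('', '', string.punctuation)

triple_freqs = pipe_sketch(
    docs,
    lambda ds: (d.translate(strip_punct) for d in ds),  # remove punctuation
    lambda ds: (d.split() for d in ds),                 # split into words
    chain.from_iterable,                                # flatten to words
    chain.from_iterable,                                # flatten to characters
    lambda chars: windows(chars, 3),                    # 3-character windows
    Counter,                                            # count each window
)

print(triple_freqs.most_common(2))
```

Each stage takes the previous stage's output as its only argument, which is why curried or partially applied functions fit pipelines so well.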
Step by step¶
In [98]:
pipe(
docs,
c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),
c.map(lambda x: x.split()),
concat,
concat,
c.sliding_window(3),
frequencies,
)
Out[98]:
{('t', 'h', 'e'): 7,
('h', 'e', 'd'): 3,
('e', 'd', 'o'): 1,
('d', 'o', 'o'): 3,
('o', 'o', 'd'): 2,
('o', 'd', 'o'): 2,
('o', 'o', 't'): 1,
('o', 't', 'h'): 4,
('e', 'd', 'a'): 1,
('d', 'a', 'h'): 3,
('a', 'h', 'd'): 2,
('h', 'd', 'a'): 2,
('a', 'h', 'e'): 1,
('h', 'e', 'v'): 1,
('e', 'v', 'e'): 2,
('v', 'e', 'r'): 2,
('e', 'r', 'y'): 2,
('r', 'y', 'b'): 1,
('y', 'b', 'r'): 1,
('b', 'r', 'e'): 1,
('r', 'e', 'a'): 1,
('e', 'a', 't'): 1,
('a', 't', 'h'): 1,
('t', 'h', 's'): 1,
('h', 's', 'h'): 1,
('s', 'h', 'e'): 2,
('h', 'e', 't'): 1,
('e', 't', 'a'): 1,
('t', 'a', 'k'): 1,
('a', 'k', 'e'): 2,
('k', 'e', 's'): 2,
('e', 's', 'e'): 1,
('s', 'e', 'v'): 1,
('r', 'y', 'm'): 1,
('y', 'm', 'o'): 1,
('m', 'o', 'v'): 1,
('o', 'v', 'e'): 1,
('v', 'e', 's'): 1,
('e', 's', 'h'): 1,
('h', 'e', 'm'): 1,
('e', 'm', 'a'): 1,
('m', 'a', 'k'): 1,
('e', 's', 'a'): 1,
('s', 'a', 'n'): 1,
('a', 'n', 'o'): 3,
('n', 'o', 't'): 3,
('h', 'e', 'r'): 3,
('e', 'r', 'b'): 1,
('r', 'b', 'r'): 1,
('b', 'r', 'i'): 1,
('r', 'i', 'c'): 1,
('i', 'c', 'k'): 1,
('c', 'k', 'i'): 1,
('k', 'i', 'n'): 1,
('i', 'n', 't'): 1,
('n', 't', 'h'): 1,
('h', 'e', 'w'): 1,
('e', 'w', 'a'): 1,
('w', 'a', 'l'): 1,
('a', 'l', 'l'): 1,
('l', 'l', 'a'): 1,
('l', 'a', 'n'): 1,
('e', 'r', 'o'): 2,
('r', 'o', 'n'): 2,
('o', 'n', 'e'): 3,
('n', 'e', 'b'): 1,
('e', 'b', 'i'): 1,
('b', 'i', 't'): 1,
('i', 't', 'e'): 1,
('t', 'e', 's'): 1,
('e', 's', 't'): 1,
('s', 't', 'h'): 1,
('e', 'd', 'u'): 1,
('d', 'u', 's'): 1,
('u', 's', 't'): 1,
('s', 't', 'a'): 1,
('t', 'a', 'n'): 1,
('a', 'n', 'd'): 1,
('n', 'd', 'a'): 1,
('d', 'a', 'n'): 1,
('n', 'e', 'g'): 1,
('e', 'g', 'o'): 1,
('g', 'o', 'n'): 1}
In [101]:
pipe(
docs,
list
)
Out[101]:
['the doo doo doo the dah dah dah',
'every breath she takes, every move she makes',
'another brick in the wall',
'another one bites the dust and another one gone']
In [102]:
pipe(
docs,
c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),
list
)
Out[102]:
['the doo doo doo the dah dah dah',
'every breath she takes every move she makes',
'another brick in the wall',
'another one bites the dust and another one gone']
In [103]:
pipe(
docs,
c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),
c.map(lambda x: x.split()),
list
)
Out[103]:
[['the', 'doo', 'doo', 'doo', 'the', 'dah', 'dah', 'dah'],
['every', 'breath', 'she', 'takes', 'every', 'move', 'she', 'makes'],
['another', 'brick', 'in', 'the', 'wall'],
['another', 'one', 'bites', 'the', 'dust', 'and', 'another', 'one', 'gone']]
In [104]:
pipe(
docs,
c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),
c.map(lambda x: x.split()),
concat,
list
)
Out[104]:
['the',
'doo',
'doo',
'doo',
'the',
'dah',
'dah',
'dah',
'every',
'breath',
'she',
'takes',
'every',
'move',
'she',
'makes',
'another',
'brick',
'in',
'the',
'wall',
'another',
'one',
'bites',
'the',
'dust',
'and',
'another',
'one',
'gone']
In [105]:
pipe(
docs,
c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),
c.map(lambda x: x.split()),
concat,
concat,
list
)
Out[105]:
['t',
'h',
'e',
'd',
'o',
'o',
'd',
'o',
'o',
'd',
'o',
'o',
't',
'h',
'e',
'd',
'a',
'h',
'd',
'a',
'h',
'd',
'a',
'h',
'e',
'v',
'e',
'r',
'y',
'b',
'r',
'e',
'a',
't',
'h',
's',
'h',
'e',
't',
'a',
'k',
'e',
's',
'e',
'v',
'e',
'r',
'y',
'm',
'o',
'v',
'e',
's',
'h',
'e',
'm',
'a',
'k',
'e',
's',
'a',
'n',
'o',
't',
'h',
'e',
'r',
'b',
'r',
'i',
'c',
'k',
'i',
'n',
't',
'h',
'e',
'w',
'a',
'l',
'l',
'a',
'n',
'o',
't',
'h',
'e',
'r',
'o',
'n',
'e',
'b',
'i',
't',
'e',
's',
't',
'h',
'e',
'd',
'u',
's',
't',
'a',
'n',
'd',
'a',
'n',
'o',
't',
'h',
'e',
'r',
'o',
'n',
'e',
'g',
'o',
'n',
'e']
In [106]:
pipe(
docs,
c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),
c.map(lambda x: x.split()),
concat,
concat,
c.sliding_window(3),
list
)
Out[106]:
[('t', 'h', 'e'),
('h', 'e', 'd'),
('e', 'd', 'o'),
('d', 'o', 'o'),
('o', 'o', 'd'),
('o', 'd', 'o'),
('d', 'o', 'o'),
('o', 'o', 'd'),
('o', 'd', 'o'),
('d', 'o', 'o'),
('o', 'o', 't'),
('o', 't', 'h'),
('t', 'h', 'e'),
('h', 'e', 'd'),
('e', 'd', 'a'),
('d', 'a', 'h'),
('a', 'h', 'd'),
('h', 'd', 'a'),
('d', 'a', 'h'),
('a', 'h', 'd'),
('h', 'd', 'a'),
('d', 'a', 'h'),
('a', 'h', 'e'),
('h', 'e', 'v'),
('e', 'v', 'e'),
('v', 'e', 'r'),
('e', 'r', 'y'),
('r', 'y', 'b'),
('y', 'b', 'r'),
('b', 'r', 'e'),
('r', 'e', 'a'),
('e', 'a', 't'),
('a', 't', 'h'),
('t', 'h', 's'),
('h', 's', 'h'),
('s', 'h', 'e'),
('h', 'e', 't'),
('e', 't', 'a'),
('t', 'a', 'k'),
('a', 'k', 'e'),
('k', 'e', 's'),
('e', 's', 'e'),
('s', 'e', 'v'),
('e', 'v', 'e'),
('v', 'e', 'r'),
('e', 'r', 'y'),
('r', 'y', 'm'),
('y', 'm', 'o'),
('m', 'o', 'v'),
('o', 'v', 'e'),
('v', 'e', 's'),
('e', 's', 'h'),
('s', 'h', 'e'),
('h', 'e', 'm'),
('e', 'm', 'a'),
('m', 'a', 'k'),
('a', 'k', 'e'),
('k', 'e', 's'),
('e', 's', 'a'),
('s', 'a', 'n'),
('a', 'n', 'o'),
('n', 'o', 't'),
('o', 't', 'h'),
('t', 'h', 'e'),
('h', 'e', 'r'),
('e', 'r', 'b'),
('r', 'b', 'r'),
('b', 'r', 'i'),
('r', 'i', 'c'),
('i', 'c', 'k'),
('c', 'k', 'i'),
('k', 'i', 'n'),
('i', 'n', 't'),
('n', 't', 'h'),
('t', 'h', 'e'),
('h', 'e', 'w'),
('e', 'w', 'a'),
('w', 'a', 'l'),
('a', 'l', 'l'),
('l', 'l', 'a'),
('l', 'a', 'n'),
('a', 'n', 'o'),
('n', 'o', 't'),
('o', 't', 'h'),
('t', 'h', 'e'),
('h', 'e', 'r'),
('e', 'r', 'o'),
('r', 'o', 'n'),
('o', 'n', 'e'),
('n', 'e', 'b'),
('e', 'b', 'i'),
('b', 'i', 't'),
('i', 't', 'e'),
('t', 'e', 's'),
('e', 's', 't'),
('s', 't', 'h'),
('t', 'h', 'e'),
('h', 'e', 'd'),
('e', 'd', 'u'),
('d', 'u', 's'),
('u', 's', 't'),
('s', 't', 'a'),
('t', 'a', 'n'),
('a', 'n', 'd'),
('n', 'd', 'a'),
('d', 'a', 'n'),
('a', 'n', 'o'),
('n', 'o', 't'),
('o', 't', 'h'),
('t', 'h', 'e'),
('h', 'e', 'r'),
('e', 'r', 'o'),
('r', 'o', 'n'),
('o', 'n', 'e'),
('n', 'e', 'g'),
('e', 'g', 'o'),
('g', 'o', 'n'),
('o', 'n', 'e')]
In [107]:
pipe(
docs,
c.map(lambda x: x.translate(str.maketrans('', '', string.punctuation))),
c.map(lambda x: x.split()),
concat,
concat,
c.sliding_window(3),
frequencies
)
Out[107]:
{('t', 'h', 'e'): 7,
('h', 'e', 'd'): 3,
('e', 'd', 'o'): 1,
('d', 'o', 'o'): 3,
('o', 'o', 'd'): 2,
('o', 'd', 'o'): 2,
('o', 'o', 't'): 1,
('o', 't', 'h'): 4,
('e', 'd', 'a'): 1,
('d', 'a', 'h'): 3,
('a', 'h', 'd'): 2,
('h', 'd', 'a'): 2,
('a', 'h', 'e'): 1,
('h', 'e', 'v'): 1,
('e', 'v', 'e'): 2,
('v', 'e', 'r'): 2,
('e', 'r', 'y'): 2,
('r', 'y', 'b'): 1,
('y', 'b', 'r'): 1,
('b', 'r', 'e'): 1,
('r', 'e', 'a'): 1,
('e', 'a', 't'): 1,
('a', 't', 'h'): 1,
('t', 'h', 's'): 1,
('h', 's', 'h'): 1,
('s', 'h', 'e'): 2,
('h', 'e', 't'): 1,
('e', 't', 'a'): 1,
('t', 'a', 'k'): 1,
('a', 'k', 'e'): 2,
('k', 'e', 's'): 2,
('e', 's', 'e'): 1,
('s', 'e', 'v'): 1,
('r', 'y', 'm'): 1,
('y', 'm', 'o'): 1,
('m', 'o', 'v'): 1,
('o', 'v', 'e'): 1,
('v', 'e', 's'): 1,
('e', 's', 'h'): 1,
('h', 'e', 'm'): 1,
('e', 'm', 'a'): 1,
('m', 'a', 'k'): 1,
('e', 's', 'a'): 1,
('s', 'a', 'n'): 1,
('a', 'n', 'o'): 3,
('n', 'o', 't'): 3,
('h', 'e', 'r'): 3,
('e', 'r', 'b'): 1,
('r', 'b', 'r'): 1,
('b', 'r', 'i'): 1,
('r', 'i', 'c'): 1,
('i', 'c', 'k'): 1,
('c', 'k', 'i'): 1,
('k', 'i', 'n'): 1,
('i', 'n', 't'): 1,
('n', 't', 'h'): 1,
('h', 'e', 'w'): 1,
('e', 'w', 'a'): 1,
('w', 'a', 'l'): 1,
('a', 'l', 'l'): 1,
('l', 'l', 'a'): 1,
('l', 'a', 'n'): 1,
('e', 'r', 'o'): 2,
('r', 'o', 'n'): 2,
('o', 'n', 'e'): 3,
('n', 'e', 'b'): 1,
('e', 'b', 'i'): 1,
('b', 'i', 't'): 1,
('i', 't', 'e'): 1,
('t', 'e', 's'): 1,
('e', 's', 't'): 1,
('s', 't', 'h'): 1,
('e', 'd', 'u'): 1,
('d', 'u', 's'): 1,
('u', 's', 't'): 1,
('s', 't', 'a'): 1,
('t', 'a', 'n'): 1,
('a', 'n', 'd'): 1,
('n', 'd', 'a'): 1,
('d', 'a', 'n'): 1,
('n', 'e', 'g'): 1,
('e', 'g', 'o'): 1,
('g', 'o', 'n'): 1}