Python: Functions

To a large extent, data analysis consists of a sequence of data transformations using functions. Given the centrality of functions in this course, this notebook covers Python function construction and usage in some depth.

Built-in functions

See Python docs

In [1]:
xs = ['apple', 'pear', 'grape', 'orange', 'rambutan', 'durian', 'longan', 'mango']
In [2]:
sorted(xs)
Out[2]:
['apple', 'durian', 'grape', 'longan', 'mango', 'orange', 'pear', 'rambutan']
In [3]:
sorted(xs, key=len)
Out[3]:
['pear', 'apple', 'grape', 'mango', 'orange', 'durian', 'longan', 'rambutan']
In [4]:
sorted(xs, key=len, reverse=True)
Out[4]:
['rambutan', 'orange', 'durian', 'longan', 'apple', 'grape', 'mango', 'pear']
In [5]:
max(xs)
Out[5]:
'rambutan'
In [6]:
max(xs, key=len)
Out[6]:
'rambutan'
In [7]:
min(xs)
Out[7]:
'apple'
In [8]:
min(xs, key=len)
Out[8]:
'pear'
In [9]:
sum(map(len, xs))
Out[9]:
45

Custom functions

It is very simple to write a function. The docstring (a triple-quoted string with one or more lines of function documentation) is not mandatory but highly recommended.

In [10]:
def power(x, n=2):
    """Returns x to the nth power.

    n has a default value of 2."""

    return x**n
In [11]:
help(power)
Help on function power in module __main__:

power(x, n=2)
    Returns x to the nth power.

    n has a default value of 2.

Use default value for exponent

In [12]:
power(3)
Out[12]:
9

Give explicit exponent value

In [13]:
power(3, 4)
Out[13]:
81

Give named arguments out of order

In [14]:
power(n=0.5, x=3)
Out[14]:
1.7320508075688772

With arbitrary arguments

In [15]:
def f(a, b, *args, **kwargs):
    """Example to illustrate use of * and ** arguments."""
    return a, b, args, kwargs
In [16]:
a, b, args, kwargs = f(1, 2, 3, 4, 5, x=10, y=11, z = 13)
In [17]:
a, b
Out[17]:
(1, 2)
In [18]:
args
Out[18]:
(3, 4, 5)
In [19]:
kwargs
Out[19]:
{'x': 10, 'y': 11, 'z': 13}

Required keyword arguments

In [20]:
def f1(a, b, *, c, d):
    """c and d MUST be given as keyword arguments."""
    return a, b, c, d
In [21]:
f1(1, 2, c=3, d=4)
Out[21]:
(1, 2, 3, 4)
In [22]:
try:
    f1(1, 2, 3, 4)
except Exception as e:
    print(e)
f1() takes 2 positional arguments but 4 were given
In [23]:
def f2(a, b, *args, c, d, **kwargs):
    """Combining required and optional arguments."""
    return a, b, c, d, args, kwargs
In [24]:
a, b, c, d, args, kwargs = f2(1, 2, 3, 4, c=5, d=6, e=7, f=8)
In [25]:
a, b
Out[25]:
(1, 2)
In [26]:
c, d
Out[26]:
(5, 6)
In [27]:
args
Out[27]:
(3, 4)
In [28]:
kwargs
Out[28]:
{'e': 7, 'f': 8}
In [29]:
try:
    f2(1, 2, 3, 4, 5, 6, e=7, f=8)
except Exception as e:
    print(e)
f2() missing 2 required keyword-only arguments: 'c' and 'd'

All arguments are keyword only

In [30]:
def f3(*, a, b, c):
    """a, b and c must all be given as keyword arguments."""
    return a, b, c
In [31]:
f3(c=1, b=2, a=4)
Out[31]:
(4, 2, 1)
In [32]:
try:
    f3(4, 2, 1)
except Exception as e:
    print(e)
f3() takes 0 positional arguments but 3 were given

Expanding function arguments

In [33]:
def f4(a, b, c, d):
    return a, b, c, d
In [34]:
a = 1
bc = [2, 3]
d = 4
In [35]:
f4(a, *bc, d)
Out[35]:
(1, 2, 3, 4)
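
The same idea extends to keyword arguments: a ** prefix expands a dictionary into named arguments. A minimal sketch, where the function describe and the variables positional and options are hypothetical names used only for illustration:

```python
def describe(a, b, c=0, d=0):
    """Return the arguments as a tuple so the binding is visible."""
    return a, b, c, d

positional = [1, 2]          # expanded with * into a and b
options = {'c': 3, 'd': 4}   # expanded with ** into c=3 and d=4

result = describe(*positional, **options)
print(result)
```

The dictionary keys must match parameter names of the function being called; otherwise Python raises a TypeError.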

Function annotations

You can indicate the types of the arguments and return value using function annotations. Python itself does nothing with these except store them in a dictionary under the __annotations__ attribute, but third-party packages may use them if present. You will not be expected to use function annotations in this course.

In [36]:
def power2(x : float, n : dict(type=float, help='exponent') =2) -> float:
    """Returns x to the nth power.

    n has a default value of 2."""

    return x**n
In [37]:
power2.__annotations__
Out[37]:
{'n': {'help': 'exponent', 'type': float}, 'return': float, 'x': float}
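
To illustrate that annotations are stored but not enforced, here is a small sketch. The function name scale is hypothetical; the call with a string argument succeeds because no type checking happens at run time.

```python
def scale(x: float, factor: int = 2) -> float:
    """Multiply x by factor; the annotations are documentation only."""
    return x * factor

# Annotations are stored on the function, not checked.
hints = scale.__annotations__
print(hints)

# Passing a str where float is annotated does not raise:
# str * int is valid Python, so the string is simply repeated.
repeated = scale('ab', 2)
print(repeated)
```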

Lambda functions

Also known as anonymous functions. These are often used to construct one-use-only short functions for higher order functions such as map, filter and reduce.

In [38]:
power2 = lambda x, n=2: x**n
In [39]:
power2(3)
Out[39]:
9
In [40]:
power2(3, 4)
Out[40]:
81
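
As a sketch of the one-use-only pattern described above, a lambda can be passed directly as the key to sorted without ever being given a name. The list reuses a few fruit names from earlier in the notebook; sorting by the last letter is just an illustrative choice.

```python
fruits = ['apple', 'pear', 'grape', 'mango']

# Sort by the last character of each word.
# The lambda exists only for the duration of this call.
by_last_letter = sorted(fruits, key=lambda s: s[-1])
print(by_last_letter)
```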

Recursive functions

A recursive function is one that calls itself. Python does not optimize such calls, and a recursion that goes too deep raises a RecursionError once the interpreter's recursion limit is exceeded. Any recursive algorithm can be rewritten in a non-recursive form, which is generally preferable in Python, although the rewrite may not be obvious. Unlike in functional languages with tail call optimization, recursive functions are rarely used in Python because they are usually slower and consume more memory than the equivalent non-recursive version.

Recursive functions consist of

  • a base case which terminates the computation
  • a recursive call that MUST eventually end up in the base case
In [41]:
def factorial_1(n):
    """Recursive factorial function."""

    if n == 0:
        return 1
    else:
        return n * factorial_1(n-1)
In [42]:
factorial_1(50)
Out[42]:
30414093201713378043612608166064768844377641568960512000000000000
In [43]:
from functools import reduce
In [44]:
def factorial_2(n):
    """Non-recursive version."""
    # The initial value 1 makes factorial_2(0) return 1 instead of raising on an empty range.
    return reduce(lambda a, b: a*b, range(1, n+1), 1)
In [45]:
factorial_2(50)
Out[45]:
30414093201713378043612608166064768844377641568960512000000000000

The non-recursive version is usually more time and memory efficient

In [46]:
%timeit factorial_1(50)
14 µs ± 89.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [47]:
%timeit factorial_2(50)
10.6 µs ± 71.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
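
A sketch of what "too deep" means in practice: CPython raises a RecursionError once the call depth exceeds the limit reported by sys.getrecursionlimit(), whereas a plain loop has no such limit. The function names below are hypothetical and chosen only for this illustration.

```python
import sys

def count_down_recursive(n):
    """Recursive countdown: one stack frame per step."""
    if n == 0:
        return 0
    return count_down_recursive(n - 1)

def count_down_loop(n):
    """Same computation with a loop: constant stack depth."""
    while n > 0:
        n -= 1
    return n

depth = sys.getrecursionlimit() + 100   # deliberately beyond the limit

try:
    count_down_recursive(depth)
    overflowed = False
except RecursionError:
    overflowed = True

print(overflowed)
print(count_down_loop(depth))
```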

Higher order functions

A higher order function is a function that takes another function as an argument or returns a function. Since functions are “first class” in Python, they can be treated in the same way as any other value. In particular, they can be used as function arguments and/or as return values.

The classical higher order functions are map, filter and reduce. Each is shown here alongside a more Pythonic equivalent using built-ins and comprehensions. However, map, filter and reduce remain important because parallel and distributed code is often most efficiently written using these functional operators.

Map

In [48]:
list(map(lambda x: x**3, range(5)))
Out[48]:
[0, 1, 8, 27, 64]
In [49]:
[x**3 for x in range(5)]
Out[49]:
[0, 1, 8, 27, 64]

Filter

In [50]:
list(filter(lambda x: x % 2 == 0,
           range(5)))
Out[50]:
[0, 2, 4]
In [51]:
[x for x in range(5) if x % 2 == 0]
Out[51]:
[0, 2, 4]

Reduce

In [52]:
from functools import reduce
In [53]:
reduce(lambda x, y: x+y,
       map(lambda x: x**3, range(5)))
Out[53]:
100
In [54]:
sum([x**3 for x in range(5)])
Out[54]:
100
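
One reason map-style code carries over to parallel settings is that each element is processed independently. As a hedged sketch, the standard library's concurrent.futures.ThreadPoolExecutor offers a map method with the same call shape as the built-in. The worker count of 4 is an arbitrary assumption, and threads typically do not speed up CPU-bound work like this in CPython, so treat this as an illustration of the interface only.

```python
from concurrent.futures import ThreadPoolExecutor

def cube(x):
    """Function applied independently to each element."""
    return x**3

# Built-in map: sequential.
sequential = list(map(cube, range(5)))

# Executor.map: same call shape, work handed to a pool of threads.
# Results are returned in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(cube, range(5)))

print(sequential == parallel)
```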

Using the operator module

The operator module provides function equivalents for Python’s operators, which are convenient for use in higher order functions.

In [55]:
import operator as op
In [56]:
reduce(op.add, map(lambda x: x**3, range(5)))
Out[56]:
100
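
Beyond arithmetic operators, the operator module also provides helpers such as itemgetter, which builds a function that fetches an item by index or key. A minimal sketch with made-up data (the pairs list is hypothetical):

```python
import operator as op
from functools import reduce

pairs = [('pear', 3), ('apple', 7), ('mango', 5)]

# op.mul is the function equivalent of the * operator.
product = reduce(op.mul, [1, 2, 3, 4])

# op.itemgetter(1) behaves like lambda pair: pair[1].
by_count = sorted(pairs, key=op.itemgetter(1))

print(product)
print(by_count)
```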

Custom function taking function arguments

In [57]:
def f(a, b, g, h):
    """Function taking functions g and h as arguments."""
    return g(a) + h(b)
In [58]:
f(2, 2, lambda x: x**2, lambda x: x**3)
Out[58]:
12
In [59]:
f('abc', -2, len, abs)
Out[59]:
5

Functions returning functions

The partial function takes a function as an argument and returns another function with some of its arguments already filled in

In [60]:
from functools import partial
In [61]:
def f1(a, b, c):
    """A function with 3 arguments."""
    return a, b, c
In [62]:
f1(1, 2, 3)
Out[62]:
(1, 2, 3)

f2 takes a single argument since b and c have been given values by partial

In [63]:
f2 = partial(f1, b=12, c=13)
In [64]:
f2(11)
Out[64]:
(11, 12, 13)

Custom function returning a function

In [65]:
def timed(f):
    """Decorates the function f to also return the time taken in seconds."""
    import time

    def func(*args, **kwargs):
        start = time.time()
        result = f(*args, **kwargs)
        elapsed = time.time() - start
        return elapsed, result
    return func
In [66]:
def my_sum(xs):
    s = 0
    for x in xs:
        s += x
    return s
In [67]:
my_sum(range(10000000))
Out[67]:
49999995000000
In [68]:
my_sum2 = timed(my_sum)
my_sum2(range(10000000))
Out[68]:
(0.8492159843444824, 49999995000000)

Decorators

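The @ syntax is syntactic sugar for applying a function such as timed above to a function definition: writing @timed before the definition of my_sum3 has the same effect as defining my_sum3 and then assigning my_sum3 = timed(my_sum3).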

In [69]:
@timed
def my_sum3(xs):
    s = 0
    for x in xs:
        s += x
    return s
In [70]:
my_sum3(range(10000000))
Out[70]:
(0.9857957363128662, 49999995000000)
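
One side effect worth knowing: a wrapper like func above replaces the decorated function's name and docstring with its own. The standard library's functools.wraps copies that metadata across. The sketch below is a variant of timed, not a replacement for it; the names timed_wrapped and add_up, and the use of time.perf_counter, are choices made for illustration.

```python
import time
from functools import wraps

def timed_wrapped(f):
    """Variant of timed that preserves f's name and docstring."""
    @wraps(f)
    def func(*args, **kwargs):
        start = time.perf_counter()
        result = f(*args, **kwargs)
        elapsed = time.perf_counter() - start
        return elapsed, result
    return func

@timed_wrapped
def add_up(xs):
    """Sum the items of xs."""
    return sum(xs)

elapsed, total = add_up(range(1000))
print(add_up.__name__, total)
```

Without the @wraps(f) line, add_up.__name__ would report 'func' rather than 'add_up'.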

Giving decorators their own arguments

You will come across packages that provide decorators which can take arguments. This is one way that they can be implemented. See if you can follow how it works!

In [71]:
def timed2(fudge_factor=0.0):
    """Decorator with decorator arguments."""
    def timed(f):
        import time
        def func(*args, **kwargs):
            start = time.time()
            result = f(*args, **kwargs)
            elapsed = fudge_factor + time.time() - start
            return elapsed, result
        return func
    return timed
In [72]:
@timed2(fudge_factor=10)
def my_sum4(xs):
    s = 0
    for x in xs:
        s += x
    return s
In [73]:
my_sum4(range(10000000))
Out[73]:
(11.009344100952148, 49999995000000)
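
To help follow the three layers, the decorator line can be written out by hand: calling the outer function returns a decorator, which is then applied to the function. The sketch below uses a simplified, hypothetical decorator factory (add_offset) rather than timed2 so that the result is deterministic.

```python
def add_offset(offset=0):
    """Decorator factory: returns a decorator that adds offset to results."""
    def decorator(f):
        def func(*args, **kwargs):
            return f(*args, **kwargs) + offset
        return func
    return decorator

# With the @ syntax
@add_offset(offset=10)
def total_a(xs):
    return sum(xs)

# Written out by hand
def total_b(xs):
    return sum(xs)

decorator = add_offset(offset=10)   # step 1: the outer call returns a decorator
total_b = decorator(total_b)        # step 2: the decorator wraps the function

print(total_a([1, 2, 3]), total_b([1, 2, 3]))
```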