{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Python: Numeric data\n", "\n", "The foundation for numerical computation in Python is the `numpy` package, and essentially all scientific libraries in Python build on it, e.g. `scipy`, `pandas`, `statsmodels`, `scikit-learn` and `cv2`. The basic data structure in `numpy` is the `ndarray`, and it is essential to become familiar with how to slice and dice this object.\n", "\n", "Numpy also has the `random` and `linalg` modules, which we will discuss in later lectures." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Resources\n", "----\n", "\n", "- [Numpy for R users](http://mathesaurus.sourceforge.net/r-numpy.html)\n", "- [NumPy: creating and manipulating numerical data](http://www.scipy-lectures.org/intro/numpy/index.html)\n", "- [Advanced Numpy](http://www.scipy-lectures.org/advanced/advanced_numpy/index.html)\n", "- [100 Numpy Exercises](http://www.labri.fr/perso/nrougier/teaching/numpy.100/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### NDArray\n", "\n", "The base structure in `numpy` is `ndarray`, used to represent vectors, matrices and higher-dimensional arrays. 
Each `ndarray` has the following attributes:\n", "\n", "- dtype = element data type, corresponding to data types in C\n", "- shape = dimensions of array\n", "- strides = number of bytes to step in each direction when traversing the array" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 2 3 4 5 6]\n", "dtype int64\n", "shape (6,)\n", "strides (8,)\n" ] } ], "source": [ "import numpy as np\n", "\n", "x = np.array([1,2,3,4,5,6])\n", "print(x)\n", "print('dtype', x.dtype)\n", "print('shape', x.shape)\n", "print('strides', x.strides)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2 3]\n", " [4 5 6]]\n", "dtype int64\n", "shape (2, 3)\n", "strides (24, 8)\n" ] } ], "source": [ "x.shape = (2,3)\n", "print(x)\n", "print('dtype', x.dtype)\n", "print('shape', x.shape)\n", "print('strides', x.strides)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1.+0.j 2.+0.j 3.+0.j]\n", " [ 4.+0.j 5.+0.j 6.+0.j]]\n", "dtype complex128\n", "shape (2, 3)\n", "strides (48, 16)\n" ] } ], "source": [ "x = x.astype('complex')\n", "print(x)\n", "print('dtype', x.dtype)\n", "print('shape', x.shape)\n", "print('strides', x.strides)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Array creation\n", "----" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.array([1,2,3])" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1., 2., 3.])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.array([1,2,3], np.float64)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ 
{ "data": { "text/plain": [ "array([0, 1, 2])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(3)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 3. , 3.5, 4. , 4.5, 5. , 5.5])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(3, 6, 0.5)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3],\n", " [4, 5, 6]])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.array([[1,2,3],[4,5,6]])" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1., 1., 1.])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.ones(3)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 0., 0., 0.],\n", " [ 0., 0., 0., 0.],\n", " [ 0., 0., 0., 0.]])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.zeros((3,4))" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 0., 0., 0.],\n", " [ 0., 1., 0., 0.],\n", " [ 0., 0., 1., 0.],\n", " [ 0., 0., 0., 1.]])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.eye(4)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 0, 0, 0],\n", " [0, 2, 0, 0],\n", " [0, 0, 3, 0],\n", " [0, 0, 0, 4]])" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.diag([1,2,3,4])" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 1., 4., 9., 16.],\n", " [ 1., 2., 
5., 10., 17.],\n", " [ 4., 5., 8., 13., 20.],\n", " [ 9., 10., 13., 18., 25.]])" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.fromfunction(lambda i, j: i**2+j**2, (4,5))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Array manipulation\n", "----" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 1., 4., 9., 16.],\n", " [ 1., 2., 5., 10., 17.],\n", " [ 4., 5., 8., 13., 20.],\n", " [ 9., 10., 13., 18., 25.]])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.fromfunction(lambda i, j: i**2+j**2, (4,5))\n", "x" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(4, 5)" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.shape" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "20" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.size" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.dtype" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 4, 9, 16],\n", " [ 1, 2, 5, 10, 17],\n", " [ 4, 5, 8, 13, 20],\n", " [ 9, 10, 13, 18, 25]])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.astype(np.int64)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 1., 4., 9.],\n", " [ 1., 2., 5., 10.],\n", " [ 4., 5., 8., 13.],\n", " [ 9., 10., 13., 18.],\n", " [ 16., 17., 20., 25.]])" ] }, "execution_count": 20, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "x.T" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 1., 4., 9., 16., 1., 2., 5., 10., 17.],\n", " [ 4., 5., 8., 13., 20., 9., 10., 13., 18., 25.]])" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.reshape(2,-1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Array indexing\n", "----" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 1., 4., 9., 16.],\n", " [ 1., 2., 5., 10., 17.],\n", " [ 4., 5., 8., 13., 20.],\n", " [ 9., 10., 13., 18., 25.]])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0., 1., 4., 9., 16.])" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[0]" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0., 1., 4., 9., 16.])" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[0,:]" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0., 1., 4., 9.])" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[:,0]" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 9., 10., 13., 18., 25.])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[-1]" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.0" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[1,1]" ] }, { 
"cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 4.],\n", " [ 2., 5.],\n", " [ 5., 8.],\n", " [ 10., 13.]])" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[:, 1:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Boolean indexing" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[False, False, True, True, True],\n", " [False, True, True, True, True],\n", " [ True, True, True, True, True],\n", " [ True, True, True, True, True]], dtype=bool)" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x >= 2" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 4., 9., 16., 5., 10., 17., 4., 5., 8., 13., 20.,\n", " 9., 10., 13., 18., 25.])" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[x > 2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Fancy indexing" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1., 4.])" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[0, [1,2]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calculations and broadcasting\n", "----\n", "\n", "Broadcasting refers to the set of rules that numpy uses to perfrom operations on arrays with different shapes. See official [documentation](http://docs.scipy.org/doc/numpy-1.10.1/user/basics.broadcasting.html) for a clear explanation of the rules. Array shapes can be manipulated using the `reshape` method or by inserting a new axis with `np.newaxis`. Note that `np.newaxis` is an alias for `None`, which I sometimes use in my examples." 
] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 1., 4.],\n", " [ 1., 2., 5.]])" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.fromfunction(lambda i, j: i**2+j**2, (2,3))\n", "x" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 5., 20.],\n", " [ 5., 10., 25.]])" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x * 5" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 2., 8.],\n", " [ 2., 4., 10.]])" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x + x" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 17., 22.],\n", " [ 22., 30.]])" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x @ x.T" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 2., 5.],\n", " [ 2., 5., 14.],\n", " [ 5., 14., 41.]])" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.T @ x" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0. , 0.69314718, 1.60943791],\n", " [ 0.69314718, 1.09861229, 1.79175947]])" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.log1p(x)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1. 
, 2.71828183, 54.59815003],\n", " [ 2.71828183, 7.3890561 , 148.4131591 ]])" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.exp(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Combining and splitting arrays\n", "----" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 1., 4.],\n", " [ 1., 2., 5.]])" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 1., 4.],\n", " [ 1., 2., 5.],\n", " [ 0., 1., 4.],\n", " [ 1., 2., 5.]])" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.r_[x, x]" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 1., 4.],\n", " [ 1., 2., 5.],\n", " [ 0., 1., 4.],\n", " [ 1., 2., 5.]])" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.vstack([x, x])" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 1., 4.],\n", " [ 1., 2., 5.],\n", " [ 0., 1., 4.],\n", " [ 1., 2., 5.]])" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.concatenate([x, x], axis=0)" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 1., 4., 0., 1., 4.],\n", " [ 1., 2., 5., 1., 2., 5.]])" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.c_[x,x]" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 1., 4., 0., 1., 4.],\n", " [ 1., 2., 5., 1., 2., 5.]])" ] }, "execution_count": 44, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "np.hstack([x, x])" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 1., 4., 0., 1., 4.],\n", " [ 1., 2., 5., 1., 2., 5.]])" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.concatenate([x,x], axis=1)" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 1., 4.],\n", " [ 1., 2., 5.],\n", " [ 0., 1., 4.],\n", " [ 1., 2., 5.]])" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y = np.r_[x, x]\n", "y" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": true }, "outputs": [], "source": [ "a, b, c = np.hsplit(y, 3)" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0.],\n", " [ 1.],\n", " [ 0.],\n", " [ 1.]])" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1.],\n", " [ 2.],\n", " [ 1.],\n", " [ 2.]])" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 4.],\n", " [ 5.],\n", " [ 4.],\n", " [ 5.]])" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[array([[ 0., 1., 4.],\n", " [ 1., 2., 5.],\n", " [ 0., 1., 4.]]), array([[ 1., 2., 5.]])]" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.vsplit(y, [3])" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": 
{ "text/plain": [ "[array([[ 0., 1., 4.],\n", " [ 1., 2., 5.],\n", " [ 0., 1., 4.]]), array([[ 1., 2., 5.]])]" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.split(y, [3], axis=0)" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 1., 4.],\n", " [ 1., 2., 5.],\n", " [ 0., 1., 4.],\n", " [ 1., 2., 5.]])" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.hstack(np.hsplit(y, 3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Reductions\n", "----" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 1., 4.],\n", " [ 1., 2., 5.],\n", " [ 0., 1., 4.],\n", " [ 1., 2., 5.]])" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "26.0" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y.sum()" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 2., 6., 18.])" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y.sum(0) # column sum" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 5., 8., 5., 8.])" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y.sum(1) # row sum" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Standardize by column mean and standard deviation" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "collapsed": true }, "outputs": [], "source": [ "z = (y - y.mean(0))/y.std(0)" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { 
"text/plain": [ "array([[-1., -1., -1.],\n", " [ 1., 1., 1.],\n", " [-1., -1., -1.],\n", " [ 1., 1., 1.]])" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "z" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([ 0., 0., 0.]), array([ 1., 1., 1.]))" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "z.mean(0), z.std(0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Standardize by row mean and standard deviation" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "collapsed": true }, "outputs": [], "source": [ "z = (y - y.mean(1)[:,None])/y.std(1)[:,None]" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0.98058068, -0.39223227, 1.37281295],\n", " [-0.98058068, -0.39223227, 1.37281295],\n", " [-0.98058068, -0.39223227, 1.37281295],\n", " [-0.98058068, -0.39223227, 1.37281295]])" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "z" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([ -7.40148683e-17, 7.40148683e-17, -7.40148683e-17,\n", " 7.40148683e-17]), array([ 1., 1., 1., 1.]))" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "z.mean(1), z.std(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: Calculating pairwise distance matrix using broadcasting and vectorization\n", "\n", "Calculate the pairwise distance matrix between the following points\n", "\n", "- (0,0)\n", "- (4,0)\n", "- (4,3)\n", "- (0,3)" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def distance_matrix_py(pts):\n", " \"\"\"Returns matrix of pairwise Euclidean distances. 
Pure Python version.\"\"\"\n", " n = len(pts)\n", " p = len(pts[0])\n", " m = np.zeros((n, n))\n", " for i in range(n):\n", " for j in range(n):\n", " s = 0\n", " for k in range(p):\n", " s += (pts[i,k] - pts[j,k])**2\n", " m[i, j] = s**0.5\n", " return m" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def distance_matrix_np(pts):\n", " \"\"\"Returns matrix of pairwise Euclidean distances. Vectorized numpy version.\"\"\"\n", " return np.sum((pts[None,:] - pts[:, None])**2, -1)**0.5" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0, 0],\n", " [4, 0],\n", " [4, 3],\n", " [0, 3]])" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pts = np.array([(0,0), (4,0), (4,3), (0,3)])\n", "pts" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(4, 2)" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pts.shape" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 4., 5., 3.],\n", " [ 4., 0., 3., 5.],\n", " [ 5., 3., 0., 4.],\n", " [ 3., 5., 4., 0.]])" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n = pts.shape[0]\n", "p = pts.shape[1]\n", "dist = np.zeros((n, n))\n", "for i in range(n):\n", " for j in range(n):\n", " s = 0\n", " for k in range(p):\n", " s += (pts[i, k] - pts[j, k])**2\n", " dist[i, j] = np.sqrt(s)\n", "dist" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using broadcasting" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1, 4, 2)" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pts[None, :].shape" ] }, { "cell_type": "code", "execution_count": 
70, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(4, 1, 2)" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pts[:, None].shape" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[ 0, 0],\n", " [ 4, 0],\n", " [ 4, 3],\n", " [ 0, 3]],\n", "\n", " [[-4, 0],\n", " [ 0, 0],\n", " [ 0, 3],\n", " [-4, 3]],\n", "\n", " [[-4, -3],\n", " [ 0, -3],\n", " [ 0, 0],\n", " [-4, 0]],\n", "\n", " [[ 0, -3],\n", " [ 4, -3],\n", " [ 4, 0],\n", " [ 0, 0]]])" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = pts[None, :] - pts[:, None]\n", "m" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[ 0, 0],\n", " [16, 0],\n", " [16, 9],\n", " [ 0, 9]],\n", "\n", " [[16, 0],\n", " [ 0, 0],\n", " [ 0, 9],\n", " [16, 9]],\n", "\n", " [[16, 9],\n", " [ 0, 9],\n", " [ 0, 0],\n", " [16, 0]],\n", "\n", " [[ 0, 9],\n", " [16, 9],\n", " [16, 0],\n", " [ 0, 0]]])" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m**2" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(4, 4, 2)" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(m**2).shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to end up with a 4 by 4 matrix, so sum over the axis with dimension 2. This is axis=2, or axis=-1 since it is the first axis from the end." 
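, "\n", "\n", "The equivalence of the two axis spellings can be checked with a small sketch. The array `sq` below is a hypothetical stand-in with the same (4, 4, 2) shape as `m**2`, not the actual values from this example:\n", "\n", "```python\n", "import numpy as np\n", "\n", "sq = np.arange(4 * 4 * 2).reshape(4, 4, 2)  # stand-in with shape (4, 4, 2)\n", "\n", "by_position = sq.sum(axis=2)  # count axes from the front\n", "from_end = sq.sum(axis=-1)  # count axes from the end\n", "\n", "print(by_position.shape)  # (4, 4)\n", "print(np.array_equal(by_position, from_end))  # True\n", "```\n", "\n", "Either spelling collapses the coordinate axis and leaves one entry per pair of points."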
] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 16, 25, 9],\n", " [16, 0, 9, 25],\n", " [25, 9, 0, 16],\n", " [ 9, 25, 16, 0]])" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sum((pts[None, :] - pts[:, None])**2, -1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Basically, the distance matrix can be calculated in one line of numpy code" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 4., 5., 3.],\n", " [ 4., 0., 3., 5.],\n", " [ 5., 3., 0., 4.],\n", " [ 3., 5., 4., 0.]])" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sqrt(np.sum((pts[None, :] - pts[:, None])**2, -1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's put them in functions and compare the time." ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def pdist1(pts):\n", "    n = pts.shape[0]\n", "    p = pts.shape[1]\n", "    dist = np.zeros((n, n))\n", "    for i in range(n):\n", "        for j in range(n):\n", "            s = 0\n", "            for k in range(p):\n", "                s += (pts[i, k] - pts[j, k])**2\n", "            dist[i, j] = s\n", "    return np.sqrt(dist)" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def pdist2(pts):\n", "    return np.sqrt(np.sum((pts[None, :] - pts[:, None])**2, -1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Check that the outputs are the same" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.all(pdist1(pts) == pdist2(pts))" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "collapsed": true }, "outputs": [], "source": 
[ "pts = np.random.random((1000, 2))" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 loops, best of 3: 3.26 s per loop\n" ] } ], "source": [ "%timeit pdist1(pts)" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10 loops, best of 3: 77.3 ms per loop\n" ] } ], "source": [ "%timeit pdist2(pts)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### But don't give up on loops yet" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from numba import njit" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "collapsed": true }, "outputs": [], "source": [ "@njit\n", "def pdist3(pts):\n", "    n = pts.shape[0]\n", "    p = pts.shape[1]\n", "    dist = np.zeros((n, n))\n", "    for i in range(n):\n", "        for j in range(n):\n", "            s = 0\n", "            for k in range(p):\n", "                s += (pts[i, k] - pts[j, k])**2\n", "            dist[i, j] = s\n", "    return np.sqrt(dist)" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The slowest run took 27.17 times longer than the fastest. This could mean that an intermediate result is being cached \n", "1 loops, best of 3: 16.1 ms per loop\n" ] } ], "source": [ "%timeit pdist3(pts)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What is going on?\n", "\n", "This is roughly 5 times faster than the broadcasting version in the run above (16.1 ms versus 77.3 ms)! We have just performed just-in-time (JIT) compilation of a function, which will be discussed in a later lecture." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: Constructing leave-one-out arrays\n", "\n", "Another example of numpy trickery is to construct a leave-one-out matrix of a vector of length k. 
In the matrix, each row is a vector of length k-1, with a different vector component dropped each time. This can be used for leave-one-out cross-validation (LOOCV) to evaluate the out-of-sample accuracy of a predictive model.\n", "\n", "For example, suppose you have data points [(1,4), (2,7), (3,11), (4,9), (5,15)] that you want to perform LOOCV on for a simple regression model. For each cross-validation fold, you use one point for testing, and the remaining 4 points for training. In other words, you want the training sets to be:\n", "```\n", "[(2,7), (3,11), (4,9), (5,15)]\n", "[(1,4), (3,11), (4,9), (5,15)]\n", "[(1,4), (2,7), (4,9), (5,15)]\n", "[(1,4), (2,7), (3,11), (5,15)]\n", "[(1,4), (2,7), (3,11), (4,9)]\n", "```\n", "Here is one way to create the training sets using numpy tricks." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a triangular matrix with N rows, N-1 columns and offset from the diagonal by -1" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "collapsed": true }, "outputs": [], "source": [ "N = 5" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 0., 0., 0., 0.],\n", " [ 1., 1., 0., 0., 0.],\n", " [ 1., 1., 1., 0., 0.],\n", " [ 1., 1., 1., 1., 0.],\n", " [ 1., 1., 1., 1., 1.]])" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.tri(N)" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 0., 0., 0.],\n", " [ 1., 1., 0., 0.],\n", " [ 1., 1., 1., 0.],\n", " [ 1., 1., 1., 1.],\n", " [ 1., 1., 1., 1.]])" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.tri(N, N-1)" ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 0., 0., 0.],\n", " [ 1., 0., 0., 0.],\n", " [ 1., 1., 0., 0.],\n", " [ 1., 1., 1., 0.],\n", " [ 1., 1., 1., 1.]])" ] }, 
"execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.tri(N, N-1, -1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use broadcasting to create a new index matrix" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3, 4])" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(1, N)" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 2., 3., 4.],\n", " [ 0., 2., 3., 4.],\n", " [ 0., 1., 3., 4.],\n", " [ 0., 1., 2., 4.],\n", " [ 0., 1., 2., 3.]])" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(1, N) - np.tri(N, N-1, -1)" ] }, { "cell_type": "code", "execution_count": 91, "metadata": { "collapsed": true }, "outputs": [], "source": [ "idx = np.arange(1, N) - np.tri(N, N-1, -1).astype('int')" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1, 4],\n", " [ 2, 7],\n", " [ 3, 11],\n", " [ 4, 9],\n", " [ 5, 15]])" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = np.array([(1,4), (2,7), (3,11), (4,9), (5,15)])\n", "data" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[ 2, 7],\n", " [ 3, 11],\n", " [ 4, 9],\n", " [ 5, 15]],\n", "\n", " [[ 1, 4],\n", " [ 3, 11],\n", " [ 4, 9],\n", " [ 5, 15]],\n", "\n", " [[ 1, 4],\n", " [ 2, 7],\n", " [ 4, 9],\n", " [ 5, 15]],\n", "\n", " [[ 1, 4],\n", " [ 2, 7],\n", " [ 3, 11],\n", " [ 5, 15]],\n", "\n", " [[ 1, 4],\n", " [ 2, 7],\n", " [ 3, 11],\n", " [ 4, 9]]])" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[idx]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### All but one\n", "\n", "R 
uses negative indexing to mean delete the component at that index. Because Python uses negative indexing to mean count from the end, we have to do a little more work to get the same effect. Here are two ways of deleting one item from a vector." ] }, { "cell_type": "code", "execution_count": 94, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def f1(a, k):\n", " idx = np.ones_like(a).astype('bool')\n", " idx[k] = 0\n", " return a[idx]" ] }, { "cell_type": "code", "execution_count": 95, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def f2(a, k):\n", " return np.r_[a[:k], a[k+1:]]" ] }, { "cell_type": "code", "execution_count": 96, "metadata": { "collapsed": true }, "outputs": [], "source": [ "a = np.arange(100)\n", "k = 50" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The slowest run took 6.39 times longer than the fastest. This could mean that an intermediate result is being cached \n", "100000 loops, best of 3: 12.4 µs per loop\n" ] } ], "source": [ "%timeit f1(a, k)" ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10000 loops, best of 3: 47.6 µs per loop\n" ] } ], "source": [ "%timeit f2(a, k)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### Universal functions (Ufuncs)\n", "\n", "Functions that work on both scalars and arrays are known as ufuncs. For arrays, ufuncs apply the function in an element-wise fashion. Use of ufuncs is an essential aspect of vectorization and is typically much more computationally efficient than using an explicit loop over each element."
] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAEACAYAAACwB81wAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xm8zmX+x/HXB5E2mUkykZJoX+hHaTuVRBlpMdkyLUMq\nqmGUaaNNJamkTZOKNPrRJlMTTU5DSpTIFkqKdiYljJxz/f64bjN+Ott97uX6fu/7/Xw8zuMsvuf+\nvh3H577uazXnHCIikvuqhA4gIiLZoYIvIpInVPBFRPKECr6ISJ5QwRcRyRMq+CIieSItBd/MHjez\nr81sfhnXjDCzZWb2gZkdmY77iohIxaWrhf8EcHppf2hm7YD9nXMHAJcCj6TpviIiUkFpKfjOuRnA\nv8q45CxgTOLaWUAtM6ubjnuLiEjFZKsPf2/g820+X534moiIZIkGbUVE8kS1LN1nNdBgm8/rJ772\nC2amzX1ERJLknLPyrklnC98SbyWZBPQAMLNjgO+dc1+X9kDOuVi+DRo0KKXv37zZ8cQTjqZNHc2b\nO4YPd6xcmdxjbNrkeO01R6dOjt13d/Ts6VixIjv5Q78pv/Lna/6KSksL38yeAQqAX5vZZ8AgoLqv\n3W6Uc+4VMzvDzJYDPwEXpeO+ucI5mDABBgyAAw6Ahx6Ck08GK/f5+pdq1IA2bfzbV1/5x2reHC66\nCK6/HmrXTn9+EYmHtBR851zXClzTJx33yjWffAJXXAGrVsHTT8MJJ6TvsffaC265BS67DAYNgqZN\n4d57oWvXyj2ZiEi8adA2jQoKCip8rXPw6KPQogUUFMD776e32G+rXj0YNQpefRVuvx06d4a1a395\nXTL5o0j5w1L+6LNk+n+ywcxc1DKl2/r10Ls3zJsHEyf6lne2bNwI113nu5D+93+hVavs3VtEMsPM\ncFketJUKWL4cWraEatVg1qzsFnuAmjV9t86oUdCxI4wZk937i0g4auFn0Zw58Nvfwo03+n710P3o\nixb5PJ06wZAhUEVP/yKxVNEWvgp+lkyZAt26wWOP+ZZ1VHz3HZx9Nuy/Pzz+OFStGjqRiCRLBT9C\nnn/et+ifew6OPz50ml/66Sf/JLTHHr6LZ4cdQicSkWSo4EfE5MlwySXw97/DUUeFTlO6TZvg3HP9\nPP7x46F69dCJRKSiNGgbAVOnwsUXw8svR7vYA+y4o38lsmWLX6RVXBw6kYikmwp+hrz1ll/g9Pzz\nfq59HNSoAc8+C599Bv37+7UCIpI7VPAzYPly3z0ydmw0++zLUrMmTJoEr78OQ4eGTiMi6ZSt3TLz\nxtq1cOaZMHgwtG0bOk3l1K7txxyOOw722Qe6dAmdSETSQYO2abR5s9+07OijYdiw0GlS9+GHcMop\nvvg3bx46jYiURrN0Aujd2+9Q+dxzuTOf/bnnoF8/ePddqKtDKUUiqaIFX106aTJmDEybBrNn506x\nBz8WMX++f//GG5quKRJnauGnwfz5cOqpvuAfemjoNOlXXOxX4zZq5PfhEZFo0Tz8LFm3zrd+778/\nN4s9+D12nnwSXnjBz+ARkXhSCz8Fzvm59r/6FTz4YOg0mTdzpm/pz5kDDRqUf72IZIda+Fkwbpyf\nyZILM3IqolUr+OMf/TTNLVtCpxGRZKmFX0krVvgVtK+/DkccETpN9hQXQ7t2vvgPGhQ6jYiApmVm\n1JYt/ljCs8/2WxDkm9Wr/d5Af/87NGsWOo2IqEsng+6+2+8788c/hk4Sxt57w/Dh0KMH/PvfodOI\nSEWphZ+kxYvhxBP9wGXDhqHThOOcn53UpAnceWfoNCL5TV06GVBUBCecAN27w+WXh04T3jffwOGH\n++maxx4bOo1I/lKXTgY88I
A/Dap379BJomHPPWHECOjZ0+8jJCLRphZ+BX38MbRsCW+/DQccEDpN\ndDgH7dv7WTvXXx86jUh+UpdOGjnntzo+9VS45prQaaJn5Uq/m+bMmb5PX0SyS106aTRhAnzxRf7O\nyilPw4a+dd+7t07JEokyFfxy/PCD3x744Yd9/72UrG9f/7MaMyZ0EhEpjbp0ytGvH3z/PYweHTpJ\n9M2eDR06wJIlUKtW6DQi+UN9+Gkwb54/wWrhQthjj9Bp4uEPf4Bdd9U2yiLZpIKfIufgpJOgWze4\n9NLQaeLj22/h4IOhsBAOOSR0GpH8oEHbFE2c6Puk//CH0EnipU4dv6la374awBWJGhX8EmzcCAMG\n+ENNcum4wmzp3RvWrPFPmiISHSr4JbjnHjj6aN+lI8mrVg3uuw+uvRY2bQqdRkS2Uh/+dlav9vvD\nzJkD++0XLEZO6NDB7z00YEDoJCK5TYO2lXThhVCvHtxxR7AIOWPJEjj+eP9es5xEMkcFvxLmzYPT\nT4elS2G33YJEyDl9+oCZ33hORDJDBb8S2rWDM87wM0wkPbZO05w+HQ48MHQakdykaZlJev11WLZM\nc+7TrU4d34evnTRFwlMLH38w99FHw5//DJ06ZfXWeWHjRr+l9HPP+S2mRSS91MJPwvjxfmO0884L\nnSQ31azpF2MNHKjFWCIh5X3B//lnuPFGuOsuP7gomXHRRX6L6alTQycRyV95X/BHj4b994eCgtBJ\nclu1anD77b7brLg4dBqR/JTXBX/TJrjtNv8mmXfuuf5V1IQJoZOI5Ke8LvgPP+yP5mvRInSS/GAG\nQ4b4/vyiotBpRPJP3s7S+fFHP3Nk6lQ47LCM304SnIMTT4ReveCCC0KnEckNWnhVjttvh0WLYNy4\njN9KtjNtGvTs6bdcqFYtdBqR+MvqtEwza2tmS8xsqZldW8Kfn2Rm35vZ+4m3G9Jx38pat87v5jho\nUMgU+evkk6FBAxg7NnQSkfyScgvfzKoAS4FTgS+A2UBn59ySba45CejvnOtQgcfLeAv/ttv8fjk6\ncDuc6dOhRw/46COoXj10GpF4y2YLvwWwzDm30jn3MzAeOKukTGm4V8rWrfMHm9wQ9DWGnHCCH0N5\n4onQSUTyRzoK/t7A59t8virxte0da2YfmNnfzOzgNNy3Uh54ANq2hSZNQiWQrW6+2W9D/fPPoZOI\n5IdsDZm9B+zjnNtgZu2AF4FSS+7gwYP/83FBQQEFaVoVtbV1P2NGWh5OUnTssb6VP3YsXHxx6DQi\n8VFYWEhhYWHS35eOPvxjgMHOubaJzwcCzjl3VxnfswJo7pxbW8KfZawP/7bbfJ+xBguj45//9MVe\nM3ZEKi+bffizgcZm1tDMqgOdgUnbham7zcct8E80vyj2mbR+PYwYoW16o+bEE6F+ffjrX0MnEcl9\nKRd851wR0AeYAiwExjvnFpvZpWbWK3HZeWa2wMzmAvcB56d632Q98oifDqhDOKLnppv8qy+tvhXJ\nrLxYeLVxo98g7dVX4Ygj0vrQkgbO+Vk7V1wBXbqETiMSP9oPfxujR/sDTlTso8nMT5MdMkQ7aYpk\nUs4X/M2bYehQ9d1H3emn+0No/va30ElEclfOF/ynn/Zz7nW0XrSZ+b3yhwzRqVgimZLTBb+oyJ9k\ndd11oZNIRZxzDqxdC2++GTqJSG7K6YL/4otQu7ZOs4qLqlXh2mt9K19E0i9nC75zcOed/uBsnVUb\nH927w+LFMGdO6CQiuSdnC/60aX6xVYdy9+eUKKleHf70J7/HjoikV87Ow2/Txs/pvuiiNISSrPrp\nJ9h3X5g50++1IyJly+t5+O+957sFunULnUQqY+ed4bLL4J57QicRyS052cI//3w/DbNfvzSFkqz7\n5hu/DcaiRbDXXqHTiERb3p5p+8kn0KIFrFgBu+6axmCSdVdcAbvv7s8fFpHS5W3B79sXdtlF
g365\n4OOP/Ss1PXmLlC0vC/6aNX6Qb+FCqFcvzcEkCHXPiZQvLwv+rbfCp5/C44+nN5OEM2eOX4H78cd+\nrx0R+aW8m6WzcSM8+KCfwy254+ijoVEjmDgxdBKR+MuZgj92LPzP/8BBB4VOIunWvz8MG6ZN1URS\nlRMFv7jYz9lW6z43nXmmX4ylTdVEUpMTBf+VV/wsjhNPDJ1EMqFKFT9oO2xY6CQi8ZYTg7Ynnww9\ne0LXrhkKJcFt3Oi3WygsVLedyPbyZtD2/fdh+XLo1Cl0EsmkmjXh8sth+PDQSUTiK/Yt/AsugMMP\nhwEDMhhKIuGbb6BpU1i6FOrUCZ1GJDryYh7+qlW+2H/yiV+CL7mvZ09o0ABuuil0EpHoyIuCP3Cg\n79u9//4Mh5LIWLgQWrf2C+xq1AidRiQacr7gr1/vB/HefdcvzJH8cfrp/qyDCy8MnUQkGnJ+0HbM\nGD8NU8U+//Tr5wdvI9ZWEYm8WBb84mK47z64+urQSSSENm2gqAjeeCN0EpF4iWXBf/VVv9DqhBNC\nJ5EQzPyTvaZoiiQnln34rVvD73/vp2RKftq4ERo2hBkzoEmT0GlEwsrZPvz58/2xd+efHzqJhFSz\nJvTqBSNGhE4iEh+xa+Ffcgnstx/ccEMWQ0kkffEFHHKIPxFL6zAkn+XktMxvv/Uv37XSUrbq1g2a\nNfNbKIvkq5zs0nn0UTj3XBV7+a+rroIHHoAtW0InEYm+2BT8zZvh4Yf9f3CRrVq0gL33hpdeCp1E\nJPpiU/AnTvQbZx12WOgkEjVXX+3XZYhI2WJT8O+/X617KVnHjn5vnblzQycRibZYFPx33vEDtu3b\nh04iUbTDDnDFFdpET6Q8sZil06ULtGyprRSkdGvWQOPG8NFHsOeeodOIZFfOTMtcvdr3269YAbVq\nBQwmkdezJ+yzD9x4Y+gkItmVMwX/hhtg3To/9U6kLB9+6LdO/vRTqF49dBqR7MmJefibNsFjj0Gf\nPqGTSBwcdpg/4HzixNBJRKIp0gV//Hi/irJp09BJJC6uvFKDtyKliWzBd85vjKWpmJKM9u39jK5Z\ns0InEYmeyBb8t96Cn37yh12IVFTVqr4LUGM+Ir8U2UHbTp3gpJPUfy/J+/57f/TlwoVQr17oNCKZ\nF+tZOitXOo48Elau9CdbiSTr8sv9fPzBg0MnEcm8WBf8gQMdGzdqfxSpvEWL4JRTfKOhRo3QaUQy\nK9YFv04dx8yZfuWkSGWddpo/CrN799BJRDIrq/PwzaytmS0xs6Vmdm0p14wws2Vm9oGZHVnW47Vo\noWIvqds6RTNibRqRYFIu+GZWBRgJnA4cAnQxswO3u6YdsL9z7gDgUuCRsh6zb99UU4nAGWfA2rWa\noimyVTpa+C2AZc65lc65n4HxwFnbXXMWMAbAOTcLqGVmdUt7wNNOS0MqyXuaoiny/6Wj4O8NfL7N\n56sSXyvrmtUlXPPfUJFdHSBxc9FF8Oqr/sBzkXRaudJP/Y2TaqEDlGTwNnPpCgoKKCgoCJZF4m33\n3aFzZ38e8s03h04jueSuu/z52iF+rwoLCyksLEz6+1KepWNmxwCDnXNtE58PBJxz7q5trnkEmOac\nezbx+RLgJOfc1yU83i/2wxdJhaZoSrpFbXFfNmfpzAYam1lDM6sOdAYmbXfNJKBHItgxwPclFXuR\nTDj4YDj8cJgwIXQSyRVPPAFt20aj2Ccj5YLvnCsC+gBTgIXAeOfcYjO71Mx6Ja55BVhhZsuBR4HL\nU72vSDL69tUUTUmPoiIYOdJP+42bSC68ilomib+iImjSBMaNg2OOCZ1G4mzyZLjlFj/d18rtRMmO\nnDgARSRdtk7RHDEidBKJuxEjfOs+KsU+GWrhS974/nvYbz8/0Pab34ROI3EU1QkAauGLbGf33aFr\nV3ikzHXeIqUbORJ69YpWsU+GWviSVxYvhpNPjl4LTaJv
6yvERYuiNztHLXyREhx0EBxxBDz7bOgk\nEjejR/v9maJW7JOhgi9558or/cCbXkhKRRUV+T2Z4n7Gtgq+5J127fzL87ffDp1E4mLyZKhb12/d\nHmcq+JJ3qlT570IskYrYOhUz7jRoK3nphx9g331h3jxo0CB0GomyBQugTRv49FOoXj10mpJp0Fak\nDLvtBhdcAA8/HDqJRN2IEdC7d3SLfTLUwpe8tXw5tGrlp2jWrBk6jUTRmjX+uNUlS3wfflSphS9S\njsaN/SDcM8+ETiJR9dhj0LFjtIt9MtTCl7w2dSr07+/78uO4N4pkzs8/+z3vX34ZjjwydJqyqYUv\nUgGtW0NxMUybFjqJRM3zz/uCH/VinwwVfMlrZn66naZoyvbuvz/+C622py4dyXsbNkDDhn4hVuPG\nodNIFMyeDb/7nR/Yr1o1dJryqUtHpIJ22gl69vRL50XAt+779IlHsU+GWvgiwKpV/tzbFSugVq3Q\naSSk1avhsMPi9bugFr5IEurXh9NP94dTS3576CHo3j0+xT4ZauGLJMyaBV26wLJlufdSXipmwwa/\n5cbMmfEaz1ELXyRJLVv6BTaTJoVOIqE8/TQce2y8in0yVPBFttGvHwwfHjqFhOAc3HcfXH116CSZ\no4Ivso2zz4bPP/fT8iS/TJniN0grKAidJHNU8EW2Ua2a3yv/3ntDJ5FsGz7ct+5zeYsNDdqKbGfd\nOn9Y9fz5fvaO5L6te96vWBHPw+01aCtSSbVqQY8eMHJk6CSSLcOHwxVXxLPYJ0MtfJESfPKJ3zr5\n009hl11Cp5FM+uorOOggv43Cr38dOk3lqIUvkoJGjfzg3ejRoZNIpj34oF9/Eddinwy18EVK8c47\n0LUrLF3qB3Ml92xdaDVjBjRpEjpN5amFL5KiY46BevXghRdCJ5FMGTPGL7SKc7FPhgq+SBn+9CcY\nNswvypHcUlTkB2v79QudJHtU8EXK0KGDP8j6rbdCJ5F0mzQJdt8dTjwxdJLsUcEXKUPVqr4FeM89\noZNIug0bBgMG5PZCq+1p0FakHFsH9qZPh6ZNQ6eRdJg5Ey64wA/I58LOqBq0FUmTnXaCyy5TKz+X\n3H23f+WWC8U+GWrhi1TAt9/6mRyLF8Nee4VOI6lYuhSOP95vo7DzzqHTpIda+CJpVKcOdOsGI0aE\nTiKpuuce6N07d4p9MtTCF6mgrdstrFgBu+4aOo1UxpdfwiGHwEcf+SfxXKEWvkiaNWoErVvDY4+F\nTiKVdf/9/pVaLhX7ZKiFL5KE996Djh3h44/9YRkSH+vW+Sft997zs65yiVr4IhnQvDkceCCMGxc6\niSTrkUegXbvcK/bJUAtfJEnTpvlBv0WL8m9aX1xt2uQPtZkyBQ47LHSa9FMLXyRDCgqgdm1tqhYn\nTz3lX53lYrFPhlr4IpUwaRIMHuz7g/NpaX4cbdniV0g/9ZSff5+L1MIXyaD27WHzZnjttdBJpDzj\nx0ODBrlb7JOhFr5IJY0bB6NGwZtvhk4ipSkuhkMP9dMxTzstdJrMUQtfJMPOPx9WrfKbqkk0vfCC\nXyTXunXoJNGQUsE3s9pmNsXMPjKz18ysVinXfWpm88xsrpm9m8o9RaKiWjX485/h1ltDJ5GSOAe3\n3w7XX69xlq1SbeEPBF53zjUF3gD+XMp1xUCBc+4o51yLFO8pEhk9evhl+u+8EzqJbO/VV/2pVu3b\nh04SHakW/LOApxIfPwV0LOU6S8O9RCKnenUYOFCt/KhxDm67Da67Dqqo8vxHqj+KPZ1zXwM4574C\n9izlOgdMNbPZZtYzxXuKRMrFF8O8eX6KpkTD1Knw/fdw3nmhk0RLtfIuMLOpQN1tv4Qv4DeUcHlp\n02uOc859aWZ18IV/sXNuRmn3HDx48H8+LigooKCgoLyYIsHUqAHXXONb+S++GDqNOOfXSNx0U+6u\nhC4sLKSwsDDp70tp
WqaZLcb3zX9tZnsB05xzB5XzPYOAH51zw0v5c03LlNjZuBH23x9eeQWOPDJ0\nmvw2ZQpcfTV8+GHuFvztZWta5iTgwsTHvwdeKiHITma2S+LjnYE2wIIU7ysSKTVrwrXXwqBBoZPk\nt3xo3aci1YJ/F3CamX0EnArcCWBm9cxscuKausAMM5sLvAO87JybkuJ9RSLn0kt9P/6cOaGT5K+t\nffedOoVOEk1aaSuSRg89BJMn+64dyS7noFUruPJK6NIldJrs0kpbkQAuuQQWLoS33w6dJP9Mngzr\n1/sV0FIyFXyRNKpRA264QX352VZc7H/ut96qefdl0Y9GJM0uvBCWL9ematk0YYJ/sj3rrNBJok19\n+CIZ8PTT8OCDMHOm9nHJtC1b4JBDYOTI3N4RsyzqwxcJqGtX2LDBH5QimTVmDNSrpx0xK0ItfJEM\n+dvf/Arc+fM1JzxTNm3yp1k98wwcd1zoNOGohS8S2BlnwK9+BWPHhk6Su0aOhKOOyu9inwy18EUy\naMYM6NbNb6G8446h0+SWtWt96376dDjwwNBpwlILXyQCjj8ejjjCt0Qlve64A845R8U+GWrhi2TY\nkiW+8C9ZAnvsETpNbli5Epo1gwUL/IBtvqtoC18FXyQL+vb17x94IGyOXNGjB+y7L9xyS+gk0aCC\nLxIh330HBx3k+/SbNg2dJt7efRc6dvTjIrvuGjpNNKgPXyRC9tjDT9G85prQSeLNObjqKhgyRMW+\nMlTwRbLkyiv9oRz/+EfoJPH1zDN+ZW2PHqGTxJO6dESy6IUX/CZfH3wAO+wQOk28/PSTn5Ezfrzm\n3W9PXToiEdSxI9Svr2malTF0qJ/tpGJfeWrhi2TZRx/5wvXhh7DXXqHTxMPHH0PLlvD++7DPPqHT\nRI9m6YhE2DXXwDffwJNPhk4Sfc75bSpOPlmD3qVRwReJsB9/9NM0n31WXRTlef55uPFGmDsXqlcP\nnSaa1IcvEmG77gr33usPPt+8OXSa6Fq/Hq6+2p8VrGKfOhV8kUDOO8+vFr377tBJouvmm31Xzkkn\nhU6SG9SlIxLQypXQvLk/GatJk9BpouW993zf/fz5ULdu6DTRpi4dkRho2NDPy7/0Uj84Kd7mzXDx\nxTBsmIp9OqngiwTWt68fxH388dBJouOuu/x6he7dQyfJLerSEYmABQt8X/Xs2b5fP59t/VnMneuL\nvpRPXToiMXLooTBggO/GKC4OnSacn3/2P4Pbb1exzwQVfJGI6N/fH8r94IOhk4Rz663+HOCePUMn\nyU3q0hGJkGXLoFUreOut/Ju189ZbcO65vitHp1glR106IjF0wAH+FKfOneHf/w6dJnt++AEuuAAe\nfVTFPpPUwheJGOegUyf4zW9gxIjQabLj97+HHXf0BV+SV9EWfrVshBGRijODv/wFjjoKTjnFb6mc\ny0aP9scWzpkTOknuUwtfJKJmzYIOHXwxbNgwdJrMmDsX2rSBf/7TbyYnlaM+fJGYa9kSrr0WzjkH\nNmwInSb9/vUvv5/QyJEq9tmiFr5IhDnnz28tKoJx43x3Ty4oLvZdVY0awX33hU4Tf2rhi+QAMxg1\nyk/XHDo0dJr0GTjQz8zJpb9THGjQViTiatb0h5+3bAkHHwy//W3oRKkZNQpeegneflt73GebunRE\nYuLdd+HMM32xbNUqdJrKmTLFT8GcPh0aNw6dJneoS0ckx7RoAWPHwtlnw6JFodMkb84cv/vlhAkq\n9qGo4IvESNu2cM89/v1nn4VOU3Hz5vlXJ48/DscfHzpN/lIfvkjMdO8Oa9ZAQQG88Ub0t1NetMg/\nQY0cGf/xh7hTwReJoauugqpV/Vmv//hHdLtIFizwxX7oUL9dhISlgi8SU336+FkuBQV+MPTgg0Mn\n+v+mT/cLq+69F7p2DZ1GQAVfJNZ69YKddvJF/+mn/TYFUfDiiz7buHFw2mmh08hWmp
YpkgOmT4ff\n/Q6uu863/EOtyC0uhjvugIcegkmToHnzMDnyTUWnZargi+SIFSv8oOjRR/ttlXfbLbv3X7PG72n/\n44/w7LN+e2fJDs3DF8kz++0H77wDNWrAEUf4HSiz5bXXoFkzfzbvG2+o2EdVSgXfzM4zswVmVmRm\nzcq4rq2ZLTGzpWZ2bSr3FJHS7bKLP0TkgQf8qVl9+8J332Xufl9+6e9z2WV+y4ShQ2GHHTJ3P0lN\nqi38D4GzgTdLu8DMqgAjgdOBQ4AuZnZgiveNpMLCwtARUqL8YaUzf/v2MH++323zwAPhzjth48a0\nPTxr18LgwXD44X7HywULoEaNwvTdIIC4//5UREoF3zn3kXNuGVBW31ELYJlzbqVz7mdgPHBWKveN\nqrj/wih/WOnOv8cefrHTzJkwe7Y/ROWaa2D58so/5pIlMGCAP3t31Sr/2EOG+JlC+vlHXzamZe4N\nfL7N56vwTwIikgVNmsBzz/ktlkeNgmOPhaZNoXVrf4Riixb+PNmSrF3rt0WYPt3vgbN2re/C+eAD\naNAgu38PSV25Bd/MpgJ1t/0S4IDrnXMvZyqYiKTXAQfA3XfDrbfCm2/CtGnQv7/v+qldG+rX9+83\nbvRv33wD69b5bpsWLeCRR/yTRRVN9YittEzLNLNpQH/n3Psl/NkxwGDnXNvE5wMB55y7q5TH0pxM\nEZEkVWRaZjq7dEq72WygsZk1BL4EOgNdSnuQioQWEZHkpTots6OZfQ4cA0w2s1cTX69nZpMBnHNF\nQB9gCrAQGO+cW5xabBERSVbkVtqKiEhmRGb4Jc6Ls8zscTP72szmh85SGWZW38zeMLOFZvahmV0Z\nOlMyzKyGmc0ys7mJ/INCZ0qWmVUxs/fNbFLoLMkys0/NbF7i5/9u6DzJMrNaZjbBzBYn/g+0DJ2p\nosysSeLn/n7i/bqy/v9GooWfWJy1FDgV+ALf79/ZObckaLAKMrPjgfXAGOfc4aHzJMvM9gL2cs59\nYGa7AO8BZ8Xl5w9gZjs55zaYWVXgLeBK51xsio+Z/RFoDuzmnOsQOk8yzOwToLlz7l+hs1SGmT0J\nvOmce8LMqgE7Oed+CBwraYk6ugpo6Zz7vKRrotLCj/XiLOfcDCCWv+wAzrmvnHMfJD5eDyzGr5+I\nDefchsSHNfCTEcK3ZCrIzOoDZwB/CZ2lkozo1JKkmNluwAnOuScAnHNb4ljsE1oDH5dW7CE6/0gl\nLc6KVcHJFWa2L3AkMCtskuQkukTmAl8BU51zs0NnSsK9wABi9CS1HQdMNbPZZtYzdJgk7Qd8Z2ZP\nJLpFRplZzdChKul84K9lXRCVgi8RkOjOmQhclWjpx4Zzrtg5dxRQH2hpZhE7/6lkZnYm8HXiFZZR\n9jYlUXXzgDPRAAABiElEQVScc64Z/lXKFYkuzrioBjQDHkz8HTYAA8NGSp6Z7QB0ACaUdV1UCv5q\nYJ9tPq+f+JpkSaLvciIw1jn3Uug8lZV4OT4NaBs6SwUdB3RI9IP/FTjZzMYEzpQU59yXifffAi8Q\nr61TVgGfO+fmJD6fiH8CiJt2wHuJf4NSRaXg/2dxlplVxy/Oittshbi2zrYaDSxyzt0fOkiyzGwP\nM6uV+LgmcBoQiwFn59x1zrl9nHON8L/3bzjneoTOVVFmtlPilSFmtjPQBlgQNlXFOee+Bj43syaJ\nL50KLAoYqbK6UE53DkTkTFvnXJGZbV2cVQV4PE6Ls8zsGaAA+LWZfQYM2joIFAdmdhzQDfgw0Q/u\ngOucc38Pm6zC6gFPJWYpVAGedc69EjhTvqgLvJDYEqUaMM45NyVwpmRdCYxLdIt8AlwUOE9SzGwn\n/IBtr3KvjcK0TBERybyodOmIiEiGqeCLiOQJFXwRkTyhgi8ikidU8EVE8oQKvohInlDBFxHJEyr4\nIiJ54v8A42wJvv9FA/kAAAAASUVORK5CYII=\n
", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "xs = np.linspace(0, 2*np.pi, 100)\n", "ys = np.sin(xs) # np.sin is a universal function\n", "plt.plot(xs, ys);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generalized ufuncs\n", "\n", "A universal function performs vectorized looping over scalars. A generalized ufunc performs looping over vectors or arrays. Currently, numpy only ships with a single generalized ufunc. However, they play an important role for JIT compilation with `numba`, a topic we will cover in future lectures." ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(m,n),(n,p)->(m,p)\n" ] } ], "source": [ "from numpy.core.umath_tests import matrix_multiply\n", "\n", "print(matrix_multiply.signature)" ] }, { "cell_type": "code", "execution_count": 101, "metadata": { "collapsed": true }, "outputs": [], "source": [ "us = np.random.random((5, 2, 3)) # 5 2x3 matrics\n", "vs = np.random.random((5, 3, 4)) # 5 3x4 matrices" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[ 0.45041464, 0.73156889, 0.20199586],\n", " [ 0.07597661, 0.13069672, 0.57386177]],\n", "\n", " [[ 0.10830059, 0.26695388, 0.44054188],\n", " [ 0.57974703, 0.17978862, 0.52472549]],\n", "\n", " [[ 0.40794462, 0.35751635, 0.36870809],\n", " [ 0.63494551, 0.11960905, 0.51381859]],\n", "\n", " [[ 0.49510212, 0.46783668, 0.07856113],\n", " [ 0.28401281, 0.47199107, 0.54560703]],\n", "\n", " [[ 0.79458848, 0.43756637, 0.06759583],\n", " [ 0.40228528, 0.50838122, 0.56375008]]])" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" } ], "source": [ "us" ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[ 5.12266952e-02, 
8.40637879e-01, 2.38644940e-01,\n", " 4.35712252e-01],\n", " [ 7.81217918e-01, 6.77203544e-01, 7.28623630e-01,\n", " 8.70980358e-02],\n", " [ 9.53278422e-01, 6.57880491e-01, 4.45387233e-01,\n", " 5.54732924e-02]],\n", "\n", " [[ 6.52946888e-01, 2.29030756e-01, 5.91273241e-01,\n", " 8.29711164e-06],\n", " [ 5.25251656e-01, 5.06625358e-01, 1.87996526e-01,\n", " 3.57795468e-02],\n", " [ 6.45830540e-01, 7.67992588e-01, 3.53764451e-01,\n", " 8.93663158e-01]],\n", "\n", " [[ 7.05223217e-01, 5.68551438e-01, 2.21699577e-01,\n", " 3.66118249e-01],\n", " [ 3.59834031e-01, 8.86827366e-01, 8.90595276e-01,\n", " 9.32417623e-01],\n", " [ 9.20454420e-01, 8.21082903e-01, 9.46367477e-01,\n", " 6.79992096e-02]],\n", "\n", " [[ 5.17735631e-01, 7.57666718e-01, 9.76354847e-01,\n", " 8.23073506e-01],\n", " [ 9.12873687e-01, 7.01160089e-01, 9.41861241e-01,\n", " 5.75122377e-01],\n", " [ 4.29768204e-01, 6.52651201e-01, 4.52733332e-01,\n", " 7.76359616e-01]],\n", "\n", " [[ 3.29293395e-01, 6.01805518e-01, 8.82878227e-01,\n", " 9.58704548e-01],\n", " [ 1.57352491e-01, 8.34085919e-01, 2.60621284e-01,\n", " 1.61751295e-01],\n", " [ 1.81993190e-01, 3.98928140e-01, 1.31517889e-01,\n", " 4.99537484e-03]]])" ] }, "execution_count": 103, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vs" ] }, { "cell_type": "code", "execution_count": 104, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# perform matrix multiplication for each of the 5 sets of matrices\n", "ws = matrix_multiply(us, vs) " ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(5, 2, 4)" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ws.shape" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[ 0.78714627, 1.00694579, 0.73049393, 0.27117477],\n", " [ 0.65304469, 0.52990956, 0.36895086, 0.07632137]],\n", "\n", " [[ 0.4954479 , 
0.49838267, 0.2700697 , 0.40324843],\n", " [ 0.81186204, 0.62685067, 0.56221777, 0.47536541]],\n", "\n", " [[ 0.75571755, 0.85173269, 0.75777686, 0.50778237],\n", " [ 0.96376431, 0.88895942, 0.73355161, 0.37892998]],\n", "\n", " [[ 0.71717088, 0.75442382, 0.95959983, 0.73756047],\n", " [ 0.81239634, 0.90221944, 0.96886187, 0.92880331]],\n", "\n", " [[ 0.34280688, 0.87012156, 0.82445404, 0.83289018],\n", " [ 0.31506361, 0.89102688, 0.5618071 , 0.47072019]]])" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ws" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Saving and loading NDArrays\n", "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Saving to and loading from text files" ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3],\n", " [4, 5, 6],\n", " [7, 8, 9]])" ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x1 = np.arange(1,10).reshape(3,3)\n", "x1" ] }, { "cell_type": "code", "execution_count": 110, "metadata": { "collapsed": true }, "outputs": [], "source": [ "np.savetxt('../data/x1.txt', x1)" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00\r\n", "4.000000000000000000e+00 5.000000000000000000e+00 6.000000000000000000e+00\r\n", "7.000000000000000000e+00 8.000000000000000000e+00 9.000000000000000000e+00\r\n" ] } ], "source": [ "!cat ../data/x1.txt" ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 2., 3.],\n", " [ 4., 5., 6.],\n", " [ 7., 8., 9.]])" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x2 = np.loadtxt('../data/x1.txt')\n", "x2" ] }, { "cell_type": "markdown", "metadata": 
{}, "source": [ "### Saving to and loading from binary files (much faster and also preserves dtype)" ] }, { "cell_type": "code", "execution_count": 115, "metadata": { "collapsed": true }, "outputs": [], "source": [ "np.save('../data/x1.npy', x1)" ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "�NUMPY\u0001\u0000F\u0000{'descr': '