{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import scipy.linalg as la\n", "\n", "sns.set_context('notebook', font_scale=1.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**1**. Exact geometric solutions with $n = m$\n", "\n", "- Find the equation of the line that passes through the points (2,1) and (3,7)\n", "- Find the equation of the circle that passes through the points (1,7), (6,2) and (4,6)\n", "\n", "Hint: The equation of a circle can be written as\n", "\n", "$$\n", "(x - a)^2 + (y - b)^2 = r^2\n", "$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Find the equation of the line that passes through the points (2,1) and (3,7)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Find the equation of the circle that passes through the points (1,7), (6,2) and (4,6)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**2**. Interpolating polynomials and choice of basis\n", "\n", "We have \n", "\n", "| $x_i$ | $y_i$ |\n", "| ----- | -------- |\n", "| 0 | -5 |\n", "| 1 | -3 |\n", "| -1 | -15 |\n", "| 2 | 39 |\n", "| -2 | -9 |\n", "\n", "Find interpolating polynomials using\n", "\n", "- Monomial basis $f_i(x_i) = x^i$ - code this using only simple linear algebra operations (including solve)\n", "- Lagrange basis\n", "$$\n", "l_j(x_j) = \\prod_{0 \\le m \\le k, m \\ne j} \\frac{x - x_m}{x_j - x_m}\n", "$$\n", "\n", "The Lagrange interpolation uses the values of $y$ as the coefficient for the basis polynomials. Do this manually and then using the scipy.interpolate package" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.array([0,1,-1,2,-2])\n", "y = np.array([-5, -3, -15, 39, -9])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Monomial basis" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lagrange basis" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using library functions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from scipy.interpolate import lagrange" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**3**. Markov chains\n", "\n", "$$\n", "P = \\pmatrix{\n", " p_{11} & p_{12} & p_{13} \\\\\n", " p_{21} & p_{22} & p_{33} \\\\\n", " p_{31} & p_{32} & p_{33} \\\\\n", " }\n", "$$\n", "\n", "By convention, the $rows$ of a Markov transition matrix sum to 1, and $p_{32}$ is the probability that the system will change from state 3 to state 2. Therefore, to see the next state of an initial probability row vector $v_k$, we need to perform left multiplication\n", "\n", "$$\n", "v_{k+1}^T = v_{k}^T P\n", "$$\n", "\n", "If this is confusing, you can work with the matrix $P^T$ and do right-multiplication with column vectors. In this case, $p_{32}$ is the probability that the system will change from state 2 to state 3.\n", "\n", "![img](../data/markov.png)\n", "\n", "Find the stationary vector $\\pi^T = \\pi^T P$ for the transition graph shown \n", "\n", "- by solving a set of linear equations\n", "- by solving an eigenvector problem\n", "- Check that the resulting vector is invariant with respect to the transition matrix" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Brute force check" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**4**. Graphs\n", "\n", "$M$ is the adjacency matrix of a directed graph $G$. Find the vertices that belong to a clique.\n", "\n", "$$\n", "M = \\pmatrix{\n", " 0 & 1 & 0 & 1 & 1 \\\\\n", " 1 & 0 & 0 & 1 & 0 \\\\\n", " 1 & 1 & 0 & 1 & 0 \\\\\n", " 1 & 1 & 0 & 0 & 0 \\\\\n", " 1 & 0 & 0 & 1 & 0\n", " }\n", "$$\n", "\n", "A clique is defined as a subset of a graph where\n", "\n", "1. The subset has at least 3 vertices\n", "2. All pairs of vertices are connected\n", "3. The subset is as large as possible\n", "\n", "Because of the symmetry required in condition 2, we only need to consider the graph $S$ where $s_{ij} = 1$ if vertcies $i$ and $j$ communicate and 0 otherwise. Then the on-zero diagonal entries of $S^3$ is the set states recurrent in 3 steps. That is, there is a bi-directional path ${s_i \\leftrightarrow s_j \\leftrightarrow s_k \\leftrightarrow s_i}$, which means that the vertices $\\{s_i, s_j, s_k\\}$ form a subset of a clique." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "M = np.array([\n", " [0,1,0,1,1],\n", " [1,0,0,1,1],\n", " [1,1,0,1,0],\n", " [1,1,0,0,0],\n", " [1,0,0,1,0] \n", "])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**5**. Suppose we wish to solve the problem $t = Mt + b$ - here the notation is from one type of such problems where $t$ is the temperature, $M$ is a matrix for diffusion, and $b$ represent fixed boundary conditions. Suppose we have a 5 by 5 grid system whose boundary temperatures are fixed. Let $M$ is a matrix with $1/4$ for the $\\pm 1$ off-diagonals and 0 elsewhere (i.e. diffusion is approximated by the average temperature of the 4 N, S, E, W neighbors), and $b$ is the vector $(5,2,3,3,0,1,3,0,1)$ - this assumes the temperatures along the bottom = 0, right edge = 1, top = 2 and left edge = 3. Find the equilibrium temperature at each of the 9 interior points\n", "\n", "- by solving a linear equation\n", "- by iteration" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Direct solution - not possible for large matrices" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Jacobi iteration" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**6**. Iterated affine maps\n", "\n", "Define the following mapping in $\\mathbb{R}^2$\n", "\n", "$$\n", "T_i: \\pmatrix{x \\\\ y} \\to s \\pmatrix{\\cos \\theta & - \\sin \\theta \\\\ \\sin \\theta & \\cos \\theta} \\pmatrix{x \\\\ y} + \\pmatrix{a_i \\\\ b_i}\n", "$$\n", "\n", "Suppose $s = 1/3$, $\\theta = 0$, and $\\pmatrix{a_i \\\\ b_i}$ are \n", "\n", "$$\n", "\\pmatrix{0 \\\\ 0}, \\pmatrix{1/3 \\\\ 0},\n", "\\pmatrix{2/3 \\\\ 0}, \\pmatrix{0 \\\\ 1/3},\n", "\\pmatrix{2/3 \\\\ 1/3}, \\pmatrix{0 \\\\ 2/3}, \n", "\\pmatrix{1/3 \\\\ 2/3}, \\pmatrix{2/3 \\\\ 2/3}\n", "$$\n", "\n", "Generate 1,000 points by first randomly selecting a point in the unit square, then applying at random one of th transformations $T_i$ to the point. Plot the resulting 1,000 points as a scatter plot on in a square frame.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ab =[\n", " [0,0],\n", " [1/3,0],\n", " [2/3,0],\n", " [0,1/3],\n", " [2/3,1/3],\n", " [0,2/3],\n", " [1/3,2/3],\n", " [2/3,2/3]\n", "]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**7**. The Fibonacci sequence came about from this toy model of rabbit population dynamics\n", "\n", "- A baby rabbit matures into an adult in 1 time unit\n", "- An adult gives birth to exactly 1 baby in 1 time unit\n", "- Rabbits are immortal\n", "\n", "This gives the well known formula for the number of rabbits over discrete time $F_{k+2} = F_{k} + F_{k+1}$\n", "\n", "- Express this model as a matrix equation, and calculate the long-term growth rate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let the population at any time be expreessed as the vector\n", "$$\n", "\\pmatrix{\\text{adult} \\\\ \\text{baby} }\n", "$$\n", "\n", "In the next time step, there will be \n", "\n", "- 1 adult from each adult, and one adult from each baby\n", "- 1 baby from each adult" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**8**. Age-structured population growth\n", "\n", "Suppose that we observe the following Leslie matrix\n", "\n", "$$\n", "L = \\pmatrix{\n", "0 & 3 & 2 & 0.5 \\\\\n", "0.8 & 0 & 0 & 0 \\\\\n", "0 & 0.9 & 0 & 0 \\\\\n", "0 & 0 & 0.7 & 0\n", "}\n", "$$\n", "\n", "![img](../data/leslie.png)\n", "\n", "- Starting with just 1,000 females in age-group 0-15 at time 0 and nobody else, what is the expected population after 90 years?\n", "- Suppose we could alter the fertility in a *single* age group for this population - can we achieve a steady state non-zero population?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "L = np.array([\n", " [0,3,2,0.5],\n", " [0.8,0,0,0],\n", " [0,0.9,0,0],\n", " [0,0,0.7,0]\n", "])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x0 = np.array([1000,0,0,0]).reshape(-1,1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the real eigenvalue with real eigenvector is dominant $\\vert L_1 \\vert > \\vert L_k \\vert$.\n", "\n", "A theorem says this will be true if you have any two positive consecutive entries in the first row of $L$.\n", "\n", "The growth rate is determined by the dominant real eigenvalue with real eigenvector - in the long term, whether the population will grow, shrink or reach steady state depends on whether this is greater than, less than or equal to 1 respectively." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**9**.\n", "\n", "You are given the following set of data to fit a quadratic polynomial to\n", "\n", "```python\n", "x = np.arange(10)\n", "y = np.array([ 1.58873597, 7.55101533, 10.71372171, 7.90123225,\n", " -2.05877605, -12.40257359, -28.64568712, -46.39822281,\n", " -68.15488905, -97.16032044])\n", "```\n", "\n", "- Find the least squares solution by using the normal equations $A^T A \\hat{x} = A^T y$. (5 points)\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.arange(10)\n", "y = np.array([ 1.58873597, 7.55101533, 10.71372171, 7.90123225,\n", " -2.05877605, -12.40257359, -28.64568712, -46.39822281,\n", " -68.15488905, -97.16032044])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**10**. \n", "\n", "You are given the following data\n", "\n", "```python\n", "A = np.array([[1, 8, 0, 7],\n", " [0, 2, 9, 4],\n", " [2, 8, 8, 3],\n", " [4, 8, 6, 1],\n", " [2, 1, 9, 6],\n", " [0, 7, 0, 1],\n", " [4, 0, 2, 4],\n", " [1, 4, 9, 5],\n", " [6, 2, 6, 6],\n", " [9, 9, 6, 3]], dtype='float')\n", "\n", "b = np.array([[2],\n", " [5],\n", " [0],\n", " [0],\n", " [6],\n", " [7],\n", " [2],\n", " [6],\n", " [7],\n", " [9]], dtype='float')\n", "```\n", "\n", "- Using SVD directly (not via `lstsq`), find the least squares solution to $Ax = b$ (10 points)\n", "- Use SVD to find the best rank 3 approximation of A (10 points)\n", "- Calculate the approximation error in terms of the Frobenius norm (5 points)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A = np.array([[1, 8, 0, 7],\n", " [0, 2, 9, 4],\n", " [2, 8, 8, 3],\n", " [4, 8, 6, 1],\n", " [2, 1, 9, 6],\n", " [0, 7, 0, 1],\n", " [4, 0, 2, 4],\n", " [1, 4, 9, 5],\n", " [6, 2, 6, 6],\n", " [9, 9, 6, 3]], dtype='float')\n", "\n", "b = np.array([[2],\n", " [5],\n", " [0],\n", " [0],\n", " [6],\n", " [7],\n", " [2],\n", " [6],\n", " [7],\n", " [9]], dtype='float')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**11**.\n", "\n", "The page rank of a node is given by the equation\n", "\n", "![i1](https://wikimedia.org/api/rest_v1/media/math/render/svg/6bb0f1469218a064274fd4691143e9ce64639dc2)\n", "\n", "and at steady state, we have the page rank vector $R$\n", "\n", "![i3](https://wikimedia.org/api/rest_v1/media/math/render/svg/65d2fed50688deaca4640b117c88a9e7a3c2ef0d)\n", "\n", "where $d$ is the damping factor, $N$ is the number of nodes, $1$ is a vector of ones, and \n", "\n", "![i2.5](https://wikimedia.org/api/rest_v1/media/math/render/svg/3e82b446a376633a386b10668703a4547f167d1c)\n", "\n", "where $L(p_j)$ is the number of outgoing links from node $p_j$.\n", "\n", "Consider the graph\n", "\n", "![i0](../data/pagerank.png)\n", "\n", "If $d = 0.9$ find the page rank of each node\n", "\n", "- By solving a linear system \n", "- By eigendecomposition\n", "\n", "Note: The Markov matrix constructed as instructed does not follow the usual convention. Here the columns of our Markov matrix are probability vectors, and the page rank is considered to be a column vector of the steady state probabilities." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**12**.\n", "\n", "Recall that a covariance matrix is a matrix whose entries are\n", "\n", "![img](https://wikimedia.org/api/rest_v1/media/math/render/svg/4df2969e65403dd04f2c64137d21ff59b5f54190)\n", "\n", "1. Find the sample covariance matrix of the 4 features of the **iris** data set at http://bit.ly/2ow0oJO using basic `numpy` operations on `ndarrasy`. Do **not** use the `np.cov` or equivalent functions in `pandas` (except for checking). Remember to scale by $1/(n-1)$ for the sample covariance. (10 points)\n", "2. Plot the first 2 principal components of the `iris` data by using eigendecoposition, coloring each data point by the species (10 points)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "url = 'http://bit.ly/2ow0oJO'\n", "iris = pd.read_csv(url)\n", "iris.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }