{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Solving Linear Equations" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import scipy.linalg as la" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linear Equations\n", "\n", "Consider a set of $m$ linear equations in $n$ unknowns:\n", "\n", "\\begin{align*}\n", "a_{11} x_1 + &a_{12} x_2& +& ... + &a_{1n} x_n &=& b_1\\\\\n", "\\vdots && &&\\vdots &= &\\vdots\\\\\n", "a_{m1} x_1 + &a_{m2} x_2& +& ... + &a_{mn} x_n &=&b_m \n", "\\end{align*}\n", "\n", "We can let\n", "\n", "\\begin{align*}\n", " A=\\left[\\begin{matrix}a_{11}&\\cdots&a_{1n}\\\\\n", " \\vdots & &\\vdots\\\\\n", " a_{m1}&\\cdots&a_{mn}\\end{matrix}\\right]\n", "\\end{align*}\n", "\n", "\\begin{align*}\n", "x = \\left[\\begin{matrix}x_1\\\\\n", " \\vdots\\\\\n", " x_n\\end{matrix}\\right] & \\;\\;\\;\\;\\textrm{ and } &\n", "b = \\left[\\begin{matrix}b_1\\\\\n", " \\vdots\\\\\n", " b_m\\end{matrix}\\right]\n", "\\end{align*}\n", "\n", "and re-write the system\n", " \n", "$$ Ax = b$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Linear independence and existence of solutions\n", "\n", "* If $A$ is an $m\\times n$ matrix and $m>n$, if all $m$ rows are linearly independent, then the system is *overdetermined* and *inconsistent*. The system cannot be solved exactly. This is the usual case in data analysis, and why least squares is so important. For example, we may be finding the parameters of a linear model, where there are $m$ data points and $n$ parameters.\n", " \n", "* If $A$ is an $m\\times n$ matrix and $m n$, $m = n$ and $m < n$ and provide some intuition as to the existence of a no, unique or infinite solutions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## LU Decomposition\n", "\n", "LU stands for 'Lower Upper', and so an LU decomposition of a matrix $A$ is a decomposition so that \n", "$$A= LU$$\n", "where $L$ is lower triangular and $U$ is upper triangular.\n", "\n", "Now, LU decomposition is essentially gaussian elimination, but we work only with the matrix $A$ (as opposed to the augmented matrix). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Gaussian elimination is all fine when we are solving a system one time, for one outcome $b$. Many applications involve solutions to multiple problems, where the left-hand-side of our matrix equation does not change, but there are many outcome vectors $b$. In this case, it is more efficient to *decompose* $A$.\n", "\n", "First, we start just as in ge, but we 'keep track' of the various multiples required to eliminate entries. For example, consider the matrix\n", "\n", "$$A = \\left(\\begin{matrix} 1 & 3 & 4 \\\\\n", " 2 & 1 & 3\\\\\n", " 4 & 7 & 2\n", " \\end{matrix}\\right)$$\n", "\n", "We need to multiply row $1$ by $2$ and subtract from row $2$ to eliminate the first entry in row $2$, and then multiply row $1$ by $4$ and subtract from row $3$. Instead of entering zeroes into the first entries of rows $2$ and $3$, we record the multiples required for their elimination, as so:\n", "\n", "$$\\left(\\begin{matrix} 1 & 3 & 4 \\\\\n", " (2)& -5 & -5\\\\\n", " (4)&-5 &-14\n", " \\end{matrix}\\right)$$\n", " \n", "\n", "And then we eliminate the second entry in the third row:\n", "\n", "\n", "$$\\left(\\begin{matrix} 1 & 3 & 4 \\\\\n", " (2)& -5 & -5\\\\\n", " (4)& (1)&-9\n", " \\end{matrix}\\right)$$\n", " \n", "And now we have the decomposition:\n", "$$L= \\left(\\begin{matrix} 1 & 0 & 0 \\\\\n", " 2& 1 & 0\\\\\n", " 4& 1 &1\n", " \\end{matrix}\\right) \\,\n", " U = \\left(\\begin{matrix} 1 & 3 & 4 \\\\\n", " 0& -5 & -5\\\\\n", " 0&0&-9\n", " \\end{matrix}\\right)$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Elementary matrices\n", "\n", "Why does the algorithm for LU work? Gaussian elimination consists of 3 elementary operations\n", "\n", "- Op1: swapping two rows\n", "- Op2: replace a row by the sum of that row and a multiple of another\n", "- Op3: multiplying a row by a non-zero scalar\n", "\n", "These can be recast as matrix operations - in particular, pre-multiplication with corresponding elementary matrices.\n", "\n", "- Op1: swapping two rows uses a permutation matrix\n", "\n", "$$\n", "\\begin{bmatrix}\n", "0 & 1 \\\\\n", "1 & 0 \n", "\\end{bmatrix}\\begin{bmatrix}\n", "3 & 4 \\\\\n", "1 & 2 \n", "\\end{bmatrix} = \\begin{bmatrix}\n", "1 & 2 \\\\\n", "3 & 4 \n", "\\end{bmatrix}\n", "$$\n", "\n", "- Op2: replace a row by the sum of that row and a multiple of another uses an lower triangular matrix\n", "\n", "$$\n", "\\begin{bmatrix}\n", "1 & 0 \\\\\n", "-3 & 1 \n", "\\end{bmatrix}\\begin{bmatrix}\n", "1 & 2 \\\\\n", "3 & 4 \n", "\\end{bmatrix} = \\begin{bmatrix}\n", "1 & 2 \\\\\n", "0 & -2 \n", "\\end{bmatrix}\n", "$$\n", "\n", "Note: The inverse operation just substitutes the negative of the multiple, and is also lower triangular.\n", "\n", "- Op3: multiplying a row by a non-zero scalar uses an lower triangular matrix\n", "\n", "$$\n", "\\begin{bmatrix}\n", "1 & 0 \\\\\n", "0 & -0.5 \n", "\\end{bmatrix}\\begin{bmatrix}\n", "1 & 2 \\\\\n", "0 & -2 \n", "\\end{bmatrix} = \\begin{bmatrix}\n", "1 & 2 \\\\\n", "0 & 1 \n", "\\end{bmatrix}\n", "$$\n", "\n", "Note: The inverse operation just substitutes the inverse of the scalar, and is also lower triangular.\n", "\n", "Multiplying an upper triangular matrix by another lower triangular matrix gives an lower triangular matrix. Hence if we put the permutations aside (i.e. keep a separate permutation matrix), Gaussian elimination can be expressed as a product of lower triangular matrices, which is just another lower triangular matrix $L$, with the original matrix $A$ to give an upper triangular matrix $U$. The lower triangular matrix $L$ is then the product of the inverse operations." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "A = np.array([\n", " [1,3,4],\n", " [2,1,3],\n", " [4,7,2]\n", "])" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 3, 4],\n", " [2, 1, 3],\n", " [4, 7, 2]])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Construct U" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "b1 = np.array([\n", " [1, 0, 0],\n", " [-2, 1, 0],\n", " [0, 0, 1]\n", "])" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true }, "outputs": [], "source": [ "b2 = np.array([\n", " [1, 0, 0],\n", " [0, 1, 0],\n", " [-4, 0, 1]\n", "])" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "b3 = np.array([\n", " [1, 0, 0],\n", " [0, 1, 0],\n", " [0, -1, 1]\n", "])" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1, 3, 4],\n", " [ 0, -5, -5],\n", " [ 4, 7, 2]])" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b1 @ A" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1, 3, 4],\n", " [ 0, -5, -5],\n", " [ 0, -5, -14]])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b2 @ b1 @ A" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1, 3, 4],\n", " [ 0, -5, -5],\n", " [ 0, 0, -9]])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b3 @ b2 @ b1 @ A" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1, 3, 4],\n", " [ 0, -5, -5],\n", " [ 0, 0, -9]])" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "U = b3 @ b2 @ b1 @ A\n", "U" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Construct L" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "ib1 = np.array([\n", " [1, 0, 0],\n", " [2, 1, 0],\n", " [0, 0, 1]\n", "])" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": true }, "outputs": [], "source": [ "ib2 = np.array([\n", " [1, 0, 0],\n", " [0, 1, 0],\n", " [4, 0, 1]\n", "])" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "ib3 = np.array([\n", " [1, 0, 0],\n", " [0, 1, 0],\n", " [0, 1, 1]\n", "])" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 0, 0],\n", " [2, 1, 0],\n", " [4, 1, 1]])" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "L = ib1 @ ib2 @ ib3\n", "L" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### A is factorized into LU" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 3, 4],\n", " [2, 1, 3],\n", " [4, 7, 2]])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "L @ U" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 3, 4],\n", " [2, 1, 3],\n", " [4, 7, 2]])" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now use the LU decomposition to solve for *any* $b$ without having to perform Gaussian elimination again. \n", "\n", "- First solve $Ly = b$\n", "- Then solve $Ux = y$\n", "\n", "Since $L$ and $U$ are triangular, they can be cheaply solved by substitution." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1],\n", " [2],\n", " [3]])" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1.],\n", " [ 0.],\n", " [-1.]])" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y = la.solve_triangular(L, b, lower=True)\n", "y" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0.88888889],\n", " [-0.11111111],\n", " [ 0.11111111]])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = la.solve_triangular(U, y)\n", "x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### LDU Decomposition\n", "\n", "Note that $L$ is a unit triangular matrix while $U$ is not. It is sometimes instructive to factorize $A = LDU$ where both $L$ and $U$ are unit triangular, and $D$ is a diagonal matrix." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1, 3, 4],\n", " [ 0, -5, -5],\n", " [ 0, 0, -9]])" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "U" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1, 0, 0],\n", " [ 0, -5, 0],\n", " [ 0, 0, -9]])" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "D = np.diag(np.diag(U))\n", "D" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 3., 4.],\n", " [-0., 1., 1.],\n", " [-0., -0., 1.]])" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "U1 = U/np.diag(U)[:, None]\n", "U1" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 3., 4.],\n", " [ 0., -5., -5.],\n", " [ 0., 0., -9.]])" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "D @ U1" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.allclose(A, L @ D @ U1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### LU Decomposition in practice" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": true }, "outputs": [], "source": [ "P, L, U = la.lu(A)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1. , 0. , 0. ],\n", " [ 0.5 , 1. , 0. ],\n", " [ 0.25, -0.5 , 1. ]])" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "L" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 4. , 7. , 2. ],\n", " [ 0. , -2.5, 2. ],\n", " [ 0. , 0. , 4.5]])" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "U" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "P is a permutation matrix." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0., 0., 1.],\n", " [0., 1., 0.],\n", " [1., 0., 0.]])" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "P" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In practice, we can store both L and U in a single matrix LU." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": true }, "outputs": [], "source": [ "LU, P = la.lu_factor(A)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 4. , 7. , 2. ],\n", " [ 0.5 , -2.5 , 2. ],\n", " [ 0.25, -0.5 , 4.5 ]])" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "LU" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0.88888889],\n", " [-0.11111111],\n", " [ 0.11111111]])" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "la.lu_solve((LU, P), b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cholesky Decomposition" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recall that a square matrix $A$ is positive definite if\n", "\n", "$$u^TA u > 0$$\n", "\n", "for any non-zero n-dimensional vector $u$,\n", "\n", "and a symmetric, positive-definite matrix $A$ is a positive-definite matrix such that\n", "\n", "$$A = A^T$$\n", "\n", "For a positive definite square matrix, all the pivots (diagonal elements of $U$) are positive. If the matrix is also symmetric, then we must have an $LDU$ decomposition of the form $LDL^T$, where all the diagonal elements of $D$ are positive. Given this, $D^{1/2}$ is well-defined, and we have\n", "\n", "$$\n", "A = LDL^T = LD^{1/2}D^{1/2}L^T = LD^{1/2}(LD^{1/2})^T = CC^{T}\n", "$$\n", "\n", "where $C$ is lower-triangular with positive diagonal elements and $C^T$ is its transpose. This decomposition is known as the Cholesky decomposition, and $C$ may be interpreted as the 'square root' of the matrix $A$. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Algorithm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let $A$ be an $n\\times n$ matrix. We find the matrix $L$ using the following iterative procedure:\n", "\n", "\n", "$$A = \\left(\\begin{matrix}a_{11}&A_{12}\\\\A_{12}&A_{22}\\end{matrix}\\right) =\n", "\\left(\\begin{matrix}\\ell_{11}&0\\\\\n", "L_{12}&L_{22}\\end{matrix}\\right)\n", "\\left(\\begin{matrix}\\ell_{11}&L_{12}\\\\0&L_{22}\\end{matrix}\\right)\n", "$$\n", "\n", "1.) Let $\\ell_{11} = \\sqrt{a_{11}}$\n", "\n", "2.) $L_{12} = \\frac{1}{\\ell_{11}}A_{12}$\n", "\n", "3.) Solve $A_{22} - L_{12}L_{12}^T = L_{22}L_{22}^T$ for $L_{22}$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$A = \\left(\\begin{matrix}1&3&5\\\\3&13&23\\\\5&23&42\\end{matrix}\\right)$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\\ell_{11} = \\sqrt{a_{11}} = 1$$\n", "\n", "$$L_{12} = \\frac{1}{\\ell_{11}} A_{12} = A_{12}$$\n", "\n", "$\\begin{eqnarray*}\n", "A_{22} - L_{12}L_{12}^T &=& \\left(\\begin{matrix}13&23\\\\23&42\\end{matrix}\\right) - \\left(\\begin{matrix}9&15\\\\15&25\\end{matrix}\\right)\\\\\n", "&=& \\left(\\begin{matrix}4&8\\\\8&17\\end{matrix}\\right)\n", "\\end{eqnarray*}$\n", "\n", "This is also symmetric and positive definite, and can be solved by another iteration\n", "\n", "$\\begin{eqnarray*}\n", "&=& \\left(\\begin{matrix}2&0\\\\4&\\ell_{33}\\end{matrix}\\right) \\left(\\begin{matrix}2&4\\\\0&\\ell_{33}\\end{matrix}\\right)\\\\\n", "&=& \\left(\\begin{matrix}4&8\\\\8&16+\\ell_{33}^2\\end{matrix}\\right)\n", "\\end{eqnarray*}$\n", "\n", "And so we conclude that $\\ell_{33}=1$.\n", "\n", "\n", "This yields the decomposition:\n", "\n", "\n", "$$\\left(\\begin{matrix}1&3&5\\\\3&13&23\\\\5&23&42\\end{matrix}\\right) = \n", "\\left(\\begin{matrix}1&0&0\\\\3&2&0\\\\5&4&1\\end{matrix}\\right)\\left(\\begin{matrix}1&3&5\\\\0&2&4\\\\0&0&1\\end{matrix}\\right)$$\n", "\n" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": true }, "outputs": [], "source": [ "A = np.array([\n", " [1,3,5],\n", " [3,13,23],\n", " [5,23,42]\n", "])" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 3., 5.],\n", " [0., 2., 4.],\n", " [0., 0., 1.]])" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C = la.cholesky(A)\n", "C" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1.75],\n", " [-0.25],\n", " [ 0. ]])" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C1 = la.cho_factor(A)\n", "la.cho_solve(C1, b)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }