Solving Linear Equations

[1]:
import numpy as np
import scipy.linalg as la

Linear Equations

Consider a set of \(m\) linear equations in \(n\) unknowns:

\begin{align*} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1\\ &\ \vdots\\ a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n &= b_m \end{align*}

We can let

\begin{align*} A=\left[\begin{matrix}a_{11}&\cdots&a_{1n}\\ \vdots & &\vdots\\ a_{m1}&\cdots&a_{mn}\end{matrix}\right] \end{align*} \begin{align*} x = \left[\begin{matrix}x_1\\ \vdots\\ x_n\end{matrix}\right] & \;\;\;\;\textrm{ and } & b = \left[\begin{matrix}b_1\\ \vdots\\ b_m\end{matrix}\right] \end{align*}

and rewrite the system as

\[Ax = b\]

Linear independence and existence of solutions

  • If \(A\) is an \(m\times n\) matrix with \(m>n\), the system is overdetermined: there are more equations than unknowns, and in general the system is inconsistent and cannot be solved exactly (\(b\) need not lie in the column space of \(A\)). This is the usual case in data analysis, and why least squares is so important. For example, we may be finding the parameters of a linear model, where there are \(m\) data points and \(n\) parameters.

  • If \(A\) is an \(m\times n\) matrix with \(m<n\) and all \(m\) rows are linearly independent, then the system is underdetermined and there are infinitely many solutions.

  • If \(A\) is an \(m\times n\) matrix and some of its rows are linearly dependent, then the system is reducible: some equations can be discarded because they give us no new information.

  • If \(A\) is a square matrix and its rows are linearly independent, the system has a unique solution. (\(A\) is invertible.)

For a linear system, there are only three possibilities: a unique solution, no solution, or infinitely many solutions.
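As a rough numerical check of these cases, we can compare the rank of \(A\) with the rank of the augmented matrix \([A \mid b]\): the system is consistent exactly when the two ranks agree, and the solution is unique when that common rank equals \(n\). Below is a minimal sketch using np.linalg.matrix_rank; the specific matrices are made up for illustration.

[ ]:
# Overdetermined (m > n): b does not lie in the column space, so no exact solution
A_over = np.array([[1, 0], [0, 1], [1, 1]])
b_over = np.array([1, 1, 3])
print(np.linalg.matrix_rank(A_over), np.linalg.matrix_rank(np.column_stack([A_over, b_over])))  # 2 3

# Underdetermined (m < n): consistent, but infinitely many solutions
A_under = np.array([[1, 1]])
b_under = np.array([2])
print(np.linalg.matrix_rank(A_under), np.linalg.matrix_rank(np.column_stack([A_under, b_under])))  # 1 1

# Square with linearly independent rows: unique solution
A_sq = np.array([[1, 2], [3, 4]])
b_sq = np.array([3, 17])
print(np.linalg.matrix_rank(A_sq), np.linalg.matrix_rank(np.column_stack([A_sq, b_sq])))  # 2 2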

Using solve to find unique solutions

\begin{align} x + 2y &= 3 \\ 3x + 4y &= 17 \end{align}
[2]:
A = np.array([[1,2],[3,4]])
A
[2]:
array([[1, 2],
       [3, 4]])
[3]:
b = np.array([3,17]).reshape((-1,1))
b
[3]:
array([[ 3],
       [17]])
[4]:
x = la.solve(A, b)
x
[4]:
array([[11.],
       [-4.]])
[5]:
np.allclose(A @ x, b)
[5]:
True

Under the hood of solve

The solve function uses the dgesv fortran function in lapack (Linear Algebra Package) to do the actual work. In turn, lapack calls even lower level routines in blas (Basic Linear Algebra Subprograms). In particular, solve uses the LU matrix decomposition that we will show in detail shortly.

There is rarely any reason to use blas or lapack functions directly because the linalg package provides more convenient functions that also perform error checking, but you can use Python to experiment with lapack or blas before using them in a language like C or Fortran.

[6]:
import scipy.linalg.lapack as lapack
[7]:
lu, piv, x, info = lapack.dgesv(A, b)
x
[7]:
array([[11.],
       [-4.]])

Gaussian elimination

Let’s review how Gaussian elimination (ge) works. We will deal with a \(3\times 3\) system of equations for conciseness, but everything here generalizes to the \(n\times n\) case. Consider the following equation:

\[\begin{split}\left(\begin{matrix}a_{11}&a_{12} & a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\end{matrix}\right)\left(\begin{matrix}x_1\\x_2\\x_3\end{matrix}\right) = \left(\begin{matrix}b_1\\b_2\\b_3\end{matrix}\right)\end{split}\]

For simplicity, let us assume that the leftmost matrix \(A\) is non-singular. To solve the system using ge, we start with the ‘augmented matrix’:

\[\begin{split}\left(\begin{array}{ccc|c}a_{11}&a_{12} & a_{13}& b_1 \\a_{21}&a_{22}&a_{23}&b_2\\a_{31}&a_{32}&a_{33}&b_3\end{array}\right)\end{split}\]

We begin at the first entry, \(a_{11}\). If \(a_{11} \neq 0\), then we divide the first row by \(a_{11}\) and then subtract the appropriate multiple of the first row from each of the other rows, zeroing out the first entry of all the other rows. (If \(a_{11}\) is zero, we need to permute rows. We will not go into detail of that here.) The result is as follows:

\[\begin{split}\left(\begin{array}{ccc|c} 1 & \frac{a_{12}}{a_{11}} & \frac{a_{13}}{a_{11}} & \frac{b_1}{a_{11}} \\ 0 & a_{22} - a_{21}\frac{a_{12}}{a_{11}} & a_{23} - a_{21}\frac{a_{13}}{a_{11}} & b_2 - a_{21}\frac{b_1}{a_{11}}\\ 0&a_{32}-a_{31}\frac{a_{12}}{a_{11}} & a_{33} - a_{31}\frac{a_{13}}{a_{11}} &b_3- a_{31}\frac{b_1}{a_{11}}\end{array}\right)\end{split}\]

We repeat the procedure for the second row: first divide by the leading entry, then subtract the appropriate multiple of the resulting row from the third row, so that the entry below the second pivot is zero. Gaussian elimination stops at row echelon form (upper triangular, with ones on the diagonal), and then uses back substitution to obtain the final answer.

Note that in some cases, it is necessary to permute rows to obtain row echelon form (when the pivot would otherwise be zero). This is called partial pivoting.
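As a concrete illustration of the procedure just described, here is a minimal (and numerically naive) sketch of Gaussian elimination with partial pivoting followed by back substitution. The helper gauss_solve is made up for this example; in practice you should use la.solve.

[ ]:
def gauss_solve(A, b):
    """Solve Ax = b via Gaussian elimination with partial pivoting (illustrative only)."""
    M = np.hstack([A.astype(float), b.astype(float).reshape(-1, 1)])  # augmented matrix
    n = len(A)
    for i in range(n):
        p = i + np.argmax(np.abs(M[i:, i]))  # partial pivoting: pick the largest pivot candidate
        M[[i, p]] = M[[p, i]]                # swap rows i and p
        M[i] = M[i] / M[i, i]                # scale so the pivot is 1
        for j in range(i + 1, n):
            M[j] = M[j] - M[j, i] * M[i]     # zero out the entry below the pivot
    # back substitution on the row echelon form
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = M[i, -1] - M[i, i + 1:n] @ x[i + 1:]
    return x

gauss_solve(np.array([[1, 2], [3, 4]]), np.array([3, 17]))  # array([11., -4.])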

Example

We perform Gaussian elimination on the following augmented matrix

\[\begin{split}\left( \begin{array}{ccc|c} 1 & 3 & 4 & 1\\ 2 & 1 & 3 & 2 \\ 4 & 7 & 2 & 3 \end{array} \right)\end{split}\]

We need to multiply row \(1\) by \(2\) and subtract from row \(2\) to eliminate the first entry in row \(2\), and then multiply row \(1\) by \(4\) and subtract from row \(3\).

\[\begin{split}\left( \begin{array}{ccc|c} 1 & 3 & 4 & 1 \\ 0 & -5 & -5 & 0\\ 0 & -5 &-14 & -1 \end{array} \right)\end{split}\]

And then we eliminate the second entry in the third row:

\[\begin{split}\left( \begin{array}{ccc|c} 1 & 3 & 4 & 1\\ 0 & -5 & -5 & 0 \\ 0 & 0 & -9 & -1 \end{array} \right)\end{split}\]

We can now do back-substitution to solve

\[\begin{split}9x_3 = 1 \\ x_2 = -x_3 \\ x_1 = -3x_2 - 4x_3 + 1\end{split}\]

to get

\[\begin{split}x_1 = 8/9 \\ x_2 = -1/9 \\ x_3 = 1/9\end{split}\]

Check

[8]:
A = np.array([
    [1,3,4],
    [2,1,3],
    [4,7,2]
])
b = np.array([1,2,3]).reshape((-1,1))

la.solve(A, b)
[8]:
array([[ 0.88888889],
       [-0.11111111],
       [ 0.11111111]])

Gauss-Jordan elimination

With Gauss-Jordan elimination, we make all the pivots equal to 1, and set all entries above and below each pivot to 0.

\[\begin{split}\left( \begin{matrix} 1 & 0 & 0 & 8/9 \\ 0 & 1 & 0 & -1/9 \\ 0 & 0 & 1 & 1/9 \end{matrix} \right)\end{split}\]

Then we can just read off the values of \(x_1\), \(x_2\) and \(x_3\) directly. This is known as the reduced row echelon form.
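If you want to see the reduced row echelon form directly (with exact fractions), sympy can compute it. This is purely illustrative and assumes sympy is installed; it is not something you would use for large numerical problems.

[ ]:
import sympy

# reduced row echelon form of the augmented matrix from the example above
aug = sympy.Matrix([
    [1, 3, 4, 1],
    [2, 1, 3, 2],
    [4, 7, 2, 3]
])
rref, pivot_cols = aug.rref()
rref  # the last column holds the solution: 8/9, -1/9, 1/9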

Gauss-Jordan elimination and the number of solutions

Consider three matrices with \(m > n\), \(m = n\) and \(m < n\), and provide some intuition as to the existence of no solution, a unique solution, or infinitely many solutions.

LU Decomposition

LU stands for ‘Lower Upper’, and so an LU decomposition of a matrix \(A\) is a decomposition so that

\[A = LU\]

where \(L\) is lower triangular and \(U\) is upper triangular.

Now, LU decomposition is essentially Gaussian elimination, but we work only with the matrix \(A\) (as opposed to the augmented matrix).

Gaussian elimination is all fine when we are solving a system one time, for one outcome \(b\). Many applications involve solutions to multiple problems, where the left-hand-side of our matrix equation does not change, but there are many outcome vectors \(b\). In this case, it is more efficient to decompose \(A\).
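For example, we can factor \(A\) once and then solve cheaply for several right-hand sides. This is a minimal sketch using scipy's lu_factor and lu_solve, which are shown in more detail later in this notebook.

[ ]:
A_demo = np.array([
    [1, 3, 4],
    [2, 1, 3],
    [4, 7, 2]
])
lu_piv = la.lu_factor(A_demo)            # factor once
for b_k in [np.array([1, 2, 3]), np.array([0, 1, 0]), np.array([2, 0, 5])]:
    print(la.lu_solve(lu_piv, b_k))      # cheap triangular solves for each new b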

First, we start just as in ge, but we ‘keep track’ of the various multiples required to eliminate entries. For example, consider the matrix

\[\begin{split}A = \left(\begin{matrix} 1 & 3 & 4 \\ 2 & 1 & 3\\ 4 & 7 & 2 \end{matrix}\right)\end{split}\]

We need to multiply row \(1\) by \(2\) and subtract from row \(2\) to eliminate the first entry in row \(2\), and then multiply row \(1\) by \(4\) and subtract from row \(3\). Instead of entering zeroes into the first entries of rows \(2\) and \(3\), we record the multiples required for their elimination, like so:

\[\begin{split}\left(\begin{matrix} 1 & 3 & 4 \\ (2)& -5 & -5\\ (4)&-5 &-14 \end{matrix}\right)\end{split}\]

And then we eliminate the second entry in the third row:

\[\begin{split}\left(\begin{matrix} 1 & 3 & 4 \\ (2)& -5 & -5\\ (4)& (1)&-9 \end{matrix}\right)\end{split}\]

And now we have the decomposition:

\[\begin{split}L= \left(\begin{matrix} 1 & 0 & 0 \\ 2& 1 & 0\\ 4& 1 &1 \end{matrix}\right) \, U = \left(\begin{matrix} 1 & 3 & 4 \\ 0& -5 & -5\\ 0&0&-9 \end{matrix}\right)\end{split}\]
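A minimal sketch of this bookkeeping in code (no pivoting, illustrative only; lu_nopivot is a made-up helper, not a library routine):

[ ]:
def lu_nopivot(A):
    """LU decomposition without pivoting (assumes no zero pivots arise)."""
    U = A.astype(float)
    n = len(A)
    L = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            L[j, i] = U[j, i] / U[i, i]   # record the multiple used for elimination
            U[j] = U[j] - L[j, i] * U[i]  # zero out the entry below the pivot
    return L, U

L, U = lu_nopivot(np.array([[1, 3, 4], [2, 1, 3], [4, 7, 2]]))
L, U  # the recorded multiples sit below the diagonal of L; U is the eliminated matrix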

Elementary matrices

Why does the algorithm for LU work? Gaussian elimination consists of 3 elementary operations

  • Op1: swapping two rows

  • Op2: replace a row by the sum of that row and a multiple of another

  • Op3: multiplying a row by a non-zero scalar

These can be recast as matrix operations - in particular, pre-multiplication with corresponding elementary matrices.

  • Op1: swapping two rows uses a permutation matrix

\[\begin{split}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 3 & 4 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\end{split}\]
  • Op2: replace a row by the sum of that row and a multiple of another uses a lower triangular matrix

\[\begin{split}\begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 0 & -2 \end{bmatrix}\end{split}\]

Note: The inverse operation just substitutes the negative of the multiple, and is also lower triangular.

  • Op3: multiplying a row by a non-zero scalar uses a lower triangular matrix

\[\begin{split}\begin{bmatrix} 1 & 0 \\ 0 & -0.5 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 0 & -2 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}\end{split}\]

Note: The inverse operation just substitutes the inverse of the scalar, and is also lower triangular.

Multiplying a lower triangular matrix by another lower triangular matrix gives a lower triangular matrix. Hence, if we put the permutations aside (i.e. keep a separate permutation matrix), Gaussian elimination can be expressed as pre-multiplying the original matrix \(A\) by a product of lower triangular elementary matrices (which is itself a lower triangular matrix) to give an upper triangular matrix \(U\). The lower triangular matrix \(L\) is then the product of the inverses of these elementary matrices.

[9]:
A = np.array([
    [1,3,4],
    [2,1,3],
    [4,7,2]
])
[10]:
A
[10]:
array([[1, 3, 4],
       [2, 1, 3],
       [4, 7, 2]])

Construct U

[11]:
b1 = np.array([
    [1, 0, 0],
    [-2, 1, 0],
    [0, 0, 1]
])
[12]:
b2 = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [-4, 0, 1]
])
[13]:
b3 = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, -1, 1]
])
[14]:
b1 @ A
[14]:
array([[ 1,  3,  4],
       [ 0, -5, -5],
       [ 4,  7,  2]])
[15]:
b2 @ b1 @ A
[15]:
array([[  1,   3,   4],
       [  0,  -5,  -5],
       [  0,  -5, -14]])
[16]:
b3 @ b2 @ b1 @ A
[16]:
array([[ 1,  3,  4],
       [ 0, -5, -5],
       [ 0,  0, -9]])
[17]:
U = b3 @ b2 @ b1 @ A
U
[17]:
array([[ 1,  3,  4],
       [ 0, -5, -5],
       [ 0,  0, -9]])

Construct L

[18]:
ib1 = np.array([
    [1, 0, 0],
    [2, 1, 0],
    [0, 0, 1]
])
[19]:
ib2 = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [4, 0, 1]
])
[20]:
ib3 = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 1]
])
[21]:
L = ib1 @ ib2 @ ib3
L
[21]:
array([[1, 0, 0],
       [2, 1, 0],
       [4, 1, 1]])

A is factorized into LU

[22]:
L @ U
[22]:
array([[1, 3, 4],
       [2, 1, 3],
       [4, 7, 2]])
[23]:
A
[23]:
array([[1, 3, 4],
       [2, 1, 3],
       [4, 7, 2]])

We can now use the LU decomposition to solve for any \(b\) without having to perform Gaussian elimination again.

  • First solve \(Ly = b\)

  • Then solve \(Ux = y\)

Since \(L\) and \(U\) are triangular, they can be cheaply solved by substitution.

[24]:
b
[24]:
array([[1],
       [2],
       [3]])
[25]:
y = la.solve_triangular(L, b, lower=True)
y
[25]:
array([[ 1.],
       [ 0.],
       [-1.]])
[26]:
x = la.solve_triangular(U, y)
x
[26]:
array([[ 0.88888889],
       [-0.11111111],
       [ 0.11111111]])

LDU Decomposition

Note that \(L\) is a unit triangular matrix while \(U\) is not. It is sometimes instructive to factorize \(A = LDU\) where both \(L\) and \(U\) are unit triangular, and \(D\) is a diagonal matrix.

[27]:
U
[27]:
array([[ 1,  3,  4],
       [ 0, -5, -5],
       [ 0,  0, -9]])
[29]:
D = np.diag(np.diag(U))
D
[29]:
array([[ 1,  0,  0],
       [ 0, -5,  0],
       [ 0,  0, -9]])

The next step is just element-wise division of each row of \(U\) by its diagonal entry.

[33]:
U1 = U/np.diag(U)[:, None]
U1
[33]:
array([[ 1.,  3.,  4.],
       [-0.,  1.,  1.],
       [-0., -0.,  1.]])
[34]:
D @ U1
[34]:
array([[ 1.,  3.,  4.],
       [ 0., -5., -5.],
       [ 0.,  0., -9.]])
[36]:
np.allclose(A, L @ D @ U1)
[36]:
True

LU Decomposition in practice

[27]:
P, L, U = la.lu(A)
[28]:
L
[28]:
array([[ 1.  ,  0.  ,  0.  ],
       [ 0.5 ,  1.  ,  0.  ],
       [ 0.25, -0.5 ,  1.  ]])
[29]:
U
[29]:
array([[ 4. ,  7. ,  2. ],
       [ 0. , -2.5,  2. ],
       [ 0. ,  0. ,  4.5]])

P is a permutation matrix.

[30]:
P
[30]:
array([[0., 0., 1.],
       [0., 1., 0.],
       [1., 0., 0.]])

In practice, we can store both L and U in a single matrix LU. Note that lu_factor returns the pivots as an array of row-swap indices rather than as a permutation matrix.

[31]:
LU, P = la.lu_factor(A)
[32]:
LU
[32]:
array([[ 4.  ,  7.  ,  2.  ],
       [ 0.5 , -2.5 ,  2.  ],
       [ 0.25, -0.5 ,  4.5 ]])
[33]:
la.lu_solve((LU, P), b)
[33]:
array([[ 0.88888889],
       [-0.11111111],
       [ 0.11111111]])

Cholesky Decomposition

Recall that a square matrix \(A\) is positive definite if

\[u^TA u > 0\]

for any non-zero n-dimensional vector \(u\),

and a symmetric, positive-definite matrix \(A\) is a positive-definite matrix such that

\[A = A^T\]

For a positive definite square matrix, all the pivots (diagonal elements of \(U\)) are positive. If the matrix is also symmetric, then we must have an \(LDU\) decomposition of the form \(LDL^T\), where all the diagonal elements of \(D\) are positive. Given this, \(D^{1/2}\) is well-defined, and we have

\[A = LDL^T = LD^{1/2}D^{1/2}L^T = LD^{1/2}(LD^{1/2})^T = CC^{T}\]

where \(C\) is lower-triangular with positive diagonal elements and \(C^T\) is its transpose. This decomposition is known as the Cholesky decomposition, and \(C\) may be interpreted as the ‘square root’ of the matrix \(A\).
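We can check this identity numerically with scipy's ldl factorization. This is a minimal sketch; it assumes \(A\) is symmetric positive definite, so that \(D\) comes out diagonal with positive entries and the element-wise square root is valid.

[ ]:
A_spd = np.array([
    [1., 3., 5.],
    [3., 13., 23.],
    [5., 23., 42.]
])
L_, D_, perm = la.ldl(A_spd, lower=True)   # A_spd == L_ @ D_ @ L_.T
C_ = L_ @ np.sqrt(D_)                      # C = L D^{1/2}
np.allclose(C_ @ C_.T, A_spd)              # True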

Algorithm

Let \(A\) be an \(n\times n\) matrix. We find the matrix \(L\) using the following iterative procedure:

\[\begin{split}A = \left(\begin{matrix}a_{11}&A_{12}^T\\A_{12}&A_{22}\end{matrix}\right) = \left(\begin{matrix}\ell_{11}&0\\ L_{12}&L_{22}\end{matrix}\right) \left(\begin{matrix}\ell_{11}&L_{12}^T\\0&L_{22}^T\end{matrix}\right)\end{split}\]

1.) Let \(\ell_{11} = \sqrt{a_{11}}\)

2.) \(L_{12} = \frac{1}{\ell_{11}}A_{12}\)

3.) Solve \(A_{22} - L_{12}L_{12}^T = L_{22}L_{22}^T\) for \(L_{22}\)
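Here is a direct, illustrative-only translation of these three steps into a recursive function (cholesky_rec is a made-up name; use la.cholesky in practice):

[ ]:
def cholesky_rec(A):
    """Cholesky factor via the block algorithm above (assumes A is symmetric positive definite)."""
    A = np.asarray(A, dtype=float)
    n = len(A)
    L = np.zeros((n, n))
    L[0, 0] = np.sqrt(A[0, 0])                     # step 1: ell_11
    if n == 1:
        return L
    L[1:, 0] = A[1:, 0] / L[0, 0]                  # step 2: L_12
    S = A[1:, 1:] - np.outer(L[1:, 0], L[1:, 0])   # step 3: A_22 - L_12 L_12^T
    L[1:, 1:] = cholesky_rec(S)                    # recurse on the smaller block
    return L

cholesky_rec(np.array([[1, 3, 5], [3, 13, 23], [5, 23, 42]]))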

Example

\[\begin{split}A = \left(\begin{matrix}1&3&5\\3&13&23\\5&23&42\end{matrix}\right)\end{split}\]
\[\ell_{11} = \sqrt{a_{11}} = 1\]
\[L_{12} = \frac{1}{\ell_{11}} A_{12} = A_{12}\]

\(\begin{eqnarray*} A_{22} - L_{12}L_{12}^T &=& \left(\begin{matrix}13&23\\23&42\end{matrix}\right) - \left(\begin{matrix}9&15\\15&25\end{matrix}\right)\\ &=& \left(\begin{matrix}4&8\\8&17\end{matrix}\right) \end{eqnarray*}\)

This is also symmetric and positive definite, and can be solved by another iteration

\(\begin{eqnarray*} \left(\begin{matrix}4&8\\8&17\end{matrix}\right) &=& \left(\begin{matrix}2&0\\4&\ell_{33}\end{matrix}\right) \left(\begin{matrix}2&4\\0&\ell_{33}\end{matrix}\right)\\ &=& \left(\begin{matrix}4&8\\8&16+\ell_{33}^2\end{matrix}\right) \end{eqnarray*}\)

And so we conclude that \(\ell_{33}=1\).

This yields the decomposition:

\[\begin{split}\left(\begin{matrix}1&3&5\\3&13&23\\5&23&42\end{matrix}\right) = \left(\begin{matrix}1&0&0\\3&2&0\\5&4&1\end{matrix}\right)\left(\begin{matrix}1&3&5\\0&2&4\\0&0&1\end{matrix}\right)\end{split}\]
[34]:
A = np.array([
    [1,3,5],
    [3,13,23],
    [5,23,42]
])
[35]:
C = la.cholesky(A)
C
[35]:
array([[1., 3., 5.],
       [0., 2., 4.],
       [0., 0., 1.]])
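Note that scipy's cholesky returns the upper triangular factor \(C^T\) by default; pass lower=True to get the lower triangular factor \(C\) described above.

[ ]:
C_lower = la.cholesky(A, lower=True)   # lower triangular factor
np.allclose(C_lower @ C_lower.T, A)    # True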
[36]:
C1 = la.cho_factor(A)
la.cho_solve(C1, b)
[36]:
array([[ 1.75],
       [-0.25],
       [ 0.  ]])