Solving Linear Equations¶
In [1]:
import numpy as np
import scipy.linalg as la
Linear Equations¶
Consider a set of \(m\) linear equations in \(n\) unknowns:

\(\begin{eqnarray*} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &=& b_1\\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &=& b_2\\ &\vdots&\\ a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n &=& b_m \end{eqnarray*}\)

We can let

\(A = \left(\begin{matrix}a_{11}&\cdots&a_{1n}\\ \vdots&\ddots&\vdots\\ a_{m1}&\cdots&a_{mn}\end{matrix}\right), \quad x = \left(\begin{matrix}x_1\\ \vdots\\ x_n\end{matrix}\right), \quad b = \left(\begin{matrix}b_1\\ \vdots\\ b_m\end{matrix}\right)\)

and re-write the system as

\(Ax = b\)
Linear independence and existence of solutions¶
- If \(A\) is an \(m\times n\) matrix with \(m>n\) (more equations than unknowns), the system is overdetermined and in general inconsistent: unless \(b\) happens to lie in the column space of \(A\), the system cannot be solved exactly. This is the usual case in data analysis, and why least squares is so important. For example, we may be finding the parameters of a linear model, where there are \(m\) data points and \(n\) parameters.
- If \(A\) is an \(m\times n\) matrix with \(m<n\) and all \(m\) rows are linearly independent, then the system is underdetermined and there are infinitely many solutions.
- If \(A\) is an \(m\times n\) matrix and some of its rows are linearly dependent, then the system is reducible: we can get rid of some equations, because they do not give us any new information.
- If \(A\) is a square matrix and its rows are linearly independent, the system has a unique solution. (\(A\) is invertible.)

For a linear system, we can only get a unique solution, no solution, or infinitely many solutions. Comparing the rank of \(A\) with the rank of the augmented matrix \([A \mid b]\) and with the number of unknowns distinguishes these cases, as sketched below.
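The rank comparison can be done directly with np.linalg.matrix_rank. A minimal sketch, using a made-up overdetermined system (the names M and rhs are purely illustrative):

In [ ]:
M = np.array([[1, 2], [3, 4], [5, 6]])      # 3 equations, 2 unknowns (overdetermined)
rhs = np.array([1, 2, 4]).reshape((-1, 1))

rank_M = np.linalg.matrix_rank(M)
rank_aug = np.linalg.matrix_rank(np.hstack([M, rhs]))

# rank_M <  rank_aug                        -> no solution
# rank_M == rank_aug == number of unknowns  -> unique solution
# rank_M == rank_aug <  number of unknowns  -> infinitely many solutions
rank_M, rank_aug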
Using solve to find unique solutions¶
In [2]:
A = np.array([[1,2],[3,4]])
A
Out[2]:
array([[1, 2],
[3, 4]])
In [3]:
b = np.array([3,17]).reshape((-1,1))
b
Out[3]:
array([[ 3],
[17]])
In [4]:
x = la.solve(A, b)
x
Out[4]:
array([[11.],
[-4.]])
In [5]:
np.allclose(A @ x, b)
Out[5]:
True
Under the hood of solve¶
The solve function uses the dgesv Fortran routine in lapack (Linear Algebra Package) to do the actual work. In turn, lapack calls even lower level routines in blas (Basic Linear Algebra Subprograms). In particular, solve uses the LU matrix decomposition that we will show in detail shortly.

There is rarely any reason to use blas or lapack functions directly, because the linalg package provides more convenient functions that also perform error checking, but you can use Python to experiment with lapack or blas before using them in a language like C or Fortran.
In [6]:
import scipy.linalg.lapack as lapack
In [7]:
lu, piv, x, info = lapack.dgesv(A, b)
x
Out[7]:
array([[11.],
[-4.]])
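We can experiment with blas in the same way. As a small illustration (not something solve requires us to do), the dgemm routine computes a general matrix-matrix product, so multiplying \(A\) by the solution just found should recover \(b\):

In [ ]:
import scipy.linalg.blas as blas

blas.dgemm(1.0, A, x)   # computes 1.0 * A @ x, which should equal b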
Gaussian elimination¶
Let’s review how Gaussian elimination (ge) works. We will deal with a \(3\times 3\) system of equations for conciseness, but everything here generalizes to the \(n\times n\) case. Consider the following equation:

\(\left(\begin{matrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\end{matrix}\right)\left(\begin{matrix}x_1\\x_2\\x_3\end{matrix}\right) = \left(\begin{matrix}b_1\\b_2\\b_3\end{matrix}\right)\)

For simplicity, let us assume that the leftmost matrix \(A\) is non-singular. To solve the system using ge, we start with the ‘augmented matrix’:

\(\left(\begin{array}{ccc|c}a_{11}&a_{12}&a_{13}&b_1\\a_{21}&a_{22}&a_{23}&b_2\\a_{31}&a_{32}&a_{33}&b_3\end{array}\right)\)

We begin at the first entry, \(a_{11}\). If \(a_{11} \neq 0\), then we divide the first row by \(a_{11}\) and then subtract the appropriate multiple of the first row from each of the other rows, zeroing out the first entry of all rows. (If \(a_{11}\) is zero, we need to permute rows. We will not go into detail of that here.) The result is as follows:

\(\left(\begin{array}{ccc|c}1&\frac{a_{12}}{a_{11}}&\frac{a_{13}}{a_{11}}&\frac{b_1}{a_{11}}\\0&a_{22}'&a_{23}'&b_2'\\0&a_{32}'&a_{33}'&b_3'\end{array}\right)\)

where the primes denote the entries updated by the row operations.
We repeat the procedure for the second row, first dividing by the leading entry, then subtracting the appropriate multiple of the resulting row from each of the third and first rows, so that the second entry in row 1 and in row 3 are zero. Gaussian elimination stops at row echelon form (upper triangular, with ones on the diagonal), and then uses back substitution to obtain the final answer.
Note that in some cases, it is necessary to permute rows to obtain row echelon form (when the pivot would otherwise be zero). This is called partial pivoting.
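The description above translates almost line for line into code. Below is a minimal sketch of Gaussian elimination with partial pivoting and back substitution; the function gauss_solve is our own illustration, not a library routine:

In [ ]:
def gauss_solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting (illustration only)."""
    A = A.astype(float)
    M = np.hstack([A, b.astype(float)])      # augmented matrix
    n = len(M)
    for k in range(n):
        p = k + np.argmax(np.abs(M[k:, k]))  # partial pivoting: pick the largest pivot
        M[[k, p]] = M[[p, k]]                # swap rows k and p
        M[k] = M[k] / M[k, k]                # make the pivot equal to 1
        for i in range(k + 1, n):
            M[i] = M[i] - M[i, k] * M[k]     # zero out entries below the pivot
    # back substitution on the row echelon form
    x = np.zeros(n)
    for k in range(n - 1, -1, -1):
        x[k] = M[k, -1] - M[k, k+1:n] @ x[k+1:]
    return x.reshape((-1, 1))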
Example¶
We perform Gaussian elimination on the following augmented matrix:

\(\left(\begin{array}{ccc|c}1&3&4&1\\2&1&3&2\\4&7&2&3\end{array}\right)\)

We need to multiply row \(1\) by \(2\) and subtract from row \(2\) to eliminate the first entry in row \(2\), and then multiply row \(1\) by \(4\) and subtract from row \(3\):

\(\left(\begin{array}{ccc|c}1&3&4&1\\0&-5&-5&0\\0&-5&-14&-1\end{array}\right)\)

And then we eliminate the second entry in the third row by subtracting row \(2\) from row \(3\):

\(\left(\begin{array}{ccc|c}1&3&4&1\\0&-5&-5&0\\0&0&-9&-1\end{array}\right)\)

We can now do back-substitution to solve

\(\begin{eqnarray*} -9x_3 &=& -1\\ -5x_2 - 5x_3 &=& 0\\ x_1 + 3x_2 + 4x_3 &=& 1 \end{eqnarray*}\)

to get

\(x_3 = \frac{1}{9}, \quad x_2 = -\frac{1}{9}, \quad x_1 = \frac{8}{9}\)
Check¶
In [8]:
A = np.array([
[1,3,4],
[2,1,3],
[4,7,2]
])
b = np.array([1,2,3]).reshape((-1,1))
la.solve(A, b)
Out[8]:
array([[ 0.88888889],
[-0.11111111],
[ 0.11111111]])
Gauss-Jordan elimination¶
With Gauss-Jordan elimination, we make all the pivots equal to 1, and set all entries above and below each pivot to 0. Continuing the example above, this gives

\(\left(\begin{array}{ccc|c}1&0&0&\frac{8}{9}\\0&1&0&-\frac{1}{9}\\0&0&1&\frac{1}{9}\end{array}\right)\)

Then we can just read off the values of \(x_1\), \(x_2\) and \(x_3\) directly. This is known as the reduced row echelon form.
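If you want to experiment with reduced row echelon form directly, sympy provides Matrix.rref (this assumes sympy is installed; it is not part of scipy):

In [ ]:
import sympy

# reduced row echelon form of the augmented matrix from the example above
M = sympy.Matrix([
    [1, 3, 4, 1],
    [2, 1, 3, 2],
    [4, 7, 2, 3]
])
M.rref()   # returns (rref matrix, indices of the pivot columns)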
Gauss-Jordan elimination and the number of solutions¶
Consider three systems with \(m > n\), \(m = n\) and \(m < n\), and provide some intuition as to whether each has no solution, a unique solution, or infinitely many solutions.
LU Decomposition¶
LU stands for ‘Lower Upper’, and so an LU decomposition of a matrix \(A\) is a decomposition

\(A = LU\)

where \(L\) is lower triangular and \(U\) is upper triangular.

Now, LU decomposition is essentially Gaussian elimination, but we work only with the matrix \(A\) (as opposed to the augmented matrix).
Gaussian elimination is all fine when we are solving a system one time, for one outcome \(b\). Many applications involve solutions to multiple problems, where the left-hand-side of our matrix equation does not change, but there are many outcome vectors \(b\). In this case, it is more efficient to decompose \(A\).
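For example, we can factor \(A\) once with la.lu_factor and then reuse the factorization for as many right-hand sides as we like with la.lu_solve (both functions are shown in more detail further below; the vectors rhs1 and rhs2 here are made up for illustration):

In [ ]:
lu_and_piv = la.lu_factor(A)    # factor once ...

rhs1 = np.array([1, 2, 3]).reshape((-1, 1))
rhs2 = np.array([4, 5, 6]).reshape((-1, 1))
la.lu_solve(lu_and_piv, rhs1), la.lu_solve(lu_and_piv, rhs2)   # ... solve many times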
First, we start just as in ge, but we ‘keep track’ of the various multiples required to eliminate entries. For example, consider the matrix

\(A = \left(\begin{matrix}1&3&4\\2&1&3\\4&7&2\end{matrix}\right)\)

We need to multiply row \(1\) by \(2\) and subtract from row \(2\) to eliminate the first entry in row \(2\), and then multiply row \(1\) by \(4\) and subtract from row \(3\). Instead of entering zeroes into the first entries of rows \(2\) and \(3\), we record the multiples required for their elimination, as so:

\(\left(\begin{matrix}1&3&4\\(2)&-5&-5\\(4)&-5&-14\end{matrix}\right)\)

And then we eliminate the second entry in the third row, recording the multiple (here \(1\)) used to do so:

\(\left(\begin{matrix}1&3&4\\(2)&-5&-5\\(4)&(1)&-9\end{matrix}\right)\)

And now we have the decomposition:

\(L = \left(\begin{matrix}1&0&0\\2&1&0\\4&1&1\end{matrix}\right), \quad U = \left(\begin{matrix}1&3&4\\0&-5&-5\\0&0&-9\end{matrix}\right), \quad A = LU\)
Elementary matrices¶
Why does the algorithm for LU work? Gaussian elimination consists of 3 elementary operations
- Op1: swapping two rows
- Op2: replace a row by the sum of that row and a multiple of another
- Op3: multiplying a row by a non-zero scalar
These can be recast as matrix operations - in particular, pre-multiplication with corresponding elementary matrices.
- Op1: swapping two rows uses a permutation matrix, e.g. \(\left(\begin{matrix}0&1&0\\1&0&0\\0&0&1\end{matrix}\right)\), which swaps rows \(1\) and \(2\).
- Op2: replacing a row by the sum of that row and a multiple of another uses a lower triangular matrix, e.g. \(\left(\begin{matrix}1&0&0\\m&1&0\\0&0&1\end{matrix}\right)\), which adds \(m\) times row \(1\) to row \(2\).
Note: The inverse operation just substitutes the negative of the multiple, and is also lower triangular.
- Op3: multiplying a row by a non-zero scalar uses a lower triangular (in fact diagonal) matrix, e.g. \(\left(\begin{matrix}1&0&0\\0&c&0\\0&0&1\end{matrix}\right)\), which multiplies row \(2\) by \(c\).
Note: The inverse operation just substitutes the reciprocal of the scalar, and is also lower triangular.
The product of two lower triangular matrices is again lower triangular. Hence, if we put the permutations aside (i.e. keep a separate permutation matrix), Gaussian elimination amounts to pre-multiplying the original matrix \(A\) by a sequence of lower triangular elementary matrices, whose product is itself a lower triangular matrix, to give an upper triangular matrix \(U\). The lower triangular matrix \(L\) in the decomposition \(A = LU\) is then the product of the inverses of these elementary matrices, as the cells below demonstrate.
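For completeness, here is what Op1 and Op3 look like as matrix multiplications (the construction of \(U\) below only needs Op2-type matrices); the names P12 and S2 are our own illustrative choices:

In [ ]:
P12 = np.array([
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1]
])    # Op1: permutation matrix that swaps rows 1 and 2
S2 = np.array([
    [1, 0, 0],
    [0, 3, 0],
    [0, 0, 1]
])    # Op3: multiplies row 2 by 3
P12 @ A, S2 @ A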
In [9]:
A = np.array([
[1,3,4],
[2,1,3],
[4,7,2]
])
In [10]:
A
Out[10]:
array([[1, 3, 4],
[2, 1, 3],
[4, 7, 2]])
Construct U¶
In [11]:
b1 = np.array([
[1, 0, 0],
[-2, 1, 0],
[0, 0, 1]
])
In [12]:
b2 = np.array([
[1, 0, 0],
[0, 1, 0],
[-4, 0, 1]
])
In [13]:
b3 = np.array([
[1, 0, 0],
[0, 1, 0],
[0, -1, 1]
])
In [14]:
b1 @ A
Out[14]:
array([[ 1, 3, 4],
[ 0, -5, -5],
[ 4, 7, 2]])
In [15]:
b2 @ b1 @ A
Out[15]:
array([[ 1, 3, 4],
[ 0, -5, -5],
[ 0, -5, -14]])
In [16]:
b3 @ b2 @ b1 @ A
Out[16]:
array([[ 1, 3, 4],
[ 0, -5, -5],
[ 0, 0, -9]])
In [17]:
U = b3 @ b2 @ b1 @ A
U
Out[17]:
array([[ 1, 3, 4],
[ 0, -5, -5],
[ 0, 0, -9]])
Construct L¶
In [18]:
ib1 = np.array([
[1, 0, 0],
[2, 1, 0],
[0, 0, 1]
])
In [19]:
ib2 = np.array([
[1, 0, 0],
[0, 1, 0],
[4, 0, 1]
])
In [20]:
ib3 = np.array([
[1, 0, 0],
[0, 1, 0],
[0, 1, 1]
])
In [21]:
L = ib1 @ ib2 @ ib3
L
Out[21]:
array([[1, 0, 0],
[2, 1, 0],
[4, 1, 1]])
A is factorized into LU¶
In [22]:
L @ U
Out[22]:
array([[1, 3, 4],
[2, 1, 3],
[4, 7, 2]])
In [23]:
A
Out[23]:
array([[1, 3, 4],
[2, 1, 3],
[4, 7, 2]])
We can now use the LU decomposition to solve for any \(b\) without having to perform Gaussian elimination again.
- First solve \(Ly = b\)
- Then solve \(Ux = y\)
Since \(L\) and \(U\) are triangular, they can be cheaply solved by substitution.
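To see why triangular systems are cheap, here is a minimal forward-substitution sketch for \(Ly = b\) (back substitution for \(Ux = y\) works the same way, from the bottom up); the function forward_sub is our own illustration:

In [ ]:
def forward_sub(L, b):
    """Solve Ly = b for lower triangular L by forward substitution (illustration only)."""
    n = L.shape[0]
    y = np.zeros(n)
    for i in range(n):
        # everything to the left of the diagonal is already known
        y[i] = (b[i, 0] - L[i, :i] @ y[:i]) / L[i, i]
    return y.reshape((-1, 1))

forward_sub(L, b)   # should agree with la.solve_triangular(L, b, lower=True)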
In [24]:
b
Out[24]:
array([[1],
[2],
[3]])
In [25]:
y = la.solve_triangular(L, b, lower=True)
y
Out[25]:
array([[ 1.],
[ 0.],
[-1.]])
In [26]:
x = la.solve_triangular(U, y)
x
Out[26]:
array([[ 0.88888889],
[-0.11111111],
[ 0.11111111]])
LDU Decomposition¶
Note that \(L\) is a unit triangular matrix while \(U\) is not. It is sometimes instructive to factorize \(A = LDU\) where both \(L\) and \(U\) are unit triangular, and \(D\) is a diagonal matrix.
In [27]:
U
Out[27]:
array([[ 1, 3, 4],
[ 0, -5, -5],
[ 0, 0, -9]])
In [29]:
D = np.diag(np.diag(U))
D
Out[29]:
array([[ 1, 0, 0],
[ 0, -5, 0],
[ 0, 0, -9]])
In [33]:
U1 = U/np.diag(U)[:, None]
U1
Out[33]:
array([[ 1., 3., 4.],
[-0., 1., 1.],
[-0., -0., 1.]])
In [34]:
D @ U1
Out[34]:
array([[ 1., 3., 4.],
[ 0., -5., -5.],
[ 0., 0., -9.]])
In [36]:
np.allclose(A, L @ D @ U1)
Out[36]:
True
LU Decomposition in practice¶
In [27]:
P, L, U = la.lu(A)
In [28]:
L
Out[28]:
array([[ 1. , 0. , 0. ],
[ 0.5 , 1. , 0. ],
[ 0.25, -0.5 , 1. ]])
In [29]:
U
Out[29]:
array([[ 4. , 7. , 2. ],
[ 0. , -2.5, 2. ],
[ 0. , 0. , 4.5]])
P is a permutation matrix: la.lu returns \(P\), \(L\) and \(U\) such that \(A = PLU\). Because of partial pivoting, these factors differ from the ones we constructed by hand above.
In [30]:
P
Out[30]:
array([[0., 0., 1.],
[0., 1., 0.],
[1., 0., 0.]])
In practice, we can store both L and U in a single matrix LU; the unit diagonal of L is implicit and not stored.
In [31]:
LU, P = la.lu_factor(A)
In [32]:
LU
Out[32]:
array([[ 4. , 7. , 2. ],
[ 0.5 , -2.5 , 2. ],
[ 0.25, -0.5 , 4.5 ]])
In [33]:
la.lu_solve((LU, P), b)
Out[33]:
array([[ 0.88888889],
[-0.11111111],
[ 0.11111111]])
Cholesky Decomposition¶
Recall that a square matrix \(A\) is positive definite if

\(u^TAu > 0\)

for any non-zero n-dimensional vector \(u\),

and a symmetric, positive-definite matrix \(A\) is a positive-definite matrix such that

\(A = A^T\)

For a positive definite square matrix, all the pivots (diagonal elements of \(U\)) are positive. If the matrix is also symmetric, then we must have an \(LDU\) decomposition of the form \(LDL^T\), where all the diagonal elements of \(D\) are positive. Given this, \(D^{1/2}\) is well-defined, and we have

\(A = LDL^T = LD^{1/2}D^{1/2}L^T = CC^T\)

where \(C = LD^{1/2}\) is lower-triangular with positive diagonal elements and \(C^T\) is its transpose. This decomposition is known as the Cholesky decomposition, and \(C\) may be interpreted as the ‘square root’ of the matrix \(A\).
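A quick sanity check of this claim: asking la.cholesky for the lower triangular factor of a symmetric positive definite matrix (here the matrix used in the example below, named S so as not to clobber A) and multiplying it by its transpose should recover the original matrix.

In [ ]:
S = np.array([
    [1, 3, 5],
    [3, 13, 23],
    [5, 23, 42]
])
C = la.cholesky(S, lower=True)   # lower=True requests the lower triangular factor
np.allclose(C @ C.T, S)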
Algorithm¶
Let \(A\) be an \(n\times n\) matrix. Partition \(A\) and its Cholesky factor \(L\) in block form as

\(A = \left(\begin{matrix}a_{11}&A_{12}^T\\A_{12}&A_{22}\end{matrix}\right) = \left(\begin{matrix}\ell_{11}&0\\L_{12}&L_{22}\end{matrix}\right)\left(\begin{matrix}\ell_{11}&L_{12}^T\\0&L_{22}^T\end{matrix}\right)\)

We find the matrix \(L\) using the following iterative procedure (a short numpy sketch of this algorithm follows the steps):
1.) Let \(\ell_{11} = \sqrt{a_{11}}\)
2.) \(L_{12} = \frac{1}{\ell_{11}}A_{12}\)
3.) Solve \(A_{22} - L_{12}L_{12}^T = L_{22}L_{22}^T\) for \(L_{22}\), which is a smaller Cholesky problem handled by repeating the same steps.
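Here is a minimal (unblocked) numpy version of this procedure; the function cholesky_lower is our own sketch and performs no checks for symmetry or positive definiteness:

In [ ]:
def cholesky_lower(A):
    """Return lower triangular L with A = L @ L.T (illustration only)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.zeros((n, n))
    for j in range(n):
        # diagonal entry: square root of what remains after earlier contributions
        L[j, j] = np.sqrt(A[j, j] - L[j, :j] @ L[j, :j])
        # entries below the diagonal in column j
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - L[i, :j] @ L[j, :j]) / L[j, j]
    return L

Applied to the matrix in the example below, this reproduces the factor found by hand.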
Example¶
Consider

\(A = \left(\begin{matrix}1&3&5\\3&13&23\\5&23&42\end{matrix}\right)\)

Then \(\ell_{11} = \sqrt{1} = 1\) and \(L_{12} = \frac{1}{\ell_{11}}\left(\begin{matrix}3\\5\end{matrix}\right) = \left(\begin{matrix}3\\5\end{matrix}\right)\), so

\(\begin{eqnarray*} A_{22} - L_{12}L_{12}^T &=& \left(\begin{matrix}13&23\\23&42\end{matrix}\right) - \left(\begin{matrix}9&15\\15&25\end{matrix}\right)\\ &=& \left(\begin{matrix}4&8\\8&17\end{matrix}\right) \end{eqnarray*}\)
This is also symmetric and positive definite, and can be solved by another iteration
\(\begin{eqnarray*} \left(\begin{matrix}4&8\\8&17\end{matrix}\right) &=& \left(\begin{matrix}2&0\\4&\ell_{33}\end{matrix}\right) \left(\begin{matrix}2&4\\0&\ell_{33}\end{matrix}\right)\\ &=& \left(\begin{matrix}4&8\\8&16+\ell_{33}^2\end{matrix}\right) \end{eqnarray*}\)
And so we conclude that \(\ell_{33}=1\).
This yields the decomposition:

\(A = \left(\begin{matrix}1&0&0\\3&2&0\\5&4&1\end{matrix}\right)\left(\begin{matrix}1&3&5\\0&2&4\\0&0&1\end{matrix}\right)\)
In [34]:
A = np.array([
[1,3,5],
[3,13,23],
[5,23,42]
])
In [35]:
C = la.cholesky(A)
C
Out[35]:
array([[1., 3., 5.],
[0., 2., 4.],
[0., 0., 1.]])
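Note that la.cholesky returns the upper triangular factor by default (pass lower=True to get the lower triangular \(C\) of the discussion above), so the factor printed here satisfies \(A = C^TC\):

In [ ]:
np.allclose(C.T @ C, A)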
In [36]:
C1 = la.cho_factor(A)
la.cho_solve(C1, b)
Out[36]:
array([[ 1.75],
[-0.25],
[ 0. ]])