# Solving Linear Equations

```
In [1]:
```

```
import numpy as np
import scipy.linalg as la
```

## Linear Equations

Consider a set of \(m\) linear equations in \(n\) unknowns:

\[\begin{align*}
a_{11} x_1 + \cdots + a_{1n} x_n &= b_1 \\
&\vdots \\
a_{m1} x_1 + \cdots + a_{mn} x_n &= b_m
\end{align*}\]

We can let

\[A = \left(\begin{matrix}a_{11}&\cdots&a_{1n}\\\vdots&\ddots&\vdots\\a_{m1}&\cdots&a_{mn}\end{matrix}\right), \quad
x = \left(\begin{matrix}x_1\\\vdots\\x_n\end{matrix}\right), \quad
b = \left(\begin{matrix}b_1\\\vdots\\b_m\end{matrix}\right)\]

and re-write the system as

\[Ax = b\]

### Linear independence and existence of solutions

- If \(A\) is an \(m\times n\) matrix with \(m>n\) and all \(m\) rows are linearly independent, then the system is *overdetermined* and *inconsistent*. The system cannot be solved exactly. This is the usual case in data analysis, and why least squares is so important. For example, we may be finding the parameters of a linear model, where there are \(m\) data points and \(n\) parameters.
- If \(A\) is an \(m\times n\) matrix with \(m<n\) and all \(m\) rows are linearly independent, then the system is *underdetermined* and there are *infinitely many* solutions.
- If \(A\) is an \(m\times n\) matrix and some of its rows are linearly dependent, then the system is *reducible*: we can get rid of some equations. In other words, some equations in the system give us no new information.
- If \(A\) is a square matrix and its rows are linearly independent, the system has a unique solution. (\(A\) is invertible.)

For a linear system, we can only get a unique solution, no solution, or infinitely many solutions.
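As a quick sketch of how to tell these cases apart numerically, we can compare matrix ranks: by the Rouché–Capelli theorem, a solution exists iff \(A\) and the augmented matrix \((A \mid b)\) have the same rank, and it is unique iff that rank equals \(n\). The helper `classify_system` below is ours, not a library function:

```python
import numpy as np

def classify_system(A, b):
    """Classify A x = b as 'unique', 'none', or 'infinite' using matrix ranks."""
    n = A.shape[1]
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.hstack([A, b.reshape(-1, 1)]))
    if rank_A < rank_Ab:
        return 'none'          # inconsistent: b is not in the column space of A
    return 'unique' if rank_A == n else 'infinite'

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(classify_system(A, np.array([3.0, 17.0])))     # unique
A_dep = np.array([[1.0, 2.0], [2.0, 4.0]])           # linearly dependent rows
print(classify_system(A_dep, np.array([1.0, 2.0])))  # infinite
print(classify_system(A_dep, np.array([1.0, 3.0])))  # none
```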

### Using `solve` to find unique solutions

```
In [2]:
```

```
A = np.array([[1,2],[3,4]])
A
```

```
Out[2]:
```

```
array([[1, 2],
[3, 4]])
```

```
In [3]:
```

```
b = np.array([3,17]).reshape((-1,1))
b
```

```
Out[3]:
```

```
array([[ 3],
[17]])
```

```
In [4]:
```

```
x = la.solve(A, b)
x
```

```
Out[4]:
```

```
array([[11.],
[-4.]])
```

```
In [5]:
```

```
np.allclose(A @ x, b)
```

```
Out[5]:
```

```
True
```

#### Under the hood of `solve`

The `solve` function uses the `dgesv` Fortran function from `lapack` (Linear Algebra Package) to do the actual work. In turn, `lapack` calls even lower-level routines in `blas` (Basic Linear Algebra Subprograms). In particular, `solve` uses the LU matrix decomposition that we will show in detail shortly.

There is rarely any reason to use `blas` or `lapack` functions directly, because the `linalg` package provides more convenient functions that also perform error checking, but you can use Python to experiment with `lapack` or `blas` before using them in a language like C or Fortran.

```
In [6]:
```

```
import scipy.linalg.lapack as lapack
```

```
In [7]:
```

```
lu, piv, x, info = lapack.dgesv(A, b)
x
```

```
Out[7]:
```

```
array([[11.],
[-4.]])
```

## Gaussian elimination

Let’s review how Gaussian elimination (ge) works. We will deal with a \(3\times 3\) system of equations for conciseness, but everything here generalizes to the \(n\times n\) case. Consider the following equation:

\[\left(\begin{matrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\end{matrix}\right)\left(\begin{matrix}x_1\\x_2\\x_3\end{matrix}\right) = \left(\begin{matrix}b_1\\b_2\\b_3\end{matrix}\right)\]

For simplicity, let us assume that the leftmost matrix \(A\) is non-singular. To solve the system using ge, we start with the ‘augmented matrix’:

\[\left(\begin{array}{ccc|c}a_{11}&a_{12}&a_{13}&b_1\\a_{21}&a_{22}&a_{23}&b_2\\a_{31}&a_{32}&a_{33}&b_3\end{array}\right)\]

We begin at the first entry, \(a_{11}\). If \(a_{11} \neq 0\), then we divide the first row by \(a_{11}\) and then subtract the appropriate multiple of the first row from each of the other rows, zeroing out the first entry of all rows. (If \(a_{11}\) is zero, we need to permute rows. We will not go into detail of that here.) The result is as follows, where the primed entries denote the values updated by the row operations:

\[\left(\begin{array}{ccc|c}1&\frac{a_{12}}{a_{11}}&\frac{a_{13}}{a_{11}}&\frac{b_1}{a_{11}}\\0&a_{22}'&a_{23}'&b_2'\\0&a_{32}'&a_{33}'&b_3'\end{array}\right)\]

We repeat the procedure for the second row: first dividing by the leading entry, then subtracting the appropriate multiple of the resulting row from each of the third and first rows, so that the second entries in rows 1 and 3 are zero. Gaussian elimination stops at *row echelon* form (upper triangular, with ones on the diagonal), and then uses *back substitution* to obtain the final answer.

Note that in some cases, it is necessary to permute rows to obtain row
echelon form (when the pivot would otherwise be zero). This is called
*partial pivoting*.
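The procedure above can be sketched directly in numpy. This is a teaching implementation (the helper name `gauss_solve` is ours) with partial pivoting and back substitution, not a substitute for `la.solve`:

```python
import numpy as np

def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    M = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])  # augmented matrix
    n = len(b)
    for i in range(n):
        p = i + np.argmax(np.abs(M[i:, i]))  # partial pivoting: pick largest pivot
        M[[i, p]] = M[[p, i]]                # swap rows i and p
        M[i] = M[i] / M[i, i]                # scale the pivot row to a leading 1
        for j in range(i + 1, n):
            M[j] = M[j] - M[j, i] * M[i]     # zero out entries below the pivot
    # back substitution on the row echelon form (unit diagonal)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = M[i, -1] - M[i, i + 1:n] @ x[i + 1:]
    return x

A = np.array([[1, 3, 4], [2, 1, 3], [4, 7, 2]])
b = np.array([1, 2, 3])
print(gauss_solve(A, b))   # same answer as la.solve(A, b)
```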

### Example

We perform Gaussian elimination on the following augmented matrix:

\[\left(\begin{array}{ccc|c}1&3&4&1\\2&1&3&2\\4&7&2&3\end{array}\right)\]

We need to multiply row \(1\) by \(2\) and subtract from row \(2\) to eliminate the first entry in row \(2\), and then multiply row \(1\) by \(4\) and subtract from row \(3\):

\[\left(\begin{array}{ccc|c}1&3&4&1\\0&-5&-5&0\\0&-5&-14&-1\end{array}\right)\]

And then we eliminate the second entry in the third row:

\[\left(\begin{array}{ccc|c}1&3&4&1\\0&-5&-5&0\\0&0&-9&-1\end{array}\right)\]

We can now do back-substitution to solve

\[\begin{align*}-9x_3 &= -1\\-5x_2 - 5x_3 &= 0\\x_1 + 3x_2 + 4x_3 &= 1\end{align*}\]

to get

\[x_3 = \frac{1}{9}, \quad x_2 = -\frac{1}{9}, \quad x_1 = \frac{8}{9}\]

### Check

```
In [8]:
```

```
A = np.array([
[1,3,4],
[2,1,3],
[4,7,2]
])
b = np.array([1,2,3]).reshape((-1,1))
la.solve(A, b)
```

```
Out[8]:
```

```
array([[ 0.88888889],
[-0.11111111],
[ 0.11111111]])
```

## Gauss-Jordan elimination

With Gauss-Jordan elimination, we make all the pivots equal to 1, and set all entries above and below each pivot to 0.

Then we can just read off the values of \(x_1\), \(x_2\) and
\(x_3\) directly. This is known as the *reduced row echelon form*.
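`scipy.linalg` has no reduced-row-echelon routine, so here is a minimal numpy sketch of Gauss-Jordan reduction for a non-singular square system (the helper name `gauss_jordan` is ours); for exact or symbolic RREF, `sympy.Matrix.rref` is the usual tool:

```python
import numpy as np

def gauss_jordan(A, b):
    """Reduce the augmented matrix [A | b] to reduced row echelon form."""
    M = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])
    n = A.shape[0]
    for i in range(n):
        p = i + np.argmax(np.abs(M[i:, i]))  # partial pivoting
        M[[i, p]] = M[[p, i]]
        M[i] = M[i] / M[i, i]                # make the pivot 1
        for j in range(n):
            if j != i:
                M[j] = M[j] - M[j, i] * M[i] # zero entries above AND below the pivot
    return M

A = np.array([[1, 3, 4], [2, 1, 3], [4, 7, 2]])
b = np.array([1, 2, 3])
M = gauss_jordan(A, b)
print(M[:, -1])   # the solution can be read off the last column
```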

## Gauss-Jordan elimination and the number of solutions

Consider three matrices with \(m > n\), \(m = n\), and \(m < n\), and provide some intuition as to the existence of no solution, a unique solution, or infinitely many solutions.
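A sketch of that intuition in code, using `np.linalg.lstsq` (which returns a least-squares solution when \(m > n\) and the minimum-norm solution when \(m < n\)); the specific matrices are made up for illustration:

```python
import numpy as np

# m > n: overdetermined -- generically no exact solution; lstsq minimizes ||Ax - b||
A_tall = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b_tall = np.array([1.0, 1.0, 0.0])
x, res, rank, sv = np.linalg.lstsq(A_tall, b_tall, rcond=None)
print(res)    # non-empty residual: the system is inconsistent

# m = n with independent rows: exactly one solution
A_sq = np.array([[1.0, 2.0], [3.0, 4.0]])
print(np.linalg.solve(A_sq, np.array([3.0, 17.0])))

# m < n: underdetermined -- infinitely many solutions; lstsq picks the minimum-norm one
A_wide = np.array([[1.0, 1.0, 1.0]])
x_min, *_ = np.linalg.lstsq(A_wide, np.array([3.0]), rcond=None)
print(x_min)  # one of infinitely many solutions
```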

## LU Decomposition

LU stands for ‘Lower Upper’, and so an LU decomposition of a matrix \(A\) is a decomposition

\[A = LU\]

where \(L\) is lower triangular and \(U\) is upper triangular.

Now, LU decomposition is essentially Gaussian elimination, but we work only with the matrix \(A\) (as opposed to the augmented matrix).

Gaussian elimination is all fine when we are solving a system one time,
for one outcome \(b\). Many applications involve solutions to
multiple problems, where the left-hand-side of our matrix equation does
not change, but there are many outcome vectors \(b\). In this case,
it is more efficient to *decompose* \(A\).

First, we start just as in ge, but we ‘keep track’ of the various multiples required to eliminate entries. For example, consider the matrix

\[A = \left(\begin{matrix}1&3&4\\2&1&3\\4&7&2\end{matrix}\right)\]

We need to multiply row \(1\) by \(2\) and subtract from row \(2\) to eliminate the first entry in row \(2\), and then multiply row \(1\) by \(4\) and subtract from row \(3\). Instead of entering zeroes into the first entries of rows \(2\) and \(3\), we record the multiples required for their elimination, as so:

\[\left(\begin{matrix}1&3&4\\(2)&-5&-5\\(4)&-5&-14\end{matrix}\right)\]

And then we eliminate the second entry in the third row:

\[\left(\begin{matrix}1&3&4\\(2)&-5&-5\\(4)&(1)&-9\end{matrix}\right)\]

And now we have the decomposition:

\[L = \left(\begin{matrix}1&0&0\\2&1&0\\4&1&1\end{matrix}\right), \quad U = \left(\begin{matrix}1&3&4\\0&-5&-5\\0&0&-9\end{matrix}\right)\]

### Elementary matrices

Why does the algorithm for LU work? Gaussian elimination consists of 3 elementary operations

- Op1: swapping two rows
- Op2: replace a row by the sum of that row and a multiple of another
- Op3: multiplying a row by a non-zero scalar

These can be recast as matrix operations - in particular, pre-multiplication with corresponding elementary matrices.

- Op1: swapping two rows uses a permutation matrix

- Op2: replace a row by the sum of that row and a multiple of another uses a lower triangular matrix

Note: The inverse operation just substitutes the negative of the multiple, and is also lower triangular.

- Op3: multiplying a row by a non-zero scalar uses a lower triangular matrix

Note: The inverse operation just substitutes the inverse of the scalar, and is also lower triangular.

Multiplying one lower triangular matrix by another lower triangular matrix gives a lower triangular matrix. Hence, if we put the permutations aside (i.e. keep a separate permutation matrix), Gaussian elimination can be expressed as pre-multiplying the original matrix \(A\) by a product of lower triangular matrices, which is just another lower triangular matrix, to give an upper triangular matrix \(U\). The lower triangular matrix \(L\) is then the product of the inverses of those elementary operations.

```
In [9]:
```

```
A = np.array([
[1,3,4],
[2,1,3],
[4,7,2]
])
```

```
In [10]:
```

```
A
```

```
Out[10]:
```

```
array([[1, 3, 4],
[2, 1, 3],
[4, 7, 2]])
```

#### Construct U

```
In [11]:
```

```
b1 = np.array([
[1, 0, 0],
[-2, 1, 0],
[0, 0, 1]
])
```

```
In [12]:
```

```
b2 = np.array([
[1, 0, 0],
[0, 1, 0],
[-4, 0, 1]
])
```

```
In [13]:
```

```
b3 = np.array([
[1, 0, 0],
[0, 1, 0],
[0, -1, 1]
])
```

```
In [14]:
```

```
b1 @ A
```

```
Out[14]:
```

```
array([[ 1, 3, 4],
[ 0, -5, -5],
[ 4, 7, 2]])
```

```
In [15]:
```

```
b2 @ b1 @ A
```

```
Out[15]:
```

```
array([[ 1, 3, 4],
[ 0, -5, -5],
[ 0, -5, -14]])
```

```
In [16]:
```

```
b3 @ b2 @ b1 @ A
```

```
Out[16]:
```

```
array([[ 1, 3, 4],
[ 0, -5, -5],
[ 0, 0, -9]])
```

```
In [17]:
```

```
U = b3 @ b2 @ b1 @ A
U
```

```
Out[17]:
```

```
array([[ 1, 3, 4],
[ 0, -5, -5],
[ 0, 0, -9]])
```

#### Construct L

```
In [18]:
```

```
ib1 = np.array([
[1, 0, 0],
[2, 1, 0],
[0, 0, 1]
])
```

```
In [19]:
```

```
ib2 = np.array([
[1, 0, 0],
[0, 1, 0],
[4, 0, 1]
])
```

```
In [20]:
```

```
ib3 = np.array([
[1, 0, 0],
[0, 1, 0],
[0, 1, 1]
])
```

```
In [21]:
```

```
L = ib1 @ ib2 @ ib3
L
```

```
Out[21]:
```

```
array([[1, 0, 0],
[2, 1, 0],
[4, 1, 1]])
```

#### A is factorized into LU

```
In [22]:
```

```
L @ U
```

```
Out[22]:
```

```
array([[1, 3, 4],
[2, 1, 3],
[4, 7, 2]])
```

```
In [23]:
```

```
A
```

```
Out[23]:
```

```
array([[1, 3, 4],
[2, 1, 3],
[4, 7, 2]])
```

We can now use the LU decomposition to solve for *any* \(b\) without
having to perform Gaussian elimination again.

- First solve \(Ly = b\)
- Then solve \(Ux = y\)

Since \(L\) and \(U\) are triangular, they can be cheaply solved by substitution.
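Spelled out, the two substitutions look like this (a sketch with hypothetical helpers `forward_sub` and `back_sub`; `la.solve_triangular` below does the same with error checking):

```python
import numpy as np

def forward_sub(L, b):
    """Solve L y = b for lower triangular L, working from the top row down."""
    y = np.zeros(len(b))
    for i in range(len(b)):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_sub(U, y):
    """Solve U x = y for upper triangular U, working from the bottom row up."""
    x = np.zeros(len(y))
    for i in range(len(y) - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# L and U from the worked LU decomposition above
L = np.array([[1, 0, 0], [2, 1, 0], [4, 1, 1]])
U = np.array([[1, 3, 4], [0, -5, -5], [0, 0, -9]])
b = np.array([1, 2, 3])
x = back_sub(U, forward_sub(L, b))
print(x)   # same answer as la.solve(A, b)
```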

```
In [24]:
```

```
b
```

```
Out[24]:
```

```
array([[1],
[2],
[3]])
```

```
In [25]:
```

```
y = la.solve_triangular(L, b, lower=True)
y
```

```
Out[25]:
```

```
array([[ 1.],
[ 0.],
[-1.]])
```

```
In [26]:
```

```
x = la.solve_triangular(U, y)
x
```

```
Out[26]:
```

```
array([[ 0.88888889],
[-0.11111111],
[ 0.11111111]])
```

### LDU Decomposition

Note that \(L\) is a unit triangular matrix while \(U\) is not. It is sometimes instructive to factorize \(A = LDU\) where both \(L\) and \(U\) are unit triangular, and \(D\) is a diagonal matrix.

```
In [27]:
```

```
U
```

```
Out[27]:
```

```
array([[ 1, 3, 4],
[ 0, -5, -5],
[ 0, 0, -9]])
```

```
In [29]:
```

```
D = np.diag(np.diag(U))
D
```

```
Out[29]:
```

```
array([[ 1, 0, 0],
[ 0, -5, 0],
[ 0, 0, -9]])
```

```
In [33]:
```

```
U1 = U/np.diag(U)[:, None]
U1
```

```
Out[33]:
```

```
array([[ 1., 3., 4.],
[-0., 1., 1.],
[-0., -0., 1.]])
```

```
In [34]:
```

```
D @ U1
```

```
Out[34]:
```

```
array([[ 1., 3., 4.],
[ 0., -5., -5.],
[ 0., 0., -9.]])
```

```
In [36]:
```

```
np.allclose(A, L @ D @ U1)
```

```
Out[36]:
```

```
True
```

### LU Decomposition in practice

```
In [27]:
```

```
P, L, U = la.lu(A)
```

```
In [28]:
```

```
L
```

```
Out[28]:
```

```
array([[ 1. , 0. , 0. ],
[ 0.5 , 1. , 0. ],
[ 0.25, -0.5 , 1. ]])
```

```
In [29]:
```

```
U
```

```
Out[29]:
```

```
array([[ 4. , 7. , 2. ],
[ 0. , -2.5, 2. ],
[ 0. , 0. , 4.5]])
```

\(P\) is a permutation matrix, so that \(A = PLU\).

```
In [30]:
```

```
P
```

```
Out[30]:
```

```
array([[0., 0., 1.],
[0., 1., 0.],
[1., 0., 0.]])
```

In practice, we can store both \(L\) and \(U\) in a single matrix `LU`; `lu_factor` returns this compact form together with the pivot indices used.

```
In [31]:
```

```
LU, P = la.lu_factor(A)
```

```
In [32]:
```

```
LU
```

```
Out[32]:
```

```
array([[ 4. , 7. , 2. ],
[ 0.5 , -2.5 , 2. ],
[ 0.25, -0.5 , 4.5 ]])
```

```
In [33]:
```

```
la.lu_solve((LU, P), b)
```

```
Out[33]:
```

```
array([[ 0.88888889],
[-0.11111111],
[ 0.11111111]])
```

## Cholesky Decomposition

Recall that a square matrix \(A\) is positive definite if

\[u^TAu > 0\]

for any non-zero \(n\)-dimensional vector \(u\),

and a symmetric, positive-definite matrix \(A\) is a positive-definite matrix such that

\[A = A^T\]

For a positive definite square matrix, all the pivots (diagonal elements of \(U\)) are positive. If the matrix is also symmetric, then we must have an \(LDU\) decomposition of the form \(LDL^T\), where all the diagonal elements of \(D\) are positive. Given this, \(D^{1/2}\) is well-defined, and we have

\[A = LDL^T = LD^{1/2}D^{1/2}L^T = \left(LD^{1/2}\right)\left(LD^{1/2}\right)^T = CC^T\]

where \(C = LD^{1/2}\) is lower-triangular with positive diagonal elements and \(C^T\) is its transpose. This decomposition is known as the Cholesky decomposition, and \(C\) may be interpreted as the ‘square root’ of the matrix \(A\).

### Algorithm

Let \(A\) be an \(n\times n\) matrix. We find the matrix \(L\) using the following iterative procedure, based on the partition

\[A = \left(\begin{matrix}a_{11}&A_{12}^T\\A_{12}&A_{22}\end{matrix}\right) = \left(\begin{matrix}\ell_{11}&0\\L_{12}&L_{22}\end{matrix}\right)\left(\begin{matrix}\ell_{11}&L_{12}^T\\0&L_{22}^T\end{matrix}\right)\]

1.) Let \(\ell_{11} = \sqrt{a_{11}}\)

2.) \(L_{12} = \frac{1}{\ell_{11}}A_{12}\)

3.) Solve \(A_{22} - L_{12}L_{12}^T = L_{22}L_{22}^T\) for \(L_{22}\) (the same problem, one dimension smaller)
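The three steps translate into a short recursive function (a sketch assuming a symmetric positive-definite input, with no error checking; `cholesky` here is a plain local function, not scipy's):

```python
import numpy as np

def cholesky(A):
    """Return lower triangular L with A = L @ L.T, via the recursion above."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.zeros((n, n))
    L[0, 0] = np.sqrt(A[0, 0])           # step 1: l11 = sqrt(a11)
    if n == 1:
        return L
    L[1:, 0] = A[1:, 0] / L[0, 0]        # step 2: L12 = A12 / l11
    # step 3: recurse on the Schur complement A22 - L12 L12^T
    L[1:, 1:] = cholesky(A[1:, 1:] - np.outer(L[1:, 0], L[1:, 0]))
    return L

A = np.array([[1, 3, 5], [3, 13, 23], [5, 23, 42]])
L = cholesky(A)
print(L)   # lower triangular; L @ L.T reproduces A
```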

### Example

Consider

\[A = \left(\begin{matrix}1&3&5\\3&13&23\\5&23&42\end{matrix}\right)\]

The first step gives \(\ell_{11} = \sqrt{1} = 1\) and \(L_{12} = \left(\begin{matrix}3\\5\end{matrix}\right)\), so

\(\begin{eqnarray*} A_{22} - L_{12}L_{12}^T &=& \left(\begin{matrix}13&23\\23&42\end{matrix}\right) - \left(\begin{matrix}9&15\\15&25\end{matrix}\right)\\ &=& \left(\begin{matrix}4&8\\8&17\end{matrix}\right) \end{eqnarray*}\)

This is also symmetric and positive definite, and can be solved by another iteration:

\(\begin{eqnarray*} \left(\begin{matrix}4&8\\8&17\end{matrix}\right) &=& \left(\begin{matrix}2&0\\4&\ell_{33}\end{matrix}\right) \left(\begin{matrix}2&4\\0&\ell_{33}\end{matrix}\right)\\ &=& \left(\begin{matrix}4&8\\8&16+\ell_{33}^2\end{matrix}\right) \end{eqnarray*}\)

And so we conclude that \(\ell_{33}=1\).

This yields the decomposition:

\[A = \left(\begin{matrix}1&0&0\\3&2&0\\5&4&1\end{matrix}\right)\left(\begin{matrix}1&3&5\\0&2&4\\0&0&1\end{matrix}\right)\]

```
In [34]:
```

```
A = np.array([
[1,3,5],
[3,13,23],
[5,23,42]
])
```

```
In [35]:
```

```
C = la.cholesky(A)
C
```

```
Out[35]:
```

```
array([[1., 3., 5.],
[0., 2., 4.],
[0., 0., 1.]])
```
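Note that `scipy.linalg.cholesky` returns the *upper* triangular factor \(C^T\) by default, which is what we see above; pass `lower=True` to get the lower triangular \(C\) with \(A = CC^T\):

```python
import numpy as np
import scipy.linalg as la

A = np.array([[1, 3, 5], [3, 13, 23], [5, 23, 42]])
C = la.cholesky(A, lower=True)   # lower triangular factor C
print(np.allclose(C @ C.T, A))   # True
```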

```
In [36]:
```

```
C1 = la.cho_factor(A)
la.cho_solve(C1, b)
```

```
Out[36]:
```

```
array([[ 1.75],
[-0.25],
[ 0. ]])
```