```
import os
import sys
import glob
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
%precision 4
plt.style.use('ggplot')
```

**Reference**

SciPy’s official tutorial on Linear algebra

This is the age of Big Data. Every second of every day, data is being recorded in countless systems over the world. Our shopping habits, book and movie preferences, key words typed into our email messages, medical records, NSA recordings of our telephone calls, genomic data - and none of it is any use without analysis.

Enormous data sets carry with them enormous challenges in data processing. Solving a system of \(10\) equations in \(10\) unknowns is easy, and one need not be terribly careful about methodolgy. But as the size of the system grows, algorithmic complexity and efficiency become critical.

For a more complete description:

http://en.wikipedia.org/wiki/Netflix_Prize

The whole technical story

http://www.stat.osu.edu/~dmsl/GrandPrize2009_BPC_BigChaos.pdf

In 2006, Netflix opened a competition where it provided ratings of over
\(400,000\) for \(18,000\) movies. The goal was to make predict
a user’s rating of a movie, based on previous ratings *and* ratings of
‘similar’ users. The task amounted to analysis of a
\(400,000\times 18,000\) matrix! The wikipedia link above describes
the contest and the second link is a very detailed description of the
method (which took into account important characteristics such as how
tastes may change over time). Part of the analysis is related to matrix
decomposition - we won’t go into the details of the winning algorithm,
but we will spend some time on basic matrix decompositions.

Matrix decompositions are an important step in solving linear systems in a computationally efficient manner.

LU stands for ‘Lower Upper’, and so an LU decomposition of a matrix \(A\) is a decomposition so that

\[A= LU\]

where \(L\) is lower triangular and \(U\) is upper triangular.

Now, LU decomposition is essentially gaussian elimination, but we work only with the matrix \(A\) (as opposed to the augmented matrix).

Let’s review how gaussian elimination (ge) works. We will deal with a \(3\times 3\) system of equations for conciseness, but everything here generalizes to the \(n\times n\) case. Consider the following equation:

\[\begin{split}\left(\begin{matrix}a_{11}&a_{12} & a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\end{matrix}\right)\left(\begin{matrix}x_1\\x_2\\x_3\end{matrix}\right) = \left(\begin{matrix}b_1\\b_2\\b_3\end{matrix}\right)\end{split}\]

For simplicity, let us assume that the leftmost matrix \(A\) is non-singular. To solve the system using ge, we start with the ‘augmented matrix’:

\[\begin{split}\left(\begin{array}{ccc|c}a_{11}&a_{12} & a_{13}& b_1 \\a_{21}&a_{22}&a_{23}&b_2\\a_{31}&a_{32}&a_{33}&b_3\end{array}\right)\end{split}\]

We begin at the first entry, \(a_{11}\). If \(a_{11} \neq 0\), then we divide the first row by \(a_{11}\) and then subtract the appropriate multiple of the first row from each of the other rows, zeroing out the first entry of all rows. (If \(a_{11}\) is zero, we need to permute rows. We will not go into detail of that here.) The result is as follows:

\[\begin{split}\left(\begin{array}{ccc|c}
1 & \frac{a_{12}}{a_{11}} & \frac{a_{13}}{a_{11}} & \frac{b_1}{a_{11}} \\
0 & a_{22} - a_{21}\frac{a_{12}}{a_{11}} & a_{23} - a_{21}\frac{a_{13}}{a_{11}} & b_2 - a_{21}\frac{b_1}{a_{11}}\\
0&a_{32}-a_{31}\frac{a_{12}}{a_{11}} & a_{33} - a_{31}\frac{a_{13}}{a_{11}} &b_3- a_{31}\frac{b_1}{a_{11}}\end{array}\right)\end{split}\]

We repeat the procedure for the second row, first dividing by the
leading entry, then subtracting the appropriate multiple of the
resulting row from each of the third and first rows, so that the second
entry in row 1 and in row 3 are zero. We *could* continue until the
matrix on the left is the identity. In that case, we can then just ‘read
off’ the solution: i.e., the vector \(x\) is the resulting column
vector on the right. Usually, it is more efficient to stop at *reduced
row eschelon* form (upper triangular, with ones on the diagonal), and
then use *back substitution* to obtain the final answer.

Note that in some cases, it is necessary to permute rows to obtain
reduced row eschelon form. This is called *partial pivoting*. If we also
manipulate columns, that is called *full pivoting*.

It should be mentioned that we may obtain the inverse of a matrix using ge, by reducing the matrix \(A\) to the identity, with the identity matrix as the augmented portion.

Now, this is all fine when we are solving a system one time, for one
outcome \(b\). Many applications involve solutions to multiple
problems, where the left-hand-side of our matrix equation does not
change, but there are many outcome vectors \(b\). In this case, it
is more efficient to *decompose* \(A\).

First, we start just as in ge, but we ‘keep track’ of the various multiples required to eliminate entries. For example, consider the matrix

\[\begin{split}A = \left(\begin{matrix} 1 & 3 & 4 \\
2& 1& 3\\
4&1&2
\end{matrix}\right)\end{split}\]

We need to multiply row \(1\) by \(2\) and subtract from row \(2\) to eliminate the first entry in row \(2\), and then multiply row \(1\) by \(4\) and subtract from row \(3\). Instead of entering zeroes into the first entries of rows \(2\) and \(3\), we record the multiples required for their elimination, as so:

\[\begin{split}\left(\begin{matrix} 1 & 3 & 4 \\
(2)& -5 & -5\\
(4)&-11&-14
\end{matrix}\right)\end{split}\]

And then we eliminate the second entry in the third row:

\[\begin{split}\left(\begin{matrix} 1 & 3 & 4 \\
(2)& -5 & -5\\
(4)&(\frac{-11}{5})&-3
\end{matrix}\right)\end{split}\]

And now we have the decomposition:

\[\begin{split}L= \left(\begin{matrix} 1 & 0 & 0 \\
2& 1 & 0\\
4&\frac{-11}5&1
\end{matrix}\right)
U = \left(\begin{matrix} 1 & 3 & 4 \\
0& -5 & -5\\
0&0&-3
\end{matrix}\right)\end{split}\]

We can solve the system by solving two back-substitution problems:

\[Ly = b\]

and

\[Ux=y\]

These are both \(O(n^2)\), so it is more efficient to decompose when there are multiple outcomes to solve for.

Let do this with numpy:

```
import numpy as np
import scipy.linalg as la
np.set_printoptions(suppress=True)
A = np.array([[1,3,4],[2,1,3],[4,1,2]])
print(A)
P, L, U = la.lu(A)
print(np.dot(P.T, A))
print
print(np.dot(L, U))
print(P)
print(L)
print(U)
```

```
[[1 3 4]
[2 1 3]
[4 1 2]]
[[ 4. 1. 2.]
[ 1. 3. 4.]
[ 2. 1. 3.]]
[[ 4. 1. 2.]
[ 1. 3. 4.]
[ 2. 1. 3.]]
[[ 0. 1. 0.]
[ 0. 0. 1.]
[ 1. 0. 0.]]
[[ 1. 0. 0. ]
[ 0.25 1. 0. ]
[ 0.5 0.1818 1. ]]
[[ 4. 1. 2. ]
[ 0. 2.75 3.5 ]
[ 0. 0. 1.3636]]
```

Note that the numpy decomposition uses *partial pivoting* (matrix rows
are permuted to use the largest pivot). This is because small pivots can
lead to numerical instability. Another reason why one should use library
functions whenever possible!

Recall that a square matrix \(A\) is positive definite if

\[\begin{split}u^TA u > 0\end{split}\]

for any non-zero n-dimensional vector \(u\),

and a symmetric, positive-definite matrix \(A\) is a positive-definite matrix such that

\[A = A^T\]

Let \(A\) be a symmetric, positive-definite matrix. There is a unique decomposition such that

\[A = L L^T\]

where \(L\) is lower-triangular with positive diagonal elements and \(L^T\) is its transpose. This decomposition is known as the Cholesky decompostion, and \(L\) may be interpreted as the ‘square root’ of the matrix \(A\).

Let \(A\) be an \(n\times n\) matrix. We find the matri \(L\) using the following iterative procedure:

\[\begin{split}A = \left(\begin{matrix}a_{11}&A_{12}\\A_{12}&A_{22}\end{matrix}\right) =
\left(\begin{matrix}\ell_{11}&0\\
L_{12}&L_{22}\end{matrix}\right)
\left(\begin{matrix}\ell_{11}&L_{12}\\0&L_{22}\end{matrix}\right)\end{split}\]

1.) Let \(\ell_{11} = \sqrt{a_{11}}\)

2.) \(L_{12} = \frac{1}{\ell_{11}}A_{12}\)

3.) Solve \(A_{22} - L_{12}L_{12}^T = L_{22}L_{22}^T\) for \(L_{22}\)

\[\begin{split}A = \left(\begin{matrix}1&3&5\\3&13&23\\5&23&42\end{matrix}\right)\end{split}\]

\[\ell_{11} = \sqrt{a_{11}} = 1\]

\[L_{12} = \frac{1}{\ell_{11}} A_{12} = A_{12}\]

And so we conclude that \(\ell_{33}=1\).

This yields the decomposition:

\[\begin{split}\left(\begin{matrix}1&3&5\\3&13&23\\5&23&42\end{matrix}\right) =
\left(\begin{matrix}1&0&0\\3&2&0\\5&4&1\end{matrix}\right)\left(\begin{matrix}1&3&5\\0&2&4\\0&0&1\end{matrix}\right)\end{split}\]

Now, with numpy:

```
A = np.array([[1,3,5],[3,13,23],[5,23,42]])
L = la.cholesky(A)
print(np.dot(L.T, L))
print(L)
print(A)
```

```
[[ 1. 3. 5.]
[ 3. 13. 23.]
[ 5. 23. 42.]]
[[ 1. 3. 5.]
[ 0. 2. 4.]
[ 0. 0. 1.]]
[[ 1 3 5]
[ 3 13 23]
[ 5 23 42]]
```

Cholesky decomposition is about twice as fast as LU decomposition (though both scale as \(n^3\)).

First recall that an *eigenvector* of a matrix \(A\) is a non-zero
vector \(v\) such that

\[Av = \lambda v\]

for some scalar \(\lambda\)

The value \(\lambda\) is called an *eigenvalue* of \(A\).

If an \(n\times n\) matrix \(A\) has \(n\) linearly independent eigenvectors, then \(A\) may be decomposed in the following manner:

\[A = B\Lambda B^{-1}\]

where \(\Lambda\) is a diagonal matrix whose diagonal entries are the eigenvalues of \(A\) and the columns of \(B\) are the corresponding eigenvectors of \(A\).

Facts:

An \(n\times n\) matrix is diagonizable \(\iff\) it has \(n\) linearly independent eigenvectors.

A symmetric, positive definite matrix has only positive eigenvalues and its eigendecomposition

\[A=B\Lambda B^{-1}\]

is via an orthogonal transformation \(B\). (I.e. its eigenvectors are an orthonormal set)

It is easy to see from the definition that if \(v\) is an eigenvector of an \(n\times n\) matrix \(A\) with eigenvalue \(\lambda\), then

\[Av - \lambda I = \bf{0}\]

where \(I\) is the identity matrix of dimension \(n\) and \(\bf{0}\) is an n-dimensional zero vector. Therefore, the eigenvalues of \(A\) satisfy:

\[\det\left(A-\lambda I\right)=0\]

The left-hand side above is a polynomial in \(\lambda\), and is
called the *characteristic polynomial* of \(A\). Thus, to find the
eigenvalues of \(A\), we find the roots of the characteristic
polynomial.

Computationally, however, computing the characteristic polynomial and then solving for the roots is prohibitively expensive. Therefore, in practice, numerical methods are used - both to find eigenvalues and their corresponding eigenvectors. We won’t go into the specifics of the algorithms used to calculate eigenvalues, but here is a numpy example:

```
A = np.array([[0,1,1],[2,1,0],[3,4,5]])
u, V = la.eig(A)
print(np.dot(V,np.dot(np.diag(u), la.inv(V))))
print(u)
```

```
[[-0.+0.j 1.+0.j 1.+0.j]
[ 2.+0.j 1.+0.j 0.+0.j]
[ 3.+0.j 4.+0.j 5.+0.j]]
[ 5.8541+0.j -0.8541+0.j 1.0000+0.j]
```

**NB:** Many matrices are *not* diagonizable, and many have *complex*
eigenvalues (even if all entries are real).

```
A = np.array([[0,1],[-1,0]])
print(A)
u, V = la.eig(A)
print(np.dot(V,np.dot(np.diag(u), la.inv(V))))
print(u)
```

```
[[ 0 1]
[-1 0]]
[[ 0.+0.j 1.+0.j]
[-1.+0.j 0.+0.j]]
[ 0.+1.j 0.-1.j]
```

```
# If you know the eigenvalues must be reeal
# because A is a positive definite (e.g. covariance) matrix
# use real_if_close
A = np.array([[0,1,1],[2,1,0],[3,4,5]])
u, V = la.eig(A)
print(u)
print np.real_if_close(u)
```

```
[ 5.8541+0.j -0.8541+0.j 1.0000+0.j]
[ 5.8541 -0.8541 1. ]
```

For any \(m\times n\) matrix \(A\), we define its *singular
values* to be the square root of the eigenvalues of \(A^TA\). These
are well-defined as \(A^TA\) is always symmetric, positive-definite,
so its eigenvalues are real and positive. Singular values are important
properties of a matrix. Geometrically, a matrix \(A\) maps the unit
sphere in \(\mathbb{R}^n\) to an ellipse. The singular values are
the lengths of the semi-axes.

Singular values also provide a measure of the *stabilty* of a matrix.
We’ll revisit this in the end of the lecture.

As with the previous decompositions, \(QR\) decomposition is a method to write a matrix \(A\) as the product of two matrices of simpler form. In this case, we want:

\[A= QR\]

where \(Q\) is an \(m\times n\) matrix with \(Q Q^T = I\)
(i.e. \(Q\) is *orthogonal*) and \(R\) is an \(n\times n\)
upper-triangular matrix.

This is really just the matrix form of the Gram-Schmidt orthogonalization of the columns of \(A\). The G-S algorithm itself is unstable, so various other methods have been developed to compute the QR decomposition. We won’t cover those in detail as they are a bit beyond our scope.

The first \(k\) columns of \(Q\) are an orthonormal basis for the column space of the first \(k\) columns of \(A\).

Iterative QR decomposition is often used in the computation of eigenvalues.

Another important matrix decomposition is singular value decomposition or SVD. For any \(m\times n\) matrix \(A\), we may write:

\[A= UDV\]

where \(U\) is a unitary (orthogonal in the real case) \(m\times m\) matrix, \(D\) is a rectangular, diagonal \(m\times n\) matrix with diagonal entries \(d_1,...,d_m\) all non-negative. \(V\) is a unitary (orthogonal) \(n\times n\) matrix. SVD is used in principle component analysis and in the computation of the Moore-Penrose pseudo-inverse.

It is important that numerical algorithms be *stable* and *efficient*.
Efficiency is a property of an algorithm, but stability can be a
property of the system itself.

\[\begin{split}\left(\begin{matrix}8&6&4&1\\1&4&5&1\\8&4&1&1\\1&4&3&6\end{matrix}\right)x = \left(\begin{matrix}19\\11\\14\\14\end{matrix}\right)\end{split}\]

```
A = np.array([[8,6,4,1],[1,4,5,1],[8,4,1,1],[1,4,3,6]])
b = np.array([19,11,14,14])
la.solve(A,b)
```

```
array([ 1., 1., 1., 1.])
```

```
b = np.array([19.01,11.05,14.07,14.05])
la.solve(A,b)
```

```
array([-2.34 , 9.745, -4.85 , -1.34 ])
```

Note that the *tiny* perturbations in the outcome vector \(b\) cause
*large* differences in the solution! When this happens, we say that the
matrix \(A\) *ill-conditioned*. This happens when a matrix is
‘close’ to being singular (i.e. non-invertible).

A measure of this type of behavior is called the *condition number*. It
is defined as:

\[cond(A) = ||A||\cdot ||A^{-1}||\]

In general, it is difficult to compute.

Fact:

\[cond(A) = \frac{\lambda_1}{\lambda_n}\]

where \(\lambda_1\) is the maximum singular value of \(A\) and \(\lambda_n\) is the smallest. The higher the condition number, the more unstable the system. In general if there is a large discrepancy between minimal and maximal singular values, the condition number is large.

```
U, s, V = np.linalg.svd(A)
print(s)
print(max(s)/min(s))
```

```
[ 15.5457 6.9002 3.8363 0.0049]
3198.6725812
```

We can sometimes improve on this behavior by ‘pre-conditioning’. Instead of solving

\[Ax=b\]

we solve

\[D^{-1}Ax=D^{-1}b\]\[where :math:`D^{-1}A` has a lower condition number than :math:`A`\]

itself.

Preconditioning is a *very* involved topic, quite out of the range of
this course. It is mentioned here only to make you aware that such a
thing exists, should you ever run into an ill-conditioned problem!

**1**. Compute the LU decomposition of the following matrix by hand and
using numpy

\[\begin{split}\left(\begin{matrix}1&2&3\\2&-4&6\\3&-9&-3\end{matrix}\right)\end{split}\]

```
A = np.array([[1,2,3],[2,-4,6],[3,-9,-3]])
print(A)
P, L , U = la.lu(A)
print(P)
print(L)
print(U)
```

```
[[ 1 2 3]
[ 2 -4 6]
[ 3 -9 -3]]
[[ 0. 1. 0.]
[ 0. 0. 1.]
[ 1. 0. 0.]]
[[ 1. 0. 0. ]
[ 0.3333 1. 0. ]
[ 0.6667 0.4 1. ]]
[[ 3. -9. -3. ]
[ 0. 5. 4. ]
[ 0. 0. 6.4]]
```

**2**. Compute the Cholesky decomposition of the following matrix by
hand and using numpy

\[\begin{split}\left(\begin{matrix}4&2&3\\2&4&5\\3&5&8\end{matrix}\right)\end{split}\]

```
# Your code here
A=np.array([[4,2,3],[2,4,5],[3,5,8]])
np.linalg.cholesky(A)
```

```
array([[ 2. , 0. , 0. ],
[ 1. , 1.7321, 0. ],
[ 1.5 , 2.0207, 1.291 ]])
```

**3**. Write a function in Python to solve a system

\[Ax = b\]

using SVD decomposition. Your function should take \(A\) and \(b\) as input and return \(x\).

Your function should include the following:

- First, check that \(A\) is invertible - return error message if it is not
- Invert \(A\) using SVD and solve
- return \(x\)

Test your function for correctness.

```
# Your code here
def svdsolver(A,b):
U, s, V = np.linalg.svd(A)
if np.prod(s) == 0:
print("Matrix is singular")
else:
return np.dot(np.dot((V.T).dot(np.diag(s**(-1))), U.T),b)
```

```
A = np.array([[1,1],[1,2]])
b = np.array([3,1])
print(np.linalg.solve(A,b))
print(svdsolver(A,b))
```

```
[ 5. -2.]
[ 5. -2.]
```