Linear least squares

[1]:
import warnings
warnings.filterwarnings(action="ignore", module="scipy", message="^internal gelsd")
import numpy as np
import scipy.linalg as la

Normal equations

  • Suppose \(Ax = b\) is inconsistent. In this case, we can look instead for \(\widehat{x}\) which minimizes the distance between \(Ax\) and \(b\). In other words, we need to minimize \(\Vert Ax - b \Vert^2\).

  • The minimum occurs when the residual \(A\widehat{x} - b\) is orthogonal to \(C(A)\), i.e. \(\langle A\widehat{x} - b, Ay \rangle = 0\) for every \(y\)

  • Since \(y^T A^T (A\widehat{x} - b) = 0\) for every \(y\), we must have \(A^T(A\widehat{x} - b) = 0\). This gives the normal equations \(A^TA\widehat{x} = A^Tb\), whose solution is \(\widehat{x} = (A^TA)^{-1}A^T b\) when \(A\) has full column rank

  • The corresponding vector in \(C(A)\) is \(A\widehat{x} = A(A^TA)^{-1}A^T b\)

  • Note that \(P_A = A(A^TA)^{-1}A^T\) is the projection matrix for \(C(A)\)

  • This makes sense, since any solution to \(Ax = b\) must be in \(C(A)\), and the nearest such vector to \(b\) must be the projection of \(b\) onto \(C(A)\).

[2]:
m = 100
n = 5
A = np.random.rand(m, n)
b = np.random.rand(m, 1)

Using the least squares function (SVD)

[3]:
x, res, rank, s = la.lstsq(A, b)
[4]:
x
[4]:
array([[0.31192056],
       [0.15414167],
       [0.18783713],
       [0.07997706],
       [0.17878726]])
[5]:
(A @ x).shape
[5]:
(100, 1)

Using the normal equations (projection) - for illustration only, since explicitly forming and inverting \(A^TA\) is numerically less stable than using lstsq.

[6]:
la.inv(A.T @ A) @ A.T @ b
[6]:
array([[0.31192056],
       [0.15414167],
       [0.18783713],
       [0.07997706],
       [0.17878726]])
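
A quick sanity check of the derivation above (a sketch, not executed here): the least squares solution should satisfy the orthogonality condition \(A^T(A\widehat{x} - b) = 0\), and in practice the normal equations \(A^TA\widehat{x} = A^Tb\) would be solved with la.solve (or a Cholesky factorization) rather than with an explicit inverse.

[ ]:
# residual should be orthogonal to C(A): A^T (A x - b) ~ 0
print(A.T @ (A @ x - b))
# solve the normal equations without forming an explicit inverse
print(la.solve(A.T @ A, A.T @ b))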

Projection matrices

  • Let \(P\) be the projection matrix onto \(C(A)\)

  • Since it is a projection operator, \(P^2 = P\) (check; see the numerical verification below)

  • Check that \(I-P\) is also idempotent (it turns out to also be a projection matrix)

  • Where does \(I-P\) project to?

  • Show \(C(I-P) = N(P)\)

  • Show \(\rho(P) + \nu(P) = n\)

  • This implies \(\mathbb{R}^n = C(P) \oplus C(I-P)\)

  • Trivial example

    \[\begin{split}P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, (I-P) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}\end{split}\]
  • Geometry of \(P\) and \(I-P\) in linear least squares

[7]:
P = A @ la.inv(A.T @ A) @ A.T
[8]:
P.shape
[8]:
(100, 100)
[9]:
x, res, rank, s = la.lstsq(A, b)
[10]:
Q = np.eye(m) - P
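
As a quick numerical check of the projection properties listed above (a sketch; in this example \(P\) is \(100 \times 100\) with rank 5, and \(C(I-P) = N(P)\) has dimension 95):

[ ]:
# idempotency checks: P^2 = P and (I-P)^2 = I-P
print(np.allclose(P @ P, P))
print(np.allclose(Q @ Q, Q))
# rank(P) + rank(I-P) should equal m = 100, since C(I-P) = N(P)
print(np.linalg.matrix_rank(P), np.linalg.matrix_rank(Q))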
[11]:
(P @ b)[:5]
[11]:
array([[0.2763989 ],
       [0.33523772],
       [0.4619637 ],
       [0.50292904],
       [0.49148152]])
[12]:
(A @ x)[:5]
[12]:
array([[0.2763989 ],
       [0.33523772],
       [0.4619637 ],
       [0.50292904],
       [0.49148152]])
[13]:
la.norm(Q @ b)**2
[13]:
9.834201864805834
[14]:
res
[14]:
array([9.83420186])

Optimization

Note that \(\Vert Ax - b \Vert^2\) can also be considered as a cost function of \(x\) that we want to minimize. Hence, we can also use optimization methods such as gradient descent to solve this problem iteratively. Importantly, optimization techniques are generalizable to nonlinear cost functions as well, and some can be made to scale to massive problems.
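
As a small preview, here is a minimal gradient descent sketch for this cost: the gradient of \(\Vert Ax - b \Vert^2\) is \(2A^T(Ax - b)\), and with a sufficiently small step size the iterates approach the lstsq solution \(\widehat{x}\). The step size and iteration count below are arbitrary choices for this example.

[ ]:
# gradient descent on f(x) = ||Ax - b||^2, with gradient 2 A^T (A x - b)
alpha = 1e-3               # step size (hand-picked for this example)
xk = np.zeros((n, 1))      # starting point
for _ in range(10000):
    xk = xk - alpha * 2 * A.T @ (A @ xk - b)
print(la.norm(xk - x))     # should be close to 0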

This will be the topic of the next set of lectures.
