Linear least squares¶
In [1]:
import warnings
warnings.filterwarnings(action="ignore", module="scipy", message="^internal gelsd")
import numpy as np
import scipy.linalg as la
Normal equations¶
- Suppose \(Ax = b\) is inconsistent. In this case, we can look instead for \(\widehat{x}\) which minimizes the distance between \(Ax\) and \(b\). In other words, we need to minimize \(\Vert Ax - b \Vert^2\).
- The minimum occurs when the residual \(A\widehat{x} - b\) is orthogonal to \(C(A)\), i.e. \(\langle A\widehat{x} - b, Ay \rangle = 0\) for every \(y\) (spelled out below)
- This condition is equivalent to \(A^T(A\widehat{x} - b) = 0\), which gives the normal equations \(A^TA\widehat{x} = A^Tb\); when \(A\) has full column rank, \(\widehat{x} = (A^TA)^{-1}A^T b\)
- The corresponding vector in \(C(A)\) is \(A\widehat{x} = A(A^TA)^{-1}A^T b\)
- Note that \(P_A = A(A^TA)^{-1}A^T\) is the projection matrix for \(C(A)\)
- This makes sense: every vector of the form \(Ax\) lies in \(C(A)\), so the closest we can get to \(b\) is its projection onto \(C(A)\).
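Spelling out the orthogonality condition (the residual must be orthogonal to every vector \(Ay\) in \(C(A)\)):
\[\begin{split}0 = \langle A\widehat{x} - b, Ay \rangle = (Ay)^T(A\widehat{x} - b) = y^T\left(A^TA\widehat{x} - A^Tb\right) \ \text{for all } y \quad\Longrightarrow\quad A^TA\widehat{x} = A^Tb\end{split}\]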
In [2]:
m = 100
n = 5
A = np.random.rand(m, n)
b = np.random.rand(m, 1)
Using the least squares function (SVD-based)
In [3]:
x, res, rank, s = la.lstsq(A, b)
In [4]:
x
Out[4]:
array([[0.31192056],
[0.15414167],
[0.18783713],
[0.07997706],
[0.17878726]])
In [5]:
(A @ x).shape
Out[5]:
(100, 1)
Using the normal equations (projection) - for illustration only, since explicitly forming \((A^TA)^{-1}\) is numerically less stable than lstsq.
In [6]:
la.inv(A.T @ A) @ A.T @ b
Out[6]:
array([[0.31192056],
[0.15414167],
[0.18783713],
[0.07997706],
[0.17878726]])
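As a sketch of how the normal equations can be solved without forming an explicit inverse, via a Cholesky factorization of \(A^TA\) (this reuses the A, b, and x from above, assumes A has full column rank, and xhat is just an illustrative name):
# Solve the normal equations A^T A xhat = A^T b via a Cholesky factorization
# rather than computing the inverse of A^T A
c, low = la.cho_factor(A.T @ A)
xhat = la.cho_solve((c, low), A.T @ b)
print(np.allclose(xhat, x))                  # agrees with the lstsq solution above
print(np.allclose(A.T @ (A @ xhat - b), 0))  # residual is orthogonal to every column of A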
Projection matrices¶
- Let \(P\) be the projection matrix onto \(C(A)\)
- Since it is a projection operator, \(P^2 = P\) (check)
- Check that \(I-P\) is also idempotent (it turns out to also be a projection matrix)
- Where does \(I-P\) project to?
- Show \(C(I-P) = N(P)\)
- Show \(\rho(P) + \nu(P) = n\)
- This implies \(\mathbb{R}^n = C(P) \oplus C(I-P)\)
Trivial example
\[\begin{split}P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, (I-P) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}\end{split}\]
Geometry of \(P\) and \(I-P\) in linear least squares
In [7]:
P = A @ la.inv(A.T @ A) @ A.T
In [8]:
P.shape
Out[8]:
(100, 100)
In [9]:
x, res, rank, s = la.lstsq(A, b)
In [10]:
Q = np.eye(m) - P
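A quick numerical sketch checking the idempotency and rank claims from the list above, using the P and Q just computed (each line should print True):
print(np.allclose(P @ P, P))    # P is idempotent: P^2 = P
print(np.allclose(Q @ Q, Q))    # I - P is idempotent as well
print(np.allclose(P @ Q, 0))    # every column of I - P lies in N(P), so C(I-P) is contained in N(P)
print(np.linalg.matrix_rank(P) + np.linalg.matrix_rank(Q) == m)  # rank plus nullity equals the ambient dimension (m = 100 here)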
In [11]:
(P @ b)[:5]
Out[11]:
array([[0.2763989 ],
[0.33523772],
[0.4619637 ],
[0.50292904],
[0.49148152]])
In [12]:
(A @ x)[:5]
Out[12]:
array([[0.2763989 ],
[0.33523772],
[0.4619637 ],
[0.50292904],
[0.49148152]])
In [13]:
la.norm(Q @ b)**2
Out[13]:
9.834201864805834
In [14]:
res
Out[14]:
array([9.83420186])
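Since \(Pb\) and \(Qb\) are orthogonal components of \(b\), the squared norms should satisfy the Pythagorean identity \(\Vert b \Vert^2 = \Vert Pb \Vert^2 + \Vert Qb \Vert^2\). A quick check of this geometric picture, reusing b, P, and Q from above (both lines should print True):
print(np.allclose(P @ b + Q @ b, b))                                       # b splits into its two projections
print(np.allclose(la.norm(b)**2, la.norm(P @ b)**2 + la.norm(Q @ b)**2))   # Pythagorean identity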