Linear least squares¶
In [1]:
import warnings
warnings.filterwarnings(action="ignore", module="scipy", message="^internal gelsd")
import numpy as np
import scipy.linalg as la
Normal equations¶
- Suppose \(Ax = b\) is inconsistent. In this case, we can look instead for \(\widehat{x}\) which minimizes the distance between \(Ax\) and \(b\). In other words, we need to minimize \(\Vert Ax - b \Vert^2\).
- The minimum occurs when the residual \(A\widehat{x} - b\) is orthogonal to \(C(A)\), i.e. \(\langle A\widehat{x} - b, Ay \rangle = 0\) for every \(y\) (spelled out below)
- This condition is equivalent to \(A^T(A\widehat{x} - b) = 0\), which gives the normal equations \(A^TA\widehat{x} = A^Tb\); when \(A\) has full column rank, \(\widehat{x} = (A^TA)^{-1}A^T b\)
- The corresponding vector in \(C(A)\) is \(A\widehat{x} = A(A^TA)^{-1}A^T b\)
- Note that \(P_A = A(A^TA)^{-1}A^T\) is the projection matrix for \(C(A)\)
- This makes sense: every vector of the form \(Ax\) lies in \(C(A)\), so the closest we can get to \(b\) is its projection onto \(C(A)\).
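Spelling out the orthogonality condition (the residual must be orthogonal to every vector \(Ay\) in \(C(A)\)):
\[\begin{split}0 = \langle A\widehat{x} - b, Ay \rangle = (Ay)^T(A\widehat{x} - b) = y^T\left(A^TA\widehat{x} - A^Tb\right) \ \text{for all } y \quad\Longrightarrow\quad A^TA\widehat{x} = A^Tb\end{split}\]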
In [2]:
m = 100
n = 5
A = np.random.rand(m, n)
b = np.random.rand(m, 1)
Using the least squares function (SVD-based)
In [3]:
x, res, rank, s = la.lstsq(A, b)
In [4]:
x
Out[4]:
array([[0.31192056],
[0.15414167],
[0.18783713],
[0.07997706],
[0.17878726]])
In [5]:
(A @ x).shape
Out[5]:
(100, 1)
Using the normal equations (projection) - for illustration only, since explicitly forming \((A^TA)^{-1}\) is numerically less stable than lstsq.
In [6]:
la.inv(A.T @ A) @ A.T @ b
Out[6]:
array([[0.31192056],
[0.15414167],
[0.18783713],
[0.07997706],
[0.17878726]])
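As a sketch of how the normal equations can be solved without forming an explicit inverse, via a Cholesky factorization of \(A^TA\) (this reuses the A, b, and x from above, assumes A has full column rank, and xhat is just an illustrative name):
# Solve the normal equations A^T A xhat = A^T b via a Cholesky factorization
# rather than computing the inverse of A^T A
c, low = la.cho_factor(A.T @ A)
xhat = la.cho_solve((c, low), A.T @ b)
print(np.allclose(xhat, x))                  # agrees with the lstsq solution above
print(np.allclose(A.T @ (A @ xhat - b), 0))  # residual is orthogonal to every column of A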
Projection matrices¶
- Let \(P\) be the projection matrix onto \(C(A)\)
- Since it is a projection operator, \(P^2 = P\) (check)
- Check that \(I-P\) is also idempotent (it turns out to also be a projection matrix)
- Where does \(I-P\) project to?
- Show \(C(I-P) = N(P)\)
- Show \(\rho(P) + \nu(P) = n\)
- This implies \(\mathbb{R}^n = C(P) \oplus C(I-P)\)
Trivial example
\[\begin{split}P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, (I-P) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}\end{split}\]
Geometry of \(P\) and \(I-P\) in linear least squares
In [7]:
P = A @ la.inv(A.T @ A) @ A.T
In [8]:
P.shape
Out[8]:
(100, 100)
In [9]:
x, res, rank, s = la.lstsq(A, b)
In [10]:
Q = np.eye(m) - P
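A quick numerical sketch checking the idempotency and rank claims from the list above, using the P and Q just computed (each line should print True):
print(np.allclose(P @ P, P))    # P is idempotent: P^2 = P
print(np.allclose(Q @ Q, Q))    # I - P is idempotent as well
print(np.allclose(P @ Q, 0))    # every column of I - P lies in N(P), so C(I-P) is contained in N(P)
print(np.linalg.matrix_rank(P) + np.linalg.matrix_rank(Q) == m)  # rank plus nullity equals the ambient dimension (m = 100 here)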
In [11]:
(P @ b)[:5]
Out[11]:
array([[0.2763989 ],
[0.33523772],
[0.4619637 ],
[0.50292904],
[0.49148152]])
In [12]:
(A @ x)[:5]
Out[12]:
array([[0.2763989 ],
[0.33523772],
[0.4619637 ],
[0.50292904],
[0.49148152]])
In [13]:
la.norm(Q @ b)**2
Out[13]:
9.834201864805834
In [14]:
res
Out[14]:
array([9.83420186])
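Since \(Pb\) and \(Qb\) are orthogonal components of \(b\), the squared norms should satisfy the Pythagorean identity \(\Vert b \Vert^2 = \Vert Pb \Vert^2 + \Vert Qb \Vert^2\). A quick check of this geometric picture, reusing b, P, and Q from above (both lines should print True):
print(np.allclose(P @ b + Q @ b, b))                                       # b splits into its two projections
print(np.allclose(la.norm(b)**2, la.norm(P @ b)**2 + la.norm(Q @ b)**2))   # Pythagorean identity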