[11]:
%matplotlib inline
[6]:
import matplotlib.pyplot as plt
import numpy as np
Least squares optimization
Many optimization problems involve minimization of a sum of squared residuals. We will take a look at finding the derivatives for least squares minimization.
In least squares problems, we usually have \(m\) labeled observations \((x_i, y_i)\). We have a model \(f(x)\), depending on parameters \(\beta\), that predicts \(y_i\) given \(x_i\) (for a linear model, \(f(x) = X\beta\)). We want to minimize the sum (or average) of squared residuals \(r(x_i) = y_i - f(x_i)\). The objective function is usually taken to be

\[
L = \frac{1}{2} \sum_{i=1}^m r(x_i)^2 = \frac{1}{2} \sum_{i=1}^m \left(y_i - f(x_i)\right)^2
\]
As a concrete example, suppose we want to fit a quadratic function to some observed data. We have

\[
f(x_i) = \beta_0 + \beta_1 x_i + \beta_2 x_i^2
\]
We want to minimize the objective function

\[
L = \frac{1}{2} \sum_{i=1}^m \left(\beta_0 + \beta_1 x_i + \beta_2 x_i^2 - y_i\right)^2
\]
Taking derivatives with respect to \(\beta\), we get

\[
\frac{\partial L}{\partial \beta_0} = \sum_{i=1}^m \left(\beta_0 + \beta_1 x_i + \beta_2 x_i^2 - y_i\right)
\]

\[
\frac{\partial L}{\partial \beta_1} = \sum_{i=1}^m x_i \left(\beta_0 + \beta_1 x_i + \beta_2 x_i^2 - y_i\right)
\]

\[
\frac{\partial L}{\partial \beta_2} = \sum_{i=1}^m x_i^2 \left(\beta_0 + \beta_1 x_i + \beta_2 x_i^2 - y_i\right)
\]
Working with matrices
Writing the above system as a matrix, we have \(f(x) = X\beta\), with

\[
X = \begin{pmatrix}
1 & x_1 & x_1^2 \\
1 & x_2 & x_2^2 \\
\vdots & \vdots & \vdots \\
1 & x_m & x_m^2
\end{pmatrix}
\]

and

\[
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}
\]
We want to find the derivative of \(\Vert y - X\beta \Vert^2\), so we expand

\[
\Vert y - X\beta \Vert^2 = (y - X\beta)^T (y - X\beta) = y^T y - 2\beta^T X^T y + \beta^T X^T X \beta
\]
Taking derivatives with respect to \(\beta^T\) (we do this because the gradient is traditionally a row vector, and we want it as a column vector here), and keeping the factor of \(1/2\) from the objective function, we get

\[
\nabla L = X^T X \beta - X^T y
\]
For example, if we are doing gradient descent with learning rate \(\alpha\), the update equation is

\[
\beta_{k+1} = \beta_k - \alpha \left(X^T X \beta_k - X^T y\right)
\]
Note that if we set the derivative to zero, we get the normal equations

\[
X^T X \hat{\beta} = X^T y
\]

and solving gives

\[
\hat{\beta} = (X^T X)^{-1} X^T y
\]
For large \(X\), solving the normal equations directly can be more expensive than simple gradient descent. Note that the Levenberg-Marquardt algorithm is often used to optimize least squares problems.
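For illustration, here is a minimal sketch of a Levenberg-Marquardt style fit using scipy.optimize.least_squares; the residuals function, the toy data x_demo and y_demo, and the zero starting point are made up for this sketch, and the cell assumes numpy has been imported as np as above.

from scipy.optimize import least_squares

def residuals(b, x, y):
    # residual vector for the quadratic model b[0] + b[1]*x + b[2]*x**2 (illustrative only)
    return b[0] + b[1]*x + b[2]*x**2 - y

x_demo = np.linspace(0, 5, 20)          # toy inputs, made up for this sketch
y_demo = 1 + 2*x_demo - 0.5*x_demo**2   # toy targets, made up for this sketch
fit = least_squares(residuals, np.zeros(3), args=(x_demo, y_demo), method='lm')
fit.x  # estimated coefficients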
Example
You are given the following set of data to fit a quadratic polynomial to:
x = np.arange(10)
y = np.array([ 1.58873597, 7.55101533, 10.71372171, 7.90123225,
-2.05877605, -12.40257359, -28.64568712, -46.39822281,
-68.15488905, -97.16032044])
Find the least squares solution using gradient descent.
[101]:
x = np.arange(10)
y = np.array([ 1.58873597, 7.55101533, 10.71372171, 7.90123225,
-2.05877605, -12.40257359, -28.64568712, -46.39822281,
-68.15488905, -97.16032044])
[138]:
def f(x, y, b):
    """Residuals of the quadratic model b[0] + b[1]*x + b[2]*x**2."""
    return b[0] + b[1]*x + b[2]*x**2 - y

def res(x, y, b):
    """Sum of squared residuals (the objective function)."""
    return sum(f(x, y, b)*f(x, y, b))

# Elementary form of gradient
def grad(x, y, b):
    return np.array([
        sum(f(x, y, b)),
        sum(x*f(x, y, b)),
        sum(x**2*f(x, y, b))
    ])

# Matrix form of gradient
def grad_m(X, y, b):
    return X.T@X@b - X.T@y
[139]:
grad(x, y, np.zeros(3))
[139]:
array([ 227.0657638 , 1933.9094954 , 15758.14427298])
[140]:
X = np.c_[np.ones(len(x)), x, x**2]
grad_m(X, y, np.zeros(3))
[140]:
array([ 227.0657638 , 1933.9094954 , 15758.14427298])
[131]:
from scipy.linalg import solve
beta1 = solve(X.T@X, X.T@y)
beta1
[131]:
array([ 2.55079998, 7.31478229, -2.04118936])
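As a quick cross-check (a sketch reusing X and y from the cells above), an SVD-based solver such as np.linalg.lstsq should recover essentially the same coefficients:

beta_svd, *_ = np.linalg.lstsq(X, y, rcond=None)  # SVD-based least squares solution
beta_svd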
[143]:
max_iter = 10000
[144]:
a = 0.0001 # learning rate
beta2 = np.zeros(3)
for i in range(max_iter):
    beta2 -= a * grad(x, y, beta2)
beta2
[144]:
array([ 2.73391723, 7.23152392, -2.03359658])
[145]:
a = 0.0001 # learning rate
beta3 = np.zeros(3)
for i in range(max_iter):
    beta3 -= a * grad_m(X, y, beta3)
beta3
[145]:
array([ 2.73391723, 7.23152392, -2.03359658])
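If the learning rate or iteration count is in doubt, a simple check (a sketch reusing res, grad, x, y and max_iter from the cells above) is to record the objective periodically and confirm it keeps decreasing:

a = 0.0001
b = np.zeros(3)
obj = []
for i in range(max_iter):
    b -= a * grad(x, y, b)
    if i % 2000 == 0:
        obj.append(res(x, y, b))
obj  # recorded objective values; these should decrease towards the minimum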
[146]:
titles = ['normal equations', 'elementary', 'matrix']
plt.figure(figsize=(12,4))
for i, beta in enumerate([beta1, beta2, beta3], 1):
    plt.subplot(1, 3, i)
    plt.scatter(x, y, s=30)
    plt.plot(x, beta[0] + beta[1]*x + beta[2]*x**2, color='red')
    plt.title(titles[i-1])
Curve fitting and least squares optimization
As shown above, least squares optimization is the technique most associated with curve fitting. For convenience, scipy.optimize provides a curve_fit function that uses Levenberg-Marquardt for minimization.
[1]:
from scipy.optimize import curve_fit
[4]:
def logistic4(x, a, b, c, d):
    """The four parameter logistic function is often used to fit dose-response relationships."""
    return ((a-d)/(1.0+((x/c)**b))) + d
[7]:
nobs = 24
xdata = np.linspace(0.5, 3.5, nobs)
ptrue = [10, 3, 1.5, 12]
ydata = logistic4(xdata, *ptrue) + 0.5*np.random.random(nobs)
[8]:
popt, pcov = curve_fit(logistic4, xdata, ydata)
[9]:
perr = np.sqrt(np.diag(pcov))
print('Param\tTrue\tEstim (+/- 1 SD)')
for p, pt, po, pe in zip('abcd', ptrue, popt, perr):
    print('%s\t%5.2f\t%5.2f (+/-%5.2f)' % (p, pt, po, pe))
Param   True    Estim (+/- 1 SD)
a       10.00   10.40 (+/- 0.14)
b        3.00    4.20 (+/- 1.16)
c        1.50    1.38 (+/- 0.09)
d       12.00   12.03 (+/- 0.10)
[13]:
x = np.linspace(0, 4, 100)
y = logistic4(x, *popt)
plt.plot(xdata, ydata, 'o')
plt.plot(x, y)
pass