Finding Roots of Equations¶
Calculus review¶
In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy as scipy
from scipy.interpolate import interp1d
Let’s review the theory of optimization for multivariate functions. Recall that in the single-variable case, extreme values (local extrema) occur at points where the first derivative is zero, however, the vanishing of the first derivative is not a sufficient condition for a local max or min. Generally, we apply the second derivative test to determine whether a candidate point is a max or min (sometimes it fails - if the second derivative either does not exist or is zero). In the multivariate case, the first and second derivatives are matrices. In the case of a scalar-valued function on \(\mathbb{R}^n\), the first derivative is an \(n\times 1\) vector called the gradient (denoted \(\nabla f\)). The second derivative is an \(n\times n\) matrix called the Hessian (denoted \(H\))
Just to remind you, the gradient and Hessian are given by:
One of the first things to note about the Hessian - it’s symmetric. This structure leads to some useful properties in terms of interpreting critical points.
The multivariate analog of the test for a local max or min turns out to be a statement about the gradient and the Hessian matrix. Specifically, a function \(f:\mathbb{R}^n\rightarrow \mathbb{R}\) has a critical point at \(x\) if \(\nabla f(x) = 0\) (where zero is the zero vector!). Furthermore, the second derivative test at a critical point is as follows:
- If \(H(x)\) is positive-definite (\(\iff\) it has all positive eigenvalues), \(f\) has a local minimum at \(x\)
- If \(H(x)\) is negative-definite (\(\iff\) it has all negative eigenvalues), \(f\) has a local maximum at \(x\)
- If \(H(x)\) has both positive and negative eigenvalues, \(f\) has a saddle point at \(x\).
If you have \(m\) equations with \(n\) variables, then the \(m \times n\) matrix of first partial derivatives is known as the Jacobian \(J(x)\). For example, for two equations \(f(x, y)\) and \(g(x, y)\), we have
We can now express the multivariate form of Taylor polynomials in a familiar format.
Main Issues in Root Finding in One Dimension¶
- Separating close roots
- Numerical Stability
- Rate of Convergence
- Continuity and Differentiability
Bisection Method¶
The bisection method is one of the simplest methods for finding zeros of a non-linear function. It is guaranteed to find a root - but it can be slow. The main idea comes from the intermediate value theorem: If \(f(a)\) and \(f(b)\) have different signs and \(f\) is continuous, then \(f\) must have a zero between \(a\) and \(b\). We evaluate the function at the midpoint, \(c = \frac12(a+b)\). \(f(c)\) is either zero, has the same sign as \(f(a)\) or the same sign as \(f(b)\). Suppose \(f(c)\) has the same sign as \(f(a)\) (as pictured below). We then repeat the process on the interval \([c,b]\).
In [2]:
def f(x):
return x**3 + 4*x**2 -3
x = np.linspace(-3.1, 0, 100)
plt.plot(x, x**3 + 4*x**2 -3)
a = -3.0
b = -0.5
c = 0.5*(a+b)
plt.text(a,-1,"a")
plt.text(b,-1,"b")
plt.text(c,-1,"c")
plt.scatter([a,b,c], [f(a), f(b),f(c)], s=50, facecolors='none')
plt.scatter([a,b,c], [0,0,0], s=50, c='red')
xaxis = plt.axhline(0)
pass
In [3]:
x = np.linspace(-3.1, 0, 100)
plt.plot(x, x**3 + 4*x**2 -3)
d = 0.5*(b+c)
plt.text(d,-1,"d")
plt.text(b,-1,"b")
plt.text(c,-1,"c")
plt.scatter([d,b,c], [f(d), f(b),f(c)], s=50, facecolors='none')
plt.scatter([d,b,c], [0,0,0], s=50, c='red')
xaxis = plt.axhline(0)
pass
We can terminate the process whenever the function evaluated at the new midpoint is ‘close enough’ to zero. This method is an example of what are known as ‘bracketed methods’. This means the root is ‘bracketed’ by the end-points (it is somewhere in between). Another class of methods are ‘open methods’ - the root need not be somewhere in between the end-points (but it usually needs to be close!)
Secant Method¶
The secant method also begins with two initial points, but without the constraint that the function values are of opposite signs. We use the secant line to extrapolate the next candidate point.
In [4]:
def f(x):
return (x**3-2*x+7)/(x**4+2)
x = np.arange(-3,5, 0.1);
y = f(x)
p1=plt.plot(x, y)
plt.xlim(-3, 4)
plt.ylim(-.5, 4)
plt.xlabel('x')
plt.axhline(0)
t = np.arange(-10, 5., 0.1)
x0=-1.2
x1=-0.5
xvals = []
xvals.append(x0)
xvals.append(x1)
notconverge = 1
count = 0
cols=['r--','b--','g--','y--']
while (notconverge==1 and count < 3):
slope=(f(xvals[count+1])-f(xvals[count]))/(xvals[count+1]-xvals[count])
intercept=-slope*xvals[count+1]+f(xvals[count+1])
plt.plot(t, slope*t + intercept, cols[count])
nextval = -intercept/slope
if abs(f(nextval)) < 0.001:
notconverge=0
else:
xvals.append(nextval)
count = count+1
plt.show()
The secant method has the advantage of fast convergence. While the bisection method has a linear convergence rate (i.e. error goes to zero at the rate that \(h(x) = x\) goes to zero, the secant method has a convergence rate that is faster than linear, but not quite quadratic (i.e. \(\sim x^\alpha\), where \(\alpha = \frac{1+\sqrt{5}}2 \approx 1.6\)) however, the trade-off is that the secant method is not guaranteed to find a root in the brackets.
A variant of the secant method is known as the method of false positions. Conceptually it is identical to the secant method, except that instead of always using the last two values of \(x\) for linear interpolation, it chooses the two most recent values that maintain the bracket property (i.e \(f(a) f(b) < 0\)). It is slower than the secant, but like the bisection, is safe.
Newton-Raphson Method¶
We want to find the value \(\theta\) so that some (differentiable) function \(g(\theta)=0\). Idea: start with a guess, \(\theta_0\). Let \(\tilde{\theta}\) denote the value of \(\theta\) for which \(g(\theta) = 0\) and define \(h = \tilde{\theta} - \theta_0\). Then:
This implies that
So that
Thus, we set our next approximation:
and we have developed an iterative procedure with:
Example¶
Let
In [5]:
x = np.arange(-5,5, 0.1);
y = (x**3-2*x+7)/(x**4+2)
p1=plt.plot(x, y)
plt.xlim(-4, 4)
plt.ylim(-.5, 4)
plt.xlabel('x')
plt.axhline(0)
plt.title('Example Function')
plt.show()
In [6]:
x = np.arange(-5,5, 0.1);
y = (x**3-2*x+7)/(x**4+2)
p1=plt.plot(x, y)
plt.xlim(-4, 4)
plt.ylim(-.5, 4)
plt.xlabel('x')
plt.axhline(0)
plt.title('Good Guess')
t = np.arange(-5, 5., 0.1)
x0=-1.5
xvals = []
xvals.append(x0)
notconverge = 1
count = 0
cols=['r--','b--','g--','y--','c--','m--','k--','w--']
while (notconverge==1 and count < 6):
funval=(xvals[count]**3-2*xvals[count]+7)/(xvals[count]**4+2)
slope=-((4*xvals[count]**3 *(7 - 2 *xvals[count] + xvals[count]**3))/(2 + xvals[count]**4)**2) + (-2 + 3 *xvals[count]**2)/(2 + xvals[count]**4)
intercept=-slope*xvals[count]+(xvals[count]**3-2*xvals[count]+7)/(xvals[count]**4+2)
plt.plot(t, slope*t + intercept, cols[count])
nextval = -intercept/slope
if abs(funval) < 0.01:
notconverge=0
else:
xvals.append(nextval)
count = count+1
From the graph, we see the zero is near -2. We make an initial guess of
We have made an excellent choice for our first guess, and we can see rapid convergence!
In [7]:
funval
Out[7]:
0.007591996330867034
In fact, the Newton-Raphson method converges quadratically. However, NR (and the secant method) have a fatal flaw:
In [8]:
x = np.arange(-5,5, 0.1);
y = (x**3-2*x+7)/(x**4+2)
p1=plt.plot(x, y)
plt.xlim(-4, 4)
plt.ylim(-.5, 4)
plt.xlabel('x')
plt.axhline(0)
plt.title('Bad Guess')
t = np.arange(-5, 5., 0.1)
x0=-0.5
xvals = []
xvals.append(x0)
notconverge = 1
count = 0
cols=['r--','b--','g--','y--','c--','m--','k--','w--']
while (notconverge==1 and count < 6):
funval=(xvals[count]**3-2*xvals[count]+7)/(xvals[count]**4+2)
slope=-((4*xvals[count]**3 *(7 - 2 *xvals[count] + xvals[count]**3))/(2 + xvals[count]**4)**2) + (-2 + 3 *xvals[count]**2)/(2 + xvals[count]**4)
intercept=-slope*xvals[count]+(xvals[count]**3-2*xvals[count]+7)/(xvals[count]**4+2)
plt.plot(t, slope*t + intercept, cols[count])
nextval = -intercept/slope
if abs(funval) < 0.01:
notconverge = 0
else:
xvals.append(nextval)
count = count+1
We have stumbled on the horizontal asymptote. The algorithm fails to converge.
Convergence Rate¶
The following is a derivation of the convergence rate of the NR method:
Suppose \(x_k \; \rightarrow \; x^*\) and \(g'(x^*) \neq 0\). Then we may write:
.
Now expand \(g\) at \(x^*\):
We have that
Gauss-Newton¶
For 1D, the Newton method is
We can generalize to \(k\) dimensions by
the inverse Jacobian matrix. In general, the Jacobian is not a square matrix, and we use the generalized inverse \((J^TJ)^{-1}J^T\) instead, giving
In multivariate nonlinear estimation problems, we can find the vector of parameters \(\beta\) by minimizing the residuals \(r(\beta)\),
Inverse Quadratic Interpolation¶
Inverse quadratic interpolation is a type of polynomial interpolation. Polynomial interpolation simply means we find the polynomial of least degree that fits a set of points. In quadratic interpolation, we use three points, and find the quadratic polynomial that passes through those three points.
In [9]:
def f(x):
return (x - 2) * x * (x + 2)**2
x = np.arange(-5,5, 0.1);
plt.plot(x, f(x))
plt.xlim(-3.5, 0.5)
plt.ylim(-5, 16)
plt.xlabel('x')
plt.axhline(0)
plt.title("Quadratic Interpolation")
#First Interpolation
x0=np.array([-3,-2.5,-1.0])
y0=f(x0)
f2 = interp1d(x0, y0,kind='quadratic')
#Plot parabola
xs = np.linspace(-3, -1, num=10000, endpoint=True)
plt.plot(xs, f2(xs))
#Plot first triplet
plt.plot(x0, f(x0),'ro');
plt.scatter(x0, f(x0), s=50, c='yellow');
#New x value
xnew=xs[np.where(abs(f2(xs))==min(abs(f2(xs))))]
plt.scatter(np.append(xnew,xnew), np.append(0,f(xnew)), c='black');
#New triplet
x1=np.append([-3,-2.5],xnew)
y1=f(x1)
f2 = interp1d(x1, y1,kind='quadratic')
#New Parabola
xs = np.linspace(min(x1), max(x1), num=100, endpoint=True)
plt.plot(xs, f2(xs))
xnew=xs[np.where(abs(f2(xs))==min(abs(f2(xs))))]
plt.scatter(np.append(xnew,xnew), np.append(0,f(xnew)), c='green');
So that’s the idea behind quadratic interpolation. Use a quadratic approximation, find the zero of interest, use that as a new point for the next quadratic approximation.
Inverse quadratic interpolation means we do quadratic interpolation on the inverse function. So, if we are looking for a root of \(f\), we approximate \(f^{-1}(x)\) using quadratic interpolation. This just means fitting \(x\) as a function of \(y\), so that the quadratic is turned on its side and we are guaranteed that it cuts the x-axis somewhere. Note that the secant method can be viewed as a linear interpolation on the inverse of \(f\). We can write:
We use the above formula to find the next guess \(x_{n+1}\) for a zero of \(f\) (so \(y=0\)):
We aren’t so much interested in deriving this as we are understanding the procedure:
In [10]:
x = np.arange(-5,5, 0.1);
plt.plot(x, f(x))
plt.xlim(-3.5, 0.5)
plt.ylim(-5, 16)
plt.xlabel('x')
plt.axhline(0)
plt.title("Inverse Quadratic Interpolation")
#First Interpolation
x0=np.array([-3,-2.5,1])
y0=f(x0)
f2 = interp1d(y0, x0,kind='quadratic')
#Plot parabola
xs = np.linspace(min(f(x0)), max(f(x0)), num=10000, endpoint=True)
plt.plot(f2(xs), xs)
#Plot first triplet
plt.plot(x0, f(x0),'ro');
plt.scatter(x0, f(x0), s=50, c='yellow');
Convergence rate is approximately \(1.8\). The advantage of the inverse method is that we will always have a real root (the parabola will always cross the x-axis). A serious disadvantage is that the initial points must be very close to the root or the method may not converge.
That is why it is usually used in conjunction with other methods.
Brentq Method¶
Brent’s method is a combination of bisection, secant and inverse quadratic interpolation. Like bisection, it is a ‘bracketed’ method (starts with points \((a,b)\) such that \(f(a)f(b)<0\).
Roughly speaking, the method begins by using the secant method to obtain a third point \(c\), then uses inverse quadratic interpolation to generate the next possible root. Without going into too much detail, the algorithm attempts to assess when interpolation will go awry, and if so, performs a bisection step. Also, it has certain criteria to reject an iterate. If that happens, the next step will be linear interpolation (secant method).
To find zeros, use
In [11]:
x = np.arange(-5,5, 0.1);
p1=plt.plot(x, f(x))
plt.xlim(-4, 4)
plt.ylim(-10, 20)
plt.xlabel('x')
plt.axhline(0)
pass
In [12]:
scipy.optimize.brentq(f,-1,.5)
Out[12]:
-7.864845203343107e-19
In [13]:
scipy.optimize.brentq(f,.5,3)
Out[13]:
2.0
Roots of polynomials¶
One method for finding roots of polynomials converts the problem into an eigenvalue one by using the companion matrix of a polynomial. For a polynomial
the companion matrix is
The characteristic polynomial of the companion matrix is \(\lvert \lambda I - A \rvert\) which expands to
In other words, the roots we are seeking are the eigenvalues of the companion matrix.
For example, to find the cube roots of unity, we solve
\(x^3 - 1 = 0\). The roots
function uses the companion matrix
method to find roots of polynomials.
In [14]:
# Coefficients of $x^3, x^2, x^1, x^0$
poly = np.array([1, 0, 0, -1])
In [15]:
x = np.roots(poly)
x
Out[15]:
array([-0.5+0.8660254j, -0.5-0.8660254j, 1. +0. j])
In [16]:
plt.scatter([z.real for z in x], [z.imag for z in x])
theta = np.linspace(0, 2*np.pi, 100)
u = np.cos(theta)
v = np.sin(theta)
plt.plot(u, v, ':')
plt.axis('square')
pass
Using scipy.optimize
¶
In [17]:
def f(x):
return x**3-3*x+1
In [18]:
x = np.linspace(-3,3,100)
plt.axhline(0, c='red')
plt.plot(x, f(x))
pass
In [19]:
from scipy.optimize import brentq, newton
brentq
is the recommended method¶
In [20]:
brentq(f, -3, 0), brentq(f, 0, 1), brentq(f, 1,3)
Out[20]:
(-1.8793852415718166, 0.3472963553337031, 1.532088886237956)
Secant method¶
In [21]:
newton(f, -3), newton(f, 0), newton(f, 3)
Out[21]:
(-1.8793852415718169, 0.34729635533385395, 1.5320888862379578)
Newton-Raphson method¶
In [22]:
fprime = lambda x: 3*x**2 - 3
newton(f, -3, fprime), newton(f, 0, fprime), newton(f, 3, fprime)
Out[22]:
(-1.8793852415718166, 0.34729635533386066, 1.532088886237956)
Finding fixed points¶
Finding the fixed points of a function \(g(x) = x\) is the same as
finding the roots of \(g(x) - x\). However, specialized algorithms
also exist - e.g. using scipy.optimize.fixedpoint
.
In [23]:
from scipy.optimize import fixed_point
In [24]:
x = np.linspace(-3,3,100)
plt.plot(x, f(x), color='red')
plt.plot(x, x)
pass
In [25]:
fixed_point(f, 0), fixed_point(f, -3), fixed_point(f, 3)
Out[25]:
(array(0.25410169), array(-2.11490754), array(1.86080585))
Mutlivariate roots and fixed points¶
Use root
to solve polynomial equations. Use fsolve
for
non-polynomial equations.
In [26]:
from scipy.optimize import root, fsolve
Suppose we want to solve a sysetm of \(m\) equations with \(n\) unknowns
Note that the equations are non-linear and there can be multiple solutions. These can be interpreted as fixed points of a system of differential equations.
In [27]:
def f(x):
return [x[1] - 3*x[0]*(x[0]+1)*(x[0]-1),
.25*x[0]**2 + x[1]**2 - 1]
In [28]:
sol = root(f, (0.5, 0.5))
sol.x
Out[28]:
array([1.11694147, 0.82952422])
In [29]:
fsolve(f, (0.5, 0.5))
Out[29]:
array([1.11694147, 0.82952422])
In [30]:
r0 = root(f,[1,1])
r1 = root(f,[0,1])
r2 = root(f,[-1,1.1])
r3 = root(f,[-1,-1])
r4 = root(f,[2,-0.5])
roots = np.c_[r0.x, r1.x, r2.x, r3.x, r4.x]
In [31]:
Y, X = np.mgrid[-3:3:100j, -3:3:100j]
U = Y - 3*X*(X + 1)*(X-1)
V = .25*X**2 + Y**2 - 1
plt.streamplot(X, Y, U, V, color=U, linewidth=2, cmap=plt.cm.autumn)
plt.scatter(roots[0], roots[1], s=50, c='none', edgecolors='k', linewidth=2)
pass
We can also give the Jacobian¶
In [32]:
def jac(x):
return [[-6*x[0], 1], [0.5*x[0], 2*x[1]]]
In [33]:
sol = root(f, (0.5, 0.5), jac=jac)
sol.x, sol.fun
Out[33]:
(array([1.11694147, 0.82952422]), array([-4.23383550e-12, -3.31612515e-12]))
Starting from other initial conditions, different roots may be found¶
In [35]:
sol = root(f, (12,12))
sol.x
Out[35]:
array([ 0.77801314, -0.92123498])
In [36]:
np.allclose(f(sol.x), 0)
Out[36]:
True