Newton's Method

Type of physical science: Computation

Field of study: Numerical methods

Although linear equations can be solved directly using a finite number of algebraic operations, nonlinear equations often cannot be solved in closed form. Newton's method is an iterative technique for approximating the solution to a nonlinear equation by a sequence of solutions to linear equations.


Overview

When it is not possible to obtain a closed form solution to a given nonlinear equation, numerical methods can be used to approximate a solution. The most successful numerical schemes are often modeled on Sir Isaac Newton's method. The basic idea in Newton's method is to start with a guess for the desired solution and replace the nonlinear equation by a linear equation involving lines or planes that are tangent to the graph at the starting point. The solution to the linear approximating equation (it is hoped) provides an improved estimate for the solution to the nonlinear equation. This improved estimate for the root is used to obtain a new linearization. Again, it is hoped that the solution to the linear equation yields a still better approximation to the desired root. The iteration continues in this way.

Figure 1 shows Newton's method for the equation x² - 2 = 0 and the starting guess x1 = 8. In the first iteration, one constructs the tangent to the graph of y = x² - 2 at x1. The intersection of the tangent line with the horizontal x-axis is the new iterate x2. In the second iteration, one constructs the tangent to the graph at x2, and the intersection of the tangent line with the x-axis is x3. Observe that the iterations converge to the root where the graph crosses the x-axis and y is 0.

Now consider a general equation f(x) = 0. Let x1 denote the starting guess for a root. The equation of the line tangent to the graph of y = f(x) at x = x1 is y = f(x1) + f'(x1)(x - x1), where f', the derivative of f, denotes the slope of the tangent to the graph.

Since a root of the equation f(x) = 0 is a point on the graph of y = f(x) where y is zero, set y = 0 in the linearization to obtain an approximation to the original nonlinear equation: f(x1) + f'(x1)(x - x1) = 0. The root of this linear equation, denoted x2, is the new approximation to the root of the nonlinear equation. Thus x2 satisfies the equation f(x1) + f'(x1)(x2 - x1) = 0. The difference x2 - x1, called the Newton step, is denoted s1. In terms of the Newton step, the iteration can be expressed x2 = x1 + s1, where f'(x1)s1 = -f(x1). To generate x3, one linearizes about x2 to obtain x3 = x2 + s2, where f'(x2)s2 = -f(x2). The iteration continues in the same way. At iteration k, the new iterate xk+1 is expressed xk+1 = xk + sk, where f'(xk)sk = -f(xk). One can divide by the derivative to get an explicit formula for the Newton step: sk = -f(xk)/f'(xk), so that the Newton iteration reduces to xk+1 = xk - f(xk)/f'(xk).

This linearization technique applies to very general equations; in every case, nonlinear functions are replaced by linear functions to compute the Newton step. Consider a system of two equations in two unknowns x and y: f(x, y) = 0 and g(x, y) = 0. Now there are two functions, f and g, to linearize, and Newton's iteration can be expressed:

xk+1 = xk + sk, where fx(xk, yk)sk + fy(xk, yk)tk = -f(xk, yk)

yk+1 = yk + tk, where gx(xk, yk)sk + gy(xk, yk)tk = -g(xk, yk).

Here fx and fy stand for the partial derivatives with respect to x and y, respectively. Since there are two unknowns, x and y, the Newton step has two components, sk and tk. Although the equations for the components of the Newton step appear complicated, computers can easily solve for sk and tk since the equations are linear. Each of the equations is the equation of a line with sk on the vertical axis and tk on the horizontal axis. The Newton step corresponds to the intersection of the lines.
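Because the Newton step solves a pair of linear equations, it can be computed in closed form by Cramer's rule. The following Python sketch illustrates the two-dimensional iteration; the function name `newton2`, its argument layout, and the stopping tolerance are illustrative choices, not taken from the article.

```python
# Newton's method for the 2-by-2 system f(x, y) = 0, g(x, y) = 0.
# Each iteration solves the linear system
#   fx*s + fy*t = -f,   gx*s + gy*t = -g
# for the Newton step (s, t), here via Cramer's rule.
def newton2(f, g, fx, fy, gx, gy, x, y, tol=1e-12, max_iter=50):
    for _ in range(max_iter):
        det = fx(x, y) * gy(x, y) - fy(x, y) * gx(x, y)
        s = (-f(x, y) * gy(x, y) + g(x, y) * fy(x, y)) / det
        t = (-g(x, y) * fx(x, y) + f(x, y) * gx(x, y)) / det
        x, y = x + s, y + t          # take the Newton step
        if abs(s) + abs(t) < tol:    # stop when the step is negligible
            break
    return x, y
```

For example, applied to f(x, y) = x² + y² - 4 and g(x, y) = x - y from the starting guess (2, 1), the iterates converge to (√2, √2).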

Typically, Newton's method converges only when the starting guess is sufficiently close to the desired root of the equation. The collection of starting guesses for which Newton's method converges is its domain of convergence. To expand the domain of convergence, a damping parameter τ can be inserted in the iteration. In the case of one equation in one unknown, the damped iteration is xk+1 = xk + τsk, where sk is the usual Newton step, and τ is a positive parameter, less than or equal to 1, chosen to enhance convergence. The damping parameter is frequently chosen using "Armijo's rule," which can be stated in the following way: τ = 2^(-j), where j is the first nonnegative integer for which |f(xk + τsk)| ≤ (1 - τ/2)|f(xk)|.
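The damped iteration with Armijo's rule can be sketched as follows; the function name `damped_newton` and the tolerance defaults are illustrative assumptions.

```python
# Damped Newton iteration with Armijo's rule: try tau = 1, 1/2, 1/4, ...
# until |f(x + tau*s)| <= (1 - tau/2)|f(x)|, then take the damped step.
def damped_newton(f, fprime, x, tol=1e-12, max_iter=100):
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        s = -fx / fprime(x)          # full Newton step s_k
        tau = 1.0
        while abs(f(x + tau * s)) > (1 - tau / 2) * abs(fx):
            tau /= 2                 # Armijo backtracking: tau = 2**(-j)
            if tau < 1e-12:          # safeguard: accept a tiny step
                break
        x = x + tau * s
    return x
```

For f(x) = arctan(x) with starting guess x1 = 3, the undamped Newton iteration overshoots and diverges, while the damped iteration converges to the root at 0.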

If f' does not vanish at the desired root of the equation, and f is a nice smooth function, then Newton's method is quadratically convergent when the starting guess is sufficiently close to a root. That is, if ek denotes the error at step k, there exists a constant C such that ek+1 ≤ C·ek². Modulo the constant C, the error is squared in each iteration. Even when f' vanishes at a root, Newton's method can converge; however, the convergence is typically linear. That is, there exists a constant c < 1 such that ek+1 ≤ c·ek.

When implementing Newton's method in specific applications, often the most difficult part of the implementation is the evaluation of derivatives. In quasi-Newton methods, these derivatives are approximated using differences of function values. In the case of one equation and one unknown, the simplest quasi-Newton scheme is obtained using the approximation f'(xk) ≈ [f(xk) - f(xk-1)]/(xk - xk-1). With this substitution in Newton's method, we have xk+1 = xk + sk, where {[f(xk) - f(xk-1)]/(xk - xk-1)}sk = -f(xk).

This scheme is called the secant method, since the derivative at xk is approximated by the slope of the chord through the previous two iterates.
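The secant iteration can be sketched in Python as follows; the function name `secant` and the stopping rule are illustrative, and the method needs two starting guesses rather than one.

```python
# Secant method: the derivative in Newton's iteration is replaced by the
# slope of the chord through the two most recent iterates.
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        slope = (f1 - f0) / (x1 - x0)   # chord slope approximating f'(x1)
        x0, f0 = x1, f1
        x1 = x1 - f1 / slope            # quasi-Newton step
        f1 = f(x1)
        if abs(x1 - x0) < tol:
            break
    return x1
```

Applied to x² - 2 = 0 with starting guesses 8 and 7, the iterates converge to √2.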

Another way to simplify Newton's iteration is to reuse an old derivative rather than compute the derivative at the current iterate. For example, if the derivative at the starting point x1 is used repeatedly, one obtains the iteration xk+1 = xk + sk, where f'(x1)sk = -f(xk). In this scheme, called modified Newton's method, the derivative is evaluated at x1 in each iteration, instead of at xk. The modified Newton iteration (typically) converges when the starting guess is sufficiently close to a root, although the convergence is slower than that of Newton's method; as a general rule, the modified iteration converges linearly while the true Newton iteration converges quadratically.
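A sketch of the modified scheme, with the derivative frozen at the starting point; the name `modified_newton` and the iteration limits are illustrative.

```python
# Modified Newton's method: the derivative is evaluated once, at the
# starting point x1, and reused in every iteration (linear convergence).
def modified_newton(f, fprime, x, tol=1e-12, max_iter=200):
    d = fprime(x)            # frozen derivative f'(x1)
    for _ in range(max_iter):
        step = -f(x) / d     # step solves f'(x1)*s_k = -f(x_k)
        x = x + step
        if abs(step) < tol:
            break
    return x
```

For x² - 2 = 0 with starting guess 1.5, the frozen derivative is 3, and the iterates still converge to √2, only more slowly than with the true Newton iteration.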

Applications

Newton's method is used to solve a nonlinear equation by means of the solutions to a sequence of linear equations. If the original equation is already linear, Newton's method is inappropriate: the linearization coincides with the equation itself, so the equation can simply be solved directly.

For example, if an electrical circuit is composed of batteries and resistors that obey Ohm's law, then current through a resistor is proportional to voltage. Since current is a linear function of voltage, the equations describing the currents and voltages are linear, and Newton's method does not apply. Newton's method is appropriate whenever the underlying physical principle is nonlinear.

In modeling semiconductor devices, systems of equations involving exponentials often arise. Consider the equation x = 1 - x², which does not contain an exponential but illustrates the ideas nevertheless. This equation has a root near x = 0.618 . . . that can be computed by Newton's method; since f(x) = x - 1 + x², the iteration reduces to xk+1 = xk - (xk - 1 + xk²)/(1 + 2xk). Starting from x1 = 0, one obtains the iterates given in Table 1. Observe that the number of correct digits nearly doubles in each iteration; this behavior is characteristic of quadratic convergence. This doubling property usually occurs in a neighborhood of the root; far from the root, the convergence can be much slower.

Table 1. Newton iterates for x = 1 - x², starting from x1 = 0: x1 = 0, x2 = 1, x3 = 0.66667, x4 = 0.61905, x5 = 0.61803, . . .
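The iterates of Table 1 can be reproduced with a short computation; the variable names here are illustrative.

```python
# Newton iterates for f(x) = x - 1 + x^2, starting from x1 = 0.
# The root is (sqrt(5) - 1)/2 = 0.6180339887...; printing the iterates
# shows the number of correct digits roughly doubling each step.
x = 0.0
iterates = [x]
for _ in range(6):
    x = x - (x - 1 + x * x) / (1 + 2 * x)   # Newton update
    iterates.append(x)
```

The computed sequence begins 0, 1, 0.66667, 0.61905, 0.61803 . . ., exhibiting the doubling of correct digits characteristic of quadratic convergence.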

Another application of Newton's method is to the evaluation of elementary functions by a computer or a calculator. A common elementary function that is available on many calculators and in many computer programming languages is the square root function. To be specific, one can examine the problem of computing the square root of 2. The basic idea is the following: One devises an equation whose solution is the square root of 2 and solves the equation by Newton's method. One of the simplest equations with solution √2 is x² - 2 = 0. Newton's method for this equation reduces to xk+1 = xk/2 + 1/xk. An iteration like this, involving multiplications, divisions, additions, or subtractions, can be implemented in the electronic circuits of a calculator or coded in the machine language of a computer.
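The square root iteration is short enough to show in full; the name `sqrt2` and the fixed iteration count are illustrative assumptions.

```python
# Newton's iteration x_{k+1} = x_k/2 + 1/x_k for x^2 - 2 = 0, using only
# division, addition, and halving; a few iterations from x = 1 suffice
# to reach double precision.
def sqrt2(x=1.0, iterations=6):
    for _ in range(iterations):
        x = x / 2 + 1 / x
    return x
```

The same pattern, with x² - a = 0 in place of x² - 2 = 0, underlies square root routines on many calculators.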

In other applications, the equations that one wishes to solve are obtained through the discretization of a differential equation. Consider the following boundary-value problem associated with semiconductors: d²x(t)/dt² = -e^(-x(t)), with x(0) = 0 and x(1) = 0. In other words, find a function x(t), defined for t between 0 and 1, whose second derivative at any t has the form -e^(-x(t)). One technique for approximating this function is to partition the interval [0, 1] into N subintervals and replace the continuous equation by a finite difference approximation. If ti = i/N are the subdivision points, the following finite difference approximations to the original boundary-value problem are commonly employed: xi+1 - 2xi + xi-1 = -e^(-xi)/N², 1 ≤ i ≤ N - 1, where x0 = 0 and xN = 0, and xi is an approximation to x(ti). For each value of N, there is a different system of equations, and as N increases, the solution to the finite difference system approaches the continuous function x(t). Because this system of equations is nonlinear, Newton's method can be applied.
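As an illustrative sketch, Newton's method for the finite difference system can be coded as follows. The function name `solve_bvp`, the choice N = 50, and the use of the Thomas algorithm to solve the tridiagonal linearized system are assumptions, not from the article; the Jacobian of the system is tridiagonal because equation i involves only xi-1, xi, and xi+1.

```python
import math

def solve_bvp(N=50, newton_iters=20):
    """Newton's method for x_{i+1} - 2x_i + x_{i-1} = -exp(-x_i)/N^2,
    1 <= i <= N-1, with boundary values x_0 = x_N = 0."""
    x = [0.0] * (N + 1)                  # initial guess satisfying the BCs
    for _ in range(newton_iters):
        # residual F_i and Jacobian diagonal for i = 1..N-1
        # (sub- and super-diagonal Jacobian entries are all 1)
        F = [x[i+1] - 2*x[i] + x[i-1] + math.exp(-x[i]) / N**2
             for i in range(1, N)]
        diag = [-2 - math.exp(-x[i]) / N**2 for i in range(1, N)]
        # Thomas algorithm for the tridiagonal linear system J s = -F
        n = N - 1
        c = [0.0] * n                    # modified super-diagonal
        d = [0.0] * n                    # modified right-hand side
        c[0] = 1 / diag[0]
        d[0] = -F[0] / diag[0]
        for i in range(1, n):
            denom = diag[i] - c[i-1]
            c[i] = 1 / denom
            d[i] = (-F[i] - d[i-1]) / denom
        s = [0.0] * n
        s[n-1] = d[n-1]
        for i in range(n - 2, -1, -1):   # back substitution
            s[i] = d[i] - c[i] * s[i+1]
        for i in range(1, N):            # take the Newton step
            x[i] += s[i - 1]
    return x
```

Because the full linear system is never formed, each Newton step costs only a few arithmetic operations per grid point, which is how such discretized problems are solved in practice.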

Context

Newton's method is named in honor of Sir Isaac Newton, who used it as early as 1669 to solve various equations. He applied it both to an equation of Johannes Kepler and to polynomial equations like xⁿ = α. Since the solution of this equation is the nth root of α, he was able to compute, to any desired level of accuracy, roots of a given number. Computing in the 1600's was done by hand, and the rapid convergence of Newton's method made it an attractive method.

In the systematic development of Newton's method, modern mathematicians obtained the secant scheme by simplifying Newton's method, replacing Newton's derivative by a difference quotient. Historically, however, the secant method came first. The secant method first appeared in work of Franciscus Vieta in 1600, and Newton's method was a modification of this idea, in which the difference quotient was replaced by Newton's derivative. Despite the significance of Vieta's method, it was described in the 1670's as "work unfit for a Christian and more proper to one that can undertake to remove the Italian Alps into England." The extension of the Newton linearization approach to general abstract equations was studied in the 1950's by the Soviet mathematician Kantorovich.

Although Newton's method has been used for several hundred years to approximate solutions of nonlinear equations, its significance increased immensely with the introduction of the computer in the 1940's. By means of a computer, large linear systems of equations can be solved quickly, and, in particular, linearizations associated with Newton's method are solved routinely. As computer technology evolves, it is anticipated that Newton's method will continue to play a fundamental role in computational methods for science and engineering.

Principal terms

DERIVATIVE: the slope of a curve

DIVERGENT METHOD: a numerical method that does not approach the desired solution

DOMAIN OF CONVERGENCE: the starting guesses for which a numerical method converges

ITERATION: the operation of transforming one number into another number using some given formula

LINEAR CONVERGENCE: the error in a numerical method decreases by a constant factor less than one in each iteration

LINEAR EQUATION: the equation describing a line or a plane

LINEARIZATION: an approximation to an equation obtained by replacing nonlinear curves (or surfaces) by tangent lines (or planes)

NONLINEAR EQUATION: an equation that is not linear

QUADRATIC CONVERGENCE: a numerical method approaches the desired solution and the error is roughly squared in each iteration

ROOT: a solution to an equation

SECOND DERIVATIVE: the rate of change in the slope, an indication of the curvature

Bibliography

Dennis, J. E., Jr., and Robert B. Schnabel. NUMERICAL METHODS FOR UNCONSTRAINED OPTIMIZATION AND NONLINEAR EQUATIONS. Englewood Cliffs, N.J.: Prentice-Hall, 1983. Provides a nice treatment of both Newton and quasi-Newton methods. The treatment is more technical than that of Hager, but less technical than the book by Ortega and Rheinboldt.

Goldstine, Herman H. A HISTORY OF NUMERICAL ANALYSIS FROM THE SIXTEENTH THROUGH THE NINETEENTH CENTURY. New York: Springer-Verlag, 1977. Attempts to trace the development of numerical analysis during the period in which its foundation was being developed. Newton's method is one of many methods that are discussed.

Hager, William W. APPLIED NUMERICAL LINEAR ALGEBRA. Englewood Cliffs, N.J.: Prentice-Hall, 1988. Chapter 4 provides an easy-to-read development of Newton's method.

Larson, Roland E., Robert P. Hostetler, and Bruce H. Edwards. CALCULUS. Lexington, Mass.: D. C. Heath, 1990. A very popular introduction to calculus. Newton's method appears near the beginning, after derivatives have been introduced.

Ortega, James M., and Werner C. Rheinboldt. ITERATIVE SOLUTION OF NONLINEAR EQUATIONS IN SEVERAL VARIABLES. New York: Academic Press, 1970. A comprehensive mathematical treatment of methods for solving systems of equations.

Newton's method for x² - 2 = 0 with starting guess x1 = 8

Newton's method for x = 1 - x²

Numerical Solutions of Differential Equations

Essay by William W. Hager