Fig. 1. Drawn in green is the locus of points satisfying the constraint g(x, y) = c. Drawn in blue are contours of f. Arrows represent the gradient, which points in a direction normal to the contour.
In mathematical optimization problems, Lagrange multipliers are a method for dealing with constraints. Suppose the problem is to find local extrema of a function of several variables subject to one or more constraints, each given by setting another function of those variables equal to a given value. The method introduces a new unknown scalar variable, the Lagrange multiplier, for each constraint, and forms a linear combination of the functions with the multipliers as coefficients. This reduces the constrained problem to the unconstrained problem of finding stationary points of that combination, which may then be solved by the usual methods, for example by setting its gradient to zero.
(The justification can be carried out in the standard way for partial differentiation, using either total differentials or their close relatives, the chain rules. The aim is to find conditions, for some implicit function, under which the derivative of a function with respect to the independent variables vanishes at some set of inputs.)
To explain why this has a chance of working, consider a two-dimensional case. Suppose we have a function f(x,y) to maximize, subject to
- g(x,y) = c,
where c is a given constant. We can also visualize level sets or contours of f given by
- f(x,y) = d
for various values of d. The constraint is to stay on one contour of g, namely the one on which g takes the value c. Suppose we walk along the g = c contour. In general the contours of f and g are distinct, so walking along the g = c contour may cross many contours of f. Let us note which contours f = d we cross, for the various values of d. Wherever we cross a contour of f transversally, we can increase the value of f by walking 'uphill', that is, in the direction leading to higher values of f. This is impossible only where the contour f = d touches the contour g = c tangentially. At a constrained maximum of f, that must be the case.
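The walk described above can be mimicked numerically. The sketch below is a minimal illustration under assumed choices that do not come from the text: f(x, y) = x + y, g(x, y) = x² + y² and c = 1, so the constraint is the unit circle. It samples points along g = c, keeps the one with the largest f, and checks that the gradients of f and g are parallel there, i.e. that the contour of f touches the circle tangentially.

```python
import math

# Assumed example (not from the text): f(x, y) = x + y, g(x, y) = x**2 + y**2, c = 1.
def f(x, y):
    return x + y

def grad_f(x, y):
    return (1.0, 1.0)

def grad_g(x, y):
    return (2.0 * x, 2.0 * y)

def walk_constraint(steps=100_000):
    """Walk once around the circle g(x, y) = 1 and return the visited point with the largest f."""
    best = None
    for i in range(steps):
        t = 2.0 * math.pi * i / steps
        point = (math.cos(t), math.sin(t))  # lies on g = c
        if best is None or f(*point) > f(*best):
            best = point
    return best

def cross(u, v):
    """z-component of the 2D cross product; it is zero exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

x, y = walk_constraint()
print(x, y, f(x, y))                      # close to (0.7071, 0.7071), f close to sqrt(2)
print(cross(grad_f(x, y), grad_g(x, y)))  # close to 0: the gradients are parallel
```

Sampling only finds the tangency point to within the step size; the point of the exercise is that the best sample sits where the cross product of the two gradients vanishes.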
A familiar example can be obtained from weather maps, with their contours for temperature and pressure: the constrained extrema will occur where the superposed maps show touching lines (isopleths).
Geometrically, the tangency condition says that the gradients of f and g are parallel vectors at the maximum, so ∇f = −λ∇g for some scalar λ. Introducing λ as a new unknown, the gradient with respect to x and y of
- f + λg
is then 0 for some value of λ. Since the derivative of f + λ(g − c) with respect to λ is g − c, requiring stationarity in λ as well recovers the constraint. This is the Lagrange multiplier argument in geometric form: at a local extremum,
- f + λ(g − c)
must be stationary, where the multiplier λ is a new variable.
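For an illustrative problem (f = x + y, g = x² + y², c = 1; chosen here, not taken from the text), the stationarity conditions of f + λ(g − c) in x, y and λ form three nonlinear equations. One way to solve them, sketched below under those assumptions, is Newton's method with a hand-written 3 × 3 linear solver; the starting point (1.0, 0.5, −1.0) is an arbitrary guess, and all function names are hypothetical.

```python
def solve_linear(A, b):
    """Solve A z = b for a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def residual(x, y, lam):
    """Gradient of L = f + lam * (g - c) for f = x + y, g = x**2 + y**2, c = 1 (assumed example)."""
    return [1.0 + 2.0 * lam * x,   # dL/dx
            1.0 + 2.0 * lam * y,   # dL/dy
            x * x + y * y - 1.0]   # dL/dlam: the constraint g - c


def jacobian(x, y, lam):
    """Derivatives of residual() with respect to (x, y, lam)."""
    return [[2.0 * lam, 0.0, 2.0 * x],
            [0.0, 2.0 * lam, 2.0 * y],
            [2.0 * x, 2.0 * y, 0.0]]


def newton_lagrange(x, y, lam, tol=1e-12, max_iter=50):
    """Newton iteration for residual(x, y, lam) = 0, starting from the given guess."""
    for _ in range(max_iter):
        F = residual(x, y, lam)
        if max(abs(v) for v in F) < tol:
            break
        dx, dy, dlam = solve_linear(jacobian(x, y, lam), [-v for v in F])
        x, y, lam = x + dx, y + dy, lam + dlam
    return x, y, lam


x, y, lam = newton_lagrange(1.0, 0.5, -1.0)
print(x, y, lam)  # approximately 0.70711 0.70711 -0.70711
```

Newton's method converges to a stationary point near its starting guess; by the symmetry (x, y, λ) → (−x, −y, −λ) of these equations, starting from (−1.0, −0.5, 1.0) instead returns the constrained minimum, so stationarity alone does not distinguish a maximum from a minimum.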
The method of Lagrange multipliers
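Let f be a function defined on ℝⁿ, and let the constraints be given by gₖ(x) = 0 (if necessary by moving the constant to the left, as in gₖ(x) − c = 0). Now define the Lagrangian, Λ, as
$$\Lambda(\mathbf{x}, \boldsymbol{\lambda}) = f(\mathbf{x}) + \sum_k \lambda_k g_k(\mathbf{x}).$$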
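Observe that both the optimality condition and the constraints gₖ = 0 are compactly encoded as stationary points of the Lagrangian:
$$\nabla_{\mathbf{x}} \Lambda = 0 \iff \nabla_{\mathbf{x}} f = -\sum_k \lambda_k \nabla_{\mathbf{x}} g_k,$$
and
$$\nabla_{\boldsymbol{\lambda}} \Lambda = 0 \iff g_k = 0 \text{ for all } k.$$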
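Both stationarity conditions can be checked numerically. The sketch below builds Λ from an objective and a list of constraint functions written as gₖ(x) = 0, then evaluates its gradient in both x and λ by central differences. The problem (minimize x² + y² + z² subject to x + y + z = 1) and all helper names are illustrative assumptions; for that problem, solving 2xᵢ + λ = 0 together with the constraint gives xᵢ = 1/3 and λ = −2/3.

```python
def make_lagrangian(f, constraints):
    """Return Λ(x, lam) = f(x) + Σ_k lam[k] * g_k(x), each constraint written as g_k(x) = 0."""
    def lagrangian(x, lam):
        return f(x) + sum(l_k * g_k(x) for l_k, g_k in zip(lam, constraints))
    return lagrangian


def numerical_gradient(fun, z, h=1e-6):
    """Central-difference gradient of fun at the point z (a list of floats)."""
    grad = []
    for i in range(len(z)):
        up, down = list(z), list(z)
        up[i] += h
        down[i] -= h
        grad.append((fun(up) - fun(down)) / (2.0 * h))
    return grad


# Assumed example (not from the text): minimize x^2 + y^2 + z^2 subject to x + y + z = 1.
def objective(x):
    return sum(xi * xi for xi in x)


def plane(x):
    return sum(x) - 1.0  # g(x) = 0 exactly on the constraint plane


lagrangian_fn = make_lagrangian(objective, [plane])


def grad_lambda(x, lam):
    """Gradient of Λ with respect to all of (x, lam), stacked into one list."""
    n = len(x)
    return numerical_gradient(lambda z: lagrangian_fn(z[:n], z[n:]), list(x) + list(lam))


print(grad_lambda([1/3, 1/3, 1/3], [-2/3]))  # every entry ≈ 0: a stationary point of Λ
print(grad_lambda([1.0, 0.0, 0.0], [-2.0]))  # feasible, but the y and z entries are ≈ -2
```

The last entry of each gradient is ∂Λ/∂λ = g(x), so it vanishes at any feasible point; only the optimal point also makes the x-entries vanish.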
Often the Lagrange multipliers have an interpretation as some salient quantity of interest. To see why this might be the case, observe that:
$$\frac{\partial \Lambda}{\partial g_k} = \lambda_k.$$
Thus, λₖ is the rate of change of the quantity being optimized as a function of the constraint variable. For example, in Lagrangian mechanics the equations of motion are derived by finding stationary points of the action, the time integral of the difference between kinetic and potential energy. Thus, the force on a particle due to a scalar potential V, F = −∇V, can be interpreted as a Lagrange multiplier determining the change in action (transfer of potential to kinetic energy) following a variation in the particle's constrained trajectory. In economics, the optimal profit to a player is calculated subject to a constrained space of actions, where a Lagrange multiplier is the value of relaxing a given constraint (e.g. through bribery or other means).
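The interpretation of λ as a sensitivity can be checked on a small problem. The sketch below assumes an example not taken from the text, maximizing x + y subject to x² + y² = c, and the sign convention Λ = f + λ(x² + y² − c), i.e. gₖ(x) = h(x) − c. Raising c lowers the constraint function by the same amount, so under this convention the optimal value f* should change at the rate df*/dc = −λ. The code compares a central-difference estimate of that rate with −λ; the function names are hypothetical.

```python
import math

# Assumed example (not from the text): maximize f(x, y) = x + y subject to x^2 + y^2 = c,
# with the sign convention Λ = f + λ (x^2 + y^2 - c).

def optimum(c):
    """Constrained maximizer and its multiplier, from solving the stationarity equations by hand."""
    x = y = math.sqrt(c / 2.0)  # tangency point on the circle of radius sqrt(c)
    lam = -1.0 / (2.0 * x)      # from dΛ/dx = 1 + 2 λ x = 0
    return x, y, lam


def best_value(c):
    """Optimal value f*(c) of the constrained problem."""
    x, y, _ = optimum(c)
    return x + y


c, h = 1.0, 1e-6
slope = (best_value(c + h) - best_value(c - h)) / (2.0 * h)  # df*/dc by central difference
_, _, lam = optimum(c)
print(slope, -lam)  # the two numbers agree: df*/dc = -λ under this sign convention
```

With the opposite convention, Λ = f − λ(h − c), the same multiplier appears with the opposite sign, so the rate of change becomes +λ.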
The method of Lagrange multipliers is generalized to inequality constraints by the Karush–Kuhn–Tucker conditions.
Example
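Suppose we wish to find the discrete probability distribution p₁, p₂, …, pₙ with maximal information entropy. Then the quantity to maximize is
$$f(p_1, p_2, \ldots, p_n) = -\sum_{k=1}^{n} p_k \log_2 p_k.$$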
Of course, the sum of these probabilities equals 1, so our constraint is g(p) = 1 with
$$g(p_1, p_2, \ldots, p_n) = \sum_{k=1}^{n} p_k.$$
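The objective and the constraint function of this example can be written directly as code. The sketch below is a minimal illustration; the function names are hypothetical, and the convention 0 · log 0 = 0 (skipping zero probabilities) is a standard one added here, not stated in the text.

```python
import math

def entropy_bits(p):
    """Objective f(p_1, ..., p_n) = -Σ p_k log2 p_k (convention: 0 * log 0 = 0)."""
    return -sum(pk * math.log2(pk) for pk in p if pk > 0)

def total_probability(p):
    """Constraint function g(p_1, ..., p_n) = Σ p_k; feasible points have g(p) = 1."""
    return sum(p)

for p in ([0.25, 0.25, 0.25, 0.25], [0.5, 0.25, 0.25]):
    print(p, total_probability(p), entropy_bits(p))
# [0.25, 0.25, 0.25, 0.25] 1.0 2.0
# [0.5, 0.25, 0.25] 1.0 1.5
```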
We can use Lagrange multipliers to find the point of maximum entropy (as a function of the probabilities). For all k from 1 to n, we require that
$$\frac{\partial}{\partial p_k}\bigl(f + \lambda (g - 1)\bigr) = 0,$$
which gives
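$$\frac{\partial}{\partial p_k}\left(-\sum_{j=1}^{n} p_j \log_2 p_j + \lambda \left(\sum_{j=1}^{n} p_j - 1\right)\right) = 0.$$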
Carrying out the differentiation of these n equations, we get
$$-\left(\frac{1}{\ln 2} + \log_2 p_k\right) + \lambda = 0.$$
This shows that all pₖ are equal, since each equation gives pₖ = 2^(λ − 1/ln 2), which depends on λ only. By using the constraint ∑ₖ pₖ = 1, we find
$$p_k = \frac{1}{n}.$$
Hence, the uniform distribution is the distribution with the greatest entropy.
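The derivation can be checked numerically. The sketch below solves the equations obtained by differentiation for λ and the pₖ in closed form, confirms that they are satisfied, and compares the entropy of the result with that of randomly drawn distributions. The helper names, the choice n = 6, the random seed and the number of samples are illustrative assumptions, not part of the derivation.

```python
import math
import random

LN2 = math.log(2)

def entropy_bits(p):
    """f(p) = -Σ p_k log2 p_k, with the usual convention 0 * log 0 = 0."""
    return -sum(pk * math.log2(pk) for pk in p if pk > 0)

def stationary_solution(n):
    """Solve -(1/ln 2 + log2 p_k) + λ = 0 for every k together with Σ p_k = 1."""
    # Each equation gives p_k = 2**(λ - 1/ln 2), the same value for every k,
    # so the constraint n * p_k = 1 fixes λ = 1/ln 2 - log2 n.
    lam = 1.0 / LN2 - math.log2(n)
    p = [2.0 ** (lam - 1.0 / LN2)] * n
    return lam, p

def residuals(p, lam):
    """Left-hand side of the differentiated equation, one entry per k."""
    return [-(1.0 / LN2 + math.log2(pk)) + lam for pk in p]

def random_distribution(n, rng):
    """n positive weights normalized to sum to 1 (an arbitrary feasible point)."""
    w = [rng.random() + 1e-12 for _ in range(n)]
    total = sum(w)
    return [wi / total for wi in w]

n = 6
lam, p = stationary_solution(n)
print(p)                                       # six entries, each approximately 1/6
print(max(abs(r) for r in residuals(p, lam)))  # approximately 0

rng = random.Random(0)  # fixed seed for reproducibility
gap = min(entropy_bits(p) - entropy_bits(random_distribution(n, rng))
          for _ in range(10_000))
print(gap)  # positive: no sampled distribution beats the uniform one
```

Random sampling is only a sanity check, not a proof; the proof is the Lagrange argument above together with the concavity of the entropy.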
For another example, see also derivation of the partition function.