# Cuts for Mixed Integer Programs

15 December 2023

Some people’s minds, upon hearing about optimization,
start to wander through continuous variables, gradients
and such. But there is much more out there. Sometimes,
for example, we need to deal with variables that act
like a *switch*, showing some kind of on–off
behaviour. In that case, continuous variables are not
enough: we need the full power of integer variables.

Integer variables naturally arise in a variety of situations: discrete choices, problems in combinatorics, machines that are either off or are on with a minimum cost, feasibility sets formed as the union of simpler parts, etc. Despite the richness of the continuous world, these are all common cases that are only expressible via integrality.

In a previous post, we discussed approximations by cuts and how they are a useful tool for representing value functions of parameterized optimization problems. Cuts are tightly related to convex functions and, consequently, to continuous optimization. Nevertheless, there are techniques to calculate cuts in the presence of integer variables.

Today, we will continue exploring the world of
cutting planes with a focus on *mixed integer
programs* (MIP) — optimization problems containing
both continuous and integer decision variables:

\begin{array}{rl} f(x) = \min\limits_{u} & c(u) \\ \textrm{s.t.} & (x, u) \in X \\ & u \in \R^n \times \Z^k. \end{array}

We will investigate how the presence of integer variables affects the graph of value functions, and learn how to use cuts to approximate this kind of function.

## The Shape of an Optimal Value Function

Given a parameterized optimization problem, what does its value function look like? One should expect that the better behaved a problem is, the better behaved its value function will be, and that expectation is correct: value functions inherit some of the good properties of their underlying programs.

Let’s start with convex programming. These are among the best-behaved classes of optimization problems and, as expected, their optimal value functions have a well-defined shape. Interestingly, if all the data in an optimization problem is convex, its optimal value function ends up being convex as well.

An optimal value function

\begin{array}{rl} f(x) = \min\limits_{u} & c(u) \\ \textrm{s.t.} & (x, u) \in X \\ \end{array}

where the objective c is convex and the
feasibility set X is
jointly convex in x and
u, *is a convex
function*.

A case of particular importance is that of linear programs, that is, optimization problems where the objective is linear and the feasibility set is given by linear equalities and inequalities. These are a special case of convex programming, so their optimal value functions are guaranteed to be convex. But in the presence of all this structure, we can go further: they are in fact polyhedral.

For simplicity, we will only state the theorem for standard form LPs. This is without loss of generality, since one can put any linear program into this form.

The optimal value function for a *standard form
linear program*

\begin{array}{rl} f(x) = \min\limits_{u} & \left\langle c,\, u\right\rangle \\ \textrm{s.t.} & Au = x, \\ & u \ge 0, \end{array}

is a *polyhedral function*.

A direct corollary is that the value function of any
linear program is representable with *finitely many
cuts*. This is a powerful tool when proving the
convergence of cutting plane algorithms for linear
programs.
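For a concrete example, the absolute value function is the optimal value of a standard form linear program:

\begin{array}{rl} f(x) = \min\limits_{u} & u_1 + u_2 \\ \textrm{s.t.} & u_1 - u_2 = x, \\ & u \ge 0. \end{array}

Here f(x) = |x| = \max(x, -x): a polyhedral function, exactly representable by the two cuts f(x) \ge x and f(x) \ge -x.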

Let’s now switch our focus towards *mixed integer
programs* (MIP): optimization problems with both
real and integer decision variables.

\begin{array}{rl} f(x) = \min\limits_{u} & c(u) \\ \textrm{s.t.} & (x, u) \in X \\ & u \in \R^n \times \Z^k. \end{array}

Our first step in the analysis of a MIP is to isolate the continuous and integer variables into a two-stage program. To do that, we have to rewrite f into an equivalent but more verbose form. We begin by separating the decision variable u into a continuous part u_C and an integer part u_I. The trick is to change how we enforce the integrality of u_I. Let’s allow the entire control — including u_I — to be continuous but add a new integer variable z constrained to equal u_I.

\begin{array}{rl} f(x) = \min\limits_{u_C, u_I} & c(u) \\ \textrm{s.t.} & (x, u) \in X \\ & u = (u_C, u_I) \in \R^{n+k} \\ & \textcolor{red}{u_I = z} \\ & \textcolor{red}{z \in \Z^k}. \end{array}

This construction adds nothing but redundancy to our problem. Nevertheless, this gained redundancy isolates the integer variables into a single constraint separable from the others. This way, we can rewrite the optimization as a problem with two stages:

\begin{array}{rl} f(x) = \min\limits_{z} & Q(x, z) \\ \textrm{s.t.} & z \in \Z^k \\ Q(x, z) = \min\limits_{u_C} & c(u) \\ \textrm{s.t.} & (x, u) \in X \\ & u = (u_C, z). \end{array}

In the equation above, the second optimization
problem in Q has only
continuous variables because it only sees the variable
z *fixed* at
some integer value.

When Q has some known structure on x — convex or linear programming, for example — the function f will locally have this same structure, because it is a discrete minimum of Q. The intuitive view is that for each value of z, there is a well-behaved function Q(\cdot, z). For a given x, f chooses the best of these functions and then optimizes on it. Furthermore, since the minimum in z is discrete, continuity implies that we must have whole connected ranges of x for which it will choose the same z.
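To make the two-stage view concrete, here is a minimal Julia sketch of a hypothetical toy problem (the numbers are ours, not from any reference): a demand x can be covered by a continuous resource costing 2 per unit or by integer batches z ∈ {0, 1, 2}, each covering 2 units at cost 1. The inner problem Q has a closed form, and f is the discrete minimum over z.

```julia
# Toy two-stage MIP (hypothetical example):
#   f(x) = min { z + 2 u_C : u_C + 2z ≥ x, u_C ≥ 0, z ∈ {0, 1, 2} }.
# For fixed z, the inner (continuous) problem has a closed form:
Q(x, z) = z + 2 * max(x - 2 * z, 0.0)

# The outer stage is a discrete minimum over the integer variable:
f(x) = minimum(Q(x, z) for z in 0:2)

f(0.0), f(1.0), f(2.0)   # (0.0, 1.0, 1.0)
```

The resulting f is piecewise linear but not convex: f(1) = 1 sits above the chord between (0, f(0)) and (2, f(2)), whose midpoint value is 0.5.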

For example, in the case of a *mixed integer
linear program*, the optimal value function is a
discrete minimum of polyhedral functions, meaning that
despite not being polyhedral itself, it is nevertheless
piecewise-linear. See the paper by Ted K Ralphs and
Anahita Hassanzadeh^{1}
for an in-depth study of such value functions.

All this discussion indicates that convex problems with integer variables tend to look convex, at least locally. Thanks to that, it is common to use techniques from convex programming to approximate and solve them. In particular, one generally expects to be able to approximate a MIP using cuts, even if this approximation isn’t the tightest possible. Let’s focus then on how to calculate cuts for this kind of value function.

## Different Flavours of Cuts

For a value function defined via a convex optimization problem, Lagrangian duality yields tight cuts for free. In the presence of integer variables, however, life is not so simple because the solving methods do not automatically calculate dual multipliers for us.

Calculating a cut for a MIP requires more work but is still feasible. All the strategies we will show today follow the same idea: find a convex underapproximation of your value function and calculate cuts for it. These are automatically cuts for the original problem as well.

As you may imagine, there are several options to
choose from, each with its pros and cons. In the
following, let’s take a look at three such methods with
varying approximation quality, and corresponding
computational cost. We follow the cut families defined
by Jikai
Zou, Shabbir Ahmed, and Xu Andy Sun^{2} while
generalizing the ideas for a simpler context.

Do keep in mind that not all MIP value functions
accept cuts, because their epigraph could have the
entire space as its convex hull.^{3} Nevertheless, in the
real world you rarely stumble into one of these. Thus,
knowing how to approximate MIPs can prove to be a useful
tool to keep in your utility belt.

Throughout this section, we will only consider optimization problems that are “convex except for integer variables”, that is, problems having a convex objective and convex feasible set, but with non-convexity arising from part of the decision variables being integer.

\begin{array}{rl} f(x) = \min\limits_{u} & c(u) \\ \textrm{s.t.} & (x, u) \in X \\ & u \in \R^n \times \Z^k. \end{array}

### Benders Cuts

Let’s start with the simplest kind of cut for a mixed
integer program. Given a value function f, we define its
*continuous relaxation* as the value of the same
optimization problem, but with the integer variables
relaxed to be continuous.^{4}

\begin{array}{rl} {f}_C(x) = \min\limits_{u} & c(u) \\ \textrm{s.t.} & (x, u) \in X \\ & u \in \R^n \times \textcolor{red}{\R^k}. \end{array}

Based on our previous discussions, the relaxation {f}_C is a convex function. It is also everywhere below f, because they have the same objective function but {f}_C has a larger feasible set.

f(x) \ge {f}_C(x).

From this inequality and the convexity of {f}_C, we can calculate (not necessarily tight) cuts at any point.

A **Benders cut** for f at x_0 is a tight cut for the
continuous relaxation {f}_C at x_0,

f(x) \ge {f}_C(x_0) + \left\langle\lambda,\, x - x_0\right\rangle.

Since {f}_C is convex, Benders cuts exist at each point by Lagrangian duality. They are also cheap to calculate: after all, you only have to solve a convex program to find them. In the particular case where f is a MILP, its relaxation is a Linear Program, which is even faster to solve. The name Benders comes from Benders decomposition for stochastic linear programming, the context in which this family of cuts was invented in the 60s.

In terms of implementation, calculating a Benders cut amounts to relaxing the parameterized optimization problem and finding a cut for it. We illustrate the procedure below, in Julia code.

```
function benders_cut(f, a)
    # Relax integrality to obtain a convex problem
    fc = continuous_relaxation(f)
    # Find optimal value and dual multiplier
    value, multiplier = solve_convex(fc, a)
    return Cut(a, value, multiplier)
end
```

Everything has been well and good, so you might be
asking: what are the disadvantages of Benders cuts? The
answer is that, in general, they are far from
tight. The continuous relaxation {f}_C is *a convex
underapproximation* of f, but there is no guarantee
that it is a good underapproximation — it can be too
low. Another issue is that the continuous relaxation
is *representation dependent*: equivalent
optimization problems have the same optimal value
function but may yield different continuous
relaxations.
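As a concrete illustration of this looseness, consider a small hypothetical toy problem (ours, not from the references): cover a demand x with a continuous resource costing 2 per unit or with integer batches z ∈ {0, 1, 2}, each covering 2 units at cost 1. A Julia sketch:

```julia
# Toy MIP value function (hypothetical example):
#   f(x) = min { z + 2 u_C : u_C + 2z ≥ x, u_C ≥ 0, z ∈ {0, 1, 2} }
f(x) = minimum(z + 2 * max(x - 2 * z, 0.0) for z in 0:2)

# Continuous relaxation: z ranges over [0, 2] instead of {0, 1, 2}, so the
# cheaper resource is used fractionally and fc(x) = x/2 for x ≤ 4, with
# dual multiplier λ = 1/2 in that region.
fc(x) = x <= 4 ? x / 2 : 2 + 2 * (x - 4)

# Benders cut at x0 = 1: f(x) ≥ fc(x0) + λ (x - x0) = x/2.
x0, λ = 1.0, 0.5
benders(x) = fc(x0) + λ * (x - x0)

benders(1.0), f(1.0)   # (0.5, 1.0)
```

The cut is valid everywhere, but at the point x0 = 1 where it was generated it returns 0.5 while the true value is f(1) = 1.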

### Lagrangian Cuts

Even though it does not come for free as it does in the convex case, Lagrangian duality still works for mixed integer programs. The difference is that you have to explicitly solve the dual formulation. As a reward for your effort, you gain the tightest possible cut at the chosen point.

Recall from the previous post that the convex relaxation of a function is represented by the Lagrangian dual problem. For our MIP, it has the following form:

\begin{array}{rl} \check{f}(x) = \max\limits_{\lambda} \min\limits_{u, y} & c(u) + \left\langle\lambda,\, x - y\right\rangle \\ \textrm{s.t.} & (y, u) \in X \\ & u \in \R^n \times \Z^k. \end{array}

A cut for the dual value function is called a
*Lagrangian cut*, because it comes from the
Lagrangian dual. It is the tightest cut possible at a
point.

A **Lagrangian cut** for f at x_0 is a tight cut for the
dual \check{f} at x_0,

f(x) \ge \check{f}(x_0) + \left\langle\lambda,\, x - x_0\right\rangle.

Despite the dual function \check{f} being convex, it does not come from a convex optimization problem and is, consequently, much harder to solve. In general, evaluating \check{f} requires solving several mixed integer programs in order to find the optimal multiplier \lambda. Thus, despite their precision, Lagrangian cuts are computationally expensive approximations. This is a limiting factor in their usefulness.
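To sketch what such a dual solve involves, the following Julia snippet evaluates the dual of a hypothetical toy problem by brute force. A grid search stands in both for the outer maximization over λ (in practice done by subgradient or cutting-plane methods) and for the inner MIP solves:

```julia
# Toy MIP value function (hypothetical example), parameter restricted to [0, 5]:
#   f(y) = min { z + 2 u_C : u_C + 2z ≥ y, u_C ≥ 0, z ∈ {0, 1, 2} }
f(y) = minimum(z + 2 * max(y - 2 * z, 0.0) for z in 0:2)

# The dual evaluates, for each multiplier λ, the Lagrangian bound
#   λ x0 + min over y of (f(y) - λ y),
# where every inner minimization is itself a MIP solve.
function dual_value(x0; λs = range(0, 2; length = 41),
                        ys = range(0, 5; length = 501))
    maximum(λ * x0 + minimum(f(y) - λ * y for y in ys) for λ in λs)
end

dual_value(1.0), f(1.0)   # ≈ (0.5, 1.0)
```

The dual recovers the convex envelope y/2 of this nonconvex f, so at x0 = 1 the tightest possible cut still sits at 0.5, below f(1) = 1.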

### Strengthened Benders Cuts

Until now we’ve seen two kinds of cuts: Benders cuts are cheap but loose, while Lagrangian cuts are tight but expensive. Is there a middle ground? Perhaps there is some way to calculate cuts that are still good, even if not the best possible, but that do not require so many resources to calculate.

The idea we will follow is that instead of directly calculating a good cut, we will take a loose one and improve it. As we will shortly see, it is possible to turn any valid cut into a tight one at the price of solving one extra mixed integer program. The intuition is that we can move a cut up until it hits the function’s graph.

Recall from the previous post that the Lagrangian relaxation of a value function is an affine underapproximation equal to the tightest cut for f with fixed inclination \lambda, defined as

\begin{array}{rl} L(f, \lambda)(x) = \min\limits_{u, y} & c(u) + \left\langle\lambda,\, x - y\right\rangle \\ \textrm{s.t.} & (y, u) \in X \\ & u \in \R^n \times \Z^k. \end{array}

Thus, the Lagrangian relaxation transforms a loose
cut into a tight one with the same inclination. We call
this procedure *strengthening a cut*. The
implementation goes like this:

```
# Returns the tightest cut for `f`
# with the same inclination as the argument cut.
function strengthen_cut(f, cut)
    L = lagrangian_relaxation(f, cut.λ)
    # Find optimal value for the Lagrangian relaxation
    Lx = solve_mip(L, cut.x)
    return Cut(cut.x, Lx, cut.λ)
end
```

In particular, Benders cuts are great candidates for being strengthened.

A **strengthened Benders cut** for f at x_0 is a tight cut for the
Lagrangian relaxation using the multiplier \lambda from the Benders
cut:

f(x) \ge L(f, \lambda)(x_0) + \left\langle\lambda,\, x - x_0\right\rangle.

Evaluating the Lagrangian relaxation amounts to solving a MIP not much harder than f itself. Thus, if we know how to find the value of f, we also know how to find the value of its Lagrangian relaxation.

In terms of computational effort, calculating a strengthened Benders cut requires solving a convex program for the inclination followed by a mixed integer program for the constant. Below is some Julia code illustrating the procedure.

```
function strengthened_benders_cut(f, a)
    # Calculate a Benders cut to find the dual multiplier
    # Cost: solve 1 convex program
    cut_b = benders_cut(f, a)
    # Solve the Lagrangian relaxation to improve the cut
    # Cost: solve 1 mixed integer program
    cut_sb = strengthen_cut(f, cut_b)
    return cut_sb
end
```

Solving a MIP is much more costly than solving a
convex program, especially in the linear case.
Nevertheless, tight cuts tend to be worth the effort,
since they are directly approximating the best convex
underapproximation instead of some
representation-dependent relaxation. The main problem
with strengthened Benders cuts is that they are only
guaranteed to be tight *somewhere*. They are not
necessarily tight at the point of choice.
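Both effects show up in a minimal hypothetical example (ours, not from the references): take f(x) = \min \{ u : u \ge x,\ u \in \Z \} with the parameter restricted to x ∈ [0.2, 0.8], so that f ≡ 1 on its domain while the continuous relaxation gives {f}_C(x) = x. A Julia sketch, with a grid search standing in for the MIP solve:

```julia
# Toy (hypothetical): f(x) = min { u : u ≥ x, u ∈ ℤ } for x ∈ [0.2, 0.8].
# On this domain f ≡ 1, while the continuous relaxation gives fc(x) = x.
f(x) = ceil(x)
fc(x) = x

# Benders cut at x0 = 0.5: slope λ = 1 and value fc(x0) = 0.5, i.e. f(x) ≥ x.
x0, λ = 0.5, 1.0
benders(x) = fc(x0) + λ * (x - x0)

# Strengthening keeps λ but recomputes the constant by solving
#   min over y ∈ [0.2, 0.8] of f(y) + λ (x0 - y),
# here by grid search in place of a MIP solve:
Lx0 = minimum(f(y) + λ * (x0 - y) for y in range(0.2, 0.8; length = 601))
strengthened(x) = Lx0 + λ * (x - x0)   # ≈ x + 0.2

benders(0.5), strengthened(0.5), f(0.5)   # ≈ (0.5, 0.7, 1.0)
```

The strengthened cut x + 0.2 dominates the Benders cut x and touches the graph of f at x = 0.8, yet at the generating point x0 = 0.5 it still reads 0.7 against f(0.5) = 1: tight somewhere, but not at the point of choice.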

## Parting Thoughts

At the end of the day, the right cut for your MIP will depend a lot on the application at hand. There is always a trade-off between precision and computational effort, as you can see in the following table.

| Cut | Tightness | Effort |
|---|---|---|
| Benders | Loose | Convex |
| Strengthened Benders | Tight somewhere | Convex + MIP |
| Lagrangian | As tight as possible | Many MIPs |

One thing to keep in mind is that the time necessary to calculate a tighter cut could be used to calculate a lot of Benders cuts — a looser but better-shaped approximation. If this Benders approximation stops converging, you can strengthen some cuts to “clean up some area”. A Lagrangian cut works best as a last resort, and only if you are interested in the area around a very specific point, because in the time you are calculating it, you could be getting many strengthened Benders cuts to cover a larger area.

1. Ted K Ralphs and Anahita Hassanzadeh, “On the Value Function of a Mixed Integer Linear Optimization Problem and an Algorithm for Its Construction,” 2014, 33, http://coral.ie.lehigh.edu/~ted/files/papers/MILPValueFunction14.pdf.↩︎

2. Jikai Zou, Shabbir Ahmed, and Xu Andy Sun, “Stochastic Dual Dynamic Integer Programming,” *Mathematical Programming* 175, no. 1-2 (May 2019): 461–502, https://doi.org/10.1007/s10107-018-1249-5.↩︎

3. As an example, consider h(x) = -|x|. This function accepts no cuts and is representable as the MIP: \begin{array}{rl} h(x) = \min\limits_{t,z} & t \\ \textrm{s.t.} & t \ge zx, \\ & t \in \R,\, z \in \{-1, +1\}. \end{array} ↩︎

4. In the context of Linear Programming, you will probably find it under the name “LP relaxation”, because it turns a MILP into an LP.↩︎