How do you test different ideas (i.e. formulations, symmetry braking, parallelization, parameters...) on MIP solvers to get a reasonable result. For example I have a formulation for a certain type of MIPs and many test problems that I can run tests on. The problem is that I don't know when my new formulations help the MIP solver and when it does not. For what works on one test problem does not work on another.

As an example I am trying to figure out if it's better to use hyper threading or not with GUROBI or CPLEX. I have a 4-core machine so basically I can run 8 threads using HT. however in some cases this is slower than using 4-cores while in other cases its almost 2 times faster. This obviously depends on where the branching occurs. If I change the order of the constraints in the model then I might get a completely different comparison.

Same thing happens when I want to choose between Cplex and Gurobi. One lp file might be 50 times faster on Cplex while the other is 50 times slower.

I sort of have the same problem when comparing formulations since I have large MIPs where ordering of constraints can change the solution time 10 times. How would I compare this fairly to another formulation?

I am not certain what I should compare? solution times? explored nodes?

What would be a reasonable way of doing comparisons like this?

asked 09 Feb '11, 13:07

Buxley's gravatar image

accept rate: 9%

The first step in doing comparisons is to abandon all hope of a clean, definitive, unambiguous ranking of alternatives. Well, maybe that's a bit strong, but (as you've observed) for every approach there will be some problem instances/formulations that will make it look good and some that will make it look not so good. Things not inherent in the approach, such as the order variables or constraints are communicated to the solver, obscure parameter settings in the solver, the alignment of planets, etc. will alter your results.

That said, there are a variety of performance profiles that can be used to compare alternatives. For MIPs, it's common to plot the average gap (averaged across all solution attempts) versus time. You can also plot the fraction of replications where the optimal solution was found (whether proved optimal or not) versus time. Of course, given that the abscissa of the graph is time, you need to use the same hardware/software platform for all comparisons.

I would avoid number of explored nodes as a comparison measure for a variety of reasons. One is that, in some cases, you can reduce the number of explored nodes by doing things such as probing that are computationally expensive (and that, at heart, might really be equivalent to exploring a node, but without officially creating the node).


answered 09 Feb '11, 14:40

Paul%20Rubin's gravatar image

Paul Rubin ♦♦
accept rate: 19%

Thanks for that! I have always wondered about using nodes as an comparison since they (IMO) really give no information if you have different formulations. Still I see these kinds of comparisons rather often in different papers.

(11 Feb '11, 06:49) Buxley

Once upon a time, the preferred approach was to compare floating point operations (flops), which was supposed to compensate for hardware differences (whereas CPU time is hardware-specific). Even that neglects differences between additions/subtractions, multiplications and divisions, not to mention it's a royal pain to get at the number of flops. I've seen people count simplex iterations, but not all simplex iterations were created equal. I'd just look at time v. either gap or incumbent value and try to keep hardware and software/OS as constant as possible.

(12 Feb '11, 15:56) Paul Rubin ♦♦

Avoid drawing conclusions based on the arithmetic mean. Use the geometric mean instead.


answered 09 Feb '11, 14:54

Rune%20Sandvik%201's gravatar image

Rune Sandvik 1
accept rate: 0%

Good point. Or slap the box part of a box and whisker plot (say, 25th and 75th percentile) around each average (which probably should be median rather than mean).

(12 Feb '11, 15:58) Paul Rubin ♦♦

I just wanted to chip in on the Hyperthreading Technology (HTT).

HTT enables the OS to treat each physical core as two virtual cores, thus scheduling twice the number of threads simultaneously. This works because many times a given thread gets stalled or is waiting. The HTT-optimized OS can then schedule another thread. I presume this works great for multimedia and gaming. But I doubt if this will help in computationally intensive tasks involving MIPs.

Here is a quote from Wikipedia:

"...when running two programs that require full attention of the processor it can actually seem like one or both of the programs slows down slightly when Hyper Threading Technology is turned on..."


answered 11 Feb '11, 19:21

Siddhartha%20Maheshwary's gravatar image

Siddhartha M...
accept rate: 0%

Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text]( "Title")
  • image?![alt text](/path/img.jpg "Title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported



Asked: 09 Feb '11, 13:07

Seen: 2,641 times

Last updated: 11 Feb '11, 19:21

OR-Exchange! Your site for questions, answers, and announcements about operations research.