Is there any way to estimate typical problem sizes, in terms of the number variables and constraints?

The large commercial solvers excel in "large" problems, but I wonder how common these are in real-world usage. It would be interesting to know if any surveys have been done on what problem sizes are typically solved. A distribution would be ideal.

My initial thought was that the NEOS Server would probably have the best dataset for this, since it gets tens of thousands of problem submissions annually. However, these statistics do not seem to be available on the site.

asked 27 Oct '11, 04:49

DC%20Woods's gravatar image

DC Woods ♦
accept rate: 5%

edited 27 Oct '11, 04:50


Just a quick comment about NEOS: my impression is that NEOS is primarily used for trying out problems on different solvers, and I suspect the majority of the problems sent to NEOS are usually test sets, assignment problems, etc. For one thing, it is the go-to site in many optimization courses. So the statistics collected on NEOS may not actually represent so-called "real world" usage, but more academic, exploratory usage.

(28 Oct '11, 10:17) Gilead ♦

That's a good point, thanks.

(29 Oct '11, 21:20) DC Woods ♦

Yes, almost all solvers report problem size. Large problems are very common in real-world usage, especially in areas like supply chain optimization, planning and scheduling. Stochastic problems using a multiscenario approach are also inherently huge (because they are really a type disguised of Monte Carlo method).

I can't give you a distribution because I haven't gone through a systematic data collection process, but in general, large problems are the norm rather than the exception, especially for LPs, and increasingly, MILPs.

However I do have some anecdotal numbers because the folks in the office next to mine love to tell me their problem sizes. Also, at academic seminars given by commercial vendors (by this I mean OR solution vendors, not solver vendors), typically the first question that is asked during Q&A is "what are your typical problems sizes?" (if it wasn't already made clear in the presentation). These numbers are often bandied about, so if you've been around the block several times, you'll get a rough notion of what the numbers are.

Now, what is a large problem? It depends when you're asking the question. Not too long ago, an MIP with 1000s of constraints/variables would be considered large. 4-5 years ago, 40,000-80,000 became the new threshold for large. 2-3 years ago, 100,000 was the threshold. Last year, 200,000 became the state of the art threshold. (Mind you, this is based on my own sampling. Some people here may have different numbers -- these numbers are somewhat industry dependent too.) I should also mention that the number of binaries in these problems are about 10-15% of the number of variables. I'd imagine the numbers for pure LPs are even larger.

THat said, I would suggest that the number of variables in a problem is actually not a very meaningful way to convey the magnitude of the problem. One can always introduce a huge number of artificial variables (e.g. slack variables), which totally distorts the size of the original problem. The number of equality constraints is a slightly more meaningful number, but again, one can introduce a lot of redundant equality/inequality constraints and problem cuts that do not actually represent the magnitude of the original problem; they are merely mathematical constructs to help the problem solve more easily.

Furthermore, the overwhelming majority of these problems are sparse. Side note: Dense optimization problems are seen in some domains like statistics -- a few years ago, a stats software vendor came up to me and said they were solving large-scale problems of about 10,000 variables/constraints, but because they're dealing with dense matrices, 10,000 to them was a big deal.

In general though, most other commercial optimization problems are sparse. So while 200,000 variables sounds impressive, there are probably only a few non-zero elements in the matrix. Which brings me to my point: I believe (and I think some people agree with me) that the number of non-zero elements in the constraint matrix (in the case of nonlinear programs, the the number of nonzeros in the Jacobian) gives one a far better sense of the magnitude of the problem than the number of variables and constraints.

Update: I just heard that MIP problems with 200,000 binary variables are routinely solved now. Problems of this size are atypical in my community though.


answered 27 Oct '11, 11:01

Gilead's gravatar image

Gilead ♦
accept rate: 15%

edited 28 Oct '11, 10:07


I'd like to add to this great answer that although there are types of MIP problems with hundreds of thousends (even millions) of variables that are solved there are also small MIPs, some less than 100 variables, which are quite hard to solve for even the best MIP solvers. A famous example are market share problems. Another example are certain problems with general integer variables. For these problems non-zero count is also not a good measure, they are just structurally hard.

(28 Oct '11, 11:20) Philipp Chri...

I don't know any such survey or research. I think problem-solving complexity is problem-dependent and different classes of problem. In fact, complexity of solving a problem is perhaps better explained via some other inherent characteristics of the problem than just number of variables and constraints. In addition, these inherent characteristics are usually better-understood by everyone. For example, consider routing problems. I believe complexity factor is better determined with the number of nodes. Also, the effect of incorporating multiple new additional constraints could be less than adding just a new node, which would greatly increase number of variables and constraints.

Another important point is that complexity of solving different problems occurs at different levels. Therefore, it might be impossible to create a general formulae or rule of thumb for every problem. For example, you might easily solve a TSP with n nodes while solving similar VRP for that n nodes is impossible or takes a lot of time (maybe, this situation is more controled by the number of variables and constraints for different problems).

Finally, I recommend checking for papers with case studies in areas of your interest, which could provide you with more information regarding the size of real-world problems. For this purpose, perhaps Interfaces would be a good start.


answered 27 Oct '11, 07:26

Ehsan's gravatar image

Ehsan ♦
accept rate: 16%

Thanks. In this case, I really am interested in size more than complexity. However, I do appreciate that complexity has more to do with how long a solver would run, and how much difficulty it would have. Individual case studies also don't help much. I know that there is a large range of problems that need to be solved, I'm interested in the frequencies of problems of each size. Also case studies are probably biased towards "interesting" or "difficult" problems, and I'm wondering about typical usage.

(27 Oct '11, 07:32) DC Woods ♦

This is a really interesting question. I know many MSc and PhD students who are usually confused whether to tackle their research problem via special-purpose algorithms or just generic optimization software. If such information is known, they could select their approach more wisely. Perhaps surveys by leading OR societies could provide the answer to this question.

(27 Oct '11, 07:55) Ehsan ♦

I'd guess that most of the time generic optimization software would be more effective than special-purpose algorithms, unless the purpose of the research was to develop those algorithms. The generic solvers have decades of work refining them... If the generic solvers fail, then my next step would be metaheuristics.

(27 Oct '11, 07:59) DC Woods ♦
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "Title")
  • image?![alt text](/path/img.jpg "Title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported



Asked: 27 Oct '11, 04:49

Seen: 3,045 times

Last updated: 29 Oct '11, 21:20

OR-Exchange! Your site for questions, answers, and announcements about operations research.