I was wondering if any of the free solvers are powerful enough to solve a problem with approximately 50M variables and 100+M constraints. It is a maximization problem and entirely linear. I was thinking of using COIN-OR's CLP solver on a relatively powerful workstation. Will this yield a solution? If not, what is the rule of thumb for a free LP solver's maximum problem size? Would a commercial solver (e.g., CPLEX or Gurobi) handle such a problem? Note: I realize the answer depends on the problem type and solver, but I just want to make sure I'm not off by multiple orders of magnitude before I actually start. The largest problems I could find references to (via a Google search) had fewer than 10M variables. asked 27 Jan '14, 03:31 Tim Y
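As a sanity check on the "orders of magnitude" worry, here is a back-of-envelope memory sketch. All figures are assumptions for illustration (average nonzeros per constraint, bytes per nonzero, solver overhead), not properties of the actual model:

```python
# Rough RAM estimate for a sparse LP. A solver stores the constraint
# matrix in sparse form: roughly one 8-byte value plus index data per
# nonzero, and then needs several multiples of that as working memory.

def lp_memory_gb(nonzeros, bytes_per_nonzero=16, overhead_factor=4):
    """Crude GiB estimate; overhead_factor is a guess covering presolve
    copies, factorizations, and solver working storage."""
    return nonzeros * bytes_per_nonzero * overhead_factor / 2**30

# Assume ~10 nonzeros per constraint on average (pure assumption):
nnz = 100_000_000 * 10
print(f"{lp_memory_gb(nnz):.0f} GiB")  # → 60 GiB
```

Under these (made-up) densities the problem is large-workstation territory rather than impossible, but a denser matrix changes the answer linearly in the nonzero count.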

Most likely the dual problem will solve more efficiently than the primal, since you have more rows than columns. Moreover, an interior-point algorithm will most likely work better than a simplex algorithm at this size. That in turn suggests a commercial code, since the public-domain interior-point codes for large-scale LPs are not that good. [Please correct me if I am wrong and biased.] You will need a large computer in any case, but how large depends on the structure of your problem, i.e. the sparsity pattern after presolve. It would be fun if you would donate the problem to the public domain so it can be used for testing and benchmarking. I will be happy to try our code MOSEK on it. answered 27 Jan '14, 09:20 Erling_MOSEK
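For context on the row/column swap behind that first remark: writing the poster's problem in a standard inequality form, the dual is a minimization whose row count equals the primal's column count, so the dual here would have roughly 50M rows instead of 100+M:

```latex
% Primal: more rows (m \approx 10^8) than columns (n \approx 5 \times 10^7)
\max_{x \ge 0} \; c^{\top} x \quad \text{s.t.} \quad A x \le b,
\qquad A \in \mathbb{R}^{m \times n}

% Dual: the roles swap, giving n rows and m columns
\min_{y \ge 0} \; b^{\top} y \quad \text{s.t.} \quad A^{\top} y \ge c
```

(The exact dual depends on the actual constraint senses and bounds; this is the textbook template, not the poster's model.)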
+vote for the suggestion to make the problem public, I'd be happy to run CPLEX on it as well!
(28 Jan '14, 04:15)
jfpuget
@jfpuget I think we could use an LPLIB collection of challenging large-scale LP test problems. Something that could test the limits but not be totally out of reach.
(28 Jan '14, 04:52)
Erling_MOSEK

The number of constraints or variables is a poor predictor of LP performance. Since it's an LP, I would think your biggest problem will be memory. One suggestion: build a smaller version of the model on a standard desktop (if possible), then increase its size and measure memory consumption at each step to predict the consumption of the final model. That will give you a hint of the machine size needed.
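The scale-up-and-measure idea above can be sketched as a simple extrapolation. The measured points below are hypothetical numbers, and the linear fit rests on the assumption that memory grows roughly linearly in model size at fixed density:

```python
# Sketch: fit a line through (model size, peak memory) measurements
# taken on small instances, then extrapolate to the full model.

def extrapolate_linear(points, target_n):
    """Least-squares line through (n, mem_mib) points; sparse LP memory
    is roughly proportional to nonzeros, hence roughly linear in n when
    density is held fixed (an assumption, worth checking per model)."""
    k = len(points)
    sx = sum(n for n, _ in points)
    sy = sum(m for _, m in points)
    sxx = sum(n * n for n, _ in points)
    sxy = sum(n * m for n, m in points)
    slope = (k * sxy - sx * sy) / (k * sxx - sx * sx)
    intercept = (sy - slope * sx) / k
    return slope * target_n + intercept

# Hypothetical measurements: (number of variables, peak MiB observed)
pts = [(1_000_000, 450.0), (2_000_000, 880.0), (4_000_000, 1750.0)]
est_mib = extrapolate_linear(pts, 50_000_000)  # extrapolate to 50M vars
```

In practice you would take the peak-memory readings from the OS (e.g. resident set size while the solver runs) rather than from made-up numbers, and leave generous headroom for presolve and factorization spikes.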
As an absolute last resort, in the worst case, one can build a customized LP algorithm (fairly easy) that offloads parts of the problem to disk. But that would require more than average optimizer knowledge.
You could also try to exploit any specific structure in your model via decomposition techniques to decrease your CPU time significantly. Based on the number of variables and constraints, I bet there is something in there that constitutes a nice and useful structure (e.g., planning periods or scenarios).
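To make the decomposition idea concrete, here is a toy Lagrangian relaxation sketch on an invented model (a budget-linked LP, not the poster's problem): relaxing the single linking constraint makes the problem separate into trivial per-variable subproblems, and a subgradient loop adjusts the multiplier. The problem data and step rule are purely illustrative:

```python
# Toy Lagrangian decomposition: maximize sum(c_i * x_i)
# subject to a linking budget sum(x_i) <= B and 0 <= x_i <= u_i.
# With multiplier lam >= 0 on the budget, each subproblem maximizes
# (c_i - lam) * x_i over [0, u_i] and solves in closed form.

def solve_subproblem(c_i, u_i, lam):
    # Push x_i to its upper bound iff its reduced profit is positive.
    return u_i if c_i - lam > 0 else 0.0

def lagrangian_solve(c, u, B, steps=200):
    lam = 0.0
    for t in range(1, steps + 1):
        x = [solve_subproblem(ci, ui, lam) for ci, ui in zip(c, u)]
        violation = sum(x) - B                # subgradient of the dual
        lam = max(0.0, lam + violation / t)   # diminishing step size
    return x, lam

x, lam = lagrangian_solve(c=[3.0, 2.0, 1.0], u=[4.0, 4.0, 4.0], B=8.0)
# Here the relaxation happens to recover the optimal primal x = [4, 4, 0].
```

On this tiny instance the dual solution happens to yield a feasible optimal primal; in general subgradient methods only give dual bounds, and a proper Dantzig-Wolfe or Benders scheme is needed to recover primal solutions, which is exactly where the "well-suited structure" caveat in the comments below bites.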
@Ehsan Have you had success with decomposition for huge LPs in terms of cutting down solution time? In my experience it does not work very well, except for exceptionally well-suited structures (even if you ignore the tailing effect).
Probably better and faster advice: just pick the largest machine on Amazon EC2 and try it out... Then you'll have your answer :)
@Bo: I agree. However, I think the main point of decomposition is to fit the problem into the machine, not just to cut CPU time. My own experience is with facility location problems, which are usually perfect for decomposition. Based on experience and reliable results from the literature, instances with 100+M variables and constraints can be solved for basic FLPs. However, these simple FLPs could not be solved directly, as that would require a huge amount of memory. I think this is worth investigating before just handing the whole model to the solver at once.
@Ehsan In that case we agree, but for reducing memory, a partial-pricing primal simplex with sections of the nonbasic variables written to disk is sufficient, and will likely perform better if tailored correctly (I implemented such an idea a long time ago).
BTW: In the previously mentioned case, and in many other huge problems, it tends to be that the user forgets the model is often already an approximation of the real problem at hand, i.e., there is too much detail in the model.