I have to solve a large number (from 2 to 10 millions) of very similar small (maybe 100 constraints) problems where only a small number of constraints, always the same change.

Essentially I want to minimize the cost of covering consumption. The rules are more or less the following:

  • Each customer (we have millions of them) consume a number of units of different items, this is different for each customer.
  • We have a number of products, each of them include one or many items, at a given cost. The products and costs are common for all customers
  • Further there are some additional constraints that link which products can be combined for each customer, but the constraints are again the same for all customers.

I am planning on solving this using Spark and I am not familiar with the performance of algorithms and my question is, should I try to solve a very large number of small problems, or should I combine them in a large problem repeating the constraints between products for each customer?

Can I somehow leverage that the individual problems are very similar?


asked 10 May '18, 06:22

Luis%20Sisamon's gravatar image

Luis Sisamon
accept rate: 0%

If only a few constraints each time, it is probably fastest to use dual simplex on all problems other than maybe the first. This assumes a solver that can "hot start" after a change to a model, using the previous basis as a starting point. CPLEX has that ability, and I'm pretty sure most other good LP solvers do as well. I would test that theory by running a few problems, first solving them independently and then with hot starts and dual simple, and compare end-to-end run times.

I don't see any advantage to merging multiple problems into a single larger problem. Depending on what actually changes from one problem to the next, there might be some utility to doing a clustering analysis of sorts, trying to group problems so that there are minimal differences in constraints within a cluster.


answered 10 May '18, 11:25

Paul%20Rubin's gravatar image

Paul Rubin ♦♦
accept rate: 19%

Thanks for you answer. I have been considering the clustering option, I may not be able to cluster all cases, but probably can cluster some 80-90% of them and then handle the rest individually. I will look into the hot start option, I can probably sort the problems by calculating a distance between usage profiles. I still need to figure out how to best make use of the parallel capabilities of spark.

(10 May '18, 12:30) Luis Sisamon

On a related note, can a solver 'hot start' on a non feasible solution? For example let's say I solve an LP and get a feasible solution but then I need to add some additional constraints to the same problem; could I hot start even though the new constraint makes the last feasible solution invalid?

(11 May '18, 10:08) gtg489p

Yes. It is common (dare I say standard practice) in IP solvers to hot start the solution to a node problem with added cuts from the solution to the parent problem. Dual simplex generally works well for this. The new constraints make the previous primal solution superoptimal/infeasible but just make the previous dual solution suboptimal (but still dual feasible).

(11 May '18, 10:13) Paul Rubin ♦♦
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "Title")
  • image?![alt text](/path/img.jpg "Title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported



Asked: 10 May '18, 06:22

Seen: 165 times

Last updated: 11 May '18, 10:13

OR-Exchange! Your site for questions, answers, and announcements about operations research.