Hi I am working on a bin packing problem where rather than optimizing for the least number of needed bins we would like to optimize for the most remaining space amongst all bins (including bins containing an item). To achieve the spreading we came up with the idea to minimize the sum of the used space in each bin squared, that way the more full it is the more it is penalized. However this makes the problem an MIQP problem due to the quadratic component in the minimization, assumedly slowing down the optimization. Does anyone have a suggestion on how to better formulate the optimization parameter to achieve the same or similar effect? I've created two different models using OPL and CPLEX 12.3 to solve, using 80 bins and 4 items (as a test) the first model solves in ~9minutes on a 8 thread machine, the second model I ran for 30 minutes and still had a GAP of 47.51% and gave up. 9 minutes isn't terrible but I'd like to use about 320 items, and scaling is very poor (8 items was over an hour). I'm listing my two models and data below. Does anyone have a suggestion how to tweak the model for better runtime/scaling? Also one thing that confused me was when I ran model 1 CPLEX told me it took 65859055 iterations to finish, however I'd expect 40960000 to be enough to explore all possibilities of decision variable assignment (80^4). Why the disparity? Thanks in advance! Model 1:
Data 1:
Model 2:
Data 2: itemSize = #[ 1: 0.56583464, 2: 0.60144615, 3: 0.4545771, 4: 0.47858784 ]#; binCapacity = #[ 1: 2, 2: 2, 3: 2, 4: 2, 5: 2, 6: 2, 7: 2, 8: 2, 9: 2, 10: 2, 11: 2, 12: 2, 13: 2, 14: 2, 15: 2, 16: 2, 17: 2, 18: 2, 19: 2, 20: 2, 21: 2, 22: 2, 23: 2, 24: 2, 25: 2, 26: 2, 27: 2, 28: 2, 29: 2, 30: 2, 31: 2, 32: 2, 33: 2, 34: 2, 35: 2, 36: 2, 37: 2, 38: 2, 39: 2, 40: 2, 41: 2, 42: 2, 43: 2, 44: 2, 45: 2, 46: 2, 47: 2, 48: 2, 49: 2, 50: 2, 51: 2, 52: 2, 53: 2, 54: 2, 55: 2, 56: 2, 57: 2, 58: 2, 59: 2, 60: 2, 61: 2, 62: 2, 63: 2, 64: 2, 65: 2, 66: 2, 67: 2, 68: 2, 69: 2, 70: 2, 71: 2, 72: 2, 73: 2, 74: 2, 75: 2, 76: 2, 77: 2, 78: 2, 79: 2, 80: 2 ]#;
showing 5 of 6
show 1 more comments

A. This sounds like "load balancing". I understand that you do not wish to solve a bin packing problem but a closely related machine scheduling problem. Think of your items as jobs and the bins as machines. You would like to "schedule" the jobs on machines (assign the items to bins) such that the latest finishing time (the "makespan") is minimized. This should have the desired effect. You will achieve this goal using a linear objective function as follows: Introduce a (continuous) variable, say z. Then you need to replace the constraints sum(i in items) .... <= binCapacity[b]; by sum(i in items) .... <= z; and have "minimize z" as the objective function. This is a "min max" objective function, and hopefully does the job. B. use binary variables for modeling by all means (if you have the choice between integer and binary variables); this usually is stronger in terms of the linear relaxation, which in turn helps the branchandbound algorithm. This appears to be a reason why the second model is much weaker. C. The models have a lot of "symmetry", another reason for bad performance. That means: The solver does not know that there is no essential difference between bin 1 and bin 2 ... and bin 80. You can help the solver by introducing "an artificial distinction" between bins, eg., by introducing constraints that enforce that you open bin i+1 only if bin i is already open. This works for "standard" bin packing, but possibly not with your "balancing" objective function. Hm. D. The number of iterations reported is most probably the number of simplex iterations which is not the number of combinations to pack items. The solver solves the linear relaxation of your model first (and in every node of the branchandbound tree), this has nothing to do with the number of solutions to the integer program. Anyway, the number of iterations is extremely large for such a toy model. E. What comes to my mind as a remedy to several of your problems: An entirely different model formulation as a set covering model may help: Enumerate all possibilities ("patterns") to pack a bin, and choose a subset of patterns, such that every item is contained in one pattern. This will require column generation (and possibly branchandprice) to solve the model, which is not easy/possibly in OPL. My answer may leave you with more questions than before, feel free to followup on this post. One helpful information would be: do you need optimal solutions or "only" good quality? answered 27 Dec '11, 06:47 Marco Luebbecke ♦ Hi Marco Thanks for this! I've been playing around with your min max objective function and it definitely got me more scalability than I previously had. The obvious downfall to min max however is if there is one large job that nearly fills a machine by itself, which (my understanding) then places nearly no constraints on the filled capacity of the other machines with the exception that the sum on any of them needs to be less than the big job. In regards to E, do you have a link or more details on how this would work? An optimal solution would be best, or within some percentage of optimal.
(28 Dec '11, 13:35)
Halcyon
Ok, I understand the min max drawback. What about introducing an extra variable for each bin, and minimizing their sum? When you do not need an optimal solution, you could pregenerate "some" possible packings for one bin, say, several thousand. Maybe you can use your problem knowledge to decide about how "promising" patterns may look like. Objective coefficient of a pattern is the amount of wasted space. Introduce a binary variable for each pattern (=chosen for some bin or not). Constraints are: for each item i: sum over all variables/patterns that contain i must be >= 1.
(28 Dec '11, 17:19)
Marco Luebbecke ♦
You can see here http://www.springerlink.com/content/ubawrxrkf6nqeulj/ how it works for the bin packing problem when you want an optimal solution, i.e., when you need column generation to solve the linear relaxation. (needed to add a 2nd comment due to char count constraint)
(28 Dec '11, 17:19)
Marco Luebbecke ♦
"What about introducing an extra variable for each bin, and minimizing their sum?" I assume you mean a variable for the sum of the load in each bin, then minimizing the sum of those? In this case the sum is constant no matter the packing.. unless I misunderstood?
(29 Dec '11, 00:11)
Halcyon
you are right, that won't work. last try (not exact): introduce a variable for "each" load (in some discretization, say z_5%, z_10%, z_15%, etc.), and use for each bin b and "load percentage" p sum(i in items) .... <= p * binCapacity[b] + z_p; that penalizes the load in each bin above a certain level, could even be weighted in the objective function to penalize "larger percentages more".
(29 Dec '11, 03:32)
Marco Luebbecke ♦

Hello, I did my thesis on using constraint programming to solve this kind of problem (here). You might have a look at this article: Pierre Schaus, Pascal Van Hentenryck, JeanCharles Régin: Scalable Load Balancing in Nurse to Patient Assignment Problems. CPAIOR 2009: 248262. answered 04 Jan '12, 06:44 pierre schaus 
Letting
shrinks the range of bin loadings (in percentage terms). answered 04 Jan '12, 07:41 Paul Rubin ♦♦ 
I'm sorry because I read objective function of your model #1 incorrectly last night and therefore I implemented your first model in GAMS incorrectly. Therefore, I've to correct my previous statements as follows (apparently, I was way too sleepy while answering your question): a) If number of items is less than bins as in your examples, the optimal objective function would constant based on your inputs, which is equal to sum(i, (itemSize(i))^2). There are obvious multiple optimal solutions to this problem: Allocate items to distinct bins. However, if number of bins is less than items, then the finding optimal solution and optimal objective function are not trivial. b) Since I implemented your model incorrectly, the solution time I reported was wrong and your reported solution time was correct. However, I found an insight on why the CPLEX might be very slow to solve the model as in your data sets, number of bins is larger than number of bins. Apparently, as there are multiple optimal solutions (n!, which n is the number of items), CPLEX has a really hard time closing the gap as it has to explore a large number of nodes with equal objective function values in the branch & bound tree. c) I developed a another mathematical model in GAMS for your problem. The model minimizes the squared unused capacity of bins (please note the difference in objective function). For this matter, I had to add a binary variable denoting selection of bins. The model is as follows:
This model could be solved effectively for your data sets (in about 10 seconds on four x64 threads). Another positive side of this model is that it tends to keep the number of used bins small even if the number of available bins is larger than the number of items. Feel free to ask question if you don't understand the logic behind my model. ps. GAMS does not reformulate the models and just acts as medium for transforming your model to what could be understood by the solvers. However there are some solvers which could do this. answered 28 Dec '11, 11:25 Ehsan ♦ Hi Ehsan thanks for the suggestions. One problem is I actually want to maximize the sum of the remaining space per bin^2, which when I do that it causes the decision variable Y to be selected on all since the inequality is just <=. If you run your model on the data you'll find it packs everything into one bin which is opposite of the goal. Thanks!
(28 Dec '11, 14:11)
Halcyon
@Halcyon: Are you sure about this? I've run the model multiple times and its gives me a solution with objective function of 1.804221 and just two bins being used (items 1 and 4 in one bin and items 2 and 3 in one bin). As a matter of fact, a solution with only one used bin should be infeasible as the total size of items is about 2.1 which is larger than the capacity of the bins.
(28 Dec '11, 14:43)
Ehsan ♦

Given a fixed set of items and a fixed set of bins, the total unused space across all bins is a constant. Can you clarify your goal?
Hi Paul and Ehsan thanks for the feedback. I don't believe it should be a constant, or at least if it is then I have built it incorrectly. The goal is to load balance the items into the bins. I was expecting "pow(((sum(a in items) itemSize[a]*X[a][u])), 2)" to give me the total size of the items in bin u squared, is this not the case?
RE Ehsan 2: I'm wondering if GAMS is reformulating the problem differently from my OPL formulation?
Thanks!
Your objective function is not identically a constant, but total used space (and hence total free space) is. Squaring seems to be aimed at some sort of load "spreading" (echoing the title of your question), but it's still unclear what you want. Possible (distinct) goals include maximizing the minimum unused space in any bin, minimizing the spread between most and least empty space in a bin, minimizing squared deviation of actual unused space from "average" unused space, minimizing the spread between greatest and least percentage unused space, ad nauseum.
Hi Paul, understood. I need to balance feasibility and solution time on the size of problem, but minimizing the spread between greatest and least percentage unused space or minimizing squared deviation of actual unused space from "average" unused space would be good to try. I'd prefer to avoid min max unless it is the only feasible option for the reasons I put in my reply to Marco.
@Paul: Thanks so much. You're right. As a matter of fact, the special structure of the data sets @Halcyon has used caused me such belief. I've updated my answer to reflect on this matter.