I have a problem, and it seems like something someone must have studied before. I have a sorted list of N elements, and I want to divide them into K groups by choosing K-1 split points between them. There may be elements with the same value, and items with the same value must end up in the same group. The goal is to find K groups whose sizes are as close to round(N/K) as possible.

For example, divide these 32 elements into 4 groups of size 8:

1 1 1 1 2 2 3 3 3 3 3 3 3 3 4 4 4 4 5 5 5 5 5 6 6 6 6 7 8 9 10 10

One solution would be these 3 break points:

1 1 1 1 2 2 | 3 3 3 3 3 3 3 3 | 4 4 4 4 5 5 5 5 5 | 6 6 6 6 7 8 9 10 10

1 1 1 1 2 2          =  6 elements,  error = abs(8-6) = 2
3 3 3 3 3 3 3 3      =  8 elements,  error = abs(8-8) = 0
4 4 4 4 5 5 5 5 5    =  9 elements,  error = abs(8-9) = 1
6 6 6 6 7 8 9 10 10  =  9 elements,  error = abs(8-9) = 1

total error = 4

Does this look familiar to anyone? I'd like an approximation algorithm if possible.

Thanks, Craig Schmidt

asked 06 Jan '12, 15:40 by Craig

I think a dynamic program would work (and would run in polynomial time). For each break position x, define the value v(x,k) = best objective if the k'th break is at position x. You calculate v(x,k) by running through all x' < x and taking the minimum of v(x',k-1) + the cost of having (x',x] as a group. The final objective is the best v(x'',k) over the final position x'' (I think I need some notation).

In your example, if we denote 1, 2, ..., 10 to be the break positions after the corresponding number, we get

v(1,1) = abs(8-4) = 4
v(2,1) = abs(8-6) = 2
v(2,2) = 4 + abs(8-2) = 10
v(3,1) = abs(8-14) = 6
v(3,2) = min(v(2,1) + abs(8-8), v(1,1) + abs(8-10)) = min(2, 6) = 2
v(3,3) = 10 + abs(8-8) = 10, etc.

v(10,4) will have your value. Complexity is something like O(N^2 K), though I am sure you can bring it down.
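A quick way to sanity-check the recurrence is to code it up. The sketch below is my own Python rendering of the idea, not code from the answer; the dictionary v, the restriction of breaks to positions between distinct values, and the target L = round(N/K) are details filled in from the question.

```python
def dp_error(data, K):
    """Sketch of the v(x,k) recurrence: v[(k, x)] is the best total error
    when the k'th break is at position x ("break at x" = data[:x] | data[x:])."""
    n = len(data)
    L = int(round(float(n) / K))
    # ties must stay together, so breaks are only allowed between
    # distinct values; position n closes the last group
    breaks = [x for x in range(1, n) if data[x - 1] != data[x]] + [n]
    INF = float("inf")
    v = {(0, 0): 0}
    for k in range(1, K + 1):
        for x in breaks:
            v[(k, x)] = min(v.get((k - 1, xp), INF) + abs(L - (x - xp))
                            for xp in [0] + breaks if xp < x)
    return v.get((K, n), INF)  # v(10,4) in the answer's notation
```

On the question's 32-element example this returns a total error of 4 for K = 4, confirming that the break points shown in the question are optimal.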

answered 07 Jan '12, 02:50 by Michael Trick ♦♦

Nice. So the graph problem I was looking for is a shortest-path problem in a state network...

(07 Jan '12, 03:52) Marco Luebbecke ♦

For what it's worth, I implemented a simple MiniZinc (CP) model for this problem: equal sized groups.mzn . It assumes that k is fixed (searching over a variable k is considerably harder to do in MiniZinc).

The main constraints are the following, where a is the array of values and k is the number of buckets; x is the array of break points (the decision variables), and s holds the number of elements in each bucket.

% Set the break points (x) and sum the number of elements in each bucket (s)
constraint
   s[1] = x[1]
   /\
   forall(i in 2..k-1) (
      s[i] = x[i]-x[i-1]
   )
   /\
   s[k] = n-x[k-1]
; 

The following constraint ensures that we don't have a break point between elements with the same value:

forall(j in 2..n) (
   a[j-1] = a[j] -> not(exists(p in 1..k-1) ( x[p] = j-1) )
   % alt. version
   % a[j-1] = a[j] -> forall(p in 1..k-1) (x[p] != j-1 )     
)

The objective is to minimize z which is defined like this:

int: gsize = round(int2float(n) / int2float(k)); % group size
var 0..n: z = sum(i in 1..k) ( abs(s[i]-gsize) );

Using some random data, it's quite fast for arrays of size < 100, and some solvers (such as Chuffed, a lazy clause generation solver) have no problem with arrays of size 1000, at least when k is not too large (say < 20 or so); however, Chuffed has to work a lot for arrays of size 10000. (I don't know if these sizes are in the range of the application.)

I have assumed that the "grouping" constraint is a hard requirement, i.e. that elements with the same value must be in the same bucket. However, when k > (number of distinct values in the array) this is not possible. The model checks for this via the variable nval, and in those cases it just skips this constraint (though this is not shown in the code above).

Also, I have not been able to identify any upper bound on the decision variables (i.e. lower than n, the size of the array) since I don't know if there are any restrictions on the distribution of the given values. This is unfortunate, since this model would do (much) better if the domains of the decision variables (x, s, and z) were smaller.

Update: I have now changed the model with two improvements:

1) Tighter domain for x by using just the valid break points.

set of int: valid_breaks = {j-1 | j in 2..n  where a[j] != a[j-1]};
% ...
array[1..k-1] of var valid_breaks: x;

Many solvers are faster with this variant, though the Chuffed solver doesn't like it.

2) Replaced the implication constraint shown above with the following, which eliminates the invalid break points, i.e. the positions between elements with the same value; this is useful if we can't use the valid break points as a domain for x:

set of int: non_valid_breaks = {j-1 | j in 2..n  where a[j] = a[j-1]};
% ...
forall(j in non_valid_breaks, p in 1..k-1) (
   x[p] != j
)

Now Chuffed solves the 10000-element instance in about 0.1 seconds (though it takes 30 seconds to generate the FlatZinc .fzn file), and both fzn2smt and G12's LazyFD are much faster on the 1000-element instance (just a second or two).
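For readers following along in another language, the two MiniZinc sets translate directly. Here is an equivalent Python sketch (my own illustration, not part of the model; the function name and the 0-based break convention are assumptions):

```python
def break_points(a):
    """Return (valid, non_valid) break positions for a sorted list a,
    where break p means the split a[:p] | a[p:] (0-based version of the
    1-based MiniZinc sets valid_breaks / non_valid_breaks)."""
    n = len(a)
    valid = {p for p in range(1, n) if a[p] != a[p - 1]}
    non_valid = {p for p in range(1, n) if a[p] == a[p - 1]}
    return valid, non_valid
```

Every interior position is in exactly one of the two sets, which is what makes either formulation (domain tightening or exclusion constraints) usable.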

answered 08 Jan '12, 04:58 by Hakan Kjellerstrand

Thanks to everyone who answered my question. It is really interesting to see some different approaches. I came up with a recursive, heuristic approach to my question. I considered two forms of the problem. In the first, as I stated it above, we want exactly K groups. In the second, we want at most K groups. This relaxation can give better solutions in terms of error.

We want K groups out of the N data points. Let L = int(round(float(N)/float(K))) be the desired size of each group.

We first try to build a chunk of size L. There may be a group of identical points that crosses the ideal chunk boundary. We don't know if it is better to include or exclude this group, so we try both and choose the solution with the smaller error. In each case, we call our function again with the remaining data and K-1 groups. If we can build a chunk of size L with no group crossing the boundary, then we just do so and recursively call the algorithm on K-1 groups.

In the exact case there is an additional restriction on the size of the chunk: you need to make sure that the remaining data has at least K-1 distinct values, or you won't be able to produce K-1 groups.

Splitting our N points is an O(N) operation. The recursion has to explore 2^K solutions in the worst case, although in practice you won't have a block overlapping the desired boundary at every step. Thus the overall time is O(N * 2^K). If K is small, as it is for my cases, this is fast enough.

If anyone is interested, here is a python implementation, which solves my example problem for all values of K.

# This code is licensed under the MIT license
# Copyright (c) 2012 Craig Schmidt
# 
# Permission is hereby granted, free of charge, to any person obtaining a copy of this
# software and associated documentation files (the "Software"), to deal in the Software
# without restriction, including without limitation the rights to use, copy, modify, merge,
# publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons
# to whom the Software is furnished to do so, subject to the following conditions:
# 
# The above copyright notice and this permission notice shall be included in all copies or
# substantial portions of the Software.
# 
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
# INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
# PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE
# FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
# OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.

# Craig Schmidt's email is firstname@firstnamelastname.com

import random

# how many unique values are there in the sorted list data
def count_unique(data):
   unique = 1                      # element 0 is always unique to start
   for i in range(1,len(data)):
       if data[i] != data[i-1]:
           unique += 1             # is unique if not the same as previous value
   return unique

# find at most K or exactly K splits of target size L in the *sorted* array data
# exact = True, we find exactly K splits
# exact = False, we find no more than K splits
# the error for segment x will be abs(L-len(x))
# we want to find the splits with the overall smallest error
# returns (total_error, splits), where splits is a list of lists
def find_splits(data, K, L, exact):

    # shouldn't end up calling it with empty data
    assert len(data) > 0, "empty," + str(K) + "," + str(L)

    # if K is 1, then just return rest of data
    if K == 1:
        return (abs(L-len(data)),[data])

    # we want to find out how far to go in making a new block
    # Ideally, we want to make a block of size min(len(data),L)
    # let B be the size of the block
    B = min(len(data),L)

    # however, if exact==True, we need to make sure
    # we have at least K-1 unique values left in data[B:]
    # otherwise, we won't be able to make K-1 chunks with that data
    if exact:
        unique = count_unique(data)     # total in all data

        # there is no solution in this case so return an error value
        if unique < K:
            return (1e100,[[]])

        # need to leave K-1 unique values in rest of data
        # leaving us this many for the current block
        availtouse = unique - (K-1)  
        assert availtouse >= 1

        # find the point in [0..B-1] where we reach availtouse unique values
        # that's the last index i we can use and still end up with K splits, with B = i+1
        used = 0
        for i in range(B):
            if (i==0) or (data[i] != data[i-1]):
                used += 1
            if used == availtouse:
                break
        B = i+1

    # see if we can make the next split of size B, 
    # or if there are identical elements across the split
    # note: may want a tolerance here if comparing floats
    if (len(data) >= B+1) and (data[B-1] == data[B]):

        # now we need to find the size of the block of values
        # equal to data[B-1] and data[B]
        first = B-1
        while (first > 0) and (data[first-1] == data[B-1]):
            first -= 1
        last = B
        while (last < len(data)-1) and (data[last+1] == data[B]):
            last += 1
        assert data[first] == data[last]

        # here a large block fills all remaining data, so return it
        if (first == 0) and (last == len(data) - 1):
            return (abs(L-len(data)),[data])

        # we have the special case where first == 0, 
        # so there is a large block filling all of B
        # in this case excluding it would leave nothing in current,
        # so we can only really include it
        # i.e. just do the with case
        if first == 0:

            # include block
            current = data[:last+1]     # include element last
            currenterror = abs(L-len(current))
            rest = data[last+1:]        # last + 1 to end
            (besterror,bestrest) = find_splits(rest,K-1,L,exact)
            return (currenterror+besterror,[current] + bestrest)

        # also the case where last == len(data) - 1, 
        # so the block goes all the way to the end of the data,
        # in that case, including it will not leave any data left to make another block
        elif last == len(data) - 1:

            # the included case gives just a single segment
            included = [data]
            includederror = abs(L-len(data))

            # compare to excluding it
            withoutcurrent = data[:first]   # don't include element first
            assert len(withoutcurrent) > 0
            withoutcurrenterror = abs(L-len(withoutcurrent))
            withoutrest = data[first:]
            assert len(withoutrest) > 0
            (withoutbesterror,withoutbest) = find_splits(withoutrest,K-1,L,exact)

            combined = [withoutcurrent] + withoutbest
            combinederror = withoutcurrenterror+withoutbesterror

            if combinederror < includederror:
                return (combinederror,combined)
            else:
                return (includederror,included)

        else:

            # we have a block that can be included or excluded with no problems
            # we don't know whether it is better to include the block data[first:last+1] or not
            # so recurse on both, picking the better

            withcurrent = data[:last+1]     # do include the element last
            assert len(withcurrent) > 0
            withcurrenterror = abs(L-len(withcurrent))
            withrest = data[last+1:]
            assert len(withrest) > 0
            (withbesterror,withbest) = find_splits(withrest,K-1,L,exact)

            withoutcurrent = data[:first]   # don't include the element first
            assert len(withoutcurrent) > 0
            withoutcurrenterror = abs(L-len(withoutcurrent))
            withoutrest = data[first:]
            assert len(withoutrest) > 0
            (withoutbesterror,withoutbest) = find_splits(withoutrest,K-1,L,exact)

            # choose whether it is better to include it or not
            if withcurrenterror + withbesterror < withoutcurrenterror + withoutbesterror:
                return (withcurrenterror+withbesterror,[withcurrent] + withbest)
            else:
                return (withoutcurrenterror+withoutbesterror,[withoutcurrent] + withoutbest)

    # can make ideal sized split, with no overlapping ties
    else:                       
        current = data[:B]    # 0 ... B-1
        currenterror = abs(L-len(current))  # is exactly size B, but may have wanted L
        rest = data[B:]         # B to end
        if len(rest) > 0:
            # still stuff to do
            (besterror,bestrest) = find_splits(rest,K-1,L,exact)
            return (currenterror+besterror,[current] + bestrest)        
        else:
            # B is all the data, we're done
            return (currenterror,[current])

def test1(exact):
    data = [1,1,1,1,2,2,3,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5,5,6,6,6,6,7,8,9,10,10]
    N = len(data)

    for K in range(1,len(data)+1):
        L = int(round(float(N)/float(K)))
        (error, result) = find_splits(data,K,L,exact)
        # in exact mode should always return K partitions if no error 
        if exact and error != 1e100:
            assert len(result) == K
        print "K=" + str(K), "L=" + str(L), "partition=" + str(len(result)), "error=" + str(error), result

if __name__ == '__main__':
    test1(exact=False)
    print "____________________________________________"
    test1(exact=True)

This gives the following output. The first set of output is the non-exact case with <= K chunks, and the second is the exact case with K chunks.

K=1 L=32 partition=1 error=0 [[1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 8, 9, 10, 10]]
K=2 L=16 partition=2 error=4 [[1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 8, 9, 10, 10]]
K=3 L=11 partition=3 error=9 [[1, 1, 1, 1, 2, 2], [3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4], [5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 8, 9, 10, 10]]
K=4 L=8 partition=4 error=4 [[1, 1, 1, 1, 2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4, 5, 5, 5, 5, 5], [6, 6, 6, 6, 7, 8, 9, 10, 10]]
K=5 L=6 partition=5 error=8 [[1, 1, 1, 1, 2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4, 5, 5, 5, 5, 5], [6, 6, 6, 6, 7, 8], [9, 10, 10]]
K=6 L=5 partition=6 error=6 [[1, 1, 1, 1, 2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6, 7], [8, 9, 10, 10]]
K=7 L=5 partition=6 error=6 [[1, 1, 1, 1, 2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6, 7], [8, 9, 10, 10]]
K=8 L=4 partition=6 error=8 [[1, 1, 1, 1], [2, 2, 3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7, 8, 9, 10, 10]]
K=9 L=4 partition=6 error=8 [[1, 1, 1, 1], [2, 2, 3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7, 8, 9, 10, 10]]
K=10 L=3 partition=7 error=13 [[1, 1, 1, 1], [2, 2, 3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7, 8, 9], [10, 10]]
K=11 L=3 partition=7 error=13 [[1, 1, 1, 1], [2, 2, 3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7, 8, 9], [10, 10]]
K=12 L=3 partition=7 error=13 [[1, 1, 1, 1], [2, 2, 3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7, 8, 9], [10, 10]]
K=13 L=2 partition=8 error=16 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7, 8], [9, 10, 10]]
K=14 L=2 partition=8 error=16 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7, 8], [9, 10, 10]]
K=15 L=2 partition=8 error=16 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7, 8], [9, 10, 10]]
K=16 L=2 partition=8 error=16 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7, 8], [9, 10, 10]]
K=17 L=2 partition=8 error=16 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7, 8], [9, 10, 10]]
K=18 L=2 partition=8 error=16 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7, 8], [9, 10, 10]]
K=19 L=2 partition=8 error=16 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7, 8], [9, 10, 10]]
K=20 L=2 partition=8 error=16 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7, 8], [9, 10, 10]]
K=21 L=2 partition=8 error=16 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7, 8], [9, 10, 10]]
K=22 L=1 partition=10 error=22 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7], [8], [9], [10, 10]]
K=23 L=1 partition=10 error=22 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7], [8], [9], [10, 10]]
K=24 L=1 partition=10 error=22 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7], [8], [9], [10, 10]]
K=25 L=1 partition=10 error=22 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7], [8], [9], [10, 10]]
K=26 L=1 partition=10 error=22 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7], [8], [9], [10, 10]]
K=27 L=1 partition=10 error=22 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7], [8], [9], [10, 10]]
K=28 L=1 partition=10 error=22 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7], [8], [9], [10, 10]]
K=29 L=1 partition=10 error=22 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7], [8], [9], [10, 10]]
K=30 L=1 partition=10 error=22 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7], [8], [9], [10, 10]]
K=31 L=1 partition=10 error=22 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7], [8], [9], [10, 10]]
K=32 L=1 partition=10 error=22 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7], [8], [9], [10, 10]]
____________________________________________
K=1 L=32 partition=1 error=0 [[1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 8, 9, 10, 10]]
K=2 L=16 partition=2 error=4 [[1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 8, 9, 10, 10]]
K=3 L=11 partition=3 error=9 [[1, 1, 1, 1, 2, 2], [3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4], [5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 8, 9, 10, 10]]
K=4 L=8 partition=4 error=4 [[1, 1, 1, 1, 2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4, 5, 5, 5, 5, 5], [6, 6, 6, 6, 7, 8, 9, 10, 10]]
K=5 L=6 partition=5 error=8 [[1, 1, 1, 1, 2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4, 5, 5, 5, 5, 5], [6, 6, 6, 6, 7, 8], [9, 10, 10]]
K=6 L=5 partition=6 error=6 [[1, 1, 1, 1, 2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6, 7], [8, 9, 10, 10]]
K=7 L=5 partition=7 error=9 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6, 7], [8, 9, 10, 10]]
K=8 L=4 partition=8 error=10 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7, 8, 9], [10, 10]]
K=9 L=4 partition=9 error=14 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7, 8], [9], [10, 10]]
K=10 L=3 partition=10 error=18 [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [6, 6, 6, 6], [7], [8], [9], [10, 10]]
K=11 L=3 partition=1 error=1e+100 [[]]
etc

answered 09 Jan '12, 14:41 by Craig

Can't say it looks familiar, but it's a variant of a set partitioning problem, so it could be modeled easily as a mixed integer linear program. (Of course, "modeled easily" and "solved easily" are not synonymous, at least for large instances.)

It could also be modeled as a constraint programming problem. The CP might be easier to solve if the objective were to minimize the maximum deviation from target size, rather than the sum of deviations. (Once the CP achieves a certain maximum deviation, it will try to achieve a smaller one, and reducing the maximum allowed deviation will eliminate certain combinations from consideration.) That might be true for the MIP formulation as well.

answered 06 Jan '12, 18:03 by Paul Rubin ♦

This is a combinatorial problem; the actual numbers are irrelevant. The problem has an "interval property", and I actually cannot believe that it is NP-hard. It is certainly polytime solvable when K is fixed, because there are only (N-1 choose K-1) possible solutions, which we can exhaustively inspect in polytime. But when K is part of the input, hell, there is some graph problem behind this...
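The counting argument is easy to make concrete. The sketch below is my own illustration, not code from this thread: it enumerates all (N-1 choose K-1) break-point choices, restricted to the positions between distinct values, and returns the smallest total deviation from the target group size.

```python
from itertools import combinations

def brute_force_error(data, K):
    """Exhaustively inspect every choice of K-1 break points (only between
    distinct values, so ties stay together) and return the best total error."""
    n = len(data)
    L = int(round(float(n) / K))
    valid = [x for x in range(1, n) if data[x - 1] != data[x]]
    best = float("inf")
    for breaks in combinations(valid, K - 1):
        # bounds delimit the K groups; group sizes are consecutive differences
        bounds = (0,) + breaks + (n,)
        err = sum(abs(L - (b - a)) for a, b in zip(bounds, bounds[1:]))
        best = min(best, err)
    return best
```

Restricting to the tie-respecting positions also shrinks the search from C(N-1, K-1) to C(#distinct-value boundaries, K-1), which is what makes the exhaustive inspection cheap on the question's example.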

answered 06 Jan '12, 20:14 by Marco Luebbecke ♦

If I recall correctly [disclaimer: that's not a very strong statement], @Mark once posed a similar question. Unfortunately, I can't find it from his user profile (there's a possibility that this question got lost in a database corruption).

To the best of my knowledge, I proposed hierarchical clustering as a solution technique (for which performance guarantees are studied here). The top-rated answer, though, was a two-pass merge routine - afaicr proposed by either @Paul Rubin or (1 of the 5) @Matthew Saltzman(s).

answered 07 Jan '12, 06:50 by fbahr ♦

@Florian: I vaguely recall that, and I think you're right that Matt proposed a two-pass merge. I can't find it either (tried both Matt's answers and general searches), so it does look as though the site has lost some data. :-(

(07 Jan '12, 10:24) Paul Rubin ♦

You could also give some thought to modeling it like:

min. t (t = sum(K - no. of elements))

subject to: for each of the K groups, the no. of elements should be <= N/K

Hope it's not too vague!

answered 10 Jan '12, 06:54 by Pavan


OR-Exchange! Your site for questions, answers, and announcements about operations research.