Consider a network of supplies and demands. |S| << |D|.

To shrink our (complex) associated transportation problem, we wish to join iso-cost demand nodes. Specifically,

 if d1 and d2 in D s.t. cost[s,d1] = cost[s,d2] for all s in S then 
    combine d1 and d2 into a single node

This works because we can split a solution to the reduced problem according to the demands at d1 and d2. This should a simple sequential procedure. We can stop when all demand node pairs have been considered without any combination occurring. (O(|D|^2) is good enough for this practical problem.)

Sadly, there aren't a lot of iso-cost nodes. So the above-outlined procedure isn't too effective in reducing problem size. I've been noodling with a relaxation - combining 'almost' iso-cost demand nodes. For instance, if two demand nodes have slightly differing transportation costs for a supply node or two but are the same for all other supply nodes, combining them should degrade accuracy only a smidgen.

Does this sort of node clustering sound familiar?

asked 18 Dec '12, 00:03

SanjaySaigal's gravatar image

accept rate: 13%

I'm not sure whether this is a graph theory problem or not, but it seems more like a pre-processing technique for a specific problem.

(18 Dec '12, 03:31) Ehsan ♦

Couple of clarifications to my original question:

  1. Indeed this is a practical problem
  2. Theoretical efficiency is not really a concern
  3. Optimality is not a goal. A 'good enough' reduction in |D| is
  4. Historical shipments may be used to weight Paul's distance metric

Thanks all.

(19 Dec '12, 14:29) SanjaySaigal

You might want to put the clarifications in the question itself, rather than as a comment. I think I've misread the original question slightly; will edit my original answer accordingly

(19 Dec '12, 17:50) yeesian

You're looking for some sort of graph decomposition algorithm/heuristic. (Disclaimer: graph algorithms are most definitely not my "thing", so read the rest of this answer with a jaundiced eye.) Here's one ad hoc approach:

  1. Treat supplier costs for each demand node as a vector of dimension \(|S|\) (per yeesian's answer), pick your favorite norm, and compute the norm of the difference in supplier cost vectors between each pair of demand nodes (\(O(|D|^2|S|)\)). Call this the distance between demand nodes.
  2. Select a cutoff for distance and create a new graph on the demand nodes, where two demand nodes are connected by an edge iff the distance between them does not exceed the cutoff (\(O(|D|^2)\)).
  3. Decompose the graph into maximal cliques, and collapse each clique into a pseudo/meta/ersatz demand node.

I'm pretty sure that finding an optimal decomposition is NP-hard, but I'm also pretty sure you can find a polynomial-time heuristic of hopefully low enough order. Here's a first cut:

  1. Compute the degree of each demand node (\(O(|D|^2)\)) and sort the list of demand nodes into descending degree order (\(O(|D|\log |D|)\) using a tree sort).
  2. Starting with the first node, compute a maximal clique by adding connected nodes in descending degree order so long as they are connected to all members of the clique (\(O(|D|^2)\) per node) and remove it from the graph.

This should be \(O(|D|^3)\) overall. I have no idea how it fares in terms of the average number of cliques generated.


answered 19 Dec '12, 11:09

Paul%20Rubin's gravatar image

Paul Rubin ♦♦
accept rate: 19%

Paul - yours is the path I also started down, noting that we should avoid recursive merging.

For instance, if ABCD and AEF are two cliques in the 'distance graph', first shrink ABCD into a supernode (call it Z). Avoid merging demands E and F into Z too. Though 'near' each other and A, they are not near B, C or D. Instead, create Z (=ABCD) and Y (=EF) to reduce |D| by 4.

I think you imply that when you mention removing shrunken cliques.

By iterating over cutoffs, we can get a tradeoff curve. Pick the best "bang for the buck", however that is defined, to get the desired clustering.

(19 Dec '12, 14:22) SanjaySaigal

Sanjay: Yes, there would be no recursive merging in my approach. If ABCD were the first maximal clique produced (implying that E and F were each too far from at least one of A, B, C or D), all four of A..D would be removed from the "to-do" list of nodes, and would be ineligible to belong to any other clique.

(19 Dec '12, 14:34) Paul Rubin ♦♦

Merging Demand Nodes

Thinking aloud, if you represent each demand node as a vector of costs from each supply node: [c1, ..., ck]*, perhaps you can embed the merging/unionisation of the nodes within a quicksort algorithm (in lexicographic order) as follows:

  1. select one of the demand nodes (at random)
  2. partition the rest of the demand nodes according to the selected node (cf. 1).
  3. merge those nodes that are 'equal' to the selected node during the partitioning step (cf. 2)
  4. recursively sort the remaining demand nodes

Assuming 'moving'/merging the nodes is O(1) in complexity, the overall algorithm should be about O(|S||D|log|D|). Perhaps someone else can come up with a more refined analysis (see also: @Paul Rubin's answer)

*ignoring the details of representing infinite costs (if any)


The main algorithm aside, I think you're really asking about classification. Or we might, as you've mentioned, call it cluster analysis. Given information about a set of nodes (either about their costs/relationships, or other historical data), you wish to classify them into a (fixed/unknown) discrete number of 'classes' (demand nodes).

If you have historical data (mentioned in your comment) that you wish to use to improve your classification, you can also consider using (machine) learning algorithms. However, given that you're just looking for nodes that are similar, and not really at predictive accuracy, this might be an unnecessary step. Nonetheless, you might still want to think about a good set of features (examples of domain-independent ones: arc costs, degree of each node), and an appropriate kernel (distance metric) to use.


Also, [separate from the answer, but included because it also answers the question]: some googling turned up fastgraph. The author has written a brief description of the implementation in his paper Efficiently Merging Graph Nodes.


answered 18 Dec '12, 10:03

yeesian's gravatar image

accept rate: 3%

edited 19 Dec '12, 18:32

Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text]( "Title")
  • image?![alt text](/path/img.jpg "Title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported



Asked: 18 Dec '12, 00:03

Seen: 2,299 times

Last updated: 19 Dec '12, 18:32

OR-Exchange! Your site for questions, answers, and announcements about operations research.