Consider a network of supplies and demands. S << D. To shrink our (complex) associated transportation problem, we wish to join isocost demand nodes. Specifically,
This works because we can split a solution to the reduced problem according to the demands at d1 and d2. This should a simple sequential procedure. We can stop when all demand node pairs have been considered without any combination occurring. (O(D^2) is good enough for this practical problem.) Sadly, there aren't a lot of isocost nodes. So the aboveoutlined procedure isn't too effective in reducing problem size. I've been noodling with a relaxation  combining 'almost' isocost demand nodes. For instance, if two demand nodes have slightly differing transportation costs for a supply node or two but are the same for all other supply nodes, combining them should degrade accuracy only a smidgen. Does this sort of node clustering sound familiar? asked 18 Dec '12, 00:03 SanjaySaigal 
You're looking for some sort of graph decomposition algorithm/heuristic. (Disclaimer: graph algorithms are most definitely not my "thing", so read the rest of this answer with a jaundiced eye.) Here's one ad hoc approach:
I'm pretty sure that finding an optimal decomposition is NPhard, but I'm also pretty sure you can find a polynomialtime heuristic of hopefully low enough order. Here's a first cut:
This should be \(O(D^3)\) overall. I have no idea how it fares in terms of the average number of cliques generated. answered 19 Dec '12, 11:09 Paul Rubin ♦♦ Paul  yours is the path I also started down, noting that we should avoid recursive merging. For instance, if ABCD and AEF are two cliques in the 'distance graph', first shrink ABCD into a supernode (call it Z). Avoid merging demands E and F into Z too. Though 'near' each other and A, they are not near B, C or D. Instead, create Z (=ABCD) and Y (=EF) to reduce D by 4. I think you imply that when you mention removing shrunken cliques. By iterating over cutoffs, we can get a tradeoff curve. Pick the best "bang for the buck", however that is defined, to get the desired clustering.
(19 Dec '12, 14:22)
SanjaySaigal
Sanjay: Yes, there would be no recursive merging in my approach. If ABCD were the first maximal clique produced (implying that E and F were each too far from at least one of A, B, C or D), all four of A..D would be removed from the "todo" list of nodes, and would be ineligible to belong to any other clique.
(19 Dec '12, 14:34)
Paul Rubin ♦♦

Merging Demand NodesThinking aloud, if you represent each demand node as a vector of costs from each supply node: [c1, ..., ck]*, perhaps you can embed the merging/unionisation of the nodes within a quicksort algorithm (in lexicographic order) as follows:
Assuming 'moving'/merging the nodes is O(1) in complexity, the overall algorithm should be about O(SDlogD). Perhaps someone else can come up with a more refined analysis (see also: @Paul Rubin's answer) *ignoring the details of representing infinite costs (if any) ClassificationThe main algorithm aside, I think you're really asking about classification. Or we might, as you've mentioned, call it cluster analysis. Given information about a set of nodes (either about their costs/relationships, or other historical data), you wish to classify them into a (fixed/unknown) discrete number of 'classes' (demand nodes). If you have historical data (mentioned in your comment) that you wish to use to improve your classification, you can also consider using (machine) learning algorithms. However, given that you're just looking for nodes that are similar, and not really at predictive accuracy, this might be an unnecessary step. Nonetheless, you might still want to think about a good set of features (examples of domainindependent ones: arc costs, degree of each node), and an appropriate kernel (distance metric) to use. FastgraphAlso, [separate from the answer, but included because it also answers the question]: some googling turned up fastgraph. The author has written a brief description of the implementation in his paper Efficiently Merging Graph Nodes. answered 18 Dec '12, 10:03 yeesian 
I'm not sure whether this is a graph theory problem or not, but it seems more like a preprocessing technique for a specific problem.
Couple of clarifications to my original question:
Thanks all.
You might want to put the clarifications in the question itself, rather than as a comment. I think I've misread the original question slightly; will edit my original answer accordingly