
Folks, I have worked on building and deploying optimization models on a single machine, and have never worked on, or completed coursework in, parallel optimization. It seems quite interesting and I would like to know more about it. May I ask the OR folks here to suggest resources on where to start? I would like to know how to set up a parallel computing platform (given two or more machines readily available) and how to program a simple LP using an existing parallel computing setup.

Also, is it possible to run parallel optimization models on Hadoop (either MapReduce or YARN)?

asked 08 Jan '14, 12:21 by Pavan


Hi Pavan,

If you're familiar with Python and its IPython interactive shell and other extras, you can set up a parallel computing infrastructure very easily. IPython's introduction to parallel computing will get you started in an hour or so (that's what it took for me).

Also, there is Sage, which allows you to easily define LPs and MILPs and also offers facilities for parallel computing.

I've used Python for scientific computing for a few years now and it strikes me as the right balance between fast execution and quick development. Furthermore, it's multi-platform (in case you need to bring together machines with different OSes).

The Sage approach is to use Python as a "glue" language to call highly efficient numerical libraries (linear algebra, LP, etc.), and it offers libraries for very high-level mathematics if that's useful for your needs.
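As a tiny illustration of that "glue" approach, here is a minimal sketch of defining and solving an LP from Python via SciPy's `linprog` (one of the efficient solver libraries Python can call into); the problem data is made up for illustration:

```python
# Solve a small made-up LP with SciPy:
#   minimize    -x - 2y
#   subject to   x +  y <= 4
#                x + 3y <= 6
#                x, y   >= 0
from scipy.optimize import linprog

c = [-1, -2]                    # objective coefficients (minimization form)
A_ub = [[1, 1], [1, 3]]         # inequality constraint matrix
b_ub = [4, 6]                   # inequality right-hand sides

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, res.fun)           # optimal at x = (3, 1) with objective -5
```

The same model could be handed to Sage's LP interface; the pattern, high-level Python describing the model and a compiled solver doing the work, is the point.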

Enjoy!

Eric

link

answered 08 Jan '14, 15:45 by EricP


An alternative: Julia (http://julialang.org/), which also has IJulia, similar to IPython (it runs on the same infrastructure). It also has great support for optimization; see http://juliaopt.org for a summary.

(08 Jan '14, 23:07) Iain Dunning

I'd agree it's very interesting, but I'm going to be a bit negative and caution that the types of optimization OR people usually consider are not easily parallelized.

  • Linear programming is very hard to parallelize. Simplex in particular cannot practically be made parallel, and the sparse nature of the linear algebra involved means that parallel linear algebra doesn't even improve interior point much. Gurobi does a clever thing where it runs simplex and interior point simultaneously, which brings me to another point:
  • One easy way to get into solving optimization problems in parallel is to use a serial algorithm in multiple ways, e.g. try many starting points simultaneously, or multiple parameter settings, or even multiple methods.
  • Hadoop/MapReduce is generally not applicable to optimization. MapReduce is designed for communication/data-bound problems, not compute-bound problems: it brings the computation to the data. There is usually no obvious way to turn an optimization problem into a single-stage map-reduce operation (multiple map-reduce stages are typically needed, with massive overhead penalties as a result).
  • The most promising application of parallel optimization is when the optimization problem is very large and decomposable. Benders decomposition for giant stochastic problems in energy is the best example: each subproblem can be solved in parallel. Even then, avoiding bottlenecks can be tough, and the subproblems must be sufficiently hard to solve to beat the communication overheads. This is a pretty great paper that shows some of the issues and ways around them: http://link.springer.com/article/10.1023/A:1021858008222
  • The recent trend in neural networks of doing "deep learning" on GPUs could also be considered parallel optimization.
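The "serial algorithm, multiple ways" idea from the second bullet can be sketched with nothing but Python's standard library. The objective function and starting points below are made up for illustration, and a thread pool stands in for the process pool, IPython engines, or separate machines you would use for real speedup:

```python
# Multi-start search: run the same serial local-search routine from many
# starting points in parallel and keep the best result found.
from concurrent.futures import ThreadPoolExecutor

def objective(x):
    # A made-up nonconvex function with two local minima (near x = -2 and x = 2).
    return (x ** 2 - 4) ** 2 + x

def local_search(x, step=0.01, iters=2000):
    # Crude hill descent: move while a neighbouring point improves the objective.
    for _ in range(iters):
        if objective(x + step) < objective(x):
            x += step
        elif objective(x - step) < objective(x):
            x -= step
        else:
            break
    return x, objective(x)

starts = [-3.0, -1.0, 0.5, 2.5]   # several starting points tried at once
with ThreadPoolExecutor() as pool:
    results = list(pool.map(local_search, starts))

best_x, best_f = min(results, key=lambda r: r[1])
print(best_x, best_f)             # the runs from -3.0 and -1.0 find the global basin
```

The same pattern works unchanged for multiple parameter settings or multiple methods: only the argument list to `map` changes.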

So, to summarize, "here be dragons". Exploiting structure in your problems is critical - there are no easy wins here for most OR problems.

On a more positive note, see my colleague Miles Lubin's blog post "Distributed Numerical Optimization" to see what is possible if that structure is present, and how it can be fairly easy with the right tools (here, Julia).


answered 08 Jan '14, 23:04 by Iain Dunning

edited 08 Jan '14, 23:05

Great answer. Many developers fail to understand that MapReduce doesn't work on optimization problems. I've been thinking of writing an article on that for some time.

(09 Jan '14, 04:32) Geoffrey De ... ♦

I believe you mean to say optimization problems cannot be solved in MapReduce. MapReduce is a layer on top of a given distributed file system, be it HDFS or the MapR filesystem. We don't configure/design MapReduce to solve optimization problems (hence "MapReduce doesn't work on optimization problems"). The idea is to design algorithms that solve optimization problems in MapReduce; whether that is feasible comes later.

(09 Jan '14, 06:35) Pavan

Many optimization problems can be described as a sequence of map-reduce operations, because most optimization algorithms take multiple iterations. For example, the multiplicative weights algorithm for solving LPs is very parallel within each iteration and is easily mapped to a map-reduce paradigm, but you need multiple such iterations to converge to a solution.
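A minimal sketch of that structure, with made-up loss data in the simple experts setting: each multiplicative-weights round splits into a per-expert update (the "map") and a global renormalization (the "reduce"), and convergence needs many such rounds:

```python
# One multiplicative-weights iteration expressed as map + reduce.
# Each expert's update can be computed independently (the "map"); the
# renormalization needs a global sum (the "reduce"). Converging requires
# repeating the whole map-reduce round many times.

def mw_round(weights, losses, eta=0.5):
    # "Map": apply the multiplicative update to each expert independently.
    updated = [w * (1 - eta * loss) for w, loss in zip(weights, losses)]
    # "Reduce": aggregate a global sum to renormalize into a distribution.
    total = sum(updated)
    return [u / total for u in updated]

weights = [0.25, 0.25, 0.25, 0.25]               # uniform start over 4 experts
losses_per_round = [[1.0, 0.2, 0.8, 0.5]] * 20   # made-up, fixed losses

for losses in losses_per_round:
    weights = mw_round(weights, losses)

print(weights)   # mass concentrates on the low-loss expert (index 1)
```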

I would also say that optimization rarely needs "big data" in raw form: usually you would use MapReduce to extract the relevant data/summary from your raw big data, then run the optimization on that. I have done this in practice.

(09 Jan '14, 07:44) Iain Dunning

Indeed, there's theoretical support for the claim that LP is hard to parallelize. LP is P-complete.

(09 Jan '14, 20:30) Austin Buchanan

Are you a CPLEX user? The latest version (12.6) has the ability to ship problems to remote processors (which must have CPLEX installed). Could be useful for, among other things, decomposition methods with multiple subproblems. I don't have docs handy, but it supports multiple communication protocols, including (I'm pretty sure) SSH.


answered 08 Jan '14, 18:59 by Paul Rubin ♦♦

In addition to what Paul says, CPLEX 12.6 includes a distributed parallel MIP solver. This solver builds on the capability Paul describes. Our distributed MIP runs out of the box; you don't need to do much setup beyond declaring the machines you'll be using.

I concur with the various comments on use cases. LP seems hard to distribute. I've seen basic linear algebra being distributed, but it is far from providing a useful LP algorithm. Decomposition methods are a good way to start. Lastly, distributing the branch-and-bound tree, as we do in our distributed MIP solver, is also a good one.


answered 09 Jan '14, 05:14 by jfpuget

@Geoffrey De Smet

I have used the stochastic gradient descent method to solve a logistic regression problem (sigmoid function) using MapReduce (in Hadoop using Mahout, on a multi-node cluster with 72 nodes). Also, I understand that a MapReduce version of conjugate gradient has been implemented.

Hence, convex optimization problems can be solved using MapReduce in Hadoop. My theory is that any algorithm that uses the gradient (and not the Hessian) could be implemented in the MapReduce paradigm.
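A toy sketch of that gradient idea (not the actual Mahout implementation): the batch gradient of the logistic loss decomposes into per-shard partial gradients ("map") that are summed ("reduce") before each descent step. The 1-D data below is made up:

```python
# Batch gradient descent for logistic regression, with the gradient
# computed as a map (per-shard partial gradients) and a reduce (sum).
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def shard_gradient(shard, w, b):
    # "Map": each shard computes its partial gradient independently.
    gw, gb = 0.0, 0.0
    for x, y in shard:
        err = sigmoid(w * x + b) - y
        gw += err * x
        gb += err
    return gw, gb

# Made-up 1-D data, pre-split into two "shards" (think: two HDFS blocks).
shards = [
    [(-2.0, 0), (-1.5, 0), (-1.0, 0)],
    [(1.0, 1), (1.5, 1), (2.0, 1)],
]

w, b, lr = 0.0, 0.0, 0.5
for _ in range(200):
    partials = [shard_gradient(s, w, b) for s in shards]   # map phase
    gw = sum(p[0] for p in partials)                       # reduce phase
    gb = sum(p[1] for p in partials)
    w -= lr * gw
    b -= lr * gb

print(w, b)   # a positive w separates the two made-up classes
```

Each descent step is one full map-reduce round, which is exactly where Iain's multiple-stage overhead point bites at Hadoop scale.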

But even the Newton-Raphson method seems to have been implemented in the MapReduce framework; look up "Map-Reduce for Machine Learning on Multicore".

Not to forget breadth-first search and shortest-path algorithms in MapReduce.

I have not personally solved the TSP using MapReduce, but Apache Mahout developers have built an evolutionary algorithm for solving the TSP, and yes, the procedure uses the MapReduce paradigm.

You can also look at the HAMA package (csl.skku.edu/papers/CS-TR-2010-330.pdf), which provides a matrix and graph computation framework for Hadoop. True, it uses BSP, not the MapReduce framework, and my question was about MapReduce/YARN. Also look up the pseudo-distributed mode of Hadoop -- it does not use the MapReduce layer. And I was actually hoping someone would comment/answer in the context of YARN.


answered 09 Jan '14, 06:29 by Pavan

edited 05 Mar '14, 09:53

You're MapReducing the computation (branches etc.). I was talking about MapReducing the problem data :) I'll clearly mark the distinction in the article to avoid confusion.

(09 Jan '14, 06:37) Geoffrey De ... ♦

Not sure what you mean. I merely exported CSV and TXT files into HDFS, and after I execute my code, MapReduce does the rest. The beauty here is that background daemons like the JobTracker and TaskTracker divide the data into chunks, process them on each slave node (map phase), and aggregate them (reduce phase) to produce the desired results. It's actually map, shuffle and sort/combine, reduce. Besides, can you give me an example where you "MapReduce the data"?

(09 Jan '14, 07:09) Pavan

MapReduce the problem data. For example, take a TSP dataset with 100k cities, divide it into 2 datasets of 50k cities each. In parallel, solve each one optimally. Then "reduce" those 2 solved datasets back to 1 dataset. Is it optimal? No. HTH :)

(10 Jan '14, 02:43) Geoffrey De ... ♦
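A minimal sketch of the split-solve-merge pattern Geoffrey describes, with made-up coordinates and a nearest-neighbour heuristic standing in for an exact solver (the point is the pattern and the loss of optimality, not the tour quality):

```python
# Split a TSP instance into two halves ("map" over the problem data),
# tour each half independently, then concatenate the sub-tours
# ("reduce"). The merged tour is feasible but generally not optimal.
import random

def nearest_neighbour_tour(cities):
    # Greedy tour construction, standing in for a real TSP solver.
    unvisited = list(cities)
    tour = [unvisited.pop(0)]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited,
                  key=lambda c: (c[0] - last[0]) ** 2 + (c[1] - last[1]) ** 2)
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

random.seed(0)
cities = [(random.random(), random.random()) for _ in range(100)]

# "Map": split the data and solve each piece independently.
halves = [cities[:50], cities[50:]]
sub_tours = [nearest_neighbour_tour(h) for h in halves]

# "Reduce": stitch the sub-tours back into one tour of all 100 cities.
merged = sub_tours[0] + sub_tours[1]
assert sorted(merged) == sorted(cities)   # feasible: every city exactly once
```

The single stitch between the two sub-tours is where optimality is lost, and the loss grows with the number of splits, which is the scaling concern raised in the comments below.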

@Geoffrey: In all fairness, one could certainly build a fairly good heuristic solution this way -- not sure about the "framework overhead", though [there are probably far more efficient ways to implement parallelized split/combine operators].

(10 Jan '14, 05:00) fbahr ♦

@fbahr Agreed. But its quality goes down as the number of splits increases, so it would scale out badly. It would be interesting to see a graph of split count vs. solution quality on different use cases to get some metrics (instead of the educated guesses we're making now).

(10 Jan '14, 05:28) Geoffrey De ... ♦

I finally got around to writing that article on MapReducing a planning problem. Enjoy!

(03 Mar '14, 11:21) Geoffrey De ... ♦

Looks good to me. I agree with you that splitting a problem will not yield optimal solutions.

(05 Mar '14, 09:54) Pavan
Seen: 4,892 times

Last updated: 05 Mar '14, 09:59

OR-Exchange! Your site for questions, answers, and announcements about operations research.