Is anyone working on Apache Mahout (basically, it is a machine learning framework)? Is it possible to link Mahout with Python to create machine learning MapReduce jobs?

Also, are there any techniques that can be used to compute the inverse of a matrix in MapReduce?

asked 24 Sep '13, 07:16


Pavan

edited 25 Sep '13, 03:08

Afaik, the only way to use Mahout through Python is still by means of JPype – as outlined here: http://bayesianbrain.blogspot.de/2011/03/mahout-and-python-integration-using.html [Disclaimer: That's just what Google "told" me.]

(24 Sep '13, 11:20) fbahr ♦

On the other hand, it's fairly easy to write map/reduce jobs [for Hadoop] in Python using Hadoop Streaming – as, for instance, explained here: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/ – or mrjob, hadoopy, ...

(24 Sep '13, 11:20) fbahr ♦
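To make the Hadoop Streaming idea concrete, here is a minimal word-count sketch. With Streaming, the mapper and reducer are ordinary programs that read lines from stdin and write tab-separated key/value records to stdout, and Hadoop sorts the mapper output by key before the reducer sees it. The function names (`mapper`, `reducer`, `run_locally`) are hypothetical, and `run_locally` only simulates the shuffle/sort step in one process; it is not part of any Hadoop API.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts per word; records must arrive sorted by key."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Hypothetical helper: simulate map -> sort -> reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))
```

On a cluster, each function would live in its own script that loops over `sys.stdin` and prints the records, and the two scripts would be passed to the Hadoop Streaming jar as the mapper and reducer commands.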

Anyway, if you prefer to work in Python, you should probably consider scikit-learn or some other "native" library [1,2] (as an alternative to Mahout).

(24 Sep '13, 11:23) fbahr ♦

Jython is another option.

But even as a Mahout committer, I would recommend you check out scikit-learn and NumPy. I say this because:

a) scikit-learn is Python (and C), so it feels and tastes like Python

b) if you are using Python, you probably aren't worried about really big data. As such, you probably care more about the breadth of techniques available than about the ability to scale to very large data sets. Mahout's goal is scaling machine learning. scikit-learn's goal is breadth of the collection of algorithms. My guess is that scikit-learn matches your needs better than Mahout, irrespective of language mismatch.


answered 24 Sep '13, 18:56


Ted Dunning

At Devoxx 2010 there was an interesting talk on Mahout; video here.

For Mahout-specific questions, you're probably better off on their mailing list (which is available as a newsgroup on gmane.org).


answered 25 Sep '13, 08:41

Geoffrey De Smet ♦


@fbahr @Ted Dunning: Thanks for your responses, I will check out the links. I can work with Java and Python, so there is no constraint on the language I use. I do have another question though:

Irrespective of Python or Java, is there a way to compute the inverse of a matrix in the MapReduce paradigm? The possibilities I have tried are:

  1. Traditional approach via determinant and cofactors: not feasible, because the determinant is computed recursively and the cofactor computation has a dependency issue (all blocks would be required in the reducer).
  2. Gaussian Elimination: Currently working on this.
  3. Singular Value Decomposition: Is it possible to find the inverse of a matrix using SVD? (I've never worked with SVD before)
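For reference, option 2 can be sketched on a single machine as Gauss-Jordan elimination on the augmented matrix [A | I]. This is plain Python, not a MapReduce job; the function name `invert` and the tolerance `tol` are illustrative assumptions. The pivot steps are sequential, but within one pivot step the row updates do not depend on each other, which is the part a distributed version could try to parallelise.

```python
def invert(matrix, tol=1e-12):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity: [A | I]
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: choose the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < tol:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so that the pivot entry becomes 1
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Clear this column in every other row; these updates are independent
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse
    return [row[n:] for row in aug]
```

For example, `invert([[4, 7], [2, 6]])` gives approximately `[[0.6, -0.7], [-0.2, 0.4]]`.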

Any textbook recommendations on matrix decomposition techniques with worked examples and possibly pseudocode would be really helpful.


answered 25 Sep '13, 03:06


Pavan

First of all... w.l.o.g.: Don’t invert that matrix ;-)

(25 Sep '13, 05:20) fbahr ♦

But if you actually need that matrix inverse, w/ Mahout you can... compute the SVD and run matrix multiplication jobs on Hadoop (for computing a matrix inverse by SVD: http://adrianboeing.blogspot.de/2010/05/inverting-matrix-svd-singular-value.html)

(25 Sep '13, 06:59) fbahr ♦
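The SVD route rests on the identity A = U S V^T, which gives A^-1 = V S^-1 U^T when all singular values are non-zero. Below is a minimal single-machine sketch with NumPy, not Mahout code; the function name `inverse_via_svd` and the `rcond` threshold are assumptions made for illustration.

```python
import numpy as np


def inverse_via_svd(a, rcond=1e-12):
    """Invert a square matrix from its SVD: A = U S V^T  =>  A^-1 = V S^-1 U^T."""
    a = np.asarray(a, dtype=float)
    u, s, vt = np.linalg.svd(a)  # singular values come back in descending order
    if s[-1] <= rcond * s[0]:
        raise ValueError("matrix is singular or too ill-conditioned to invert")
    # Dividing the columns of V by the singular values applies S^-1
    # without building the diagonal matrix explicitly
    return (vt.T / s) @ u.T
```

In a distributed setting the SVD itself and the two matrix products would be separate jobs. If the inverse is only needed to solve A x = b, `np.linalg.solve(a, b)` avoids forming it at all, in the spirit of the "don't invert" advice.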

Edit: ...although @Ted Dunning seems to agree w/ my first comment, he rather suggests using QR decomposition.

(25 Sep '13, 07:11) fbahr ♦
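As an illustration of why QR helps (this is a sketch of the general technique, not a reproduction of what @Ted Dunning wrote): with A = Q R, where Q is orthogonal and R is upper triangular, the system A x = b becomes R x = Q^T b, which back substitution solves without ever forming A^-1. The function name `solve_via_qr` is hypothetical, and the code assumes a square, non-singular A.

```python
import numpy as np


def solve_via_qr(a, b):
    """Solve A x = b through A = Q R, without forming the inverse of A."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    q, r = np.linalg.qr(a)
    y = q.T @ b  # Q is orthogonal, so its inverse is its transpose
    n = r.shape[0]
    x = np.zeros(n)
    # Back substitution on the upper-triangular system R x = y
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - r[i, i + 1:] @ x[i + 1:]) / r[i, i]
    return x
```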

P.S.: A quick search on Google also pointed to a bunch of academic papers dealing with map/reduce solutions to matrix problems – maybe this thesis has something for you: https://uwspace.uwaterloo.ca/bitstream/10012/7830/1/Xiang_Jingen.pdf [Chapter 3: Matrix Inversion Using MapReduce]

(25 Sep '13, 07:16) fbahr ♦

I did like that "don't invert" article, and I was on the right track by exploring SVD. Thanks @fbahr for the thesis recommendation. I skimmed through the thesis as I type and found that the author implemented LU decomposition in MapReduce. That's a start!

(25 Sep '13, 08:27) Pavan
Asked: 24 Sep '13, 07:16

Seen: 3,313 times

Last updated: 25 Sep '13, 15:53

OR-Exchange! Your site for questions, answers, and announcements about operations research.