Is anyone working on Apache Mahout (basically, a machine learning framework)? Is it possible to link Mahout with Python to create machine learning MapReduce jobs? Also, are there any techniques that can be used to compute the inverse of a matrix in MapReduce? |
At Devoxx 2010 there was an interesting talk on Mahout (video here). For Mahout-specific questions, you're probably better off on their mailing list (which is available as a newsgroup on gmane.org). |
@Fbahr @Ted Dunning: Thanks for your responses, I will check out the links. I can work with Java and Python, so there is no constraint on the language I use. I do have another question, though: irrespective of Python or Java, is there a method for determining the inverse of a matrix in the MapReduce paradigm? The possibilities I have tried are:
Any textbook recommendations on matrix decomposition techniques with worked examples and possibly pseudocode would be really helpful.
First of all... w.l.o.g.: Don’t invert that matrix ;-)
(25 Sep '13, 05:20)
fbahr ♦
But if you actually need that matrix inverse, w/ Mahout you can...
(25 Sep '13, 06:59)
fbahr ♦
Edit: ...although @Ted Dunning seems to agree w/ my first comment, he rather suggests using QR decomposition.
(25 Sep '13, 07:11)
fbahr ♦
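To illustrate the "don't invert that matrix" advice together with the QR suggestion: if the inverse is only needed to solve A x = b, one can factor A = Q R and solve R x = Qᵀ b by back-substitution, never forming A⁻¹. Below is a minimal single-machine sketch in plain Python, assuming a small, square, full-rank matrix given as a list of rows. The function names `qr_decompose` and `solve_qr` are made up for this illustration; this is not Mahout's implementation and not a MapReduce job.

```python
def qr_decompose(a):
    """Modified Gram-Schmidt QR of a square matrix given as a list of rows.

    Returns (q_cols, r): q_cols[k] is the k-th orthonormal column of Q,
    and r is the upper-triangular factor as a list of rows.
    """
    n = len(a)
    # cols[j] is column j of A.
    cols = [[a[i][j] for i in range(n)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        # Remove the components along the columns of Q found so far.
        for k, q in enumerate(q_cols):
            r[k][j] = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - r[k][j] * qi for qi, vi in zip(q, v)]
        norm = sum(vi * vi for vi in v) ** 0.5
        if norm == 0.0:
            raise ValueError("matrix is singular")
        r[j][j] = norm
        q_cols.append([vi / norm for vi in v])
    return q_cols, r


def solve_qr(a, b):
    """Solve A x = b via R x = Q^T b, without forming the inverse of A."""
    q_cols, r = qr_decompose(a)
    n = len(b)
    y = [sum(qi * bi for qi, bi in zip(q, b)) for q in q_cols]  # Q^T b
    x = [0.0] * n
    # Back-substitution on the upper-triangular system R x = y.
    for i in range(n - 1, -1, -1):
        s = sum(r[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / r[i][i]
    return x


if __name__ == "__main__":
    # 4x + 3y = 10 and 6x + 3y = 12
    print(solve_qr([[4.0, 3.0], [6.0, 3.0]], [10.0, 12.0]))
```

For real work in Python, NumPy's `numpy.linalg.solve` or `numpy.linalg.qr` would be the usual choice; the loops above are only meant to make the steps visible.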
P.S.: A quick search on Google also pointed to a bunch of academic papers dealing with map/reduce solutions to matrix problems – maybe this thesis has something for you: https://uwspace.uwaterloo.ca/bitstream/10012/7830/1/Xiang_Jingen.pdf [Chapter 3: Matrix Inversion Using MapReduce]
(25 Sep '13, 07:16)
fbahr ♦
|
Jython is another option. But even as a Mahout committer, I would recommend you check out scikit-learn and NumPy. I say this because: a) scikit is Python (and C), so it feels and tastes like Python; b) if you are using Python, you probably aren't worried about really big data. As such, you probably care more about the breadth of techniques available than about the ability to scale to very large data sets. Mahout's goal is scaling machine learning. Scikit's goal is breadth of the collection of algorithms. My guess is that scikit matches your needs better than Mahout, irrespective of language mismatch. |
AFAIK, the only way to use Mahout through Python is still by means of JPype, as outlined here: http://bayesianbrain.blogspot.de/2011/03/mahout-and-python-integration-using.html [Disclaimer: That's just what Google "told" me.]
On the other hand, it's fairly easy to write map/reduce jobs [for Hadoop] in Python using Hadoop Streaming – as, for instance, explained here: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/ – or mrjob, hadoopy, ...
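For a rough idea of what a Hadoop Streaming job in Python looks like, here is a minimal word-count sketch. Streaming passes records to the mapper and reducer on stdin as lines, expects tab-separated key/value lines on stdout, and sorts the mapper output by key before the reducer sees it. The function names (`mapper`, `reducer`, `run_local`) and the `map`/`reduce` command-line argument are assumptions made for this example, not part of Hadoop or of the linked tutorial.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit (word, 1) for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reducer(pairs):
    """Sum the counts per word; pairs must arrive sorted by key,
    which is what Hadoop Streaming guarantees for reducer input."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort -> reduce in one process, for testing."""
    return dict(reducer(sorted(mapper(lines))))


if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "map":
        # Mapper side: raw text lines in, "word<TAB>1" lines out.
        for word, count in mapper(sys.stdin):
            print(f"{word}\t{count}")
    elif sys.argv[1] == "reduce":
        # Reducer side: parse the sorted "word<TAB>count" lines back into pairs.
        parsed = (line.rstrip("\n").split("\t", 1) for line in sys.stdin)
        pairs = ((word, int(count)) for word, count in parsed)
        for word, total in reducer(pairs):
            print(f"{word}\t{total}")
```

On a cluster, the same script would be handed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options (for instance with the `map` and `reduce` arguments assumed above); `run_local` only exists so the logic can be checked without Hadoop.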
Anyway, if you prefer to work in Python, you should probably consider scikit-learn or some other "native" library [1,2] (as an alternative to Mahout).