mahout-user mailing list archives

From Prashant Sharma <prashant.ii...@gmail.com>
Subject Re: Has anyone tried Spark with Mahout?
Date Mon, 31 Oct 2011 11:20:50 GMT
This is nice! The only problem is that one would have to learn a new paradigm.
People have a habit of sticking to what they are familiar with.
-P

On Mon, Oct 31, 2011 at 4:39 PM, Nick Pentreath <nick.pentreath@gmail.com> wrote:

> I have this crazy idea to combine Scalala (which aims to be a library
> for linear algebra in Scala, based on netlib-java, that provides
> Matlab / numpy like syntax and plotting), scalanlp (same developer as
> Scalala, focused on NLP/ML algorithms), Spark and Mahout in some way,
> to create a Matlab-like environment (or better an IPython-like
> super-shell, that could also be integrated into a GUI) that allows you
> to write code that seamlessly operates locally and across a Hadoop
> cluster using Spark's framework.
>
> Ideally it would wrap / port Mahout's distributed matrix operations
> (multiplication, SVD, other decompositions etc), as well as SGD and
> some others, and integrate scalanlp's algorithms. It would be
> seamless in the sense that calling, say, A * B, or SVD on a matrix in
> local mode or cluster mode is exactly the same, save for setting
> Spark's context to be local vs cluster (and specifying the HDFS
> location of the data for cluster mode etc) - this is based on
> Scalala's idea of optimised code paths depending on the matrix type.
> This would allow rapid prototyping on a local machine / test cluster,
> and deploying the exact same code across huge clusters...
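
A minimal sketch of the "same call, different code path by matrix type" idea described above, in plain Python rather than Scala. The class names `LocalMatrix` and `PartitionedMatrix` and the function `product` are hypothetical, and the thread pool is only a stand-in for a cluster backend such as Spark; nothing here is Scalala's, Spark's or Mahout's actual API.

```python
from concurrent.futures import ThreadPoolExecutor


class LocalMatrix:
    """Dense matrix held in memory as a list of rows (hypothetical type)."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __mul__(self, other):
        # Plain single-process matrix product.
        cols = list(zip(*other.rows))
        return LocalMatrix(
            [[sum(a * b for a, b in zip(row, col)) for col in cols]
             for row in self.rows]
        )


class PartitionedMatrix(LocalMatrix):
    """Same interface, but row blocks are multiplied by a pool of workers.

    The thread pool is an assumption standing in for a real cluster backend.
    """

    def __init__(self, rows, num_partitions=2):
        super().__init__(rows)
        self.num_partitions = num_partitions

    def __mul__(self, other):
        n = len(self.rows)
        step = max(1, -(-n // self.num_partitions))  # ceiling division
        blocks = [self.rows[i:i + step] for i in range(0, n, step)]
        with ThreadPoolExecutor(max_workers=self.num_partitions) as pool:
            # Each worker multiplies one block of rows by the right-hand matrix.
            partials = pool.map(
                lambda block: (LocalMatrix(block) * other).rows, blocks
            )
            result_rows = [row for part in partials for row in part]
        return PartitionedMatrix(result_rows, self.num_partitions)


def product(a, b):
    """User code: identical whichever matrix type it is given."""
    return a * b
```

Calling `product(LocalMatrix(a), LocalMatrix(b))` and `product(PartitionedMatrix(a), LocalMatrix(b))` runs the same user code; only the type of the left operand selects the execution path.
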
>
> I don't have enough experience yet with Mahout, let alone Scala and
> Scalala, to think about tackling this, but I wonder if this is
> something people would like to see?!
>
> n
>
> On 20 Oct 2011, at 16:30, Josh Patterson <josh@cloudera.com> wrote:
>
> > I've run some tests with Spark in general; it's a pretty interesting
> > setup.
> >
> > I think the most interesting aspect (relevant to what you are asking
> > about) is that Matei already has Spark running on top of MRv2:
> >
> > https://github.com/mesos/spark-yarn
> >
> > (you don't have to run Mesos, but the YARN code needs to be able to see
> > the jar in order to do its scheduling stuff)
> >
> > I've been playing around with writing a genetic algorithm in
> > Scala/Spark to run on MRv2, and in the process got introduced to the
> > book:
> >
> > "Parallel Iterative Algorithms, From Sequential to Grid Computing"
> >
> > which talks about strategies for parallelizing highly iterative
> > algorithms and the inherent issues involved (sync/async iterations,
> > sync/async communications, etc). Since you can use Spark as a
> > "BSP-style" framework (ignoring the RDDs if you like) and just shoot
> > out slices of an array of items to be processed (relatively fast
> > compared to MR), it has some interesting properties/tradeoffs to take a
> > look at.
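
A rough sketch of the synchronous, "BSP-style" pattern mentioned above: slice an array, hand each slice to a worker, wait for all of them, combine, and repeat. It uses a local thread pool as a stand-in for a cluster and does not use Spark's API; the function names `slices`, `partial_gradient` and `fit_mean`, and the toy problem (finding a mean by gradient descent), are assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def slices(items, n):
    """Split items into at most n contiguous slices."""
    step = max(1, -(-len(items) // n))  # ceiling division
    return [items[i:i + step] for i in range(0, len(items), step)]


def partial_gradient(args):
    """Worker step: gradient of sum((x - m)^2) with respect to m, on one slice."""
    chunk, m = args
    return sum(2.0 * (m - x) for x in chunk)


def fit_mean(items, num_slices=4, supersteps=50, lr=0.25):
    """Synchronous iteration: every superstep waits for all slices."""
    m = 0.0
    parts = slices(items, num_slices)
    with ThreadPoolExecutor(max_workers=num_slices) as pool:
        for _ in range(supersteps):
            # Barrier: list() blocks until every worker has returned.
            grads = list(pool.map(partial_gradient, [(p, m) for p in parts]))
            # Combine the partial results and update the shared state.
            m -= lr * sum(grads) / len(items)
    return m
```

The barrier at the end of each superstep is what makes this the "sync iterations" case; an asynchronous variant would let workers proceed with a stale `m` instead of waiting.
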
> >
> > Toward the end of my ATL Hug talk I mentioned the possibility of how
> > MRv2 could be used with other frameworks, like Spark, to be better
> > suited for other algorithms (in this case, highly iterative):
> >
> > http://www.slideshare.net/jpatanooga/machine-learning-and-hadoop
> >
> > I think it would be interesting to have Mahout sitting on top of MRv2,
> > like Ted is referring to, and then have an algorithm matched to a
> > framework on YARN and a workflow that mixed and matched these
> > combinations.
> >
> > Lots of possibilities here.
> >
> > JP
> >
> >
> > On Wed, Oct 19, 2011 at 10:42 PM, Ted Dunning <ted.dunning@gmail.com>
> > wrote:
> >> Spark is very cool but very incompatible with Hadoop code.  Many Mahout
> >> algorithms would run much faster on Spark, but you will have to do the
> >> porting yourself.
> >>
> >> Let us know how it turns out!
> >>
> >> 2011/10/19 WangRamon <ramon_wang@hotmail.com>
> >>
> >>> Hi All,
> >>>
> >>> I was told today that Spark is a much better platform for cluster
> >>> computing than Hadoop, at least for recommendation computing. I'm
> >>> still very new to this area. If anyone has investigated Spark, can
> >>> you please share your thoughts here? Thank you very much.
> >>>
> >>> Thanks,
> >>> Ramon
> >>>
> >>
> >
> >
> >
> > --
> > Twitter: @jpatanooga
> > Solution Architect @ Cloudera
> > hadoop: http://www.cloudera.com
>
