mahout-dev mailing list archives

From Sean Owen <>
Subject Re: Iterative jobs
Date Thu, 16 Jun 2011 17:54:40 GMT
I guess my question is: is it a better framework, and for what kind of
problem? It says it gains speed for iterative algorithms by keeping
data in memory. That's a fine tradeoff to make, but it sounds like
a point on the same "efficient frontier" of tradeoffs that any good
system lives on. It's also a tradeoff you can already kind of make on
Hadoop, and that some of the implementations here already do: loading
via the distributed cache or side-loading from HDFS. At first glance I'd
guess it's "a bit better" for some kinds of problems.

So what's the "cost" of using this? You certainly wouldn't want to
replace the Hadoop-based version as the audience for that is much
greater. Having an additional implementation doesn't hurt anyone. But
it's another dependency and item to support (unless it's not going to
get supported) and there is some small harm to having one isolated
orphan implementation in the project: we've been trying to kill rather
than feed such orphans lately.

I personally don't think that this project needs more algorithms, and
am directing all my time to what I view as more essential
infrastructure tasks. Or to put it another way: how about tidying up
the Hadoop side before moving on? That's just me.

My gut says it would be cool to implement the SVD on something like
this to see how it goes. I don't yet see that this is anything to move to.
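(For context on why in-memory reuse matters for an SVD: the iterative core of
such solvers is repeated passes over the same matrix, e.g. power iteration.
Each pass re-reads the matrix, which on plain MapReduce means reloading it
from HDFS every iteration, whereas an in-memory framework pays the load cost
once. A toy illustration of that repeated pass — not Mahout or Spark code,
just the shape of the loop in question:)

```python
# Toy power iteration: the kind of loop an SVD/eigensolver repeats.
# Every iteration makes one full pass over the same matrix `a`; the
# per-iteration reload of `a` is exactly the cost an in-memory
# framework claims to avoid.

def power_iteration(a, iterations=100):
    """Estimate the dominant eigenvalue/eigenvector of square matrix a."""
    n = len(a)
    v = [1.0] + [0.0] * (n - 1)   # arbitrary nonzero starting vector
    for _ in range(iterations):
        # One full pass over the matrix -- the repeated "job" at issue.
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = max(abs(x) for x in w)
        v = [x / norm for x in w]
    # Rayleigh quotient as the eigenvalue estimate.
    av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(av[i] * v[i] for i in range(n)) / sum(x * x for x in v)
    return lam, v

# Dominant eigenvalue of [[2, 1], [1, 2]] is 3, eigenvector ~ [1, 1].
lam, vec = power_iteration([[2.0, 1.0], [1.0, 2.0]])
```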

On Thu, Jun 16, 2011 at 6:34 PM, Hector Yee <> wrote:
> What do people think of using Spark for iterative jobs:
> Or is there a new version of hadoop that supports this kind of computation?
> --
> Yee Yang Li Hector
> (tech + travel)
> (book reviews)
