mahout-dev mailing list archives

From Dmitriy Lyubimov
Subject Re: What is Mahout?
Date Thu, 26 Feb 2015 02:21:37 GMT
-1 on incubation as well. The website and docs and user lists and this
champion and mentor stuff, and logos and promotions for committers
absolutely do not make any sense at this point. From what I hear, people
are pretty busy as it is without having that. It would probably make more
sense to take both Andrews :) and committers who actively pursue the
programming environment vision to the PMC, and for people who feel that
they have no valuable input for the new philosophy of the project to just
go emeritus and give up their voting rights. "Power of do", as they say.

There's no major change in philosophy either -- Mahout has been proclaiming
"scalable machine learning", which is what we will continue doing. Only
doing it (hopefully) a bit more easily and with a new set of backend tools.

I want to emphasize that I'd seek math environment status in a more general
sense: not just algebraic, but also connecting this to stats, samplers,
optimizers (including Bayesian opts), feature extractors, i.e. all basic
big ML tools. Adapt Spark's DataFrame to these tools where appropriate.
Viewing it as solely distributed algebra is a bit skewed away from reality.
On private branches, I have previously developed a lot of that
functionality (except for the visual stuff) and it is in practice very
useful; it creates a common umbrella for people with an R background.

I would very much want to integrate something for visualization, as it is
important for an environment. Unfortunately, I don't see any mature science
plotting for the JVM around. Scatter plots at best. I want at least to be
able to plot 2D maps and KDEs with contours or density levels. There are
ways to visualize massive datasets (and their parts), but I see no tools for
this around at all. Maybe some clever way to integrate with ggplot2 or Shiny
Server? Even that, even if it required 3rd-party software installation,
would be better than nothing at all.

I don't expect methodologies to go to contrib, actually. Slightly different
modules, maybe, but nothing as extreme as contrib.

On Wed, Feb 25, 2015 at 5:18 PM, Andrew Musselman wrote:

> How much would be involved in changing the name of a top-level project?
> I'd prefer to avoid the overhead of going back into incubation.
> I agree 0.10 makes more sense.
> On Wed, Feb 25, 2015 at 12:16 PM, Sean Owen wrote:
> > My $0.02:
> >
> > There is no shortage of algorithm libraries that are in some way
> > runnable on Hadoop out there, and not as many easy-to-use distributed
> > matrix operation libraries. I think it's more additive to the
> > ecosystem to solve that narrow, and deep, linear algebra problem and
> > really nail it. That's a pretty good 'identity' to claim. It seems
> > like an appropriate scope.
> >
> > I do think the project has changed so much that it's more confusing to
> > keep calling it Mahout than to change the name. I can't think of one
> > person I've talked to about Mahout in the last 6 months who was not
> > under the impression that what is in 0.9 has simply been ported to
> > Spark. It's different enough that it could even be its own incubator
> > project (under a different name).
> >
> > The brand recognition is for the deprecated part so keeping that is
> > almost the problem. It's not crazy to just change the name. Or even
> > consider a re-incubation. It might give some latitude to more fully
> > reboot.
> >
> > Releasing 1.0.0 on the other hand means committing to the APIs (and
> > name) for some fairly new code and fairly soon. Given that this is
> > sort of a 0.1 of a new project, going to 1.0 feels semantically wrong.
> > But a release would be good. Personally I'd suggest 0.10.
> >
> > On Wed, Feb 25, 2015 at 5:50 PM, Pat Ferrel wrote:
> > > Looking back over the last year Mahout has gone through a lot of
> > > changes. Most users are still using the legacy mapreduce code and
> > > new users have mostly looked elsewhere.
> > >
> > > The fact that people as knowledgeable as former committers compare
> > > Mahout to Oryx or MLlib seems odd to me because Mahout is neither a
> > > server nor a loose collection of algorithms. It was the latter until
> > > all of mapreduce was moved to legacy and “no new mapreduce” was the
> > > rule.
> > >
> > > But what is it now? What is unique and of value? Is it destined to
> > > be late to the party and chasing the algo checklists of things like
> > > MLlib?
> > >
> > > First a slight digression. I looked at moving itemsimilarity to raw
> > > Spark if only to remove mrlegacy from the dependencies. At about the
> > > same time another Mahouter asked the Spark list how to transpose a
> > > matrix. He got the answer “why would you want to do that?” The
> > > fairly high performance algorithm behind spark-itemsimilarity was
> > > designed by Sebastian and requires an optimized A’A, A’B, A’C… and
> > > spark-rowsimilarity requires AA’. None of these are provided by
> > > MLlib. No actual transpose is required so these two things should be
> > > seen as separate comments about MLlib. The moral: unless I want to
> > > write optimized matrix transpose-and-multiply solvers I will stick
> > > with Mahout.
> > >
> > > So back to Mahout’s unique value. Mahout today is a general linear
> > > algebra lib and environment that performs optimized calculations on
> > > modern engines like Spark. It is something like a Scala-fied R on
> > > Spark (or other engine).
> > >
> > > If this is true then spark-itemsimilarity can be seen as a
> > > package/add-on that requires Mahout’s core Linear Algebra.
> > >
> > > Why use Mahout? Use it if you need scalable general linear algebra.
> > > That’s not what MLlib does well.
> > >
> > > Should we be chasing MLlib’s algo list? Why would we? If we need
> > > some algo, why not consume it directly from MLlib or somewhere else?
> > > Why is a reimplementation important all else being equal?
> > >
> > > Is general scalable linear algebra sufficient for all important ML
> > > algos? Certainly not. For instance, streaming ones, and in
> > > particular online-updated streaming algos, may have little to gain
> > > from Mahout as it is today.
> > >
> > > If the above is true then Mahout is nothing like what it was in 0.9
> > > and is being unfairly compared to 0.9 and other things like that.
> > > This misunderstanding of what Mahout _is_ leads to misapplied
> > > criticism and lack of use for what it does well. At the very least
> > > this all implies a very different description on the CMS; at most,
> > > maybe something as drastic as a name change.
> > >
> > >
> >
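
On the A’A point in Pat's message above: the product can be accumulated as a sum of per-row outer products, so no transposed copy of A is ever built. What follows is only a minimal sketch in plain Python, assuming a small dense matrix passed as a list of rows. The helper name gram_from_rows is made up for illustration, and this is not Mahout's or Spark's actual implementation.

```python
from typing import Iterable, List, Sequence


def gram_from_rows(rows: Iterable[Sequence[float]], n_cols: int) -> List[List[float]]:
    """Accumulate A'A one row at a time without materializing A'."""
    gram = [[0.0] * n_cols for _ in range(n_cols)]
    for row in rows:
        # Each row r of A contributes the outer product r' r to A'A.
        for i, ri in enumerate(row):
            if ri == 0.0:
                continue  # zero entries contribute nothing to row i of the result
            for j, rj in enumerate(row):
                gram[i][j] += ri * rj
    return gram


# Small hypothetical input: a 3 x 2 matrix given row by row.
a = [[1.0, 2.0], [3.0, 4.0], [0.0, 5.0]]
ata = gram_from_rows(a, 2)
print(ata)
```

Because each row's contribution is independent, partial sums computed over separate chunks of rows could presumably be added together at the end, which is what makes this formulation attractive on a distributed engine.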
