mahout-dev mailing list archives

From Grant Ingersoll <>
Subject Re: Board Report
Date Mon, 07 Apr 2014 11:53:47 GMT
To Sean's point, if Mahout were "my company", I would do the following pragmatic, albeit
not so pleasant, thing, assuming, of course, I had the $$$ to do so:

1. Clean up existing code with a laser focus on a few key areas (Sebastian's list makes sense)
using a part of the team and call it 1.0 and ship it, as it has a number of users and they
deserve to not have the rug pulled out from under them.  

2. Spin out a subset of the team to explore and prototype 2.0 based on two very positive and
re-energizing looking ideas:
	a. Scala DSL (and maybe Spark)
	b. 0xData
	All of the work for #2 would be done in a clean repo and would only bring in legacy code
where it was truly beneficial (back compat. can come later, if at all).
	It would then benchmark those two approaches as well as look at where they overlap and are
mutually beneficial and then go forward with the winner.

3. Once #2 is viable, put most effort into it and maintain 1.0 with as minimal support as
possible, encouraging, nay -- actively helping -- 1.0 customers to upgrade as quickly as possible.

The tricky part then becomes how do you make sure to still make your sales #'s while also
convincing them that your roadmap is what they are really buying.

If I didn't have the $$$ to do both of these (i.e. we need a massive turnaround and we have
one last shot), I would be all in on #2.


That being said, Mahout is not "my company".  Heck, Mahout is not even a "company", so we
don't need to be bound by company conventions and thought processes, even if that fits with
all of our individual day jobs.  And, thankfully, we don't have any sales numbers to make.

We are chartered with one and only one mission: produce open source, scalable machine learning
libraries under the Apache license and community driven principles.  We are not required by
the Board or anyone else to support version X for Y years or to use Hadoop or Scala or Java.
 We are also not required to implement any specific algorithms or deliver them on specific
time frames.  We are also not required to provide users upgrade paths or the like.  Naturally,
we _want_ to do these things for the sake of the community, but let's be clear: it is not
a requirement from the ASF.  We are, however, required to have a sustaining community.


I personally think we should start clean on #2, throw off the shackles of the past, and
emerge 6-9 months later with Mahout 2.0 (and yes, call it that, not 0.1 as Sebastian suggests,
for marketing reasons) built on a completely new and fresh repository, likely bringing in
only the Math/collections underpinnings and maybe the build system.  This new repository would
have only a handful of core algorithms that we know are well implemented, sustainable and
best in class.  

I think we should look at the lead up to 0.9 as an experiment that proved out a lot of interesting
ideas, including the fact that Mahout proved there is vast interest in open source large scale
machine learning and that it is the benchmark for comparison.  Not many other ML projects
can say that, even if they have better technical implementations or are less fragmented. 
Once you realize something has outlived its usefulness in software, however, there is no
point in lingering.

That being said, at least for the foreseeable future, I am not in a position to contribute
much code.  So, from my perspective, the ASF Meritocratic approach takes over:  those who
do the work make the decisions.  If you want something in, then put up the patch and ask for
feedback.  If no one provides feedback, assume lazy consensus and move forward.  Nothing convinces
people better than actual, real, executing code.  For my part, I am happy to continue to work
the bureaucratic side of things to make sure reports get filed, credentials get created, etc.,
and to contribute the occasional patch.  I hope one day I will have time to contribute again.

I will follow up w/ a separate email on what I am going to put in the Board Report.

On Apr 7, 2014, at 1:52 AM, Sean Owen <> wrote:

> No, it's about the opposite. I'm referring to the default, current
> state of play here.
> The issues for a vendor are demand and supportability. Do people want
> to pay for support of X? Can you honestly say you have expertise to
> support and influence X over at least a major release cycle (12-18
> months)? The latter needs a reasonably reliable roadmap and
> continuity.
> I'm suggesting that in the current state, demand is low and going
> down. The current code base seems de facto deprecated/unsupported
> already, and possibly to be removed or dramatically changed into
> something as-yet unclear. Nobody here seems to have taken a hard
> decision regarding a next major release, but, the trajectory of that
> decision seems clear if the current state remains the same.
> From my perspective, "middle-ground" new directions like adding a bit
> of H2O, a bit of Spark, leaving bits of M/R code around, etc. are only
> worse. I can see why there may be a little renewed demand for the new
> bits, but then, why not go all in on one of them?
> Because a substantially all-new direction is a different story. If a
> "Mahout2O" or "Spahout" ("Mark"?) emerges as a plan, I could imagine a
> lot of renewed demand. And a clearer underlying roadmap sounds
> possible. It would remain to be seen, but there's nothing stopping
> those ideas from becoming part of a distro too.
> On Mon, Apr 7, 2014 at 6:22 AM, Ted Dunning <> wrote:
>> Please be explicit here.  It sounds like you are saying that if Mahout goes
>> in the proposed new direction that Cloudera will drop Mahout.
>> Is that what you mean to say?
