mahout-dev mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Board Report
Date Mon, 07 Apr 2014 11:37:55 GMT
+1 and agree   

I might have a little longer off ramp for the old style.  I don't see a strong need to completely
revamp the map-reduce based code.  Nor is the legacy stuff around the preference database
worth salvaging.  

It cannot reasonably be argued that usage is low and declining while simultaneously saying that
perpetual support of old code is required.


Sent from my iPhone

> On Apr 7, 2014, at 4:08, Suneel Marthi <suneel_marthi@yahoo.com> wrote:
> 
> +1 and agree with ssc's suggestion.
> 
> 
> 
> Sent from my iPhone
> 
>> On Apr 7, 2014, at 3:30 AM, Sebastian Schelter <ssc@apache.org> wrote:
>> 
>> I agree that the state of the MR code is something that needs to be addressed. There
>> have been several attempts to rework/refactor it, but none of them had a satisfactory result
>> unfortunately.
>> 
>> I'm hearing that there is a lack of a coherent vision for the future of Mahout. Let
>> me suggest a radical one.
>> 
>> - call the next release 0.10 not 1.0, as the latter implies a maturity which does
>> not reflect the radical changes I'm proposing
>> 
>> - move all the MR code to a new maven module, deprecate it and announce that we will delete
>> it in the release after 0.11
>> 
>> - make the new DSL the heart of Mahout, aim for the following algorithms to be implemented
>> in the DSL as a new basis:
>> 
>> Collaborative Filtering:
>> 
>> * Cooccurrence-based recommender (work started in MAHOUT-1464)
>> * ALS (work started in MAHOUT-1365)
>> 
>> Clustering:
>> 
>> * k-Means
>> * Streaming k-Means
>> 
>> Classification:
>> 
>> * NaiveBayes (work started in MAHOUT-1493)
>> * either Random Forests or an ensemble of SGD classifiers
>> 
>> Dimensionality Reduction / Topic Models:
>> 
>> * SSVD (prototype in trunk)
>> * PCA (prototype in trunk)
>> * LDA
>> 
>> 
>> - integrate Stratosphere / H2O as follows:
>> 
>> * the Stratosphere guys can choose to implement the physical operators of the DSL
>> to make our algos run on Stratosphere. If they do, this is great for Mahout as it allows people
>> to run code on different backends. If they don't, we don't lose anything.
>> 
>> * a major point in porting the algorithms to the DSL would be to make the input formats
>> of all algorithms consistent. That would allow H2O to work off the same inputs as the Scala DSL.
>> 
>> Let me know what you think.
>> 
>> -s
>> 
>> 
>> 
>> 
>> 
>>> On 04/06/2014 05:54 PM, Sean Owen wrote:
>>> On Sun, Apr 6, 2014 at 4:16 PM, Andrew Musselman
>>> <andrew.musselman@gmail.com> wrote:
>>>> Seems to me there has been a renewed effort to eat our broccoli, along with
>>>> the other ideas people have been bringing on board.
>>>> 
>>>> What are you proposing to put in the board report?
>>> 
>>> I have not seen significant activity to unify or update the existing
>>> code. It's still the same different chunks with different styles,
>>> input/output, distributed/not, etc. The doc updates look very
>>> positive. To be fair the task of really addressing the technical debt
>>> is very large, so even making a dent would be a lot of work. A
>>> clean-slate reboot therefore actually seems like a good plan, but
>>> that's another question...
>>> 
>>> Concretely, in a board report, I personally would not agree with
>>> representing the Spark or H2O work as an agreed future plan or
>>> roadmap, right now. Being in the board report makes that impression,
>>> as have recent articles/tweets I've seen, so it deserves care. That's
>>> why I chimed in, maybe tilting at windmills.
>>> 
>>> From where I sit with customers, the overall impression is negative
>>> among those that have tried to use the code, and usage has gone from
>>> few to almost none. I doubt my sample is so different from the whole
>>> user population. Much of it is consistency/quality, but some of it's
>>> just an interest in non-M/R frameworks.
>>> 
>>> So, I think that current state and set of problems is far more
>>> important to acknowledge in a board report than just mentioning some
>>> future possibilities, and the latter was the impression I got of the
>>> likely content. In fact, it makes the talk about large upcoming
>>> possible changes make so much more sense.
>> 
