mahout-dev mailing list archives

From Suneel Marthi <smar...@apache.org>
Subject Re: Tackling the "legacy dilemma"
Date Wed, 16 Apr 2014 02:59:51 GMT
The plan is to replace the existing Random Forests impl with a Spark-based
Streaming Random Forests.
As ssc already mentioned, the plan is not to entertain any new MR impls
but to accept bug fixes for existing ones.


The consensus is to do away with the existing MapReduce RF once the Spark-based
Streaming Random Forests is in place.


On Tue, Apr 15, 2014 at 10:51 PM, Manoj Awasthi <awasthi.manoj@gmail.com> wrote:

>
> >  * remove Random Forest as we cannot even answer questions to the
> > implementation on the mailinglist
> >
>      -1 to removing the present Random Forests. I think it is being used; we
> (at Adobe) are playing around with it a bit. If the reason for removal is
> that there is no active maintainer, that can be resolved by people using it
> getting more active on this, which is a community action. FWIW, I vote
> against throwing away this code.
>
>
>
> On Tue, Apr 15, 2014 at 2:38 PM, Sebastian Schelter <ssc@apache.org> wrote:
>
>> On 04/15/2014 11:07 AM, Suneel Marthi wrote:
>>
>>> On Tue, Apr 15, 2014 at 12:57 AM, Sebastian Schelter <ssc@apache.org>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>>  From reading the thread, I have the impression that we agree on the
>>>> following actions:
>>>>
>>>>
>>>>   * reject any future MR algorithm contributions, prominently state this
>>>>     on the website and in talks
>>>>   * make all existing algorithm code compatible with Hadoop 2; if there is
>>>>     no one willing to make an existing algorithm compatible, remove the
>>>>     algorithm
>>>>   * deprecate Canopy clustering
>>>>   * email the original FPM and random forest authors to ask for
>>>>     maintenance of the algorithms
>>>>   * rename core to "mr-legacy" (and gradually pull items we really need
>>>>     out of that later)
>>>>
>>>> I will create jira tickets for those action points. I think the biggest
>>>> challenge here is the Hadoop 2 compatibility, is someone volunteering to
>>>> drive that? Would be awesome.
>>>>
>>>>
>>> With things settling down at work for me, I have time now to dedicate
>>> back to Mahout. I can drive this effort.
>>>
>>
>> That is great news!
>>
>>
>>
>>>
>>>> Best,
>>>> Sebastian
>>>>
>>>>
>>>> On 04/13/2014 07:19 PM, Andrew Musselman wrote:
>>>>
>>>>  This is a good summary of how I feel too.
>>>>>
>>>>>   On Apr 13, 2014, at 10:15 AM, Sebastian Schelter <ssc@apache.org>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Unfortunately, it's not that easy to get enough voluntary work. I
>>>>>> issued the third call for working on the documentation today as there
>>>>>> are still lots of open issues. That's why I'm trying to suggest a move
>>>>>> that involves as little work as possible.
>>>>>>
>>>>>> We should get the MR codebase into a state that we all can live with
>>>>>> and then focus on new stuff like the Scala DSL.
>>>>>>
>>>>>> --sebastian
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>   On 04/13/2014 07:09 PM, Giorgio Zoppi wrote:
>>>>>>
>>>>>>> The best thing would be to make a plan and see how much effort you
>>>>>>> need for this. Then find volunteers to accomplish the task. I'm quite
>>>>>>> sure that there are a lot of people out there who are willing to help
>>>>>>> out.
>>>>>>>
>>>>>>> BR,
>>>>>>> deneb.
>>>>>>>
>>>>>>>
>>>>>>> 2014-04-13 18:45 GMT+02:00 Sebastian Schelter <ssc@apache.org>:
>>>>>>>
>>>>>>>
>>>>>>>   Hi,
>>>>>>>
>>>>>>>>
>>>>>>>> I took some days to let the latest discussion about the state and
>>>>>>>> future of Mahout go through my head. I think the most important thing
>>>>>>>> to address right now is the MapReduce "legacy" codebase. A lot of the
>>>>>>>> MR algorithms are currently unmaintained, documentation is outdated
>>>>>>>> and the original authors have abandoned Mahout. For some algorithms it
>>>>>>>> is hard to even get questions answered on the mailing list (e.g.
>>>>>>>> RandomForest). I agree with Sean's comments that letting the code
>>>>>>>> linger around is not an option and will continue to harm Mahout.
>>>>>>>>
>>>>>>>> In the previous discussion, I suggested making a radical move and
>>>>>>>> aiming to delete this codebase, but there were serious objections from
>>>>>>>> committers and users that convinced me that there is still usage of
>>>>>>>> and interest in that codebase.
>>>>>>>>
>>>>>>>> That puts us into a "legacy dilemma". We cannot delete the code
>>>>>>>> without harming our userbase. On the other hand, I don't see anyone
>>>>>>>> willing to rework the codebase. Further, the code cannot linger around
>>>>>>>> any longer as it is doing now, especially when we fail to answer
>>>>>>>> questions or don't provide documentation.
>>>>>>>>
>>>>>>>> *We have to make a move*!
>>>>>>>>
>>>>>>>> I suggest the following actions with regard to the MR codebase. I hope
>>>>>>>> that they find consensus. If there are objections, please give
>>>>>>>> alternatives; *keeping everything as-is is not an option*:
>>>>>>>>
>>>>>>>>    * reject any future MR algorithm contributions, prominently state
>>>>>>>>      this on the website and in talks
>>>>>>>>    * make all existing algorithm code compatible with Hadoop 2; if
>>>>>>>>      there is no one willing to make an existing algorithm compatible,
>>>>>>>>      remove the algorithm
>>>>>>>>    * deprecate the existing MR algorithms, yet still take bug fix
>>>>>>>>      contributions
>>>>>>>>    * remove Random Forest as we cannot even answer questions about the
>>>>>>>>      implementation on the mailing list
>>>>>>>>
>>>>>>>> There are two more actions that I would like to see, but I'd be
>>>>>>>> willing to give up if there are objections:
>>>>>>>>
>>>>>>>>    * move the MR algorithms into a separate Maven module
>>>>>>>>    * remove Frequent Pattern Mining again (we already aimed for that
>>>>>>>>      in 0.9 but had one user who shouted but never returned to us)
>>>>>>>>
>>>>>>>> Let me know what you think.
>>>>>>>>
>>>>>>>> --sebastian
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>
