Subject: Re: Board Report
From: Ted Dunning
Date: Mon, 7 Apr 2014 04:37:55 -0700
To: "dev@mahout.apache.org"

+1 and agree

I might have a slightly longer off-ramp for the old style. I don't see a strong need to completely revamp the map-reduce based code. Nor is the legacy stuff around the preference database worth salvaging.

It cannot reasonably be argued that usage is low and declining while simultaneously saying that perpetual support of old code is required.

Sent from my iPhone

> On Apr 7, 2014, at 4:08, Suneel Marthi wrote:
>
> +1 and agree with ssc's suggestion.
>
> Sent from my iPhone
>
>> On Apr 7, 2014, at 3:30 AM, Sebastian Schelter wrote:
>>
>> I agree that the state of the MR code is something that needs to be addressed. There have been several attempts to rework/refactor it, but unfortunately none of them had a satisfactory result.
>>
>> I'm hearing that there is a lack of a coherent vision for the future of Mahout. Let me suggest a radical one.
>>
>> - call the next release 0.10, not 1.0, as the latter implies a maturity which does not reflect the radical changes I'm proposing
>>
>> - move all the MR code to a new maven module, deprecate it and announce that we will delete it in the release after 0.11
>>
>> - make the new DSL the heart of Mahout, and aim for the following algorithms to be implemented in the DSL as a new basis:
>>
>> Collaborative Filtering:
>>
>> * Cooccurrence-based recommender (work started in MAHOUT-1464)
>> * ALS (work started in MAHOUT-1365)
>>
>> Clustering:
>>
>> * k-Means
>> * Streaming k-Means
>>
>> Classification:
>>
>> * NaiveBayes (work started in MAHOUT-1493)
>> * either Random Forests or an ensemble of SGD classifiers
>>
>> Dimensionality Reduction / Topic Models:
>>
>> * SSVD (prototype in trunk)
>> * PCA (prototype in trunk)
>> * LDA
>>
>> - integrate Stratosphere / H2O as follows:
>>
>> * the Stratosphere guys can choose to implement the physical operators of the DSL to make our algos run on Stratosphere. If they do, this is great for Mahout as it allows people to run code on different backends. If they don't, we don't lose anything.
>>
>> * a major point in porting the algorithms to the DSL would be to make the input formats of all algorithms consistent. That would allow H2O to work off the same inputs as the Scala DSL.
>>
>> Let me know what you think.
>>
>> -s
>>
>>> On 04/06/2014 05:54 PM, Sean Owen wrote:
>>> On Sun, Apr 6, 2014 at 4:16 PM, Andrew Musselman
>>> wrote:
>>>> Seems to me there has been a renewed effort to eat our broccoli, along with
>>>> the other ideas people have been bringing on board.
>>>>
>>>> What are you proposing to put in the board report?
>>>
>>> I have not seen significant activity to unify or update the existing
>>> code. It's still the same different chunks with different styles,
>>> input/output, distributed/not, etc.
>>> The doc updates look very positive. To be fair, the task of really
>>> addressing the technical debt is very large, so even making said dent
>>> would be a lot of work. A clean-slate reboot therefore actually seems
>>> like a good plan, but that's another question...
>>>
>>> Concretely, in a board report, I personally would not agree with
>>> representing the Spark or H2O work as an agreed future plan or
>>> roadmap, right now. Being in the board report makes that impression,
>>> as have recent articles/tweets I've seen, so it deserves care. That's
>>> why I chimed in, maybe tilting at windmills.
>>>
>>> From where I sit with customers, the overall impression is negative
>>> among those that have tried to use the code, and usage has gone from
>>> few to almost none. I doubt my sample is so different from the whole
>>> user population. Much of it is consistency/quality, but some of it's
>>> just an interest in non-M/R frameworks.
>>>
>>> So, I think that current state and set of problems is far more
>>> important to acknowledge in a board report than just mentioning some
>>> future possibilities, and the latter was the impression I got of the
>>> likely content. In fact, it makes the talk about large upcoming
>>> possible changes make so much more sense.
>>