Subject: Re: MongoDataModel
From: Grant Ingersoll
Date: Wed, 18 May 2011 07:58:27 -0400
To: dev@mahout.apache.org
Message-Id: <2737B8A0-CBBC-41E2-9979-861A72DD11CF@apache.org>
References: <71E16FA1-33B0-46CE-8487-54E3D8C9FB12@apache.org>

On May 18, 2011, at 6:58 AM, Sean Owen wrote:

> The reasoning that led to 'taste-webapp' is what leads to create an expanded
> 'mahout-integration'.
>
> When I contributed my code, some folks asked, hmm, could we toss your EJB
> and web services integration, because it seems unfortunate to make the whole
> build depend on the EJB APIs and the Axis services. EJBs are defunct enough
> we just deleted that.
> Web services are not, but this integration was not put
> into core. It could have been, but it was better to farm it out.
>
> The only reason it was called 'taste-webapp', which I sense is striking
> people as some kind of issue, is because that was the only thing it happened
> to contain.
>
> But, we didn't put it in examples, whether by accident or design, and that
> sounds right to me. It's not just an example of how to use Mahout. It is
> something a user may use.
>
> Consider the Lucene integration points. These are actually in core (not
> examples as I thought). I don't believe that feels quite right in light of
> the above thinking, which sounds right to me. "Most" Mahout usage does not
> involve Lucene. Some does. Lucene isn't core to Mahout, so probably
> shouldn't be in core.
>

Actually, I think it is core at this point, since we moved the Vectorization stuff to core. Unfortunately, we need Lucene core in order to get the baseline definitions of TokenStreams for Vectorization.

> Should it be in examples? because there is also some Lucene stuff there.
> Better place, but not quite. Isn't this somewhere "more core" than just
> examples of end user usage, but not core?
>
> And now we have MongoDB integration. Would we like to put it in core? No, I
> don't think so, per above. Is it just an example? Not really, though that's
> a better place. It feels again somewhere in between, in the same place that
> the web services integration fell.

Agreed. I'm open to the move, just wanted to hear more about what it would look like. I think it makes sense that all DataModel implementations move along with it, other than the base classes, as a MySQLJDBCDataModel is in the same class of things as a Mongo one (or Cassandra or HBase, etc.)

Likewise, we could do the same with our classification storage implementations.
(In fact, I wonder if there is duplication going on between the Taste DataModel and the Classifier ones, at least at a very low level.)

> That's "taste-webapp" then, though of course the name would no longer be
> accurate. So just change the name. (Rather than make n new modules for each
> integration, right?)
>
> And then I propose moving things like Lucene touch points there, yes.
>

I'm not sure it can, unless you are moving all the Vectorization stuff there. I think what we are seeing evolve here is a need for an ETL and Persistence layer that contains all of these optional bits. Not sure yet on where that belongs, although mahout-integration seems fine. We probably could somehow work in what is in Utils there too and drop Utils (or at least lighten it).

So, could mahout-integration look like:
1. ETL tools (Iterables, Vectorization)
2. Storage mechanisms
3. Web services/REST

Utils would keep the various other utils/helper classes.

> Nothing more. This is not any attempt to abstract-ify anything further.
> Hadoop stuff remains in core since it's pretty core, for example. It's just
> rearranging modules in a way that, to me, seems both more logical and more
> consistent with past decisions.
>

+1.