mahout-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: MongoDataModel
Date Wed, 18 May 2011 11:58:27 GMT

On May 18, 2011, at 6:58 AM, Sean Owen wrote:

> The reasoning that led to 'taste-webapp' is what leads me to propose creating
> an expanded 'mahout-integration'.
> 
> When I contributed my code, some folks asked, hmm, could we toss your EJB
> and web services integration, because it seems unfortunate to make the whole
> build depend on the EJB APIs and the Axis services. EJBs are defunct enough
> that we just deleted that part. Web services are not, but this integration was
> not put into core. It could have been, but it was better to farm it out.
> 
> The only reason it was called 'taste-webapp', which I sense is striking
> people as some kind of issue, is because that was the only thing it happened
> to contain.
> 
> But we didn't put it in examples, whether by accident or design, and that
> sounds right to me. It's not just an example of how to use Mahout. It is
> something a user may actually use.
> 
> 
> Consider the Lucene integration points. These are actually in core (not in
> examples, as I thought). I don't believe that feels quite right in light of
> the above thinking, which sounds right to me. "Most" Mahout usage does not
> involve Lucene. Some does. Lucene isn't core to Mahout, so it probably
> shouldn't be in core.
> 

Actually, I think it is core at this point, since we moved the Vectorization stuff to core.
Unfortunately, we need Lucene core in order to get the baseline definitions of TokenStreams
for Vectorization.

> Should it be in examples, since there is also some Lucene stuff there?
> That's a better place, but not quite right. Isn't this somewhere "more core"
> than just examples of end-user usage, but not core?
> 
> 
> And now we have MongoDB integration. Would we like to put it in core? No, I
> don't think so, per above. Is it just an example? Not really, though that's
> a better place. It feels again somewhere in between, in the same place that
> the web services integration fell.

Agreed.  I'm open to the move; I just wanted to hear more about what it would look like.  I
think it makes sense that all DataModel implementations move along with it, other than the
base classes, as a MySQLJDBCDataModel is in the same class of things as a Mongo one (or Cassandra
or HBase, etc.).

Likewise, we could do the same with our classification storage implementations. (In fact,
I wonder if there is duplication going on between the Taste DataModel and the Classifier ones,
at least at a very low level.)


> 
> 
> That's "taste-webapp" then, though of course the name would no longer be
> accurate. So just change the name. (Rather than make n new modules, one for
> each integration, right?)
> 
> And then I propose moving things like Lucene touch points there, yes.
> 

I'm not sure it can, unless you are moving all the Vectorization stuff there.  I think what
we are seeing evolve here is a need for an ETL and persistence layer that contains all of
these optional bits.  I'm not sure yet where that belongs, although mahout-integration seems
fine.  We could probably also work in what is currently in Utils and drop Utils (or at
least lighten it).

So, could mahout-integration look like this:
1. ETL tools (Iterables, Vectorization)
2. Storage mechanisms
3. Web services/REST

Utils would keep the various other utils/helper classes.

> 
> 
> Nothing more. This is not any attempt to abstract-ify anything further.
> Hadoop stuff remains in core since it's pretty core, for example. It's just
> rearranging modules in a way that, to me, seems both more logical and more
> consistent with past decisions.
> 
> 

+1.