mahout-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: mahout/solr integration
Date Fri, 16 Apr 2010 18:26:51 GMT

On Apr 16, 2010, at 2:21 PM, Jake Mannix wrote:

> So here's my take: once we're a TLP (next month sometime?), it is
> a good time to start allowing subprojects or submodules which are

Submodules, yes; subprojects, not so much, unless the committers are the same.  We can definitely
release different artifacts, etc., but the Board has made it pretty clear that umbrella projects
are not good.

> "scripting" layers on top of Mahout, whether they are Pig Latin,
> Cascalog, JRuby, or Clojure.  Especially if it's JVM-based, having
> code/scripts which act as "drivers" and wrappers for what is currently,
> for the most part, a library with a shell script (Taste-web is
> the exception) is a hugely useful addition.
> 
> To compare, the guys doing Incanter ( http://data-sorcery.org/ )
> are using Clojure as a wrapper around Parallel-Colt
> ( http://sites.google.com/site/piotrwendykier/software/parallelcolt ),
> which does all of the heavy lifting in pure Java.  That's why I'm not
> too worried about performance (to respond to Robin's concern
> down-thread while I'm writing this).
> 
> Personally, I'm not a Clojure guy, but Lisp has long been a mainstay
> of the AI world, and if we're going to interface more with academic ML
> and AI (which we should!!!), Clojure is going to be our best bet,
> as it's just Lisp on the JVM.  If I were going to write a REPL
> for us, I'd do it in JRuby (much like HBase uses JRuby's jirb for
> its shell), and I may still.  But the more easy, interactive ways of
> using Mahout, the better, I'd say.
> 
> Hmm... this was a somewhat scattered response, but I'm really loath
> to turn away a) nice hooks between Solr and Mahout, b) scripting-style
> wrappers which could expand our community, or c) simply new
> functionality.

+1.  I hope to add some more Solr hooks in the near future, too, including an implementation
of the ClusteringEngine and other things like DocumentProcessor chains for classifiers
(help wanted!).


> 
> I'm certainly game to help shepherd in any code we can use, although
> I guess I'm fine waiting to help make a sub-project once we're a TLP
> if that's the right way to go.

Let's see a patch first.  As with all of this stuff, if there are people who will work on
it and maintain it and it's related to Mahout, then I think we should take it in.

-Grant


