mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: clustering after search
Date Wed, 03 Nov 2010 13:29:14 GMT
Hmm, you should come to ApacheCon tomorrow in Atlanta where I will be talking/showing this.

Assuming you won't, I've started a small prototype that hooks KMeans clustering into
Solr's ClusteringComponent and runs the whole index through KMeans.  The hooks are
in there for clustering a DocSet (i.e. the results of a search), as you are
suggesting, but I haven't implemented that yet and I don't know how well it will perform,
especially compared to Carrot2, which is already integrated into Solr and is better designed
for the kind of thing you want.
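
To make the "cluster only the search results" idea concrete, here is a minimal sketch in plain Python. It does not use the Solr or Mahout APIs: the keyword filter stands in for a Solr query that returns a DocSet, and the small k-means loop stands in for Mahout's KMeans. All function names (search, vectorize, kmeans, cluster_search_results) are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def search(docs, keyword):
    """Return ids of documents containing the keyword (stand-in for a Solr query)."""
    kw = keyword.lower()
    return [doc_id for doc_id, text in docs.items() if kw in text.lower().split()]


def vectorize(texts):
    """Turn each text into a term-frequency vector over a shared vocabulary."""
    vocab = sorted({term for text in texts for term in text.lower().split()})
    vectors = []
    for text in texts:
        counts = Counter(text.lower().split())
        vectors.append([float(counts[term]) for term in vocab])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Basic Lloyd's k-means; returns one cluster label per input vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def cluster_search_results(docs, keyword, k):
    """Cluster only the documents that match the keyword search."""
    ids = search(docs, keyword)
    if not ids:
        return {}
    k = min(k, len(ids))  # cannot have more clusters than documents
    labels = kmeans(vectorize([docs[i] for i in ids]), k)
    return dict(zip(ids, labels))
```

The point of the sketch is the order of operations: the search runs first and only the matching documents are vectorized and clustered, so the cost of clustering depends on the size of the result set rather than the size of the index.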

You can take a look at the _very early_ stage code at https://github.com/gsingers/ApacheCon2010.
This is by no means production quality, and it is not even fully tested yet.

(For those interested, the code at that link also hooks in Mahout to provide Recommendations and to classify
documents using the Naive Bayes classifier.  This last bit is courtesy of Drew via our book
Taming Text.)

The one gotcha with the code is that you have to unpack (un-WAR) the Solr WAR file and put all the
Mahout libs inside it, because otherwise you get classloader problems between
Solr's Resource Loader and Hadoop's class loader.
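
As a rough sketch of that repackaging step: a WAR is a zip archive, so the libs can be added with Python's standard zipfile module. The message above does not say where in the WAR the libs go; this sketch assumes WEB-INF/lib/, the standard library directory of a servlet WAR. The function name add_jars_to_war and every file name shown are hypothetical.

```python
import zipfile
from pathlib import Path


def add_jars_to_war(war_path, jar_paths, out_path):
    """Copy war_path to out_path, adding each JAR under WEB-INF/lib/.

    Assumption: WEB-INF/lib/ is where the webapp's classloader picks up libs.
    Returns the sorted list of entry names in the new archive.
    """
    entries = set()
    with zipfile.ZipFile(war_path) as src, zipfile.ZipFile(
        out_path, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        # Copy every existing entry unchanged.
        for item in src.infolist():
            dst.writestr(item, src.read(item.filename))
            entries.add(item.filename)
        # Add the extra JARs, skipping any name already present in the WAR.
        for jar in map(Path, jar_paths):
            target = f"WEB-INF/lib/{jar.name}"
            if target not in entries:
                dst.write(jar, target)
                entries.add(target)
    return sorted(entries)
```

For example, add_jars_to_war("solr.war", ["mahout-core.jar"], "solr-mahout.war") would write a new archive and leave the original WAR untouched; the resulting file is what you would deploy instead of the stock one.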

-Grant


On Nov 2, 2010, at 2:54 PM, Borbála Siklósi wrote:

> Maybe I have quite a simple question, but I haven't been able to find
> the solution. I have a Solr index of documents and I run kmeans clustering
> on them. It all works fine. How can I run a keyword search on the
> Solr index and run the clustering only on the result set? Can I somehow
> determine which documents the algorithm should cluster?

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
