Subject: Re: clustering after search
From: Grant Ingersoll
Date: Wed, 3 Nov 2010 09:29:14 -0400
To: user@mahout.apache.org

Hmm, you should come to ApacheCon tomorrow in Atlanta where I will be talking/showing this.

Assuming you won't, I've started a small prototype that hooks clustering via KMeans into Solr's ClusteringComponent and runs the whole index through KMeans. The hooks are in there for doing the clustering on a DocSet (i.e. the results from a search), as you are suggesting, but I haven't implemented that yet and I don't know how well it will perform, especially compared to Carrot2, which is already integrated into Solr and is better designed for the type of thing you are after.

You can take a look at _very early_ stage code at https://github.com/gsingers/ApacheCon2010. This is by no means production quality yet. It is not even fully tested yet.

(For those interested, that link also hooks in Mahout to provide recommendations and to classify documents using the Naive Bayes classifier. This last bit is courtesy of Drew via our book Taming Text.)

The one gotcha with the code is that you have to un-WAR the Solr WAR file and stuff all the Mahout libs into the Solr WAR file, because otherwise you get classloader problems between Solr's Resource Loader and Hadoop's class loader.

-Grant

On Nov 2, 2010, at 2:54 PM, Borbála Siklósi wrote:

> Maybe I have quite a simple question, but I haven't been able to find
> the solution. I have a Solr index of documents and I run kmeans
> clustering on them. It all works fine. How can I run a keyword search
> on the Solr index and run the clustering only on the result set? Can I
> somehow determine which documents the algorithm should cluster?

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
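
As a rough illustration of the result-set idea discussed in this thread (search first, then cluster only the hits), here is a minimal, library-free sketch. It does not use the Solr or Mahout APIs and is not the code from the prototype linked above. The class ResultSetKMeans, the Doc type, the substring-based search method, and the choice to seed centroids with the first k hits are all hypothetical and exist only for illustration. In a real setup the hits would come from a Solr query and the vectors from the indexed term data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from Solr or Mahout.
class ResultSetKMeans {

    // A document reduced to an id, its raw text, and a feature vector.
    static final class Doc {
        final String id;
        final String text;
        final double[] vector;

        Doc(String id, String text, double[] vector) {
            this.id = id;
            this.text = text;
            this.vector = vector;
        }
    }

    // Stand-in for the keyword search: keep documents whose text contains the term.
    static List<Doc> search(List<Doc> index, String term) {
        List<Doc> hits = new ArrayList<>();
        String needle = term.toLowerCase();
        for (Doc d : index) {
            if (d.text.toLowerCase().contains(needle)) {
                hits.add(d);
            }
        }
        return hits;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Plain Lloyd-style k-means over the given documents only.
    // Centroids are seeded with the first k documents (an assumption made
    // for determinism). Returns one cluster index per document.
    static int[] cluster(List<Doc> docs, int k, int maxIterations) {
        if (k < 1 || k > docs.size()) {
            throw new IllegalArgumentException("need 1 <= k <= docs.size()");
        }
        int dim = docs.get(0).vector.length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = docs.get(c).vector.clone();
        }
        int[] assignment = new int[docs.size()];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: attach each document to its nearest centroid.
            for (int i = 0; i < docs.size(); i++) {
                double[] v = docs.get(i).vector;
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(v, centroids[c]) < squaredDistance(v, centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged
            }
            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < docs.size(); i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < dim; j++) {
                    sums[assignment[i]][j] += docs.get(i).vector[j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // empty cluster: keep its previous centroid
                }
                for (int j = 0; j < dim; j++) {
                    centroids[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return assignment;
    }
}
```

Usage would look like `int[] clusters = ResultSetKMeans.cluster(ResultSetKMeans.search(index, "solr"), 2, 10);`, so documents that do not match the query never reach the clustering step.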