Return-Path:
Delivered-To: apmail-lucene-solr-dev-archive@minotaur.apache.org
Received: (qmail 25578 invoked from network); 4 Jul 2009 15:43:08 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 4 Jul 2009 15:43:08 -0000
Received: (qmail 21027 invoked by uid 500); 4 Jul 2009 15:43:18 -0000
Delivered-To: apmail-lucene-solr-dev-archive@lucene.apache.org
Received: (qmail 20939 invoked by uid 500); 4 Jul 2009 15:43:18 -0000
Mailing-List: contact solr-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: solr-dev@lucene.apache.org
Delivered-To: mailing list solr-dev@lucene.apache.org
Received: (qmail 20929 invoked by uid 99); 4 Jul 2009 15:43:18 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 04 Jul 2009 15:43:18 +0000
X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED
X-Spam-Check-By: apache.org
Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 04 Jul 2009 15:43:08 +0000
Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id 5A44F234C046 for ; Sat, 4 Jul 2009 08:42:47 -0700 (PDT)
Message-ID: <253112193.1246722167368.JavaMail.jira@brutus>
Date: Sat, 4 Jul 2009 08:42:47 -0700 (PDT)
From: "Yonik Seeley (JIRA)"
To: solr-dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-769) Support Document and Search Result clustering
In-Reply-To: <1740299645.1221101924312.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
X-Virus-Checked: Checked by ClamAV on apache.org

[ https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727247#action_12727247 ]

Yonik Seeley commented on
SOLR-769:
-----------------------------------

Of course, now that I've removed the clustering libs from the solr.war, the example no longer works for some reason... it looks like all the jars are in example/clustering/solr/lib, so I imagine it's a classloading issue.

On a related note, I'm not sure how useful it is to have a clustering component that itself has multiple plugins... the extra level of plugins seems to just add more complexity. Different plugins could always share utility classes, perhaps even base classes, and could strive for a common output format, all without going to an additional plugin model.

> Support Document and Search Result clustering
> ---------------------------------------------
>
> Key: SOLR-769
> URL: https://issues.apache.org/jira/browse/SOLR-769
> Project: Solr
> Issue Type: New Feature
> Reporter: Grant Ingersoll
> Assignee: Yonik Seeley
> Priority: Minor
> Fix For: 1.4
>
> Attachments: clustering-componet-shard.patch, clustering-libs.tar, clustering-libs.tar, SOLR-769-analyzerClass.patch, SOLR-769-lib.zip, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.tar, SOLR-769.zip
>
>
> Clustering is a useful tool for working with documents and search results, similar to the notion of dynamic faceting. Carrot2 (http://project.carrot2.org/) is a nice, BSD-licensed library for doing search results clustering. Mahout (http://lucene.apache.org/mahout) is well suited for whole-corpus clustering.
> The patch lays out a contrib module that starts off w/ an integration of a SearchComponent for doing clustering and an implementation using Carrot. In search results mode, it will use the DocList as the input for the cluster.
> While Carrot2 comes w/ a Solr input component, it is not the same as the SearchComponent that I have: the Carrot example actually submits a query to Solr, whereas my SearchComponent is just chained into the Component list and uses the ResponseBuilder to add in the cluster results.
> While not fully fleshed out yet, the collection-based mode will take in a list of ids or just use the whole collection and will produce clusters. Since this is a longer, typically offline task, there will need to be some type of storage mechanism (and replication?) for the clusters. I _may_ push this off to a separate JIRA issue, but I at least want to present the use case as part of the design of this component/contrib. It may even make sense to split this out, such that the building piece is something like an UpdateProcessor and the SearchComponent then just acts as a lookup mechanism.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
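
The issue description distinguishes a clustering component that is chained into the request's component list, and reads the documents an earlier component already retrieved, from a client that submits its own query. A minimal sketch of the chained arrangement might look like the following. Everything here is a hypothetical stand-in written in Python for brevity: `ResponseBuilder`, `query_component`, `clustering_component`, `handle_request`, and the toy `INDEX` are illustrative names, not Solr's actual Java API, and grouping by a `topic` field is only a placeholder for a real clustering algorithm such as the ones Carrot2 provides.

```python
from dataclasses import dataclass, field


@dataclass
class ResponseBuilder:
    """Hypothetical per-request state shared by all chained components."""
    query: str
    doc_list: list = field(default_factory=list)  # documents matched so far
    response: dict = field(default_factory=dict)  # sections added by components


def query_component(rb, index):
    """Fill doc_list with the documents whose text contains the query term."""
    rb.doc_list = [d for d in index if rb.query in d["text"].lower().split()]
    rb.response["docs"] = [d["id"] for d in rb.doc_list]


def clustering_component(rb, index):
    """Cluster the documents already in doc_list; no second query is issued.

    Placeholder rule (an assumption): documents sharing a 'topic' value
    form one cluster.
    """
    clusters = {}
    for doc in rb.doc_list:
        clusters.setdefault(doc["topic"], []).append(doc["id"])
    rb.response["clusters"] = clusters


def handle_request(query, index, components):
    """Run each component in order against one shared ResponseBuilder."""
    rb = ResponseBuilder(query=query)
    for component in components:
        component(rb, index)
    return rb.response


# Toy data, invented for illustration only.
INDEX = [
    {"id": "1", "topic": "search", "text": "Solr faceting guide"},
    {"id": "2", "topic": "search", "text": "Solr query syntax"},
    {"id": "3", "topic": "ml", "text": "Mahout and Solr clustering"},
    {"id": "4", "topic": "ml", "text": "Mahout recommenders"},
]

result = handle_request("solr", INDEX, [query_component, clustering_component])
print(result)
```

Because the clustering step only reads `doc_list` and writes one extra section into the shared response, dropping it from the component list leaves the rest of the request unchanged.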