From: Brad Giaccio
Date: Tue, 7 Jul 2009 14:19:25 -0400
Message-ID: <9362e4bb0907071119y6b387384lc5f873fc89b90986@mail.gmail.com>
In-Reply-To: <78643995.1246990574890.JavaMail.jira@brutus>
References: <1740299645.1221101924312.JavaMail.jira@brutus> <78643995.1246990574890.JavaMail.jira@brutus>
Subject: Re: [jira] Commented: (SOLR-769) Support Document and Search Result clustering
To: solr-dev@lucene.apache.org
Reply-To: solr-dev@lucene.apache.org

No problem, but now that you are the assignee, perhaps you can apply it for
me. It's attached to the ticket as 'clustering-component-shard.patch' and it
includes updated JUnit tests. If it needs some work now that you have made
some output changes, I'll be glad to update it.

Thanks,
Brad

On Tue, Jul 7, 2009 at 2:16 PM, Yonik Seeley (JIRA) wrote:

> [ https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728271#action_12728271 ]
>
> Yonik Seeley commented on SOLR-769:
> -----------------------------------
>
> Apologies Brad - I didn't realize there were pending patches or I would
> have not done the reformat.
>
> > Support Document and Search Result clustering
> > ---------------------------------------------
> >
> >                 Key: SOLR-769
> >                 URL: https://issues.apache.org/jira/browse/SOLR-769
> >             Project: Solr
> >          Issue Type: New Feature
> >            Reporter: Grant Ingersoll
> >            Assignee: Yonik Seeley
> >            Priority: Minor
> >             Fix For: 1.4
> >
> >         Attachments: clustering-componet-shard.patch,
> > clustering-libs.tar, clustering-libs.tar, SOLR-769-analyzerClass.patch,
> > SOLR-769-lib.zip, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch,
> > SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch,
> > SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch,
> > SOLR-769.patch, SOLR-769.patch, SOLR-769.tar, SOLR-769.zip
> >
> >
> > Clustering is a useful tool for working with documents and search
> > results, similar to the notion of dynamic faceting. Carrot2
> > (http://project.carrot2.org/) is a nice, BSD-licensed library for doing
> > search results clustering. Mahout (http://lucene.apache.org/mahout) is
> > well suited for whole-corpus clustering.
> >
> > The patch lays out a contrib module that starts off w/ an integration
> > of a SearchComponent for doing clustering and an implementation using
> > Carrot2. In search results mode, it will use the DocList as the input for
> > the cluster. While Carrot2 comes w/ a Solr input component, it is not the
> > same as the SearchComponent that I have, in that the Carrot2 example
> > actually submits a query to Solr, whereas my SearchComponent is just
> > chained into the Component list and uses the ResponseBuilder to add in
> > the cluster results.
> >
> > While not fully fleshed out yet, the collection-based mode will take in a
> > list of ids or just use the whole collection and will produce clusters.
> > Since this is a longer, typically offline task, there will need to be some
> > type of storage mechanism (and replication??????) for the clusters. I _may_
> > push this off to a separate JIRA issue, but I at least want to present the
> > use case as part of the design of this component/contrib. It may even make
> > sense that we split this out, such that the building piece is something
> > like an UpdateProcessor and then the SearchComponent just acts as a lookup
> > mechanism.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.