From: "Grant Ingersoll (JIRA)"
To: solr-dev@lucene.apache.org
Reply-To: solr-dev@lucene.apache.org
Date: Tue, 29 Dec 2009 14:44:30 +0000 (UTC)
Subject: [jira] Commented: (SOLR-236) Field collapsing
Message-ID: <2110861162.1262097870190.JavaMail.jira@brutus.apache.org>
In-Reply-To: <19014374.1178921656142.JavaMail.jira@brutus>
Mailing-List: contact solr-dev-help@lucene.apache.org; run by ezmlm

    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795063#action_12795063 ]

Grant Ingersoll commented on SOLR-236:
--------------------------------------

bq. I'm curious as to whether anyone has just thought of using the Clustering component for this? If your "collapse" field was a single token, I wonder if you would get the results you're looking for.

I would note, in looking at the Carrot2 code, they actually have a ByFieldClusteringAlgorithm (what they call synthetic clustering) which does field collapsing/clustering on a value of a field. To quote the javadocs:

{quote}
Clusters documents into a flat structure based on the values of some field of the documents. By default the {@link Document#SOURCES} field is used
{quote}

and

{quote}
 * Name of the field to cluster by. Each non-null scalar field value with distinct
 * hash code will give raise to a single cluster, named using the
 * {@link Object#toString()} value of the field. If the field value is a collection,
 * the document will be assigned to all clusters corresponding to the values in the
 * collection. Note that arrays will not be 'unfolded' in this way.
 *
{quote}

I don't know how it performs, but it seems like it would at least be worth investigating.

Note, they also have a synthetic one for collapsing based on URL: ByUrlClusteringAlgorithm

Just food for thought.

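The grouping behaviour described in the quoted javadoc can be sketched in a few lines. This is only an illustration of the idea, not Carrot2's code or API: the function name `cluster_by_field`, the dict-based documents, and the `source` field are all hypothetical, and the Java-specific rule about arrays not being unfolded is not modelled.

```python
def cluster_by_field(docs, field):
    """Group documents into flat clusters by the value of `field`.

    Hypothetical sketch: `docs` is a list of dicts, not Carrot2 Document objects.
    """
    clusters = {}  # field value -> member documents, in first-seen order
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # a null value does not give rise to a cluster
        # A collection value assigns the document to one cluster per element;
        # anything else is treated as a single scalar value.
        if isinstance(value, (list, tuple, set, frozenset)):
            values = value
        else:
            values = [value]
        # dict.fromkeys drops repeated elements so a document joins each cluster once.
        for v in dict.fromkeys(values):
            clusters.setdefault(v, []).append(doc)
    # Each cluster is named using the string form of its value.
    return [(str(v), members) for v, members in clusters.items()]
```

Each distinct value yields one cluster, and a document whose field holds several values appears in several clusters, which is the main difference from collapsing on a single-token field.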
> Field collapsing
> ----------------
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
> Issue Type: New Feature
> Components: search
> Affects Versions: 1.3
> Reporter: Emmanuel Keller
> Assignee: Shalin Shekhar Mangar
> Fix For: 1.5
>
> Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
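
The three parameters in the quoted issue description can be illustrated with a small sketch. The semantics here are an assumption, since the description does not spell them out: "normal" is taken to mean that results sharing a field value are collapsed across the whole ranked list, "adjacent" that only consecutive runs are collapsed, and `collapse_max` is the number of results kept per group or run. The function `collapse` and the dict-based results are hypothetical and are not taken from any of the attached patches.

```python
def collapse(results, field, collapse_type="normal", collapse_max=1):
    """Drop results beyond `collapse_max` per group (normal) or per run (adjacent).

    Hypothetical sketch of the assumed semantics, not the patch's implementation.
    """
    if collapse_type not in ("normal", "adjacent"):
        raise ValueError("collapse_type must be 'normal' or 'adjacent'")
    kept = []
    counts = {}          # normal mode: results seen so far per field value
    previous = object()  # adjacent mode: sentinel that equals no field value
    run_length = 0       # adjacent mode: length of the current run
    for doc in results:
        value = doc.get(field)
        if collapse_type == "adjacent":
            # Count only consecutive results that share the same value.
            run_length = run_length + 1 if value == previous else 1
            previous = value
            seen = run_length
        else:
            # Count every result with this value, wherever it is ranked.
            counts[value] = counts.get(value, 0) + 1
            seen = counts[value]
        if seen <= collapse_max:
            kept.append(doc)
    return kept
```

For results whose field values are a, a, b, a, a, a with `collapse_max=1`, normal mode keeps the first a and the b, while adjacent mode also keeps the first a of the second run.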