From: "Will Johnson"
Reply-To: solr-dev@lucene.apache.org
Subject: RE: [jira] Commented: (SOLR-236) Field collapsing
Date: Tue, 5 Jun 2007 09:19:55 -0400
Message-ID: <3D97BD422499564FA26C6F7160FA95520AB766C7@athena.getconnectedinc.com>
In-Reply-To: <3546525.1181048434219.JavaMail.jira@brutus>
References: <19014374.1178921656142.JavaMail.jira@brutus> <3546525.1181048434219.JavaMail.jira@brutus>
Mailing-List: contact solr-dev-help@lucene.apache.org; run by ezmlm

I haven't looked at any of the
patches but I can comment on some other uses for the feature that are in production today with major vendors. While it's used for site collapsing a la Google, it's also heavily used in ecommerce settings. Check out BestBuy.com/circuitcity/etc and do a search for some really generic word like 'cable' and notice all the groups of items; BB shows 3 per group, CC shows 1 per group. In each case it's not clear that the number of docs is really limited at all, i.e. it's more important to get back all the categories with n docs per category and the counts per category than it is to get back a fixed number of results, or even categories for that matter. Also notice that neither of these sites allows you to page through the categorized results.

I'd also point out that many vendors require the collapsing field to be an int instead of a string and then force the front end to do the mapping. Just one more thing to consider....

- will

-----Original Message-----
From: Yonik Seeley (JIRA) [mailto:jira@apache.org]
Sent: Tuesday, June 05, 2007 9:01 AM
To: solr-dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-236) Field collapsing

    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501550 ]

Yonik Seeley commented on SOLR-236:
-----------------------------------

I guess adjacent collapsing can make sense when one is sorting by the field that is being collapsed.

For the normal collapsing though, this patch appears to implement it by changing the sort order to the collapsing field (normally not desired). For example, if sorting by relevance and collapsing on a field, one would normally want the groups sorted by relevance (with the group relevance defined as the max score of its members).

As far as how to do paging, it makes sense to rigidly define it in terms of number of documents, regardless of how many documents are in each group.
Going back to Google, it always displays the first 10 documents, but a variable number of groups. That does mean that a group could be split across pages. It would actually be much simpler (IMO) to always return a fixed number of groups rather than a fixed number of documents, but I don't think this would be less useful to people.

Thoughts?

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: field_collapsing_1.1.0.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site is collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many consecutive results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version (1.2)
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
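
To make the two collapsing modes in this thread concrete, here is a minimal Python sketch. It is not the code from the SOLR-236 patch. The function names, the `score` and `site` keys, and the reading of "collapse.max" as "keep at most this many documents per run or group" are all assumptions made for illustration. The "normal" variant orders groups by the max score of their members, as suggested in the comment above, and returns a count per group, as asked for in the ecommerce case.

```python
# Illustrative sketch only; names and semantics are assumptions, not the patch's API.

def collapse_adjacent(docs, field, max_run=1):
    """Keep at most max_run consecutive docs that share the same field value."""
    kept = []
    prev_value = object()  # sentinel that compares unequal to any field value
    run = 0
    for doc in docs:
        value = doc[field]
        if value == prev_value:
            run += 1
        else:
            prev_value = value
            run = 1
        if run <= max_run:
            kept.append(doc)
    return kept


def collapse_normal(docs, field, max_per_group=1):
    """Group docs by field value and order groups by their best member score.

    Returns a list of (field_value, total_docs_in_group, top_docs) tuples.
    """
    groups = {}
    for doc in docs:
        groups.setdefault(doc[field], []).append(doc)
    for members in groups.values():
        members.sort(key=lambda d: d["score"], reverse=True)
    # Group relevance is assumed to be the max score of the group's members.
    ordered = sorted(groups.items(), key=lambda kv: kv[1][0]["score"], reverse=True)
    return [(value, len(members), members[:max_per_group])
            for value, members in ordered]


# Hypothetical result list, already sorted by relevance.
DOCS = [
    {"id": 5, "site": "c.com", "score": 0.95},
    {"id": 1, "site": "a.com", "score": 0.9},
    {"id": 2, "site": "a.com", "score": 0.8},
    {"id": 3, "site": "b.com", "score": 0.7},
    {"id": 4, "site": "a.com", "score": 0.6},
]

if __name__ == "__main__":
    print([d["id"] for d in collapse_adjacent(DOCS, "site")])
    for value, count, top in collapse_normal(DOCS, "site"):
        print(value, count, [d["id"] for d in top])
```

Note that adjacent collapsing lets a.com reappear later in the list (document 4), while normal collapsing folds all three a.com documents into one group. Paging by a fixed number of documents versus a fixed number of groups would be a separate slicing step on top of either result.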