Date: Mon, 16 May 2011 21:10:47 +0000 (UTC)
From: "Martijn van Groningen (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Message-ID: <424506637.16711.1305580247444.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1588338454.13015.1305382307497.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (LUCENE-3097) Post grouping faceting

[
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034312#comment-13034312 ]

Martijn van Groningen commented on LUCENE-3097:
-----------------------------------------------

bq. I.e., we just have to ensure, at indexing time, that docs within the same "group" are adjacent, if you want to be able to count by unique group values.

This means that documents in the same group also need to be in the same segment, right? Or, if we use this mechanism for faceting, do documents with the same facet need to be in the same segment?

If that is true, it would make the collectors simpler. The SentinelIntSet we use in the collectors would no longer be necessary, because we could look up the ord from the DocTermsIndex; we would never find the same group in a different segment.
On the other hand, with scalability in mind, it would make things complex: documents in the same group need to be in the same segment, which makes indexing complex.

> Post grouping faceting
> ----------------------
>
>                 Key: LUCENE-3097
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3097
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Martijn van Groningen
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>
> This issue focuses on implementing post grouping faceting.
> * How to handle multivalued fields. What field value to show with the facet.
> * What the facet counts should be based on
> ** Facet counts can be based on the normal documents. Ungrouped counts.
> ** Facet counts can be based on the groups. Grouped counts.
> ** Facet counts can be based on the combination of group value and facet value. Matrix counts.
> And probably more implementation options.
> The first two methods are implemented in the SOLR-236 patch. For the first option it calculates a DocSet based on the individual documents from the query result. For the second option it calculates a DocSet from the most relevant document of each group.
Once the DocSet is computed, the FacetComponent and StatsComponent use the DocSet to create facets and statistics.
> The last one is a bit more complex. I think it is best explained with an example. Let's say we search on travel offers:
> ||hotel||departure_airport||duration||
> |Hotel a|AMS|5|
> |Hotel a|DUS|10|
> |Hotel b|AMS|5|
> |Hotel b|AMS|10|
> If we group by hotel and have a facet for airport, most end users expect (in my experience, of course) the following airport facet:
> AMS: 2
> DUS: 1
> The above result can't be achieved by the first two methods. You either get counts AMS:3 and DUS:1, or 1 for both airports.
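
The three counting modes from the issue description (ungrouped, grouped and matrix counts) can be illustrated on the travel-offer table with a small standalone sketch. This is not the SOLR-236 patch or any Lucene collector: the class `GroupFacetCounts` and its methods are hypothetical names, documents are plain `{hotel, departure_airport}` arrays, and the grouped variant assumes the input list is already sorted by relevance so that the first document seen for a group is its most relevant one.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical illustration only; not part of Lucene or Solr. */
final class GroupFacetCounts {

    /** The travel offers from the example: {hotel (group value), departure_airport (facet value)}. */
    static final List<String[]> OFFERS = Arrays.asList(
        new String[] {"Hotel a", "AMS"},
        new String[] {"Hotel a", "DUS"},
        new String[] {"Hotel b", "AMS"},
        new String[] {"Hotel b", "AMS"});

    /** Ungrouped counts: every matching document contributes one count. */
    static Map<String, Integer> ungroupedCounts(List<String[]> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] doc : docs) {
            counts.merge(doc[1], 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Grouped counts: only the first document of each group contributes.
     * Assumes docs are sorted by relevance, most relevant first.
     */
    static Map<String, Integer> groupedCounts(List<String[]> docsByRelevance) {
        Map<String, Integer> counts = new TreeMap<>();
        Set<String> seenGroups = new HashSet<>();
        for (String[] doc : docsByRelevance) {
            if (seenGroups.add(doc[0])) {
                counts.merge(doc[1], 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Matrix counts: each distinct (group value, facet value) pair contributes one count. */
    static Map<String, Integer> matrixCounts(List<String[]> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        Set<List<String>> seenPairs = new HashSet<>();
        for (String[] doc : docs) {
            if (seenPairs.add(Arrays.asList(doc[0], doc[1]))) {
                counts.merge(doc[1], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed relevance order in which the DUS offer is the top hit for "Hotel a".
        List<String[]> ranked = Arrays.asList(
            OFFERS.get(1), OFFERS.get(0), OFFERS.get(2), OFFERS.get(3));

        System.out.println(ungroupedCounts(OFFERS)); // {AMS=3, DUS=1}
        System.out.println(groupedCounts(ranked));   // {AMS=1, DUS=1}
        System.out.println(matrixCounts(OFFERS));    // {AMS=2, DUS=1}
    }
}
```

Only the matrix variant yields the AMS: 2, DUS: 1 facet that end users are said to expect. The grouped result depends entirely on which document happens to rank first in each group, which is why the relevance order above is spelled out as an assumption.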