lucene-dev mailing list archives

From Joshua Harness
Subject Re: Question about LUCENE-3097 - Post Group Faceting
Date Fri, 05 Aug 2011 14:28:23 GMT
Martijn -

     Thanks for the reply. I understand your answer about the segments.
However, I'm still cloudy about faceting with respect to the group head.
Perhaps an example will clarify my confusion.  Suppose I have 3 order
documents with the following data:

orderNumber: 1
customerNumber: 1
totalInCents: 1500
productType: 'BOOK'

orderNumber: 2
customerNumber: 1
totalInCents: 500
productType: 'BOOK'

orderNumber: 3
customerNumber: 1
totalInCents: 1000
productType: 'DVD'


     Imagine I perform a search for orders with a total greater than or equal
to 1000 cents, grouped by customer number. I would expect to get order numbers
1 and 3 back, grouped underneath customer number 1.  Let's assume that order
number 1 is considered the most relevant document (in your scenario). Will the
post group faceting miss that I actually have two facet values for productType:
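
To make the scenario concrete, here is a minimal sketch in plain Python. It is not Lucene's or Solr's API: the relevance scores and the helper names (matching, group_heads, facet_counts) are made up for illustration, and the only assumption taken from the thread is that order 1 is the group head. It compares facet counts computed from the group head alone with counts computed from every matching document.

```python
from collections import Counter

# Toy stand-ins for the three order documents in the example above.
orders = [
    {"orderNumber": 1, "customerNumber": 1, "totalInCents": 1500, "productType": "BOOK"},
    {"orderNumber": 2, "customerNumber": 1, "totalInCents": 500, "productType": "BOOK"},
    {"orderNumber": 3, "customerNumber": 1, "totalInCents": 1000, "productType": "DVD"},
]

# Hypothetical relevance scores; the only assumption that matters is that
# order 1 ranks highest, as stated in the scenario.
relevance = {1: 2.0, 2: 1.0, 3: 1.5}


def matching(docs, min_total):
    """Return the documents whose total is at least min_total cents."""
    return [d for d in docs if d["totalInCents"] >= min_total]


def group_heads(docs, group_field):
    """Keep only the most relevant document (the head) of each group."""
    heads = {}
    for d in docs:
        key = d[group_field]
        best = heads.get(key)
        if best is None or relevance[d["orderNumber"]] > relevance[best["orderNumber"]]:
            heads[key] = d
    return list(heads.values())


def facet_counts(docs, field):
    """Count how many of the given documents carry each value of field."""
    return Counter(d[field] for d in docs)


hits = matching(orders, 1000)                 # orders 1 and 3
heads = group_heads(hits, "customerNumber")   # order 1 only
head_facets = facet_counts(heads, "productType")
all_facets = facet_counts(hits, "productType")

print("facets from group heads:", dict(head_facets))
print("facets from all hits:   ", dict(all_facets))
```

In this toy model, faceting on the heads sees only BOOK, while faceting on all hits sees both BOOK and DVD. That is the difference I am asking about.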



On Fri, Aug 5, 2011 at 4:22 AM, Martijn v Groningen wrote:

> Hi Josh,
> For post grouping the documents don't need to reside in the same segment.
> Lucene's grouping module has a collector (TermAllGroupHeadsCollector) that
> can collect the most relevant document for each group (GroupHead). This
> collector can produce an int[] or a FixedBitSet that can be used during
> faceting to produce post group facets (the patch in SOLR-2665 uses this).
> During faceting only the group heads are known. Because of this, field
> values that differ in documents less relevant than the most relevant
> document of a group aren't taken into account. This is the same as in the
> example in the description of LUCENE-3097.
> Hope this helps!
> Martijn
> On 4 August 2011 22:59, Joshua Harness <> wrote:
>> Hello -
>>      Please let me know if this question is more appropriate for the user
>> list. I had assumed the developer list was more appropriate since the ticket
>> is still open.  I was analyzing the comments on LUCENE-3097 and had a couple
>> of questions.
>>      A comment started a small thread that mentioned that all documents in
>> a given group would need to be contiguous and in the same segment. Also, a
>> statement was made that 'The app would have to ensure this'. I was unclear
>> on the result of this conversation. It sounded like maybe this could have
>> turned out to not
>> be the case. What is the status of this? Does my application have to ensure
>> all the documents in the group are in the same segment? How would one
>> accomplish this?
>>      Another comment mentioned that 'we pick only the head long as the head
>> doc is guaranteed to have the same value for field X, it safe to use that
>> doc to
>> represent the entire group for facet counting'.  Does this mean that there
>> is a restriction placed on me that the head document must have field values
>> that match the rest of the documents in the same group? Or is this simply an
>> implementation detail that uses the head document when this condition is the
>> case or chooses another strategy when this is not the case?
>>      I am very interested in adopting this patch. However - I am
>> attempting to understand any limitations/conditions so that I may use it
>> correctly. Any advice would be greatly appreciated.
>> Thanks!
>> Josh Harness
> --
> Met vriendelijke groet,
> Martijn van Groningen
