lucene-dev mailing list archives

From Joshua Harness <jkharnes...@gmail.com>
Subject Re: Question about LUCENE-3097 - Post Group Faceting
Date Fri, 05 Aug 2011 14:28:23 GMT
Martijn -

     Thanks for the reply. I understand your answer about the segments.
However, I'm still unclear about how faceting works with respect to the group
head. Perhaps an example will clarify my confusion. Suppose I have three order
documents with the following data:

orderNumber: 1
customerNumber: 1
totalInCents: 1500
productType: 'BOOK'

orderNumber: 2
customerNumber: 1
totalInCents: 500
productType: 'BOOK'

orderNumber: 3
customerNumber: 1
totalInCents: 1000
productType: 'DVD'

     Imagine I search for orders with totalInCents greater than or equal to
1000, grouped by customerNumber. I would expect to get order numbers 1 and 3
back, grouped under customer number 1. Let's assume that order number 1 is
considered the most relevant document (in your scenario). Will post-group
faceting miss the fact that this group actually has two facet values for
productType: BOOK and DVD?

Thanks!

Josh
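
To make the question concrete, here is a minimal Java sketch of the scenario above. It is a plain-Java model, not Lucene's API: the `Order` record, the method names, and the hard-coded group head (order 1, as assumed in the question) are all hypothetical. It counts productType values once over every matching order and once over the group head only. With order 1 as the head, the DVD value from order 3 drops out of the head-only counts, which matches the behavior described in Martijn's reply quoted below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the three order documents from the example;
// plain Java, not Lucene's grouping or faceting API.
class PostGroupFacetExample {
    record Order(int orderNumber, int customerNumber, int totalInCents, String productType) {}

    static final List<Order> ORDERS = List.of(
        new Order(1, 1, 1500, "BOOK"),
        new Order(2, 1, 500, "BOOK"),
        new Order(3, 1, 1000, "DVD"));

    // Orders matching the query totalInCents >= 1000 (orders 1 and 3).
    static List<Order> matching() {
        List<Order> result = new ArrayList<>();
        for (Order o : ORDERS) {
            if (o.totalInCents() >= 1000) result.add(o);
        }
        return result;
    }

    // Facet counts over every matching document.
    static Map<String, Integer> countAllMatches(List<Order> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Order o : docs) counts.merge(o.productType(), 1, Integer::sum);
        return counts;
    }

    // Facet counts over group heads only; `heads` maps customerNumber -> head orderNumber.
    static Map<String, Integer> countHeadsOnly(List<Order> docs, Map<Integer, Integer> heads) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Order o : docs) {
            Integer head = heads.get(o.customerNumber());
            if (head != null && head == o.orderNumber()) {
                counts.merge(o.productType(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Order> hits = matching();
        // Assumption taken from the question: order 1 is the most relevant doc for customer 1.
        Map<Integer, Integer> heads = Map.of(1, 1);
        System.out.println("all matches: " + countAllMatches(hits)); // {BOOK=1, DVD=1}
        System.out.println("heads only:  " + countHeadsOnly(hits, heads)); // {BOOK=1}
    }
}
```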

On Fri, Aug 5, 2011 at 4:22 AM, Martijn v Groningen <martijn.is.hier@gmail.com> wrote:

> Hi Josh,
>
> For post grouping, the documents don't need to reside in the same segment.
> Lucene's grouping module has a collector (TermAllGroupHeadsCollector) that
> can collect the most relevant document for each group (GroupHead). This
> collector can produce an int[] or a FixedBitSet that can be used during
> faceting to produce post-group facets (the patch in SOLR-2665 uses this).
> During faceting only the group heads are known. Because of this, field
> values that differ in documents less relevant than the most relevant
> document of a group aren't taken into account. This is the same as the
> example in the description of LUCENE-3097.
> Hope this helps!
>
> Martijn
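
A simplified model of the mechanism described in this reply may help: pick one head per group, record the heads in a bit set, then count facet values only for documents whose bit is set. This is a hedged sketch, not the TermAllGroupHeadsCollector implementation. java.util.BitSet stands in for Lucene's FixedBitSet, choosing the head by highest score (ties going to the lower doc id) is an assumption, and `collectGroupHeads`/`facetCounts` are hypothetical names.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of post-group faceting; java.util.BitSet stands in for
// Lucene's FixedBitSet, and the method names are hypothetical.
class GroupHeadFacets {

    // Keep the highest-scoring matching doc per group as its head. Docs are visited
    // in increasing id order and only a strictly higher score replaces the current
    // head, so ties go to the lower doc id.
    static BitSet collectGroupHeads(String[] groupKey, float[] score, BitSet matches) {
        Map<String, Integer> headByGroup = new HashMap<>();
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            Integer current = headByGroup.get(groupKey[doc]);
            if (current == null || score[doc] > score[current]) {
                headByGroup.put(groupKey[doc], doc);
            }
        }
        BitSet heads = new BitSet(groupKey.length);
        for (int doc : headByGroup.values()) heads.set(doc);
        return heads;
    }

    // Count facet values over group heads only; non-head docs are never inspected.
    static Map<String, Integer> facetCounts(String[] facetValue, BitSet heads) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc = heads.nextSetBit(0); doc >= 0; doc = heads.nextSetBit(doc + 1)) {
            counts.merge(facetValue[doc], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical index of four docs in two groups, all matching the query.
        String[] group = {"a", "a", "b", "b"};
        float[] score = {0.9f, 0.4f, 0.2f, 0.7f};
        String[] color = {"red", "blue", "red", "red"};
        BitSet matches = new BitSet();
        matches.set(0, 4);

        BitSet heads = collectGroupHeads(group, score, matches);
        System.out.println(heads);                     // {0, 3}
        System.out.println(facetCounts(color, heads)); // {red=2}; doc 1's "blue" is not counted
    }
}
```

The design choice this illustrates is the trade-off in the reply: a bit set of heads makes facet counting cheap and segment-independent, at the cost of ignoring values held only by less relevant documents.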
>
>
> On 4 August 2011 22:59, Joshua Harness <jkharness87@gmail.com> wrote:
>
>> Hello -
>>
>>      Please let me know if this question is more appropriate for the user
>> list. I had assumed the developer list was more appropriate since the ticket
>> is still open. I was analyzing the comments on LUCENE-3097
>> <https://issues.apache.org/jira/browse/LUCENE-3097> and had a couple of
>> questions.
>>
>>      A comment
>> <https://issues.apache.org/jira/browse/LUCENE-3097?focusedCommentId=13033953&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13033953>
>> started a small thread that mentioned that all documents in a given group
>> would need to be contiguous and in the same segment. A statement was also
>> made that 'The app would have to ensure this'. I was unclear on the outcome
>> of this conversation; it sounded like this might have turned out not to be
>> the case. What is the status of this? Does my application have to ensure
>> that all the documents in a group are in the same segment? How would one
>> accomplish this?
>>
>>      Another comment
>> <https://issues.apache.org/jira/browse/LUCENE-3097?focusedCommentId=13038297&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13038297>
>> mentioned that 'we pick only the head doc...as long as the head doc is
>> guaranteed to have the same value for field X, it safe to use that doc to
>> represent the entire group for facet counting'. Does this mean there is a
>> restriction that the head document must have field values that match the
>> rest of the documents in the same group? Or is this simply an implementation
>> detail that uses the head document when this condition holds and chooses
>> another strategy when it does not?
>>
>>      I am very interested in adopting this patch. However, I am trying to
>> understand any limitations or conditions so that I can use it correctly.
>> Any advice would be greatly appreciated.
>>
>> Thanks!
>>
>> Josh Harness
>>
>
>
>
> --
> Kind regards,
>
> Martijn van Groningen
>
