lucene-dev mailing list archives

From Martijn v Groningen <martijn.is.h...@gmail.com>
Subject Re: Question about LUCENE-3097 - Post Group Faceting
Date Sat, 06 Aug 2011 13:27:05 GMT
The facet result for field productType will show the following count:
BOOK: 1
DVD: 0

So yes, because of post group faceting you'll miss the second facet.
This is basically the same example I described in LUCENE-3097.

I've also described three ways of calculating facet counts in combination
with grouping.
The third way, which I've named matrix counts (field value & group value
combination), would give the result that you expect.
However, this isn't implemented yet. In Solr this would require changes in
the FacetComponent.
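
As a rough illustration of the difference, here is a minimal sketch in plain Python (not Lucene or Solr code) that replays the three order documents from the example. The function names are hypothetical, and using totalInCents as the relevance score is only an assumption made so that order 1 becomes the group head, as in the scenario described.

```python
from collections import Counter

# The three order documents from the example in this thread.
ORDERS = [
    {"orderNumber": 1, "customerNumber": 1, "totalInCents": 1500, "productType": "BOOK"},
    {"orderNumber": 2, "customerNumber": 1, "totalInCents": 500, "productType": "BOOK"},
    {"orderNumber": 3, "customerNumber": 1, "totalInCents": 1000, "productType": "DVD"},
]


def search(docs, min_cents):
    """Return the documents whose total is at least min_cents."""
    return [d for d in docs if d["totalInCents"] >= min_cents]


def group_heads(hits, group_field, relevance):
    """Pick the most relevant hit per group value (the group head)."""
    heads = {}
    for doc in hits:
        key = doc[group_field]
        if key not in heads or relevance(doc) > relevance(heads[key]):
            heads[key] = doc
    return list(heads.values())


def post_group_facets(hits, group_field, facet_field, relevance):
    """Count facet values over the group heads only."""
    heads = group_heads(hits, group_field, relevance)
    return Counter(d[facet_field] for d in heads)


def matrix_facets(hits, group_field, facet_field):
    """Count each distinct (facet value, group value) combination once."""
    pairs = {(d[facet_field], d[group_field]) for d in hits}
    return Counter(facet_value for facet_value, _ in pairs)


def relevance(doc):
    # Assumption: a stand-in score that makes order 1 the group head.
    return doc["totalInCents"]


hits = search(ORDERS, 1000)
post = post_group_facets(hits, "customerNumber", "productType", relevance)
matrix = matrix_facets(hits, "customerNumber", "productType")
print("post group:", dict(post))
print("matrix:", dict(matrix))
```

With these inputs the post group counts contain only BOOK: 1 (DVD is absent, i.e. 0), while the matrix counts contain BOOK: 1 and DVD: 1.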
I hope this explains it a bit!

Martijn

On 5 August 2011 16:28, Joshua Harness <jkharness87@gmail.com> wrote:

> Martijn -
>
>      Thanks for the reply. I understand your answer about the segments.
> However, I'm still cloudy about faceting with respect to the group head.
> Perhaps an example will clarify my confusion.  Suppose I have 3 order
> documents with the following data:
>
> orderNumber: 1
> customerNumber: 1
> totalInCents: 1500
> productType: 'BOOK'
>
> orderNumber: 2
> customerNumber: 1
> totalInCents: 500
> productType: 'BOOK'
>
> orderNumber: 3
> customerNumber: 1
> totalInCents: 1000
> productType: 'DVD'
>
>      Imagine I perform a search for items greater than or equal to 1000
> cents grouped by customer number. I would expect to get order numbers 1 and
> 3 back grouped underneath customer id.  Let's assume that order number 1 is
> considered the most relevant document (in your scenario). Will the post
> group faceting miss that I actually have two facet values for productType:
> BOOK and DVD?
>
> Thanks!
>
> Josh
>
>
> On Fri, Aug 5, 2011 at 4:22 AM, Martijn v Groningen <
> martijn.is.hier@gmail.com> wrote:
>
>> Hi Josh,
>>
>> For post grouping the documents don't need to reside in the same segment.
>> Lucene's grouping module has a collector (TermAllGroupHeadsCollector) that
>> can collect the most relevant document for each group (GroupHead). This
>> collector can produce an int[] or a FixedBitSet that can be used during
>> faceting to produce post group facets (the patch in SOLR-2665 uses this).
>> During faceting only the group heads are known; because of this, field
>> values that differ in documents less relevant than the most relevant
>> document of a group aren't taken into account. This is the same as the
>> example described in the description of LUCENE-3097.
>> Hope this helps!
>>
>> Martijn
>>
>>
>> On 4 August 2011 22:59, Joshua Harness <jkharness87@gmail.com> wrote:
>>
>>> Hello -
>>>
>>>      Please let me know if this question is more appropriate for the user
>>> list. I had assumed the developer list was more appropriate since the ticket
>>> is still open.  I was analyzing the comments on LUCENE-3097
>>> <https://issues.apache.org/jira/browse/LUCENE-3097> and had a couple of
>>> questions.
>>>
>>>      A comment
>>> <https://issues.apache.org/jira/browse/LUCENE-3097?focusedCommentId=13033953&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13033953>
>>> started a small thread that mentioned that all documents in a given group
>>> would need to be contiguous and in the same segment. Also, a statement was
>>> made that 'The app would have to ensure this'. I was unclear about the
>>> outcome of this conversation. It sounded like maybe this could have turned
>>> out to not be the case. What is the status of this? Does my application
>>> have to ensure all the documents in the group are in the same segment? How
>>> would one accomplish this?
>>>
>>>      Another comment
>>> <https://issues.apache.org/jira/browse/LUCENE-3097?focusedCommentId=13038297&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13038297>
>>> mentioned that 'we pick only the head doc...as long as the head doc is
>>> guaranteed to have the same value for field X, it safe to use that doc to
>>> represent the entire group for facet counting'.  Does this mean that there
>>> is a restriction placed on me that the head document must have field values
>>> that match the rest of the documents in the same group? Or is this simply an
>>> implementation detail that uses the head document when this condition is the
>>> case or chooses another strategy when this is not the case?
>>>
>>>      I am very interested in adopting this patch. However - I am
>>> attempting to understand any limitations/conditions so that I may use it
>>> correctly. Any advice would be greatly appreciated.
>>>
>>> Thanks!
>>>
>>> Josh Harness
>>>
>>
>>
>>
>> --
>> Kind regards,
>>
>> Martijn van Groningen
>>
>
>


-- 
Kind regards,

Martijn van Groningen
