lucene-dev mailing list archives

From Martijn v Groningen <martijn.is.h...@gmail.com>
Subject Re: Question about LUCENE-3097 - Post Group Faceting
Date Fri, 05 Aug 2011 08:22:51 GMT
Hi Josh,

For post grouping the documents don't need to reside in the same segment.
Lucene's grouping module has a collector (TermAllGroupHeadsCollector) that
collects the most relevant document of each group (the group head). This
collector can produce an int[] or a FixedBitSet that can be used during
faceting to produce post group facets (the patch in SOLR-2665 uses this).
During faceting only the group heads are known. Because of this, field
values that differ in the less relevant documents of a group aren't taken
into account. This is the same behaviour as in the example given in the
description of LUCENE-3097.
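
To make the consequence of faceting on group heads concrete, here is a
minimal plain-Java sketch. It does not use the Lucene or Solr API: the Doc
class and the groupHeads and facetCounts helpers are hypothetical names made
up for this illustration, and "most relevant" is assumed to mean "highest
score".

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PostGroupFacetSketch {

    // Hypothetical stand-in for a matching document; not a Lucene class.
    static final class Doc {
        final int id;
        final String group;
        final String facetValue;
        final float score;

        Doc(int id, String group, String facetValue, float score) {
            this.id = id;
            this.group = group;
            this.facetValue = facetValue;
            this.score = score;
        }
    }

    // Keep only the highest-scoring document of each group (the group head).
    static Map<String, Doc> groupHeads(List<Doc> hits) {
        Map<String, Doc> heads = new LinkedHashMap<>();
        for (Doc d : hits) {
            Doc current = heads.get(d.group);
            if (current == null || d.score > current.score) {
                heads.put(d.group, d);
            }
        }
        return heads;
    }

    // Count facet values over the group heads only; other docs are ignored.
    static Map<String, Integer> facetCounts(Map<String, Doc> heads) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc head : heads.values()) {
            counts.merge(head.facetValue, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Doc> hits = List.of(
            new Doc(0, "a", "red", 2.0f),    // head of group a
            new Doc(1, "a", "blue", 1.0f),   // less relevant, value ignored
            new Doc(2, "b", "red", 0.5f),    // less relevant, value ignored
            new Doc(3, "b", "green", 3.0f)); // head of group b
        System.out.println(facetCounts(groupHeads(hits)));
    }
}
```

In this sketch "blue" never shows up in the counts and "red" is counted once
rather than twice, because docs 1 and 2 are not the heads of their groups.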
Hope this helps!

Martijn

On 4 August 2011 22:59, Joshua Harness <jkharness87@gmail.com> wrote:

> Hello -
>
>      Please let me know if this question is more appropriate for the user
> list. I had assumed the developer list was more appropriate since the ticket
> is still open.  I was analyzing the comments on LUCENE-3097
> <https://issues.apache.org/jira/browse/LUCENE-3097> and had a couple of
> questions.
>
>      A comment
> <https://issues.apache.org/jira/browse/LUCENE-3097?focusedCommentId=13033953&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13033953>
> started a small thread that mentioned that all documents in a given group
> would need to be contiguous and in the same segment. Also - a statement was
> made that 'The app would have to ensure this'. I was unclear about the result of
> this conversation. It sounded like maybe this could have turned out to not
> be the case. What is the status of this? Does my application have to ensure
> all the documents in the group are in the same segment? How would one
> accomplish this?
>
>      Another comment
> <https://issues.apache.org/jira/browse/LUCENE-3097?focusedCommentId=13038297&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13038297>
> mentioned that 'we pick only the head doc...as long as the head doc is
> guaranteed to have the same value for field X, it safe to use that doc to
> represent the entire group for facet counting'.  Does this mean that there
> is a restriction placed on me that the head document must have field values
> that match the rest of the documents in the same group? Or is this simply an
> implementation detail that uses the head document when this condition is the
> case or chooses another strategy when this is not the case?
>
>      I am very interested in adopting this patch. However - I am attempting
> to understand any limitations/conditions so that I may use it correctly. Any
> advice would be greatly appreciated.
>
> Thanks!
>
> Josh Harness
>



-- 
Kind regards,

Martijn van Groningen
