lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] [Commented] (LUCENE-5409) ToParentBlockJoinCollector.getTopGroups returns empty Groups
Date Fri, 24 Jan 2014 21:59:38 GMT


Michael McCandless commented on LUCENE-5409:

Ahh ... two rewrites will store the wrong origChildQuery.  Nice catch :)

But: do we first have a failing test case here?

Also, were you rewriting yourself externally in your application?  Or is there some path through
Lucene that results in more than one rewrite?

> ToParentBlockJoinCollector.getTopGroups returns empty Groups
> ------------------------------------------------------------
>                 Key: LUCENE-5409
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 4.6
>         Environment: Ubuntu 12.04
>            Reporter: Peng Cheng
>            Assignee: Michael McCandless
>            Priority: Critical
>             Fix For: 4.7
>   Original Estimate: 168h
>  Remaining Estimate: 168h
> A bug causes the getTopGroups method of class ToParentBlockJoinCollector to return
> unstable results.
> In the scorer generation stage, the ToParentBlockJoinCollector automatically rewrites
> all the associated ToParentBlockJoinQuery instances (and their subqueries) and saves them
> in its in-memory lookup table, joinQueryID (see the enroll() method for details).
> However, the ToParentBlockJoinQuery passed to getTopGroups is not rewritten (at least,
> users are not expected to rewrite it). When this unrewritten query is looked up in the
> table of rewritten queries (rewrite() changes hashCode()), the lookup will usually fail,
> and the result is a topGroup collection consisting only of empty groups (their hitCounts
> are guaranteed to be zero).
> An easy fix would be to rewrite the original BlockJoinQuery before invoking the
> getTopGroups method. However, this adds avoidable computational cost. A better but
> slightly more complex solution would be to store the unrewritten queries in the lookup
> table.
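
For illustration only, here is a minimal, self-contained sketch of the lookup miss described
above and of the two fixes the description proposes. It does not use Lucene: ToyQuery,
JoinLookupDemo and all method names are hypothetical stand-ins, and the only assumptions
taken from the report are that the table is a map keyed by query and that rewrite() yields
a query with a different equals()/hashCode().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class JoinLookupDemo {

    // Hypothetical stand-in for a query; NOT a Lucene class.
    static final class ToyQuery {
        final String text;
        final boolean rewritten;

        ToyQuery(String text, boolean rewritten) {
            this.text = text;
            this.rewritten = rewritten;
        }

        // Assumption: rewriting produces a query that is not equal to the original.
        ToyQuery rewrite() {
            return rewritten ? this : new ToyQuery(text.toLowerCase(), true);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ToyQuery)) return false;
            ToyQuery other = (ToyQuery) o;
            return rewritten == other.rewritten && text.equals(other.text);
        }

        @Override
        public int hashCode() {
            return Objects.hash(text, rewritten);
        }
    }

    // Reported behaviour: table keyed by the rewritten query, lookup with the original.
    static Integer lookupKeyedByRewritten(ToyQuery original) {
        Map<ToyQuery, Integer> joinQueryID = new HashMap<>();
        joinQueryID.put(original.rewrite(), 0);
        return joinQueryID.get(original); // miss: keys differ
    }

    // "Easy fix": the caller rewrites the query before the lookup.
    static Integer lookupWithCallerRewrite(ToyQuery original) {
        Map<ToyQuery, Integer> joinQueryID = new HashMap<>();
        joinQueryID.put(original.rewrite(), 0);
        return joinQueryID.get(original.rewrite()); // hit, at the cost of a rewrite
    }

    // "Better fix": the table is keyed by the unrewritten query.
    static Integer lookupKeyedByOriginal(ToyQuery original) {
        Map<ToyQuery, Integer> joinQueryID = new HashMap<>();
        joinQueryID.put(original, 0);
        return joinQueryID.get(original); // hit, no extra rewrite needed
    }

    public static void main(String[] args) {
        ToyQuery q = new ToyQuery("Child:Foo", false);
        System.out.println("keyed by rewritten, original lookup: " + lookupKeyedByRewritten(q));
        System.out.println("keyed by rewritten, caller rewrites: " + lookupWithCallerRewrite(q));
        System.out.println("keyed by original:                   " + lookupKeyedByOriginal(q));
    }
}
```

In this toy model the first lookup returns null, which corresponds to the empty groups in
the report, while both proposed fixes find the slot. Whether the real collector behaves
the same way still needs the failing test case asked for above.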

This message was sent by Atlassian JIRA