lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2690) Do MultiTermQuery boolean rewrites per segment
Date Thu, 14 Oct 2010 12:28:35 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920940#action_12920940 ]

Robert Muir commented on LUCENE-2690:
-------------------------------------

I will play with the latest patch some, and hopefully upload a new one.

The real cause of this "tie-break" case is that the priority queue comparison
is "compare by boost, then term text".

With the MultiTermsEnum this was no problem, because we look at all terms in order, so we
made MaxNonCompetitiveBoostAttribute just a float.

With per-segment rewrite, we can look at terms out of order.

So I think if we add the optional term text of the pq's bottom for the previous segment to
the MaxNonCompetitiveBoostAttribute itself, then the enum itself can implement the tie break
more cleanly and more efficiently. The rewrite method or consumer should only be setting
the values of this attribute and not dealing with this case.
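
To make the idea concrete, here is a minimal sketch of the tie break living on the enum side. It is not the patch: `TieBreakSketch`, `Bound`, and `isCompetitive` are hypothetical names, `String` stands in for the term text, and the sketch assumes that on equal boost the term that sorts lower wins. The real ordering is whatever the pq comparator uses.

```java
public class TieBreakSketch {

    // Hypothetical stand-in for the attribute: it carries the boost of the
    // pq's bottom entry plus, optionally, that entry's term text.
    public static final class Bound {
        final float maxNonCompetitiveBoost;
        final String competitiveTerm; // null means "no tie-break information"

        public Bound(float maxNonCompetitiveBoost, String competitiveTerm) {
            this.maxNonCompetitiveBoost = maxNonCompetitiveBoost;
            this.competitiveTerm = competitiveTerm;
        }
    }

    // The check an enum could apply before handing a term to the consumer.
    // Mirrors "compare by boost, then term text".
    public static boolean isCompetitive(Bound bound, float boost, String term) {
        if (boost > bound.maxNonCompetitiveBoost) {
            return true;  // strictly better boost always competes
        }
        if (boost < bound.maxNonCompetitiveBoost) {
            return false; // strictly worse boost never competes
        }
        // Equal boost: with only a float, the term has to be rejected.
        // With the bottom's term text we can break the tie.
        // Assumption: the lower-sorting term wins the tie.
        return bound.competitiveTerm != null
                && term.compareTo(bound.competitiveTerm) < 0;
    }

    public static void main(String[] args) {
        // Bottom of the pq after the previous segment: boost 0.5, term "banana".
        Bound bound = new Bound(0.5f, "banana");
        System.out.println(isCompetitive(bound, 0.5f, "apple"));
        System.out.println(isCompetitive(bound, 0.5f, "cherry"));
    }
}
```

With this shape, the rewrite method only writes the two values after each segment, and the enum decides on its own which terms are worth returning.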


> Do MultiTermQuery boolean rewrites per segment
> ----------------------------------------------
>
>                 Key: LUCENE-2690
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2690
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2690-attributes.patch, LUCENE-2690-attributes.patch, LUCENE-2690-hack.patch,
> LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch,
> LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch
>
>
> MultiTermQuery currently rewrites FuzzyQuery (using TopTermsBooleanQueryRewrite), the
> auto constant rewrite method and the ScoringBQ rewrite methods using a MultiFields wrapper
> on the top-level reader. This is inefficient.
> This patch changes the rewrite modes to do the rewrites per segment and uses some additional
> data structures (hashed sets/maps) to exclude duplicate terms. All tests currently pass, but
> FuzzyQuery's tests should not, because its minimum score handling depends on the terms
> being collected in order.
> Robert will fix FuzzyQuery in this issue, too. This patch is just a start.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

