lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-2690) Do MultiTermQuery boolean rewrites per segment
Date Thu, 14 Oct 2010 20:01:32 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2690:
----------------------------------

    Attachment: LUCENE-2690.patch

Attached is a new patch with two changes:

- moved the BQ reordering to MTQ for now. A general reordering of BooleanQueries should be
done in a separate issue (with a more performant rewrite). Currently this uses the same comparator
as BQ did before. You may wonder: why not simply use a sorted map? The idea is that sorting
once at the end is faster than using a TreeMap, where every term is compared on insertion
(even those that later fall out of the queue). I sort the BQ clauses directly, as BQ did, to avoid
creating an additional array that holds all the terms again. Maybe it's still faster to copy all
BytesRefs into an array first and then build the BQ? For now this should be enough. To improve
this, we need SorterTemplate again (for the BytesRefHash case) :-)
- fixed an issue with the PQ in TopTermsRewrite: the bottom information was previously only
set when the PQ was overflowing. In the past, and now again, it is set as soon as the queue is
full. This was a bug in an optimization; the behavior is now as it always was. Maybe this
explains Mike's score changes on the Wikipedia index?
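A minimal sketch of the "sort once at the end" idea from the first item, in plain Java rather than Lucene's API: terms go into an unordered HashMap while enumerating (no comparisons per insert), and the clause list is sorted a single time when the query is built. The class SortAtEndRewrite, its Clause type, ordering clauses by term text, and keeping the larger boost for a duplicate term are all hypothetical assumptions for illustration; the comparator the patch actually reuses from BQ is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Lucene's API): collect terms unordered, then sort
 * the finished clause list once instead of keeping a TreeMap ordered on
 * every insertion.
 */
public final class SortAtEndRewrite {

    /** Minimal stand-in for a BooleanQuery clause: a term plus its boost. */
    public static final class Clause {
        public final String term;
        public final float boost;

        Clause(String term, float boost) {
            this.term = term;
            this.boost = boost;
        }

        @Override
        public String toString() {
            return term + "^" + boost;
        }
    }

    // Unordered while collecting: expected O(1) insert, no term comparisons.
    private final Map<String, Float> collected = new HashMap<>();

    /** Records a term; a duplicate keeps the larger boost (assumption of this sketch). */
    public void collect(String term, float boost) {
        collected.merge(term, boost, Math::max);
    }

    /** Builds the clause list and sorts it exactly once, here by term text (assumed order). */
    public List<Clause> build() {
        List<Clause> clauses = new ArrayList<>(collected.size());
        for (Map.Entry<String, Float> e : collected.entrySet()) {
            clauses.add(new Clause(e.getKey(), e.getValue()));
        }
        clauses.sort(Comparator.comparing((Clause c) -> c.term));
        return clauses;
    }

    public static void main(String[] args) {
        SortAtEndRewrite r = new SortAtEndRewrite();
        r.collect("lucene", 1.0f);
        r.collect("apache", 0.5f);
        r.collect("lucene", 0.8f); // same term seen again, e.g. from another segment
        System.out.println(r.build()); // prints [apache^0.5, lucene^1.0]
    }
}
```

The trade-off is the one described above: a TreeMap pays for ordering on every insert, including terms that are later discarded, while this approach pays for one sort over the surviving clauses only.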
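The PQ fix in the second item can be illustrated with a hypothetical top-N collector built on java.util.PriorityQueue (not Lucene's TopTermsRewrite or its own PriorityQueue class). The bottom score, i.e. the lowest score that can still enter the queue, is recorded as soon as the queue holds maxSize entries, rather than only after the queue has overflowed once. The names and the tie rule (a score equal to the bottom is rejected) are assumptions of this sketch.

```java
import java.util.PriorityQueue;

/**
 * Hypothetical top-N term collector (not Lucene's TopTermsRewrite).
 * The queue keeps the maxSize best-scoring terms; its head is the weakest.
 */
public final class TopTermsCollector {

    static final class ScoredTerm {
        final String term;
        final float score;

        ScoredTerm(String term, float score) {
            this.term = term;
            this.score = score;
        }
    }

    private final int maxSize;
    private final PriorityQueue<ScoredTerm> pq;
    // Lowest competitive score; stays NEGATIVE_INFINITY until the queue is full.
    private float bottomScore = Float.NEGATIVE_INFINITY;

    public TopTermsCollector(int maxSize) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be >= 1");
        }
        this.maxSize = maxSize;
        // Min-heap on score, so peek() returns the current bottom entry.
        this.pq = new PriorityQueue<ScoredTerm>(maxSize, (a, b) -> Float.compare(a.score, b.score));
    }

    /** Returns false if the term was not competitive and was skipped. */
    public boolean collect(String term, float score) {
        if (pq.size() == maxSize) {
            if (score <= bottomScore) {
                return false; // cannot enter the full queue
            }
            pq.poll(); // evict the current bottom to make room
        }
        pq.add(new ScoredTerm(term, score));
        // Publish the bottom as soon as the queue is full, not only after an
        // overflow: the term that fills the queue already fixes the threshold.
        if (pq.size() == maxSize) {
            bottomScore = pq.peek().score;
        }
        return true;
    }

    public float bottomScore() {
        return bottomScore;
    }

    public static void main(String[] args) {
        TopTermsCollector c = new TopTermsCollector(2);
        c.collect("lucene", 1.0f);
        c.collect("lucid", 3.0f);                   // queue is now full
        System.out.println(c.bottomScore());        // prints 1.0
        System.out.println(c.collect("luc", 0.5f)); // prints false
    }
}
```

With the bottom known one term earlier, a consumer of bottomScore() (for example an enumerator that raises its minimum threshold) can start skipping non-competitive terms right after the queue fills.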

Mike: can you test?

> Do MultiTermQuery boolean rewrites per segment
> ----------------------------------------------
>
>                 Key: LUCENE-2690
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2690
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2690-attributes.patch, LUCENE-2690-attributes.patch, LUCENE-2690-attributes.patch,
>                      LUCENE-2690-hack.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch,
>                      LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch,
>                      LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch,
>                      LUCENE-2690.patch
>
>
> MultiTermQuery currently rewrites FuzzyQuery (using TopTermsBooleanQueryRewrite), the
> auto constant rewrite method and the ScoringBQ rewrite methods using a MultiFields wrapper
> on the top-level reader. This is inefficient.
> This patch changes the rewrite modes to do the rewrites per segment and uses some additional
> data structures (hashed sets/maps) to exclude duplicate terms. All tests currently pass, but
> FuzzyQuery's tests should not, because its minimum score handling depends on the terms being
> collected in order.
> Robert will fix FuzzyQuery in this issue, too. This patch is just a start.
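The per-segment rewrite with duplicate elimination described above can be sketched as follows, assuming each segment's term dictionary is modelled as a sorted List<String> and the query as a Predicate<String>; PerSegmentRewrite and both stand-ins are hypothetical, not Lucene classes. A HashSet drops terms already collected from an earlier segment. The example also shows why FuzzyQuery's assumption no longer holds: the terms come out in per-segment order, not in global term order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Hypothetical sketch: visit each segment's terms separately (instead of a
 * merged top-level view) and use a hash set so a term present in several
 * segments yields only one clause.
 */
public final class PerSegmentRewrite {

    /** Each inner list stands in for one segment's sorted term dictionary. */
    public static List<String> rewrite(List<List<String>> segments, Predicate<String> matches) {
        Set<String> seen = new HashSet<>();
        List<String> clauses = new ArrayList<>();
        for (List<String> segmentTerms : segments) { // one pass per segment
            for (String term : segmentTerms) {
                // seen.add() returns false for a term already collected from
                // an earlier segment, so duplicates are skipped.
                if (matches.test(term) && seen.add(term)) {
                    clauses.add(term);
                }
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        List<List<String>> segments = Arrays.asList(
                Arrays.asList("lucene", "lucky"),  // segment 1, sorted
                Arrays.asList("lucene", "lucid")); // segment 2, sorted
        // A prefix match on "luc" stands in for the multi-term query.
        System.out.println(rewrite(segments, t -> t.startsWith("luc")));
        // prints [lucene, lucky, lucid]: "lucid" sorts before "lucky" globally
    }
}
```

Each segment is sorted on its own, but the combined sequence is not, which is exactly the ordering FuzzyQuery's minimum score handling relied on.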

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



