lucene-dev mailing list archives

From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2690) Do MultiTermQuery boolean rewrites per segment
Date Thu, 14 Oct 2010 19:13:32 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921081#action_12921081 ]

Yonik Seeley commented on LUCENE-2690:
--------------------------------------

bq. For random queries it had a huge positive impact on query perf. 

If the clauses were just term queries, that would make me really suspect the test.
If it was MTQ queries, then MTQ should sort, not BQ.

bq. The BQ cloning/reordering was not measurable.

Right - I would expect that for typical queries and typical uses.
I guess I'm worried about the atypical cases since I've seen so many of them - people putting
together single boolean queries with 10K clauses, people doing complex nested queries with
thousands of terms, or people executing thousands of queries per request (or per document
added, via memory index) where this overhead suddenly becomes significant.

bq. We are still working on this patch, it's marked as TODO, so we will investigate further.

Cool :-)

> Do MultiTermQuery boolean rewrites per segment
> ----------------------------------------------
>
>                 Key: LUCENE-2690
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2690
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2690-attributes.patch, LUCENE-2690-attributes.patch, LUCENE-2690-attributes.patch,
LUCENE-2690-hack.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch,
LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch,
LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch
>
>
> MultiTermQuery currently rewrites FuzzyQuery (using TopTermsBooleanQueryRewrite), the
> auto constant rewrite method, and the ScoringBQ rewrite methods using a MultiFields wrapper
> on the top-level reader. This is inefficient.
> This patch changes the rewrite modes to do the rewrites per segment and uses some additional
> data structures (hashed sets/maps) to exclude duplicate terms. All tests currently pass, but
> FuzzyQuery's tests should not, because its minimum-score handling depends on the terms
> being collected in order.
> Robert will fix FuzzyQuery in this issue, too. This patch is just a start.
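
The per-segment collection with duplicate exclusion described in the issue can be sketched roughly as follows. This is only an illustration in plain Java collections and not the code from the attached patches: the class name PerSegmentTermCollector, the representation of a segment as a map from term to document frequency, and the choice to sum frequencies for a term seen in more than one segment are all assumptions made for the example. No Lucene API is used.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not taken from the LUCENE-2690 patches.
public class PerSegmentTermCollector {

    /**
     * Visits each segment in turn and merges its (term, docFreq) pairs into one
     * hashed map, so a term that occurs in several segments is kept only once.
     * Summing the per-segment frequencies is an assumption of this sketch.
     */
    public static Map<String, Integer> collect(List<Map<String, Integer>> segments) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> segment : segments) {
            for (Map.Entry<String, Integer> entry : segment.entrySet()) {
                // Insert the term, or add to the frequency already recorded for it.
                merged.merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> segment1 = new HashMap<>();
        segment1.put("lucene", 3);
        segment1.put("lucent", 1);

        Map<String, Integer> segment2 = new HashMap<>();
        segment2.put("lucene", 2);
        segment2.put("lucerne", 4);

        Map<String, Integer> merged = collect(Arrays.asList(segment1, segment2));
        System.out.println("distinct terms: " + merged.size());
        System.out.println("lucene docFreq: " + merged.get("lucene"));
    }
}
```

A hashed map like this has no defined iteration order, which is one way to see why a consumer that relies on terms arriving in order, as the description says of FuzzyQuery's minimum score handling, would need separate treatment.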

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

