lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2690) Do MultiTermQuery boolean rewrites per segment
Date Fri, 15 Oct 2010 00:05:34 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921182#action_12921182 ]

Michael McCandless commented on LUCENE-2690:
--------------------------------------------

Test results on a 10M-document Wikipedia index:

Single segment:

||Query||QPS clean||QPS mtqseg3||Pct diff||
|united~0.6|26.01|25.48|{color:red}-2.0%{color}|
|un*ed|260.88|258.61|{color:red}-0.9%{color}|
|un*d|91.52|90.99|{color:red}-0.6%{color}|
|united~0.7|98.01|97.99|{color:red}-0.0%{color}|
|state|39.95|39.94|{color:red}-0.0%{color}|
|unit*|33.60|33.73|{color:green}0.4%{color}|
|u*d|29.87|30.01|{color:green}0.5%{color}|
|uni*ed|1825.14|1859.49|{color:green}1.9%{color}|

Multi-segment (22 segments):

||Query||QPS clean||QPS mtqseg3||Pct diff||
|unit*|34.68|34.56|{color:red}-0.3%{color}|
|state|40.43|40.30|{color:red}-0.3%{color}|
|united~0.6|3.18|3.20|{color:green}0.6%{color}|
|u*d|16.81|19.55|{color:green}16.3%{color}|
|united~0.7|11.01|13.85|{color:green}25.8%{color}|
|un*d|52.51|66.21|{color:green}26.1%{color}|
|un*ed|42.88|92.95|{color:green}116.8%{color}|
|uni*ed|175.06|543.64|{color:green}210.5%{color}|
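The Pct diff column is consistent with the relative change of the patched QPS over the clean QPS, i.e. (mtqseg3 - clean) / clean * 100. A minimal Java sketch of that arithmetic, assuming this formula (it reproduces the reported values), applied to three rows of the multi-segment table:

```java
import java.util.Locale;

public class PctDiff {
    // Relative change of the patched QPS versus the clean (baseline) QPS, in percent.
    // Assumed formula; it reproduces the "Pct diff" values in the tables above.
    static double pctDiff(double qpsClean, double qpsPatched) {
        return (qpsPatched - qpsClean) / qpsClean * 100.0;
    }

    public static void main(String[] args) {
        // Rows taken from the multi-segment (22 segments) table.
        String[] queries = {"u*d", "un*ed", "uni*ed"};
        double[] clean = {16.81, 42.88, 175.06};
        double[] patched = {19.55, 92.95, 543.64};
        for (int i = 0; i < queries.length; i++) {
            // Locale.ROOT keeps '.' as the decimal separator regardless of platform locale.
            System.out.printf(Locale.ROOT, "%s: %.1f%%%n", queries[i], pctDiff(clean[i], patched[i]));
        }
    }
}
```

This prints 16.3%, 116.8% and 210.5%, matching the table. A +210.5% change means the patched build runs uni*ed roughly 3.1 times faster (543.64 / 175.06).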

And the test did not barf, so the hits (docIDs and scores) are identical with and without the patch!
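One way to read "the hits are identical": for every query, the clean and patched runs return the same docIDs in the same order with exactly equal scores. A small Java sketch of such a check, assuming that definition; the `Hit` record is a hypothetical stand-in for a per-hit result (similar in spirit to Lucene's ScoreDoc), not the benchmark's actual code:

```java
import java.util.List;

public class HitComparison {
    // Hypothetical minimal hit representation: a docID plus its score (records need Java 16+).
    record Hit(int doc, float score) {}

    // True only if both result lists have the same length and, position by
    // position, the same docID and a bit-for-bit identical score.
    static boolean identicalHits(List<Hit> expected, List<Hit> actual) {
        if (expected.size() != actual.size()) {
            return false;
        }
        for (int i = 0; i < expected.size(); i++) {
            Hit a = expected.get(i);
            Hit b = actual.get(i);
            if (a.doc() != b.doc()
                    || Float.floatToIntBits(a.score()) != Float.floatToIntBits(b.score())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Hit> clean = List.of(new Hit(7, 1.25f), new Hit(3, 0.5f));
        List<Hit> patched = List.of(new Hit(7, 1.25f), new Hit(3, 0.5f));
        System.out.println(identicalHits(clean, patched)); // true
    }
}
```

Comparing scores exactly (rather than within a tolerance) is the strict reading: a per-segment rewrite that changed how scores are accumulated would fail it even if the ranking stayed the same.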

> Do MultiTermQuery boolean rewrites per segment
> ----------------------------------------------
>
>                 Key: LUCENE-2690
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2690
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2690-attributes.patch, LUCENE-2690-attributes.patch, LUCENE-2690-attributes.patch,
>                      LUCENE-2690-hack.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch,
>                      LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch,
>                      LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch,
>                      LUCENE-2690.patch, LUCENE-2690.patch
>
>
> MultiTermQuery currently performs the rewrites for FuzzyQuery (using TopTermsBooleanQueryRewrite),
> the auto constant-score rewrite method, and the ScoringBQ rewrite methods through a MultiFields
> wrapper on the top-level reader. This is inefficient.
> This patch changes these rewrite modes to work per segment and uses some additional data structures
> (hashed sets/maps) to exclude duplicate terms, i.e. the same term occurring in more than one segment.
> All tests currently pass, but FuzzyQuery's tests should not, because its minimum-score handling
> depends on the terms being collected in term order.
> Robert will fix FuzzyQuery in this issue, too. This patch is just a start.
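The per-segment idea in the description can be modeled outside Lucene: walk each segment's terms separately, merge them into a hashed map so that a term present in several segments is kept only once, and pick the best N terms in a way that does not depend on visiting terms in global term order (the property FuzzyQuery's minimum-score handling relied on). A minimal Java sketch under those assumptions; the segment model, `collectTerms`, `topTerms`, and summing docFreq across segments are hypothetical illustrations, not the patch's code or Lucene's API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Predicate;

public class PerSegmentRewrite {
    // Simplified model (an assumption of this sketch, NOT Lucene's API): each
    // segment's term dictionary is a sorted map from term text to its docFreq
    // in that segment.
    static Map<String, Integer> collectTerms(List<SortedMap<String, Integer>> segments,
                                             Predicate<String> matches) {
        // Hashed map keyed by term: a term found in several segments is kept
        // once, and its per-segment docFreqs are summed (hypothetical choice).
        Map<String, Integer> merged = new HashMap<>();
        for (SortedMap<String, Integer> segment : segments) {
            for (Map.Entry<String, Integer> e : segment.entrySet()) {
                if (matches.test(e.getKey())) {
                    merged.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
        }
        return merged;
    }

    // Keeps the maxTerms highest-scoring terms without assuming the terms
    // arrive in global term order: a bounded min-heap holds the current best
    // candidates and evicts the worst one whenever it grows past maxTerms.
    static List<String> topTerms(Map<String, Double> termScores, int maxTerms) {
        // Head of the heap = worst candidate: lowest score; on ties, the
        // lexicographically larger term, so the result is deterministic.
        Comparator<Map.Entry<String, Double>> worstFirst = (a, b) -> {
            int c = Double.compare(a.getValue(), b.getValue());
            return c != 0 ? c : b.getKey().compareTo(a.getKey());
        };
        PriorityQueue<Map.Entry<String, Double>> heap = new PriorityQueue<>(worstFirst);
        for (Map.Entry<String, Double> e : termScores.entrySet()) {
            heap.offer(e);
            if (heap.size() > maxTerms) {
                heap.poll(); // drop the current worst candidate
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : heap) {
            result.add(e.getKey());
        }
        Collections.sort(result); // return the selected terms in term order
        return result;
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> seg0 = new TreeMap<>(Map.of("unit", 3, "united", 5, "unity", 1));
        SortedMap<String, Integer> seg1 = new TreeMap<>(Map.of("united", 2, "universe", 4, "zebra", 9));

        Map<String, Integer> merged = collectTerms(List.of(seg0, seg1), t -> t.startsWith("uni"));
        System.out.println(new TreeMap<>(merged)); // {unit=3, united=7, unity=1, universe=4}

        // Stand-in score: merged docFreq (FuzzyQuery would use a similarity score instead).
        Map<String, Double> scores = new HashMap<>();
        merged.forEach((term, df) -> scores.put(term, (double) df));
        System.out.println(topTerms(scores, 2)); // [united, universe]
    }
}
```

The bounded heap keeps memory at O(maxTerms) and selects the same set of terms whatever order the segments are visited in, because ties are broken by term text rather than by arrival order.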
