lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5049) Native (C++) implementation of "pure OR" BooleanQuery
Date Sun, 09 Jun 2013 22:42:20 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679210#comment-13679210 ]

Robert Muir commented on LUCENE-5049:
-------------------------------------

This is an apples vs oranges comparison.

If you write one huge hairy Java method with a hardcoded query (OR) + hardcoded PostingsFormat
(Lucene42) + hardcoded Directory (MMap) + hardcoded Similarity (Default) that only works if
all terms are against a single field, it would be much faster there too...
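To make the comment's point concrete, here is a minimal Java sketch of such a "hardwired" pure-OR path. It is not the LUCENE-5049 patch and does not use any Lucene API. `HardwiredOrScorer`, `Postings`, and `Hit` are hypothetical names. Every clause is assumed to be a term in one field, with postings already decoded into plain arrays (standing in for the hardcoded PostingsFormat and Directory). The score is a fixed, simplified tf-idf-style formula, sqrt(freq) * (1 + ln(numDocs / (docFreq + 1))), not Lucene's full DefaultSimilarity. The sketch walks the postings document-at-a-time, sums the contribution of every term on the current doc, and keeps the top N hits in a bounded min-heap. It needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical "hardwired" pure-OR scorer. Assumes every clause is a term in one
 * field, postings are already decoded into arrays, and the scoring formula is fixed.
 */
final class HardwiredOrScorer {

    /** Postings for one term: ascending doc IDs and the term frequency in each doc. */
    record Postings(int[] docs, int[] freqs) {}

    /** A matching document and its summed score. */
    record Hit(int doc, double score) {}

    // Worst hit first: a lower score is worse; on equal scores the larger doc ID is worse.
    static final Comparator<Hit> WORST_FIRST =
        Comparator.comparingDouble(Hit::score)
                  .thenComparing(Comparator.comparingInt(Hit::doc).reversed());

    /** Returns up to n hits ordered best first. */
    static List<Hit> topN(List<Postings> terms, int numDocs, int n) {
        if (n <= 0) return new ArrayList<>();
        int k = terms.size();
        int[] pos = new int[k];        // cursor into each term's postings
        double[] idf = new double[k];  // fixed per-term weight, computed once
        for (int i = 0; i < k; i++) {
            int docFreq = terms.get(i).docs().length;
            idf[i] = 1.0 + Math.log((double) numDocs / (docFreq + 1));
        }
        // Bounded min-heap: the head is the weakest of the current top n.
        PriorityQueue<Hit> heap = new PriorityQueue<>(WORST_FIRST);
        while (true) {
            // Next candidate = smallest doc ID any cursor is positioned on.
            int doc = Integer.MAX_VALUE;
            for (int i = 0; i < k; i++) {
                int[] docs = terms.get(i).docs();
                if (pos[i] < docs.length && docs[pos[i]] < doc) doc = docs[pos[i]];
            }
            if (doc == Integer.MAX_VALUE) break;  // every postings list is exhausted
            // OR semantics: add the contribution of each term on this doc, then advance it.
            double score = 0.0;
            for (int i = 0; i < k; i++) {
                Postings p = terms.get(i);
                if (pos[i] < p.docs().length && p.docs()[pos[i]] == doc) {
                    score += Math.sqrt(p.freqs()[pos[i]]) * idf[i];
                    pos[i]++;
                }
            }
            Hit hit = new Hit(doc, score);
            if (heap.size() < n) {
                heap.add(hit);
            } else if (WORST_FIRST.compare(hit, heap.peek()) > 0) {
                heap.poll();  // evict the weakest hit to make room
                heap.add(hit);
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(WORST_FIRST.reversed());  // best first
        return result;
    }
}
```

For example, with one term in docs 0, 2, 5 (frequencies 1, 4, 1) and another in docs 2 and 3, `topN(terms, 10, 2)` returns doc 2 (matched by both terms) followed by doc 3. No per-clause abstraction, codec, or Similarity dispatch is left in the inner loop. That is roughly the kind of specialization the comment argues would speed up Java as well.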
                
> Native (C++) implementation of "pure OR" BooleanQuery
> -----------------------------------------------------
>
>                 Key: LUCENE-5049
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5049
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-5049.patch
>
>
> I've been playing with a C++ implementation of BooleanQuery containing
> only OR'd (SHOULD) TermQuery clauses, collecting top N hits by score.
> The results are impressive: ~3X speedup for BQ OR over two terms, and
> also good speedups (~38-78%) for Fuzzy1/2 as well since they rewrite
> to BQ OR over N terms:
> {noformat}
>                     Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
>                  MedTerm       69.47     (15.8%)       68.61     (13.4%)   -1.2% ( -26% -   33%)
>                 HighTerm       55.25     (16.2%)       54.63     (13.9%)   -1.1% ( -26% -   34%)
>                  LowTerm      333.10      (9.6%)      329.43      (8.0%)   -1.1% ( -17% -   18%)
>                   IntNRQ        3.37      (2.6%)        3.36      (4.6%)   -0.2% (  -7% -    7%)
>                  Prefix3       18.91      (2.0%)       19.04      (3.5%)    0.7% (  -4% -    6%)
>                 Wildcard       29.40      (1.7%)       29.70      (2.8%)    1.0% (  -3% -    5%)
>                MedPhrase      132.69      (6.2%)      134.66      (7.0%)    1.5% ( -11% -   15%)
>         HighSloppyPhrase        0.82      (3.6%)        0.83      (3.5%)    1.9% (  -5% -    9%)
>              AndHighHigh       19.65      (0.6%)       20.02      (0.8%)    1.9% (   0% -    3%)
>               HighPhrase       11.74      (6.6%)       11.96      (7.1%)    1.9% ( -11% -   16%)
>          MedSloppyPhrase       29.09      (1.2%)       29.76      (1.9%)    2.3% (   0% -    5%)
>          LowSloppyPhrase       25.71      (1.4%)       26.98      (1.7%)    4.9% (   1% -    8%)
>                  Respell      173.78      (3.0%)      182.41      (3.7%)    5.0% (  -1% -   12%)
>              MedSpanNear       27.67      (2.5%)       29.07      (2.4%)    5.1% (   0% -   10%)
>             HighSpanNear        2.95      (2.4%)        3.10      (2.8%)    5.4% (   0% -   10%)
>              LowSpanNear        8.29      (3.4%)        8.82      (3.3%)    6.4% (   0% -   13%)
>               AndHighMed       79.32      (1.6%)       84.44      (1.0%)    6.5% (   3% -    9%)
>                LowPhrase       23.20      (2.0%)       25.14      (1.6%)    8.4% (   4% -   12%)
>               AndHighLow      594.17      (3.4%)      660.32      (1.9%)   11.1% (   5% -   16%)
>                   Fuzzy2       88.32      (6.4%)      121.44      (1.7%)   37.5% (  27% -   48%)
>                   Fuzzy1       86.34      (6.0%)      153.49      (1.7%)   77.8% (  66% -   90%)
>               OrHighHigh       16.29      (2.5%)       48.29      (1.3%)  196.5% ( 188% -  205%)
>                OrHighMed       28.98      (2.7%)       87.81      (0.9%)  203.0% ( 194% -  212%)
>                OrHighLow       27.38      (2.6%)       84.94      (1.1%)  210.3% ( 201% -  219%)
> {noformat}
> This is essentially a scaled-back attempt at LUCENE-1594 in that it's
> "hardwired" to "just" the "OR of TermQuery" case.
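In the benchmark table, the Pct diff column matches the relative change in mean QPS, (QPS comp / QPS base - 1) * 100. A +200% row therefore means roughly 3X the baseline throughput, which is where the "~3X" figure for the OR tasks comes from. The small Java sketch below reproduces that arithmetic from the table's rounded means. The `Speedup` class and its methods are purely illustrative and not part of luceneutil or Lucene.

```java
/** Illustrative arithmetic behind the "Pct diff" column, using the table's rounded means. */
final class Speedup {

    /** One benchmark row: task name, baseline mean QPS, comparison mean QPS. */
    record Row(String task, double qpsBase, double qpsComp) {}

    /** How many times faster the comparison run is than the baseline. */
    static double ratio(double qpsBase, double qpsComp) {
        return qpsComp / qpsBase;
    }

    /** Relative change in QPS, in percent: +200% means 3X the baseline. */
    static double pctDiff(double qpsBase, double qpsComp) {
        return 100.0 * (ratio(qpsBase, qpsComp) - 1.0);
    }

    public static void main(String[] args) {
        // Mean QPS values copied from the benchmark table.
        Row[] rows = {
            new Row("Fuzzy2", 88.32, 121.44),
            new Row("Fuzzy1", 86.34, 153.49),
            new Row("OrHighHigh", 16.29, 48.29),
            new Row("OrHighMed", 28.98, 87.81),
            new Row("OrHighLow", 27.38, 84.94),
        };
        for (Row r : rows) {
            System.out.printf("%-10s %.2fx  %+.1f%%%n",
                r.task(), ratio(r.qpsBase(), r.qpsComp()), pctDiff(r.qpsBase(), r.qpsComp()));
        }
    }
}
```

Because the inputs are the rounded means, a few results differ from the Pct diff column by about 0.1 percentage points. For example, OrHighHigh comes out near 196.4% rather than 196.5%, presumably because the benchmark works from unrounded measurements.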

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
