lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4571) speedup disjunction with minShouldMatch
Date Thu, 21 Feb 2013 16:54:12 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583350#comment-13583350 ]

Michael McCandless commented on LUCENE-4571:
--------------------------------------------

New patch results (BS2 trunk vs BS2 w/ last patch):

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
     Low3MinShouldMatch2        3.95      (3.6%)        3.18      (3.5%)  -19.7% ( -25% -  -13%)
     Low1MinShouldMatch2        1.93      (3.0%)        1.57      (2.7%)  -18.8% ( -23% -  -13%)
     Low2MinShouldMatch2        2.53      (3.4%)        2.06      (2.8%)  -18.6% ( -23% -  -12%)
     HighMinShouldMatch2        1.62      (2.9%)        1.33      (2.8%)  -17.8% ( -22% -  -12%)
     HighMinShouldMatch3        1.65      (3.0%)        1.38      (3.2%)  -16.5% ( -22% -  -10%)
     Low1MinShouldMatch3        1.98      (3.1%)        1.76      (3.7%)  -11.2% ( -17% -   -4%)
     Low1MinShouldMatch0        1.85      (2.9%)        1.78      (3.3%)   -3.7% (  -9% -    2%)
     HighMinShouldMatch0        1.56      (2.8%)        1.51      (2.9%)   -3.6% (  -9% -    2%)
     Low2MinShouldMatch0        2.39      (3.1%)        2.30      (3.8%)   -3.5% ( -10% -    3%)
     Low3MinShouldMatch0        3.70      (3.3%)        3.59      (4.1%)   -3.0% ( -10% -    4%)
     Low4MinShouldMatch0        6.93      (4.1%)        6.78      (6.4%)   -2.1% ( -12% -    8%)
     HighMinShouldMatch4        1.67      (3.1%)        1.65      (4.6%)   -1.7% (  -9% -    6%)
     Low2MinShouldMatch3        2.65      (3.5%)        2.80      (5.0%)    5.7% (  -2% -   14%)
     Low1MinShouldMatch4        2.02      (3.2%)        2.49      (6.1%)   23.1% (  13% -   33%)
     Low4MinShouldMatch2        8.57      (5.5%)       34.29     (10.1%)  300.0% ( 269% -  334%)
     Low4MinShouldMatch3        8.61      (5.5%)       45.59     (10.5%)  429.3% ( 391% -  471%)
     Low3MinShouldMatch3        4.26      (3.8%)       23.84     (13.3%)  459.8% ( 426% -  495%)
     Low4MinShouldMatch4        8.64      (5.3%)       60.51     (13.9%)  600.2% ( 551% -  654%)
     Low2MinShouldMatch4        2.69      (3.6%)       21.68     (17.7%)  705.5% ( 660% -  753%)
     Low3MinShouldMatch4        4.27      (3.8%)       35.35     (16.5%)  728.4% ( 681% -  778%)
{noformat}

                
> speedup disjunction with minShouldMatch 
> ----------------------------------------
>
>                 Key: LUCENE-4571
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4571
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 4.1
>            Reporter: Mikhail Khludnev
>         Attachments: LUCENE-4571.patch, LUCENE-4571.patch, LUCENE-4571.patch
>
>
> Even when minShouldMatch is supplied to DisjunctionSumScorer, it enumerates the whole disjunction and verifies the minShouldMatch condition [on every doc|https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/DisjunctionSumScorer.java#L70]:
> {code}
>   public int nextDoc() throws IOException {
>     assert doc != NO_MORE_DOCS;
>     while(true) {
>       while (subScorers[0].docID() == doc) {
>         if (subScorers[0].nextDoc() != NO_MORE_DOCS) {
>           heapAdjust(0);
>         } else {
>           heapRemoveRoot();
>           if (numScorers < minimumNrMatchers) {
>             return doc = NO_MORE_DOCS;
>           }
>         }
>       }
>       afterNext();
>       if (nrMatchers >= minimumNrMatchers) {
>         break;
>       }
>     }
>     
>     return doc;
>   }
> {code}
> [~spo] proposes (as far as I understand it) to pop nrMatchers-1 scorers from the heap first,
> and then push them back, advancing them to or past that top doc. For me, question no. 1 is
> whether there is a performance test for minShouldMatch-constrained disjunctions.
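
The skipping idea described in the issue can be sketched in isolation. This is only an illustration under stated assumptions, not the attached patch and not Lucene's scorer API: MinShouldMatchSketch, Cursor and matches are hypothetical names, sub-scorers are modelled as sorted int arrays, and a full sort stands in for the heap. The observation it relies on is that no doc smaller than the minShouldMatch-th smallest current doc ID can match, so the cursors positioned before that doc can be advanced straight to it instead of stepping through every doc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MinShouldMatchSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical stand-in for a sub-scorer: a cursor over a sorted array of doc IDs. */
  static final class Cursor {
    final int[] docs;
    int pos = 0;

    Cursor(int... docs) { this.docs = docs; }

    int docID() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }

    /** Moves to the first doc >= target, binary-searching the remaining docs. */
    void advance(int target) {
      int i = Arrays.binarySearch(docs, pos, docs.length, target);
      pos = i >= 0 ? i : -i - 1;
    }
  }

  /** Returns all docs that appear in at least minShouldMatch of the cursors. */
  static List<Integer> matches(int minShouldMatch, Cursor... cursors) {
    int mm = Math.max(1, minShouldMatch); // 0 behaves like a plain disjunction
    List<Integer> hits = new ArrayList<>();
    if (cursors.length < mm) return hits;
    Cursor[] cs = cursors.clone();
    while (true) {
      // Order cursors by current doc; a real scorer would maintain a heap instead.
      Arrays.sort(cs, (a, b) -> Integer.compare(a.docID(), b.docID()));
      // The mm-th smallest doc is the earliest doc that can still have mm matchers.
      int candidate = cs[mm - 1].docID();
      if (candidate == NO_MORE_DOCS) return hits; // fewer than mm live cursors
      // Skip the mm-1 cursors in front of the candidate directly to it.
      for (int i = 0; i < mm - 1; i++) cs[i].advance(candidate);
      int count = 0;
      for (Cursor c : cs) if (c.docID() == candidate) count++;
      if (count >= mm) {
        hits.add(candidate);
        // Move every matching cursor past the hit before looking for the next one.
        for (Cursor c : cs) if (c.docID() == candidate) c.advance(candidate + 1);
      }
      // Otherwise the next sort yields a strictly larger candidate.
    }
  }

  public static void main(String[] args) {
    System.out.println(matches(2,
        new Cursor(1, 3, 5, 7, 9),
        new Cursor(2, 3, 6, 7, 10),
        new Cursor(3, 4, 7, 8, 10))); // prints [3, 7, 10]
  }
}
```

With minShouldMatch = 1 no cursor is ever skipped and the loop degenerates to an ordinary disjunction, which is loosely in line with the MinShouldMatch0 rows above showing no gain; the sketch says nothing about the real cost of heap maintenance or scoring.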

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

