lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-6184) BooleanScorer should better deal with sparse clauses
Date Mon, 19 Jan 2015 10:47:36 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-6184:
---------------------------------
    Attachment: LUCENE-6184.patch

Same patch, plus the suggested API that lets BulkScorer skip. The luceneutil benchmark
results still look similar:

{code}
              AndHighLow      883.42      (3.5%)      872.51      (3.3%)   -1.2% (  -7% -    5%)
            OrNotHighLow     1052.93      (4.4%)     1048.44      (4.5%)   -0.4% (  -8% -    8%)
                PKLookup      277.07      (2.0%)      276.65      (2.1%)   -0.2% (  -4% -    4%)
              AndHighMed      137.40      (1.9%)      137.30      (2.4%)   -0.1% (  -4% -    4%)
            HighSpanNear       34.67      (3.1%)       34.65      (3.0%)   -0.0% (  -5% -    6%)
         LowSloppyPhrase      215.69      (2.5%)      215.61      (2.5%)   -0.0% (  -4% -    5%)
         MedSloppyPhrase      183.08      (2.5%)      183.11      (2.0%)    0.0% (  -4% -    4%)
              HighPhrase       26.33      (6.8%)       26.34      (6.8%)    0.0% ( -12% -   14%)
             AndHighHigh       51.61      (1.8%)       51.64      (2.0%)    0.0% (  -3% -    3%)
               LowPhrase       74.61      (1.3%)       74.68      (1.4%)    0.1% (  -2% -    2%)
        HighSloppyPhrase       14.94      (5.7%)       14.97      (5.0%)    0.2% (  -9% -   11%)
               MedPhrase       31.42      (1.1%)       31.47      (1.1%)    0.2% (  -1% -    2%)
             LowSpanNear       55.89      (2.5%)       56.00      (2.5%)    0.2% (  -4% -    5%)
                 Respell       73.38      (2.4%)       73.54      (2.2%)    0.2% (  -4% -    4%)
            OrNotHighMed      118.20      (1.6%)      118.66      (1.7%)    0.4% (  -2% -    3%)
             MedSpanNear       78.17      (3.2%)       78.62      (3.5%)    0.6% (  -5% -    7%)
           OrHighNotHigh       31.47      (1.8%)       31.66      (1.9%)    0.6% (  -2% -    4%)
           OrNotHighHigh       50.29      (1.6%)       50.63      (2.0%)    0.7% (  -2% -    4%)
            OrHighNotMed       82.27      (2.3%)       83.17      (2.3%)    1.1% (  -3% -    5%)
                 VeryLow     6149.21      (4.7%)     6223.22      (5.4%)    1.2% (  -8% -   11%)
            OrHighNotLow       55.30      (3.2%)       56.25      (2.5%)    1.7% (  -3% -    7%)
                 LowTerm      808.21      (7.3%)      824.32      (4.5%)    2.0% (  -9% -   14%)
                HighTerm      106.18      (4.3%)      108.63      (3.0%)    2.3% (  -4% -   10%)
                 MedTerm      296.65      (4.2%)      304.42      (2.7%)    2.6% (  -4% -   10%)
                Wildcard       20.85      (7.5%)       21.50      (5.3%)    3.1% (  -8% -   17%)
                 Prefix3       95.63      (6.2%)       98.81      (5.3%)    3.3% (  -7% -   15%)
                  Fuzzy2       62.12      (9.0%)       64.44     (10.2%)    3.7% ( -14% -   25%)
                  IntNRQ        8.85      (8.9%)        9.21      (6.7%)    4.1% ( -10% -   21%)
                  Fuzzy1      105.42     (11.2%)      116.28      (4.8%)   10.3% (  -5% -   29%)
               OrHighLow       51.75      (8.2%)       59.92      (8.2%)   15.8% (   0% -   35%)
              OrHighHigh       32.34      (8.5%)       37.53      (8.5%)   16.0% (   0% -   36%)
               OrHighMed       16.79      (8.7%)       19.62      (8.8%)   16.8% (   0% -   37%)
          VeryLowVeryLow     2053.12      (2.3%)     2399.38      (3.2%)   16.9% (  11% -   22%)
{code}

> BooleanScorer should better deal with sparse clauses
> ----------------------------------------------------
>
>                 Key: LUCENE-6184
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6184
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: Trunk, 5.1
>
>         Attachments: LUCENE-6184.patch, LUCENE-6184.patch, LUCENE-6184.patch
>
>
> BooleanScorer currently works like this:
> {code}
> for each (window of 2048 docs) {
>   for each (optional scorer) {
>     scorer.score(window)
>   }
> }
> {code}
> This is not efficient for very sparse clauses (doc freq much lower than maxDoc/2048)
> since we keep on scoring windows of documents that do not match anything. BooleanScorer2
> currently performs better in those cases.
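
To make the cost described above concrete, here is a minimal Java sketch of the two iteration strategies. It is a toy model and not the code from the attached patch: each clause is assumed to be a sorted array of matching doc IDs (all lower than maxDoc), and the class and method names (SparseWindowSketch, scoreAllWindows, scoreNonEmptyWindows) are hypothetical. The first method visits every 2048-doc window as in the pseudocode; the second jumps directly to the window that holds the smallest pending doc ID across clauses.

```java
/** Toy model only: each clause is a sorted array of matching doc IDs. Not Lucene code. */
final class SparseWindowSketch {
    static final int WINDOW_SIZE = 2048; // window size quoted in the issue description

    /** Visits every window in [0, maxDoc). Returns {windowsVisited, matchesCollected}. */
    static int[] scoreAllWindows(int[][] clauses, int maxDoc) {
        int[] pos = new int[clauses.length]; // one cursor per clause
        int windows = 0;
        int matches = 0;
        for (long base = 0; base < maxDoc; base += WINDOW_SIZE) {
            windows++; // counted even when no clause has a match in this window
            matches += collectUpTo(clauses, pos, base + WINDOW_SIZE);
        }
        return new int[] {windows, matches};
    }

    /** Visits only windows that contain a match. Returns {windowsVisited, matchesCollected}. */
    static int[] scoreNonEmptyWindows(int[][] clauses, int maxDoc) {
        int[] pos = new int[clauses.length];
        int windows = 0;
        int matches = 0;
        while (true) {
            // Smallest doc ID that no clause has consumed yet.
            long next = Long.MAX_VALUE;
            for (int i = 0; i < clauses.length; i++) {
                if (pos[i] < clauses[i].length) {
                    next = Math.min(next, clauses[i][pos[i]]);
                }
            }
            if (next >= maxDoc) {
                break; // all clauses exhausted
            }
            // Jump straight to the window containing that doc.
            long windowEnd = (next / WINDOW_SIZE + 1) * WINDOW_SIZE;
            windows++;
            matches += collectUpTo(clauses, pos, windowEnd);
        }
        return new int[] {windows, matches};
    }

    /** Advances each clause cursor past all docs below {@code end}, counting them. */
    private static int collectUpTo(int[][] clauses, int[] pos, long end) {
        int collected = 0;
        for (int i = 0; i < clauses.length; i++) {
            while (pos[i] < clauses[i].length && clauses[i][pos[i]] < end) {
                pos[i]++;
                collected++;
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        int[][] clauses = { {5, 900_000}, {5, 300_000} }; // two very sparse clauses
        int maxDoc = 1_000_000;
        int[] all = scoreAllWindows(clauses, maxDoc);
        int[] nonEmpty = scoreNonEmptyWindows(clauses, maxDoc);
        System.out.println("all windows:       " + all[0] + " visited, " + all[1] + " matches");
        System.out.println("non-empty windows: " + nonEmpty[0] + " visited, " + nonEmpty[1] + " matches");
    }
}
```

With these two sparse clauses and one million documents, the first strategy visits 489 windows and the second visits 3, while both collect the same 4 matches. Whether a real scorer sees a comparable gain depends on how cheaply it can find the next matching doc, which this sketch does not model.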


