lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4283) Support more frequent skip with Block Postings Format
Date Sat, 04 Aug 2012 21:37:02 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428682#comment-13428682 ]

Michael McCandless commented on LUCENE-4283:
--------------------------------------------

I added some new tasks to luceneutil (AndHighLow, OrHighLow), and also
separated tasks for Low/Med/HighTerm (and same for SpanNear/Phrase
queries) so that we can see the impact on the different queries, and
so that we actually test skipping (AndHighLow).

Then I ran a test w/ the 2nd patch (non-buggy, partial decode,
skipInterval=32):

{noformat}
                Task    QPS base StdDev base    QPS comp StdDev comp      Pct diff
          AndHighLow      631.54       10.72      101.44        0.70  -84% -  -83%
          AndHighMed       44.85        0.94       39.31        0.36  -14% -   -9%
         AndHighHigh       18.39        0.27       16.16        0.08  -13% -  -10%
     MedSloppyPhrase       12.15        0.14       11.27        0.30  -10% -   -3%
         MedSpanNear        9.11        0.10        8.58        0.10   -7% -   -3%
         LowSpanNear        5.05        0.03        4.78        0.03   -6% -   -4%
           MedPhrase        5.09        0.10        4.81        0.10   -9% -   -1%
           LowPhrase        7.80        0.08        7.43        0.07   -6% -   -2%
    HighSloppyPhrase        2.13        0.06        2.04        0.06  -10% -    1%
     LowSloppyPhrase        5.28        0.11        5.09        0.15   -8% -    1%
            HighTerm       22.85        0.11       22.08        0.56   -6% -    0%
             LowTerm      526.19        3.56      510.53        9.14   -5% -    0%
             MedTerm      138.34        0.51      134.66        3.58   -5% -    0%
          HighPhrase        3.55        0.11        3.46        0.11   -8% -    3%
        HighSpanNear        1.64        0.00        1.60        0.02   -3% -    0%
              Fuzzy1       99.11        3.49       98.91        2.71   -6% -    6%
              Fuzzy2       88.31        3.05       88.19        2.32   -6% -    6%
             Respell       77.97        1.75       78.24        1.86   -4% -    5%
            PKLookup      192.61        1.47      193.47        1.53   -1% -    2%
           OrHighMed       25.14        1.23       25.28        1.16   -8% -   10%
          OrHighHigh        9.22        0.47        9.30        0.45   -8% -   11%
           OrHighLow       37.28        1.79       37.60        1.75   -8% -   10%
            Wildcard       67.88        0.33       69.19        2.70   -2% -    6%
             Prefix3       25.67        0.35       26.25        1.22   -3% -    8%
              IntNRQ        8.85        0.02        9.27        0.98   -6% -   15%
{noformat}
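
For readers of the table, here is a minimal sketch of how a "Pct diff" range like the one above can be derived from the four numeric columns. The formula is an assumption, not something stated in this message: it compares the worst case (comp minus its StdDev against base plus its StdDev) and the best case (the reverse), then truncates toward zero. The function name `pct_diff_range` is made up for illustration. The formula does reproduce the AndHighLow and AndHighMed rows from the rounded values shown.

```python
def pct_diff_range(qps_base, std_base, qps_comp, std_comp):
    """Return (worst, best) percent change of comp relative to base.

    Assumed formula: shift each QPS by one standard deviation in the
    direction that makes comp look worst, then best, and truncate the
    resulting percentage toward zero.
    """
    base_best = qps_base + std_base
    base_worst = qps_base - std_base
    comp_best = qps_comp + std_comp
    comp_worst = qps_comp - std_comp
    # Worst case for comp: slow comp run against a fast base run.
    worst = int(100.0 * (comp_worst - base_best) / base_best)
    # Best case for comp: fast comp run against a slow base run.
    best = int(100.0 * (comp_best - base_worst) / base_worst)
    return worst, best


# AndHighLow row: QPS base 631.54 (StdDev 10.72), QPS comp 101.44 (StdDev 0.70)
print(pct_diff_range(631.54, 10.72, 101.44, 0.70))  # (-84, -83)
```

Because the table prints rounded values, rows that sit close to a whole-percent boundary may come out one point off when recomputed this way.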

I'm confused why AndHighLow got slower... this patch should have
lowered the per-skip cost.

                
> Support more frequent skip with Block Postings Format
> -----------------------------------------------------
>
>                 Key: LUCENE-4283
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4283
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Han Jiang
>            Priority: Minor
>         Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch, LUCENE-4283-slow.patch, LUCENE-4283-small-interval-fully.patch, LUCENE-4283-small-interval-partially.patch
>
>
> This change works on the new bulk branch.
> Currently, our BlockPostingsFormat only supports skipInterval==blockSize. Every time
> the skipper reaches the last level 0 skip point, we'll have to decode a whole block to read
> doc/freq data. Also, a higher level skip list will be created only for terms with df>blockSize^k,
> which means that for most terms, skipping will just be a linear scan. If we increase the current
> blockSize for better bulk i/o performance, the current skip setting will be a bottleneck.
> For ForPF, the encoded block can easily be split if we set skipInterval=32*k.
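
The effect described in the issue can be illustrated with a toy model. The sketch below is not Lucene code: it models a single level-0 skip list over an uncompressed, sorted list of doc IDs, and the function name `docs_scanned` is made up for illustration. It only counts how many postings must be walked linearly after the last usable skip point, and ignores block decoding cost, higher skip levels, and the cost of reading the skip entries themselves.

```python
from bisect import bisect_left


def docs_scanned(doc_ids, skip_interval, target):
    """Count postings examined linearly to reach the first doc >= target.

    Toy model: one skip entry is recorded after every skip_interval docs
    and stores the last doc ID of that run. A run can be skipped when its
    last doc ID is still below target.
    """
    skip_docs = [
        doc_ids[i - 1]
        for i in range(skip_interval, len(doc_ids) + 1, skip_interval)
    ]
    # Number of whole runs that lie entirely below target.
    runs_skipped = bisect_left(skip_docs, target)
    start = runs_skipped * skip_interval
    scanned = 0
    for doc in doc_ids[start:]:
        scanned += 1
        if doc >= target:
            break
    return scanned


postings = list(range(100))             # a term with df=100
print(docs_scanned(postings, 128, 99))  # 100: no skip point, pure linear scan
print(docs_scanned(postings, 32, 99))   # 4: three runs of 32 docs are skipped
```

In this model a smaller skipInterval always reduces the linear scan. The benchmark in the comment above shows that the real cost also depends on how cheaply each skip can be taken, which this sketch does not capture.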
