lucene-dev mailing list archives

From "Han Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4283) Support more frequent skip with Block Postings Format
Date Wed, 08 Aug 2012 09:00:22 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430958#comment-13430958 ]

Han Jiang commented on LUCENE-4283:
-----------------------------------

Hmm, the improvement isn't that noisy:
{noformat}
                Task    QPS base StdDev base    QPS comp StdDev comp      Pct diff
         AndHighHigh       83.84        5.07       88.64        2.41   -3% -   15%
          AndHighLow     1716.87       62.53     1891.91       20.85    5% -   15%
          AndHighMed      348.15       37.20      441.49       10.78   11% -   45%
              Fuzzy1       87.67        0.92       84.80        2.36   -6% -    0%
              Fuzzy2       32.84        0.37       31.41        1.06   -8% -    0%
          HighPhrase       18.45        0.93       18.88        0.53   -5% -   10%
    HighSloppyPhrase       22.16        0.76       21.55        0.57   -8% -    3%
        HighSpanNear        3.07        0.11        3.09        0.04   -3% -    5%
            HighTerm      181.58       18.26      171.10        6.44  -17% -    8%
              IntNRQ       48.39        1.47       49.28        0.88   -2% -    6%
           LowPhrase       80.49        3.34       87.04        2.63    0% -   16%
     LowSloppyPhrase       28.53        1.09       27.31        0.71  -10% -    2%
         LowSpanNear       46.86        1.63       49.34        1.15    0% -   11%
             LowTerm     1637.37       19.39     1608.23       16.93   -3% -    0%
           MedPhrase       22.48        1.03       23.27        0.52   -3% -   10%
     MedSloppyPhrase       15.46        0.52       15.00        0.37   -8% -    2%
         MedSpanNear       37.09        1.21       37.80        0.69   -3% -    7%
             MedTerm      587.20       44.40      560.78       19.09  -14% -    6%
          OrHighHigh       62.10        0.88       62.95        1.05   -1% -    4%
           OrHighLow      126.89        1.48      128.30        1.53   -1% -    3%
           OrHighMed      124.20        1.18      125.34        1.23   -1% -    2%
            PKLookup      213.54        3.75      211.98        0.37   -2% -    1%
             Prefix3      106.76        2.31      107.79        0.84   -1% -    3%
             Respell      100.12        1.00       96.48        2.58   -7% -    0%
            Wildcard      149.61        3.53      150.29        0.88   -2% -    3%
{noformat}
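
For readers unfamiliar with the "Pct diff" column: the ranges above are consistent with comparing the two runs one standard deviation apart in each direction and truncating to a whole percent. The sketch below is only an illustration of that arithmetic, not the benchmark tool's actual code; the class and method names are made up here, and the formula is an assumption checked against a few rows of the table.

```java
// Hypothetical helper, not part of the benchmark tool: reproduces the
// "Pct diff" range under the assumption that it compares the runs one
// standard deviation apart and truncates toward zero.
final class PctDiffRange {

    /** Pessimistic bound: comp one stddev low versus base one stddev high. */
    static int low(double qpsBase, double sdBase, double qpsComp, double sdComp) {
        return (int) (100.0 * ((qpsComp - sdComp) / (qpsBase + sdBase) - 1.0));
    }

    /** Optimistic bound: comp one stddev high versus base one stddev low. */
    static int high(double qpsBase, double sdBase, double qpsComp, double sdComp) {
        return (int) (100.0 * ((qpsComp + sdComp) / (qpsBase - sdBase) - 1.0));
    }

    public static void main(String[] args) {
        // Values copied from the AndHighHigh and AndHighMed rows.
        System.out.printf("AndHighHigh %d%% - %d%%%n",
                low(83.84, 5.07, 88.64, 2.41), high(83.84, 5.07, 88.64, 2.41));
        System.out.printf("AndHighMed  %d%% - %d%%%n",
                low(348.15, 37.20, 441.49, 10.78), high(348.15, 37.20, 441.49, 10.78));
    }
}
```

Read this way, a row counts as a clear win only when the lower bound is above zero, as for AndHighLow and AndHighMed.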
                
> Support more frequent skip with Block Postings Format
> -----------------------------------------------------
>
>                 Key: LUCENE-4283
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4283
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Han Jiang
>            Priority: Minor
>         Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch, LUCENE-4283-codes-cleanup.patch, LUCENE-4283-record-next-skip.patch, LUCENE-4283-record-skip&inlining-scanning.patch, LUCENE-4283-slow.patch, LUCENE-4283-small-interval-fully.patch, LUCENE-4283-small-interval-partially.patch
>
>
> This change works on the new bulk branch.
> Currently, our BlockPostingsFormat only supports skipInterval==blockSize. Every time the skipper reaches the last level 0 skip point, we have to decode a whole block to read doc/freq data. Also, a higher-level skip list is created only for terms with df>blockSize^k, which means that for most terms, skipping is just a linear scan. If we increase the current blockSize for better bulk I/O performance, the current skip setting will become a bottleneck.
> For ForPF, the encoded block can easily be split if we set skipInterval=32*k.
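
To make the df>blockSize^k point concrete, here is a rough sketch of how few skip levels a term gets under that rule, and how many level 0 skip points exist for a given skip interval. It is not the code from the patch or from Lucene's skip list writer; the class name, method names, and the example values (blockSize 128, skipInterval 32) are assumptions chosen for illustration.

```java
// Hypothetical illustration only: counts skip levels under the rule stated
// in the description (one more level each time df exceeds the next power
// of blockSize) and counts level 0 skip points for a given interval.
final class SkipLevelSketch {

    /** Number of skip levels for a term with the given document frequency. */
    static int skipLevels(long df, int blockSize) {
        if (blockSize < 2) {
            throw new IllegalArgumentException("blockSize must be >= 2");
        }
        int levels = 0;
        long threshold = blockSize;          // blockSize^1
        while (df > threshold) {
            levels++;
            threshold *= blockSize;          // next power of blockSize
        }
        return levels;
    }

    /** Number of level 0 skip points: one per full skipInterval docs. */
    static long level0SkipPoints(long df, int skipInterval) {
        return df / skipInterval;
    }

    public static void main(String[] args) {
        int blockSize = 128;                 // assumed example value
        for (long df : new long[] {100, 1_000, 20_000}) {
            System.out.printf("df=%d levels=%d skipPoints(128)=%d skipPoints(32)=%d%n",
                    df, skipLevels(df, blockSize),
                    level0SkipPoints(df, 128), level0SkipPoints(df, 32));
        }
    }
}
```

With these assumed values, a term needs more than 16384 docs before a second level appears, while a smaller skip interval such as 32 gives four times as many level 0 skip points for the same term.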

