lucene-dev mailing list archives

From "Han Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
Date Thu, 09 Aug 2012 15:02:20 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431882#comment-13431882 ]

Han Jiang commented on LUCENE-3892:
-----------------------------------

I revived the PFor code and tested it against BlockFor and BlockPacked:

BlockFor as base:
{noformat}
                Task    QPS base StdDev base    QPS pfor StdDev pfor      Pct diff
         AndHighHigh      121.54        1.37      116.69        2.03   -6% -   -1%
          AndHighLow     2286.36       14.19     2212.92       11.48   -4% -   -2%
          AndHighMed      322.97        7.37      294.19        4.76  -12% -   -5%
              Fuzzy1       85.56        1.46       87.97        3.27   -2% -    8%
              Fuzzy2       30.94        0.56       32.16        1.34   -2% -   10%
          HighPhrase        9.39        0.38        9.02        0.45  -12% -    5%
    HighSloppyPhrase        5.38        0.08        5.24        0.12   -6% -    1%
        HighSpanNear       10.38        0.39        9.92        0.08   -8% -    0%
            HighTerm      180.30        6.87      172.83        6.26  -11% -    3%
              IntNRQ       62.01        3.73       60.89        3.54  -12% -   10%
           LowPhrase       42.44        0.67       38.73        0.89  -12% -   -5%
     LowSloppyPhrase       62.82        0.79       56.79        0.43  -11% -   -7%
         LowSpanNear       81.79        2.00       74.10        1.13  -12% -   -5%
             LowTerm     1763.95       39.62     1721.30       34.22   -6% -    1%
           MedPhrase       27.87        0.59       25.82        0.74  -11% -   -2%
     MedSloppyPhrase       32.15        0.41       29.91        0.31   -9% -   -4%
         MedSpanNear       23.48        0.71       22.00        0.05   -9% -   -3%
             MedTerm      662.11       24.22      638.81       19.31   -9% -    3%
          OrHighHigh       26.82        0.47       27.14        1.93   -7% -   10%
           OrHighLow      152.40        3.54      156.58       11.11   -6% -   12%
           OrHighMed      103.20        2.26      105.84        7.55   -6% -   12%
            PKLookup      216.38        4.32      219.32        2.59   -1% -    4%
             Prefix3      169.89        4.97      163.82        3.34   -8% -    1%
             Respell       83.23        1.44       86.20        3.00   -1% -    9%
            Wildcard      155.81        2.79      152.30        2.54   -5% -    1%
{noformat}

BlockPacked as base:
{noformat}
                Task    QPS base StdDev base    QPS pfor StdDev pfor      Pct diff
         AndHighHigh      122.94        3.43      116.24        1.90   -9% -   -1%
          AndHighLow     2294.32       58.32     2199.14       31.97   -7% -    0%
          AndHighMed      325.55       12.44      290.20        3.80  -15% -   -6%
              Fuzzy1       88.33        1.84       87.86        2.54   -5% -    4%
              Fuzzy2       31.92        0.80       32.00        0.92   -5% -    5%
          HighPhrase        9.73        0.47        9.04        0.29  -14% -    0%
    HighSloppyPhrase        5.49        0.19        5.16        0.03   -9% -   -1%
        HighSpanNear       10.93        0.23        9.90        0.09  -12% -   -6%
            HighTerm      178.31        6.37      171.06        6.14  -10% -    3%
              IntNRQ       60.87        4.71       62.38        5.49  -13% -   20%
           LowPhrase       44.97        1.18       38.36        1.01  -19% -  -10%
     LowSloppyPhrase       69.61        1.19       55.90        1.39  -23% -  -16%
         LowSpanNear       88.50        0.66       72.80        2.23  -20% -  -14%
             LowTerm     1769.84       32.66     1717.02       39.75   -6% -    1%
           MedPhrase       28.88        0.84       25.57        0.68  -16% -   -6%
     MedSloppyPhrase       34.47        0.50       29.29        0.54  -17% -  -12%
         MedSpanNear       24.88        0.32       21.69        0.38  -15% -  -10%
             MedTerm      667.95       21.61      633.73       22.17  -11% -    1%
          OrHighHigh       27.96        1.29       26.82        0.81  -11% -    3%
           OrHighLow      158.62        5.82      155.08        5.05   -8% -    4%
           OrHighMed      107.16        4.19      104.81        3.17   -8% -    4%
            PKLookup      217.22        1.86      216.83        1.87   -1% -    1%
             Prefix3      167.32        6.72      166.12        6.53   -8% -    7%
             Respell       85.25        2.27       85.85        2.16   -4% -    6%
            Wildcard      156.24        5.69      154.63        3.02   -6% -    4%
{noformat}
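Each "Pct diff" cell above is a range rather than a single number. The point estimate of the change is the relative QPS difference, (QPS pfor - QPS base) / QPS base; for the rows used below it falls inside the reported range. The following is a minimal sketch only: the class name PctDiff is hypothetical, and it does not reproduce how the benchmark tool derives the range from the StdDev columns.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PctDiff {
    /** Point estimate of the QPS change of the competitor relative to the baseline, in percent. */
    static double pctDiff(double qpsBase, double qpsComp) {
        return (qpsComp - qpsBase) / qpsBase * 100.0;
    }

    public static void main(String[] args) {
        // {QPS base, QPS pfor} pairs copied from the "BlockPacked as base" table.
        Map<String, double[]> rows = new LinkedHashMap<>();
        rows.put("AndHighMed", new double[] {325.55, 290.20});
        rows.put("LowSloppyPhrase", new double[] {69.61, 55.90});
        rows.put("PKLookup", new double[] {217.22, 216.83});
        for (Map.Entry<String, double[]> row : rows.entrySet()) {
            double[] qps = row.getValue();
            System.out.printf("%-16s %6.1f%%%n", row.getKey(), pctDiff(qps[0], qps[1]));
        }
        // Prints AndHighMed -10.9%, LowSloppyPhrase -19.7%, PKLookup -0.2% (column-aligned).
    }
}
```

Each estimate sits inside its row's range (-15%..-6%, -23%..-16%, -1%..1%). The phrase and span queries show the largest and most consistent slowdowns against BlockPacked.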

The current PFor implementation saves only 1.8% compared with For, yet it costs a noticeable amount of performance. Let's use the
Packed version!
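One way to see why PFor's savings over For can be small: For packs every value in a block at the width of the block's largest value, while PFor picks a narrower width and stores the few values that do not fit as exceptions. PFor therefore only wins on blocks that contain a handful of outliers. The toy size model below is a sketch under stated assumptions: the class name is hypothetical, each exception is assumed to cost 40 bits (a 32-bit raw value plus an 8-bit slot index), and per-block headers are ignored. It is not Lucene's actual on-disk layout.

```java
import java.util.Arrays;

public class ForVsPFor {
    // Hypothetical per-exception cost: 32-bit raw value + 8-bit slot index.
    static final int EXCEPTION_BITS = 32 + 8;

    /** Bits needed to store v (v >= 0); 0 needs 0 bits. */
    static int bitsRequired(int v) {
        return 32 - Integer.numberOfLeadingZeros(v);
    }

    /** FOR: every value is packed at the width of the largest value. */
    static long forBits(int[] block) {
        int max = 0;
        for (int v : block) max = Math.max(max, v);  // values assumed non-negative
        return (long) block.length * bitsRequired(max);
    }

    /** PFOR: try every width b; values needing more than b bits become exceptions. */
    static long pforBits(int[] block) {
        long best = Long.MAX_VALUE;
        for (int b = 0; b <= 32; b++) {
            long exceptions = 0;
            for (int v : block) {
                if (bitsRequired(v) > b) exceptions++;
            }
            best = Math.min(best, (long) block.length * b + exceptions * EXCEPTION_BITS);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] uniform = new int[128];
        Arrays.fill(uniform, 3);            // every gap fits in 2 bits
        int[] outlier = uniform.clone();
        outlier[17] = 1000;                 // a single gap needs 10 bits
        System.out.println("uniform: FOR=" + forBits(uniform) + " PFOR=" + pforBits(uniform));
        System.out.println("outlier: FOR=" + forBits(outlier) + " PFOR=" + pforBits(outlier));
        // uniform: FOR=256 PFOR=256
        // outlier: FOR=1280 PFOR=296
    }
}
```

On a block with no outliers the two encodings cost the same bits under this model, so any extra decoding work in PFor is pure overhead there.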
                
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892-BlockTermScorer.patch, LUCENE-3892-blockFor&hardcode(base).patch,
>                      LUCENE-3892-blockFor&packedecoder(comp).patch, LUCENE-3892-blockFor-with-packedints-decoder.patch,
>                      LUCENE-3892-blockFor-with-packedints-decoder.patch, LUCENE-3892-blockFor-with-packedints.patch,
>                      LUCENE-3892-bulkVInt.patch, LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch,
>                      LUCENE-3892-handle_open_files.patch, LUCENE-3892-non-specialized.patch, LUCENE-3892-pfor-compress-iterate-numbits.patch,
>                      LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for_byte[].patch, LUCENE-3892_for_int[].patch,
>                      LUCENE-3892_for_unfold_method.patch, LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch,
>                      LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of open issues with patches in various
> states.
> Initial results (based on a prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.
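FOR- and PFORDelta-style intblock encodings typically start from the same two steps: replace each sorted doc ID in a block by its gap to the previous one, then bit-pack the gaps at a fixed width (FOR); PFOR additionally moves outliers into an exception list. Simple9/16/64 work differently and are not shown. The sketch below covers only delta + FOR for one block. The class and method names are hypothetical, and the LSB-first packing into a long[] is an illustrative choice, not the layout of Lucene's BlockFor/BlockPacked formats or its PackedInts API.

```java
import java.util.Arrays;

/** Hypothetical sketch: delta + FOR (fixed-width bit packing) for one block of sorted doc IDs. */
public class DeltaForBlock {

    /** Bits needed to store v (v >= 0); 0 needs 0 bits. */
    static int bitsRequired(int v) {
        return 32 - Integer.numberOfLeadingZeros(v);
    }

    /** Replaces each sorted doc ID by its gap to the previous one (the first gap is from 0). */
    static int[] toDeltas(int[] docIds) {
        int[] deltas = new int[docIds.length];
        int prev = 0;
        for (int i = 0; i < docIds.length; i++) {
            deltas[i] = docIds[i] - prev;
            prev = docIds[i];
        }
        return deltas;
    }

    /** Inverse of toDeltas: a running prefix sum. */
    static int[] fromDeltas(int[] deltas) {
        int[] docIds = new int[deltas.length];
        int sum = 0;
        for (int i = 0; i < deltas.length; i++) {
            sum += deltas[i];
            docIds[i] = sum;
        }
        return docIds;
    }

    /** Packs each value (assumed < 2^bits, bits in 0..32) into exactly `bits` bits, LSB first. */
    static long[] pack(int[] values, int bits) {
        long[] packed = new long[(int) (((long) values.length * bits + 63) / 64)];
        for (int i = 0; i < values.length; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = values[i] & 0xFFFFFFFFL;
            packed[word] |= v << shift;
            if (shift + bits > 64) {               // value straddles two longs
                packed[word + 1] |= v >>> (64 - shift);
            }
        }
        return packed;
    }

    /** Reads back `count` values of `bits` bits each. */
    static int[] unpack(long[] packed, int bits, int count) {
        int[] values = new int[count];
        if (bits == 0) return values;              // every value is 0
        long mask = (1L << bits) - 1;
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = packed[word] >>> shift;
            if (shift + bits > 64) {
                v |= packed[word + 1] << (64 - shift);
            }
            values[i] = (int) (v & mask);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 8, 20, 21, 40, 41, 42};
        int[] deltas = toDeltas(docIds);           // 3, 4, 1, 12, 1, 19, 1, 1
        int bits = 0;
        for (int d : deltas) bits = Math.max(bits, bitsRequired(d));  // 5 (largest gap is 19)
        long[] packed = pack(deltas, bits);        // 8 * 5 = 40 bits -> 1 long
        int[] decoded = fromDeltas(unpack(packed, bits, deltas.length));
        System.out.println(bits + " bits/value, " + packed.length + " long(s), roundtrip ok: "
            + Arrays.equals(docIds, decoded));
        // 5 bits/value, 1 long(s), roundtrip ok: true
    }
}
```

The width is chosen per block, so one large gap only inflates its own block; PFOR's exception list is a refinement of exactly this step.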

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


