lucene-dev mailing list archives

From "Han Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
Date Mon, 30 Jul 2012 17:37:35 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425010#comment-13425010 ]

Han Jiang edited comment on LUCENE-3892 at 7/30/12 5:36 PM:
------------------------------------------------------------

Previous experiments showed a net loss with the packed ints API; however, there were slight
differences between the two setups, e.g. the all-values-the-same case was not handled equally.
I suppose these two patches should make the comparison fair enough.
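
To illustrate what the all-values-the-same case refers to, here is a minimal sketch of a fixed-width block encoder with and without that shortcut. It is not the code from either patch: the names `ALL_VALUES_EQUAL`, `encode_block`, `decode_block` and the `use_shortcut` flag are hypothetical, and the block is packed into a single Python integer purely for readability.

```python
from typing import List, Tuple

# Hypothetical marker: a bit width of 0 means "the block is one repeated value".
ALL_VALUES_EQUAL = 0


def bits_required(values: List[int]) -> int:
    """Smallest bit width (at least 1) that holds every non-negative value."""
    return max(max(v.bit_length() for v in values), 1)


def pack(values: List[int], bits: int) -> int:
    """Pack values into one integer, `bits` bits per value, lowest index first."""
    word = 0
    for i, v in enumerate(values):
        word |= v << (i * bits)
    return word


def unpack(word: int, bits: int, count: int) -> List[int]:
    """Inverse of pack(): extract `count` values of `bits` bits each."""
    mask = (1 << bits) - 1
    return [(word >> (i * bits)) & mask for i in range(count)]


def encode_block(values: List[int], use_shortcut: bool = True) -> Tuple[int, int]:
    """Return (bit_width, data) for one block of non-negative integers."""
    if use_shortcut and all(v == values[0] for v in values):
        # Store the repeated value once instead of packing the whole block.
        return ALL_VALUES_EQUAL, values[0]
    bits = bits_required(values)
    return bits, pack(values, bits)


def decode_block(bit_width: int, data: int, count: int) -> List[int]:
    """Rebuild the block; the shortcut path does no bit unpacking at all."""
    if bit_width == ALL_VALUES_EQUAL:
        return [data] * count
    return unpack(data, bit_width, count)
```

If one decoder takes the shortcut path and the other unpacks every such block bit by bit, the two are doing different amounts of work on identical input, which is the kind of mismatch that would skew a comparison.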

Base: BlockForPF + hardcoded decoder
Comp: BlockForPF + PackedInts.Decoder
{noformat}
                Task    QPS base StdDev base    QPS comp StdDev comp      Pct diff
         AndHighHigh       25.66        0.31       22.61        1.21  -17% -   -6%
          AndHighMed       74.17        1.45       59.48        3.62  -26% -  -13%
              Fuzzy1       95.60        1.51       96.06        2.22   -3% -    4%
              Fuzzy2       28.67        0.50       28.51        0.75   -4% -    3%
              IntNRQ       33.31        0.60       30.73        1.51  -13% -   -1%
          OrHighHigh       17.58        0.59       16.22        1.18  -17% -    2%
           OrHighMed       34.42        0.93       32.14        2.33  -15% -    2%
            PKLookup      217.08        4.25      213.76        1.37   -4% -    1%
              Phrase        6.10        0.12        5.34        0.07  -15% -   -9%
             Prefix3       77.27        1.26       70.42        2.87  -13% -   -3%
             Respell       92.91        1.34       92.61        1.83   -3% -    3%
        SloppyPhrase        5.35        0.16        5.00        0.29  -14% -    1%
            SpanNear        6.05        0.15        5.47        0.07  -12% -   -6%
                Term       37.62        0.32       33.08        1.70  -17% -   -6%
        TermBGroup1M       17.45        0.64       16.40        0.73  -13% -    1%
      TermBGroup1M1P       25.20        0.69       23.47        1.24  -14% -    0%
         TermGroup1M       18.53        0.65       17.40        0.76  -13% -    1%
            Wildcard       44.39        0.49       40.51        1.69  -13% -   -3%
{noformat}
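
For reading the Pct diff column: the message does not say how the range is computed, but the rows above are consistent with a worst case and a best case built by shifting each QPS figure by one standard deviation and truncating to whole percent. The sketch below is an assumption inferred from the numbers, and `pct_diff_range` is a hypothetical helper name, not part of any benchmark tool.

```python
from typing import Tuple


def pct_diff_range(qps_base: float, std_base: float,
                   qps_comp: float, std_comp: float) -> Tuple[int, int]:
    """Return (worst, best) percentage change of comp relative to base.

    Assumed formula: the worst case pairs comp at -1 StdDev with base at
    +1 StdDev; the best case pairs comp at +1 StdDev with base at -1 StdDev.
    int() truncates toward zero.
    """
    high_base = qps_base + std_base
    low_base = qps_base - std_base
    worst = int(100.0 * ((qps_comp - std_comp) - high_base) / high_base)
    best = int(100.0 * ((qps_comp + std_comp) - low_base) / low_base)
    return worst, best


# AndHighHigh row from the table above.
print(pct_diff_range(25.66, 0.31, 22.61, 1.21))  # (-17, -6)
```

Under this reading, a row whose range straddles zero (for example Fuzzy1 or PKLookup) shows no change distinguishable from run-to-run noise, while a row with both ends negative indicates a slowdown.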

Hmm, it is quite strange that we are already getting a perf loss with the baseline patch:

Base: BlockForPF in current branch
Comp: BlockForPF + hardcoded decoder (patch file)
{noformat}
                Task    QPS base StdDev base    QPS comp StdDev comp      Pct diff
         AndHighHigh       26.71        0.98       24.15        0.82  -15% -   -2%
          AndHighMed       73.37        5.01       61.30        1.97  -24% -   -7%
              Fuzzy1       85.73        4.95       84.30        1.79   -9% -    6%
              Fuzzy2       30.15        2.05       29.52        0.66  -10% -    7%
              IntNRQ       38.56        1.69       36.91        1.27  -11% -    3%
          OrHighHigh       16.98        1.48       16.82        0.94  -13% -   14%
           OrHighMed       34.60        2.79       34.70        2.22  -13% -   16%
            PKLookup      214.93        3.99      213.86        1.23   -2% -    1%
              Phrase       11.53        0.23       10.75        0.42  -12% -   -1%
             Prefix3      107.15        3.83      102.12        2.69  -10% -    1%
             Respell       87.41        5.41       86.08        1.76   -9% -    7%
        SloppyPhrase        5.90        0.15        5.66        0.21   -9% -    2%
            SpanNear        4.99        0.12        4.79        0.01   -6% -   -1%
                Term       49.37        2.38       45.53        0.49  -12% -   -2%
        TermBGroup1M       17.23        0.40       16.44        0.53   -9% -    0%
      TermBGroup1M1P       22.02        0.50       22.42        0.60   -3% -    7%
         TermGroup1M       13.65        0.29       13.05        0.28   -8% -    0%
            Wildcard       48.73        2.01       46.35        1.31  -11% -    2%
{noformat}
                
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892-BlockTermScorer.patch, LUCENE-3892-blockFor&hardcode(base).patch,
LUCENE-3892-blockFor&packedecoder(comp).patch, LUCENE-3892-blockFor-with-packedints-decoder.patch,
LUCENE-3892-blockFor-with-packedints-decoder.patch, LUCENE-3892-blockFor-with-packedints.patch,
LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch, LUCENE-3892-handle_open_files.patch,
LUCENE-3892-pfor-compress-iterate-numbits.patch, LUCENE-3892-pfor-compress-slow-estimate.patch,
LUCENE-3892_for_byte[].patch, LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch,
LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, LUCENE-3892_settings.patch,
LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

