lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
Date Mon, 30 Jul 2012 22:01:36 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425296#comment-13425296 ]

Adrien Grand commented on LUCENE-3892:
--------------------------------------

My benchmark results are a little different, but oal.util.packed is still behind. This compares
the current branch (pforcodec) against the same branch patched to use PackedInts (pforcodec-packedints):

{noformat}
            Task  QPS pforcodec   StdDev  QPS pforcodec-packedints   StdDev      Pct diff
          Phrase          38.21     3.01                     35.73     2.41  -19% -    8%
        SpanNear          27.99     1.30                     26.30     1.23  -14% -    3%
    SloppyPhrase          43.32     2.98                     41.02     2.53  -16% -    7%
      AndHighMed         230.23     8.48                    219.88     9.35  -11% -    3%
     AndHighHigh          52.53     2.02                     50.80     2.62  -11% -    5%
          IntNRQ          43.24     3.42                     41.84     2.79  -16% -   12%
        Wildcard         113.26     3.17                    109.91     3.50   -8% -    3%
         Prefix3         194.56     9.56                    189.39     9.64  -11% -    7%
            Term         301.86    14.49                    295.28    17.51  -12% -    8%
       OrHighMed         100.60     8.30                     99.06     8.00  -16% -   15%
      OrHighHigh          32.35     2.92                     31.90     2.88  -17% -   18%
          Fuzzy2          36.27     0.67                     35.87     0.93   -5% -    3%
          Fuzzy1          81.14     1.24                     80.24     1.68   -4% -    2%
   TermGroup100K         193.40     3.36                    191.27     4.13   -4% -    2%
TermBGroup100K1P         152.78     5.06                    151.23     3.98   -6% -    5%
  TermBGroup100K         242.78     7.06                    240.71     8.01   -6% -    5%
         Respell          85.75     1.36                     85.17     2.04   -4% -    3%
        PKLookup         206.02     5.05                    205.57     4.63   -4% -    4%
{noformat}

I am not sure why oal.util.packed is slower. The only differences I see are that it uses inheritance
instead of a switch block to select how to decode the data, and that it encodes values starting from
the high-order bits of each long, while the branch currently starts from the low-order bits of each int.
I'll try to dig deeper to understand what is happening.
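To make those two differences concrete, here is a minimal Java sketch. It is NOT code from the branch or from oal.util.packed: the class PackingOrderSketch and all of its methods are hypothetical. It packs the same values low-order-bits-first into an int[] and high-order-bits-first into a long[], then decodes the first with a switch on bitsPerValue and the second through a decoder object that is chosen once and then called virtually. A real codec would have one hand-unrolled case (or subclass) per bitsPerValue; here only bitsPerValue == 1 is specialized and everything else falls back to a bit-by-bit loop.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, NOT the LUCENE-3892 branch or oal.util.packed code.
 * Assumes 1 <= bitsPerValue <= 32 and values that fit in bitsPerValue bits.
 */
public class PackingOrderSketch {

  /** Branch-style order: a value's lowest bit goes to the lowest free bit of the current int. */
  static int[] packLowFirst(int[] values, int bitsPerValue) {
    int[] blocks = new int[(int) (((long) values.length * bitsPerValue + 31) / 32)];
    long bitPos = 0;
    for (int v : values) {
      for (int b = 0; b < bitsPerValue; b++, bitPos++) {
        if (((v >>> b) & 1) != 0) blocks[(int) (bitPos >>> 5)] |= 1 << (int) (bitPos & 31);
      }
    }
    return blocks;
  }

  static int decodeLowFirst(int[] blocks, int index, int bitsPerValue) {
    long bitPos = (long) index * bitsPerValue;
    int v = 0;
    for (int b = 0; b < bitsPerValue; b++, bitPos++) {
      v |= ((blocks[(int) (bitPos >>> 5)] >>> (int) (bitPos & 31)) & 1) << b;
    }
    return v;
  }

  /** PackedInts-like order: a value's highest bit goes to the highest free bit of the current long. */
  static long[] packHighFirst(int[] values, int bitsPerValue) {
    long[] blocks = new long[(int) (((long) values.length * bitsPerValue + 63) / 64)];
    long bitPos = 0;
    for (int v : values) {
      for (int b = bitsPerValue - 1; b >= 0; b--, bitPos++) {
        if (((v >>> b) & 1) != 0) blocks[(int) (bitPos >>> 6)] |= 1L << (63 - (int) (bitPos & 63));
      }
    }
    return blocks;
  }

  static int decodeHighFirst(long[] blocks, int index, int bitsPerValue) {
    long bitPos = (long) index * bitsPerValue;
    int v = 0;
    for (int b = 0; b < bitsPerValue; b++, bitPos++) {
      v = (v << 1) | (int) ((blocks[(int) (bitPos >>> 6)] >>> (63 - (int) (bitPos & 63))) & 1L);
    }
    return v;
  }

  /** Switch style: branch on bitsPerValue every time a block is decoded. */
  static void decodeWithSwitch(int[] blocks, int bitsPerValue, int[] out, int count) {
    switch (bitsPerValue) {
      case 1: // specialized case; a real codec would have one per bitsPerValue
        for (int i = 0; i < count; i++) out[i] = (blocks[i >>> 5] >>> (i & 31)) & 1;
        break;
      default: // generic bit-by-bit fallback
        for (int i = 0; i < count; i++) out[i] = decodeLowFirst(blocks, i, bitsPerValue);
    }
  }

  /** Inheritance style: pick an implementation once, then call it virtually. */
  interface Decoder {
    void decode(long[] blocks, int[] out, int count);
  }

  static final class OneBitDecoder implements Decoder {
    public void decode(long[] blocks, int[] out, int count) {
      for (int i = 0; i < count; i++) out[i] = (int) ((blocks[i >>> 6] >>> (63 - (i & 63))) & 1L);
    }
  }

  static final class GenericDecoder implements Decoder {
    private final int bitsPerValue;
    GenericDecoder(int bitsPerValue) { this.bitsPerValue = bitsPerValue; }
    public void decode(long[] blocks, int[] out, int count) {
      for (int i = 0; i < count; i++) out[i] = decodeHighFirst(blocks, i, bitsPerValue);
    }
  }

  static Decoder decoderFor(int bitsPerValue) {
    return bitsPerValue == 1 ? new OneBitDecoder() : new GenericDecoder(bitsPerValue);
  }

  public static void main(String[] args) {
    int[] values = {5, 0, 7, 2, 6, 1, 3, 4, 7, 7, 5}; // 11 values * 3 bits = 33 bits
    int[] viaSwitch = new int[values.length];
    int[] viaObject = new int[values.length];
    decodeWithSwitch(packLowFirst(values, 3), 3, viaSwitch, values.length);
    decoderFor(3).decode(packHighFirst(values, 3), viaObject, values.length);
    System.out.println(Arrays.equals(values, viaSwitch) + " " + Arrays.equals(values, viaObject)); // prints "true true"
    // Same value, different layout: 1 in 3 bits lands in bit 0 of an int vs bit 61 of a long.
    System.out.println(packLowFirst(new int[] {1}, 3)[0] + " "
        + Long.toHexString(packHighFirst(new int[] {1}, 3)[0])); // prints "1 2000000000000000"
  }
}
```

Both layouts decode back to the same values; the sketch only shows where the bits end up and how the decoder is selected, and says nothing about which variant is faster, which is what the benchmark above measures.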
                
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892-BlockTermScorer.patch, LUCENE-3892-blockFor&hardcode(base).patch,
> LUCENE-3892-blockFor&packedecoder(comp).patch, LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch, LUCENE-3892-blockFor-with-packedints.patch,
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch, LUCENE-3892-handle_open_files.patch,
> LUCENE-3892-pfor-compress-iterate-numbits.patch, LUCENE-3892-pfor-compress-slow-estimate.patch,
> LUCENE-3892_for_byte[].patch, LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch,
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, LUCENE-3892_settings.patch,
> LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of open issues with patches in various
> states.
> Initial results (based on a prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html).
> I think this would make a good GSoC project.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


