lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
Date Tue, 07 Aug 2012 14:48:08 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430373#comment-13430373
] 

Adrien Grand commented on LUCENE-3892:
--------------------------------------

I backported Mike's changes to the {{BlockPacked}} codec and tried to understand why it was
slower than {{Block}}...

The use of {{java.nio.*Buffer}} seemed to be the bottleneck of the decoding step
({{ByteBuffer.asLongBuffer}} and {{ByteBuffer.getLong}} especially are _very_ slow), so I
switched back to decoding from long[] (instead of LongBuffer) and added direct decoding from
byte[], which avoids having to convert the bytes to longs before decoding.
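
To illustrate what "direct decoding from byte[]" can look like, here is a minimal sketch. It is not the code from the patch: the class {{DirectByteDecodeSketch}}, its method names, and the MSB-first bit layout are assumptions made for the example. It reads fixed-width values straight out of the byte array, with no intermediate long[] or {{LongBuffer}}.

```java
/** Hypothetical sketch, not the BlockPacked implementation. */
final class DirectByteDecodeSketch {

    /**
     * Decodes `count` unsigned values of `bitsPerValue` bits each, packed
     * MSB-first, directly from `blocks` into `values`.
     */
    static void decode(byte[] blocks, int bitsPerValue, int[] values, int count) {
        if (bitsPerValue < 1 || bitsPerValue > 32) {
            throw new IllegalArgumentException("bitsPerValue must be in [1, 32]");
        }
        final long mask = (1L << bitsPerValue) - 1;
        long buffer = 0L;      // unread bits live in the low `bitsInBuffer` bits
        int bitsInBuffer = 0;
        int pos = 0;
        for (int i = 0; i < count; i++) {
            // Pull in whole bytes until one full value is available.
            while (bitsInBuffer < bitsPerValue) {
                buffer = (buffer << 8) | (blocks[pos++] & 0xFFL);
                bitsInBuffer += 8;
            }
            bitsInBuffer -= bitsPerValue;
            values[i] = (int) ((buffer >>> bitsInBuffer) & mask);
        }
    }

    /** Matching encoder, included only so the sketch can be round-tripped. */
    static byte[] encode(int[] values, int bitsPerValue) {
        final long mask = (1L << bitsPerValue) - 1;
        byte[] out = new byte[(values.length * bitsPerValue + 7) / 8];
        long buffer = 0L;
        int bitsInBuffer = 0;
        int pos = 0;
        for (int v : values) {
            buffer = (buffer << bitsPerValue) | (v & mask);
            bitsInBuffer += bitsPerValue;
            // Flush every complete byte, most significant first.
            while (bitsInBuffer >= 8) {
                bitsInBuffer -= 8;
                out[pos++] = (byte) (buffer >>> bitsInBuffer);
            }
        }
        if (bitsInBuffer > 0) {
            // Left-align the trailing bits in the last byte.
            out[pos] = (byte) (buffer << (8 - bitsInBuffer));
        }
        return out;
    }
}
```

A real decoder would more likely be specialized (unrolled) per {{bitsPerValue}}; the generic loop above only shows the idea of working on the bytes in place.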

Tests passed with -Dtests.postingsformat=BlockPacked. Here are the results of the benchmark
(unfortunately, it started before Mike committed r1370179):

{noformat}
                Task    QPS 3892 StdDev 3892  QPS 3892-packed  StdDev 3892-packed      Pct diff
            PKLookup      259.41        9.06      255.77        8.89   -8% -    5%
          AndHighLow     1656.30       50.44     1653.85       55.05   -6% -    6%
         AndHighHigh       82.90        1.82       83.47        2.52   -4% -    6%
          AndHighMed      274.76       11.11      278.51       13.42   -7% -   10%
             Prefix3      285.41        4.82      289.60        6.31   -2% -    5%
            HighTerm      230.78       14.33      235.16       20.61  -12% -   18%
              IntNRQ       55.91        1.03       57.13        2.73   -4% -    9%
             LowTerm     1720.10       47.06     1759.16       55.47   -3% -    8%
            Wildcard      290.54        3.82      297.39        5.42    0% -    5%
             MedTerm      733.01       35.38      750.46       50.37   -8% -   14%
        HighSpanNear        6.93        0.23        7.12        0.39   -6% -   11%
          HighPhrase        6.46        0.22        6.65        0.46   -7% -   14%
             Respell       96.11        2.84       99.00        3.98   -3% -   10%
          OrHighHigh       38.07        2.53       39.23        3.06  -10% -   19%
              Fuzzy2       50.29        1.70       51.87        2.25   -4% -   11%
           MedPhrase       26.20        0.94       27.03        1.07   -4% -   11%
           OrHighMed      138.83        7.76      143.54        9.79   -8% -   16%
              Fuzzy1      100.58        2.15      104.21        3.99   -2% -    9%
    HighSloppyPhrase        5.26        0.11        5.45        0.24   -3% -   10%
           OrHighLow       78.43        5.55       81.80        6.89  -10% -   21%
         MedSpanNear       32.75        1.13       34.28        1.73   -3% -   13%
           LowPhrase       90.27        3.20       95.06        3.58   -2% -   13%
         LowSpanNear       46.40        1.95       48.89        2.40   -3% -   15%
     MedSloppyPhrase       36.29        1.00       38.59        1.46    0% -   13%
     LowSloppyPhrase       37.41        1.11       40.48        1.39    1% -   15%
{noformat}
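
For readers unfamiliar with the "Pct diff" column: the message does not state how it is computed, but the ranges above appear consistent with comparing the two runs one standard deviation apart in each direction and truncating to whole percents. The sketch below is an assumption inferred from the rows, not the benchmark tool's code; the class and method names are hypothetical.

```java
/** Hypothetical sketch of how the "Pct diff" range can be reproduced. */
final class PctDiffRangeSketch {

    /**
     * Returns {worst, best} percentage change of the candidate run relative
     * to the baseline run, each truncated toward zero.
     */
    static int[] pctDiffRange(double baseQps, double baseStdDev,
                              double candQps, double candStdDev) {
        // Pessimistic case: candidate one stddev low, baseline one stddev high.
        double worst = (candQps - candStdDev) / (baseQps + baseStdDev) - 1.0;
        // Optimistic case: candidate one stddev high, baseline one stddev low.
        double best = (candQps + candStdDev) / (baseQps - baseStdDev) - 1.0;
        return new int[] {(int) (worst * 100), (int) (best * 100)};
    }
}
```

For example, the PKLookup row (259.41 / 9.06 against 255.77 / 8.89) gives -8 and 5 under this formula, matching the table. A range that straddles zero, as most rows here do, means the difference is within the noise.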

Mike, Billy, could you check that {{BlockPacked}} is at least as fast as {{Block}} on your
computers too?
                
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892-BlockTermScorer.patch, LUCENE-3892-blockFor&hardcode(base).patch,
LUCENE-3892-blockFor&packedecoder(comp).patch, LUCENE-3892-blockFor-with-packedints-decoder.patch,
LUCENE-3892-blockFor-with-packedints-decoder.patch, LUCENE-3892-blockFor-with-packedints.patch,
LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch, LUCENE-3892-handle_open_files.patch,
LUCENE-3892-pfor-compress-iterate-numbits.patch, LUCENE-3892-pfor-compress-slow-estimate.patch,
LUCENE-3892_for_byte[].patch, LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch,
LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, LUCENE-3892_settings.patch,
LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.

