lucene-dev mailing list archives

From "Han Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
Date Fri, 20 Jul 2012 19:01:34 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419444#comment-13419444 ]

Han Jiang edited comment on LUCENE-3892 at 7/20/12 6:59 PM:
------------------------------------------------------------

So I changed the patch to use readBytes():

base: PackedInts.getReaderNoHeader().get(long[]); file I/O is handled by PackedInts.
comp: PackedInts.getDecoder().decode(LongBuffer,LongBuffer); a byte[] holds the compressed
block, and ByteBuffer.wrap().asLongBuffer() wraps it as a LongBuffer.
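
To make the "comp" path concrete, here is a minimal JDK-only sketch of the buffer handling it describes: a byte[] holds the compressed block and ByteBuffer.wrap().asLongBuffer() exposes it as 64-bit blocks for decoding. It does not use the Lucene PackedInts API. The class PackedBlockSketch, its pack/unpack helpers, and the MSB-first bit layout are assumptions made for illustration, not the patch's code.

```java
import java.nio.ByteBuffer;
import java.nio.LongBuffer;
import java.util.Arrays;

// Hypothetical illustration only; not the LUCENE-3892 patch or PackedInts itself.
class PackedBlockSketch {

    /** Packs values (each assumed to fit in bitsPerValue bits, 1..63) MSB-first into 64-bit blocks. */
    static byte[] pack(long[] values, int bitsPerValue) {
        int blockCount = (values.length * bitsPerValue + 63) / 64;
        long[] blocks = new long[blockCount];
        for (int i = 0; i < values.length; i++) {
            long bitPos = (long) i * bitsPerValue;
            int block = (int) (bitPos >>> 6);
            int room = 64 - (int) (bitPos & 63);      // free bits left in this block
            if (bitsPerValue <= room) {
                blocks[block] |= values[i] << (room - bitsPerValue);
            } else {                                   // value straddles two blocks
                int spill = bitsPerValue - room;
                blocks[block] |= values[i] >>> spill;
                blocks[block + 1] |= values[i] << (64 - spill);
            }
        }
        ByteBuffer bytes = ByteBuffer.allocate(blockCount * 8);   // big-endian by default
        bytes.asLongBuffer().put(blocks);
        return bytes.array();
    }

    /** Decodes count values from a byte[] block viewed through a LongBuffer wrapper. */
    static long[] unpack(byte[] compressed, int bitsPerValue, int count) {
        LongBuffer blocks = ByteBuffer.wrap(compressed).asLongBuffer();
        long mask = (1L << bitsPerValue) - 1;
        long[] values = new long[count];
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bitsPerValue;
            int block = (int) (bitPos >>> 6);
            int room = 64 - (int) (bitPos & 63);
            if (bitsPerValue <= room) {
                values[i] = (blocks.get(block) >>> (room - bitsPerValue)) & mask;
            } else {
                int spill = bitsPerValue - room;
                long high = blocks.get(block) << spill;
                long low = blocks.get(block + 1) >>> (64 - spill);
                values[i] = (high | low) & mask;
            }
        }
        return values;
    }

    public static void main(String[] args) {
        long[] input = {3, 17, 0, 31, 8, 21, 1, 30, 5, 12, 31, 0, 7};
        byte[] packed = pack(input, 5);
        System.out.println(Arrays.toString(unpack(packed, 5, input.length)));
    }
}
```

The wrapper adds a view object and per-access bounds checks over the byte[], which is one plausible cost to keep in mind when reading the numbers below; the message itself does not identify the cause.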

Well, not as expected.
{noformat}
                Task    QPS base StdDev base    QPS comp StdDev comp      Pct diff
         AndHighHigh       23.78        1.06       23.38        0.42   -7% -    4%
          AndHighMed       52.06        3.28       50.82        1.21  -10% -    6%
              Fuzzy1       88.56        0.59       88.98        2.38   -2% -    3%
              Fuzzy2       28.80        0.36       28.97        0.83   -3% -    4%
              IntNRQ       41.92        1.67       41.34        0.50   -6% -    3%
          OrHighHigh       15.85        0.45       15.89        0.39   -4% -    5%
           OrHighMed       20.38        0.61       20.50        0.62   -5% -    6%
            PKLookup      110.72        2.19      111.74        2.53   -3% -    5%
              Phrase        7.51        0.12        7.05        0.18   -9% -   -2%
             Prefix3      106.27        2.65      105.37        1.13   -4% -    2%
             Respell      112.03        0.81      112.79        2.71   -2% -    3%
        SloppyPhrase       15.43        0.48       14.92        0.27   -7% -    1%
            SpanNear        3.52        0.10        3.41        0.06   -7% -    1%
                Term       39.19        1.34       39.04        0.81   -5% -    5%
        TermBGroup1M       18.45        0.68       18.33        0.56   -7% -    6%
      TermBGroup1M1P       22.78        0.90       22.26        0.56   -8% -    4%
         TermGroup1M       19.50        0.73       19.42        0.63   -7% -    6%
            Wildcard       29.56        1.13       29.18        0.28   -5% -    3%
{noformat}
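
The message does not say how the "Pct diff" column is derived. The figures are consistent with a worst-case to best-case range built from each mean plus or minus one standard deviation, truncated toward zero. The sketch below is an inference from the numbers in the table, not a statement about the benchmark tool's code; PctDiffSketch and pctDiffRange are hypothetical names.

```java
// Hypothetical reconstruction of the "Pct diff" range; inferred from the table values.
class PctDiffSketch {

    /** Returns {worst, best} percentage change of comp versus base, truncated toward zero. */
    static int[] pctDiffRange(double baseQps, double baseStdDev,
                              double compQps, double compStdDev) {
        // Worst case: comp one stddev low, base one stddev high.
        double worst = (compQps - compStdDev) / (baseQps + baseStdDev) - 1.0;
        // Best case: comp one stddev high, base one stddev low.
        double best = (compQps + compStdDev) / (baseQps - baseStdDev) - 1.0;
        return new int[] {(int) (100.0 * worst), (int) (100.0 * best)};
    }

    public static void main(String[] args) {
        int[] andHighHigh = pctDiffRange(23.78, 1.06, 23.38, 0.42);
        System.out.println(andHighHigh[0] + "% - " + andHighHigh[1] + "%"); // prints "-7% - 4%"
    }
}
```

Read this way, every row except Phrase has a range that spans zero, so only Phrase (-9% to -2%) shows a slowdown that stands clear of the measured noise.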
                
      was (Author: billy):
base: PackedInts.getReaderNoHeader().get(long[]), file io is handled by PackedInts.
comp: PackedInts.getDecoder().decode(LongBuffer,LongBuffer), use byte[] to hold the compressed
block, and ByteBuffer.wrap().asLongBuffer as a wrapper.

Well, not as expected.
[Benchmark table identical to the one above.]

> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892-BlockTermScorer.patch, LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch, LUCENE-3892-blockFor-with-packedints.patch,
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch, LUCENE-3892-for&pfor-with-javadoc.patch,
> LUCENE-3892-for&pfor-with-javadoc.patch, LUCENE-3892-for&pfor-with-javadoc.patch,
> LUCENE-3892-for&pfor.patch, LUCENE-3892-handle_open_files.patch, LUCENE-3892-pfor-compress-iterate-numbits.patch,
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for.patch, LUCENE-3892_for_byte[].patch,
> LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch, LUCENE-3892_pfor.patch,
> LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch,
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.
