lucene-dev mailing list archives

From "Han Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
Date Fri, 20 Jul 2012 13:54:35 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419144#comment-13419144 ]

Han Jiang edited comment on LUCENE-3892 at 7/20/12 1:52 PM:
------------------------------------------------------------

An initial attempt with PackedInts in the current trunk version. I replaced all the int[] buffers
with long[] buffers so we can use the API directly. I don't quite understand the Writer part yet,
so for now each long value is saved one by one.
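
For readers unfamiliar with the idea, the following is a minimal sketch of what packing a long[] buffer at a fixed bit width involves. It is not Lucene's PackedInts code: the class name ContiguousPacking, the methods pack and get, and the low-bit-first layout are all assumptions made for illustration.

```java
/** Hypothetical illustration of contiguous bit packing; not Lucene's PackedInts. */
final class ContiguousPacking {

    /** Packs values (each assumed to fit in bitsPerValue bits, 1..64) back to back. */
    static long[] pack(long[] values, int bitsPerValue) {
        long totalBits = (long) values.length * bitsPerValue;
        long[] blocks = new long[(int) ((totalBits + 63) >>> 6)];
        for (int i = 0; i < values.length; i++) {
            long bitPos = (long) i * bitsPerValue;
            int block = (int) (bitPos >>> 6);   // which 64-bit block the value starts in
            int offset = (int) (bitPos & 63);   // bit offset inside that block
            blocks[block] |= values[i] << offset;
            int spill = offset + bitsPerValue - 64; // bits that do not fit in this block
            if (spill > 0) {
                blocks[block + 1] |= values[i] >>> (bitsPerValue - spill);
            }
        }
        return blocks;
    }

    /** Reads the value at index; may need to touch two adjacent blocks. */
    static long get(long[] blocks, int bitsPerValue, int index) {
        long mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        long value = blocks[block] >>> offset;
        int spill = offset + bitsPerValue - 64;
        if (spill > 0) {
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return value & mask;
    }

    public static void main(String[] args) {
        long[] values = {5, 8191, 0, 4097, 77, 1234, 1, 2, 3, 4};
        long[] blocks = pack(values, 13);
        System.out.println(blocks.length + " blocks for " + values.length + " values");
        // Index 4 starts at bit 52, so its 13 bits straddle blocks 0 and 1.
        System.out.println(get(blocks, 13, 4));
    }
}
```

The branch on spill in get is the kind of per-value work a reader has to do when values may cross block boundaries.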

However, it is the Reader part we are concerned with:
{noformat}
                Task    QPS base StdDev base QPS packed StdDev packed      Pct diff
         AndHighHigh       29.60        1.56       23.78        0.51  -25% -  -13%
          AndHighMed       74.68        3.92       53.15        2.31  -35% -  -21%
              Fuzzy1       88.23        1.21       87.13        1.41   -4% -    1%
              Fuzzy2       30.09        0.45       29.47        0.47   -5% -    1%
              IntNRQ       41.96        3.88       38.16        2.48  -22% -    6%
          OrHighHigh       17.56        0.34       15.45        0.15  -14% -   -9%
           OrHighMed       34.71        0.76       30.77        0.53  -14% -   -7%
            PKLookup      111.00        1.90      110.52        1.59   -3% -    2%
              Phrase        9.03        0.23        7.62        0.41  -22% -   -8%
             Prefix3      123.56        8.42      110.94        5.43  -20% -    1%
             Respell      102.37        1.11      101.79        1.38   -2% -    1%
        SloppyPhrase        3.97        0.19        3.52        0.07  -17% -   -4%
            SpanNear        8.24        0.18        7.22        0.25  -17% -   -7%
                Term       45.16        3.15       37.47        2.32  -27% -   -5%
        TermBGroup1M       17.19        1.09       15.86        0.77  -17% -    3%
      TermBGroup1M1P       23.47        1.66       20.43        1.16  -23% -   -1%
         TermGroup1M       19.20        1.14       17.73        0.84  -16% -    2%
            Wildcard       42.75        3.27       36.75        1.96  -24% -   -1%
{noformat}
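
The message does not define the Pct diff column. The ranges in the table are consistent with comparing the two runs one standard deviation apart in each direction and truncating to a whole percent. That reading is inferred from the numbers above, not taken from the benchmark tool, and the class and method names below are hypothetical.

```java
/** Hypothetical reconstruction of the "Pct diff" range; inferred from the table only. */
final class PctDiffRange {

    /** Returns {low, high} in whole percent, truncated toward zero. */
    static int[] range(double qpsBase, double sdBase, double qpsPacked, double sdPacked) {
        // Pessimistic case: packed run one StdDev low, base run one StdDev high.
        double worst = (qpsPacked - sdPacked) / (qpsBase + sdBase) - 1.0;
        // Optimistic case: packed run one StdDev high, base run one StdDev low.
        double best = (qpsPacked + sdPacked) / (qpsBase - sdBase) - 1.0;
        return new int[] {(int) (100.0 * worst), (int) (100.0 * best)};
    }

    public static void main(String[] args) {
        // AndHighHigh row: base 29.60 +/- 1.56, packed 23.78 +/- 0.51
        int[] r = range(29.60, 1.56, 23.78, 0.51);
        System.out.println(r[0] + "% - " + r[1] + "%");
    }
}
```

Under this reading, a row whose range spans zero (for example IntNRQ, -22% to 6%) does not show a clear difference, while rows such as AndHighMed are slower even in the optimistic case.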

Maybe we should try PACKED_SINGLE_BLOCK for certain values of numBits, instead of using
PACKED all the time?
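
To make the trade-off concrete, here is a rough sketch of a single-block layout, in which no value crosses a 64-bit boundary. It only illustrates the general idea behind the suggestion; it is not the PACKED_SINGLE_BLOCK implementation, and the class SingleBlockPacking and its methods are hypothetical names.

```java
/** Hypothetical single-block layout: values never straddle two longs. */
final class SingleBlockPacking {

    /** Packs values (each assumed to fit in bitsPerValue bits, 1..64). */
    static long[] pack(long[] values, int bitsPerValue) {
        int perBlock = 64 / bitsPerValue; // whole values that fit in one long
        long[] blocks = new long[(values.length + perBlock - 1) / perBlock];
        for (int i = 0; i < values.length; i++) {
            blocks[i / perBlock] |= values[i] << ((i % perBlock) * bitsPerValue);
        }
        return blocks;
    }

    /** Reads one value with a single block access and no boundary check. */
    static long get(long[] blocks, int bitsPerValue, int index) {
        int perBlock = 64 / bitsPerValue;
        long mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
        return (blocks[index / perBlock] >>> ((index % perBlock) * bitsPerValue)) & mask;
    }

    /** Padding bits left unused in every block for a given width. */
    static int wastedBitsPerBlock(int bitsPerValue) {
        return 64 - (64 / bitsPerValue) * bitsPerValue;
    }

    public static void main(String[] args) {
        for (int bits : new int[] {13, 16, 21}) {
            System.out.println(bits + " bits: " + wastedBitsPerBlock(bits) + " wasted per block");
        }
    }
}
```

Decoding is simpler than in a contiguous layout, at the cost of padding: in this sketch a width of 21 wastes 1 bit per block and 16 wastes none, but 13 wastes 12. That is why such a layout would only pay off for some values of numBits.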

Thanks to Adrien, we have a more direct API in LUCENE-4239; I'm trying that now.
                
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892-BlockTermScorer.patch, LUCENE-3892-blockFor-with-packedints.patch,
>                      LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch,
>                      LUCENE-3892-for&pfor-with-javadoc.patch, LUCENE-3892-for&pfor-with-javadoc.patch,
>                      LUCENE-3892-for&pfor-with-javadoc.patch, LUCENE-3892-for&pfor.patch,
>                      LUCENE-3892-handle_open_files.patch, LUCENE-3892-pfor-compress-iterate-numbits.patch,
>                      LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for.patch,
>                      LUCENE-3892_for_byte[].patch, LUCENE-3892_for_int[].patch,
>                      LUCENE-3892_for_unfold_method.patch, LUCENE-3892_pfor.patch,
>                      LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch,
>                      LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch,
>                      LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.

