lucene-dev mailing list archives

From "Han Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
Date Wed, 11 Jul 2012 14:00:47 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411517#comment-13411517 ]

Han Jiang commented on LUCENE-3892:
-----------------------------------

bq. The Pulsing parts of the last patch are not included here, because they don't improve performance significantly.

Here are some comparisons of For vs. PulsingFor and PFor vs. PulsingPFor, run on 1M docs with
wikimediumhard.tasks.

It is strange that PKLookup still doesn't benefit from Pulsing with FixedBlockInt:

{noformat}
                Task     QPS For  StdDev For  QPS PulsingFor  StdDev PulsingFor      Pct diff
         AndHighHigh       23.01        0.33       22.94        0.66   -4% -    4%  
          AndHighMed       56.41        0.76       57.41        1.74   -2% -    6%  
              Fuzzy1       86.74        0.85       82.22        2.39   -8% -   -1% 
              Fuzzy2       28.23        0.38       26.15        0.97  -11% -   -2% 
              IntNRQ       41.78        1.65       40.78        3.53  -14% -   10% 
          OrHighHigh       14.44        0.34       14.50        0.92   -8% -    9%  
           OrHighMed       30.59        0.77       31.12        1.93   -6% -   10% 
            PKLookup      110.31        2.03      109.22        2.43   -4% -    3%  
              Phrase        8.18        0.44        7.97        0.40  -12% -    8%  
             Prefix3       99.64        2.38       97.09        3.46   -8% -    3%  
             Respell       99.66        0.45       92.76        2.81  -10% -   -3% 
        SloppyPhrase        4.28        0.16        4.08        0.13  -11% -    2%  
            SpanNear        4.08        0.13        3.93        0.06   -7% -    0%  
                Term       33.63        1.25       34.06        1.71   -7% -   10% 
        TermBGroup1M       15.54        0.46       15.78        0.56   -4% -    8%  
      TermBGroup1M1P       20.34        0.73       20.62        0.62   -5% -    8%  
         TermGroup1M       19.18        0.52       19.72        0.49   -2% -    8%  
            Wildcard       34.86        0.88       34.27        1.77   -9% -    6% 
{noformat}

{noformat}
                Task    QPS PFor  StdDev PFor  QPS PulsingPFor  StdDev PulsingPFor      Pct diff
         AndHighHigh       19.98        0.31       19.92        0.26   -3% -    2%  
          AndHighMed       58.21        1.51       57.86        1.18   -5% -    4%  
              Fuzzy1       91.86        1.17       85.86        1.18   -8% -   -4% 
              Fuzzy2       32.66        0.58       30.08        0.57  -11% -   -4% 
              IntNRQ       33.89        0.82       32.66        1.10   -9% -    2%  
          OrHighHigh       15.79        1.29       14.96        0.67  -16% -    7%
           OrHighMed       30.31        2.09       28.91        1.67  -15% -    8%
            PKLookup      112.80        0.81      111.82        2.90   -4% -    2%
              Phrase        6.14        0.11        6.23        0.10   -1% -    5%
             Prefix3      147.80        2.88      138.35        2.11   -9% -   -3%
             Respell      118.57        1.18      108.30        1.86  -11% -   -6%
        SloppyPhrase        5.78        0.15        5.66        0.29   -9% -    5%
            SpanNear        6.32        0.14        6.40        0.16   -3% -    6%
                Term       41.60        2.44       38.12        0.33  -14% -   -1%
        TermBGroup1M       14.40        0.48       13.73        0.19   -8% -    0%
      TermBGroup1M1P       23.68        0.44       22.82        0.44   -7% -    0%
         TermGroup1M       15.25        0.48       14.51        0.20   -9% -    0%
            Wildcard       32.76        0.53       31.76        0.62   -6% -    0%
{noformat}
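
A note on reading the Pct diff column: the ranges above look consistent with pairing one-standard-deviation extremes of the two runs (worst case: the Pulsing QPS minus its StdDev against the baseline QPS plus its StdDev; best case the reverse), then truncating to a whole percent. That formula is an assumption inferred from the numbers, not something stated in this comment, and the name `pct_diff_range` is hypothetical. A minimal Python sketch:

```python
def pct_diff_range(qps_base, sd_base, qps_cmp, sd_cmp):
    """Return (worst, best) percent change of cmp relative to base.

    Assumed formula (inferred from the tables, not confirmed by the source):
    pessimistic and optimistic one-standard-deviation bounds, truncated
    toward zero to a whole percent.
    """
    worst_base = qps_base + sd_base  # baseline at its fastest
    best_base = qps_base - sd_base   # baseline at its slowest
    worst = int(100.0 * ((qps_cmp - sd_cmp) - worst_base) / worst_base)
    best = int(100.0 * ((qps_cmp + sd_cmp) - best_base) / best_base)
    return worst, best


# AndHighHigh row of the For vs PulsingFor table
print(pct_diff_range(23.01, 0.33, 22.94, 0.66))
# Fuzzy1 row of the same table
print(pct_diff_range(86.74, 0.85, 82.22, 2.39))
```

With the rounded values printed in the table this gives (-4, 4) and (-8, -1), matching those two rows. Because the table shows only two decimals, some other rows can come out one point off. Under this reading, a range that is entirely negative (Fuzzy1, Fuzzy2, Respell) is a slowdown larger than one standard deviation on both sides, while a range that straddles zero is within noise.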

                
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892-BlockTermScorer.patch, LUCENE-3892-direct-IntBuffer.patch,
LUCENE-3892-for&pfor-with-javadoc.patch, LUCENE-3892-for&pfor.patch, LUCENE-3892-handle_open_files.patch,
LUCENE-3892_for.patch, LUCENE-3892_for_byte[].patch, LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch,
LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor_unfold_method.patch,
LUCENE-3892_pulsing_support.patch, LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.


