lucene-dev mailing list archives

From "Han Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
Date Mon, 18 Jun 2012 13:42:43 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Han Jiang updated LUCENE-3892:
------------------------------

    Attachment: LUCENE-3892_for.patch
                LUCENE-3892_pfor.patch

The new "3892_pfor" patch fixes some "SuppressingCodec" issues found over the last two weeks. The
"3892_for" patch lazily implements a "For" postings format based on the current code. The two
patches are kept separate for now, to avoid the performance penalty that method overriding
would introduce.

So far, block sizes from 32 to 128 have been tested with both patches. However, for
skipping-intensive queries, a smaller block size gives no significant performance gain.

Here is an earlier result for PFor with blockSize=64, compared against blockSize=128 (in brackets):
{noformat}
                Task    QPS Base StdDev Base    QPS PFor StdDev PFor      Pct diff  (blockSize=128)
              Phrase        4.93        0.36        3.10        0.33  -47% -  -25%  (-47% -  -25%)
          AndHighMed       27.92        2.26       19.16        1.72  -42% -  -18%  (-37% -  -15%)
            SpanNear        2.73        0.16        1.96        0.24  -40% -  -14%  (-36% -  -13%)
        SloppyPhrase        4.19        0.21        3.20        0.30  -34% -  -12%  (-30% -   -6%)
            Wildcard       19.44        0.87       17.11        0.94  -20% -   -2%  (-17% -    3%)
         AndHighHigh        7.50        0.38        6.61        0.59  -23% -    1%  (-19% -    6%)
              IntNRQ        4.06        0.52        3.88        0.35  -22% -   19%  (-16% -   24%)
             Prefix3       31.00        1.69       30.45        2.29  -13% -   11%  ( -6% -   20%)
          OrHighHigh        4.16        0.47        4.11        0.34  -18% -   20%  (-14% -   27%)
           OrHighMed        4.98        0.59        4.94        0.41  -18% -   22%  (-14% -   27%)
             Respell       40.29        2.11       40.11        2.13  -10% -   10%  (-15% -    2%)
        TermBGroup1M       20.50        0.32       20.52        0.80   -5% -    5%  (  1% -   10%)
         TermGroup1M       13.51        0.43       13.61        0.40   -5% -    7%  (  1% -    9%)
              Fuzzy1       43.20        1.83       44.02        1.95   -6% -   11%  (-11% -    1%)
            PKLookup       87.16        1.78       89.52        0.94    0% -    5%  ( -2% -    7%)
              Fuzzy2       16.09        0.80       16.54        0.77   -6% -   13%  (-11% -    6%)
                Term       43.56        1.53       45.26        3.84   -8% -   16%  (  2% -   26%)
      TermBGroup1M1P       21.33        0.64       22.24        1.23   -4% -   13%  (  0% -   14%)
{noformat}
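
For readers unfamiliar with the "Pct diff" column: the ranges in the table above look consistent with a pessimistic-to-optimistic interval built from the QPS means and standard deviations. The sketch below is only an assumption about how such a range can be derived, not the benchmark tool's actual code; the class and method names are hypothetical. Because the printed QPS and StdDev values are rounded, recomputed figures may differ from the table by about a percentage point.

```java
// Hypothetical helper for illustration; not part of the patch or the benchmark tool.
final class PctDiffRange {
    /** Pessimistic bound: slowest plausible candidate against fastest plausible baseline. */
    static double worst(double qpsBase, double sdBase, double qpsCand, double sdCand) {
        return 100.0 * ((qpsCand - sdCand) / (qpsBase + sdBase) - 1.0);
    }

    /** Optimistic bound: fastest plausible candidate against slowest plausible baseline. */
    static double best(double qpsBase, double sdBase, double qpsCand, double sdCand) {
        return 100.0 * ((qpsCand + sdCand) / (qpsBase - sdBase) - 1.0);
    }
}
```

Applied to the Phrase row (4.93, 0.36, 3.10, 0.33), this gives roughly -47.6 and -24.9, close to the reported "-47% - -25%".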

Also, the For postings format shows little change in performance, so I suppose the bottleneck
isn't in the PForUtil.patchException method.
Here is an example with blockSize=64:
{noformat}
                Task    QPS Base StdDev Base     QPS For  StdDev For      Pct diff
              Phrase        5.03        0.45        3.30        0.43  -47% -  -18%
          AndHighMed       28.05        2.33       18.83        1.77  -43% -  -19%
            SpanNear        2.69        0.18        1.94        0.25  -40% -  -12%
        SloppyPhrase        4.19        0.20        3.22        0.35  -34% -  -10%
         AndHighHigh        7.61        0.46        6.41        0.54  -27% -   -2%
             Respell       41.36        1.65       37.94        2.42  -17% -    1%
            Wildcard       19.20        0.77       17.89        0.99  -15% -    2%
          OrHighHigh        4.22        0.37        3.94        0.32  -21% -   10%
           OrHighMed        5.06        0.46        4.73        0.39  -21% -   11%
              Fuzzy1       44.15        1.31       42.38        1.74  -10% -    2%
              Fuzzy2       16.48        0.59       15.84        0.76  -11% -    4%
         TermGroup1M       13.32        0.35       13.44        0.53   -5% -    7%
            PKLookup       87.70        1.81       88.62        1.22   -2% -    4%
        TermBGroup1M       20.14        0.47       20.40        0.59   -3% -    6%
             Prefix3       30.31        1.49       31.08        2.26   -9% -   15%
      TermBGroup1M1P       21.13        0.46       21.79        1.42   -5% -   12%
              IntNRQ        3.96        0.45        4.14        0.46  -16% -   31%
                Term       43.07        1.51       46.06        4.50   -6% -   21%
{noformat}
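
As background for the For versus PFor comparison above, here is a minimal sketch of frame-of-reference style bit packing: every value in a block is stored with the same bit width, chosen from the largest value. PFor typically picks a smaller width and stores the outliers separately as exceptions that are patched in after decoding, which is presumably the step PForUtil.patchException handles. The class and method names below are hypothetical and this is not the code from either patch.

```java
// Illustrative sketch only; names are hypothetical and not taken from the LUCENE-3892 patches.
final class ForSketch {
    /** Bits needed for the largest value in the block (at least 1). */
    static int bitsRequired(int[] block) {
        int or = 0;
        for (int v : block) {
            or |= v; // OR-ing all values keeps the highest set bit
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
    }

    /** Packs every value with the same bit width (1..32) into a long[]. */
    static long[] pack(int[] block, int bits) {
        long[] out = new long[(block.length * bits + 63) / 64];
        for (int i = 0; i < block.length; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = block[i] & 0xFFFFFFFFL;
            out[word] |= v << shift;
            if (shift + bits > 64) {
                // The value straddles two longs: spill the high bits into the next word.
                out[word + 1] |= v >>> (64 - shift);
            }
        }
        return out;
    }

    /** Reverses pack(): reads count values of the given bit width. */
    static int[] unpack(long[] packed, int bits, int count) {
        int[] out = new int[count];
        long mask = (1L << bits) - 1;
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = packed[word] >>> shift;
            if (shift + bits > 64) {
                v |= packed[word + 1] << (64 - shift);
            }
            out[i] = (int) (v & mask);
        }
        return out;
    }
}
```

With this layout, a block of 64 values that all fit in 7 bits occupies 7 longs instead of 64 ints; a single large outlier would force the whole block to a wider width, which is the case PFor-style exceptions are meant to avoid.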
                
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892_for.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch,
>                      LUCENE-3892_pfor.patch, LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

