From "hao yan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2903) Improvement of PForDelta Codec
Date Tue, 08 Feb 2011 23:40:57 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992237#comment-12992237 ]

hao yan commented on LUCENE-2903:
---------------------------------

I tried moving memory allocation out of readBlock() and into BlockReader's constructor. This
improves performance a little. I also tried replacing my manual conversion between byte[] and
int[] with ByteBuffer/IntBuffer. That made things worse.
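
To make the two variants concrete, here is a minimal sketch of the idea. It is not the code from the patch: the class BlockDecodeSketch, its method names, and the big-endian layout are assumptions made for illustration only. Both buffers are allocated once in the constructor, and the two decode methods show the manual conversion and the IntBuffer view side by side.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Hypothetical illustration; this is NOT the BlockReader from LUCENE_2903.patch.
final class BlockDecodeSketch {
    private final byte[] raw;  // allocated once in the constructor, reused per block
    private final int[] ints;  // allocated once in the constructor, reused per block

    BlockDecodeSketch(int blockSize) {
        raw = new byte[blockSize * 4];
        ints = new int[blockSize];
    }

    // The caller fills this buffer with the bytes of the next block.
    byte[] rawBuffer() {
        return raw;
    }

    // Manual conversion: assemble each int from 4 bytes (big-endian is an assumption).
    int[] decodeManual(int count) {
        for (int i = 0, j = 0; i < count; i++, j += 4) {
            ints[i] = ((raw[j] & 0xFF) << 24)
                    | ((raw[j + 1] & 0xFF) << 16)
                    | ((raw[j + 2] & 0xFF) << 8)
                    | (raw[j + 3] & 0xFF);
        }
        return ints;
    }

    // Same result through an IntBuffer view over the byte array.
    int[] decodeWithIntBuffer(int count) {
        IntBuffer view = ByteBuffer.wrap(raw, 0, count * 4)
                .order(ByteOrder.BIG_ENDIAN)
                .asIntBuffer();
        view.get(ints, 0, count);
        return ints;
    }
}
```

Both methods return the same reused int[] array, so a caller that needs to keep a decoded block has to copy it. Which variant is faster depends on the JVM and the block size; the sketch only shows the two shapes being compared.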

The following is my result for 0.1M data:
(1) BulkVInt vs PatchedFrameOfRef3
Query                              QPS BulkVInt  QPS PatchedFrameOfRef3  Pct diff
"united states"                          393.55                  362.84     -7.8%
"united states"~3                        243.84                  236.80     -2.9%
+nebraska +states                       1140.25                  998.00    -12.5%
+united +states                          687.76                  633.31     -7.9%
doctimesecnum:[10000 TO 60000]           413.56                  427.53      3.4%
doctitle:.*[Uu]nited.*                   510.46                  534.47      4.7%
spanFirst(unit, 5)                      1240.69                 1108.65    -10.6%
spanNear([unit, state], 10, true)        511.77                  463.18     -9.5%
states                                  1626.02                 1483.68     -8.8%
u*d                                      164.23                  162.79     -0.9%
un*d                                     257.53                  252.97     -1.8%
uni*                                     607.53                  591.02     -2.7%
unit*                                   1024.59                 1043.84      1.9%
united states                            627.35                  578.70     -7.8%
united~0.6                                11.51                   11.36     -1.3%
united~0.75                               52.58                   53.57      1.9%
unit~0.5                                  12.08                   11.93     -1.2%
unit~0.7                                  50.98                   51.30      0.6%

(2) FrameOfRef vs PatchedFrameOfRef3
Query                              QPS PatchedFrameOfRef  QPS PatchedFrameOfRef3  Pct diff
"united states"                                   314.76                  362.71     15.2%
"united states"~3                                 227.53                  237.08      4.2%
+nebraska +states                                1075.27                 1025.64     -4.6%
+united +states                                   646.41                  626.57     -3.1%
doctimesecnum:[10000 TO 60000]                    412.88                  429.37      4.0%
doctitle:.*[Uu]nited.*                            481.70                  528.82      9.8%
spanFirst(unit, 5)                               1060.45                 1118.57      5.5%
spanNear([unit, state], 10, true)                 409.33                  467.73     14.3%
states                                           1353.18                 1479.29      9.3%
u*d                                               158.91                  165.98      4.4%
un*d                                              237.36                  256.41      8.0%
uni*                                              560.22                  593.12      5.9%
unit*                                             946.97                 1043.84     10.2%
united states                                     431.22                  583.09     35.2%
united~0.6                                         10.91                   11.37      4.2%
united~0.75                                        50.30                   53.30      5.9%
unit~0.5                                           11.54                   11.94      3.5%
unit~0.7                                           47.38                   50.38      6.3%


(3) PatchedFrameOfRef vs PatchedFrameOfRef3
Query                              QPS FrameOfRef  QPS PatchedFrameOfRef3  Pct diff
"united states"                            326.26                  360.49     10.5%
"united states"~3                          226.50                  234.69      3.6%
+nebraska +states                         1077.59                 1021.45     -5.2%
+united +states                            648.51                  630.52     -2.8%
doctimesecnum:[10000 TO 60000]             324.46                  428.45     32.0%
doctitle:.*[Uu]nited.*                     485.44                  527.70      8.7%
spanFirst(unit, 5)                        1007.05                 1111.11     10.3%
spanNear([unit, state], 10, true)          446.03                  465.55      4.4%
states                                    1449.28                 1459.85      0.7%
u*d                                        158.43                  161.79      2.1%
un*d                                       246.37                  256.28      4.0%
uni*                                       548.85                  594.88      8.4%
unit*                                      920.81                 1042.75     13.2%
united states                              450.65                  576.37     27.9%
united~0.6                                  11.07                   11.26      1.7%
united~0.75                                 50.70                   52.60      3.8%
unit~0.5                                    11.64                   11.76      1.0%
unit~0.7                                    49.04                   50.70      3.4%
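
The "Pct diff" column in the tables above appears consistent with the relative change of the second QPS column against the first. The formula is not stated in the message, so the helper below (the name PctDiff is hypothetical) is an assumption that happens to reproduce the printed values to one decimal place.

```java
// Hypothetical helper; the benchmark tool's actual formula is not given in the message.
final class PctDiff {
    // Relative change of the candidate QPS against the baseline QPS, in percent.
    static double pctDiff(double baselineQps, double candidateQps) {
        return (candidateQps - baselineQps) / baselineQps * 100.0;
    }
}
```

For example, the first row of table (1) gives pctDiff(393.55, 362.84), which is about -7.8, so a negative value means the second codec answered fewer queries per second than the first.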




> Improvement of PForDelta Codec
> ------------------------------
>
>                 Key: LUCENE-2903
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2903
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: hao yan
>         Attachments: LUCENE_2903.patch, LUCENE_2903.patch
>
>
> There are 3 versions of PForDelta implementations in the Bulk Branch: FrameOfRef,
> PatchedFrameOfRef, and PatchedFrameOfRef2.
> The FrameOfRef is a very basic one which is essentially a binary encoding (may result
> in huge index size).
> The PatchedFrameOfRef is the implementation based on the original version of PForDelta
> in the literature.
> The PatchedFrameOfRef2 is my previous implementation, which is improved this time. (The
> codec name is changed to NewPForDelta.)
> In particular, the changes are:
> 1. I fixed the bug in my previous version (in Lucene-1410.patch), where the old PForDelta
> does not support very large exceptions (since Simple16 does not support very large
> numbers). Now this has been fixed in the new LCPForDelta.
> 2. I changed the PForDeltaFixedIntBlockCodec. Now it is faster than the other two PForDelta
> implementations in the bulk branch (FrameOfRef and PatchedFrameOfRef). The codec's name is
> "NewPForDelta", as you can see in the CodecProvider and PForDeltaFixedIntBlockCodec.
> 3. The performance test results are:
> 1) My "NewPForDelta" codec is faster than FrameOfRef and PatchedFrameOfRef for almost
> all kinds of queries, and slightly worse than BulkVInt.
> 2) My "NewPForDelta" codec results in the smallest index size among all 4 methods
> (FrameOfRef, PatchedFrameOfRef, BulkVInt, and itself).
> 3) All performance test results are achieved by running with "-server" instead of "-client".
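
The description above contrasts a plain frame-of-reference encoding, where every slot must be wide enough for the largest value, with a patched one, where a narrow slot width is used and the few values that do not fit are handled as exceptions. The sketch below only illustrates that general idea and is not the codec in the patch. The class PatchedFrameSketch, its fields, and the choice to keep the low bits in the slot and the overflow bits in a side array are all assumptions; no bit packing or Simple16 compression is done.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "narrow slots plus exceptions"; not the patch's codec.
final class PatchedFrameSketch {
    final int bits;       // slot width b, assumed to satisfy 0 < b < 32
    final int[] slots;    // low b bits of every value
    final int[] excPos;   // positions of values that need more than b bits
    final int[] excHigh;  // overflow (high) bits of those values

    PatchedFrameSketch(int[] values, int bits) {
        this.bits = bits;
        this.slots = new int[values.length];
        int mask = (1 << bits) - 1;
        List<Integer> pos = new ArrayList<>();
        List<Integer> high = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            slots[i] = values[i] & mask;
            int overflow = values[i] >>> bits;
            if (overflow != 0) {  // value does not fit in b bits: record an exception
                pos.add(i);
                high.add(overflow);
            }
        }
        excPos = pos.stream().mapToInt(Integer::intValue).toArray();
        excHigh = high.stream().mapToInt(Integer::intValue).toArray();
    }

    // Decoding copies the slots, then patches the exceptions back in.
    int[] decode() {
        int[] out = slots.clone();
        for (int k = 0; k < excPos.length; k++) {
            out[excPos[k]] |= excHigh[k] << bits;
        }
        return out;
    }

    // Slot width a plain frame-of-reference block would need for the same values.
    static int frameOfRefBits(int[] values) {
        int all = 0;
        for (int v : values) {
            all |= v;  // the OR of all values has the same bit length as the maximum
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(all));
    }
}
```

With a block such as {1, 2, 3, 1000, 2}, frameOfRefBits reports that a plain frame needs 10 bits per slot, while the patched layout with bits = 2 keeps 2-bit slots and records a single exception at position 3. That is the size trade-off the description alludes to.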

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

