lucene-dev mailing list archives

From Michael McCandless <>
Subject Re: strange problem of PForDelta decoder
Date Mon, 20 Dec 2010 15:33:11 GMT
On Mon, Dec 20, 2010 at 5:49 AM, Li Li <> wrote:
>   I think random test is not sufficient.
>   for normal situation, some branches are not executed. I tested
> with many random
> int arrays and it works. But when I use it in real indexing, when in
> optimize stage, it corrupted.
>  Because PForDelta will choose best numFrameBits and some bit such as
> 31 is hardly generated by random arrays. So I "force" the encoder to
> choose all possible numFrameBits to test all the decode1 ...decode32
> and find some bugs in it.

Good point -- we need to make sure we cover all numFrameBits.  And a
series of 128 random ints in a row will heavily bias toward the high
numFrameBits cases.  Maybe we should do a better job w/ the random
source, to try to target all numFrameBits values, w/ varying numbers
of exceptions, etc...  I'll put a nocommit for this.
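
A minimal sketch of what such a targeted random source might look
like. This is not the code on the bulk branch: the class and method
names (TargetedBlocks, makeBlock, bitsRequired) are hypothetical, and
only the block size of 128 comes from the discussion above. It builds
a block whose normal values need exactly numBits bits and plants a
chosen number of wider values as exceptions. Whether a given encoder
then actually picks that numFrameBits depends on its own cost model,
so a real test would probably still need to force the choice.

```java
import java.util.Random;

final class TargetedBlocks {
    static final int BLOCK_SIZE = 128; // assumption: block size from the thread

    /** Bits needed to hold v when treated as unsigned; 1 for v == 0. */
    static int bitsRequired(int v) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(v));
    }

    /**
     * Builds one block in which every value fits in numBits bits, except
     * for numExceptions values that need exactly numBits + 1 bits.
     */
    static int[] makeBlock(Random r, int numBits, int numExceptions) {
        if (numBits < 1 || numBits > 32) {
            throw new IllegalArgumentException("numBits must be in 1..32");
        }
        if (numExceptions < 0 || numExceptions > BLOCK_SIZE - 1
                || (numBits == 32 && numExceptions > 0)) {
            throw new IllegalArgumentException("invalid numExceptions");
        }
        int[] block = new int[BLOCK_SIZE];
        long mask = (1L << numBits) - 1; // keeps the low numBits bits
        for (int i = 0; i < BLOCK_SIZE; i++) {
            block[i] = (int) (r.nextLong() & mask);
        }
        // Guarantee that at least one normal value uses the full width.
        block[0] |= 1 << (numBits - 1);

        // Pick distinct exception slots among 1..127 (partial shuffle),
        // leaving slot 0 as the guaranteed full-width normal value.
        int[] idx = new int[BLOCK_SIZE - 1];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i + 1;
        }
        for (int i = 0; i < numExceptions; i++) {
            int j = i + r.nextInt(idx.length - i);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
            block[idx[i]] |= 1 << numBits; // one bit too wide for numBits
        }
        return block;
    }
}
```

Looping numBits over 1..32 and numExceptions over a few values (say
0, 1, 10) would then give every decodeN path at least one block,
instead of relying on uniform random ints to hit the rare widths.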

>    what's pfor2? using s9/s16 to encode exception and offset?

Yeah I just committed pfor2 this morning on the bulk branch.  You can
check it out from

pfor2 came from the patch attached on by Hao Yan
(thanks!). It uses s16 for the exceptions (though, there's a bug
somewhere, because it fails the random test), and it takes a different
approach for encoding exceptions.

>    In it's s9
> for NewPForDelta also have many bugs and also need test each branch to
> ensure it works well.

OK we should have a look at that one still.  We need to converge on a
good default codec for 4.0.  Fortunately it's trivial to take any int
block encoder (fixed or variable block) and make a Lucene codec out of
it.

