lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <>
Subject [jira] Commented: (LUCENE-2735) First Cut at GroupVarInt with FixedIntBlockIndexInput / Output
Date Wed, 03 Nov 2010 08:36:27 GMT


Simon Willnauer commented on LUCENE-2735:

bq. The lookup table would have taken substantial memory: 256*(64+4*4) == 20K and would have
taken up a good fraction of L1 cache (perhaps not detectable in a micro-benchmark, but perhaps
significant in a full application).

Thanks Yonik, I had a similar version without a table earlier and the performance was about
the same. I agree that the table is unnecessary!

Yet some of your "cleanups" didn't do any good though :) The check (b=(byte)(current>>>16))!=0
only tests whether a bit is set in bits 16 to 23, due to the cast. I will upload a new
version in a second with a better test.
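
To make the cast issue concrete, here is a minimal sketch in Java. The class and method names are hypothetical and not taken from any of the attached patches, and the uncast variant is only one possible way to cover all upper bits, not necessarily the fix that ends up in the new version.

```java
// Illustrative only: names are hypothetical, not from the LUCENE-2735 patches.
final class CastCheckDemo {

    // The check as quoted above: the (byte) cast keeps only the low 8 bits of
    // (current >>> 16), i.e. bits 16..23 of current. Bits 24..31 are dropped.
    static boolean castCheck(int current) {
        return ((byte) (current >>> 16)) != 0;
    }

    // Assumed alternative: without the cast, any set bit in 16..31 is detected.
    static boolean uncastCheck(int current) {
        return (current >>> 16) != 0;
    }

    public static void main(String[] args) {
        int onlyBit24 = 0x01000000;
        // The cast version misses this value, the uncast version does not.
        System.out.println("castCheck:   " + castCheck(onlyBit24));
        System.out.println("uncastCheck: " + uncastCheck(onlyBit24));
    }
}
```

A value such as 0x01000000 (only bit 24 set) is the kind of input a better test would need to include, since the cast version treats it as if no upper bits were set.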

I made a codec for GVint (attached) but a few tests fail with spooky exceptions, e.g. TestPhraseQuery.testRandomPhrases
and TestCodecs.testRandomPostings and TestIndicesEquals.testInstantiatedIndexWriter (in contrib/instantiated
which, somehow, is really good at finding sneaky codec problems!).

I guess that is because the second patch didn't really work though. 

I tried making an extra test in GVintMicroBenchmark that created the same output as groupintsTest
but just read all the bytes directly back, no decoding, using IndexOutput. It's called GroupVarIntRead
in the output below and as can be seen, most of the processing seems to take place outside
of GVint decoding. Sorry no patch, as I messed up the formatting.

Toke, thanks for bringing this up. I ran a slightly modified benchmark with a profiler attached
to it, using IntIndexInput directly, once with GVint and once with VInt, and guess what the damn
hottest method is? Thread.interrupt() takes 77% of the time.

|Name|Time (ms)|
|org.apache.lucene.index.codecs.gvint.GVintMicroBenchmark.benchRead(int[][], IntStreamFactory)|218692|
|[Wall Time][], int, int)|57237|
|[], int)|49749|
|org.apache.lucene.index.codecs.gvint.GVintIndexInput.readGroupInt(int, IndexInput)|39851|
|, long, int, long)|16054|

I also ran the updated benchmarks; here are some numbers:

|Max random value|GVint ns / value|Vint ns / value|GVint total in ms|Vint total in ms|

> First Cut at GroupVarInt with FixedIntBlockIndexInput / Output
> --------------------------------------------------------------
>                 Key: LUCENE-2735
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Priority: Minor
>             Fix For: 4.0
>         Attachments: LUCENE-2735.patch, LUCENE-2735.patch, LUCENE-2735_alt.patch
> I have hacked together a FixedIntBlockIndex impl with Group VarInt encoding - this does
way worse than the standard codec in benchmarks, but I guess that is mainly due to the FixedIntBlockIndex
limitations. Once LUCENE-2723 is in, or builds with trunk again, I will update and run some
tests. The isolated microbenchmark shows that there could be improvements over vint even in
Java though, and I am sure we can make it faster implementation-wise.
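
For readers unfamiliar with Group VarInt, the following is a minimal sketch of the general technique in Java, not the code from the attached patches. It assumes groups of four ints, one selector byte holding four 2-bit length codes (byte count minus one), and little-endian value bytes. The class name, method names, and the exact bit and byte layout are assumptions made for illustration.

```java
// Illustrative Group VarInt sketch; layout and names are assumptions,
// not the format used by the LUCENE-2735 patches.
final class GroupVarIntSketch {

    // Number of bytes (1..4) needed to store v as an unsigned 32-bit value.
    static int bytesNeeded(int v) {
        if ((v >>> 8) == 0) return 1;
        if ((v >>> 16) == 0) return 2;
        if ((v >>> 24) == 0) return 3;
        return 4;
    }

    // Encodes values[offset..offset+3] into out starting at pos.
    // Returns the position just past the encoded group (at most 17 bytes used).
    static int encodeGroup(int[] values, int offset, byte[] out, int pos) {
        int selectorPos = pos++; // reserve one byte for the selector
        int selector = 0;
        for (int i = 0; i < 4; i++) {
            int v = values[offset + i];
            int n = bytesNeeded(v);
            selector |= (n - 1) << (i * 2); // 2 bits per value
            for (int j = 0; j < n; j++) {
                out[pos++] = (byte) (v >>> (8 * j)); // little-endian bytes
            }
        }
        out[selectorPos] = (byte) selector;
        return pos;
    }

    // Decodes one group of four ints from in starting at pos into
    // values[offset..offset+3]. Returns the position just past the group.
    static int decodeGroup(byte[] in, int pos, int[] values, int offset) {
        int selector = in[pos++] & 0xFF;
        for (int i = 0; i < 4; i++) {
            int n = ((selector >>> (i * 2)) & 3) + 1;
            int v = 0;
            for (int j = 0; j < n; j++) {
                v |= (in[pos++] & 0xFF) << (8 * j);
            }
            values[offset + i] = v;
        }
        return pos;
    }
}
```

Unlike vint, which tests a continuation bit on every byte, this layout reads all four lengths from a single selector byte, which is where the potential decoding speedup comes from.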

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
