lucene-java-user mailing list archives

From Robert Muir <>
Subject Re: New codecs keep Freq skip/omit Pos
Date Sat, 23 Apr 2011 18:24:54 GMT
On Sat, Apr 23, 2011 at 2:06 PM, Alex vB <> wrote:
> I am a little bit curious about the Lucene 3.0 performance results because
> the larger index seems to
> work faster?!? I already ran the test several times. Are my results
> realistic at all? I thought PForDelta/2 would outperform the standard index
> implementations in query processing.

It depends on the type of query. Which queries are you using for
this benchmark, and how are you benchmarking?
FYI: for benchmarking standard query types with wikipedia you might be
interested in

> The last result is my own implementation. I am still looking to get it
> smaller because I think I can improve compression further. For indexing I
> use PForDelta2 in combination with payloads. Those are causing the higher
> runtimes. In memory it looks nice. The gap between my solution and PForDelta
> is already 700 MB. I would say it is an improvement. :D I will have a look
> at it again after I got an index with your adapted implementation.

Wait, are you indexing payloads for your tests with these other codecs
where it says "W POS"?

Keep in mind that adding even a single payload to your index slows
down decompression of the positions tremendously, because payload
lengths are intertwined with the positions. For block codecs, payloads
really need to be handled differently, so that blocks of positions are
just blocks of positions. This hasn't been fixed yet for either the
sep or the fixed layouts, so if you add any payloads and then
benchmark positional queries, the results are not realistic.

> Normally all payloads corresponding to a query get fetched, right?

No, they are not. Payloads are fetched only if you use a payload-based
query such as PayloadTermQuery. Normal non-positional queries like
TermQuery, and even normal positional queries like PhraseQuery, don't
fetch payloads at all.
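
As a rough illustration of payloads being fetched only on demand, here is a
toy model in plain Java. ToyPositions is a hypothetical stand-in for a
positions enumerator and is not a Lucene class. The two consumer methods only
mimic, under that assumption, how a phrase-style query and a payload-style
query would use it.

```java
final class LazyPayloadSketch {

    /** Hypothetical positions enumerator; payload bytes are touched only when payload() is called. */
    static final class ToyPositions {
        private final int[] positions;
        private final byte[][] payloads;
        private int cursor = -1;
        int payloadLoads = 0; // counts how often payload bytes were actually read

        ToyPositions(int[] positions, byte[][] payloads) {
            this.positions = positions;
            this.payloads = payloads;
        }

        boolean next() {
            return ++cursor < positions.length;
        }

        int position() {
            return positions[cursor];
        }

        byte[] payload() {
            payloadLoads++;
            return payloads[cursor];
        }
    }

    /** Phrase-style consumer: needs positions, never asks for payloads. */
    static int sumPositions(ToyPositions p) {
        int sum = 0;
        while (p.next()) {
            sum += p.position();
        }
        return sum;
    }

    /** Payload-style consumer: treats the first payload byte of each position as a score. */
    static int sumPayloadScores(ToyPositions p) {
        int sum = 0;
        while (p.next()) {
            sum += p.payload()[0];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] positions = {1, 5, 9};
        byte[][] payloads = {{2}, {3}, {4}};

        ToyPositions forPhrase = new ToyPositions(positions, payloads);
        System.out.println(sumPositions(forPhrase) + " loads=" + forPhrase.payloadLoads);

        ToyPositions forPayload = new ToyPositions(positions, payloads);
        System.out.println(sumPayloadScores(forPayload) + " loads=" + forPayload.payloadLoads);
    }
}
```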

From the description of what you are doing I don't understand how
payloads fit in because they are per-position? But, I haven't had the
time to digest the paper you sent yet.
