lucene-dev mailing list archives

From Sebastiano Vigna <vi...@di.unimi.it>
Subject Re: Interleaving and new Lucene formats
Date Sat, 16 Feb 2013 12:05:07 GMT
On 16 February 2013 11:45, Robert Muir <rcmuir@gmail.com> wrote:

> But forcing that wouldn't be testing the 4.1 index format, it would be
> something else (something not interesting).
>

Do you mind if I have my own share of knowledge and have my idea about
interesting benchmarks? :)

You didn't answer, but the subtext *seems* to be that counts are no longer
interleaved. Again, is that the case?

Forcing a count read is an essential test of index efficiency, as you need
counts to do scoring. Testing with a scorer is not a good idea, because the
scorer's CPU usage is difficult to control across different implementations.
So the only way of testing a non-interleaved index against an interleaved
index (or comparing the speed of count access against a non-interleaved
index) is to force a count read without any other activity.
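
As a minimal sketch of such a count-forcing test: the two methods below walk
a posting list, decode each document ID from its gap, read the count, and
fold both into a checksum so that the read cannot be optimised away. The
class name, the method names, and the uncompressed int-array layouts are
assumptions made for illustration; they are not Lucene's on-disk format or
API.

```java
final class CountReadSketch {
    // Hypothetical interleaved layout: [gap0, count0, gap1, count1, ...].
    static long checksumInterleaved(int[] postings) {
        long doc = 0, checksum = 0;
        for (int i = 0; i < postings.length; i += 2) {
            doc += postings[i];                 // decode document ID from its gap
            checksum += doc + postings[i + 1];  // force the count to be read
        }
        return checksum;
    }

    // Hypothetical non-interleaved layout: gaps and counts in separate arrays.
    static long checksumSeparate(int[] gaps, int[] counts) {
        long doc = 0, checksum = 0;
        for (int i = 0; i < gaps.length; i++) {
            doc += gaps[i];                     // decode document ID from its gap
            checksum += doc + counts[i];        // force the count to be read
        }
        return checksum;
    }

    public static void main(String[] args) {
        int[] gaps = {3, 2, 5};
        int[] counts = {1, 4, 2};
        int[] interleaved = {3, 1, 2, 4, 5, 2};
        // Both layouts describe the same postings, so the checksums must agree.
        System.out.println(checksumSeparate(gaps, counts) == checksumInterleaved(interleaved));
    }
}
```

No scorer is involved, so the only work being timed would be document and
count access.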

> 4.1 index format mixes with variable byte because it's more efficient
> than using FOR everywhere. This means FOR blocks in this format are
> always size 128. The remainder is encoded as vbyte.

So essentially you code every block of 128 postings using FOR, but fall
back to VByte for the tail (<128). For low-frequency terms, this means
just VByte. Right?
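
The split described above can be sketched as follows. The block size of 128
comes from the thread; the class and method names are hypothetical, and the
vByte method is a generic variable-byte encoder (7 payload bits per byte,
high bit set on every byte except the last), not code taken from Lucene.

```java
import java.io.ByteArrayOutputStream;

final class BlockTailSketch {
    static final int BLOCK_SIZE = 128; // FOR block size stated in the thread

    // Number of full 128-posting blocks that would be FOR-encoded.
    static int fullBlocks(int docFreq) {
        return docFreq / BLOCK_SIZE;
    }

    // Number of trailing postings (< 128) that fall back to VByte.
    static int tailLength(int docFreq) {
        return docFreq % BLOCK_SIZE;
    }

    // Generic variable-byte encoding, low-order 7 bits first.
    static byte[] vByte(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // more bytes follow
            value >>>= 7;
        }
        out.write(value);                     // last byte, high bit clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        for (int docFreq : new int[] {5, 128, 300}) {
            System.out.println(docFreq + " postings: " + fullBlocks(docFreq)
                + " FOR block(s), " + tailLength(docFreq) + " VByte-coded");
        }
    }
}
```

Under these assumptions a term with fewer than 128 postings has no FOR block
at all, which is the low-frequency case asked about above.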
