lucene-dev mailing list archives

From Yonik Seeley
Subject Re: TrieRange
Date Sat, 07 Feb 2009 17:44:11 GMT
On Sat, Feb 7, 2009 at 12:26 PM, Uwe Schindler wrote:
>> To optimize index space, one would want to "right justify" the encoded
>> number for any bit range to minimize variation on the left - this
>> plays into lucene's prefix compression.

The prototype code I just posted in JIRA does this.  For example, if
we are encoding the bits 0xffffffffffffffff with a precision of only 8
bits, and using 7 bits per char, then it stores

0x01 0x7f
  instead of
0x7f 0x70

This means that a whole sequence of these values would take up closer
to 1 byte of data instead of 2 in the index.
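
A rough sketch of the two layouts, in case it helps to see them side by
side. This is not the prototype from JIRA: the class and method names
(JustifiedEncoding, encodeRightJustified, encodeLeftJustified,
suffixChars) are made up for illustration, it assumes the top
`precision` bits of a long are packed 7 bits per char, most significant
char first, and suffixChars is only a simple stand-in for prefix
compression. Under these assumptions the left-justified form comes out
as 0x7f 0x40, so its second char may differ from the one quoted above
depending on how the padding is done.

```java
// Hypothetical sketch, not Lucene code: packs the top `precision` bits of a
// long into 7-bit chars, either right- or left-justified.
final class JustifiedEncoding {
    static final int BITS_PER_CHAR = 7;

    /** Number of 7-bit chars needed to hold `precision` bits. */
    static int charCount(int precision) {
        return (precision + BITS_PER_CHAR - 1) / BITS_PER_CHAR;
    }

    private static void checkPrecision(int precision) {
        // Assumption of this sketch: 1..63 keeps the left shift inside a long.
        if (precision < 1 || precision > 63) {
            throw new IllegalArgumentException("precision must be in 1..63");
        }
    }

    /** Splits the low nChars*7 bits into chars, most significant char first. */
    private static char[] split(long bits, int nChars) {
        char[] out = new char[nChars];
        for (int i = nChars - 1; i >= 0; i--) {
            out[i] = (char) (bits & 0x7f);
            bits >>>= BITS_PER_CHAR;
        }
        return out;
    }

    /** Padding zeros go on the left, so leading chars vary as little as possible. */
    static char[] encodeRightJustified(long value, int precision) {
        checkPrecision(precision);
        long top = value >>> (64 - precision);
        return split(top, charCount(precision));
    }

    /** Padding zeros go on the right, after the last significant bit. */
    static char[] encodeLeftJustified(long value, int precision) {
        checkPrecision(precision);
        int n = charCount(precision);
        long top = value >>> (64 - precision);
        return split(top << (n * BITS_PER_CHAR - precision), n);
    }

    /**
     * Total chars left once each term drops the prefix it shares with the
     * previous term (terms must already be in sorted order).
     */
    static int suffixChars(char[][] sortedTerms) {
        int total = 0;
        for (int t = 0; t < sortedTerms.length; t++) {
            int shared = 0;
            if (t > 0) {
                char[] prev = sortedTerms[t - 1];
                char[] cur = sortedTerms[t];
                while (shared < prev.length && shared < cur.length
                        && prev[shared] == cur[shared]) {
                    shared++;
                }
            }
            total += sortedTerms[t].length - shared;
        }
        return total;
    }
}
```

Encoding all 256 possible 8-bit prefixes in order and passing them to
suffixChars, the right-justified terms cost roughly one new char per
term, the left-justified ones about 1.5.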

> I am not sure this is the right way. Lucene's prefix compression is also
> good for seeking quickly to a term. If thousands of terms that vary only in
> the last bits (because all the bits before are zero) must be scanned to get
> to the right one, performance would suffer.

Every 128th term is stored in full in memory, and a binary search is
used to find the closest lower term.  A linear scan is then done from
there, so it covers at most 127 further terms.

If anything it should be slightly faster to iterate over a more compact index.
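
The lookup described above can be sketched roughly as follows. This is
an illustration only, not Lucene's term dictionary code: SparseTermIndex,
seek and lastScanLength are hypothetical names, and a sorted in-memory
String array stands in for the on-disk term list.

```java
// Hypothetical sketch of a sparse term index: every 128th term is kept in
// memory, a binary search picks the block, and a linear scan finishes the seek.
final class SparseTermIndex {
    static final int INTERVAL = 128;

    private final String[] terms;      // all terms, sorted (stand-in for the on-disk list)
    private final String[] indexTerms; // every INTERVAL-th term, stored in full

    /** Number of terms examined by the linear scan in the last seek. */
    int lastScanLength;

    SparseTermIndex(String[] sortedTerms) {
        this.terms = sortedTerms;
        int n = (sortedTerms.length + INTERVAL - 1) / INTERVAL;
        this.indexTerms = new String[n];
        for (int j = 0; j < n; j++) {
            indexTerms[j] = sortedTerms[j * INTERVAL];
        }
    }

    /** Returns the position of `target` in the sorted term list, or -1 if absent. */
    int seek(String target) {
        lastScanLength = 0;

        // Binary search for the greatest index term that is <= target.
        int lo = 0, hi = indexTerms.length - 1, block = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (indexTerms[mid].compareTo(target) <= 0) {
                block = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (block < 0) {
            return -1; // target sorts before the first term
        }

        // Linear scan inside the block: never more than INTERVAL terms,
        // however many neighbouring terms share a long prefix.
        int start = block * INTERVAL;
        int end = Math.min(start + INTERVAL, terms.length);
        for (int i = start; i < end; i++) {
            lastScanLength++;
            int c = terms[i].compareTo(target);
            if (c == 0) return i;
            if (c > 0) break; // passed the spot where target would be
        }
        return -1;
    }
}
```

The scan length is bounded by the index interval, not by how similar the
terms are, which is why right-justifying should not hurt seek time.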

