lucene-dev mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: TrieRange
Date Sat, 07 Feb 2009 17:44:11 GMT
On Sat, Feb 7, 2009 at 12:26 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>> To optimize index space, one would want to "right justify" the encoded
>> number for any bit range to minimize variation on the left - this
>> plays into Lucene's prefix compression.

The prototype code I just posted in JIRA does this.  For example, if
we are encoding the bits 0xffffffffffffffff with a precision of only 8
bits, and using 7 bits per char, then it stores

0x01 0x7f
  instead of
0x7f 0x40

This means that, with prefix compression, a whole sequence of these values
would take up closer to 1 byte of data per term in the index than 2.
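A small Java sketch of both points, under stated assumptions. `encode` keeps the top `precision` bits of a long and packs them into 7-bit chars, either right- or left-justified. `suffixChars` mimics prefix compression by counting only the chars each term adds after the prefix it shares with the previous term. The class and method names are hypothetical; this is not the prototype code from JIRA, and a real Lucene term dictionary also writes prefix lengths and other per-term data that this count ignores.

```java
// Hypothetical sketch, not the JIRA prototype.
class TrieEncodingDemo {
    // Keep the top `precision` bits of `value` and split them into chars of
    // `bitsPerChar` bits, padding with zeros on the left (right-justified)
    // or on the right (left-justified).
    static int[] encode(long value, int precision, int bitsPerChar, boolean rightJustify) {
        if (precision < 1 || precision > 64) throw new IllegalArgumentException("precision");
        int nChars = (precision + bitsPerChar - 1) / bitsPerChar;
        int totalBits = nChars * bitsPerChar;
        long kept = value >>> (64 - precision); // top bits moved to the low end
        if (!rightJustify) {
            if (totalBits > 63) throw new IllegalArgumentException("too wide for this sketch");
            kept <<= (totalBits - precision);    // zero padding goes on the right
        }
        int mask = (1 << bitsPerChar) - 1;
        int[] out = new int[nChars];
        for (int i = nChars - 1; i >= 0; i--) { // least significant char is last
            out[i] = (int) (kept & mask);
            kept >>>= bitsPerChar;
        }
        return out;
    }

    // Chars written when each term is stored as the suffix left after the
    // prefix it shares with the previous term (prefix lengths not counted).
    static int suffixChars(int[][] sortedTerms) {
        int total = 0;
        int[] prev = new int[0];
        for (int[] term : sortedTerms) {
            int shared = 0;
            int limit = Math.min(prev.length, term.length);
            while (shared < limit && prev[shared] == term[shared]) {
                shared++;
            }
            total += term.length - shared;
            prev = term;
        }
        return total;
    }

    // Every value representable at 8-bit precision, in ascending order.
    static int[][] allEightBitTerms(boolean rightJustify) {
        int[][] terms = new int[256][];
        for (int v = 0; v < 256; v++) {
            terms[v] = encode((long) v << 56, 8, 7, rightJustify);
        }
        return terms;
    }

    static String hex(int[] chars) {
        StringBuilder sb = new StringBuilder();
        for (int c : chars) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("0x%02x", c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode(-1L, 8, 7, true)));          // prints "0x01 0x7f"
        System.out.println(hex(encode(-1L, 8, 7, false)));         // prints "0x7f 0x40"
        System.out.println(suffixChars(allEightBitTerms(true)));  // prints 258
        System.out.println(suffixChars(allEightBitTerms(false))); // prints 384
    }
}
```

Under these assumptions, encoding all 256 values at 8-bit precision costs 258 suffix chars right-justified (about 1 per term) versus 384 left-justified (1.5 per term): the right-justified leading char can only be 0x00 or 0x01, so it is almost always shared with the previous term.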

> I am not sure if this is the right way. Lucene's prefix compression is also
> good for seeking fast to the term. If thousands of terms varying only in
> the last bits (because all bits before are zero) must be scanned to get to
> the right one, performance would suffer.

Every 128th term is stored in full in memory, and a binary search is
used to find the closest indexed term at or below the target. A linear
scan is done from there.

If anything, it should be slightly faster to iterate over a more compact index.
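A simplified Java sketch of that lookup: keep every 128th term in memory, binary-search that sample for the greatest indexed term not after the target, then scan linearly. This is not Lucene's actual term-dictionary code; the class, field and method names are hypothetical, and the whole dictionary sits in an array here rather than on disk. The point it illustrates is that the scan is bounded by the index interval (in terms), however the terms themselves are encoded.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical sketch of a sampled term index, not Lucene's implementation.
class TermIndexDemo {
    static final int INDEX_INTERVAL = 128;

    final String[] terms;      // full sorted dictionary (on disk in a real index)
    final String[] indexTerms; // every INDEX_INTERVAL-th term, held in memory
    int lastScanCount;         // terms stepped over by the most recent seek

    TermIndexDemo(String[] sortedTerms) {
        terms = sortedTerms;
        int n = (sortedTerms.length + INDEX_INTERVAL - 1) / INDEX_INTERVAL;
        indexTerms = new String[n];
        for (int i = 0; i < n; i++) {
            indexTerms[i] = sortedTerms[i * INDEX_INTERVAL];
        }
    }

    // Returns the position of the first term >= target (terms.length if none).
    int seek(String target) {
        int pos = Arrays.binarySearch(indexTerms, target);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the greatest
        // indexed term <= target sits at insertionPoint - 1.
        int block = pos >= 0 ? pos : Math.max(0, -pos - 2);
        int start = block * INDEX_INTERVAL;
        int i = start;
        // Linear scan; it stops at or before the next indexed term, so it
        // takes at most INDEX_INTERVAL steps.
        while (i < terms.length && terms[i].compareTo(target) < 0) {
            i++;
        }
        lastScanCount = i - start;
        return i;
    }

    public static void main(String[] args) {
        String[] terms = new String[10000];
        for (int i = 0; i < terms.length; i++) {
            terms[i] = String.format(Locale.ROOT, "%05d", i);
        }
        TermIndexDemo idx = new TermIndexDemo(terms);
        System.out.println(idx.seek("01234")); // prints 1234
        System.out.println(idx.lastScanCount); // prints 82
    }
}
```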

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

