lucene-dev mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: TrieRange
Date Sat, 07 Feb 2009 13:45:55 GMT
On Sat, Feb 7, 2009 at 6:04 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
> An optimization might be to remove
> the lower 0 bits from the string, but it would not be needed. The strings
> are unique for one precision (no difference between 0-bits there or not).

Yes, one would certainly want to remove trailing bits that were insignificant.
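The sketch below shows what dropping the insignificant low bits means at each precision level. It assumes an unsigned 32-bit value indexed at a fixed step of 8 bits per level. `trie_values` and its `bits`/`step` parameters are hypothetical names for illustration, not Lucene's TrieRange API.

```python
def trie_values(value: int, bits: int = 32, step: int = 8) -> list[tuple[int, int]]:
    """Return (shift, reduced_value) for each precision level of `value`.

    At precision `shift`, the lowest `shift` bits carry no information.
    Instead of storing them as zeros, drop them with a right shift, leaving
    only `bits - shift` significant bits to encode.
    """
    if not 0 <= value < (1 << bits):
        raise ValueError("value must fit in `bits` unsigned bits")
    return [(shift, value >> shift) for shift in range(0, bits, step)]


for shift, reduced in trie_values(0x12345678):
    # Within one precision level, dropping the zero bits cannot make two
    # terms collide; the shift tells the levels apart.
    print(f"shift={shift:2d} bits={32 - shift:2d} value=0x{reduced:x}")
```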

To optimize index space, one would want to "right justify" the encoded
number for any bit range to minimize variation on the left; this
plays into Lucene's prefix compression.

For example, say we want to encode 7 bits per character (so each
character takes up only one byte in UTF-8), but we have 9 bits
of data to encode.

The two characters could be encoded like this (where x is a data bit):
xxxxxxx xx00000
Or like this:
00000xx xxxxxxx
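Here is a small sketch of both layouts. It packs a value into 7-bit characters, most significant first, and pads with zeros either on the right (left-justified) or on the left (right-justified). `encode`, `layout` and `BITS_PER_CHAR` are hypothetical helpers written for this example. They are not code from Lucene.

```python
BITS_PER_CHAR = 7      # every char stays below 0x80, so it is one byte in UTF-8
CHAR_MASK = (1 << BITS_PER_CHAR) - 1


def encode(value: int, nbits: int, right_justify: bool) -> str:
    """Pack the low `nbits` of `value` into 7-bit characters, most significant first."""
    nchars = -(-nbits // BITS_PER_CHAR)          # ceiling division
    padding = nchars * BITS_PER_CHAR - nbits     # unused bits in the output
    value &= (1 << nbits) - 1
    if not right_justify:
        value <<= padding                        # zeros go on the right instead of the left
    return "".join(
        chr((value >> ((nchars - 1 - i) * BITS_PER_CHAR)) & CHAR_MASK)
        for i in range(nchars)
    )


def layout(term: str) -> str:
    """Show each character as its 7 payload bits."""
    return " ".join(format(ord(c), "07b") for c in term)


all_ones = (1 << 9) - 1  # 9 data bits, all set, so they stand out like the x's
print(layout(encode(all_ones, 9, right_justify=False)))  # 1111111 1100000
print(layout(encode(all_ones, 9, right_justify=True)))   # 0000011 1111111
```

Both forms occupy two bytes in UTF-8. They differ only in where the 5 padding bits sit.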

The latter is more efficient in index space, since its first character
carries only 2 data bits, so many more values will share the same
leading character.
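One rough way to check this is to sort the terms and count how many leading characters each term shares with its predecessor. That shared prefix is the part a prefix-compressed term dictionary does not need to store again. This model is a simplification, not Lucene's actual file format. `left_justified`, `right_justified` and `shared_prefix_chars` are hypothetical helpers that hard-code the 9-bit, two-character case.

```python
def left_justified(v: int) -> str:
    # xxxxxxx xx00000: top 7 data bits, then the last 2 bits padded with zeros
    return chr(v >> 2) + chr((v & 0b11) << 5)


def right_justified(v: int) -> str:
    # 00000xx xxxxxxx: zero padding in front, top 2 data bits, then the low 7
    return chr(v >> 7) + chr(v & 0x7F)


def shared_prefix_chars(terms: list[str]) -> int:
    """Sum, over the sorted terms, of the prefix each term shares with its predecessor."""
    ordered = sorted(terms)
    total = 0
    for prev, cur in zip(ordered, ordered[1:]):
        n = 0
        while n < min(len(prev), len(cur)) and prev[n] == cur[n]:
            n += 1
        total += n
    return total


for name, enc in (("left-justified", left_justified), ("right-justified", right_justified)):
    terms = [enc(v) for v in range(1 << 9)]
    # distinct leading characters, then total shared-prefix characters
    print(name, len({t[0] for t in terms}), shared_prefix_chars(terms))
# left-justified 128 384
# right-justified 4 508
```

Across all 512 nine-bit values, the right-justified form has only 4 distinct leading characters. As a result, 508 of the 511 neighboring pairs share their first character, compared with 384 for the left-justified form.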

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

