lucene-dev mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: TrieRange
Date Sat, 07 Feb 2009 00:52:55 GMT
On Fri, Feb 6, 2009 at 6:18 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>> Encoding a slice per character makes the code simpler, but increases
>> the size of the index... but perhaps not enough to worry about in
>> practice?
>
> This is correct. For 2bit and 4bit there is a lot of overhead by this, but
> there is no way round (any ideas how to fix this?). But 8bit is the most
> compact one. There needs to be more testing and benchmarking.

Separate bit slicing from String encoding... they are independent.
If a,b,c,d are prefix codes designating precision, and w,x,y,z are
each 2 bits of the number, then the indexed terms are

ax
bxy
cxyz
dxyzw

Everything after the prefix can be encoded in a single character in each case.
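
A minimal sketch of the difference, assuming an unsigned 8-bit value cut into four 2-bit slices with the most significant slice first. The function names and the choice of 'a'..'d' as precision markers are illustrative only; this is not TrieRange's actual code, and it ignores how many bytes each character takes once the term is written to the index.

```python
def per_slice_terms(value, total_bits=8, slice_bits=2):
    """One character per slice: term length grows with precision."""
    levels = total_bits // slice_bits
    mask = (1 << slice_bits) - 1
    # Slices in order from most significant to least significant.
    slices = [(value >> (total_bits - slice_bits * (i + 1))) & mask
              for i in range(levels)]
    terms = []
    for level in range(1, levels + 1):
        marker = chr(ord("a") + level - 1)  # hypothetical precision prefix
        terms.append(marker + "".join(chr(s) for s in slices[:level]))
    return terms


def packed_terms(value, total_bits=8, slice_bits=2):
    """All kept slices packed into one character: every term has length 2."""
    levels = total_bits // slice_bits
    terms = []
    for level in range(1, levels + 1):
        kept_bits = level * slice_bits
        # Drop the low-order bits that this precision level does not keep.
        prefix_value = value >> (total_bits - kept_bits)
        marker = chr(ord("a") + level - 1)  # hypothetical precision prefix
        terms.append(marker + chr(prefix_value))
    return terms


value = 0b11010010
print([len(t) for t in per_slice_terms(value)])
print([len(t) for t in packed_terms(value)])
```

The slicing (how many bits each precision level keeps) is identical in both functions; only the step that turns the kept bits into characters differs.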

Lucene's prefix encoding of the index will remove some of the
redundancy... but only for numbers that are packed very closely together.
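
To illustrate why this only helps for closely packed numbers, here is a generic front-coding sketch: each sorted term is stored as the length of the prefix it shares with the previous term plus the remaining suffix. The function name and the sample terms (a 'd' marker followed by one digit per slice) are made up for the example; this is not Lucene's term dictionary code.

```python
def front_code(sorted_terms):
    """Store each term as (shared prefix length with previous term, suffix)."""
    encoded = []
    previous = ""
    for term in sorted_terms:
        shared = 0
        limit = min(len(previous), len(term))
        while shared < limit and previous[shared] == term[shared]:
            shared += 1
        encoded.append((shared, term[shared:]))
        previous = term
    return encoded


# Values close together: only the last slice differs, so suffixes are short.
print(front_code(["d0120", "d0121", "d0123"]))
# Values far apart: the terms diverge right after the marker.
print(front_code(["d0120", "d1302", "d3011"]))
```

With neighbouring values only one character per term is left after the first; with scattered values nearly the whole term is stored again each time.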

-Yonik
