lucene-java-user mailing list archives

From Michael McCandless
Subject Re: Term numbering and range filtering
Date Tue, 11 Nov 2008 20:55:45 GMT

Also, one nice optimization we could do with the "term number
column-stride array" is bit packing (borrowing from the PFOR code).

I.e., since we know there are X unique terms in this segment, when
populating the array that maps docID to term number we could use
exactly the right number of bits per entry, ceil(log2(X)).  Enumerated
fields with not many unique values (e.g., country, state) would take
relatively little RAM.  With LUCENE-1231, where the fields are stored
column-stride on disk, we could do this packing during indexing so
that loading at search time is very fast.
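
A rough sketch of the packing idea, to make it concrete.  This is not
the PFOR code and not the LUCENE-1231 on-disk format; the class
PackedTermNumbers and its methods are hypothetical names made up for
this illustration.  It assumes term numbers run from 0 to X-1 and
stores each one in a fixed number of bits inside a long[]:

```java
/** Hypothetical sketch: fixed-width bit-packed array mapping docID -> term number. */
final class PackedTermNumbers {
    private final long[] blocks;     // packed storage, 64 bits per block
    private final int bitsPerValue;  // bits needed for values 0 .. uniqueTermCount-1
    private final long mask;         // low bitsPerValue bits set

    PackedTermNumbers(int maxDoc, int uniqueTermCount) {
        // Smallest width that can hold the largest term number (at least 1 bit).
        int maxValue = Math.max(1, uniqueTermCount - 1);
        this.bitsPerValue = 32 - Integer.numberOfLeadingZeros(maxValue);
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) maxDoc * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
    }

    int bitsPerValue() {
        return bitsPerValue;
    }

    long ramBytesUsed() {
        return 8L * blocks.length;
    }

    void set(int docID, int termNumber) {
        long bitPos = (long) docID * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = termNumber & mask;
        // Clear the slot, then write the low part of the value.
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            // The value straddles two blocks: write its high bits into the next one.
            int lowBits = bitsPerValue - spill;
            long highMask = mask >>> lowBits;
            blocks[block + 1] = (blocks[block + 1] & ~highMask) | (value >>> lowBits);
        }
    }

    int get(int docID) {
        long bitPos = (long) docID * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            // Pull the remaining high bits from the next block.
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return (int) (value & mask);
    }
}
```

With 200 unique terms this uses 8 bits per document instead of the 32
an int[] would take; a field with 5 unique values needs only 3 bits.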


Paul Elschot wrote:

> Op Tuesday 11 November 2008 11:29:27 schreef Michael McCandless:
>> The other part of your proposal was to somehow "number" term text
>> such that term range comparisons can be implemented as fast int
>> comparisons.
> ...
>> However that'd be quite a bit deeper change to Lucene.
> The cheap version is hierarchical prefixing here:
> Regards,
> Paul Elschot
