lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Term numbering and range filtering
Date Tue, 11 Nov 2008 21:27:12 GMT

Paul Elschot wrote:

> On Tuesday 11 November 2008 21:55:45, Michael McCandless wrote:
>> Also, one nice optimization we could do with the "term number column-
>> stride array" is do bit packing (borrowing from the PFOR code)
>> dynamically.
>>
>> Ie since we know there are X unique terms in this segment, when
>> populating the array that maps docID to term number we could use
>> exactly the right number of bits.  Enumerated fields with not many
>> unique values (eg, country, state) would take relatively little RAM.
>> With LUCENE-1231, where the fields are stored column stride on disk,
>> we could do this packing during indexing such that loading at search
>> time is very fast.
>
> Perhaps we'd better continue this at LUCENE-1231 or LUCENE-1410.
> I think what you're referring to is PDICT, which has frame exceptions
> for values that occur infrequently.

Yes let's move the discussion to Jira.

Actually I was referring to simple bit-packing.

For encoding an array of compact enum terms (eg city, state, color, zip)
I'm guessing the exceptions logic won't buy us much and would hurt the
seeking needed for column-stride fields.  But we should certainly test
it.
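
To make "simple bit-packing" concrete, here is a minimal sketch, assuming
term numbers run from 0 to X-1 for the X unique terms in a segment. The
class PackedTermNumbers and its methods are hypothetical names for
illustration only, not Lucene API and not the PFOR code. Every value gets
the same fixed width and there is no exceptions logic, so looking up a
docID is just a shift and a mask.

```java
/** Hypothetical sketch: fixed-width bit-packed array mapping docID -> term number. */
final class PackedTermNumbers {
    private final long[] blocks;     // packed storage, 64 bits per block
    private final int bitsPerValue;  // width chosen from the unique term count
    private final long mask;         // low bitsPerValue bits set
    private final int numDocs;

    PackedTermNumbers(int numDocs, int numUniqueTerms) {
        if (numDocs < 0 || numUniqueTerms < 1) {
            throw new IllegalArgumentException("need numDocs >= 0 and numUniqueTerms >= 1");
        }
        // Smallest width that can hold 0..numUniqueTerms-1 (at least 1 bit).
        this.bitsPerValue = Math.max(1, 32 - Integer.numberOfLeadingZeros(numUniqueTerms - 1));
        this.mask = (1L << bitsPerValue) - 1;
        this.numDocs = numDocs;
        long totalBits = (long) numDocs * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
    }

    int bitsPerValue() {
        return bitsPerValue;
    }

    void set(int docID, int termNumber) {
        if (docID < 0 || docID >= numDocs) throw new IndexOutOfBoundsException("docID " + docID);
        if ((termNumber & ~mask) != 0) throw new IllegalArgumentException("term number too large");
        long bitPos = (long) docID * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        blocks[block] = (blocks[block] & ~(mask << shift)) | ((long) termNumber << shift);
        int spill = shift + bitsPerValue - 64;  // bits that overflow into the next block
        if (spill > 0) {
            int lowBits = bitsPerValue - spill; // bits that fit in the current block
            blocks[block + 1] = (blocks[block + 1] & ~(mask >>> lowBits))
                    | ((long) termNumber >>> lowBits);
        }
    }

    int get(int docID) {
        if (docID < 0 || docID >= numDocs) throw new IndexOutOfBoundsException("docID " + docID);
        long bitPos = (long) docID * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            // Pull the remaining high bits from the start of the next block.
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return (int) (value & mask);
    }
}
```

With this layout a field with 5 unique values costs 3 bits per document
instead of a full int, and the position of any docID's value is computed
directly, which is the seeking property the exceptions logic would
complicate.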

Mike


