lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Busch <>
Subject Re: Improve worst-case performance of TrieRange queries
Date Tue, 24 Mar 2009 21:48:08 GMT
Uwe and I talked a little bit about this at the ApacheCon. We figured 
that this will probably only improve a very small amount of ranges, so 
as Uwe recommended, this is probably not worth the effort and complexity.

Never mind, just an idea :)


On 3/24/09 12:40 AM, Michael Busch wrote:
> Let me give an example to explain my idea - I'm using dates in my
> example, because it's easier to imagine :)
> Let's say we have the following posting lists. There are 20 docs in the
> index and an X means that a doc contains the corresponding term:
> Jan                X   X
> Feb X    X      X
> Mar          X
> Apr        X        X
> May    X
> Jun
> Jul           XX
> Aug       X      X
> Sep   X
> Oct               X
> Nov  X      X
> Dec     X             X
> Then we index another term 'ALL'. It gets added for any document that 
> has a numeric value in this bucket:
> If the query is [Jun TO Jul], then we process the query normally 
> (ORing terms Jun and Jul). If the query is [Feb TO Nov], then we 
> basically translate that into All AND NOT (Jan OR Dec).
> Since you only evaluate the complement of the terms, you can (almost) 
> double the worst case performance.
> Downsides:
> - you have to have another BitSet in memory to perform the AND NOT 
> operation, so it needs more memory
> - this complement approach is only this simple for numeric fields 
> where one document has only a single value; similar things are doable 
> for multi-valued numeric fields, however more complex and possibly 
> with less performance gain
> - you need to index an additional term per bucket, so the index size 
> increases slightly
> Does this make sense? Maybe this has even been discussed already?
> -Michael

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message