lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eric Jain" <Eric.J...@isb-sib.ch>
Subject Re: slow performance with Date Range Searching
Date Wed, 17 Sep 2003 14:23:42 GMT
> Does anyone have any suggestions for searching date ranges.  Our
> ranges will generally be between a 3 - 7 year period.

Apparently Lucene expands ranges to boolean 'or' queries. So if you have
a thousand distinct dates within a range, Lucene will build a query with
a thousand terms...

One workaround is to first run the query, but without ranges. Store the
results into a bit vector with the position corresponding to the
document id. Then create a TermEnum that starts with the lower range
value. For each term, get the document ids, and set the corresponding
values in a second bit vector. Break the loop as soon as the TermEnum
has reached or passed the upper limit of the range. Finally, 'and' or
'or' the second bit vector with the first one. It's as simple as that
:-)

I wonder why Lucene doesn't use this strategy by default. I realize it
is less efficient when the range includes few terms, but it seems to
scale far better.

--
Eric Jain


Mime
View raw message