lucene-java-user mailing list archives

From Morus Walter <>
Subject Re: How to handle range queries over large ranges and avoid Too Many Boolean clauses
Date Wed, 19 May 2004 06:23:45 GMT
Claude Devarenne writes:
> Hi,
> I have over 60,000 documents in my index, which is slightly over 1 GB 
> in size.  The documents range from the late seventies up to now.  I 
> have indexed dates as a keyword field using a string because the dates 
> are in YYYYMMDD format.  When I do range queries things are OK as long 
> as I don't exceed the built-in number of boolean clauses, so that's a 
> range of 3 years, e.g. 1979 to 1981.  The users are not only doing 
> complex queries but also want to query over long ranges, e.g. [19790101 
> TO 19991231].
> Given these requirements, I am thinking of doing a query without the 
> date range, bring the unique ids back from the hits and then do a date 
> query in the SQL database I have that contains the same data.  Another 
> alternative is to do the query without the date range in Lucene and 
> then sort the results within the range.  I still have to learn how to 
> use the new sorting code and confess I have not had time to look at 
> it yet.
> Is there a simpler, easier way to do this?
I think it would be worth taking a look at the sorting code.

The idea of the sorting code is to have an array of the dates for each doc
in memory and access this array for sorting.
Now sorting isn't the only thing one might use this array for.
Doing a range check is another.
So you might extend the sorting code with a range selection.
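A minimal sketch of that range check in plain Java, assuming the dates have already been loaded into an int[] indexed by document number (YYYYMMDD parsed as an int, which compares in the same order as the fixed-width string). The class name DateArrayRange is hypothetical and not part of Lucene; the resulting BitSet marks the documents in the range without generating any boolean clauses, however wide the range is.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch: a per-document array of dates (YYYYMMDD as int),
 * indexed by document number, used for a range check instead of
 * expanding a range query into one boolean clause per distinct date.
 */
final class DateArrayRange {
    // dates[doc] holds the date of document number doc; 0 marks a document
    // without a date, which falls outside any real date range.
    private final int[] dates;

    DateArrayRange(int[] dates) {
        this.dates = dates;
    }

    /** Sets the bit of every document whose date lies in [from, to], inclusive. */
    BitSet select(int from, int to) {
        BitSet bits = new BitSet(dates.length);
        for (int doc = 0; doc < dates.length; doc++) {
            int d = dates[doc];
            if (d >= from && d <= to) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

A range like [19790101 TO 19991231] costs one pass over the array, the same as a one-day range.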

Lucene has no code for this, so you have to write your own searcher,
but it gives you a fast way to search and sort by date.
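A possible shape for such a searcher, sketched under assumptions: run the query without the date clause, collect the matching document numbers, then use the in-memory date array to drop hits outside the range and order the rest by date. DateRangeSearch and filterAndSort are hypothetical names, and how hitDocs and the dates array are obtained from Lucene is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class DateRangeSearch {
    /**
     * Keeps only hits whose date lies in [from, to] and orders them by date,
     * newest first. hitDocs are the document numbers returned by the query
     * run without the date range; dates[doc] is the per-document date array
     * (YYYYMMDD as int). Both inputs are assumptions of this sketch.
     */
    static List<Integer> filterAndSort(int[] hitDocs, int[] dates, int from, int to) {
        List<Integer> result = new ArrayList<>();
        for (int doc : hitDocs) {
            int d = dates[doc];
            if (d >= from && d <= to) {
                result.add(doc);
            }
        }
        // Sort by the in-memory date, newest first; List.sort is stable,
        // so hits with the same date keep their original relative order.
        result.sort(Comparator.comparingInt((Integer doc) -> dates[doc]).reversed());
        return result;
    }
}
```

The cost is one array lookup per hit plus the sort, independent of how many distinct dates the range covers.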

I did this independently of the new sorting code (I just started a little
too early), and it works quite well.
The only drawback of this approach (and of the new sorting code) is that it
requires an array of field values that must be rebuilt each time the index
changes. That shouldn't be a problem for 60,000 documents.
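One way to handle the rebuild, as a sketch: keep the array in a small cache that reloads it only when the index version changes. The version source and the loader are assumptions here (for example a counter you bump after every index update, and a routine that reads every document's date); DateArrayCache is a hypothetical name.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical cache for the per-document date array: the array is
 * reloaded only when the version reported by versionSource changes.
 */
final class DateArrayCache {
    private final LongSupplier versionSource; // assumption: changes whenever the index changes
    private final Supplier<int[]> loader;     // assumption: reads all dates from the index
    private long cachedVersion;
    private int[] cached;                     // null until the first load

    DateArrayCache(LongSupplier versionSource, Supplier<int[]> loader) {
        this.versionSource = versionSource;
        this.loader = loader;
    }

    /** Returns the date array, rebuilding it first if the index version has changed. */
    synchronized int[] dates() {
        long current = versionSource.getAsLong();
        if (cached == null || current != cachedVersion) {
            cached = loader.get(); // full rebuild, linear in the number of documents
            cachedVersion = current;
        }
        return cached;
    }
}
```

For 60,000 documents an int[] of dates takes about 240 KB (60,000 × 4 bytes), so keeping it in memory and reloading it after updates is cheap.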

