lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Morus Walter <morus.wal...@tanto.de>
Subject Re: Date Range support
Date Fri, 13 Feb 2004 07:00:39 GMT
tom wa writes:
> From: Erik Hatcher
> > On Jan 29, 2004, at 5:08 AM, tom wa wrote:
> > > I'm trying to create an index which can also be searched with date 
> > > ranges. My first attempt using the Lucene date format ran in to 
> > > trouble after my index grew and I couldn't search over more than a few 
> > > days.
> > >  the suggestion 
> > > seemed to be to use strings of the format yyyyMMdd. Using that format 
> > > worked great until I remembered that my search needs
> > >  to be able to support different timezones. Adding the hour to my 
> > > field causes the same problem above and my queries stop working when 
> > > using a range of about 2 months.
> > 
> > When you say you couldn't search and that it stopped working, do you 
> > mean it was just unacceptably slow?
> >
> 
> (Sorry it's taken me a while to reply)
> 
> It wasn't slow, my timeout is far greater than the time it takes to come back with no
hits.
> 
> A small example of a query would be (date: [200306081900 TO 200306201200]) AND (text:
sometext) and this will return zero hits. The index contains about 1000 items for each 24hr
period and the total number of documents was about 150k. I had the same results when using
Lucene's built in date format too. If you think it should be able to cope with what I am trying
to do then I'll take another look.
> 
An alternative to using date ranges or date filters is to use an aproach
similar to the recently introduced sort on a integer field (cvs only, so far).

That is, 
- create an array of the dates of all documents
- extend the low level search, in a way that it uses this array and a
  upper and lower limit to do an additional selection (that's similar 
  to what the filter does)

The advantage over a filter is, that you can use the same array for arbitrary
date ranges while a filter is specific to a date range.

OTOH the array needs to be newly created whenever the index changes. The 
cost depends on the number of different dates and the array size of course.
I did some tests and found, that it takes less than .1 seconds on a
P4 2400 Mhz to create such an array for ~ 100000 documents, ~ 10000 different
dates.
So it depends a bit on how often your index changes, if that's a good way.

Another disadvantage is, that you will have to dig a little bit deeper into 
lucenes search classes.
And memory usage might get a problem, once you exceed a few million documents.

Morus

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message