lucene-java-user mailing list archives

From Dan Quaroni <dquar...@OPENRATINGS.com>
Subject RE: slow performance with Date Range Searching
Date Wed, 17 Sep 2003 14:16:49 GMT
I don't know how Lucene handles date ranges, but I was getting very poor
results using booleans between different fields because of the way Lucene
handles them.  Lucene evaluates each field in the query separately and
retrieves all of the results, then it evaluates the boolean joins between
the different fields.

So I believe the way lucene is handling the query is:

Get all the documents whose LongTitle has killeen in them
Get all the documents whose LongTitle has state in them
Get all the documents whose StateDistrict has id in them
Get all the documents filed between 1997-01-01 and 2002-04-04

(This, incidentally, takes up a huge amount of memory.)

Finally, it evaluates the booleans, figures out which documents satisfy all
of your criteria, and returns those to you.
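
A toy sketch of the evaluation model described above may help show where the
memory goes. It is plain Java, not Lucene code, and it does not claim to
reflect how Lucene is actually implemented. The class name, the method names,
and the document ids are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: each clause is fully materialized as a set of document
// ids, and the AND is applied afterwards. Not Lucene's real internals.
class ClauseIntersection {

    // Intersect the complete per-clause result sets.
    public static Set<Integer> andAll(List<Set<Integer>> clauseResults) {
        if (clauseResults.isEmpty()) {
            return new TreeSet<>();
        }
        Set<Integer> result = new TreeSet<>(clauseResults.get(0));
        for (Set<Integer> docs : clauseResults.subList(1, clauseResults.size())) {
            result.retainAll(docs);
        }
        return result;
    }

    // Number of document ids held in memory before any AND is applied.
    public static int materializedSize(List<Set<Integer>> clauseResults) {
        int total = 0;
        for (Set<Integer> docs : clauseResults) {
            total += docs.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Set<Integer>> clauses = new ArrayList<>();
        clauses.add(new TreeSet<>(Arrays.asList(1, 2, 3, 5, 8)));       // LongTitle:killeen
        clauses.add(new TreeSet<>(Arrays.asList(2, 3, 5, 7, 8)));       // LongTitle:state
        clauses.add(new TreeSet<>(Arrays.asList(2, 5, 8, 9)));          // StateDistrict:id
        clauses.add(new TreeSet<>(Arrays.asList(1, 2, 4, 5, 6, 7, 9))); // FiledDate range

        System.out.println("ids held: " + materializedSize(clauses)); // ids held: 21
        System.out.println("matches: " + andAll(clauses));            // matches: [2, 5]
    }
}
```

In this model the working set grows with the size of every clause's result,
not with the size of the final answer, which is why a broad clause such as a
multi-year date range would be expensive.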


I'm working on a matching engine that takes company information like their
name and address and finds our record for that company.  I ended up making a
separate index for every state and country because it was running too slow
and running out of memory when I was using booleans between fields. 

Maybe you could do something similar with your dates.  (i.e. one index per
year)
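
A minimal sketch of the one-index-per-year idea, in plain Java with no Lucene
calls: given a date range, work out which yearly indexes need to be searched
at all. The "index-<year>" naming scheme and the class and method names are
assumptions made up for this example.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Sketch only: maps a date range onto hypothetical per-year index names.
class YearPartition {

    // Hypothetical naming scheme: one index per filing year.
    public static String indexNameFor(int year) {
        return "index-" + year;
    }

    // Names of the yearly indexes that overlap the range [from, to].
    public static List<String> indexesFor(LocalDate from, LocalDate to) {
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("range end precedes range start");
        }
        List<String> names = new ArrayList<>();
        for (int year = from.getYear(); year <= to.getYear(); year++) {
            names.add(indexNameFor(year));
        }
        return names;
    }

    // True when the range spans the entire calendar year, so a search of
    // that year's index would not need a date clause at all.
    public static boolean coversWholeYear(int year, LocalDate from, LocalDate to) {
        return !from.isAfter(LocalDate.of(year, 1, 1))
                && !to.isBefore(LocalDate.of(year, 12, 31));
    }

    public static void main(String[] args) {
        LocalDate from = LocalDate.of(1997, 1, 1);
        LocalDate to = LocalDate.of(2002, 4, 4);

        // prints [index-1997, index-1998, index-1999, index-2000, index-2001, index-2002]
        System.out.println(indexesFor(from, to));
        System.out.println(coversWholeYear(1997, from, to)); // true
        System.out.println(coversWholeYear(2002, from, to)); // false
    }
}
```

With the range from the original question, only the 2002 index would still
need a date restriction; the other five years fall entirely inside the range.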


-----Original Message-----
From: Killeen, Tom [mailto:tom.killeen@thomson.com]
Sent: Wednesday, September 17, 2003 10:01 AM
To: 'Lucene Users List'
Subject: slow performance with Date Range Searching


Hello all, 

I have recently indexed approximately 15.8 million XML documents, indexing
the contents of certain elements (titles, states, and dates, to name a few).
I have 27 separate indices and use a MultiSearcher to search them.

When I search on the title and state fields with multiple terms, searching is
very fast.  For example, I get a hit count of 227,000 in 0.4 seconds.  But
when I add a date range to the search, performance suffers significantly.


My query looks something like this:

LongTitle:killeen AND LongTitle:state AND StateDistrict:id AND
FiledDate:["1997-01-01" TO "2002-04-04"]

and it returned in 5.7 seconds.


Does anyone have any suggestions for searching date ranges?  Our ranges will
generally span 3 to 7 years.

thanks, 
Tom

