lucene-java-user mailing list archives

From "Rob Staveley (Tom)" <rstave...@seseit.com>
Subject RE: Date ranges - getting the approach right
Date Sat, 15 Jul 2006 09:16:51 GMT
> It's not always faster ... it really depends on how many matching terms
> there are in your range.

Does the cached RangeFilter's performance drop off relative to RangeQuery
with a large number of matches then? 

> Whether you should cache all RangeFilters depends largely on how often you
> plan on re-opening your IndexReader

Of course, that's what I'd overlooked. There is no point in treating today's
hits separately, because I'll have to recreate the cached RangeFilter
every time the index is re-opened anyhow, which means that I might as well
add today to the cache too. I'd got it into my head that I'd only need to
refresh the cached RangeFilters at the midnight roll-over, but they will of
course be invalidated whenever the index is re-opened.
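
To illustrate why re-opening the index invalidates the cache, here is a
minimal sketch in plain Java. It is not Lucene's actual code. It assumes the
cached filter bits are stored per reader instance, so a freshly opened reader
never finds an existing entry. The names PerReaderCache, bits and computations
are hypothetical, and a generic type stands in for IndexReader.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical model of a filter cache keyed by reader instance. */
final class PerReaderCache<R> {
    // Weak keys: an entry can be collected once its reader is no longer referenced.
    private final Map<R, BitSet> cache = new WeakHashMap<>();
    // Stand-in for the expensive work of evaluating the range against one reader.
    private final Function<R, BitSet> compute;
    // Counts how often the expensive computation actually ran.
    int computations = 0;

    PerReaderCache(Function<R, BitSet> compute) {
        this.compute = compute;
    }

    /** Returns the cached bits for this reader, computing them on a miss. */
    synchronized BitSet bits(R reader) {
        BitSet cached = cache.get(reader);
        if (cached == null) {
            cached = compute.apply(reader);
            computations++;
            cache.put(reader, cached);
        }
        return cached;
    }
}
```

Under that assumption, asking twice with the same reader computes once, while
a new reader (a re-opened index) always triggers a recomputation, including
for the "today" portion of the range.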

-----Original Message-----
From: Chris Hostetter [mailto:hossman_lucene@fucit.org] 
Sent: 15 July 2006 00:34
To: java-user@lucene.apache.org
Subject: Re: Date ranges - getting the approach right


: I gather I should prefer RangeQuery to ConstantScoreQuery+RangeFilter,
: because it is faster not to use a Filter. However, I sometimes have to

It's not always faster ... it really depends on how many matching terms
there are in your range.

: In a year of 365 days with e-mail messages arriving every day, can I assume
: that an inclusive date range of 20050713-20060713 in a RangeQuery is going
: to contribute 365 clauses to a BooleanQuery? Can I assume that 5 years would
: mean 5 x 365 = 1825 clauses?

Those assumptions are valid if you are also assuming at least one email
message per day (the number of clauses in a RangeQuery is based on the number
of unique terms in that range which actually exist in your index).
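
As a rough illustration of that counting rule, here is a sketch in plain Java
with no Lucene calls. It assumes dates are indexed as yyyyMMdd strings, which
sort lexicographically in date order, and it models the index's terms as a
sorted set. RangeClauseEstimate and countClauses are hypothetical names.

```java
import java.util.NavigableSet;

/** Hypothetical helper: estimates how many clauses a date range expands to. */
final class RangeClauseEstimate {
    /**
     * Counts the distinct terms within [lower, upper], both ends inclusive.
     * Only terms that actually exist in the set are counted, so days with
     * no messages contribute nothing.
     */
    static int countClauses(NavigableSet<String> indexedTerms, String lower, String upper) {
        return indexedTerms.subSet(lower, true, upper, true).size();
    }
}
```

With terms 20050712, 20050713, 20050714, 20060713 and 20060714 in the set, the
range 20050713-20060713 yields 3, not 365, because only three distinct terms
fall inside it.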

: An alternative would be to assume that my users are mostly going to ask for
: e-mail arriving within the last day, two days, week, fortnight, month,
: quarter, year, 5 years and pre-cache filters for these typical usage ranges
: every time the clock rolls over, using a CachingWrapperFilter with
: RangeFilter and to BooleanQuery that with a term query on today's date.

That would be my suggestion ... use RangeFilter for everything, cache and
pre-fetch any filters you expect to be very common (particularly things that
your UI makes very easy).  Whether you should cache all RangeFilters depends
largely on how often you plan on re-opening your IndexReader -- if it's very
frequent, then it may not be worth it; if it's very infrequent, you may want
to use something smarter than CachingWrapperFilter so you don't have a lot
of one-off RangeFilters lying around forever taking up RAM.
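
One possible reading of "something smarter" is a size-bounded cache that
evicts the least recently used filter. The sketch below is plain Java and only
an assumption about what such a cache could look like, not a Lucene class.
BoundedFilterCache and maxEntries are hypothetical names, and the key type
would be whatever identifies a range (for example a "lower-upper" string).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache: keeps at most maxEntries cached filters. */
final class BoundedFilterCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedFilterCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after each insert; drops the eldest entry when over the limit. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

Common ranges that are requested often stay cached, while one-off ranges fall
out once enough newer entries arrive. LinkedHashMap is not thread-safe, so
concurrent searches would need external synchronization.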





-Hoss


