Chris Hostetter
Re: Date ranges - getting the approach right
Fri, 14 Jul 2006 23:34:04 GMT

: I gather I should prefer RangeQuery to ConstantScoreQuery+RangeFilter,
: because it is faster not to use a Filter. However, I sometimes have to

It's not always faster ... it really depends on how many matching terms
there are in your range.

: In a year of 365 days with e-mail messages arriving every day, can I assume
: that an inclusive date range of 20050713-20060713 in a RangeQuery is going
: to contribute 365 clauses to a BooleanQuery? Can I assume that 5 years would
: mean 5 x 365 = 1825 clauses?

Those assumptions are valid if you are also assuming at least one
email message per day (the number of clauses a RangeQuery expands to is
based on the number of unique terms in that range which actually exist in
your index).
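
To make the clause counting concrete, here is a minimal sketch in plain Java.
It does not use Lucene at all: a TreeSet stands in for the term dictionary of
the date field, and the class name, method name and sample dates are all
hypothetical. It only illustrates that the count follows the distinct terms
present in the index, not the number of calendar days in the range.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class RangeClauseEstimate {

    // Counts the distinct indexed terms that fall inside [lower, upper].
    // This models "one clause per existing term" with an in-memory set;
    // it is an illustration, not Lucene's actual implementation.
    public static int countClauses(NavigableSet<String> indexedTerms,
                                   String lower, String upper) {
        return indexedTerms.subSet(lower, true, upper, true).size();
    }

    public static void main(String[] args) {
        // Hypothetical yyyyMMdd terms for five messages, two on the same day
        // and one just outside the requested range.
        List<String> messageDates = Arrays.asList(
                "20050713", "20050713", "20051201", "20060101", "20060714");
        NavigableSet<String> terms = new TreeSet<>(messageDates);

        // Prints the number of clauses the range would contribute.
        System.out.println(countClauses(terms, "20050713", "20060713"));
    }
}
```

Days with no mail contribute nothing, and several messages on the same day
share a single term, so 365 is an upper bound for a one-year range at day
resolution.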

: An alternative would be to assume that my users are mostly going to ask for
: e-mail arriving within the last day, two days, week, fortnight, month,
: quarter, year, 5 years and pre-cache filters for these typical usage ranges
: every time the clock rolls over, using a CachingWrapperFilter with
: RangeFilter and to BooleanQuery that with a term query on today's date.

That would be my suggestion ... use RangeFilter for everything, cache and
pre-fetch any filters you expect to be very common (particularly things
that your UI makes very easy).  Whether you should cache all RangeFilters
depends largely on how often you plan on re-opening your IndexReader -- if
it's very frequent, then it may not be worth it; if it's very infrequent,
you may want to use something smarter than CachingWrapperFilter so you
don't have a lot of one-off RangeFilters lying around forever taking up
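
One possible shape for "something smarter" is a bounded, least-recently-used
cache, sketched below in plain Java. Everything here is an assumption made for
illustration: the class BoundedFilterCache, its methods, and the use of a
String range key with a BitSet of matching documents are hypothetical and are
not part of Lucene. The idea is only that common ranges stay cached while
one-off ranges get evicted, and that the cache is cleared whenever the
IndexReader is re-opened.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class BoundedFilterCache {
    private final int maxEntries;
    private final Map<String, BitSet> cache;

    public BoundedFilterCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder=true keeps entries ordered least-recently-used first,
        // so the "eldest" entry is the one that was touched longest ago.
        this.cache = new LinkedHashMap<String, BitSet>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, BitSet> eldest) {
                return size() > BoundedFilterCache.this.maxEntries;
            }
        };
    }

    // Returns the cached bits for a range, computing them on a miss.
    // 'compute' is a stand-in for running the real filter against the reader.
    public synchronized BitSet get(String rangeKey, Function<String, BitSet> compute) {
        BitSet bits = cache.get(rangeKey);
        if (bits == null) {
            bits = compute.apply(rangeKey);
            cache.put(rangeKey, bits);
        }
        return bits;
    }

    // Call this when the reader is re-opened: old bits no longer apply.
    public synchronized void clear() {
        cache.clear();
    }

    public synchronized boolean contains(String rangeKey) {
        return cache.containsKey(rangeKey);
    }

    public synchronized int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        BoundedFilterCache filters = new BoundedFilterCache(2);
        filters.get("last-week", key -> new BitSet());
        filters.get("one-off-A", key -> new BitSet());
        filters.get("last-week", key -> new BitSet()); // touch the common range
        filters.get("one-off-B", key -> new BitSet()); // evicts one-off-A
        System.out.println(filters.contains("last-week"));
        System.out.println(filters.contains("one-off-A"));
    }
}
```

The size limit is the knob to tune: it should be at least as large as the
number of ranges you pre-fetch, or the common filters will push each other out.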


