lucene-java-user mailing list archives

From "Erick Erickson" <>
Subject Re: Date ranges - getting the approach right
Date Sat, 15 Jul 2006 20:57:01 GMT
Does it make any sense at all to pre-calculate and permanently store date
range filters for each day? I'm assuming that you only add to your index,
and don't optimize it after deletions or do anything else that would change
the document IDs Lucene assigns AND you index each day's mail sequentially
by day. That last point is important for this scheme. More precisely, for any
two mails with Lucene IDs I1 and I2, sent on days D1 and D2, if I1 < I2 then
D1 <= D2.
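
One way to confirm that precondition on an existing index is a single pass over the doc IDs in order. The sketch below assumes you have already loaded each document's day (as a yyyyMMdd int) into an array indexed by doc ID; how you load it is left out, and the class name is hypothetical. Because doc IDs are walked in ascending order, checking adjacent pairs is enough to establish the pairwise condition.

```java
/** Hypothetical helper: checks that doc IDs were assigned in non-decreasing day order. */
final class DayOrderCheck {
    /**
     * daysByDocId[d] is the day (yyyyMMdd) of the mail with Lucene doc ID d.
     * Returns true if no smaller doc ID belongs to a later day than a larger one.
     */
    public static boolean daysNonDecreasing(int[] daysByDocId) {
        // Adjacent checks suffice: a non-decreasing sequence is non-decreasing pairwise.
        for (int d = 1; d < daysByDocId.length; d++) {
            if (daysByDocId[d] < daysByDocId[d - 1]) {
                return false;
            }
        }
        return true;
    }
}
```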

Chris: I am right, aren't I? It's guaranteed that any document you add gets a
doc ID greater than any already in the index?

If that's all true, you could just store the first and last Doc ID for each
day in a special index (or special "documents" in your main index) that
looked something like this:

Day 20060323
FirstDocId 234098
LastDocId 238909
Day 20060428
FirstDocId 334098
LastDocId 338909

So, creating a filter would consist of getting the FirstDocId of the first
day, the LastDocId of the last day and flipping on all the bits in your new
filter between those two Doc IDs. In this case, your filter would turn on
all the bits from 234098 to 338909 for a date range of March 23, 2006 to
April 28, 2006.
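
A minimal sketch of that lookup, assuming the per-day table is held in memory as a sorted map rather than in a special index, and that the filter is a plain java.util.BitSet (wiring it into an actual Lucene Filter is left out). The class and method names are hypothetical. One addition beyond the scheme above: it uses the first indexed day on or after the start and the last indexed day on or before the end, so ranges whose endpoints fall on days with no mail still work.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory version of the per-day FirstDocId/LastDocId table. */
final class DayDocIdTable {
    // day (yyyyMMdd) -> {firstDocId, lastDocId}; TreeMap keeps days sorted.
    private final TreeMap<Integer, int[]> rangesByDay = new TreeMap<>();

    /** Records one day's doc ID range; that day's mails must occupy contiguous IDs. */
    public void putDay(int day, int firstDocId, int lastDocId) {
        rangesByDay.put(day, new int[] {firstDocId, lastDocId});
    }

    /**
     * Sets every bit from FirstDocId of the first indexed day >= startDay
     * through LastDocId of the last indexed day <= endDay, both inclusive.
     */
    public BitSet bitsFor(int startDay, int endDay) {
        BitSet bits = new BitSet();
        Map.Entry<Integer, int[]> first = rangesByDay.ceilingEntry(startDay);
        Map.Entry<Integer, int[]> last = rangesByDay.floorEntry(endDay);
        if (first == null || last == null || first.getKey() > last.getKey()) {
            return bits; // no indexed day falls inside the requested range
        }
        // BitSet.set(from, to) excludes 'to', hence the +1 to include LastDocId.
        bits.set(first.getValue()[0], last.getValue()[1] + 1);
        return bits;
    }
}
```

With the two example rows loaded, `bitsFor(20060323, 20060428)` sets the 104,812 bits from 234098 through 338909.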

If you can't guarantee the condition above, this won't work. Really, this
is just a simple database table I guess <G>. I can imagine you could do
something similar with the actual bitmaps if your mails aren't sorted in
ascending date order. Instead of storing the doc IDs, you could store the
binary forms of each day's bitmap and OR them together. But now we're getting
to a point where I have to question whether the time savings are worth it <G>....
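
A rough sketch of that bitmap variant, assuming each day's doc IDs are kept as a serialized java.util.BitSet (via `toByteArray`/`BitSet.valueOf`) in an in-memory sorted map; where you would actually persist the bytes is left open, and the class name is hypothetical. Unlike the first/last doc ID table, this does not require mails to be indexed in date order, at the cost of storing a full bitmap per day.

```java
import java.util.BitSet;
import java.util.TreeMap;

/** Hypothetical store of one serialized bitmap per day, for mails not indexed in date order. */
final class DayBitmapStore {
    // day (yyyyMMdd) -> serialized BitSet of the doc IDs of that day's mails
    private final TreeMap<Integer, byte[]> bitmapsByDay = new TreeMap<>();

    public void putDay(int day, BitSet docIds) {
        bitmapsByDay.put(day, docIds.toByteArray());
    }

    /** ORs together the stored bitmaps of every day in [startDay, endDay]. */
    public BitSet bitsFor(int startDay, int endDay) {
        BitSet result = new BitSet();
        if (startDay > endDay) {
            return result; // subMap would throw on an inverted range
        }
        for (byte[] stored : bitmapsByDay.subMap(startDay, true, endDay, true).values()) {
            result.or(BitSet.valueOf(stored));
        }
        return result;
    }
}
```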

Of course, the first question I should ask is whether you're sure there is
any real performance problem creating your filters on the fly. Premature
optimization and all that....

Anyway, this occurred to me in a vision, in a flash (see Arlo Guthrie, "The
pickle song (the significance of the pickle)"), and might possibly even be
relevant to your problem, so I thought I'd send it along.

