lucene-java-user mailing list archives

From Chris Hostetter
Subject Re: Using Lucene partly as DB and 'joining' search results.
Date Tue, 15 Apr 2008 04:43:56 GMT

: would then have to make a join using mailId against the core.  However, if I
: want to use PR, I could have a single Document with multiple fields, and using
: stored fields can 'modify' that Document.  However, what happens to the DocId
: when the delete+add occurs, and how do I ensure it stays the same?

you can't ... that's why i said you'd need to rebuild the smaller index 
completely on a periodic basis (going in the same order as the docs in the 
big index) ... it might not be feasible if the rate at which you need to 
surface annotations has to be "near instantaneous" but assuming most 
emails won't ever get annotations, they'll just be "empty" docs that 
should index lightning fast.
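
a toy sketch of the ordering idea above, in plain Java rather than real 
Lucene calls: list positions stand in for doc ids, and the class name, 
method name and sample mailIds are all made up for illustration. it only 
shows why every mail needs an entry (even an "empty" one) on the small 
side, so that position N means the same mail on both sides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model only: each list position stands in for a Lucene doc id.
// Nothing here calls Lucene; all names are hypothetical.
final class AnnotationIndexRebuild {

    // Rebuilds the "small" side in the same order as the "big" side.
    // mailIdsInDocOrder: mailId of each document in the big index, by doc id.
    // annotationsByMailId: current annotations; most mails have none.
    static List<String> rebuild(List<String> mailIdsInDocOrder,
                                Map<String, String> annotationsByMailId) {
        List<String> small = new ArrayList<>(mailIdsInDocOrder.size());
        for (String mailId : mailIdsInDocOrder) {
            // An unannotated mail still gets a placeholder entry, so that
            // position i refers to the same mail in both structures.
            small.add(annotationsByMailId.getOrDefault(mailId, ""));
        }
        return small;
    }

    public static void main(String[] args) {
        List<String> big = List.of("m1", "m2", "m3");
        List<String> small = rebuild(big, Map.of("m2", "follow up"));
        for (int docId = 0; docId < big.size(); docId++) {
            System.out.println(docId + " " + big.get(docId)
                    + " [" + small.get(docId) + "]");
        }
    }
}
```

in a real setup the loop body would add one Document per mail to the 
small index instead of appending to a list.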

i can also imagine a situation where you break both indexes up into lots 
of pieces (shards) and use a MultiReader over lots of ParallelReaders ... 
that way you have much smaller "small" indexes to rebuild when someone 
annotates an email -- and if the shards are organized by date, you're less 
likely to ever need to rebuild many of them since people will tend to 
focus on annotating more recent mail, and if queries focus on a specific 
date range (which i'm guessing most email searches will) you can use 
MultiReaders over a subset of all the ParallelReaders to save time on 
scanning through older docs you know won't match.
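
the shard bookkeeping could look roughly like the sketch below. again this 
is plain Java with no Lucene calls, and the Shard class, its fields and 
both helper methods are hypothetical names. it assumes each shard covers 
an inclusive date range; in practice each selected shard would be a 
ParallelReader and the selected list would be wrapped in a MultiReader.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Toy model of date-organized shards; all names are hypothetical.
final class DateShards {

    // One shard covers the inclusive date range [from, to].
    static final class Shard {
        final String name;
        final LocalDate from;
        final LocalDate to;

        Shard(String name, LocalDate from, LocalDate to) {
            this.name = name;
            this.from = from;
            this.to = to;
        }

        boolean overlaps(LocalDate queryFrom, LocalDate queryTo) {
            return !from.isAfter(queryTo) && !to.isBefore(queryFrom);
        }

        boolean contains(LocalDate date) {
            return !date.isBefore(from) && !date.isAfter(to);
        }
    }

    // Shards a date-restricted query has to search; the rest are skipped.
    static List<Shard> shardsForQuery(List<Shard> all,
                                      LocalDate queryFrom, LocalDate queryTo) {
        List<Shard> selected = new ArrayList<>();
        for (Shard shard : all) {
            if (shard.overlaps(queryFrom, queryTo)) {
                selected.add(shard);
            }
        }
        return selected;
    }

    // The single shard whose small index needs a rebuild when a mail with
    // the given date is annotated, or null if no shard covers that date.
    static Shard shardToRebuild(List<Shard> all, LocalDate mailDate) {
        for (Shard shard : all) {
            if (shard.contains(mailDate)) {
                return shard;
            }
        }
        return null;
    }
}
```

with monthly shards, a query limited to a few weeks touches one or two 
shards, and annotating a recent mail triggers a rebuild of only the 
newest small index.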

Disclaimer: all of this is purely brainstorming, i've never actually tried 
anything like this, it may be more trouble than it's worth.

