lucene-java-user mailing list archives

From Michael McCandless
Subject Re: Swapping between indexes
Date Thu, 06 Mar 2008 11:28:21 GMT

A simple variant on Approach 1 would be to open your writer with  

This way no reader will ever see the changes until you successfully  
close the writer.  If the machine crashes the index is still in the  
starting state as of when the writer was first opened.
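To illustrate the visibility rule above, here is a toy in-memory model. It is not the Lucene API; StagedIndex and its Writer are hypothetical names, and the only thing it tries to show is that pending changes stay private to the writer until close() succeeds, so a writer that is abandoned (as in a crash) leaves readers on the previous state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy in-memory model (NOT the Lucene API): pending changes stay private to
// the writer and are published in one step only when close() succeeds.
final class StagedIndex {
    // Last successfully published state; the only thing readers ever see.
    private volatile List<String> committed = Collections.emptyList();

    /** Readers get an unmodifiable snapshot of the last published state. */
    List<String> openReader() {
        return committed;
    }

    /** Opens a writer whose changes are invisible until close(). */
    Writer openWriter() {
        return new Writer();
    }

    final class Writer implements AutoCloseable {
        // Private working copy, seeded from the state at open time.
        private final List<String> pending = new ArrayList<>(committed);

        void addDocument(String doc) {
            pending.add(doc);
        }

        /** Publishes all pending changes at once. */
        @Override
        public void close() {
            committed = Collections.unmodifiableList(new ArrayList<>(pending));
        }
    }
}
```

A reader opened while the writer is still open sees only the state from before the writer was opened. If the writer is never closed, nothing it added is ever published.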

Also, the re-open in Approach 1 should be a bit faster than the wholly  
new open required in Approach 2 (not a lot faster, though there is  
work underway to make it a lot).
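For the re-open at time T, one way to keep the slow open and warm-up off the search path might look like the sketch below. SearcherHolder is a hypothetical helper, not a Lucene class, and the opener and warmer callbacks are assumptions standing in for however you open and warm a searcher.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Generic holder (hypothetical, not a Lucene class): searches always use the
// current instance; refresh() opens and warms a new one before swapping it in.
final class SearcherHolder<S> {
    private final AtomicReference<S> current;
    private final Supplier<S> opener;   // assumed: opens a new searcher on the index
    private final Consumer<S> warmer;   // assumed: runs a few typical queries

    SearcherHolder(Supplier<S> opener, Consumer<S> warmer) {
        this.opener = opener;
        this.warmer = warmer;
        this.current = new AtomicReference<>(opener.get());
    }

    /** Returns the searcher that requests should use right now. */
    S get() {
        return current.get();
    }

    /** Call at time T: open, warm up, then swap; returns the old searcher. */
    S refresh() {
        S fresh = opener.get();
        warmer.accept(fresh);            // slow part happens before the swap
        return current.getAndSet(fresh);
    }
}
```

Search requests keep using the old instance until the swap. The caller still has to close the returned old searcher once in-flight requests against it have finished; the sketch does not handle that.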

There should not be problems optimizing while searching.  Yes, you  
use more disk space, but no more (in fact, less) than Approach 2.

I think Approach 2 could only be better if the indexing were done on  
a different computer / IO system.


Sridhar Raman wrote:

> This is my situation.  I have an index, which has a lot of search
> requests coming into it.  I use just a single instance of
> IndexSearcher to process these requests.  At the same time, this
> index is also getting updated by an IndexWriter.  And I want these
> new changes to be reflected _only_ at certain intervals.  I have
> thought of a few ways of doing this.  Each has its share of problems
> and pluses.  I would be glad if someone can help me in figuring out
> the right approach, especially from the performance point of view, as
> the number of documents that will get indexed is pretty large.
>
> Approach 1:
> Have just one copy of the index for both Search & Index.  At time T,
> when I need to see the new changes reflected, I close the Searcher
> and open it again.
> - The re-open of the Searcher might be a bit slow (which I could
>   probably solve by using some warm-up threads).
> - Update and Search on the index at the same time - will this affect
>   the performance?
> - If server crashes before time T, the new Searcher would reflect the
>   changes, which is not acceptable.  I want the changes to be
>   reflected only at time T.  If server crashes, the index should be
>   the previous T-1 index.
> - Possible problems while optimising the index (as Search is also
>   happening).
> + Just one copy of the index being stored.
>
> Approach 2:
> Keep 2 copies of the index - 1 for Search, 1 for Index.  At time T, I
> just switch the Searcher to a copy of the index that is being updated.
> - Before I do the switch to the new index, I need to make a copy of it
>   so that the updates continue to happen on the other index.  Is there
>   a convenient way to make this copy?  Is it efficient?
> - Time taken to create a new Searcher will still be a problem (but
>   this is a problem in the previous approach as well, and we can live
>   with it).
> + Optimise can happen on an index that is not being read; as a result,
>   its resource requirements would be lower, and the optimisation
>   would probably be faster too.
> + Faster search as the index update is happening on a different index.
>
> So, these are the 2 approaches I am contemplating.  Any pointers on
> which would be the better approach?
>
> Thanks,
> Sridhar
