lucene-java-user mailing list archives

From Stephane Bailliez <sbaill...@apache.org>
Subject Re: Index Replication / Clustering
Date Sun, 26 Jun 2005 08:48:27 GMT
Nader Henein wrote:
> Our setup is quite similar to yours, but in all honesty, you will need 
> to do some form of batching on your updates, simply because you don't 
> want to keep the IndexWriter open all the time.

For now, the index writer is closed after each added document. This does 
not seem to add a major overhead compared to keeping it open: at most the 
overhead is 2x in my tests, which is acceptable for now and on par with 
the other commercial search engines they have been using. My constraint is 
basically that the mergeFactor must be 1, but I honestly think that will 
need to be relaxed when the document rate increases.
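If the document rate does force batching, the idea Nader describes can be sketched in plain Java as below. This is only an illustration under stated assumptions: `IndexSink` is a hypothetical stand-in for code that opens an IndexWriter, adds a list of documents and closes it again; it is not a Lucene API. With `batchSize` set to 1 the sketch behaves like the current close-after-every-document approach.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of batched index updates; IndexSink is hypothetical, not a Lucene API. */
class BatchingIndexer {

    /** Hypothetical sink: one call means open writer, add all docs, close writer. */
    public interface IndexSink {
        void writeBatch(List<String> docs);
    }

    private final IndexSink sink;
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();

    BatchingIndexer(IndexSink sink, int batchSize) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        this.sink = sink;
        this.batchSize = batchSize;
    }

    /** Buffers a document and flushes as soon as the batch is full. */
    void add(String doc) {
        pending.add(doc);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Writes any buffered documents in a single open/add/close cycle. */
    void flush() {
        if (pending.isEmpty()) return;
        sink.writeBatch(new ArrayList<>(pending)); // hand over a copy, then reset
        pending.clear();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BatchingIndexer indexer = new BatchingIndexer(docs -> batchSizes.add(docs.size()), 3);
        for (int i = 0; i < 7; i++) {
            indexer.add("doc-" + i);
        }
        indexer.flush(); // push the trailing partial batch
        System.out.println(batchSizes); // [3, 3, 1]
    }
}
```

In practice a timer-driven `flush()` would also be needed so that a partial batch does not wait indefinitely when documents arrive slowly.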

There has been no tuning yet.

I also have a rather specific document lifecycle. Incoming documents are 
5-10KB of XML, from which I extract only 0.5-1KB of data to be indexed. These 
documents NEVER change: they are neither updated nor deleted.
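Pulling a small set of fields out of a larger XML message before indexing might look roughly like the sketch below, which uses only the JDK's DOM parser. The element names (`id`, `title`, `payload`) are hypothetical; the message does not say which parts of the XML are kept.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Sketch: extract only the fields worth indexing; element names are hypothetical. */
class IndexFieldExtractor {

    private final List<String> fields;

    IndexFieldExtractor(List<String> fields) {
        this.fields = fields;
    }

    /** Returns the trimmed text of the first element matching each wanted field. */
    Map<String, String> extract(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations so untrusted input cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> out = new LinkedHashMap<>();
            for (String field : fields) {
                NodeList nodes = doc.getElementsByTagName(field);
                if (nodes.getLength() > 0) {
                    out.put(field, nodes.item(0).getTextContent().trim());
                }
            }
            return out;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<message><id>42</id><title>Quarterly report</title>"
                + "<payload>large body that is not indexed</payload></message>";
        IndexFieldExtractor extractor = new IndexFieldExtractor(List.of("id", "title"));
        System.out.println(extractor.extract(xml)); // {id=42, title=Quarterly report}
    }
}
```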

They are deleted only for archiving purposes, because we keep only the 
last 6 months of data.
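Since documents are never updated, the only deletion is a periodic purge of anything older than the retention window. A minimal sketch of selecting the expired documents follows; `IndexedDoc` and its ingestion date are assumptions for illustration, and the actual delete call against the index is left out.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;

/** Sketch: select documents that fall outside an N-month retention window. */
class RetentionPolicy {

    /** Hypothetical view of an indexed document: its id and ingestion date. */
    static final class IndexedDoc {
        final String id;
        final LocalDate received;

        IndexedDoc(String id, LocalDate received) {
            this.id = id;
            this.received = received;
        }
    }

    private final int retentionMonths;

    RetentionPolicy(int retentionMonths) {
        this.retentionMonths = retentionMonths;
    }

    /** Oldest date that is still kept. */
    LocalDate cutoff(LocalDate today) {
        return today.minusMonths(retentionMonths);
    }

    /** Ids of documents received strictly before the cutoff date. */
    List<String> expired(List<IndexedDoc> docs, LocalDate today) {
        LocalDate cutoff = cutoff(today);
        return docs.stream()
                .filter(d -> d.received.isBefore(cutoff))
                .map(d -> d.id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        RetentionPolicy policy = new RetentionPolicy(6);
        LocalDate today = LocalDate.of(2005, 6, 26);
        List<IndexedDoc> docs = List.of(
                new IndexedDoc("a", LocalDate.of(2004, 11, 30)),
                new IndexedDoc("b", LocalDate.of(2004, 12, 26)),
                new IndexedDoc("c", LocalDate.of(2005, 6, 1)));
        System.out.println(policy.cutoff(today));        // 2004-12-26
        System.out.println(policy.expired(docs, today)); // [a]
    }
}
```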

> As for clustering, we went through three iterations, which keep x indexes 
> parallelized on x servers, all with failover and index-independent 
> synchronization with your persistent store. There was a 
> little discussion about this a few weeks back, and I mentioned that your 
> biggest pain will be maintaining the integrity of parallel indexes that 
> are updated/deleted autonomously (atomic updates and deletes), but there 
> are ways of running iterative checks to make sure that your indexes 
> stay clean.
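One possible shape for such an iterative check is to compare the document ids held by each index replica against the ids in the persistent store, and report what each replica is missing or holds in excess. The sketch below assumes both id sets have already been fetched (how they are read from the store and from each index is not described in the thread), and it leaves the repair step out.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Sketch: compare each replica's document ids against the persistent store. */
class ReplicaConsistencyChecker {

    /** Drift for one replica: ids it lacks and ids it should not have. */
    static final class Drift {
        final Set<String> missing;
        final Set<String> extra;

        Drift(Set<String> missing, Set<String> extra) {
            this.missing = missing;
            this.extra = extra;
        }

        boolean isClean() {
            return missing.isEmpty() && extra.isEmpty();
        }

        @Override
        public String toString() {
            return "missing=" + missing + " extra=" + extra;
        }
    }

    /** Builds a per-replica report, sorted by replica name. */
    static Map<String, Drift> check(Set<String> storeIds, Map<String, Set<String>> replicaIds) {
        Map<String, Drift> report = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : replicaIds.entrySet()) {
            Set<String> missing = new TreeSet<>(storeIds);
            missing.removeAll(e.getValue());   // in the store but not in this index
            Set<String> extra = new TreeSet<>(e.getValue());
            extra.removeAll(storeIds);         // in this index but no longer in the store
            report.put(e.getKey(), new Drift(missing, extra));
        }
        return report;
    }

    public static void main(String[] args) {
        Set<String> store = Set.of("1", "2", "3");
        Map<String, Set<String>> replicas = Map.of(
                "server-a", Set.of("1", "2", "3"),
                "server-b", Set.of("1", "3", "9"));
        check(store, replicas).forEach((server, drift) ->
                System.out.println(server + ": " + (drift.isClean() ? "clean" : drift)));
        // server-a: clean
        // server-b: missing=[2] extra=[9]
    }
}
```

For large indexes, running the comparison one time slice at a time would keep each pass small.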


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

