lucene-java-user mailing list archives

From Nader Henein <>
Subject Re: Index Replication / Clustering
Date Sun, 26 Jun 2005 08:17:08 GMT
Our setup is quite similar to yours, but in all honesty, you will need 
to do some form of batching on your updates, simply because you don't 
want to keep the IndexWriter open all the time.
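The batching idea above can be sketched as a small buffer that accumulates updates and flushes them in one pass, so the writer is opened once per batch instead of once per update. This is an illustrative sketch, not Lucene API; the class and method names here are hypothetical, and the flush body marks where a real setup would open and close the IndexWriter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical update batcher: buffer incoming documents and flush them
// in one pass, so the index writer would be opened once per batch rather
// than once per individual update.
class UpdateBatcher {
    private final List<String> pending = new ArrayList<>();
    private final int batchSize;
    private int flushCount = 0;

    UpdateBatcher(int batchSize) {
        this.batchSize = batchSize;
    }

    // Queue one document; flush automatically when the batch fills up.
    void add(String doc) {
        pending.add(doc);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // In a real setup this is where you would open the IndexWriter,
    // add every pending document, then close the writer again.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        flushCount++;
        pending.clear();
    }

    int getFlushCount() { return flushCount; }
    int getPendingCount() { return pending.size(); }
}
```

A time-based trigger (flush every N seconds) can be layered on top of the size trigger for low-traffic periods.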

As for clustering, we went through three iterations that keep x indexes 
parallelized on x servers, all with failover and index-independent 
synchronization with your persistent store. There was a little 
discussion about this a few weeks back, and I mentioned that your 
biggest pain will be maintaining the integrity of parallel indexes that 
are updated/deleted autonomously (atomic updates and deletes), but there 
are ways of running iterative checks to make sure that your indices 
stay clean.
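One shape such an iterative check can take is a set comparison between the document IDs held by an index replica and the IDs in the persistent store: anything in the store but not the index must be reindexed, and anything in the index but not the store must be purged. This is a minimal sketch under that assumption; the class and method names are illustrative, not from any particular API.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of an index-vs-store consistency check (names are hypothetical):
// compare the document IDs in one index replica against the IDs in the
// persistent store and report what needs to be re-added or purged.
class IndexConsistencyChecker {

    // IDs present in the store but missing from the index: reindex these.
    static Set<String> missingFromIndex(Set<String> storeIds, Set<String> indexIds) {
        Set<String> missing = new HashSet<>(storeIds);
        missing.removeAll(indexIds);
        return missing;
    }

    // IDs present in the index but deleted from the store: purge these.
    static Set<String> staleInIndex(Set<String> storeIds, Set<String> indexIds) {
        Set<String> stale = new HashSet<>(indexIds);
        stale.removeAll(storeIds);
        return stale;
    }
}
```

Running this periodically against each parallel index lets the replicas converge even when an individual update or delete was lost.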

Nader Henein

Stephane Bailliez wrote:

> I have been browsing the archives concerning this particular topic.
> I'm in the same boat and the customer has clustering requirements.
> To give some background:
> I have a constant flow of incoming messages flying over the network 
> that need to be archived in a db, indexed, and dispatched to thousands 
> of clients (rich client console).
> The backend architecture needs to be clustered, meaning that:
> - the message broker needs to be clustered
> - the database needs to be replicated and support failover
> - the search engine index needs to be replicated
> This is for a 24x7 operation.
> My main problem is that there is a constant flow of writes just about 
> everywhere, meaning that the Lucene index keeps changing and that I 
> have a very small window available to replicate the data across the 
> network.
> (As of now, I have 2 messages / minute and should go over 50 in the 
> medium-term).
> Concerning the index, being able to replicate is cool, but if one node 
> goes down, it must be able to resynchronize when you bring it back up 
> on the cluster... that's a hell of a problem.
> As it is acceptable to have downtime on the search engine, I was 
> thinking it was much easier to:
> 1) rely on a shared index via NFS for each node.
> 2) dedicate a box to the search engine and access it via rpc from each 
> node
> Considering the messages I have seen in the archives, 1) seems to be a 
> no-go.
> Option 2) is generally not recommended, but I think it could fit my 
> needs quite well. IMHO it should work quite well to bring the box back 
> into operation if it goes down. Synchronizing the index for me is just 
> a matter of going through the database to reindex the archived content; 
> this will take some time, but as I said, running in degraded mode is 
> acceptable.
> Has anyone any suggestions/recommendations/experience/thoughts 
> concerning the problems mentioned above?
> Cheers,
> Stephane


Nader S. Henein
Senior Applications Architect


