lucene-java-user mailing list archives

From Paul Smith <>
Subject Re: Index Replication / Clustering
Date Sun, 26 Jun 2005 22:21:28 GMT
Why not try using JMS to tell the indexing server, via a JMS queue,  
that Document X needs to be updated?  This gives you the flexibility  
to have the indexing system down without losing the message that the  
document needs to be indexed, and it also allows the indexing server  
to be 'busy' without slowing down the application that is performing  
the updates.
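A minimal sketch of that decoupling, assuming an in-process `java.util.concurrent.BlockingQueue` as a stand-in for a JMS queue. A real deployment would use a JMS `MessageProducer`/`MessageConsumer` against a broker, and a persistent queue would also keep pending messages across an indexer restart, which this stand-in does not. The names `IndexUpdate`, `IndexUpdateChannel`, `requestReindex` and `drainPending` are hypothetical. The application only enqueues "document X changed" notices and returns at once; the indexer drains them whenever it is ready.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical message meaning "document X needs to be (re)indexed". */
final class IndexUpdate {
    final String documentId;

    IndexUpdate(String documentId) {
        this.documentId = documentId;
    }
}

/**
 * In-process stand-in for a JMS queue: producers never wait on the indexer,
 * and updates stay queued while the indexer is busy.
 * (Unlike a persistent JMS queue, nothing here survives a JVM restart.)
 */
final class IndexUpdateChannel {
    private final BlockingQueue<IndexUpdate> queue = new LinkedBlockingQueue<>();

    /** Called by the application; returns immediately without touching the index. */
    void requestReindex(String documentId) {
        queue.add(new IndexUpdate(documentId));
    }

    /** Called by the indexer when it has capacity; removes and returns everything pending. */
    List<IndexUpdate> drainPending() {
        List<IndexUpdate> batch = new ArrayList<>();
        queue.drainTo(batch);
        return batch;
    }

    int pendingCount() {
        return queue.size();
    }
}
```

Because `requestReindex` only appends to the queue, a slow indexer never holds up the application; with a persistent JMS queue the same is true across an indexer outage.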

If you use ActiveMQ for JMS, you can take advantage of its Composite  
Destination feature and have a virtual Queue/Topic that is actually  
several Queues/Topics.  This is what we use to keep a mirror index  
server completely in sync.  The application sends an update message  
to a queue named "queue://index1, queue://index2", which becomes 2  
separate queues for the 2 servers, allowing them to process the same  
message whenever they can get around to it.
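To make the fan-out concrete, here is a plain-Java sketch that mimics the effect of a composite destination: one logical name lists several physical queues, and each send delivers an independent copy to every member queue. It illustrates the behaviour described above, not ActiveMQ's implementation; `CompositeSender` and its methods are hypothetical, and the `queue://` prefix handling only mirrors the destination string quoted above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical fan-out: one composite name, several independent physical queues. */
final class CompositeSender {
    private final Map<String, BlockingQueue<String>> queues = new LinkedHashMap<>();

    /** Splits e.g. "queue://index1, queue://index2" into its member queue names. */
    static List<String> parseComposite(String composite) {
        List<String> names = new ArrayList<>();
        for (String part : composite.split(",")) {
            String name = part.trim();
            if (name.startsWith("queue://")) {
                name = name.substring("queue://".length());
            }
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Delivers a separate copy of the message to every member queue. */
    void send(String compositeDestination, String message) {
        for (String name : parseComposite(compositeDestination)) {
            queues.computeIfAbsent(name, n -> new LinkedBlockingQueue<>()).add(message);
        }
    }

    /** Each index server consumes from its own queue at its own pace; null if empty. */
    String receive(String queueName) {
        BlockingQueue<String> q = queues.get(queueName);
        return q == null ? null : q.poll();
    }
}
```

Because the two queues are independent, `index2` can lag behind or be offline while `index1` keeps up; it simply works through its backlog when it comes back, which is what keeps the mirror in sync.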

We then place Apache in front of these 2 mirrored Index/Search nodes  
so the application can use web-services to query the search node  
without actually being aware that there are 2 of them behind the  
scenes, leaving Apache to do the load-balancing and fail-over as the  
index/search nodes come up/down without the main application knowing  
anything about it.
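Conceptually, the balancing layer rotates requests across the mirrored nodes and skips any node currently marked down. The sketch below shows that idea in plain Java; it is not how Apache is configured or implemented, and the node URLs and the `SearchBalancer` class are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin balancer with fail-over across mirrored search nodes. */
final class SearchBalancer {
    private final List<String> nodes;
    private final Set<String> down = ConcurrentHashMap.newKeySet();
    private final AtomicInteger next = new AtomicInteger();

    SearchBalancer(List<String> nodes) {
        this.nodes = List.copyOf(nodes);
    }

    void markDown(String node) {
        down.add(node);
    }

    void markUp(String node) {
        down.remove(node);
    }

    /** Returns the next healthy node in rotation, or null if every node is down. */
    String pick() {
        for (int attempt = 0; attempt < nodes.size(); attempt++) {
            int i = Math.floorMod(next.getAndIncrement(), nodes.size());
            String candidate = nodes.get(i);
            if (!down.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }
}
```

The application only ever talks to the balancer's address, so a node can be taken out for maintenance or re-indexing and brought back without any change on the application side.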

Paul Smith

On 26/06/2005, at 2:35 AM, Stephane Bailliez wrote:

> I have been browsing the archives concerning this particular topic.
> I'm in the same boat and the customer has clustering requirements.
> To give some background:
> I have a constant flow of incoming messages flying over the network  
> that need to be archived in a db, indexed and dispatched to thousands  
> of clients (rich client console).
> The backend architecture needs to be clustered, meaning that:
> - the message broker needs to be clustered
> - the database needs to be replicated and support failover
> - the search engine index needs to be replicated
> This is for a 24x7 operation.
> My main problem is that there is a constant flow of writes just  
> about everywhere, meaning that the Lucene index keeps changing, and  
> that I have a very small window available to replicate the data  
> across the network.
> (As of now, I have 2 messages/minute and should go over 50/minute in  
> the medium term.)
> Concerning the index, being able to replicate is cool, but if one  
> node goes down, it must be able to resynchronize when you bring it  
> back up on the cluster... that's a hell of a problem.
> As it is acceptable to have downtime on the search engine, I was  
> thinking it was much easier to:
> 1) rely on a shared index via NFS for each node.
> 2) dedicate a box to the search engine and access it via RPC from  
> each node
> Considering the messages I have seen in the archives, 1) seems to  
> be a no-go.
> Option 2) is generally not recommended, but I think it could fit my  
> needs quite well. IMHO it should be easy enough to bring the box  
> back into operation if it goes down. Synchronizing the index, for  
> me, is just a matter of going through the database to reindex the  
> archived content; this will take some time but, as I said, running  
> in degraded mode is acceptable.
> Has anyone any suggestions/recommendations/experience/thoughts  
> concerning the problems mentioned above?
> Cheers,
> Stephane
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
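Stephane's recovery path for option 2 (rebuild the index from the archived messages in the database, serving degraded results in the meantime) can be sketched as follows. This is a hypothetical in-memory stand-in rather than Lucene code: `ArchivedMessage`, `RebuildableIndex` and the naive whitespace tokenizer are assumptions, and a real implementation would stream rows from the database into a Lucene index writer instead of a `Map`. The new index is built off to the side and swapped in atomically, so queries never see a half-built index.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical row from the message archive table. */
final class ArchivedMessage {
    final long id;
    final String body;

    ArchivedMessage(long id, String body) {
        this.id = id;
        this.body = body;
    }
}

/** Toy inverted index (term -> message ids) that is rebuilt off to the side and swapped in. */
final class RebuildableIndex {
    private final AtomicReference<Map<String, Set<Long>>> live =
            new AtomicReference<>(Collections.<String, Set<Long>>emptyMap());

    /** Rebuilds from the archive; searches keep using the previous index until the swap. */
    void rebuildFrom(Iterable<ArchivedMessage> archive) {
        Map<String, Set<Long>> fresh = new HashMap<>();
        for (ArchivedMessage m : archive) {
            // Naive whitespace tokenizer, lower-cased; punctuation is not stripped.
            for (String term : m.body.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!term.isEmpty()) {
                    fresh.computeIfAbsent(term, t -> new TreeSet<>()).add(m.id);
                }
            }
        }
        live.set(fresh); // atomic swap: readers see either the old or the new index
    }

    /** Returns matching message ids; empty until the first rebuild completes. */
    Set<Long> search(String term) {
        return live.get().getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

Queries issued during a rebuild are answered from the previous (possibly empty) index, which corresponds to the degraded mode described as acceptable above.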

