lucene-solr-user mailing list archives

From tedsolr <>
Subject replicate indexing to second site
Date Tue, 09 Feb 2016 20:43:47 GMT
I have a Solr Cloud cluster (v5.2.1) using a Zookeeper ensemble in my primary
data center. I am now trying to plan for disaster recovery with an available
warm site. I have read (many times) the disaster recovery section in the
Apache ref guide. I suppose I don't fully understand it.

What I'd like to know is the best way to sync up the existing data, and the
best way to keep that data in sync. Assume that the warm site is an exact
copy (not at the network level) of the production cluster - so the same
servers with the same config. All servers are virtual. The use case is the
active cluster goes down and cannot be repaired, so the warm site would
become the active site. This is a manual process that takes many hours to
accomplish (I just need to fit Solr into this existing process; I can't
change the process :).

I expect that rsync can be used initially to copy the collection data
folders and the ZooKeeper data and transaction log folders. So after
verifying that Solr/ZK is functional post-install, shut both down and
perform the copy. This may sound slow, but my production index size is
< 100GB. Is this approach reasonable?

So now to keep the warm site in sync, I could use rsync on a scheduled basis
but I assume there's a better way. The ref guide says to send all indexing
requests to the second cluster at the same time they are sent to the active
cluster. I use SolrJ for all requests. So would this entail using a second
CloudSolrClient instance that only knows about the second cluster? Seems
reasonable but I don't want to lengthen the response time for the users. Is
this just a software problem to work out (a separate thread)? Or is there a
SolrJ solution (async calls)?
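To show what I mean by "separate thread": the sketch below assumes a second CloudSolrClient pointed at the warm cluster's own ZooKeeper ensemble, and mirrors each write off the request thread so user latency only covers the primary call. The IndexSink interface is my own stand-in for SolrClient#add so the pattern runs without a live cluster; error handling is just a placeholder.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Dual-write pattern: synchronous write to the primary cluster,
// asynchronous mirror to the warm-site cluster on a background thread.
public class DualWriter {
    // Stand-in for SolrClient#add (hypothetical, for illustration only).
    public interface IndexSink { void add(Object doc) throws Exception; }

    private final IndexSink primary;
    private final IndexSink warm;
    private final ExecutorService mirrorPool =
        Executors.newSingleThreadExecutor();

    public DualWriter(IndexSink primary, IndexSink warm) {
        this.primary = primary;
        this.warm = warm;
    }

    // Blocks only on the primary write; the warm-site write happens
    // off the request thread, so user response time is unchanged.
    public Future<?> add(Object doc) throws Exception {
        primary.add(doc);                 // synchronous, as today
        return mirrorPool.submit(() -> {  // mirrored in the background
            try {
                warm.add(doc);
            } catch (Exception e) {
                // In practice: log and queue the doc for replay,
                // otherwise the warm site silently drifts.
            }
            return null;
        });
    }
}
```

In real code the two sinks would wrap two CloudSolrClient instances, each constructed with its own cluster's ZooKeeper connect string, and a failed mirror write would need to be queued and replayed rather than dropped.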

