lucene-solr-user mailing list archives

From Upayavira ...@odoko.co.uk>
Subject Re: replicate indexing to second site
Date Tue, 09 Feb 2016 21:49:40 GMT
There is a Cross Datacenter replication feature in the works - not sure
of its status.

In lieu of that, I'd simply have two copies of your indexing code -
index everything simultaneously into both clusters.
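
A rough sketch of that dual-write idea, written against a stand-in interface rather than SolrJ so it runs on its own. `IndexSink` and `DualIndexer` are hypothetical names; in a real setup each sink would presumably wrap a client pointed at one cluster. The caller blocks only on the primary write, and the second cluster is fed on a separate executor so user response time is not lengthened.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;

/** Hypothetical stand-in for "one cluster"; a real one would wrap a Solr client. */
interface IndexSink {
    void index(String docId, String body) throws Exception;
}

/** Sends every document to both sinks; the caller waits only on the primary. */
final class DualIndexer {
    private final IndexSink primary;
    private final IndexSink secondary;
    private final Executor secondaryExecutor;
    // Ids that failed on the secondary, kept so they can be replayed later.
    private final Queue<String> failedOnSecondary = new ConcurrentLinkedQueue<>();

    DualIndexer(IndexSink primary, IndexSink secondary, Executor secondaryExecutor) {
        this.primary = primary;
        this.secondary = secondary;
        this.secondaryExecutor = secondaryExecutor;
    }

    /** Indexes synchronously on the primary, then asynchronously on the secondary. */
    CompletableFuture<Void> index(String docId, String body) throws Exception {
        primary.index(docId, body); // a primary failure propagates to the caller
        return CompletableFuture.runAsync(() -> {
            try {
                secondary.index(docId, body);
            } catch (Exception e) {
                // Do not fail the user's request; remember the id instead.
                failedOnSecondary.add(docId);
            }
        }, secondaryExecutor);
    }

    /** Snapshot of the ids that still need to be re-sent to the secondary. */
    List<String> failedOnSecondary() {
        return new ArrayList<>(failedOnSecondary);
    }
}
```

Passing a single-threaded executor keeps updates to the second cluster in submission order; a larger pool trades that ordering for throughput.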

There is, of course, a risk that the two clusters get out of sync, so you
might want to find some way to identify and manage that.
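
One cheap way to spot drift, sketched here as an assumption rather than a recommendation: periodically compare per-collection document counts from the two clusters. `DriftCheck` is a hypothetical helper, and the count maps are assumed to be gathered elsewhere (for example from the `numFound` of a `q=*:*&rows=0` query against each collection). Matching counts do not prove the contents are identical, so treat this as a first alarm only.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class DriftCheck {
    /**
     * Returns collection name -> {primaryCount, secondaryCount} for every
     * collection whose counts differ. A collection missing on one side is
     * reported with a count of -1 for that side.
     */
    static Map<String, long[]> mismatches(Map<String, Long> primary,
                                          Map<String, Long> secondary) {
        // Union of collection names from both clusters, in sorted order.
        Set<String> names = new TreeSet<>(primary.keySet());
        names.addAll(secondary.keySet());

        Map<String, long[]> out = new TreeMap<>();
        for (String name : names) {
            long p = primary.getOrDefault(name, -1L);
            long s = secondary.getOrDefault(name, -1L);
            if (p != s) {
                out.put(name, new long[] {p, s});
            }
        }
        return out;
    }
}
```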

Upayavira

On Tue, Feb 9, 2016, at 08:43 PM, tedsolr wrote:
> I have a Solr Cloud cluster (v5.2.1) using a Zookeeper ensemble in my
> primary data center. I am now trying to plan for disaster recovery with
> an available warm site. I have read (many times) the disaster recovery
> section in the Apache ref guide. I suppose I don't fully understand it.
> 
> What I'd like to know is the best way to sync up the existing data, and
> the best way to keep that data in sync. Assume that the warm site is an
> exact copy (not at the network level) of the production cluster - so the
> same servers with the same config. All servers are virtual. The use case
> is the active cluster goes down and cannot be repaired, so the warm site
> would become the active site. This is a manual process that takes many
> hours to accomplish (I just need to fit Solr into this existing process,
> I can't change the process :).
> 
> I expect that rsync can be used initially to copy the collection data
> folders and the zookeeper data and transaction log folders. So after
> verifying Solr/ZK is functional after the install, shut it down and perform
> the copy. This may sound slow but my production index size is < 100GB. Is
> this approach reasonable?
> 
> So now to keep the warm site in sync, I could use rsync on a scheduled
> basis but I assume there's a better way. The ref guide says to send all
> indexing requests to the second cluster at the same time they are sent
> to the active cluster. I use SolrJ for all requests. So would this
> entail using a second CloudSolrClient instance that only knows about the
> second cluster? Seems reasonable but I don't want to lengthen the
> response time for the users. Is this just a software problem to work out
> (separate thread)? Or is there a SolrJ solution (async calls)?
> 
> Thanks!!
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/replicate-indexing-to-second-site-tp4256240.html
> Sent from the Solr - User mailing list archive at Nabble.com.
