lucene-solr-user mailing list archives

From Tim Potter <tim.pot...@lucidworks.com>
Subject RE: Replicating Between Solr Clouds
Date Wed, 05 Mar 2014 02:51:21 GMT
Unfortunately, there is no out-of-the-box solution for this at the moment. 

In the past, I solved this using a couple of different approaches, which weren't all that
elegant but served the purpose and were simple enough to allow the ops folks to set up monitors
and alerts if things didn't work.

1) Use DIH's SolrEntityProcessor to pull data from one Solr to another; see: http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor

This only works if you store all fields, because the processor can only read stored field
values from the source. In my use case that was OK because I also did lots of partial document
updates, which also required me to store all fields.
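
As a rough illustration of approach 1, the sketch below generates a minimal DIH data-config that uses SolrEntityProcessor and builds the URL that would trigger a full import on the receiving side. The host names, the entity name "sep", the batch size, and the assumption that DIH is registered at /dataimport on the target core are all hypothetical; check them against your own solrconfig.xml.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def solr_entity_config(source_url, query="*:*", rows=500):
    """Return a minimal DIH data-config XML string that pulls documents
    from source_url via SolrEntityProcessor.

    The entity name "sep" and the default batch size are illustrative only.
    """
    root = ET.Element("dataConfig")
    document = ET.SubElement(root, "document")
    ET.SubElement(document, "entity", {
        "name": "sep",
        "processor": "SolrEntityProcessor",
        "url": source_url,   # Solr instance to read from
        "query": query,      # which documents to copy
        "rows": str(rows),   # documents fetched per request
    })
    return ET.tostring(root, encoding="unicode")


def full_import_url(target_core_url, clean=False):
    """Build the URL that starts a DIH full-import on the target core.

    Assumes the DataImportHandler is registered at /dataimport.
    clean=False keeps documents already present in the target index.
    """
    params = {
        "command": "full-import",
        "clean": str(clean).lower(),
        "commit": "true",
    }
    return target_core_url.rstrip("/") + "/dataimport?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical hosts for the source and target data centers.
    print(solr_entity_config("http://dc-a-vip:8983/solr/collection1"))
    print(full_import_url("http://dc-b-solr1:8983/solr/collection1"))
```

The generated XML would go into the data-config file referenced by the DIH request handler on the receiving cluster; a scheduler (cron, for example) could then call the import URL at whatever interval the failover site can tolerate.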

2) Use the replication handler's snapshot support to create snapshots on a regular basis and
then move the files over the network.

This one works but required the use of read and write aliases and two collections in the remote
(slave) data center so that I could rebuild my write collection from the snapshots and then
update the aliases to point the reads at the updated collection. Work on an automated backup/restore
solution is planned; see https://issues.apache.org/jira/browse/SOLR-5750. If you need
something sooner, you can write a backup driver using SolrJ that uses CloudSolrServer to get
the addresses of all the shard leaders, initiates the backup command on each leader, polls the
replication details handler for snapshot completion on each shard, and then ships the files
across the network. Obviously, this isn't a solution for NRT multi-homing ;-)
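
The driver steps above could look roughly like the sketch below. It is written in Python rather than SolrJ, so it does not discover shard leaders from ZooKeeper the way CloudSolrServer would; the leader core URLs are assumed to be supplied by the caller. The helpers only build request URLs and interpret a parsed details response, leaving the HTTP calls and file shipping to whatever tooling you prefer. The "backup"/"status" field names in the details response are an assumption based on Solr 4.x behavior, so verify them against your version.

```python
from urllib.parse import urlencode


def backup_url(leader_core_url, location):
    """URL that asks a shard leader's replication handler to take a snapshot."""
    params = {"command": "backup", "location": location}
    return leader_core_url.rstrip("/") + "/replication?" + urlencode(params)


def details_url(leader_core_url):
    """URL that returns replication details (including backup status) as JSON."""
    params = {"command": "details", "wt": "json"}
    return leader_core_url.rstrip("/") + "/replication?" + urlencode(params)


def _as_dict(value):
    """Solr may render a NamedList as a JSON map or as a flat
    [key, value, key, value, ...] array; normalize both to a dict."""
    if isinstance(value, dict):
        return value
    if isinstance(value, list):
        return dict(zip(value[0::2], value[1::2]))
    return {}


def snapshot_succeeded(details_response):
    """True once the parsed details response reports a successful backup.

    Assumes the status lives under details -> backup -> status.
    """
    details = _as_dict(details_response.get("details"))
    backup = _as_dict(details.get("backup"))
    return backup.get("status") == "success"


def create_alias_url(solr_base_url, alias, collection):
    """Collections API URL that points an alias at a collection, e.g. to
    switch the read alias once the write collection has been rebuilt."""
    params = {"action": "CREATEALIAS", "name": alias, "collections": collection}
    return solr_base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical leader core URLs; a real driver would look these up.
    leaders = [
        "http://dc-a-solr1:8983/solr/collection1_shard1_replica1",
        "http://dc-a-solr3:8983/solr/collection1_shard2_replica1",
    ]
    for leader in leaders:
        print(backup_url(leader, "/backups/solr"))
        print(details_url(leader))
    print(create_alias_url("http://dc-b-solr1:8983/solr", "reads", "collection_b"))
```

A driver would request each backup URL, poll the details URL until snapshot_succeeded returns True for every shard, copy the snapshot directories to the remote data center, and only then re-point the read alias there.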

Lastly, these aren't the only ways to go about this; I just wanted to share some high-level
details about what has worked.

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com

________________________________________
From: perdurabo <robert_parker@volusion.com>
Sent: Tuesday, March 04, 2014 1:04 PM
To: solr-user@lucene.apache.org
Subject: Replicating Between Solr Clouds

We are looking to set up a highly available failover site across a WAN for our
SolrCloud instance.  The main production instance is at colo center A and
consists of a 3-node ZooKeeper ensemble managing configs for a 4-node
SolrCloud running Solr 4.6.1.  We only have one collection among the 4 cores
and there are two shards in the collection, one master node and one replica
node for each shard.  Our search and indexing services address the Solr
cloud through a load balancer VIP, not a compound API call.

Anyway, the Solr wiki explains fairly well how to replicate single node Solr
collections, but I do not see an obvious way for replicating a SolrCloud's
indices over a WAN to another SolrCloud.  I need for a SolrCloud in another
data center to be able to replicate both shards of the collection in the
other data center over a WAN.  It needs to be able to replicate from a load
balancer VIP, not a single named server of the SolrCloud, which round robins
across all four nodes/2 shards for high availability.

I've searched high and low for a white paper or some discussion of how to do
this and haven't found anything.  Any ideas?

Thanks in advance.



--
View this message in context: http://lucene.472066.n3.nabble.com/Replicating-Between-Solr-Clouds-tp4121196.html
Sent from the Solr - User mailing list archive at Nabble.com.