lucene-solr-user mailing list archives

From Bill Au <bill.w...@gmail.com>
Subject best practice for restarting the entire SolrCloud cluster
Date Thu, 08 Nov 2012 20:10:49 GMT
I have a simple SolrCloud cluster with 4 Solr instances and 1 shard.  I can
start and stop individual Solr instances without any problem, but not when
I have to shut down all the Solr instances at the same time.

After shutting down all the Solr instances, the first instance that starts
up waits for all the replicas:

INFO: Waiting until we see more replicas up: total=4 found=3 timeoutin=169243

In the meantime, any additional Solr instances that start up while the
first one is waiting can't get the leader from ZooKeeper:

SEVERE: Error getting leader from zk
org.apache.solr.common.SolrException: Could not get leader props

When the first Solr instance sees all the replicas, it becomes the leader:

INFO: Enough replicas found to continue.
INFO: I may be the new leader - try and sync

But it fails to sync with the instances that had failed to get the leader
before:

WARNING: PeerSync: core=collection1 url=http://host2:8983/solr  exception talking to http://host2:8983/solr/collection1/, failed
org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://host2:8983/solr/collection1

So I ended up with one or more replicas down after the restart.  I had to
figure out which replicas were down and restart them.

What I also discovered is that if I start the first Solr instance and wait
until it has finished waiting out the leaderVoteWait of 3 minutes, the rest
of the Solr instances can be started without any problem, since by then they
can get the leader from ZooKeeper.
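
The workaround above could be scripted roughly as follows. This is only a minimal sketch of the ordering, not a Solr or ZooKeeper API: `staggered_restart`, `start_instance` and `leader_elected` are hypothetical names, and the two callbacks must be supplied by whoever runs it (for example, a wrapper around the start script and a check that the leader is visible in ZooKeeper). The 200-second default timeout is an assumption chosen to sit slightly above the 3-minute leaderVoteWait.

```python
import time
from typing import Callable, List, Sequence


def staggered_restart(
    hosts: Sequence[str],
    start_instance: Callable[[str], None],   # hypothetical: starts Solr on one host
    leader_elected: Callable[[], bool],      # hypothetical: True once a leader is registered
    timeout_s: float = 200.0,                # assumption: a bit more than leaderVoteWait
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Start the first host, wait for a leader, then start the remaining hosts."""
    if not hosts:
        return []

    first, rest = hosts[0], list(hosts[1:])
    start_instance(first)
    started = [first]

    # Hold back the other instances until the first one has become leader,
    # so they do not fail with "Could not get leader props".
    deadline = clock() + timeout_s
    while not leader_elected():
        if clock() >= deadline:
            raise TimeoutError(f"no leader after {timeout_s} s; started only {started}")
        sleep(poll_s)

    for host in rest:
        start_instance(host)
        started.append(host)
    return started
```

The callbacks are passed in so that the ordering logic stays independent of how the instances are actually started and how the leader is checked in a given deployment.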

Is there a better way to restart an entire SolrCloud cluster?

Bill
