lucene-solr-user mailing list archives

From Mark Miller <>
Subject Re: Zookeeper down question
Date Tue, 19 Nov 2013 19:58:52 GMT

On Nov 19, 2013, at 2:24 PM, Timothy Potter <> wrote:

> Good questions ... From my understanding, queries will work if ZK goes down
> but writes do not work w/o ZooKeeper. This works because the clusterstate
> is cached on each node, so ZooKeeper doesn't participate directly in queries
> and indexing requests. Solr has to decide not to allow writes if it loses
> its connection to ZooKeeper, which is a safeguard mechanism. In other
> words, Solr assumes it's pretty safe to allow reads if the cluster doesn't
> have a healthy coordinator, but chooses to not allow writes to be safe.

Right - we currently stop accepting writes when Solr cannot talk to ZooKeeper. Without that
connection a node can no longer count on knowing about changes to the cluster, and no new
leaders can be elected. It gets tricky fast if you consider allowing updates without
ZooKeeper connectivity for very long.
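
To illustrate the behavior described above, here is a minimal sketch of a node that keeps serving reads but rejects updates once it loses its ZooKeeper connection. This is not Solr's actual code: the class name WriteGate, its methods, and the error message are all hypothetical, and the query and update bodies are stand-ins.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; none of these names come from Solr.
public class WriteGate {
    // Tracks whether this node currently believes it can talk to ZooKeeper.
    private final AtomicBoolean zkConnected = new AtomicBoolean(true);
    private int acceptedUpdates = 0;

    // Assumed to be called by whatever watches the ZooKeeper session.
    public void onZkDisconnected() { zkConnected.set(false); }
    public void onZkReconnected()  { zkConnected.set(true); }

    // Reads are served from locally cached cluster state, so they are
    // allowed whether or not ZooKeeper is reachable.
    public String handleQuery(String q) {
        return "results for " + q;
    }

    // Writes are refused while disconnected: the node cannot learn about
    // cluster changes, and no new leader could be elected.
    public synchronized int handleUpdate(String doc) {
        if (!zkConnected.get()) {
            throw new IllegalStateException(
                "Cannot talk to ZooKeeper, updates are disabled");
        }
        acceptedUpdates++;
        return acceptedUpdates;
    }
}
```

In this sketch, calling onZkDisconnected() makes every later handleUpdate() throw until onZkReconnected() is called, while handleQuery() keeps working throughout.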

> If a Solr node goes down while ZK is not available, since Solr no longer
> accepts writes, leader / replica doesn't really matter. I'd venture to
> guess there is some failover logic built in when executing distributed
> queries but I'm not as familiar with that part of the code (I'll brush up
> on it though as I'm now curious as well).

Right - query requests will fail over to other replicas. This is important in general because
the cluster state a Solr instance has can be a bit stale, so a request might hit something
that has gone down, and another replica in the shard can then be tried. We use the load
balancing SolrJ client for these internal requests. CloudSolrServer handles failover for the
user (or non-internal) requests. Or you can use your own external load balancer.
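
The failover idea can be sketched as follows: try each replica of a shard in turn and return the first successful response. This is a simplified illustration under stated assumptions, not the SolrJ load-balancing client itself. The class ReplicaFailover, the method queryWithFailover, and the send callback are hypothetical names, and a real client would also track which servers are down and retry them later.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; this is not SolrJ code.
public class ReplicaFailover {

    // Tries each replica URL in order and returns the first successful
    // response. `send` stands in for the actual HTTP request and is
    // assumed to throw a RuntimeException when a replica is unreachable.
    public static <R> R queryWithFailover(List<String> replicaUrls,
                                          Function<String, R> send) {
        RuntimeException last = null;
        for (String url : replicaUrls) {
            try {
                return send.apply(url);
            } catch (RuntimeException e) {
                // The cached cluster state may be stale and this replica
                // may be down, so remember the error and try the next one.
                last = e;
            }
        }
        // Every replica in the shard failed, so the query cannot be served.
        throw new IllegalStateException("no live replica for shard", last);
    }
}
```

With two replicas where the first is down, the call falls through to the second and returns its response; only when every replica of the shard fails does the query fail.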

- Mark

> Cheers,
> Tim
> On Tue, Nov 19, 2013 at 11:58 AM, Garth Grimm <> wrote:
>> Given a 4-node Solr instance (i.e. 2 shards, 2 replicas per shard), and a
>> standalone ZooKeeper.
>> Correct me if any of my understanding is incorrect on the following:
>> If ZK goes down, most normal operations will still function, since my
>> understanding is that ZK isn't involved on a transaction by transaction
>> basis for each of these:
>> Document adds, updates, and deletes on an existing collection will still work
>> as expected.
>> Queries will still get processed as expected.
>> Is the above correct?
>> But adding new collections, changing configs, etc., will all fail while ZK
>> is down (or at least, place things in an inconsistent state?)
>> Is that correct?
>> If, while ZK is down, one of the 4 solr nodes also goes down, will all
>> normal operations fail?  Will they all continue to succeed?  I.e. will each
>> of the nodes realize which node is down and route indexing and query
>> requests around it, or is that impossible while ZK is down?  Will some
>> queries succeed (because they were lucky enough to get routed to the one
>> replica on the one shard that is still functional) while other queries fail
>> (they aren't so lucky and get routed to the one replica that is down on the
>> one shard)?
>> Thanks,
>> Garth Grimm
