lucene-solr-user mailing list archives

From Erick Erickson
Subject Re: SolrCloud shard leader elections - Altering zookeeper sequence numbers
Date Mon, 12 Jan 2015 18:17:44 GMT
Just skimming, but the problem here that I ran into was with the
listeners. Each _Solr_ instance out there is listening to one of the
ephemeral nodes (the "one in front"). So deleting a node does _not_
change which ephemeral node the associated Solr instance is listening to.

So, for instance, when you delete S2..n-000001 and re-add it, S2 is
still looking at S1....n-000000 and will continue looking at
S1...n-000000 until S1....n-000000 is deleted.

Deleting S2..n-000001 will wake up S3 though, which should now be
looking at S1....n-000000. Now you have two Solr listeners looking at
the same ephemeral node. The key is that deleting S2...n-000001 does
_not_ wake up S2, just any solr instance that has a watch on the
associated ephemeral node.
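
As a rough illustration of the watch chain described above, here is a toy model in plain Java. It does not talk to ZooKeeper or use any Solr classes; the class and method names (WatchChainSketch, join, createFromOutside, delete, watchedSeq) are made up for this sketch. It assumes each instance watches only the node directly in front of it, and re-checks only when that watched node is deleted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the election watch chain (hypothetical, not Solr code). */
class WatchChainSketch {
    // Election nodes: sequence number -> owning Solr instance.
    private final TreeMap<Integer, String> nodes = new TreeMap<>();
    // Watches: Solr instance -> sequence number of the node it is watching.
    private final Map<String, Integer> watches = new HashMap<>();

    /** The instance itself registers and sets a watch on the node in front. */
    void join(String instance, int seq) {
        nodes.put(seq, instance);
        watchNodeInFront(instance, seq);
    }

    /** An outside tool creates a node; the instance's watch is NOT touched. */
    void createFromOutside(String instance, int seq) {
        nodes.put(seq, instance);
    }

    /** Deletes a node and wakes only the instances watching that node. */
    List<String> delete(int seq) {
        nodes.remove(seq);
        List<String> woken = new ArrayList<>();
        for (Map.Entry<String, Integer> watch : watches.entrySet()) {
            if (watch.getValue() == seq) {
                woken.add(watch.getKey());
            }
        }
        Collections.sort(woken);
        // Each woken instance re-checks its position and re-sets its watch.
        for (String instance : woken) {
            for (Map.Entry<Integer, String> node : nodes.entrySet()) {
                if (node.getValue().equals(instance)) {
                    watchNodeInFront(instance, node.getKey());
                }
            }
        }
        return woken;
    }

    /** Returns the sequence number the instance watches, or null if none. */
    Integer watchedSeq(String instance) {
        return watches.get(instance);
    }

    private void watchNodeInFront(String instance, int ownSeq) {
        Integer inFront = nodes.lowerKey(ownSeq);
        if (inFront == null) {
            watches.remove(instance); // head of the line: nothing to watch
        } else {
            watches.put(instance, inFront);
        }
    }

    public static void main(String[] args) {
        WatchChainSketch zk = new WatchChainSketch();
        zk.join("S1", 0);
        zk.join("S2", 1);
        zk.join("S3", 2);
        System.out.println("woken by deleting n-1: " + zk.delete(1));
        zk.createFromOutside("S2", 1); // re-add S2's node from outside
        System.out.println("S2 watches n-" + zk.watchedSeq("S2"));
        System.out.println("S3 watches n-" + zk.watchedSeq("S3"));
    }
}
```

In this model, deleting S2's node wakes only S3, and afterwards both S2 and S3 end up watching S1's node, which is the situation described above.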

To understand how it all works, the code you want is
LeaderElector.checkIfIamLeader. Be aware that the sortSeqs call sorts the nodes by
1> sequence number
2> string comparison.

This has the unfortunate characteristic of a secondary sort by
session ID. So two nodes with the same sequence number can sort before
or after each other depending on which one gets a session higher/lower
than the other.
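
To make the two-level ordering concrete, here is a small sketch of a comparator that sorts by sequence number first and by plain string comparison second. This is not the actual sortSeqs code. The node-name layout used here (a session ID prefix, then a name, then "-n_" and the sequence digits) and the class ElectionSortSketch are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of "sequence number, then string comparison". */
class ElectionSortSketch {
    /** Parses the digits after the last "-n_" marker (assumed name layout). */
    static int sequenceOf(String nodeName) {
        int marker = nodeName.lastIndexOf("-n_");
        return Integer.parseInt(nodeName.substring(marker + 3));
    }

    /** Returns a sorted copy: primary key sequence, secondary key full name. */
    static List<String> sortBySeqThenName(List<String> nodeNames) {
        List<String> sorted = new ArrayList<>(nodeNames);
        Comparator<String> bySequence =
                Comparator.comparingInt(ElectionSortSketch::sequenceOf);
        sorted.sort(bySequence.thenComparing(Comparator.<String>naturalOrder()));
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up names: two nodes share sequence 1 but have different
        // session ID prefixes, so the prefix decides their relative order.
        List<String> names = Arrays.asList(
                "95000000000000002-S2-n_0000000001",
                "93000000000000003-S3-n_0000000001",
                "94000000000000001-S1-n_0000000000");
        for (String name : sortBySeqThenName(names)) {
            System.out.println(name);
        }
    }
}
```

With these made-up names, S1 sorts first on sequence number, and the tie between S2 and S3 is broken by the leading session ID, so S3 lands ahead of S2.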

This is quite tricky to get right. I once created a patch for 4.10.3
by applying things in this order (some minor tweaks required). All

Good luck!

On Mon, Jan 12, 2015 at 8:54 AM, Zisis Tachtsidis wrote:
> SolrCloud uses ZooKeeper sequence flags to keep track of the order in which
> nodes register themselves as leader candidates. The node with the lowest
> sequence number wins as leader of the shard.
> What I'm trying to do is keep leader re-assignments to a minimum
> during a rolling restart. To that end, I change the zk sequence numbers
> on the SolrCloud nodes when all nodes of the cluster are up and active. I'm
> using Solr 4.10.0 and I'm aware of SOLR-6491 which has a similar purpose but
> I'm trying to do it from "outside", using the existing APIs without editing
> Solr source code.
> Suppose we have 3 Solr instances S1,S2,S3. They are started in that
> order and the zk sequences are assigned as follows:
> S1:-n_0000000000 (LEADER)
> S2:-n_0000000001
> S3:-n_0000000002
> In a rolling restart we'll get S2 as leader (after S1 shutdown), then S3
> (after S2 shutdown) and finally S1 (after S3 shutdown), 3 changes in total.
> == MY ATTEMPT ==
> By using SolrZkClient and the ZooKeeper multi API I found a way to get rid
> of the old zknodes that participate in a shard's leader election and write
> new ones where we can assign the sequence number of our liking.
> S1:-n_0000000000 (no code running here)
> S2:-n_0000000004 (code deleting zknode -n_0000000001 and creating
> -n_0000000004)
> S3:-n_0000000003 (code deleting zknode -n_0000000002 and creating
> -n_0000000003)
> In a rolling restart I'd expect to have S3 as leader (after S1 shutdown), no
> change (after S2 shutdown) and finally S1 (after S3 shutdown), that is 2
> changes. This will be constant no matter how many servers are added to
> SolrCloud, while in the first scenario the # of re-assignments equals the #
> of Solr servers.
> The problem occurs when S1 (LEADER) is shut down. The elections that take
> place still set S2 as leader, as if the new sequence numbers were ignored.
> When I go to /solr/#/~cloud?view=tree the new sequence numbers are listed
> under "/collections" based on which S3 should have become the leader.
> Do you have any idea why the new state is not acknowledged during the
> elections? Is something cached? Or to put it bluntly do I have any chance
> down this path? If not what are my options? Is it possible to apply all
> patches under SOLR-6491 in isolation and continue from there?
> Thank you.
> Extra info which might help follows
> 1. Some logging related to leader elections after S1 has been shut down
>     S2 - Leader's attempt to sync with shard failed, moving to the next candidate
>     S2 - We failed sync, but we have no versions - we can't sync in that
>            case - we were active before, so become leader anyway
>     S3 - Our node is no longer in line to be leader
> 2. And some sample code on how I perform the ZK re-sequencing
>    // Read current zk nodes for a specific collection
> solrServer.getZkStateReader().getZkClient().getSolrZooKeeper().getChildren("/collections/core/leader_elect/shard1/election", true)
>    // node deletion
>       Op.delete(path, -1)
>    // node creation
>       Op.create(createPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
>    // Perform operations
> solrServer.getZkStateReader().getZkClient().getSolrZooKeeper().multi(opsList);
>       solrServer.getZkStateReader().updateClusterState(true);
> --
> View this message in context:
> Sent from the Solr - User mailing list archive at
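
The leader-change counts in the quoted message (3 with the default sequences, 2 expected with the re-assigned ones) can be checked with a small simulation. This is only a sketch of the poster's expected model: the lowest sequence number always wins, and a restarted instance rejoins at the back of the line. It ignores the watch behaviour described in the reply, so it shows what the re-sequencing was hoped to achieve, not what Solr actually did. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model: lowest sequence is leader, restarts rejoin last. */
class RollingRestartSketch {
    /** Returns the instance holding the lowest sequence, or null if empty. */
    static String leaderOf(Map<String, Integer> seqs) {
        String leader = null;
        int lowest = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> entry : seqs.entrySet()) {
            if (leader == null || entry.getValue() < lowest) {
                leader = entry.getKey();
                lowest = entry.getValue();
            }
        }
        return leader;
    }

    /** Counts leader changes while restarting instances one at a time. */
    static int countLeaderChanges(Map<String, Integer> initialSeqs,
                                  List<String> restartOrder) {
        Map<String, Integer> seqs = new HashMap<>(initialSeqs);
        int nextSeq = Collections.max(seqs.values()) + 1;
        String leader = leaderOf(seqs);
        int changes = 0;
        for (String instance : restartOrder) {
            seqs.remove(instance); // shutdown removes its election node
            String current = leaderOf(seqs);
            if (current != null && !current.equals(leader)) {
                changes++;
                leader = current;
            }
            seqs.put(instance, nextSeq++); // restart: rejoin at the back
        }
        return changes;
    }

    public static void main(String[] args) {
        List<String> order = Arrays.asList("S1", "S2", "S3");

        Map<String, Integer> startupOrder = new HashMap<>();
        startupOrder.put("S1", 0);
        startupOrder.put("S2", 1);
        startupOrder.put("S3", 2);
        System.out.println(countLeaderChanges(startupOrder, order)); // 3

        Map<String, Integer> resequenced = new HashMap<>();
        resequenced.put("S1", 0);
        resequenced.put("S2", 4);
        resequenced.put("S3", 3);
        System.out.println(countLeaderChanges(resequenced, order)); // 2
    }
}
```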
