curator-user mailing list archives

From stibi <sulyan.ti...@gmail.com>
Subject Re: Sometimes leader election ends up in two leaders
Date Thu, 22 May 2014 12:26:57 GMT
Hi!

Thanks for the quick response.
About this step:

— Time N + D2 —
The ZooKeeper quorum is repaired and the nodes start a doWork() loop again.
At this point, there can be 2, 3 or 4 nodes, depending on timing.
lock-0000000240 (waiting to be deleted)
lock-0000000241 (waiting to be deleted)
lock-0000000242
lock-0000000243
Neither of the instances will achieve leadership until the nodes 240/241
are deleted.

What guarantees that zNode 241 will be deleted prior to client #2's
(successful) attempt to reacquire the mutex using zNode 241?
AFAIK node deletion is a background operation and a retry policy controls
how often a deletion attempt will occur (even for guaranteed deletes).
Unlucky timing can lead to a situation where the deletion of zNode 241 happens
after the mutex acquisition. In that case the mutex is not released by the
leader, yet because the zNodes are deleted anyway, the other client will also
be elected leader.
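
For reference, this is roughly the deletion call I am talking about. With
guaranteed(), Curator records a failed delete and keeps retrying it in the
background under the client's retry policy, so the actual removal of the lock
zNode can land at an arbitrary later time (a minimal sketch; the method and
path names are placeholders):

    import org.apache.curator.framework.CuratorFramework;

    // Sketch only: guaranteed() means a delete that fails due to connection
    // loss is retried in the background until it succeeds.
    static void deleteLockNode(CuratorFramework client, String lockNodePath) throws Exception {
        client.delete().guaranteed().forPath(lockNodePath);
    }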

Thanks,
Tibor



On Thu, May 15, 2014 at 3:37 AM, Jordan Zimmerman <jordan@jordanzimmerman.com> wrote:

> I don’t think the situation you describe can happen. Let’s walk through
> this:
>
> — Time N —
> We have a single, correct leader and 2 nodes:
> lock-0000000240
> lock-0000000241
>
> — Time N + D1 —
> The ZooKeeper leader instance is restarted. Shortly thereafter, both Curator
> clients will exit their doWork() loops and mark their nodes for deletion.
> Due to the failed connection, though, there are still the 2 nodes:
> lock-0000000240 (waiting to be deleted)
> lock-0000000241 (waiting to be deleted)
>
> — Time N + D2 —
> The ZooKeeper quorum is repaired and the nodes start a doWork() loop
> again. At this point, there can be 2, 3 or 4 nodes, depending on timing.
> lock-0000000240 (waiting to be deleted)
> lock-0000000241 (waiting to be deleted)
> lock-0000000242
> lock-0000000243
> Neither of the instances will achieve leadership until the nodes 240/241
> are deleted.
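>
> To be clear, that is because the mutex only goes to the client owning the
> lowest-numbered child. Roughly (a simplified sketch, not the actual
> LockInternals code):
>
>     import java.util.List;
>
>     // Children are sorted by their 10-digit sequence suffix; only the
>     // client owning the first (lowest) node holds the mutex.
>     boolean getsTheLock(List<String> sortedChildren, String ourNodeName) {
>         return sortedChildren.indexOf(ourNodeName) == 0;
>     }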
>
> Of course, there may be something else that’s causing you to see 2
> leaders. A while back I discovered that rolling config changes can do it (
> http://zookeeper-user.578899.n2.nabble.com/Rolling-config-change-considered-harmful-td7578761.html).
> Or, there’s something else going on in Curator.
>
> -Jordan
>
>
> From: stibi <sulyan.tibor@gmail.com>
> Reply: user@curator.apache.org
> Date: May 14, 2014 at 11:39:48 AM
> To: user@curator.apache.org
> Subject:  Sometimes leader election ends up in two leaders
>
>  Hi!
>
> I'm using Curator's Leader Election recipe (2.4.2) and found a very
> hard-to-reproduce issue which could lead to a situation where both clients
> become leader.
>
> Let's say 2 clients are competing for leadership: client #1 is currently
> the leader, and ZooKeeper maintains the following structure under the
> leaderPath:
>
> /leaderPath
>   |- _c_a8524f0b-3bd7-4df3-ae19-cef11159a7a6-lock-0000000240 (client #1)
>   |- _c_b5bdc75f-d2c9-4432-9d58-1f7fe699e125-lock-0000000241 (client #2)
>
> The autoRequeue flag is set to true for both clients.
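>
> For completeness, the setup looks roughly like this (client and listener are
> created elsewhere; a sketch, not my exact code):
>
>     import org.apache.curator.framework.recipes.leader.LeaderSelector;
>
>     LeaderSelector selector = new LeaderSelector(client, "/leaderPath", listener);
>     selector.autoRequeue();  // requeue for leadership whenever it is relinquished
>     selector.start();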
>
> Let's trigger a leader election by restarting the ZooKeeper leader.
>
> When this happens, both clients will lose the connection to the ZooKeeper
> ensemble and will try to re-acquire the LeaderSelector's mutex. Eventually
> (after the negotiated session timeout) the ephemeral zNodes under
> /leaderPath will be deleted.
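>
> (The session timeout in question is the one negotiated at client creation;
> the connect string and timeout values below are illustrative only:)
>
>     import org.apache.curator.framework.CuratorFramework;
>     import org.apache.curator.framework.CuratorFrameworkFactory;
>     import org.apache.curator.retry.ExponentialBackoffRetry;
>
>     CuratorFramework client = CuratorFrameworkFactory.newClient(
>             "zk1:2181,zk2:2181,zk3:2181",
>             15000,  // sessionTimeoutMs: ephemerals vanish roughly this long after the session is lost
>             5000,   // connectionTimeoutMs
>             new ExponentialBackoffRetry(1000, 3));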
>
> The problem occurs when ephemeral zNode deletions interleave with mutex
> acquisition.
>
> Client #1 can observe that both zNodes (240 and 241) are already deleted:
> /leaderPath has no children, so it acquires the mutex successfully.
>
> On the other hand, client #2 can observe that both zNodes still exist, so
> it starts to watch zNode #240 (LockInternals.internalLockLoop():315). In a
> short period of time the watcher will be notified about the zNode's
> deletion, so client #2 re-enters LockInternals.internalLockLoop().
>
> What is really strange is that the getSortedChildren() call in
> LockInternals:284 can still return zNode #241, so client #2 will succeed in
> acquiring the mutex (LockInternals:287).
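>
> To make the interleaving concrete, here is a simplified sketch of the
> acquire loop (getSortedChildren, watchNode, waitForDeletion and ourNodeName
> stand in for the real LockInternals internals):
>
>     import java.util.List;
>
>     // cf. LockInternals.internalLockLoop() in 2.4.2
>     void lockLoop() throws Exception {
>         boolean haveTheLock = false;
>         while (!haveTheLock) {
>             List<String> children = getSortedChildren();  // LockInternals:284
>             int ourIndex = children.indexOf(ourNodeName);
>             if (ourIndex == 0) {
>                 // LockInternals:287 -- the race: a stale children view can
>                 // still contain zNode #241 here even though its delete is
>                 // already pending
>                 haveTheLock = true;
>             } else {
>                 watchNode(children.get(ourIndex - 1));    // LockInternals:315
>                 waitForDeletion();                        // then loop again
>             }
>         }
>     }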
>
> The result is two clients, both leaders, but /leaderPath contains only one
> zNode, the one for client #1.
>
> Have you encountered similar problems before? Do you have any ideas on how
> to prevent such race conditions? I can think of one solution: the leader
> should watch its own zNode under /leaderPath and interrupt leadership when
> the zNode gets deleted, as sketched below.
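>
> Sketched out, the idea would be something like this (ourLockNodePath would
> come from the selector's lock node; note a ZooKeeper watcher is one-shot, so
> a real implementation would re-register it):
>
>     import org.apache.curator.framework.CuratorFramework;
>     import org.apache.curator.framework.recipes.leader.LeaderSelector;
>     import org.apache.zookeeper.WatchedEvent;
>     import org.apache.zookeeper.Watcher;
>
>     void guardLeadership(final CuratorFramework client, final LeaderSelector selector,
>                          String ourLockNodePath) throws Exception {
>         Watcher watcher = new Watcher() {
>             @Override
>             public void process(WatchedEvent event) {
>                 if (event.getType() == Event.EventType.NodeDeleted) {
>                     // our zNode is gone: stop acting as leader
>                     selector.interruptLeadership();
>                 }
>             }
>         };
>         client.checkExists().usingWatcher(watcher).forPath(ourLockNodePath);
>     }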
>
> Thank you,
> Tibor
>
>
