zookeeper-user mailing list archives

From Thawan Kooburat <tha...@fb.com>
Subject Session closing delay issue
Date Thu, 07 Jun 2012 18:38:40 GMT
Hi,

We have a ZooKeeper ensemble that spans multiple data centers (each participant is
in a different data center). Recently, we ran into an issue when trying to support a short session
timeout (5 seconds). We set tickTime to 2 seconds and syncLimit to 25.
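For reference, here is a rough Java sketch of the timing bounds these settings imply. It assumes the stock server defaults minSessionTimeout = 2 * tickTime and maxSessionTimeout = 20 * tickTime, and it treats tickTime * syncLimit as roughly how long the leader tolerates an out-of-sync follower. The class and method names are made up for illustration.

```java
// Rough sketch of the timing bounds implied by tickTime = 2000 ms, syncLimit = 25
// and a requested session timeout of 5000 ms. Assumes the default server bounds
// minSessionTimeout = 2 * tickTime and maxSessionTimeout = 20 * tickTime.
// TimingBounds and its methods are hypothetical names, not ZooKeeper classes.
public class TimingBounds {
    static final long TICK_TIME_MS = 2_000;
    static final int SYNC_LIMIT_TICKS = 25;
    static final long REQUESTED_SESSION_TIMEOUT_MS = 5_000;

    // Default lower and upper bounds on a negotiated session timeout.
    static long minSessionTimeoutMs(long tickTimeMs) { return 2 * tickTimeMs; }
    static long maxSessionTimeoutMs(long tickTimeMs) { return 20 * tickTimeMs; }

    // The server clamps the client's requested timeout into [min, max].
    static long negotiatedTimeoutMs(long requestedMs, long tickTimeMs) {
        return Math.max(minSessionTimeoutMs(tickTimeMs),
                Math.min(requestedMs, maxSessionTimeoutMs(tickTimeMs)));
    }

    // Roughly how long the leader tolerates a follower that is out of sync.
    static long syncWindowMs(long tickTimeMs, int syncLimitTicks) {
        return tickTimeMs * syncLimitTicks;
    }

    public static void main(String[] args) {
        long session = negotiatedTimeoutMs(REQUESTED_SESSION_TIMEOUT_MS, TICK_TIME_MS);
        long sync = syncWindowMs(TICK_TIME_MS, SYNC_LIMIT_TICKS);
        System.out.println("negotiated session timeout: " + session + " ms"); // 5000 ms
        System.out.println("follower sync window: " + sync + " ms");          // 50000 ms
        System.out.println("ratio: " + (sync / session));                     // 10
    }
}
```

Under these assumptions, a follower can stay attached to the ensemble for about ten times longer than a session is allowed to go untouched.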

The use case is a single master: we can only have one master at any given time. The active
master creates an ephemeral node, and the backup master watches for this ephemeral node to be
deleted before it takes over the master role.
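The hand-off described above can be modeled in a few lines. This is a toy, single-process Java sketch, not the ZooKeeper client API: ToyStore, tryBecomeMaster and the path "/master" are hypothetical, and the model has no replication, so it shows only the intended hand-off and not the race that follows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy model of "one active master": the master owns an ephemeral node and the
// backup waits for a delete notification. ToyStore and tryBecomeMaster are
// hypothetical; this is not the ZooKeeper client API and has no replication.
public class SingleMasterSketch {
    static class ToyStore {
        private final Map<String, Long> ephemeralOwner = new HashMap<>();
        private final Map<String, List<Consumer<String>>> deleteWatches = new HashMap<>();

        // Loosely mirrors an ephemeral create: succeeds only if the node is absent.
        boolean createEphemeral(String path, long sessionId) {
            return ephemeralOwner.putIfAbsent(path, sessionId) == null;
        }

        // One-shot watch that fires when the node is deleted.
        void watchForDelete(String path, Consumer<String> callback) {
            deleteWatches.computeIfAbsent(path, p -> new ArrayList<>()).add(callback);
        }

        // Closing (or expiring) a session deletes every ephemeral node it owns
        // and fires the delete watches registered on those nodes.
        void closeSession(long sessionId) {
            List<String> owned = new ArrayList<>();
            ephemeralOwner.forEach((path, owner) -> {
                if (owner == sessionId) owned.add(path);
            });
            for (String path : owned) {
                ephemeralOwner.remove(path);
                List<Consumer<String>> fired = deleteWatches.remove(path);
                if (fired != null) fired.forEach(cb -> cb.accept(path));
            }
        }

        Long owner(String path) { return ephemeralOwner.get(path); }
    }

    // Try to become master; if the node already exists, retry once it is deleted.
    static void tryBecomeMaster(ToyStore store, String path, long sessionId) {
        if (store.createEphemeral(path, sessionId)) {
            System.out.println("session " + sessionId + " is now master");
        } else {
            store.watchForDelete(path, p -> tryBecomeMaster(store, p, sessionId));
        }
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        tryBecomeMaster(store, "/master", 1L); // active master creates the node
        tryBecomeMaster(store, "/master", 2L); // backup finds it and watches
        store.closeSession(1L);                // session 1 ends; backup takes over
        System.out.println("owner: " + store.owner("/master")); // owner: 2
    }
}
```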

The active master is connected to the follower (F1) in its own data center. We believe that a
network delay between F1 and the leader caused the touchTable not to propagate in a timely
manner, so the leader decided to close the session due to timeout. The ephemeral node delete event
reached the other follower (F2) before the close session event reached F1. The backup master, which
is connected to F2, got the ephemeral node delete event and assumed the role of active master.
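A toy timeline makes the race easier to see. In this Java sketch the leader commits the closeSession at time 0 and each follower applies it after its own delay. The Follower class and both delay values are hypothetical; the delays were chosen only so the gap comes out at 14 seconds, matching the log observation in this message, and the model ignores how the client is actually disconnected and notified.

```java
// Toy timeline of the race: the leader commits closeSession (which deletes the
// master's ephemeral node) at commitMs, and each follower applies it only after
// its own delay. Follower and the delay values are hypothetical illustrations.
public class ExpiryRaceSketch {
    // A follower that applies each committed transaction applyDelayMs after commit.
    static final class Follower {
        final String name;
        final long applyDelayMs;

        Follower(String name, long applyDelayMs) {
            this.name = name;
            this.applyDelayMs = applyDelayMs;
        }

        long appliesAtMs(long commitMs) {
            return commitMs + applyDelayMs;
        }
    }

    // How long the backup (on backupSide) already sees the node deleted while the
    // old master (on masterSide) has not yet seen its session closed. 0 = no overlap.
    static long dualMasterWindowMs(long commitMs, Follower masterSide, Follower backupSide) {
        long backupTakesOver = backupSide.appliesAtMs(commitMs);
        long masterLearnsExpiry = masterSide.appliesAtMs(commitMs);
        return Math.max(0, masterLearnsExpiry - backupTakesOver);
    }

    public static void main(String[] args) {
        Follower f1 = new Follower("F1", 14_100); // slow path from the leader (hypothetical)
        Follower f2 = new Follower("F2", 100);    // fast path from the leader (hypothetical)
        long commitMs = 0;
        System.out.println(f2.name + " applies closeSession at " + f2.appliesAtMs(commitMs) + " ms"); // 100
        System.out.println(f1.name + " applies closeSession at " + f1.appliesAtMs(commitMs) + " ms"); // 14100
        System.out.println("two masters for " + dualMasterWindowMs(commitMs, f1, f2) + " ms");        // 14000
    }
}
```

The overlap depends only on the difference between the two followers' apply delays, which nothing in this model bounds.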

From our log, the active master saw the session expired event 14 seconds after the backup
master received the ephemeral node delete event.

I tried to look at the code, and from my current understanding, we don't have logic that enforces an
upper bound on how far a particular follower can lag behind (in terms of data tree processing).
This means some parts of the system may see that the lock has been released before the previous
owner has released it.

Another issue I see in this case is that the client maintains an internal clock for when
its session should expire, based on its connectivity with the follower. However, the leader's internal
clock (the session tracker) uses information relayed from the follower via touchTable.
As a result, the two parties may decide differently when the session has expired if there are
network issues between the follower and the leader.

Our internal ZooKeeper is based on 3.4.3.

--
Thawan Kooburat
