zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tharindu Kumara <zonik.hatkum...@gmail.com>
Subject Re: Sessions Expire due to Network partitioning in Zookeeper
Date Tue, 07 Mar 2017 04:41:10 GMT
Hi Rakesh,

Thank you so much for the reply.

The client program that I have used is a simple one which implements the
Leader Election. It creates a ephemeral node and watches it and once that
is deleted a new leader is selected.

Client Code
https://gist.github.com/tharindukumara/025c98ddbd406db34067a587c6d9533d

For this test, I deployed both the client and Server C in the same machine.


I think, you can also recreate the scenario, if you deploy the zookeeper
servers and clients as I mentioned.

This is the output of a gstack while executing the client program.

Thread 3 (Thread 0x7fa5a7a2d700 (LWP 9057)):
#0  0x00007fa5a7d27c1d in poll () from /lib64/libc.so.6
#1  0x00007fa5a88a34d8 in copy_watchers (clone=<optimized out>,
to=0x118e3b0, from=<optimized out>) at src/zk_hashtable.c:233
#2  add_for_event (ht=<optimized out>, path=<optimized out>,
list=<optimized out>) at src/zk_hashtable.c:265
#3  0x00007fa5a7c460f0 in __sched_getparam@got.plt () from
/lib64/libpthread.so.0
#4  0x0000000000000000 in ?? ()
Thread 2 (Thread 0x7fa5a722c700 (LWP 9058)):
#0  0x00007fa5a7a3a05f in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007fa5a88a3686 in destroy_zk_hashtable (ht=0x118e514) at
src/zk_hashtable.c:143
#2  0x000000000118e3b0 in ?? ()
#3  0x0000000000000000 in ?? ()
Thread 1 (Thread 0x7fa5a8cad740 (LWP 9056)):
#0  0x00007fa5a7d01ded in nanosleep () from /lib64/libc.so.6
#1  0x00007fa5a7d01c84 in sleep () from /lib64/libc.so.6
#2  0x000000000040213c in main ()

Can you please give me a proper reason for this scenario?

To me it looks like when the client connects to server B, it initiates a
blocking call to Server B. And since the port is open to communicate it
waits on it too much time.

Thanks
Tharindu



On Mon, Mar 6, 2017 at 12:00 PM, Rakesh Radhakrishnan <rakeshr@apache.org>
wrote:

> Sorry, it was a busy weekend for me. I hope the following will give more
> clarity.
>
> (1) Client connected to a server and sends heartbeat periodically
> Send pings 1/3 the session timeout
> https://github.com/apache/zookeeper/blob/branch-3.4/src/c/
> src/zookeeper.c#L1675
> In your case, sessiontimeout=45secs, then approax 15secs is the ping
> interval
>
> (2) Client connected to a server and the connected server is unavailable.
> Missing ping response. Usually, this time period could be (ping interval +
> small grace period, for idle time calc).
> This allows time to reconnect to a different server.
>
> In your case, timeout 45 secs, client took 15 secs to get the disconnection
> event from server C.
> Then, it took 30 secs to hit connection error from server B. This is very
> suspicious and ideally client should receive an error within 1/3 timeout +
> small grace period.
> Any chance to check the status of the client machine during this 30 secs
> time period. Do you think Kernel message can help ?
>
> Below is, the session "0x35a926acae80000" related log from the shared
> client log.
> >>> zookeeper_client: 2017-03-03
> 09:56:14,770:23078(0x7f9467786700):ZOO_INFO@check_events@1775: session
> establishment complete on server [172.25.83.205:2181],
> sessionId=0x35a926acae80000, negotiated timeout=45000
> >>> zookeeper_client: 2017-03-03
> 09:56:30,770:23078(0x7f9467786700):ZOO_DEBUG@zookeeper_process@2218: Got
> ping response in 1 ms
> >>> zookeeper_client: 2017-03-03
> 09:56:42,213:23078(0x7f9467786700):ZOO_ERROR@handle_socket_error_msg@1746:
> Socket [172.25.83.205:2181] zk retcode=-4, errno=112(Host is down): failed
> while receiving aserver response
> >>> zookeeper_client: 2017-03-03
> 09:57:12,220:23078(0x7f9467786700):ZOO_ERROR@handle_socket_error_msg@1666:
> Socket [172.25.83.204:2181] zk retcode=-7, errno=110(Connection timed
> out):
> connection to 172.25.83.204:2181 timed out (exceeded timeout by 6ms)
> >>> zookeeper_client: 2017-03-03
> 09:57:12,221:23078(0x7f9467786700):ZOO_INFO@check_events@1728: initiated
> connection to server [172.25.83.201:2181]
> >>> zookeeper_client: 2017-03-03
> 09:57:12,223:23078(0x7f9467786700):ZOO_ERROR@handle_socket_error_msg@1764:
> Socket [172.25.83.201:2181] zk retcode=-112, errno=116(Stale file handle):
> sessionId=0x35a926acae80000 has expired.
>
> Rakesh
>
> On Mon, Mar 6, 2017 at 10:11 AM, Tharindu Kumara <
> zonik.hatkumara@gmail.com>
> wrote:
>
> > Hi Rakesh,
> >
> > Any updates on this?
> >
> > Is this a bug or an expected situation?
> >
> > Thanks
> >
> >
> >
> > On Fri, Mar 3, 2017 at 10:49 AM, Tharindu Kumara <
> > zonik.hatkumara@gmail.com>
> > wrote:
> >
> > > Hi Rakesh,
> > >
> > > I have made a small mistake in the above email.
> > > For client's session timeout I have used 20000ms not 45000ms.
> > > Sorry for the mistake.
> > > Basically it waits for 28 seconds to connect to Server A. (14 seconds
> to
> > > figure out disconnection and another 14 seconds to connect to Server B)
> > >
> > >
> > >
> > >
> > > Then as soon as it connected to Server A, Server A sends the session
> > > expiration.
> > > Then I again did the same experiment with setting client's session
> > > expiration time to 45000ms.
> > > Then it took 15secs to retrieve the disconnection event to client.
> > > After from that point, client took another 30secs to trying to connect
> to
> > > Server B.
> > > And then after that, client figures out that it cannot connect to
> Server
> > B
> > > and try to initiate a connection to Server A.
> > > Since 15 + 30 secs is 45, when as soon as client connected to Server A,
> > > Server A sends the session expiration event.
> > > What is the reason for that? It looks like client is trying too much
> time
> > > to connect to Server B.
> > >
> > >
> > > Here I attached the Server B's log (Leader) and client's log.
> > > Server A = 172.25.83.201
> > > Server B = 172.25.83.204
> > > Server C = 172.25.83.205
> > > And client's session id is 0x35a926acae80000
> > >
> > > https://drive.google.com/file/d/0B9o9GJ_CoG1_
> > > NWRTQ3hmeGpZRktrc040MkZzTk5LTDdpMi1v/view?usp=sharing
> > > https://drive.google.com/file/d/0B9o9GJ_CoG1_
> > > VHlKa092QzB6M3NPemRORk9Lc0w1a2N5ZDdz/view?usp=sharing
> > >
> > > And also in the above post, you mentioned that, you mentioned about a
> > > connection-timeout.
> > > Can you please explain a bit more?
> > >
> > > I see that when I try with both 45000ms and 20000ms, it takes around
> 14 -
> > > 15 seconds to figure out a disconnection from clients end.
> > > Looks like it is a constant and client has (session timeout -
> connection
> > > loss notified time) to find a new Server. Am I correct here?
> > > You mentioned that client connection timeout is equal to
> > > ("sessiontimeout/listed servers count").
> > > Please explain a bit about the client ping times and connection
> timeout.
> > > How these to co-relate?
> > >
> > > On Thu, Mar 2, 2017 at 8:18 PM, Rakesh Radhakrishnan <
> rakeshr@apache.org
> > >
> > > wrote:
> > >
> > >> >>> You mentioned that a client sends a ping every 1/3 the session
> > >> timeout.
> > >> Yes, you are correct. Again to analyse your issue, we have to consider
> > >> re-connection timeout also, which is "sessiontimeout/listed servers
> > count"
> > >> https://github.com/apache/zookeeper/blob/branch-3.4/src/java
> > >> /main/org/apache/zookeeper/ClientCnxn.java#L1292
> > >> https://github.com/apache/zookeeper/blob/branch-3.4/src/java
> > >> /main/org/apache/zookeeper/ClientCnxn.java#L1098
> > >>
> > >> Coincidentally, in your example both heartbeat interval and
> > re-connection
> > >> interval are same as you have three servers.
> > >>
> > >> >>>>> It looks like C3 has taken 14 seconds to determine
the
> > disconnected
> > >> event
> > >> >>>>> and another 14 seconds to that it cannot connect to
Server B(C3
> is
> > >> isolated
> > >> >>>>> from B).
> > >>
> > >> With this info, total elapsed time is 28 secs which is less than 45
> secs
> > >> session timeout. Now, the client has 17 secs (45 secs - 28 secs) time
> > >> period to re-establish a connection with server A, right? Could you
> > please
> > >> check whether the client is connecting to A during this period?
> > >>
> > >> Rakesh
> > >>
> > >>
> > >>
> > >> On Thu, Mar 2, 2017 at 6:58 PM, Tharindu Kumara <
> > >> zonik.hatkumara@gmail.com>
> > >> wrote:
> > >>
> > >> > ​​
> > >> > Hi Rakesh,
> > >> >
> > >> > First of all thank you for the quick reply.
> > >> >
> > >> > >>>>> Actually, ZooKeeper client has retry mechanism.
> > >> > >>>>> Client sends a ping every 1/3 the session timeout
(here, 3 is
> > the
> > >> no.
> > >> > of listed servers, A, B, C) and then looks for a response before
> > another
> > >> > 1/3 elapses. This allows time to reconnect to a different server
> (and
> > >> still
> > >> > maintain the session) if the connected server becomes unavailable.
> > >> >
> > >> > You mentioned that a client sends a ping every 1/3 the session
> > timeout.
> > >> And
> > >> > 3 is the no of listed servers.
> > >> >
> > >> > I doubt that. Because, I am using the C Binding and after inspecting
> > the
> > >> > code it looks like that 3 is a hard coded value.
> > >> > Simply no matter what the number of clients, zk client biding is
> > always
> > >> > sending a ping every 1/3 session timeout.
> > >> >
> > >> > Can please clarify that for me?
> > >> >
> > >> > Here I used a tick of 3000ms and session expiration timeout of
> > 45000ms.
> > >> >
> > >> > And please find the screenshot of extacted client log outout.
> > >> >
> > >> > https://anonimag.es/image/JT9htnL
> > >> >
> > >> > It looks like C3 has taken 14 seconds to determine the disconnected
> > >> event
> > >> > and another 14 seconds to that it cannot connect to Server B(C3 is
> > >> isolated
> > >> > from B).
> > >> >
> > >> >
> > >> >
> > >> > On Thu, Mar 2, 2017 at 4:08 PM, Rakesh Radhakrishnan <
> > >> rakeshr@apache.org>
> > >> > wrote:
> > >> >
> > >> > > >>>> According to my understanding, it looks like,
when a client
> > >> trying
> > >> > to
> > >> > > >>>> connect to a server that it cannot connect due
to a network
> > >> > > partitioning,
> > >> > > >>>> it uses a blocking call and it waits too much
time trying to
> > >> > > >>>> connect to a server that it cannot communicate.
> > >> > >
> > >> > > Actually, ZooKeeper client has retry mechanism.
> > >> > > Client sends a ping every 1/3 the session timeout (here, 3 is
the
> > no.
> > >> of
> > >> > > listed servers, A, B, C)
> > >> > > and then looks for a response before another 1/3 elapses. This
> > allows
> > >> > time
> > >> > > to reconnect to a
> > >> > > different server (and still maintain the session) if the connected
> > >> server
> > >> > > becomes unavailable.
> > >> > >
> > >> > > Could you grep the following log message in your client log and
> tell
> > >> me
> > >> > how
> > >> > > much time C3 taken for the re-connection attempts.
> > >> > > "Client session timed out, have not heard from server in "
> > >> > >
> > >> > > C3 might have first attempted to reconnect to B and then A. Also,
> > >> need to
> > >> > > check how much time C3 taken to detect connection failure from
> > server
> > >> C.
> > >> > >
> > >> > > Could you please share the zk client log to dig more.
> > >> > >
> > >> > > Rakesh
> > >> > >
> > >> > >
> > >> > > On Thu, Mar 2, 2017 at 11:04 AM, Tharindu Kumara <
> > >> > > zonik.hatkumara@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > >  > ​
> > >> > > > 1) Could you tell me the status of Server C, is this lost
> > >> connection to
> > >> > > the
> > >> > > >  >     quorum and fails to join quorum continuously as
B is the
> > >> Leader
> > >> > ?
> > >> > > >
> > >> > > > Yes, B the leader. C Server is completely isolated from
the
> > >> Leader(B)
> > >> > > > and It cannot communicate with the Leader. C cannot continuously
> > >> > connect
> > >> > > to
> > >> > > > the
> > >> > > >
> > >> > > > Leader.
> > >> > > >
> > >> > > >
> > >> > > >  > 2) C3 is connected C. Please tell me the connection
host
> string
> > >> > passed
> > >> > > > to
> > >> > > >  >     this client. Does it contains all three servers
info
> > >> > > "A:clientport,
> > >> > > >  >    B:clientport, C:clientport" ?
> > >> > > >
> > >> > > > Yes, C3's connection string contains all three servers.
> > >> ("A:clientport,
> > >> > > > B:clientport, C:clientport")
> > >> > > >
> > >> > > >
> > >> > > >  > 3) Please check all three servers and client C3 logs
to see
> any
> > >> > > >  >    inconsistencies or exceptions.
> > >> > > >
> > >> > > > After looking at logs, it seems when the server C isolated
from
> > the
> > >> > > Leader,
> > >> > > >
> > >> > > > a disconnect event fires to client C3. Then it (C3) tries
too
> much
> > >> time
> > >> > > to
> > >> > > > connect to Server B(Leader) .
> > >> > > >
> > >> > > > But it cannot connect to server B, as we blocked the connection
> > >> between
> > >> > > > Server C and
> > >> > > >
> > >> > > > Server B. Basically, C3 tries more than half of the session
> > timeout
> > >> > time
> > >> > > to
> > >> > > > connect to Server B.
> > >> > > >
> > >> > > > Then after figuring out that C3 cannot to connect to Server
B,
> it
> > >> tries
> > >> > > to
> > >> > > > connect
> > >> > > >
> > >> > > > to Server A, and it connects to Server A successfully. But
this
> is
> > >> too
> > >> > > > late, because
> > >> > > >
> > >> > > > session is already expired at the time C3 connected.
> > >> > > >
> > >> > > > And this happens sometimes only. Because when we specify
all the
> > >> > servers
> > >> > > in
> > >> > > > the client's
> > >> > > >
> > >> > > > connect string, sometimes after C3 disconnecting from Server
C,
> > >> instead
> > >> > > of
> > >> > > > trying to connect to
> > >> > > >
> > >> > > > Server B it connects to Server A as the first attempt. In
this
> > case
> > >> the
> > >> > > > client C3 connects to the
> > >> > > >
> > >> > > > quorum successfully before the session expiration.
> > >> > > >
> > >> > > > According to my understanding, it looks like, when a client
> trying
> > >> to
> > >> > > > connect to a server that it cannot
> > >> > > >
> > >> > > > connect due to a network partitioning, it uses a blocking
call
> and
> > >> it
> > >> > > waits
> > >> > > > too much time trying to
> > >> > > >
> > >> > > > connect to a server that it cannot communicate.
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >  > 4) ZooKeeper version used in your testing ?
> > >> > > >
> > >> > > > I used zookeeper 3.4.9 (current stable release)
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > On Thu, Mar 2, 2017 at 7:48 AM, Rakesh Radhakrishnan <
> > >> > rakeshr@apache.org
> > >> > > >
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hi,
> > >> > > > >
> > >> > > > > Could you please give few more details,
> > >> > > > >
> > >> > > > > ​​
> > >> > > > > 1) Could you tell me the status of Server C, is this
lost
> > >> connection
> > >> > to
> > >> > > > the
> > >> > > > > quorum and fails to join quorum continuously as B is
the
> Leader
> > ?
> > >> > > > >
> > >> > > > > 2) C3 is connected C. Please tell me the connection
host
> string
> > >> > passed
> > >> > > to
> > >> > > > > this client. Does it contains all three servers info
> > >> "A:clientport,
> > >> > > > > B:clientport, C:clientport" ?
> > >> > > > >
> > >> > > > > 3) Please check all three servers and client C3 logs
to see
> any
> > >> > > > > inconsistencies or exceptions.
> > >> > > > >
> > >> > > > > 4) ZooKeeper version used in your testing ?
> > >> > > > >
> > >> > > > >
> > >> > > > > Rakesh
> > >> > > > >
> > >> > > > > On Wed, Mar 1, 2017 at 4:55 PM, Tharindu Kumara <
> > >> > > > zonik.hatkumara@gmail.com
> > >> > > > > >
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > ​Recently, carried out a test to to find the
behavior of
> > clients
> > >> > > when a
> > >> > > > > > client is partitioned from the ensemble.
> > >> > > > > >
> > >> > > > > > Here I used a ensemble of 3 zookeeper servers
called A, B
> and
> > C.
> > >> > And
> > >> > > > > quorum
> > >> > > > > > was set up like below.
> > >> > > > > >
> > >> > > > > > A - Follower
> > >> > > > > > B - Leader
> > >> > > > > > C - Follower​
> > >> > > > > >
> > >> > > > > > A  <---> B <---> C
> > >> > > > > >    \____________/
> > >> > > > > >
> > >> > > > > > And 3 clients are connected to ensemble like below.
> > >> > > > > >
> > >> > > > > > C1 is connected A
> > >> > > > > > C2 is connected B
> > >> > > > > > C3 is connected C.
> > >> > > > > >
> > >> > > > > > I used iptables to remove the network link between
B and C.
> > >> > > > > >
> > >> > > > > > command used: iptables -I INPUT -s 123.123.45.123
-j DROP
> > >> > > > > >
> > >> > > > > > After removing the link connections looks like
below.
> > >> > > > > >
> > >> > > > > > A  <----> B         C
> > >> > > > > >    \____________/
> > >> > > > > >
> > >> > > > > > Simply there is no way to communicate from B to
C and vice
> > >> versa.
> > >> > > > > >
> > >> > > > > > Here What I noticed is that the client connected
to
> Zookeeper
> > >> > Server
> > >> > > > "C",
> > >> > > > > > could not connect to the ensemble resulting a
session
> > expiration
> > >> > > > timeout.
> > >> > > > > >
> > >> > > > > > For this experiment I used tickTime of 3000ms
and client
> > session
> > >> > > > > expiration
> > >> > > > > > timeout of 45000ms. And tested with different
combinations
> > also.
> > >> > > > > >
> > >> > > > > > Can someone please explain what is the root cause
for this
> > >> > behavior?
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message