zookeeper-dev mailing list archives

From "Chang Song (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-1049) Session expire/close flooding renders heartbeats to delay significantly
Date Tue, 19 Apr 2011 22:18:05 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021827#comment-13021827 ]

Chang Song commented on ZOOKEEPER-1049:


The 2-second linger timeout works on the Apache web server because it uses a connection-per-process/thread
model: blocking in one process does not stall the processing of other requests.

Looking at the code below, I don't see why it wouldn't block here.
One 2-second delay in close() pushes the next close out to 4 seconds, the one after that to 6 or 8 seconds, and so forth.

synchronized void closeSession(long sessionId) {
    selector.wakeup();
    closeSessionWithoutWakeup(sessionId);
}

private void closeSessionWithoutWakeup(long sessionId) {
    HashSet<NIOServerCnxn> cnxns;
    synchronized (this.cnxns) {
        cnxns = (HashSet<NIOServerCnxn>) this.cnxns.clone();
    }

    for (NIOServerCnxn cnxn : cnxns) {
        if (cnxn.sessionId == sessionId) {
            try {
                // close() runs on this same thread, so one lingering
                // socket close stalls every close that follows it.
                cnxn.close();
            } catch (Exception e) {
                LOG.warn("exception during session close", e);
            }
        }
    }
}
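To illustrate the serialization problem, here is a minimal, self-contained sketch (not ZooKeeper code; `SlowCnxn` and `CloserSketch` are made-up names) of handing close() calls to a dedicated single-threaded executor, so a blocking close does not hold up the thread that dispatched it:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a connection whose close() can block
// while the TCP stack lingers on unsent data.
class SlowCnxn {
    void close() {
        try {
            Thread.sleep(100); // simulate a lingering close
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

public class CloserSketch {
    public static void main(String[] args) throws Exception {
        List<SlowCnxn> cnxns = List.of(new SlowCnxn(), new SlowCnxn(), new SlowCnxn());

        // A single-threaded closer preserves close ordering but takes the
        // blocking cost off the dispatching (selector) thread.
        ExecutorService closer = Executors.newSingleThreadExecutor();

        long start = System.nanoTime();
        for (SlowCnxn c : cnxns) {
            closer.submit(c::close); // dispatching thread returns immediately
        }
        long dispatchMillis = (System.nanoTime() - start) / 1_000_000;

        closer.shutdown();
        closer.awaitTermination(5, TimeUnit.SECONDS);

        // The ~300 ms of blocked closes happened on the closer thread;
        // dispatching them should have cost far less than one close.
        if (dispatchMillis >= 100) {
            throw new AssertionError("dispatch blocked: " + dispatchMillis + " ms");
        }
        System.out.println("dispatched in " + dispatchMillis + " ms");
    }
}
```

This is only a sketch of the general idea (decoupling close latency from the dispatching thread), not a proposal for where such a change would go in the server.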

One correction: my team didn't set linger to 0.
They disabled the linger option, so close() returns immediately (any lingering is done by the TCP stack in the background).

This is different from setting linger to 0, which discards buffered data and sends a TCP RST.
With the linger option disabled, the system holds on to the socket in FIN_WAIT_1 for a dead
peer; note that we have no timeout for FIN_WAIT_1 sockets.

We'll have to experiment with setting the linger option to 0, which moves the socket from
ESTABLISHED to CLOSED immediately.
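For reference, the three configurations discussed above map onto java.net.Socket like this (a minimal sketch; the sockets are never connected, it only demonstrates the option settings and their documented close() semantics):

```java
import java.net.Socket;

public class LingerOptions {
    public static void main(String[] args) throws Exception {
        // 1. Linger disabled (the default): close() returns immediately and
        //    the TCP stack keeps trying to deliver unsent data in the
        //    background; with a dead peer the socket can sit in FIN_WAIT_1.
        Socket disabled = new Socket();
        disabled.setSoLinger(false, 0);
        if (disabled.getSoLinger() != -1) { // -1 means linger is off
            throw new AssertionError("expected linger disabled");
        }

        // 2. Linger with timeout 0: close() discards unsent data and sends
        //    a RST, moving the connection straight to CLOSED.
        Socket hardClose = new Socket();
        hardClose.setSoLinger(true, 0);
        if (hardClose.getSoLinger() != 0) {
            throw new AssertionError("expected linger 0");
        }

        // 3. Linger with a 2-second timeout (the Apache httpd style):
        //    close() blocks for up to 2 seconds while data drains.
        Socket gracefulClose = new Socket();
        gracefulClose.setSoLinger(true, 2);
        if (gracefulClose.getSoLinger() != 2) {
            throw new AssertionError("expected linger 2");
        }

        disabled.close();
        hardClose.close();
        gracefulClose.close();
        System.out.println("linger options set as expected");
    }
}
```

Option 3 is what makes the close() loop quoted earlier a problem: the blocking happens on whichever thread calls close().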

> Session expire/close flooding renders heartbeats to delay significantly
> -----------------------------------------------------------------------
>                 Key: ZOOKEEPER-1049
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1049
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2
>         Environment: CentOS 5.3, three node ZK ensemble
>            Reporter: Chang Song
>            Priority: Critical
>         Attachments: ZookeeperPingTest.zip, zk_ping_latency.pdf
> Let's say we have 100 clients (group A) already connected to a three-node ZK ensemble with
a session timeout of 15 seconds, and 1000 clients (group B) already connected to the same ZK
ensemble, all watching several nodes (also with a 15-second session timeout).
> Consider a case in which all clients in group B suddenly hang or deadlock (JVM OOME) at
the same time. 15 seconds later, all sessions in group B expire, creating a session-closing
stampede. Depending on the number of clients in group B, every request/response the
ZK ensemble processes is delayed by up to 8 seconds (with the 1000 clients we have tested).
> This delay causes some clients in group A to have their sessions expired because heartbeat
responses arrive late, which makes healthy servers drop out of their clusters. This is a serious
problem in our installation, since some of our services running batch servers or CI servers
recreate the same scenario almost every day.
> I am attaching a graph showing ping response time delay.
> I think the ordering of session creation/closing relative to ping exchanges isn't important
(to the quorum state machine); at the least, ping requests/responses should be handled
independently (on a different queue and a different thread) to keep pings real-time.
> As a workaround, we are raising the session timeout to 50 seconds.
> But this significantly increases the maximum failover time of the cluster, so the QoS
we initially promised cannot be met.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
