Return-Path: Delivered-To: apmail-zookeeper-user-archive@www.apache.org Received: (qmail 53198 invoked from network); 12 Feb 2011 00:32:42 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 12 Feb 2011 00:32:42 -0000 Received: (qmail 56886 invoked by uid 500); 12 Feb 2011 00:32:42 -0000 Delivered-To: apmail-zookeeper-user-archive@zookeeper.apache.org Received: (qmail 56854 invoked by uid 500); 12 Feb 2011 00:32:41 -0000 Mailing-List: contact user-help@zookeeper.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@zookeeper.apache.org Delivered-To: mailing list user@zookeeper.apache.org Received: (qmail 56846 invoked by uid 99); 12 Feb 2011 00:32:41 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 12 Feb 2011 00:32:41 +0000 X-ASF-Spam-Status: No, hits=-5.0 required=5.0 tests=RCVD_IN_DNSWL_HI,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of Camille.Fournier@gs.com designates 207.17.33.101 as permitted sender) Received: from [207.17.33.101] (HELO mxebdp02ex-public.idz.gs.com) (207.17.33.101) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 12 Feb 2011 00:32:35 +0000 X-IronPort-AV: E=Sophos;i="4.60,459,1291611600"; d="scan'208";a="23674592" Received: from unknown (HELO mxpbd01-public.ny.fw.gs.com) ([148.86.115.129]) by mxebdp02ex.idz.gs.com with ESMTP; 11 Feb 2011 19:32:14 -0500 From: "Fournier, Camille F. [Tech]" X-sendergroup: RELAYLIST Received: from gshccdp08ex.firmwide.corp.gs.com ([139.172.184.154]) by mxpbd01.ny.fw.gs.com with ESMTP; 11 Feb 2011 19:32:14 -0500 Received: from GSCMAMP02EX.firmwide.corp.gs.com ([139.172.184.156]) by gshccdp08ex.firmwide.corp.gs.com ([139.172.184.154]) with mapi; Fri, 11 Feb 2011 19:32:14 -0500 To: "'user@zookeeper.apache.org'" Date: Fri, 11 Feb 2011 19:32:13 -0500 Subject: RE: tickTime and sessionTimeout Thread-Topic: tickTime and sessionTimeout Thread-Index: AcvKQjOMSDxSdwxWSQOQDAe23GciAAAB/kUQ Message-ID: <69D3016305F9084FBD2C4A0DF189BD5C16B7F38506@GSCMAMP02EX.firmwide.corp.gs.com> References: <69D3016305F9084FBD2C4A0DF189BD5C16B7F384FD@GSCMAMP02EX.firmwide.corp.gs.com> <69D3016305F9084FBD2C4A0DF189BD5C16B7F384FE@GSCMAMP02EX.firmwide.corp.gs.com> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US x-retentionstamp: Firmwide Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 I think the gist of the problem is that while the ZK Server is expiring SES= SION_0, SESSION_1 is coming in and trying to send a ping. It is blocked a b= it behind the expiring of SESSION_0 due to the synchronous nature of touchi= ng the session table, and then its request for a ping will be processed beh= ind the session expiration processing for SESSION_0. So either the expirer = takes long enough dealing with SESSION_0 that it immediately wants to expir= e SESSION_1 on next processing, or, the processing of the session expiratio= n takes just long enough for SESSION_1 to not get a response to its heartbe= at back from the server, which then causes it to disconnect and reconnect, = and in the interim the server determines the session dead due to timeout. Long story short, those timeouts are too short for the server to reliably e= xecute responses in time to guarantee they won't be incorrectly activated. C -----Original Message----- From: Nick Patania [mailto:Nicholas.Patania@morganstanley.com]=20 Sent: Friday, February 11, 2011 6:19 PM To: user@zookeeper.apache.org Subject: Re: tickTime and sessionTimeout 3.3.2-1031432 On Fri, Feb 11, 2011 at 6:15 PM, Fournier, Camille F. [Tech] wrote: > Which version of ZK? > > -----Original Message----- > From: Nick Patania [mailto:Nicholas.Patania@morganstanley.com] > Sent: Friday, February 11, 2011 6:10 PM > To: user@zookeeper.apache.org > Subject: Re: tickTime and sessionTimeout > > A single server. =A0I've intentionally made it trivial to demonstrate > the behavior. > If I increase the timeout, the issue goes away. > > On Fri, Feb 11, 2011 at 6:00 PM, Fournier, Camille F. [Tech] > wrote: >> What is your ZooKeeper setup here? And do you continue to see this issue= if you increase your session timeout? >> >> C >> >> -----Original Message----- >> From: Nick Patania [mailto:Nicholas.Patania@morganstanley.com] >> Sent: Friday, February 11, 2011 5:24 PM >> To: user@zookeeper.apache.org >> Subject: Re: tickTime and sessionTimeout >> >> Patrick, thanks for your input. >> >> I have rerun the test several times now while logging GC on the server >> and running ping from CLIENT_1: >> >> =A0- No GC happens on the server during the period of interest (a >> couple of young generation runs happen before I kill HOST_0, and they >> complete in under 3ms). >> =A0- Round trip times for ping from CLIENT_1 are consistently under >> 250us throughout. >> >> Regarding client GC -- I can consistently reproduce this using a C >> client. =A0Regarding the theory of swapping on CLIENT_1 -- if that were >> the cause, the problem wouldn't be 100% reproducible. =A0I also looked >> through the client log for SESSION_1 -- I see "Got ping response ... >> after 1ms" repeatedly, followed by "Client session timed out, have not >> heard from server in 666ms"... >> >> On Fri, Feb 11, 2011 at 1:46 PM, Patrick Hunt wrote: >>> Those are pretty short timeouts, many sources of delay could be >>> causing this. Network jitter/latency, GC/swap (server or client), IO >>> write latency, etc... See if any of this might be your issue: >>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/Troubleshooting >>> >>> I can clearly see this sequence in your log for session1: >>> >>> --- >>> 2011-02-11 14:18:52,672 - =A0sessionid:SESSION_1 type:ping >>> >>> 2011-02-11 14:18:54,502 - Expiring session SESSION_1, timeout of 1000ms= exceeded >>> >>> 2011-02-11 14:18:55,011 - Processing request:: sessionid:SESSION_1 type= :ping >>> --- >>> >>> from the looks of it session 1 doesn't send a ping to the server for >>> ~2.5 seconds, as a result it's expired. >>> >>> You should also look at your session 1 client log and see what it's >>> view of the world is like. (is it gc/swapping?). Use the "stat" 4 >>> letter word to see the server's request processing latency, see if >>> that's high (higer than the timeout is bad news). Again, checkout the >>> troubleshooting guide. >>> >>> Patrick >>> >>> ps please use pastebin or attachment, otw the formatting of wrecked >>> and it's harder to read the log >>> >>> On Fri, Feb 11, 2011 at 6:41 AM, Nick Patania >>> wrote: >>>> This is the portion that seems relevant. =A0For readability, I replace= d >>>> the host and session for the host that I kill with HOST_0 and >>>> SESSION_0 (I expect these to timeout). =A0The client that should be >>>> healthy is HOST_1 and SESSION_1. >>>> >>>> >>>> 2011-02-11 14:18:51,901 - DEBUG >>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request:: >>>> sessionid:SESSION_0 type:ping cxid:0xfffffffffffffffe >>>> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a >>>> 2011-02-11 14:18:51,901 - DEBUG >>>> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_0 >>>> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe >>>> txntype:unknown reqpath:n/a >>>> 2011-02-11 14:18:52,005 - DEBUG >>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request:: >>>> sessionid:SESSION_1 type:ping cxid:0xfffffffffffffffe >>>> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a >>>> 2011-02-11 14:18:52,005 - DEBUG >>>> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_1 >>>> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe >>>> txntype:unknown reqpath:n/a >>>> 2011-02-11 14:18:52,339 - DEBUG >>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request:: >>>> sessionid:SESSION_1 type:ping cxid:0xfffffffffffffffe >>>> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a >>>> 2011-02-11 14:18:52,339 - DEBUG >>>> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_1 >>>> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe >>>> txntype:unknown reqpath:n/a >>>> 2011-02-11 14:18:52,672 - DEBUG >>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request:: >>>> sessionid:SESSION_1 type:ping cxid:0xfffffffffffffffe >>>> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a >>>> 2011-02-11 14:18:52,672 - DEBUG >>>> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_1 >>>> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe >>>> txntype:unknown reqpath:n/a >>>> 2011-02-11 14:18:53,002 - INFO =A0[SessionTracker:ZooKeeperServer@314]= - >>>> Expiring session SESSION_0, timeout of 1000ms exceeded >>>> 2011-02-11 14:18:53,002 - INFO >>>> [ProcessThread:-1:PrepRequestProcessor@387] - Processed session >>>> termination for sessionid: SESSION_0 >>>> 2011-02-11 14:18:53,010 - DEBUG >>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request:: >>>> sessionid:SESSION_0 type:closeSession cxid:0x0 zxid:0x103 txntype:-11 >>>> reqpath:n/a >>>> 2011-02-11 14:18:53,010 - INFO =A0[SyncThread:0:NIOServerCnxn@1435] - >>>> Closed socket connection for client /HOST_0:34618 which had sessionid >>>> SESSION_0 >>>> 2011-02-11 14:18:54,502 - INFO =A0[SessionTracker:ZooKeeperServer@314]= - >>>> Expiring session SESSION_1, timeout of 1000ms exceeded >>>> 2011-02-11 14:18:54,502 - INFO >>>> [ProcessThread:-1:PrepRequestProcessor@387] - Processed session >>>> termination for sessionid: SESSION_1 >>>> 2011-02-11 14:18:55,011 - DEBUG >>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request:: >>>> sessionid:SESSION_1 type:ping cxid:0xfffffffffffffffe >>>> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a >>>> 2011-02-11 14:18:55,011 - DEBUG >>>> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_1 >>>> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe >>>> txntype:unknown reqpath:n/a >>>> 2011-02-11 14:18:55,011 - INFO >>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn$Factory@251] >>>> - Accepted socket connection from /HOST_1:40556 >>>> 2011-02-11 14:18:55,019 - DEBUG >>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request:: >>>> sessionid:SESSION_1 type:closeSession cxid:0x0 zxid:0x104 txntype:-11 >>>> reqpath:n/a >>>> 2011-02-11 14:18:55,019 - DEBUG >>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:ZooKeeperServer@590] - >>>> Dropping request: No session with sessionid SESSION_1 exists, probably >>>> expired and removed >>>> 2011-02-11 14:18:55,019 - INFO =A0[SyncThread:0:NIOServerCnxn@1435] - >>>> Closed socket connection for client /HOST_1:40555 which had sessionid >>>> SESSION_1 >>>> 2011-02-11 14:18:55,020 - DEBUG [SyncThread:0:NIOServerCnxn@1451] - >>>> ignoring exception during output shutdown >>>> java.net.SocketException: Transport endpoint is not connected >>>> =A0 =A0 =A0 =A0at sun.nio.ch.SocketChannelImpl.shutdown(Native Method) >>>> =A0 =A0 =A0 =A0at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketCh= annelImpl.java:651) >>>> =A0 =A0 =A0 =A0at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdapto= r.java:368) >>>> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.NIOServerCnxn.closeSock(= NIOServerCnxn.java:1447) >>>> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.NIOServerCnxn.close(NIOS= erverCnxn.java:1412) >>>> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.NIOServerCnxn$Factory.cl= oseSessionWithoutWakeup(NIOServerCnxn.java:343) >>>> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.NIOServerCnxn$Factory.cl= oseSession(NIOServerCnxn.java:330) >>>> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.FinalRequestProcessor.pr= ocessRequest(FinalRequestProcessor.java:133) >>>> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.SyncRequestProcessor.flu= sh(SyncRequestProcessor.java:161) >>>> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.SyncRequestProcessor.run= (SyncRequestProcessor.java:98) >>>> 2011-02-11 14:18:55,021 - DEBUG [SyncThread:0:NIOServerCnxn@1459] - >>>> ignoring exception during input shutdown >>>> java.net.SocketException: Transport endpoint is not connected >>>> =A0 =A0 =A0 =A0at sun.nio.ch.SocketChannelImpl.shutdown(Native Method) >>>> =A0 =A0 =A0 =A0at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketCha= nnelImpl.java:640) >>>> =A0 =A0 =A0 =A0at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor= .java:360) >>>> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.NIOServerCnxn.closeSock(= NIOServerCnxn.java:1455) >>>> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.NIOServerCnxn.close(NIOS= erverCnxn.java:1412) >>>> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.NIOServerCnxn$Factory.cl= oseSessionWithoutWakeup(NIOServerCnxn.java:343) >>>> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.NIOServerCnxn$Factory.cl= oseSession(NIOServerCnxn.java:330) >>>> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.FinalRequestProcessor.pr= ocessRequest(FinalRequestProcessor.java:133) >>>> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.SyncRequestProcessor.flu= sh(SyncRequestProcessor.java:161) >>>> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.SyncRequestProcessor.run= (SyncRequestProcessor.java:98) >>>> 2011-02-11 14:18:55,022 - WARN >>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn$Factory@272] >>>> - Ignoring unexpected runtime exception >>>> java.nio.channels.CancelledKeyException >>>> =A0 =A0 =A0 =A0at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKey= Impl.java:55) >>>> =A0 =A0 =A0 =A0at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImp= l.java:69) >>>> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.NIOServerCnxn$Factory.ru= n(NIOServerCnxn.java:241) >>>> 2011-02-11 14:18:55,023 - DEBUG >>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@735] - >>>> Session establishment request from client /HOST_1:40556 client's >>>> lastZxid is 0x0 >>>> 2011-02-11 14:18:55,023 - INFO >>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@770] - Client >>>> attempting to renew session SESSION_1 at /HOST_1:40556 >>>> 2011-02-11 14:18:55,024 - INFO >>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@1573] - >>>> Invalid session SESSION_1 for client /HOST_1:40556, probably expired >>>> 2011-02-11 14:18:55,025 - WARN >>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@634] - >>>> EndOfStreamException: Unable to read additional data from client >>>> sessionid SESSION_1, likely client has closed socket >>>> 2011-02-11 14:18:55,025 - INFO >>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@1435] - >>>> Closed socket connection for client /HOST_1:40556 which had sessionid >>>> SESSION_1 >>>> >>>> >>>> On Thu, Feb 10, 2011 at 8:11 PM, Benjamin Reed w= rote: >>>>> do you see anything in the server log? >>>>> >>>>> ben >>>>> >>>>> On 02/10/2011 03:16 PM, Patania, Nick wrote: >>>>>> >>>>>> Hello, >>>>>> >>>>>> I run the following test: >>>>>> >>>>>> >>>>>> * =A0 =A0 =A0 =A0 Two clients connect to a zookeeper server; the tic= kTime on the >>>>>> server is 500, and the sessionTimeout on the client is 1000. >>>>>> >>>>>> * =A0 =A0 =A0 =A0 Kill the host running one of the clients. >>>>>> >>>>>> * =A0 =A0 =A0 =A0 The second client receives a session timeout. >>>>>> >>>>>> Is there any reason why this might happen? >>>>>> >>>>>> Thanks >>>>>> Nick Patania >>>>>> >>>>>> >>>>>> --------------------------------------------------------------------= ------ >>>>>> NOTICE: Morgan Stanley is not acting as a municipal advisor and the >>>>>> opinions or views contained herein are not intended to be, and do no= t >>>>>> constitute, advice within the meaning of Section 975 of the Dodd-Fra= nk Wall >>>>>> Street Reform and Consumer Protection Act. If you have received this >>>>>> communication in error, please destroy all electronic and paper copi= es and >>>>>> notify the sender immediately. Mistransmission is not intended to wa= ive >>>>>> confidentiality or privilege. Morgan Stanley reserves the right, to = the >>>>>> extent permitted under applicable law, to monitor electronic communi= cations. >>>>>> This message is subject to terms available at the following link: >>>>>> http://www.morganstanley.com/disclaimers. If you cannot access these= links, >>>>>> please notify us by reply message and we will send the contents to y= ou. By >>>>>> messaging with Morgan Stanley you consent to the foregoing. >>>>> >>>>> >>>> >>> >> >