Return-Path: Delivered-To: apmail-zookeeper-user-archive@www.apache.org Received: (qmail 6369 invoked from network); 11 Feb 2011 23:00:53 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 11 Feb 2011 23:00:53 -0000 Received: (qmail 74556 invoked by uid 500); 11 Feb 2011 23:00:53 -0000 Delivered-To: apmail-zookeeper-user-archive@zookeeper.apache.org Received: (qmail 74481 invoked by uid 500); 11 Feb 2011 23:00:52 -0000 Mailing-List: contact user-help@zookeeper.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@zookeeper.apache.org Delivered-To: mailing list user@zookeeper.apache.org Received: (qmail 74453 invoked by uid 99); 11 Feb 2011 23:00:52 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 11 Feb 2011 23:00:52 +0000 X-ASF-Spam-Status: No, hits=-5.0 required=5.0 tests=RCVD_IN_DNSWL_HI,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of Camille.Fournier@gs.com designates 204.4.187.100 as permitted sender) Received: from [204.4.187.100] (HELO mxecd05.gs.com) (204.4.187.100) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 11 Feb 2011 23:00:45 +0000 X-IronPort-AV: E=Sophos;i="4.60,458,1291611600"; d="scan'208";a="272952092" Received: from unknown (HELO mxpcd02-public.ny.fw.gs.com) ([148.86.97.79]) by mxecd05.idz.gs.com with ESMTP; 11 Feb 2011 18:00:23 -0500 From: "Fournier, Camille F. [Tech]" X-sendergroup: RELAYLIST Received: from gshccdp07ex.firmwide.corp.gs.com ([139.172.184.153]) by cd02-mxp-vip-prod.ny.fw.gs.com with ESMTP; 11 Feb 2011 18:00:23 -0500 Received: from GSCMAMP02EX.firmwide.corp.gs.com ([139.172.184.156]) by gshccdp07ex.firmwide.corp.gs.com ([139.172.184.153]) with mapi; Fri, 11 Feb 2011 18:00:23 -0500 To: "'user@zookeeper.apache.org'" Date: Fri, 11 Feb 2011 18:00:21 -0500 Subject: RE: tickTime and sessionTimeout Thread-Topic: tickTime and sessionTimeout Thread-Index: AcvKOpPuuhI8hAR9RkSgIV1TLfzJiQAAG02g Message-ID: <69D3016305F9084FBD2C4A0DF189BD5C16B7F384FD@GSCMAMP02EX.firmwide.corp.gs.com> References: <6642FC1CAF133548AA8FDF497C547F0A23C0C5265B@NYWEXMBX2126.msad.ms.com> <4D548CC9.6090102@yahoo-inc.com> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US x-retentionstamp: Firmwide Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 What is your ZooKeeper setup here? And do you continue to see this issue if= you increase your session timeout? C -----Original Message----- From: Nick Patania [mailto:Nicholas.Patania@morganstanley.com]=20 Sent: Friday, February 11, 2011 5:24 PM To: user@zookeeper.apache.org Subject: Re: tickTime and sessionTimeout Patrick, thanks for your input. I have rerun the test several times now while logging GC on the server and running ping from CLIENT_1: - No GC happens on the server during the period of interest (a couple of young generation runs happen before I kill HOST_0, and they complete in under 3ms). - Round trip times for ping from CLIENT_1 are consistently under 250us throughout. Regarding client GC -- I can consistently reproduce this using a C client. Regarding the theory of swapping on CLIENT_1 -- if that were the cause, the problem wouldn't be 100% reproducible. I also looked through the client log for SESSION_1 -- I see "Got ping response ... after 1ms" repeatedly, followed by "Client session timed out, have not heard from server in 666ms"... On Fri, Feb 11, 2011 at 1:46 PM, Patrick Hunt wrote: > Those are pretty short timeouts, many sources of delay could be > causing this. Network jitter/latency, GC/swap (server or client), IO > write latency, etc... See if any of this might be your issue: > https://cwiki.apache.org/confluence/display/ZOOKEEPER/Troubleshooting > > I can clearly see this sequence in your log for session1: > > --- > 2011-02-11 14:18:52,672 - =A0sessionid:SESSION_1 type:ping > > 2011-02-11 14:18:54,502 - Expiring session SESSION_1, timeout of 1000ms e= xceeded > > 2011-02-11 14:18:55,011 - Processing request:: sessionid:SESSION_1 type:p= ing > --- > > from the looks of it session 1 doesn't send a ping to the server for > ~2.5 seconds, as a result it's expired. > > You should also look at your session 1 client log and see what it's > view of the world is like. (is it gc/swapping?). Use the "stat" 4 > letter word to see the server's request processing latency, see if > that's high (higer than the timeout is bad news). Again, checkout the > troubleshooting guide. > > Patrick > > ps please use pastebin or attachment, otw the formatting of wrecked > and it's harder to read the log > > On Fri, Feb 11, 2011 at 6:41 AM, Nick Patania > wrote: >> This is the portion that seems relevant. =A0For readability, I replaced >> the host and session for the host that I kill with HOST_0 and >> SESSION_0 (I expect these to timeout). =A0The client that should be >> healthy is HOST_1 and SESSION_1. >> >> >> 2011-02-11 14:18:51,901 - DEBUG >> [SyncThread:0:FinalRequestProcessor@78] - Processing request:: >> sessionid:SESSION_0 type:ping cxid:0xfffffffffffffffe >> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a >> 2011-02-11 14:18:51,901 - DEBUG >> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_0 >> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe >> txntype:unknown reqpath:n/a >> 2011-02-11 14:18:52,005 - DEBUG >> [SyncThread:0:FinalRequestProcessor@78] - Processing request:: >> sessionid:SESSION_1 type:ping cxid:0xfffffffffffffffe >> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a >> 2011-02-11 14:18:52,005 - DEBUG >> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_1 >> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe >> txntype:unknown reqpath:n/a >> 2011-02-11 14:18:52,339 - DEBUG >> [SyncThread:0:FinalRequestProcessor@78] - Processing request:: >> sessionid:SESSION_1 type:ping cxid:0xfffffffffffffffe >> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a >> 2011-02-11 14:18:52,339 - DEBUG >> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_1 >> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe >> txntype:unknown reqpath:n/a >> 2011-02-11 14:18:52,672 - DEBUG >> [SyncThread:0:FinalRequestProcessor@78] - Processing request:: >> sessionid:SESSION_1 type:ping cxid:0xfffffffffffffffe >> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a >> 2011-02-11 14:18:52,672 - DEBUG >> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_1 >> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe >> txntype:unknown reqpath:n/a >> 2011-02-11 14:18:53,002 - INFO =A0[SessionTracker:ZooKeeperServer@314] - >> Expiring session SESSION_0, timeout of 1000ms exceeded >> 2011-02-11 14:18:53,002 - INFO >> [ProcessThread:-1:PrepRequestProcessor@387] - Processed session >> termination for sessionid: SESSION_0 >> 2011-02-11 14:18:53,010 - DEBUG >> [SyncThread:0:FinalRequestProcessor@78] - Processing request:: >> sessionid:SESSION_0 type:closeSession cxid:0x0 zxid:0x103 txntype:-11 >> reqpath:n/a >> 2011-02-11 14:18:53,010 - INFO =A0[SyncThread:0:NIOServerCnxn@1435] - >> Closed socket connection for client /HOST_0:34618 which had sessionid >> SESSION_0 >> 2011-02-11 14:18:54,502 - INFO =A0[SessionTracker:ZooKeeperServer@314] - >> Expiring session SESSION_1, timeout of 1000ms exceeded >> 2011-02-11 14:18:54,502 - INFO >> [ProcessThread:-1:PrepRequestProcessor@387] - Processed session >> termination for sessionid: SESSION_1 >> 2011-02-11 14:18:55,011 - DEBUG >> [SyncThread:0:FinalRequestProcessor@78] - Processing request:: >> sessionid:SESSION_1 type:ping cxid:0xfffffffffffffffe >> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a >> 2011-02-11 14:18:55,011 - DEBUG >> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_1 >> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe >> txntype:unknown reqpath:n/a >> 2011-02-11 14:18:55,011 - INFO >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn$Factory@251] >> - Accepted socket connection from /HOST_1:40556 >> 2011-02-11 14:18:55,019 - DEBUG >> [SyncThread:0:FinalRequestProcessor@78] - Processing request:: >> sessionid:SESSION_1 type:closeSession cxid:0x0 zxid:0x104 txntype:-11 >> reqpath:n/a >> 2011-02-11 14:18:55,019 - DEBUG >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:ZooKeeperServer@590] - >> Dropping request: No session with sessionid SESSION_1 exists, probably >> expired and removed >> 2011-02-11 14:18:55,019 - INFO =A0[SyncThread:0:NIOServerCnxn@1435] - >> Closed socket connection for client /HOST_1:40555 which had sessionid >> SESSION_1 >> 2011-02-11 14:18:55,020 - DEBUG [SyncThread:0:NIOServerCnxn@1451] - >> ignoring exception during output shutdown >> java.net.SocketException: Transport endpoint is not connected >> =A0 =A0 =A0 =A0at sun.nio.ch.SocketChannelImpl.shutdown(Native Method) >> =A0 =A0 =A0 =A0at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChan= nelImpl.java:651) >> =A0 =A0 =A0 =A0at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.= java:368) >> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.NIOServerCnxn.closeSock(NI= OServerCnxn.java:1447) >> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.NIOServerCnxn.close(NIOSer= verCnxn.java:1412) >> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.NIOServerCnxn$Factory.clos= eSessionWithoutWakeup(NIOServerCnxn.java:343) >> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.NIOServerCnxn$Factory.clos= eSession(NIOServerCnxn.java:330) >> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.FinalRequestProcessor.proc= essRequest(FinalRequestProcessor.java:133) >> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.SyncRequestProcessor.flush= (SyncRequestProcessor.java:161) >> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.SyncRequestProcessor.run(S= yncRequestProcessor.java:98) >> 2011-02-11 14:18:55,021 - DEBUG [SyncThread:0:NIOServerCnxn@1459] - >> ignoring exception during input shutdown >> java.net.SocketException: Transport endpoint is not connected >> =A0 =A0 =A0 =A0at sun.nio.ch.SocketChannelImpl.shutdown(Native Method) >> =A0 =A0 =A0 =A0at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChann= elImpl.java:640) >> =A0 =A0 =A0 =A0at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.j= ava:360) >> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.NIOServerCnxn.closeSock(NI= OServerCnxn.java:1455) >> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.NIOServerCnxn.close(NIOSer= verCnxn.java:1412) >> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.NIOServerCnxn$Factory.clos= eSessionWithoutWakeup(NIOServerCnxn.java:343) >> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.NIOServerCnxn$Factory.clos= eSession(NIOServerCnxn.java:330) >> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.FinalRequestProcessor.proc= essRequest(FinalRequestProcessor.java:133) >> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.SyncRequestProcessor.flush= (SyncRequestProcessor.java:161) >> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.SyncRequestProcessor.run(S= yncRequestProcessor.java:98) >> 2011-02-11 14:18:55,022 - WARN >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn$Factory@272] >> - Ignoring unexpected runtime exception >> java.nio.channels.CancelledKeyException >> =A0 =A0 =A0 =A0at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyIm= pl.java:55) >> =A0 =A0 =A0 =A0at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.= java:69) >> =A0 =A0 =A0 =A0at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(= NIOServerCnxn.java:241) >> 2011-02-11 14:18:55,023 - DEBUG >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@735] - >> Session establishment request from client /HOST_1:40556 client's >> lastZxid is 0x0 >> 2011-02-11 14:18:55,023 - INFO >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@770] - Client >> attempting to renew session SESSION_1 at /HOST_1:40556 >> 2011-02-11 14:18:55,024 - INFO >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@1573] - >> Invalid session SESSION_1 for client /HOST_1:40556, probably expired >> 2011-02-11 14:18:55,025 - WARN >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@634] - >> EndOfStreamException: Unable to read additional data from client >> sessionid SESSION_1, likely client has closed socket >> 2011-02-11 14:18:55,025 - INFO >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@1435] - >> Closed socket connection for client /HOST_1:40556 which had sessionid >> SESSION_1 >> >> >> On Thu, Feb 10, 2011 at 8:11 PM, Benjamin Reed wro= te: >>> do you see anything in the server log? >>> >>> ben >>> >>> On 02/10/2011 03:16 PM, Patania, Nick wrote: >>>> >>>> Hello, >>>> >>>> I run the following test: >>>> >>>> >>>> * =A0 =A0 =A0 =A0 Two clients connect to a zookeeper server; the tickT= ime on the >>>> server is 500, and the sessionTimeout on the client is 1000. >>>> >>>> * =A0 =A0 =A0 =A0 Kill the host running one of the clients. >>>> >>>> * =A0 =A0 =A0 =A0 The second client receives a session timeout. >>>> >>>> Is there any reason why this might happen? >>>> >>>> Thanks >>>> Nick Patania >>>> >>>> >>>> ----------------------------------------------------------------------= ---- >>>> NOTICE: Morgan Stanley is not acting as a municipal advisor and the >>>> opinions or views contained herein are not intended to be, and do not >>>> constitute, advice within the meaning of Section 975 of the Dodd-Frank= Wall >>>> Street Reform and Consumer Protection Act. If you have received this >>>> communication in error, please destroy all electronic and paper copies= and >>>> notify the sender immediately. Mistransmission is not intended to waiv= e >>>> confidentiality or privilege. Morgan Stanley reserves the right, to th= e >>>> extent permitted under applicable law, to monitor electronic communica= tions. >>>> This message is subject to terms available at the following link: >>>> http://www.morganstanley.com/disclaimers. If you cannot access these l= inks, >>>> please notify us by reply message and we will send the contents to you= . By >>>> messaging with Morgan Stanley you consent to the foregoing. >>> >>> >> >