zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patrick Hunt <ph...@apache.org>
Subject Re: tickTime and sessionTimeout
Date Mon, 14 Feb 2011 06:32:39 GMT
Is the server a virtualized host? I still think this might shed some
light, what are you seeing for this? before/during/after the event:

>>>> Use the "stat" 4
>>>> letter word to see the server's request processing latency, see if
>>>> that's high (higer than the timeout is bad news). Again, checkout the
>>>> troubleshooting guide.

On Fri, Feb 11, 2011 at 4:32 PM, Fournier, Camille F. [Tech]
<Camille.Fournier@gs.com> wrote:
> I think the gist of the problem is that while the ZK Server is expiring SESSION_0, SESSION_1
is coming in and trying to send a ping. It is blocked a bit behind the expiring of SESSION_0
due to the synchronous nature of touching the session table, and then its request for a ping
will be processed behind the session expiration processing for SESSION_0. So either the expirer
takes long enough dealing with SESSION_0 that it immediately wants to expire SESSION_1 on
next processing, or, the processing of the session expiration takes just long enough for SESSION_1
to not get a response to its heartbeat back from the server, which then causes it to disconnect
and reconnect, and in the interim the server determines the session dead due to timeout.
>
> Long story short, those timeouts are too short for the server to reliably execute responses
in time to guarantee they won't be incorrectly activated.
>
> C
>
> -----Original Message-----
> From: Nick Patania [mailto:Nicholas.Patania@morganstanley.com]
> Sent: Friday, February 11, 2011 6:19 PM
> To: user@zookeeper.apache.org
> Subject: Re: tickTime and sessionTimeout
>
> 3.3.2-1031432
>
> On Fri, Feb 11, 2011 at 6:15 PM, Fournier, Camille F. [Tech]
> <Camille.Fournier@gs.com> wrote:
>> Which version of ZK?
>>
>> -----Original Message-----
>> From: Nick Patania [mailto:Nicholas.Patania@morganstanley.com]
>> Sent: Friday, February 11, 2011 6:10 PM
>> To: user@zookeeper.apache.org
>> Subject: Re: tickTime and sessionTimeout
>>
>> A single server.  I've intentionally made it trivial to demonstrate
>> the behavior.
>> If I increase the timeout, the issue goes away.
>>
>> On Fri, Feb 11, 2011 at 6:00 PM, Fournier, Camille F. [Tech]
>> <Camille.Fournier@gs.com> wrote:
>>> What is your ZooKeeper setup here? And do you continue to see this issue if you
increase your session timeout?
>>>
>>> C
>>>
>>> -----Original Message-----
>>> From: Nick Patania [mailto:Nicholas.Patania@morganstanley.com]
>>> Sent: Friday, February 11, 2011 5:24 PM
>>> To: user@zookeeper.apache.org
>>> Subject: Re: tickTime and sessionTimeout
>>>
>>> Patrick, thanks for your input.
>>>
>>> I have rerun the test several times now while logging GC on the server
>>> and running ping from CLIENT_1:
>>>
>>>  - No GC happens on the server during the period of interest (a
>>> couple of young generation runs happen before I kill HOST_0, and they
>>> complete in under 3ms).
>>>  - Round trip times for ping from CLIENT_1 are consistently under
>>> 250us throughout.
>>>
>>> Regarding client GC -- I can consistently reproduce this using a C
>>> client.  Regarding the theory of swapping on CLIENT_1 -- if that were
>>> the cause, the problem wouldn't be 100% reproducible.  I also looked
>>> through the client log for SESSION_1 -- I see "Got ping response ...
>>> after 1ms" repeatedly, followed by "Client session timed out, have not
>>> heard from server in 666ms"...
>>>
>>> On Fri, Feb 11, 2011 at 1:46 PM, Patrick Hunt <phunt@apache.org> wrote:
>>>> Those are pretty short timeouts, many sources of delay could be
>>>> causing this. Network jitter/latency, GC/swap (server or client), IO
>>>> write latency, etc... See if any of this might be your issue:
>>>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/Troubleshooting
>>>>
>>>> I can clearly see this sequence in your log for session1:
>>>>
>>>> ---
>>>> 2011-02-11 14:18:52,672 -  sessionid:SESSION_1 type:ping
>>>>
>>>> 2011-02-11 14:18:54,502 - Expiring session SESSION_1, timeout of 1000ms exceeded
>>>>
>>>> 2011-02-11 14:18:55,011 - Processing request:: sessionid:SESSION_1 type:ping
>>>> ---
>>>>
>>>> from the looks of it session 1 doesn't send a ping to the server for
>>>> ~2.5 seconds, as a result it's expired.
>>>>
>>>> You should also look at your session 1 client log and see what it's
>>>> view of the world is like. (is it gc/swapping?). Use the "stat" 4
>>>> letter word to see the server's request processing latency, see if
>>>> that's high (higer than the timeout is bad news). Again, checkout the
>>>> troubleshooting guide.
>>>>
>>>> Patrick
>>>>
>>>> ps please use pastebin or attachment, otw the formatting of wrecked
>>>> and it's harder to read the log
>>>>
>>>> On Fri, Feb 11, 2011 at 6:41 AM, Nick Patania
>>>> <Nicholas.Patania@morganstanley.com> wrote:
>>>>> This is the portion that seems relevant.  For readability, I replaced
>>>>> the host and session for the host that I kill with HOST_0 and
>>>>> SESSION_0 (I expect these to timeout).  The client that should be
>>>>> healthy is HOST_1 and SESSION_1.
>>>>>
>>>>>
>>>>> 2011-02-11 14:18:51,901 - DEBUG
>>>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request::
>>>>> sessionid:SESSION_0 type:ping cxid:0xfffffffffffffffe
>>>>> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
>>>>> 2011-02-11 14:18:51,901 - DEBUG
>>>>> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_0
>>>>> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe
>>>>> txntype:unknown reqpath:n/a
>>>>> 2011-02-11 14:18:52,005 - DEBUG
>>>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request::
>>>>> sessionid:SESSION_1 type:ping cxid:0xfffffffffffffffe
>>>>> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
>>>>> 2011-02-11 14:18:52,005 - DEBUG
>>>>> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_1
>>>>> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe
>>>>> txntype:unknown reqpath:n/a
>>>>> 2011-02-11 14:18:52,339 - DEBUG
>>>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request::
>>>>> sessionid:SESSION_1 type:ping cxid:0xfffffffffffffffe
>>>>> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
>>>>> 2011-02-11 14:18:52,339 - DEBUG
>>>>> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_1
>>>>> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe
>>>>> txntype:unknown reqpath:n/a
>>>>> 2011-02-11 14:18:52,672 - DEBUG
>>>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request::
>>>>> sessionid:SESSION_1 type:ping cxid:0xfffffffffffffffe
>>>>> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
>>>>> 2011-02-11 14:18:52,672 - DEBUG
>>>>> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_1
>>>>> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe
>>>>> txntype:unknown reqpath:n/a
>>>>> 2011-02-11 14:18:53,002 - INFO  [SessionTracker:ZooKeeperServer@314]
-
>>>>> Expiring session SESSION_0, timeout of 1000ms exceeded
>>>>> 2011-02-11 14:18:53,002 - INFO
>>>>> [ProcessThread:-1:PrepRequestProcessor@387] - Processed session
>>>>> termination for sessionid: SESSION_0
>>>>> 2011-02-11 14:18:53,010 - DEBUG
>>>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request::
>>>>> sessionid:SESSION_0 type:closeSession cxid:0x0 zxid:0x103 txntype:-11
>>>>> reqpath:n/a
>>>>> 2011-02-11 14:18:53,010 - INFO  [SyncThread:0:NIOServerCnxn@1435] -
>>>>> Closed socket connection for client /HOST_0:34618 which had sessionid
>>>>> SESSION_0
>>>>> 2011-02-11 14:18:54,502 - INFO  [SessionTracker:ZooKeeperServer@314]
-
>>>>> Expiring session SESSION_1, timeout of 1000ms exceeded
>>>>> 2011-02-11 14:18:54,502 - INFO
>>>>> [ProcessThread:-1:PrepRequestProcessor@387] - Processed session
>>>>> termination for sessionid: SESSION_1
>>>>> 2011-02-11 14:18:55,011 - DEBUG
>>>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request::
>>>>> sessionid:SESSION_1 type:ping cxid:0xfffffffffffffffe
>>>>> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
>>>>> 2011-02-11 14:18:55,011 - DEBUG
>>>>> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_1
>>>>> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe
>>>>> txntype:unknown reqpath:n/a
>>>>> 2011-02-11 14:18:55,011 - INFO
>>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn$Factory@251]
>>>>> - Accepted socket connection from /HOST_1:40556
>>>>> 2011-02-11 14:18:55,019 - DEBUG
>>>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request::
>>>>> sessionid:SESSION_1 type:closeSession cxid:0x0 zxid:0x104 txntype:-11
>>>>> reqpath:n/a
>>>>> 2011-02-11 14:18:55,019 - DEBUG
>>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:ZooKeeperServer@590] -
>>>>> Dropping request: No session with sessionid SESSION_1 exists, probably
>>>>> expired and removed
>>>>> 2011-02-11 14:18:55,019 - INFO  [SyncThread:0:NIOServerCnxn@1435] -
>>>>> Closed socket connection for client /HOST_1:40555 which had sessionid
>>>>> SESSION_1
>>>>> 2011-02-11 14:18:55,020 - DEBUG [SyncThread:0:NIOServerCnxn@1451] -
>>>>> ignoring exception during output shutdown
>>>>> java.net.SocketException: Transport endpoint is not connected
>>>>>        at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
>>>>>        at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
>>>>>        at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>>>>>        at org.apache.zookeeper.server.NIOServerCnxn.closeSock(NIOServerCnxn.java:1447)
>>>>>        at org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:1412)
>>>>>        at org.apache.zookeeper.server.NIOServerCnxn$Factory.closeSessionWithoutWakeup(NIOServerCnxn.java:343)
>>>>>        at org.apache.zookeeper.server.NIOServerCnxn$Factory.closeSession(NIOServerCnxn.java:330)
>>>>>        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:133)
>>>>>        at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:161)
>>>>>        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98)
>>>>> 2011-02-11 14:18:55,021 - DEBUG [SyncThread:0:NIOServerCnxn@1459] -
>>>>> ignoring exception during input shutdown
>>>>> java.net.SocketException: Transport endpoint is not connected
>>>>>        at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
>>>>>        at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
>>>>>        at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>>>>>        at org.apache.zookeeper.server.NIOServerCnxn.closeSock(NIOServerCnxn.java:1455)
>>>>>        at org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:1412)
>>>>>        at org.apache.zookeeper.server.NIOServerCnxn$Factory.closeSessionWithoutWakeup(NIOServerCnxn.java:343)
>>>>>        at org.apache.zookeeper.server.NIOServerCnxn$Factory.closeSession(NIOServerCnxn.java:330)
>>>>>        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:133)
>>>>>        at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:161)
>>>>>        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98)
>>>>> 2011-02-11 14:18:55,022 - WARN
>>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn$Factory@272]
>>>>> - Ignoring unexpected runtime exception
>>>>> java.nio.channels.CancelledKeyException
>>>>>        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
>>>>>        at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:69)
>>>>>        at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:241)
>>>>> 2011-02-11 14:18:55,023 - DEBUG
>>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@735] -
>>>>> Session establishment request from client /HOST_1:40556 client's
>>>>> lastZxid is 0x0
>>>>> 2011-02-11 14:18:55,023 - INFO
>>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@770] - Client
>>>>> attempting to renew session SESSION_1 at /HOST_1:40556
>>>>> 2011-02-11 14:18:55,024 - INFO
>>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@1573] -
>>>>> Invalid session SESSION_1 for client /HOST_1:40556, probably expired
>>>>> 2011-02-11 14:18:55,025 - WARN
>>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@634] -
>>>>> EndOfStreamException: Unable to read additional data from client
>>>>> sessionid SESSION_1, likely client has closed socket
>>>>> 2011-02-11 14:18:55,025 - INFO
>>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@1435] -
>>>>> Closed socket connection for client /HOST_1:40556 which had sessionid
>>>>> SESSION_1
>>>>>
>>>>>
>>>>> On Thu, Feb 10, 2011 at 8:11 PM, Benjamin Reed <breed@yahoo-inc.com>
wrote:
>>>>>> do you see anything in the server log?
>>>>>>
>>>>>> ben
>>>>>>
>>>>>> On 02/10/2011 03:16 PM, Patania, Nick wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I run the following test:
>>>>>>>
>>>>>>>
>>>>>>> *         Two clients connect to a zookeeper server; the
tickTime on the
>>>>>>> server is 500, and the sessionTimeout on the client is 1000.
>>>>>>>
>>>>>>> *         Kill the host running one of the clients.
>>>>>>>
>>>>>>> *         The second client receives a session timeout.
>>>>>>>
>>>>>>> Is there any reason why this might happen?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Nick Patania
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>> NOTICE: Morgan Stanley is not acting as a municipal advisor and
the
>>>>>>> opinions or views contained herein are not intended to be, and
do not
>>>>>>> constitute, advice within the meaning of Section 975 of the Dodd-Frank
Wall
>>>>>>> Street Reform and Consumer Protection Act. If you have received
this
>>>>>>> communication in error, please destroy all electronic and paper
copies and
>>>>>>> notify the sender immediately. Mistransmission is not intended
to waive
>>>>>>> confidentiality or privilege. Morgan Stanley reserves the right,
to the
>>>>>>> extent permitted under applicable law, to monitor electronic
communications.
>>>>>>> This message is subject to terms available at the following link:
>>>>>>> http://www.morganstanley.com/disclaimers. If you cannot access
these links,
>>>>>>> please notify us by reply message and we will send the contents
to you. By
>>>>>>> messaging with Morgan Stanley you consent to the foregoing.
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Mime
View raw message