zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Fournier, Camille F. [Tech]" <Camille.Fourn...@gs.com>
Subject RE: tickTime and sessionTimeout
Date Sat, 12 Feb 2011 00:32:13 GMT
I think the gist of the problem is that while the ZK Server is expiring SESSION_0, SESSION_1
is coming in and trying to send a ping. It is blocked a bit behind the expiring of SESSION_0
due to the synchronous nature of touching the session table, and then its request for a ping
will be processed behind the session expiration processing for SESSION_0. So either the expirer
takes long enough dealing with SESSION_0 that it immediately wants to expire SESSION_1 on
next processing, or, the processing of the session expiration takes just long enough for SESSION_1
to not get a response to its heartbeat back from the server, which then causes it to disconnect
and reconnect, and in the interim the server determines the session dead due to timeout.

Long story short, those timeouts are too short for the server to reliably execute responses
in time to guarantee they won't be incorrectly activated.

C

-----Original Message-----
From: Nick Patania [mailto:Nicholas.Patania@morganstanley.com] 
Sent: Friday, February 11, 2011 6:19 PM
To: user@zookeeper.apache.org
Subject: Re: tickTime and sessionTimeout

3.3.2-1031432

On Fri, Feb 11, 2011 at 6:15 PM, Fournier, Camille F. [Tech]
<Camille.Fournier@gs.com> wrote:
> Which version of ZK?
>
> -----Original Message-----
> From: Nick Patania [mailto:Nicholas.Patania@morganstanley.com]
> Sent: Friday, February 11, 2011 6:10 PM
> To: user@zookeeper.apache.org
> Subject: Re: tickTime and sessionTimeout
>
> A single server.  I've intentionally made it trivial to demonstrate
> the behavior.
> If I increase the timeout, the issue goes away.
>
> On Fri, Feb 11, 2011 at 6:00 PM, Fournier, Camille F. [Tech]
> <Camille.Fournier@gs.com> wrote:
>> What is your ZooKeeper setup here? And do you continue to see this issue if you increase
your session timeout?
>>
>> C
>>
>> -----Original Message-----
>> From: Nick Patania [mailto:Nicholas.Patania@morganstanley.com]
>> Sent: Friday, February 11, 2011 5:24 PM
>> To: user@zookeeper.apache.org
>> Subject: Re: tickTime and sessionTimeout
>>
>> Patrick, thanks for your input.
>>
>> I have rerun the test several times now while logging GC on the server
>> and running ping from CLIENT_1:
>>
>>  - No GC happens on the server during the period of interest (a
>> couple of young generation runs happen before I kill HOST_0, and they
>> complete in under 3ms).
>>  - Round trip times for ping from CLIENT_1 are consistently under
>> 250us throughout.
>>
>> Regarding client GC -- I can consistently reproduce this using a C
>> client.  Regarding the theory of swapping on CLIENT_1 -- if that were
>> the cause, the problem wouldn't be 100% reproducible.  I also looked
>> through the client log for SESSION_1 -- I see "Got ping response ...
>> after 1ms" repeatedly, followed by "Client session timed out, have not
>> heard from server in 666ms"...
>>
>> On Fri, Feb 11, 2011 at 1:46 PM, Patrick Hunt <phunt@apache.org> wrote:
>>> Those are pretty short timeouts, many sources of delay could be
>>> causing this. Network jitter/latency, GC/swap (server or client), IO
>>> write latency, etc... See if any of this might be your issue:
>>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/Troubleshooting
>>>
>>> I can clearly see this sequence in your log for session1:
>>>
>>> ---
>>> 2011-02-11 14:18:52,672 -  sessionid:SESSION_1 type:ping
>>>
>>> 2011-02-11 14:18:54,502 - Expiring session SESSION_1, timeout of 1000ms exceeded
>>>
>>> 2011-02-11 14:18:55,011 - Processing request:: sessionid:SESSION_1 type:ping
>>> ---
>>>
>>> from the looks of it session 1 doesn't send a ping to the server for
>>> ~2.5 seconds, as a result it's expired.
>>>
>>> You should also look at your session 1 client log and see what it's
>>> view of the world is like. (is it gc/swapping?). Use the "stat" 4
>>> letter word to see the server's request processing latency, see if
>>> that's high (higer than the timeout is bad news). Again, checkout the
>>> troubleshooting guide.
>>>
>>> Patrick
>>>
>>> ps please use pastebin or attachment, otw the formatting of wrecked
>>> and it's harder to read the log
>>>
>>> On Fri, Feb 11, 2011 at 6:41 AM, Nick Patania
>>> <Nicholas.Patania@morganstanley.com> wrote:
>>>> This is the portion that seems relevant.  For readability, I replaced
>>>> the host and session for the host that I kill with HOST_0 and
>>>> SESSION_0 (I expect these to timeout).  The client that should be
>>>> healthy is HOST_1 and SESSION_1.
>>>>
>>>>
>>>> 2011-02-11 14:18:51,901 - DEBUG
>>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request::
>>>> sessionid:SESSION_0 type:ping cxid:0xfffffffffffffffe
>>>> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
>>>> 2011-02-11 14:18:51,901 - DEBUG
>>>> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_0
>>>> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe
>>>> txntype:unknown reqpath:n/a
>>>> 2011-02-11 14:18:52,005 - DEBUG
>>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request::
>>>> sessionid:SESSION_1 type:ping cxid:0xfffffffffffffffe
>>>> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
>>>> 2011-02-11 14:18:52,005 - DEBUG
>>>> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_1
>>>> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe
>>>> txntype:unknown reqpath:n/a
>>>> 2011-02-11 14:18:52,339 - DEBUG
>>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request::
>>>> sessionid:SESSION_1 type:ping cxid:0xfffffffffffffffe
>>>> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
>>>> 2011-02-11 14:18:52,339 - DEBUG
>>>> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_1
>>>> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe
>>>> txntype:unknown reqpath:n/a
>>>> 2011-02-11 14:18:52,672 - DEBUG
>>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request::
>>>> sessionid:SESSION_1 type:ping cxid:0xfffffffffffffffe
>>>> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
>>>> 2011-02-11 14:18:52,672 - DEBUG
>>>> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_1
>>>> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe
>>>> txntype:unknown reqpath:n/a
>>>> 2011-02-11 14:18:53,002 - INFO  [SessionTracker:ZooKeeperServer@314] -
>>>> Expiring session SESSION_0, timeout of 1000ms exceeded
>>>> 2011-02-11 14:18:53,002 - INFO
>>>> [ProcessThread:-1:PrepRequestProcessor@387] - Processed session
>>>> termination for sessionid: SESSION_0
>>>> 2011-02-11 14:18:53,010 - DEBUG
>>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request::
>>>> sessionid:SESSION_0 type:closeSession cxid:0x0 zxid:0x103 txntype:-11
>>>> reqpath:n/a
>>>> 2011-02-11 14:18:53,010 - INFO  [SyncThread:0:NIOServerCnxn@1435] -
>>>> Closed socket connection for client /HOST_0:34618 which had sessionid
>>>> SESSION_0
>>>> 2011-02-11 14:18:54,502 - INFO  [SessionTracker:ZooKeeperServer@314] -
>>>> Expiring session SESSION_1, timeout of 1000ms exceeded
>>>> 2011-02-11 14:18:54,502 - INFO
>>>> [ProcessThread:-1:PrepRequestProcessor@387] - Processed session
>>>> termination for sessionid: SESSION_1
>>>> 2011-02-11 14:18:55,011 - DEBUG
>>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request::
>>>> sessionid:SESSION_1 type:ping cxid:0xfffffffffffffffe
>>>> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
>>>> 2011-02-11 14:18:55,011 - DEBUG
>>>> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_1
>>>> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe
>>>> txntype:unknown reqpath:n/a
>>>> 2011-02-11 14:18:55,011 - INFO
>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn$Factory@251]
>>>> - Accepted socket connection from /HOST_1:40556
>>>> 2011-02-11 14:18:55,019 - DEBUG
>>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request::
>>>> sessionid:SESSION_1 type:closeSession cxid:0x0 zxid:0x104 txntype:-11
>>>> reqpath:n/a
>>>> 2011-02-11 14:18:55,019 - DEBUG
>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:ZooKeeperServer@590] -
>>>> Dropping request: No session with sessionid SESSION_1 exists, probably
>>>> expired and removed
>>>> 2011-02-11 14:18:55,019 - INFO  [SyncThread:0:NIOServerCnxn@1435] -
>>>> Closed socket connection for client /HOST_1:40555 which had sessionid
>>>> SESSION_1
>>>> 2011-02-11 14:18:55,020 - DEBUG [SyncThread:0:NIOServerCnxn@1451] -
>>>> ignoring exception during output shutdown
>>>> java.net.SocketException: Transport endpoint is not connected
>>>>        at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
>>>>        at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
>>>>        at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>>>>        at org.apache.zookeeper.server.NIOServerCnxn.closeSock(NIOServerCnxn.java:1447)
>>>>        at org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:1412)
>>>>        at org.apache.zookeeper.server.NIOServerCnxn$Factory.closeSessionWithoutWakeup(NIOServerCnxn.java:343)
>>>>        at org.apache.zookeeper.server.NIOServerCnxn$Factory.closeSession(NIOServerCnxn.java:330)
>>>>        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:133)
>>>>        at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:161)
>>>>        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98)
>>>> 2011-02-11 14:18:55,021 - DEBUG [SyncThread:0:NIOServerCnxn@1459] -
>>>> ignoring exception during input shutdown
>>>> java.net.SocketException: Transport endpoint is not connected
>>>>        at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
>>>>        at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
>>>>        at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>>>>        at org.apache.zookeeper.server.NIOServerCnxn.closeSock(NIOServerCnxn.java:1455)
>>>>        at org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:1412)
>>>>        at org.apache.zookeeper.server.NIOServerCnxn$Factory.closeSessionWithoutWakeup(NIOServerCnxn.java:343)
>>>>        at org.apache.zookeeper.server.NIOServerCnxn$Factory.closeSession(NIOServerCnxn.java:330)
>>>>        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:133)
>>>>        at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:161)
>>>>        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98)
>>>> 2011-02-11 14:18:55,022 - WARN
>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn$Factory@272]
>>>> - Ignoring unexpected runtime exception
>>>> java.nio.channels.CancelledKeyException
>>>>        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
>>>>        at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:69)
>>>>        at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:241)
>>>> 2011-02-11 14:18:55,023 - DEBUG
>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@735] -
>>>> Session establishment request from client /HOST_1:40556 client's
>>>> lastZxid is 0x0
>>>> 2011-02-11 14:18:55,023 - INFO
>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@770] - Client
>>>> attempting to renew session SESSION_1 at /HOST_1:40556
>>>> 2011-02-11 14:18:55,024 - INFO
>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@1573] -
>>>> Invalid session SESSION_1 for client /HOST_1:40556, probably expired
>>>> 2011-02-11 14:18:55,025 - WARN
>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@634] -
>>>> EndOfStreamException: Unable to read additional data from client
>>>> sessionid SESSION_1, likely client has closed socket
>>>> 2011-02-11 14:18:55,025 - INFO
>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@1435] -
>>>> Closed socket connection for client /HOST_1:40556 which had sessionid
>>>> SESSION_1
>>>>
>>>>
>>>> On Thu, Feb 10, 2011 at 8:11 PM, Benjamin Reed <breed@yahoo-inc.com>
wrote:
>>>>> do you see anything in the server log?
>>>>>
>>>>> ben
>>>>>
>>>>> On 02/10/2011 03:16 PM, Patania, Nick wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I run the following test:
>>>>>>
>>>>>>
>>>>>> *         Two clients connect to a zookeeper server; the tickTime
on the
>>>>>> server is 500, and the sessionTimeout on the client is 1000.
>>>>>>
>>>>>> *         Kill the host running one of the clients.
>>>>>>
>>>>>> *         The second client receives a session timeout.
>>>>>>
>>>>>> Is there any reason why this might happen?
>>>>>>
>>>>>> Thanks
>>>>>> Nick Patania
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> NOTICE: Morgan Stanley is not acting as a municipal advisor and the
>>>>>> opinions or views contained herein are not intended to be, and do
not
>>>>>> constitute, advice within the meaning of Section 975 of the Dodd-Frank
Wall
>>>>>> Street Reform and Consumer Protection Act. If you have received this
>>>>>> communication in error, please destroy all electronic and paper copies
and
>>>>>> notify the sender immediately. Mistransmission is not intended to
waive
>>>>>> confidentiality or privilege. Morgan Stanley reserves the right,
to the
>>>>>> extent permitted under applicable law, to monitor electronic communications.
>>>>>> This message is subject to terms available at the following link:
>>>>>> http://www.morganstanley.com/disclaimers. If you cannot access these
links,
>>>>>> please notify us by reply message and we will send the contents to
you. By
>>>>>> messaging with Morgan Stanley you consent to the foregoing.
>>>>>
>>>>>
>>>>
>>>
>>
>

Mime
View raw message