zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Zookeeper session expiration
Date Thu, 07 Dec 2017 15:01:57 GMT
Easy enough to try out. Give it a shot and enter a jira if you find an
issue.

Regards,

Patrick
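
For anyone who wants to try this out, here is a minimal sketch of measuring a timeout with System.nanoTime() rather than wall-clock time. The class name MonotonicTimeout and the helper isExpired are made up for illustration and are not ZooKeeper code. The point is only that the elapsed time comes from the difference of two nanoTime() readings, which the Javadoc describes as unrelated to system or wall-clock time.

```java
import java.util.concurrent.TimeUnit;

public class MonotonicTimeout {
    // Hypothetical helper (not part of ZooKeeper): true once at least
    // timeoutMs milliseconds separate the two nanoTime() readings.
    static boolean isExpired(long startNanos, long nowNanos, long timeoutMs) {
        // Subtract first, then compare. Comparing the difference keeps the
        // check correct even if the nanoTime() counter overflows.
        return nowNanos - startNanos >= TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(50); // changing the system date during this sleep should not alter the result
        long now = System.nanoTime();
        System.out.println("10 ms timeout expired: " + isExpired(start, now, 10));
        System.out.println("10 s timeout expired:  " + isExpired(start, now, 10_000));
    }
}
```

To experiment, lengthen the sleep and change the system clock while the program is paused. Because only nanoTime() is consulted, the clock change should have no effect on which timeout is reported as expired.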

On Thu, Dec 7, 2017 at 5:47 AM, Jordan Zimmerman <jordan@jordanzimmerman.com> wrote:

> System.nanoTime() is not affected by clock changes. Really everyone - this
> is simply not an issue in ZooKeeper.
>
> ====================
> Jordan Zimmerman
>
> > On Dec 7, 2017, at 7:43 AM, Kathryn Hogg <Kathryn.Hogg@oati.net> wrote:
> >
> > I'm pretty new to ZooKeeper but have a fair amount of experience with
> > virtual synchrony going back many years.  Even though time is relative,
> > the server could prematurely declare timeouts as expired if its clock
> > suddenly jumps forward.  I'm not sure how ZooKeeper handles that, but in
> > Isis, if 2 consecutive calls to gettimeofday differed by too much, Isis
> > considered it fishy.
> >
> > Of course, this is why we use NTP with adjtime to avoid clocks going
> > backwards or making large jumps forward.
> >
> > -----Original Message-----
> > From: Patrick Hunt [mailto:phunt@apache.org]
> > Sent: Wednesday, December 06, 2017 5:18 PM
> > To: UserZooKeeper <user@zookeeper.apache.org>
> > Subject: Re: Zookeeper session expiration
> >
> > What Jordan said + time use is only in the relative sense, not the
> > absolute. Session tracking (expiration) is relative to the start of
> > leadership.
> >
> > Patrick
> >
> >> On Mon, Dec 4, 2017 at 12:21 PM, Jordan Zimmerman <jordan@jordanzimmerman.com> wrote:
> >>
> >> ZooKeeper, indeed, does not use wall clock time. It uses
> >> System.nanoTime() for most operations. Further, all operations go
> >> through the Leader node so only the Leader's notion of time matters.
> >> The Leader manages the session via a "SessionTracker" instance. The
> >> code is in SessionTrackerImpl.java.
> >> There is a sessionExpiryQueue which is a kind of priority queue that
> >> returns expired sessions based on System.nanoTime().
> >>
> >> -JZ
> >>
> >>> On Dec 4, 2017, at 12:09 PM, Abraham Fine <afine@apache.org> wrote:
> >>>
> >>> Hello Anthony and Shawn-
> >>>
> >>> To the best of my knowledge ZooKeeper does not use the "wall clock"
> >>> time anywhere. So that should not be the problem.
> >>>
> >>> Please consider enabling debug logging, which should allow you to
> >>> track the "pings".
> >>>
> >>> Thanks,
> >>> Abe
> >>>
> >>>> On Mon, Dec 4, 2017, at 11:51, Anthony Shaya wrote:
> >>>> Thanks Shawn, should I message the developer mailing list for a
> >>>> more definitive answer?
> >>>>
> >>>> Thanks again for the reply.
> >>>>
> >>>> -----Original Message-----
> >>>> From: Shawn Heisey [mailto:apache@elyograg.org]
> >>>> Sent: Monday, December 4, 2017 2:49 PM
> >>>> To: user@zookeeper.apache.org
> >>>> Subject: Re: Zookeeper session expiration
> >>>>
> >>>>> On 12/4/2017 8:22 AM, Anthony Shaya wrote:
> >>>>> My question is related to how session expiration works. I noticed
> >>>>> that on many of the client machines the times were all off (by
> >>>>> anywhere from 1 minute to 20 minutes; this was resolved after
> >>>>> discovery, though I haven't verified that completely yet). Can this
> >>>>> directly affect session expiration within the ZooKeeper cluster?
> >>>>>
> >>>>>  *   I read the following in
> >>>>> https://wiki.apache.org/hadoop/ZooKeeper/FAQ , "Expirations happens
> >>>>> when the cluster does not hear from the client within the specified
> >>>>> session timeout period (i.e. no heartbeat).". So in some cases it
> >>>>> seems like if the times were wrong across the machines it's possible
> >>>>> one of the clients could have effectively sent a heartbeat in the
> >>>>> past (not sure about this tbh) and then the cluster expires the
> >>>>> session?
> >>>>
> >>>> I make these comments without any knowledge of what ZK code
> >>>> actually does.  I am a member of this list because I'm a
> >>>> representative of the Apache Solr project, which uses the ZK client
> >>>> in order to maintain a cluster.
> >>>>
> >>>> IMHO, any software which makes actual decisions based on the
> >>>> timestamps in messages from another system is badly designed.  I
> >>>> would hope that the ZK designers know this, and always make any
> >>>> decisions related to time using the clock in the local system only.
> >>>>
> >>>> If ZK's designers did the right thing, then a session timeout would
> >>>> indicate that quite literally no heartbeats were received in X
> >>>> seconds, as measured by the local clock, and the local clock ONLY
> >>>> ... NOT from timestamp information received from another system.
> >>>>
> >>>> Although such a lack of communication could be caused by any number
> >>>> of things, including network hardware failure, one of the most
> >>>> common reasons I have seen for problems like this is extreme java
> >>>> garbage collection pauses in the client software.
> >>>>
> >>>> Situations where the heap is a little bit too small can cause a
> >>>> java program to basically be doing garbage collection constantly,
> >>>> so it doesn't have much time to do anything else, like send
> >>>> heartbeats to ZK servers.
> >>>>
> >>>> Situations where the heap is HUGE and garbage collection is not
> >>>> well tuned can lead to pauses of a minute or longer while Java does
> >>>> a massive full GC.
> >>>>
> >>>>>  *   I don't have the ZooKeeper node log for the above time to see
> >>>>> what was going on in ZooKeeper when the cluster determined the
> >>>>> session expired.
> >>>>>
> >>>>>  *   Is there any additional logging I can turn on to troubleshoot
> >>>>> ZK session expiration issues?
> >>>>
> >>>> Hopefully your ZK clients also have logging.  Failing that, you
> >>>> could turn on GC logging for the software with the ZK client
> >>>> (assuming it's a Java client) and find a program or website that
> >>>> can examine the log and give you statistics or a graph of GC pauses.
> >>>>
> >>>> If there is a problem in software using the client and whatever
> >>>> logging is available doesn't help you figure out what's wrong,
> >>>> you're generally going to need to talk to whoever wrote that
> >>>> software for help troubleshooting it.
> >>>>
> >>>> Thanks,
> >>>> Shawn
> >>>>
> >>
> >>
>
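
The thread describes session expiration as being decided on the Leader from its own System.nanoTime() readings, never from client timestamps. Below is a rough sketch of that idea. It is not the code in SessionTrackerImpl.java: the class SessionExpirySketch and its methods touch and expire are hypothetical names, and a plain map with a linear scan stands in for the priority-queue-like structure mentioned above. The clock is injected so the behaviour can be checked without sleeping.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

public class SessionExpirySketch {
    // Session id -> deadline in milliseconds on the tracker's own clock.
    private final Map<Long, Long> deadlineBySession = new HashMap<>();
    private final LongSupplier clockMs;

    public SessionExpirySketch(LongSupplier clockMs) {
        this.clockMs = clockMs;
    }

    // Assumed production clock: milliseconds derived from System.nanoTime().
    public static SessionExpirySketch monotonic() {
        return new SessionExpirySketch(() -> System.nanoTime() / 1_000_000L);
    }

    // Record a heartbeat: the deadline is computed from the local clock only;
    // nothing the client says about its own time is used.
    public void touch(long sessionId, long timeoutMs) {
        deadlineBySession.put(sessionId, clockMs.getAsLong() + timeoutMs);
    }

    // Remove and return every session whose deadline has passed.
    public List<Long> expire() {
        long now = clockMs.getAsLong();
        List<Long> expired = new ArrayList<>();
        Iterator<Map.Entry<Long, Long>> it = deadlineBySession.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> entry = it.next();
            if (now - entry.getValue() >= 0) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }

    public static void main(String[] args) throws InterruptedException {
        SessionExpirySketch tracker = monotonic();
        tracker.touch(42L, 20);  // session 42 with a 20 ms timeout
        Thread.sleep(50);        // no heartbeat arrives in the meantime
        System.out.println("expired sessions: " + tracker.expire());
    }
}
```

In this sketch a client whose own clock is 20 minutes off is treated no differently from any other client, since a heartbeat only resets a deadline measured on the tracker's clock. A long GC pause on the client, by contrast, delays the call to touch and can let the deadline pass.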
