hadoop-zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Qing Yan <qing...@gmail.com>
Subject Re: Session expiration caused by time change
Date Thu, 19 Aug 2010 07:52:40 GMT
Oh.. our servers are also running in a virtualized environment.

On Thu, Aug 19, 2010 at 2:58 PM, Martin Waite <waite.134@gmail.com> wrote:

> Hi,
>
> I have tripped over similar problems testing Red Hat Cluster in virtualised
> environments.  I don't know whether recent linux kernels have improved
> their
> interaction with VMWare, but in our environments clock drift caused by lost
> ticks can be substantial, requiring NTP to sometimes jump the clock rather
> than control acceleration.   In one of our internal production rigs, the
> local NTP servers themselves were virtualised - causing absolute mayhem
> when
> heavy loads hit the other guests on the same physical hosts.
>
> The effect on RHCS (v2.0) is quite dramatic.  A forward jump in time by 10
> seconds always causes a member to prematurely time-out on a network read,
> causing the member to drop out and trigger a cluster reconfiguration.
> Apparently NTP is integrated with RHCS version 3, but I don't know what is
> meant by that.
>
> I guess this post is not entirely relevent to ZK, but I am just making the
> point that virtualisation (of NTP servers and or clients) can cause
> repeated
> premature timeouts.  On Linux, I believe that there is a class of timers
> provided that is immune to this, but I doubt that there is a platform
> independent way of coping with this.
>
> My 2p.
>
> regards,
> Martin
>
> On 18 August 2010 16:53, Patrick Hunt <phunt@apache.org> wrote:
>
> > Do you expect the time to be "wrong" frequently? If ntp is running it
> > should never get out of sync more than a small amount. As long as this is
> > less than ~your timeout you should be fine.
> >
> > Patrick
> >
> >
> > On 08/18/2010 01:04 AM, Qing Yan wrote:
> >
> >> Hi,
> >>
> >>    The testcase is fairly simple. We have a client which connects to ZK,
> >> registers an ephemeral node and watches on it. Now change the client
> >> machine's time - session killed..
> >>
> >>    Here is the log:
> >>
> >> *2010-08-18 04:24:57,782 INFO
> >> com.taobao.timetunnel2.cluster.service.AgentService: Host name
> >> kgbtest1.corp.alimama.com
> >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> >> environment:zookeeper.version=3.2.2-888565, built on 12/08/2009 21:51
> GMT
> >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> >> environment:host.name=kgbtest1.corp.alimama.com
> >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> >> environment:java.version=1.6.0_13
> >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> >> environment:java.vendor=Sun Microsystems Inc.
> >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> >> environment:java.home=/usr/java/jdk1.6.0_13/jre
> >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> >>
> >>
> environment:java.class.path=/home/admin/TimeTunnel2/cluster/bin/../conf/agent/:/home/admin/TimeTunnel2/cluster/bin/../lib/slf4j-log4j12-1.5.2.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/slf4j-api-1.5.2.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/timetunnel2-cluster-0.0.1-SNAPSHOT.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/zookeeper-3.2.2.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/log4j-1.2.14.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/gson-1.4.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/zk-recipes.jar
> >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> >>
> >>
> environment:java.library.path=/usr/java/jdk1.6.0_13/jre/lib/amd64/server:/usr/java/jdk1.6.0_13/jre/lib/amd64:/usr/java/jdk1.6.0_13/jre/../lib/amd64:/usr/java/packages/lib/amd64:/lib:/usr/lib
> >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> >> environment:java.io.tmpdir=/tmp
> >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> >> environment:java.compiler=<NA>
> >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> >> environment:os.name=Linux
> >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> >> environment:os.arch=amd64
> >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> >> environment:os.version=2.6.18-164.el5
> >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> >> environment:user.name=admin
> >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> >> environment:user.home=/home/admin
> >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> >> environment:user.dir=/home/admin/TimeTunnel2/cluster/log
> >> 2010-08-18 04:24:57,790 INFO org.apache.zookeeper.ZooKeeper: Initiating
> >> client connection, connectString=xentest10-vm5.corp.alimama.com:2181,
> >> xentest10-vm6.corp.alimama.com:2181,xentest10-vm9.corp.alimama.com:2181
> >> sessionTimeout=600000
> >> watcher=com.taobao.timetunnel2.cluster.service.AgentService@48d6c16c
> >> 2010-08-18 04:24:57,791 INFO org.apache.zookeeper.ClientCnxn:
> >> zookeeper.disableAutoWatchReset is false
> >> 2010-08-18 04:24:57,817 INFO org.apache.zookeeper.ClientCnxn: Attempting
> >> connection to server xentest10-vm5.corp.alimama.com/10.254.5.160:2181
> >> 2010-08-18 04:24:57,818 INFO org.apache.zookeeper.ClientCnxn: Priming
> >> connection to java.nio.channels.SocketChannel[connected local=/
> >> 10.254.2.160:48536 remote=
> >> xentest10-vm5.corp.alimama.com/10.254.5.160:2181]
> >> 2010-08-18 04:24:57,821 INFO org.apache.zookeeper.ClientCnxn: Server
> >> connection successful
> >> 2010-08-18 04:24:57,832 INFO
> >> com.taobao.timetunnel2.cluster.service.AgentService: Register...
> >> 2010-08-18 04:24:57,834 DEBUG org.apache.zookeeper.ClientCnxn: Reading
> >> reply
> >> sessionid:0x12a8413f71a0003, packet:: clientPath:null serverPath:null
> >> finished:false header:: -8,101  replyHeader:: -8,-1,0  request::
> >> 0,v{},v{},v{}  response:: null
> >> 2010-08-18 04:24:57,846 DEBUG org.apache.zookeeper.ClientCnxn: Reading
> >> reply
> >> sessionid:0x12a8413f71a0003, packet:: clientPath:null serverPath:null
> >> finished:false header:: 1,1  replyHeader:: 1,8589934649,-110  request::
> >> '/agent,,v{s{31,s{'world,'anyone}}},0  response::
> >> 2010-08-18 04:24:57,853 DEBUG
> >> com.taobao.timetunnel2.cluster.zookeeper.operation.PathCreator: Path
> >> /agent
> >> already exist
> >> 2010-08-18 04:24:57,882 DEBUG org.apache.zookeeper.ClientCnxn: Reading
> >> reply
> >> sessionid:0x12a8413f71a0003, packet:: clientPath:null serverPath:null
> >> finished:false header:: 2,1  replyHeader:: 2,8589934650,-110  request::
> >> '/agent/status,,v{s{31,s{'world,'anyone}}},0  response::
> >> 2010-08-18 04:24:57,882 DEBUG
> >> com.taobao.timetunnel2.cluster.zookeeper.operation.PathCreator: Path
> >> /agent/status already exist
> >> 2010-08-18 04:24:57,886 DEBUG org.apache.zookeeper.ClientCnxn: Reading
> >> reply
> >> sessionid:0x12a8413f71a0003, packet:: clientPath:null serverPath:null
> >> finished:false header:: 3,1  replyHeader:: 3,8589934651,0  request::
> >> '/agent/status/kgbtest1.corp.alimama.com,,v{s{31,s{'world,'anyone}}},1
> >> response:: '/agent/status/kgbtest1.corp.alimama.com
> >> 2010-08-18 04:24:57,886 DEBUG
> >> com.taobao.timetunnel2.cluster.zookeeper.operation.PathCreator: Path
> >> /agent/status/kgbtest1.corp.alimama.com creation OK
> >> 2010-08-18 04:24:57,886 INFO
> >> com.taobao.timetunnel2.cluster.service.AgentService: Watch for ->
> >> /agent/status/kgbtest1.corp.alimama.com
> >> 2010-08-18 04:24:57,892 DEBUG org.apache.zookeeper.ClientCnxn: Reading
> >> reply
> >> sessionid:0x12a8413f71a0003, packet:: clientPath:null serverPath:null
> >> finished:false header:: 4,4  replyHeader:: 4,-1,0  request::
> >> '/agent/status/
> >> kgbtest1.corp.alimama.com,T  response::
> >>
> >>
> ,s{8589934651,8589934651,1282116165777,1282116165777,0,0,0,84024764344696835,0,0,8589934651}
> >>
> >> 2010-08-18 04:24:57,892 INFO
> >> com.taobao.timetunnel2.cluster.service.AgentService: Data changed
> >> @/agent/status/kgbtest1.corp.alimama.com
> >> 2010-08-18 04:25:11,226 DEBUG org.apache.zookeeper.ClientCnxn: Got ping
> >> response for sessionid:0x12a8413f71a0003 after 1ms
> >> 2010-08-18 04:25:24,560 DEBUG org.apache.zookeeper.ClientCnxn: Got ping
> >> response for sessionid:0x12a8413f71a0003 after 2ms
> >> 2010-08-18 04:25:37,893 DEBUG org.apache.zookeeper.ClientCnxn: Got ping
> >> response for sessionid:0x12a8413f71a0003 after 1ms
> >> 2010-08-18 04:19:44,853 DEBUG org.apache.zookeeper.ClientCnxn: Got
> >> notification sessionid:0x12a8413f71a0003
> >> 2010-08-18 04:19:44,854 DEBUG org.apache.zookeeper.ClientCnxn: Got
> >> WatchedEvent: Znode change. Path:
> >> /agent/status/kgbtest1.corp.alimama.comType: NodeDeleted for sessionid
> >> 0x12a8413f71a0003
> >> 2010-08-18 04:19:44,854 DEBUG
> >> com.taobao.timetunnel2.cluster.zookeeper.operation.PathDataWatcher:
> >> getChildren event =>WatchedEvent: Znode change. Path: /agent/status/
> >> kgbtest1.corp.alimama.com Type: NodeDeleted
> >> 2010-08-18 04:19:44,855 WARN org.apache.zookeeper.ClientCnxn: Exception
> >> closing session 0x12a8413f71a0003 to
> sun.nio.ch.SelectionKeyImpl@7ca3d4cf
> >> java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0
> >> lim=4 cap=4]
> >>     at
> >>
> >>
> org.apache.zookeeper.ClientCnxn$SendThread.doIO(zookeeper:ClientCnxn.java):701)
> >>     at
> >>
> >>
> org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):945)
> >> 2010-08-18 04:19:45,591 INFO org.apache.zookeeper.ClientCnxn: Attempting
> >> connection to server xentest10-vm6.corp.alimama.com/10.254.5.161:2181
> >> 2010-08-18 04:19:45,591 INFO org.apache.zookeeper.ClientCnxn: Priming
> >> connection to java.nio.channels.SocketChannel[connected local=/
> >> 10.254.2.160:45776 remote=
> >> xentest10-vm6.corp.alimama.com/10.254.5.161:2181]
> >> 2010-08-18 04:19:45,591 INFO org.apache.zookeeper.ClientCnxn: Server
> >> connection successful
> >> 2010-08-18 04:19:45,593 WARN org.apache.zookeeper.ClientCnxn: Exception
> >> closing session 0x12a8413f71a0003 to
> sun.nio.ch.SelectionKeyImpl@2fdb7df8
> >> java.io.IOException: Session Expired
> >>     at
> >>
> >>
> org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(zookeeper:ClientCnxn.java):589)
> >>     at
> >>
> >>
> org.apache.zookeeper.ClientCnxn$SendThread.doIO(zookeeper:ClientCnxn.java):709)
> >>     at
> >>
> >>
> org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):945)
> >>
> >> *   I guess ZK might rely on timestamp to  keep sessions alive, but we
> >> have
> >> NTP daemon running so machine time can get changed
> >> automatically, is there a conflict?
> >>
> >>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message