zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vishal K <vishalm...@gmail.com>
Subject Re: znode inconsistencies across ZooKeeper servers
Date Wed, 06 Oct 2010 21:57:27 GMT
Hi Patrick,

You are correct, the test restarts both ZooKeeper server and the client. The
client opens a new connection after restarting. So we would expect that the
ephmeral znode (/foo) to expire after the session timeout. However, the
client with the new session creates the ephemeral znode (/foo) again after
it reboots (it sets a watch for /foo and recreates /foo if it is deleted or
doesn't exist). The client is not reusing the session ID. What I expect to
see is that the older /foo should expire after which a new /foo should get
created. Is my expectation correct?

What confuses me is the following output of 3 successive getstat /foo
requests on A (the zxid, time and owner fields).  Notice that the older
znode reappeared.
At the same time when I do getstat at B and C, I see the newer /foo.

log4j:WARN No appenders could be found for logger
(org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
cZxid = 0x1000005ef
ctime = Tue Oct 05 15:00:50 UTC 2010
mZxid = 0x1000005ef
mtime = Tue Oct 05 15:00:50 UTC 2010
pZxid = 0x1000005ef
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2b7ce57ce40000
dataLength = 54
numChildren = 0

log4j:WARN No appenders could be found for logger
(org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
cZxid = 0x100000607
ctime = Tue Oct 05 15:01:07 UTC 2010
mZxid = 0x100000607
mtime = Tue Oct 05 15:01:07 UTC 2010
pZxid = 0x100000607
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2b7ce5bda40000
dataLength = 54
numChildren = 0

log4j:WARN No appenders could be found for logger
(org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
cZxid = 0x1000005ef
ctime = Tue Oct 05 15:00:50 UTC 2010
mZxid = 0x1000005ef
mtime = Tue Oct 05 15:00:50 UTC 2010
pZxid = 0x1000005ef
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2b7ce57ce40000
dataLength = 54
numChildren = 0

Thanks for your help.

-Vishal

On Wed, Oct 6, 2010 at 4:45 PM, Patrick Hunt <phunt@apache.org> wrote:

> Vishal the attachment seems to be getting removed by the list daemon (I
> don't have it), can you create a JIRA and attach? Also this is a good
> question for the ppl on zookeeper-user. (ccing)
>
> You are aware that ephemeral znodes are tied to the session? And that
> sessions only expire after the session timeout period? At which time any
> znodes created during that session are then deleted. The fact that you are
> "kill"ing your client process leads me to believe that you are not closing
> the session cleanly (meaning that it will eventually expire after the
> session timeout period), in which case the ephemeral znodes _should_
> reappear when A is restarted and successfully rejoins the cluster. (at
> least
> until the session timeout is exceeded)
>
> Patrick
>
> On Tue, Oct 5, 2010 at 11:04 AM, Vishal K <vishalmlst@gmail.com> wrote:
>
> > Hi,
> >
> > I have a 3 node ZK cluster (A, B, C). On one of the the nodes (node A), I
> > have a ZK client running that connects to the local server and creates an
> > ephemeral znode to indicate clients on other nodes that it is online.
> >
> > I have test script that reboots the zookeeper server as well as client on
> > A. The test does a getstat on the ephemeral znode created by the client
> on
> > A. I am seeing that the view of znodes on A is different from the other 2
> > nodes. I can tell this from the session ID that the client gets after
> > reconnecting to the local ZK server.
> >
> > So the test is simple:
> > - kill zookeeper server and client process
> > - wait for a few seconds
> > - do zkCli.sh stat ... > test.out
> >
> > What I am seeing is that the ephemeral znode with old zxid, time, and
> > session ID is reappearing on node A. I have attached the output of 3
> > consecutive getstat requests of the test (see client_getstat.out). Notice
> > that the third output is the same as the first one. That is, the old
> > ephemeral znode reappeared at A. However, both B and C are showing the
> > latest znode with correct time, zxid and session ID (output not
> attached).
> >
> > After this point, all following getstat requests on A are showing the old
> > znode. Whereas, B and C show the correct znode every time the client on A
> > comes online. This is something very perplexing. Earlier I thought this
> was
> > a bug in my client implementation. But the test shows that the ZK server
> on
> > A after reboot is out of sync with rest of the servers.
> >
> > The stat command to each server shows that the servers are in sync as far
> > as zxid's are concerned (see stat.out). So there is something wrong with
> A's
> > local database that is causing this problem.
> >
> > Has anyone seen this before? I will be doing more debugging in the next
> few
> > days. Comments/suggestions for further debugging are welcomed.
> >
> > -Vishal
> >
> >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message