zookeeper-user mailing list archives

From Jeremy Stribling <st...@nicira.com>
Subject Re: znode metadata consistency
Date Wed, 02 Mar 2011 00:09:44 GMT
My very shaky understanding from skimming those issues was that in some 
cases there were two threads handling different types of data that are 
related to the same transaction -- but maybe that's only true when 
there's a leader and a follower.  But I also saw something in there 
about restoring data from a snapshot vs. restoring it from a log, which 
seems like it could have happened in a single node case.

In any case, now that 3.3.3 is out I'll give it a try and report back if 
we keep seeing this.

Thanks!

Jeremy

On 03/01/2011 02:59 PM, Vishal Kher wrote:
> Hi Jeremy,
>
> I just realized that you are using a standalone ZK server. I don't 
> think the bugs apply to you, so I don't have an answer to your question.
> I think 3.3.3 should be released soon: 
> http://zookeeper-dev.578911.n2.nabble.com/VOTE-Release-ZooKeeper-3-3-3-candidate-1-td6059109.html
>
> -Vishal
>
> On Tue, Mar 1, 2011 at 4:15 PM, Jeremy Stribling <strib@nicira.com> wrote:
>
>     Thanks for the pointers Vishal, I hadn't seen those.  They look
>     like they could be related, but without knowing how metadata
>     updates are grouped into transactions, it's hard for me to say.  I
>     would expect the cversion update to happen within the same
>     transaction as the creation of a new child, but if they get
>     written to the log in two separate steps, perhaps these issues
>     could explain it.
>
>     Any estimate on when 3.3.3 will be released?  I haven't seen any
>     updates on the user list about it.  Thanks,
>
>     Jeremy
>
>
>     On 03/01/2011 12:40 PM, Vishal Kher wrote:
>
>         Hi Jeremy,
>
>         One of the main reasons for the 3.3.3 release was to include
>         fixes for znode inconsistency bugs.
>         Have you taken a look at
>         https://issues.apache.org/jira/browse/ZOOKEEPER-962 and
>         https://issues.apache.org/jira/browse/ZOOKEEPER-919?
>         The problem that you are seeing sounds similar to the ones
>         reported.
>
>         -Vishal
>
>
>
>         On Mon, Feb 28, 2011 at 8:04 PM, Jeremy
>         Stribling <strib@nicira.com> wrote:
>
>
>             Hi all,
>
>             A while back I noticed that my Zookeeper cluster got into
>             a state where I
>             would get a "node exists" error back when creating a
>             sequential znode -- see
>             the thread starting at
>             http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/201010.mbox/%3C4CCA1E2F.9020606@nicira.com%3E
>             for more details.  The summary is that at the time, my
>             application had a bug
>             that could have been improperly bringing new nodes into a
>             cluster.
>
>             However, I've seen this a couple more times since fixing
>             that original bug.
>             I don't yet know how to reproduce it, but I am going to
>             keep trying.  In
>             one case, we restarted a node (in a one-node cluster), and
>             when it came back
>             up we could no longer create sequential nodes on a certain
>             parent node, with
>             a node exists (-110) error code.  The biggest child it saw
>             on restart was
>             /zkrsm/000000000000002d_record0000120804 (i.e., a sequence
>             number of
>             120804), however a stat on the parent node revealed that
>             the cversion was
>             only 120710:
>
>             [zk:<ip:port>(CONNECTED) 3] stat /zkrsm
>             cZxid = 0x5
>             ctime = Mon Jan 17 18:28:19 PST 2011
>             mZxid = 0x5
>             mtime = Mon Jan 17 18:28:19 PST 2011
>             pZxid = 0x1d819
>             cversion = 120710
>             dataVersion = 0
>             aclVersion = 0
>             ephemeralOwner = 0x0
>             dataLength = 0
>             numChildren = 2955
>
>             So my question is: how is znode metadata persisted with
>             respect to the
>             actual znodes?  Is it possible that a node's children will
>             get synced to
>             disk before its own metadata, and if it crashes at a bad
>             time, the metadata
>             updates will be lost?  If so, is there any way to
>             constrain Zookeeper so
>             that it will sync its metadata before returning success
>             for write
>             operations?
>
>             (I'm using Zookeeper 3.3.2 on a Debian Squeeze 64-bit box,
>             with
>             openjdk-6-jre 6b18-1.8.3-2.)
>
>             I'd be happy to create a JIRA for this if that seems
>             useful, but without a
>             way to reproduce it I'm not sure that it is.
>
>             Thanks,
>
>             Jeremy
>
>
>
>
>
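
The symptom in the original message (largest child suffix 120804, parent cversion 120710) can be illustrated with a small sketch. It assumes, as I understand the 3.3.x server behavior, that the suffix of a sequential znode is the parent's cversion at create time, zero-padded to 10 digits. The class and method names below are hypothetical, the child set is made-up data, and nothing here calls the real ZooKeeper API; it only shows why a cversion that lags behind the existing children could produce a "node exists" error.

```java
import java.util.Locale;
import java.util.TreeSet;

// Toy model only: not ZooKeeper code, all names are hypothetical.
final class SequentialSuffixSketch {

    // Assumption: suffix = parent's cversion, zero-padded to 10 digits.
    static String sequentialName(String prefix, int parentCversion) {
        return prefix + String.format(Locale.ENGLISH, "%010d", parentCversion);
    }

    // Stand-in for a sequential create: returns the new child name,
    // or null if a child with that name already exists (the "node exists" case).
    static String tryCreate(TreeSet<String> children, String prefix, int parentCversion) {
        String name = sequentialName(prefix, parentCversion);
        return children.add(name) ? name : null;
    }

    public static void main(String[] args) {
        String prefix = "000000000000002d_record";
        TreeSet<String> children = new TreeSet<>();

        // Hypothetical state after restart: children up to suffix 120804 survived.
        for (int seq = 120700; seq <= 120804; seq++) {
            children.add(sequentialName(prefix, seq));
        }

        // The parent's cversion was restored as 120710, behind the largest child.
        int staleCversion = 120710;
        System.out.println("stale cversion create: "
                + tryCreate(children, prefix, staleCversion));

        // With a cversion that is ahead of every child, the create goes through.
        System.out.println("fresh cversion create: "
                + tryCreate(children, prefix, 120805));
    }
}
```

Under that assumption, every sequential create on the parent would keep colliding until the cversion moves past names that are still present, which is consistent with the report that creates failed on only one parent node.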
