zookeeper-user mailing list archives

From Jeremy Stribling <st...@nicira.com>
Subject Re: znode metadata consistency
Date Tue, 01 Mar 2011 21:15:08 GMT
Thanks for the pointers Vishal, I hadn't seen those.  They look like 
they could be related, but without knowing how metadata updates are 
grouped into transactions, it's hard for me to say.  I would expect the 
cversion update to happen within the same transaction as the creation of 
a new child, but if they get written to the log in two separate steps, 
perhaps these issues could explain it.

Any estimate on when 3.3.3 will be released?  I haven't seen any updates 
on the user list about it.  Thanks,

Jeremy

On 03/01/2011 12:40 PM, Vishal Kher wrote:
> Hi Jeremy,
>
> One of the main reasons for 3.3.3 release was to include fixes for znode
> inconsistency bugs.
> Have you taken a look at https://issues.apache.org/jira/browse/ZOOKEEPER-962 and
> https://issues.apache.org/jira/browse/ZOOKEEPER-919?
> The problem that you are seeing sounds similar to the ones reported.
>
> -Vishal
>
>
>
> On Mon, Feb 28, 2011 at 8:04 PM, Jeremy Stribling <strib@nicira.com> wrote:
>
>    
>> Hi all,
>>
>> A while back I noticed that my Zookeeper cluster got into a state where I
>> would get a "node exists" error back when creating a sequential znode -- see
>> the thread starting at
>> http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/201010.mbox/%3C4CCA1E2F.9020606@nicira.com%3E for
>> more details.  The summary is that at the time, my application had a bug
>> that could have been improperly bringing new nodes into a cluster.
>>
>> However, I've seen this a couple more times since fixing that original bug.
>> I don't yet know how to reproduce it, but I am going to keep trying.  In
>> one case, we restarted a node (in a one-node cluster), and when it came back
>> up we could no longer create sequential nodes on a certain parent node, with
>> a node exists (-110) error code.  The biggest child it saw on restart was
>> /zkrsm/000000000000002d_record0000120804 (i.e., a sequence number of
>> 120804), however a stat on the parent node revealed that the cversion was
>> only 120710:
>>
>> [zk:<ip:port>(CONNECTED) 3] stat /zkrsm
>> cZxid = 0x5
>> ctime = Mon Jan 17 18:28:19 PST 2011
>> mZxid = 0x5
>> mtime = Mon Jan 17 18:28:19 PST 2011
>> pZxid = 0x1d819
>> cversion = 120710
>> dataVersion = 0
>> aclVersion = 0
>> ephemeralOwner = 0x0
>> dataLength = 0
>> numChildren = 2955
>>
>> So my question is: how is znode metadata persisted with respect to the
>> actual znodes?  Is it possible that a node's children will get synced to
>> disk before its own metadata, and if it crashes at a bad time, the metadata
>> updates will be lost?  If so, is there any way to constrain Zookeeper so
>> that it will sync its metadata before returning success for write
>> operations?
>>
>> (I'm using Zookeeper 3.3.2 on a Debian Squeeze 64-bit box, with
>> openjdk-6-jre 6b18-1.8.3-2.)
>>
>> I'd be happy to create a JIRA for this if that seems useful, but without a
>> way to reproduce it I'm not sure that it is.
>>
>> Thanks,
>>
>> Jeremy
>>
>>
>>      
>    
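
The mismatch reported in the quoted message can be checked mechanically. The following is a minimal Python sketch, not ZooKeeper code. It assumes, as the report implies, that a 3.3.x server names a sequential znode by appending the parent's cversion, zero-padded to 10 digits, to the requested path. The helper names (sequence_number, next_sequential_name, cversion_lags) are hypothetical, and in practice the child list would come from a client's getChildren call on the parent.

```python
import re

# Assumption: sequential znodes end in a 10-digit, zero-padded counter
# taken from the parent's cversion at creation time.
SEQ_SUFFIX = re.compile(r"(\d{10})$")


def sequence_number(child_name):
    """Return the trailing 10-digit sequence number, or None if absent."""
    match = SEQ_SUFFIX.search(child_name)
    return int(match.group(1)) if match else None


def next_sequential_name(prefix, parent_cversion):
    """Name the server would pick for the next sequential create (assumed)."""
    return "%s%010d" % (prefix, parent_cversion)


def cversion_lags(children, parent_cversion):
    """True if an existing child already carries a sequence number at or
    above the parent's cversion, so a later create may hit 'node exists'."""
    numbers = [n for n in map(sequence_number, children) if n is not None]
    return bool(numbers) and max(numbers) >= parent_cversion


# Values from the report: largest child 120804, parent cversion 120710.
children = ["000000000000002d_record0000120804"]
print(cversion_lags(children, 120710))
print(next_sequential_name("000000000000002d_record", 120710))
```

If the name produced by next_sequential_name already exists among the children, a sequential create with that prefix would be expected to fail with the node exists (-110) error described in the report.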
