hadoop-zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: znode cversion decreasing?
Date Mon, 12 Apr 2010 22:09:20 GMT
We did have a case where the user set up 3 servers, each running standalone. 
:-) Doesn't look like that's the problem here though given you only 
specify 1 server in the connect string (although as mahadev mentioned 
you don't need to worry about that aspect).

After it goes 7->11->9, does it ever go back to 11 or just 9?
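
If it helps answer that, here is a minimal sketch of a client-side tracker
that records the highest cversion seen so far and logs every time a lower
one shows up. The names (cversion_tracker, tracker_init, tracker_observe)
are hypothetical and not part of the ZooKeeper C client; I'm assuming you
would feed it the cversion field of the struct Stat you get back for the
group node, along with your time(NULL) timestamp.

```c
#include <stdio.h>

/* Hypothetical client-side helper; NOT part of the ZooKeeper C API. */
typedef struct {
    int have_value;   /* 0 until the first cversion has been recorded */
    int last;         /* most recently observed cversion */
    int highest;      /* highest cversion observed so far */
    int regressions;  /* how many observations were below `highest` */
} cversion_tracker;

void tracker_init(cversion_tracker *t) {
    t->have_value = 0;
    t->last = 0;
    t->highest = 0;
    t->regressions = 0;
}

/* Record one observation taken at time `when` (seconds since the epoch).
 * Returns 1 if the value is lower than the highest seen so far, else 0. */
int tracker_observe(cversion_tracker *t, int cversion, long when) {
    int regressed = 0;

    if (t->have_value && cversion < t->highest) {
        regressed = 1;
        t->regressions++;
        fprintf(stderr,
                "%ld cversion went backwards: saw %d, highest %d, previous %d\n",
                when, cversion, t->highest, t->last);
    }
    if (!t->have_value || cversion > t->highest) {
        t->highest = cversion;
    }
    t->last = cversion;
    t->have_value = 1;
    return regressed;
}
```

Fed the sequence from your log (7, 7, 11, 9), this would flag only the
last observation, and any later return to 11 would show up as a normal
(non-regressing) observation.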

It would be good to capture the log files from all 3 servers the next 
time this happens. Please provide those as well; they will be critical 
for tracking this down, particularly since not many users are running 
cross-colo clusters.

If you can provide the config files too that will be useful.

What versions of Java and the OS are being used?

This might be a good time to create a JIRA and attach all of this to it, 
so that you don't have to repeat yourself. :-)

Patrick

On 04/12/2010 02:26 PM, Kevin Webb wrote:
> On Mon, 12 Apr 2010 09:27:46 -0700
> Mahadev Konar<mahadev@yahoo-inc.com>  wrote:
>
>> Hi Kevin,
>>
>>   The cversion should be monotonically increasing for the znode.
>> It would be a bug if it's not. Can you please elaborate in which cases
>> you are seeing the cversion decreasing? If you can reproduce with an
>> example that would be great.
>>
>> Thanks
>> mahadev
>
> Thanks Mahadev and Patrick!
>
> Here are some more details:
>
> I'm using the C client and running three servers on PlanetLab, with
> each server on a different continent.  Most of the time, the cversion
> is increasing as expected.  I'm never deleting the group node, so
> that's not the issue.
>
> Of course, now that I've emailed this list, I haven't seen it happen
> again...
>
> I do have one old log file though:
>
> ZK(10): 1270514949 (Re)Connected to zookeeper server.
> ZK(10): 1270514952 Beginning new view #7.  Unsetting panic...
> GOSSIP(10): 1270514952 Changing view to 7
> ZK(10): 1270515798 Disconnected from zookeeper.  Setting panic...
> ZK(10): 1270515803 (Re)Connected to zookeeper server.
> ZK(10): 1270515806 Beginning new view #7.  Unsetting panic...
> GOSSIP(10): 1270515806 Ignoring delivery request for view 7, current
> view is 7.
> ZK(10): 1270516812 Disconnected from zookeeper.  Setting panic...
> ZK(10): 1270516823 (Re)Connected to zookeeper server.
> ZK(10): 1270516826 Beginning new view #11.  Unsetting panic...
> GOSSIP(10): 1270516826 Changing view to 11
> ZK(10): 1270519191 Disconnected from zookeeper.  Setting panic...
> ZK(10): 1270519195 (Re)Connected to zookeeper server.
> ZK(10): 1270519198 Beginning new view #9.  Unsetting panic...
> GOSSIP(10): 1270519198 Ignoring delivery request for view 9, current
> view is 11.
>
> The large integral number is a Unix seconds-since-epoch timestamp (the
> result of calling time(NULL)).
>
> In this case, the client connected, got group #7, disconnected,
> reconnected, got #7 again, disconnected, reconnected, got #11,
> disconnected, reconnected, and then got #9.
>
> The host string that I pass to zookeeper_init contains only one
> address:port, so it's not an issue of re-connecting to a different
> server and getting old/stale information.
>
>
> If/when it does happen again, I'll be sure to also save the zookeeper
> server logs.
>
> -Kevin
