zookeeper-user mailing list archives

From Chang Song <tru64...@me.com>
Subject Re: question about ZK robustness
Date Wed, 01 Dec 2010 14:23:41 GMT

I think it is not too difficult to reproduce.
Just create a 3-node ensemble, and have some clients create ephemeral nodes.
Then kill one of the ensemble servers with kill -9.
I don't remember whether it was the leader or a follower.

Then, once you see that those ephemeral nodes are gone, restart the killed server's Java process.

I think I saw this happen twice while repeating the same experiment multiple times.
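
A minimal sketch of how the mismatch could be checked mechanically after the restart, assuming you have already collected, for each ensemble server, the set of ephemeral node paths it reports (for example with a client connected to that one server only). The function name `find_stale_ephemerals`, the server names, and the paths are hypothetical and not part of ZooKeeper; how the sets are gathered is left open.

```python
from collections import Counter


def find_stale_ephemerals(views):
    """Return {server: paths it still lists but a majority of servers do not}.

    `views` maps a server name to the set of ephemeral znode paths that
    server reports. Collecting those sets is outside this sketch.
    """
    quorum = len(views) // 2 + 1
    # Count how many servers report each path.
    counts = Counter(path for paths in views.values() for path in set(paths))
    # Paths that a majority agrees on are treated as the reference view.
    agreed = {path for path, n in counts.items() if n >= quorum}
    stale = {}
    for server, paths in views.items():
        extra = set(paths) - agreed
        if extra:
            stale[server] = extra
    return stale


if __name__ == "__main__":
    # Hypothetical snapshot taken after the kill -9 and restart.
    views = {
        "zk1": {"/workers/w2"},
        "zk2": {"/workers/w2"},
        "zk3": {"/workers/w1", "/workers/w2"},  # the restarted server
    }
    print(find_stale_ephemerals(views))  # {'zk3': {'/workers/w1'}}
```

An empty result would mean no server lists an ephemeral node that the majority has already dropped. A non-empty result for the restarted server would match the symptom described here.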

I am not trying to create FUD around Zookeeper. Actually, it is the exact opposite.
I fell in love with Zookeeper, and I still love it.  I am using Zookeeper for our production system.
In fact, it is THE only Java solution I believe in. Really.

I just couldn't find time to reproduce and report a bug.

Chang


On Dec 1, 2010, at 11:08 PM, Fournier, Camille F. [Tech] wrote:

> Would love to hear more about your ensemble settings to try and recreate this issue.
> Would be a very bad thing for my deployment as well...
> 
> Camille
> 
> ----- Original Message -----
> From: Chang Song <tru64ufs@me.com>
> To: user@zookeeper.apache.org <user@zookeeper.apache.org>
> Cc: zookeeper-user@hadoop.apache.org <zookeeper-user@hadoop.apache.org>
> Sent: Wed Dec 01 08:09:30 2010
> Subject: Re: question about ZK robustness
> 
> 
> Ted.
> 
> I have seen inconsistency between different ensemble servers when we did
> some torture testing.
> 
> I killed the Java process with -9 on one ensemble server and restarted it, and saw
> that ephemeral nodes that had disappeared from the other two ensemble servers were stuck on
> the newly restarted server. No matter what I did ("create, sync, get"), the ephemeral
> nodes did not disappear.  I had to remove the log and force a re-sync from scratch.
> 
> I have seen this behavior twice, exactly the same each time. I had about 2000 clients connected
> to the ensemble servers. I had no time to file a bug report, but when I have time to do another
> round of torture testing, I will definitely file one.
> 
> This is not data loss, but it is a serious, dead serious inconsistency as far as my application goes.
> Please let me know if you happen to know of a related bug.
> 
> Thank you.
> 
> Chang
> 
> 
> On Dec 1, 2010, at 1:41 PM, Ted Dunning wrote:
> 
>> Sure.  Let me know when.  I have learned a bit more from Ben since I wrote
>> that first bit so I could amplify the exposition
>> just a bit when the time comes.
>> 
>> On Tue, Nov 30, 2010 at 8:07 PM, Mahadev Konar <mahadev@yahoo-inc.com> wrote:
>> 
>>> I meant to say, we can wait a while before we are done moving to the new
>>> wiki tree.
>>> 
> 

