From: Vishal Kher
To: user@zookeeper.apache.org
Subject: Re: question about ZK robustness
Date: Wed, 1 Dec 2010 10:05:02 -0500

Agreed with Chang on all fronts. I will repro the problem and upload logs.

2010/12/1 Chang Song
>
> I think it is not too difficult to reproduce.
> Just create a 3-node ensemble, and have some clients create ephemeral nodes.
> Then kill one of the ensemble servers with kill -9.
> I don't remember whether it was a leader or a follower.
>
> Then, if you see those ephemeral nodes gone, restart the ensemble server's Java
> process.
>
> I think I saw this happen twice when I repeated the same
> experiment multiple times.
>
> I am not trying to create FUD around ZooKeeper. Actually, it is the exact
> opposite.
> I fell in love with ZooKeeper, and I still am. I am using ZooKeeper for
> our production system.
> In fact, it is THE only Java solution I believe in. Really.
>
> I just couldn't find time to reproduce and report a bug.
>
> Chang
>
>
> Dec 1, 2010, 11:08 PM, Fournier, Camille F. [Tech] wrote:
>
> > Would love to hear more about your ensemble settings to try and recreate
> > this issue. Would be a very bad thing for my deployment as well...
> >
> > Camille
> >
> > ----- Original Message -----
> > From: Chang Song
> > To: user@zookeeper.apache.org
> > Cc: zookeeper-user@hadoop.apache.org
> > Sent: Wed Dec 01 08:09:30 2010
> > Subject: Re: question about ZK robustness
> >
> >
> > Ted.
> >
> > I have seen inconsistency between different ensemble servers when we did
> > some torture testing.
> >
> > I killed the Java process with -9 on one ensemble server, restarted it, and saw
> > that ephemeral nodes that had disappeared from the other two ensemble servers
> > remained stuck on the newly restarted server. No matter what I did ("create, sync, get"),
> > the ephemeral nodes did not disappear. I had to remove the log and force a re-sync from
> > scratch.
> >
> > I have seen this behavior twice. Exactly the same behavior. I had about
> > 2000 clients connected to the ensemble servers. I had no time to file a bug report,
> > but when I have time to do another round of torture testing, I will definitely
> > file one.
> >
> > This is not data loss, but it is a serious, dead serious inconsistency as far
> > as my application goes.
> > Please let me know if you happen to know of a related bug.
> >
> > Thank you.
> >
> > Chang
> >
> >
> > Dec 1, 2010, 1:41 PM, Ted Dunning wrote:
> >
> >> Sure. Let me know when. I have learned a bit more from Ben since I
> >> wrote that first bit, so I could amplify the exposition
> >> just a bit when the time comes.
> >>
> >> On Tue, Nov 30, 2010 at 8:07 PM, Mahadev Konar wrote:
> >>
> >>> I meant to say, we can wait a while until we are done moving to the
> >>> new wiki tree.
> >>>
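
The check described in the thread (ephemeral nodes that are gone from two servers but still listed by the restarted one) can be sketched as a set difference. This is only an illustrative sketch, not anything posted by the participants: the function name `find_stale_ephemerals`, the server labels, and the znode paths are all hypothetical, and it assumes the per-server listings have already been collected by connecting to each ensemble server individually.

```python
from typing import Dict, Set


def find_stale_ephemerals(views: Dict[str, Set[str]], restarted: str) -> Set[str]:
    """Return znode paths that only the restarted server still reports.

    `views` maps a (hypothetical) server label to the set of ephemeral
    znode paths that server returned. How the listings are gathered is
    outside the scope of this sketch.
    """
    others = [paths for name, paths in views.items() if name != restarted]
    if not others:
        # Nothing to compare against, so nothing can be called stale.
        return set()
    # A node is treated as stale when the restarted server lists it
    # and no other server in the ensemble does.
    seen_elsewhere = set().union(*others)
    return views[restarted] - seen_elsewhere


if __name__ == "__main__":
    # Made-up listings for illustration only; not data from the thread.
    views = {
        "zk1": {"/app/workers/w1"},
        "zk2": {"/app/workers/w1"},
        "zk3": {"/app/workers/w1", "/app/workers/w2"},
    }
    print(sorted(find_stale_ephemerals(views, restarted="zk3")))
```

A non-empty result right after a restart may just reflect a server that has not caught up yet, so the comparison is only meaningful once the restarted server has rejoined the ensemble and had time to sync.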