Subject: Re: cluster/ephemeral nodes inconsistency
From: Flavio Junqueira
Date: Wed, 14 Jan 2015 21:30:15 +0000
To: Flavio Junqueira
Cc: user@zookeeper.apache.org

Also, what was the last operation that changed the messed-up znode, and
when was that operation executed?

-Flavio

> On 14 Jan 2015, at 12:40, Flavio Junqueira wrote:
>
> But you do observe the session being closed, yes? And the ephemeral can
> be listed with getChildren but you can't get it with getData, is that
> right?
>
> -Flavio
>
> On Wednesday, January 14, 2015 11:42 AM, Kuba Lekstan wrote:
>
> German, today it happened on our secondary cluster, which consists of 3
> nodes: the leader didn't see the node, but the two followers did.
>
> Flavio, I browsed the logs but was unable to find anything interesting;
> only setData operations were issued.
>
> The problematic znode was last modified on 13 Jan 2015 at 17:xx; we
> noticed the issue on 14 Jan 2015 at 11:xx.
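Kuba's observation that the leader did not see the znode while both followers did can be checked mechanically. A ZooKeeper server answers reads from its own local replica, so connecting a client to one ensemble member at a time (for example `zkCli.sh -server host:port` and then `ls` on the parent path) yields that member's view. Writes such as delete go through the leader, which would be consistent with the NONODE errors Kuba reported in November when the leader lacked the node. The Java sketch assumes those per-server listings have already been collected; the class name, server names and paths are hypothetical, and it only reports divergence, it does not repair it.

```java
import java.util.*;

/**
 * Hypothetical helper (not part of ZooKeeper): given the children of one parent
 * znode as listed by each ensemble member separately, report every child that
 * is missing on at least one member, together with the members missing it.
 */
public class ReplicaViewDiff {

    public static SortedMap<String, SortedSet<String>> divergentPaths(
            Map<String, Set<String>> viewsByServer) {
        // Union of all children reported by any server.
        SortedSet<String> allPaths = new TreeSet<>();
        for (Set<String> view : viewsByServer.values()) {
            allPaths.addAll(view);
        }
        // For each child, collect the servers whose local view lacks it.
        SortedMap<String, SortedSet<String>> divergent = new TreeMap<>();
        for (String path : allPaths) {
            SortedSet<String> missingOn = new TreeSet<>();
            for (Map.Entry<String, Set<String>> e : viewsByServer.entrySet()) {
                if (!e.getValue().contains(path)) {
                    missingOn.add(e.getKey());
                }
            }
            if (!missingOn.isEmpty()) {
                divergent.put(path, missingOn);
            }
        }
        return divergent;
    }

    public static void main(String[] args) {
        // Made-up listings for a 3-server ensemble; names are hypothetical.
        Map<String, Set<String>> views = new HashMap<>();
        views.put("zk1-leader", new HashSet<>(Arrays.asList("app-1")));
        views.put("zk2", new HashSet<>(Arrays.asList("app-1", "app-stale")));
        views.put("zk3", new HashSet<>(Arrays.asList("app-1", "app-stale")));
        System.out.println(divergentPaths(views)); // prints {app-stale=[zk1-leader]}
    }
}
```

Run as is, `main` reports `app-stale` as missing only on the leader, which is the shape of the symptom described in this thread.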
>=20 > 2015-01-14 10:52 GMT+01:00 Flavio Junqueira = >: >=20 > > Hi there, > > I suggest a couple of things here: > > - Use LogFormatter to look into the transaction logs to check the > > operations that are actually coming across.- It would be nice be = able to > > reproduce it outside your app, ideally as a junit test so that we = can start > > working on it. > > I vaguely remember coming across such a problem, but I'll need to = dig into > > it. Does anyone on this list recall a similar problem? > > -Flavio > > > > On Wednesday, January 14, 2015 9:14 AM, Kuba Lekstan < > > kuebzky@gmail.com > wrote: > > > > > > > > German do you have any idea what might be causing these? Today same = issue > > had happen. > > > > 2014-11-21 5:42 GMT+01:00 Yogesh Patil >: > > > > > Hi Zookeepers, > > > I am also experiencing the similar problem since yestderday. I = have > > pretty > > > much similar setup and ephemeral znodes in place for keep-alive = kind of > > > function. I too see in spite of ZK session going down, ephemeral = znodes > > > still LIVES. > > > > > > I am using ZK 3.5.0. > > > > > > Any solution/fix for this type of an issue?? > > > > > > > > > -- > > > Sincerely, > > > > > > *Yogesh Patil* > > > > > > > > > > > > On Thu, Nov 13, 2014 at 2:10 PM, Kuba Lekstan > wrote: > > > > > > > Sorry, forgot to mention. Version: 3.4.6. > > > > > > > > Thanks. > > > > > > > > 2014-11-13 18:11 GMT+01:00 German Blanco < > > german.blanco.blanco@gmail.com = > > > >: > > > > > > > > > Hello, > > > > > > > > > > which version of Zookeeper are you using? > > > > > > > > > > On Thu, Nov 13, 2014 at 5:25 PM, Kuba Lekstan = > > > > wrote: > > > > > > > > > > > Hello, > > > > > > > > > > > > A bit of details: > > > > > > We have 5 node cluster, which we use for configuration = distrubution > > > and > > > > > > monitoring active instances of our applications. 
> > > > > > Each application creates its ephemeral node, so we know which apps
> > > > > > are alive, how many of them there are, and what they are doing.
> > > > > >
> > > > > > The problem happened on 4 November, the first time around 4AM and
> > > > > > the second time around 12PM.
> > > > > > The first time it was the middle of the night: I got woken up, and
> > > > > > the support guys told me that something was wrong with config
> > > > > > distribution.
> > > > > >
> > > > > > First I checked the apps for errors but didn't find anything
> > > > > > interesting; then I looked at what's in ZooKeeper (using
> > > > > > node-zk-browser).
> > > > > > I noticed that there were 3 ephemeral nodes which had been created
> > > > > > on 1 Nov (while the oldest application was started on 3 Nov). I
> > > > > > could read their data but was not able to delete them; I was
> > > > > > getting a NONODE exception.
> > > > > >
> > > > > > I thought: wtf, why can't I delete these nodes? Something very bad
> > > > > > must have happened to ZK.
> > > > > >
> > > > > > So I sshed to the leader and tried to read these nodes using the
> > > > > > CLI, but I was not able to: the leader was telling me that such
> > > > > > nodes don't exist.
> > > > > > After this I started to ssh to the rest of the nodes in the cluster
> > > > > > and tried to read these nodes. Finally I found the server which did
> > > > > > let me read the data of these nodes.
> > > > > > Because of the inconsistency I decided to restart it. The restart
> > > > > > did help: everything went back to a normal state, and the ephemeral
> > > > > > nodes disappeared.
> > > > > >
> > > > > > A similar situation happened at 12PM, but this time I had a lot more
> > > > > > time to look at what was wrong.
> > > > > > The second time the problem involved 3 ephemeral nodes which had
> > > > > > been created on 1 Nov (again?). This time I dug a bit deeper and
> > > > > > looked into the logs and the four-letter-word commands, but could
> > > > > > not find anything interesting, except that all these 3 nodes had
> > > > > > been created under different session ids, yet ZK had no hosts
> > > > > > connected under these session ids.
> > > > > > The solution was similar to the one from 4AM, but this time I
> > > > > > deleted all files in the ZK data directory.
> > > > > >
> > > > > > Oddly enough, the problem happened twice on the same ZK node; the
> > > > > > final solution was to clear the ZK data directory. After clearing
> > > > > > the directory the problem didn't happen again.
> > > > > >
> > > > > > I looked for solutions/similar problems and found posts where
> > > > > > people were complaining about ephemeral nodes not being removed
> > > > > > after the client session gets closed, but I was not able to find
> > > > > > posts about ZK not being consistent.
> > > > > >
> > > > > > What do you think about this? Can we do something to fix this?
> > > > > >
> > > > > > Sorry for my English, I was doing my best. :)
> > > > > >
> > > > > > Thanks, Kuba.
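Kuba's two clues from November (ephemeral nodes created on 1 Nov although the oldest application started on 3 Nov, and owning session ids with no connected client) could be turned into a check run against each server individually. With the Java client, the owner and creation time come from `Stat.getEphemeralOwner()` (0 for a persistent znode) and `Stat.getCtime()` (milliseconds since the epoch, convertible with `Instant.ofEpochMilli`); zkCli.sh's `stat` prints the same fields as `ephemeralOwner` and `ctime`. The set of live session ids is assumed to be gathered elsewhere, for example from the `cons` four-letter word on every server. This is a minimal sketch under those assumptions: `NodeInfo` is a hypothetical stand-in for `Stat`, and all data in `main` is made up.

```java
import java.time.Instant;
import java.util.*;

/**
 * Hypothetical diagnostic (not part of ZooKeeper): flags ephemeral znodes
 * whose owning session is not live, or that were created before the oldest
 * running application instance started.
 */
public class OrphanEphemeralCheck {

    /** Stand-in for the two org.apache.zookeeper.data.Stat fields used here. */
    public static final class NodeInfo {
        final String path;
        final long ephemeralOwner; // session id; 0 for a persistent znode
        final Instant ctime;       // creation time (Stat.getCtime() is epoch millis)

        public NodeInfo(String path, long ephemeralOwner, Instant ctime) {
            this.path = path;
            this.ephemeralOwner = ephemeralOwner;
            this.ctime = ctime;
        }
    }

    public static List<String> suspects(List<NodeInfo> nodes, Set<Long> liveSessions,
                                        Instant oldestAppStart) {
        List<String> flagged = new ArrayList<>();
        for (NodeInfo n : nodes) {
            if (n.ephemeralOwner == 0L) {
                continue; // persistent znodes have no owning session
            }
            boolean ownerNotLive = !liveSessions.contains(n.ephemeralOwner);
            boolean predatesApps = n.ctime.isBefore(oldestAppStart);
            if (ownerNotLive || predatesApps) {
                flagged.add(n.path);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // All values below are made up for illustration.
        Instant oldestAppStart = Instant.parse("2014-11-03T00:00:00Z");
        List<NodeInfo> nodes = Arrays.asList(
                new NodeInfo("/apps/app-a", 0x14a1L, Instant.parse("2014-11-03T08:00:00Z")),
                new NodeInfo("/apps/app-b", 0x1499L, Instant.parse("2014-11-01T10:00:00Z")));
        Set<Long> liveSessions = new HashSet<>(Arrays.asList(0x14a1L));
        System.out.println(suspects(nodes, liveSessions, oldestAppStart)); // prints [/apps/app-b]
    }
}
```

Comparing the flagged paths across servers may help separate the two situations Kuba distinguishes: ephemerals that outlive their session everywhere, versus ephemerals that exist on only one replica.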