MIME-Version: 1.0
In-Reply-To:
References: <2128162825.915682.1421229120952.JavaMail.yahoo@jws10686.mail.bf1.yahoo.com>
Date: Wed, 14 Jan 2015 12:44:35 +0100
Message-ID:
Subject: Re: cluster/ephemeral nodes inconsistency
From: Kuba Lekstan
To: user@zookeeper.apache.org, Flavio Junqueira
Content-Type: multipart/alternative; boundary=001a113cd5747e1e20050c9b4362

--001a113cd5747e1e20050c9b4362
Content-Type: text/plain; charset=UTF-8

As far as I understand, this issue:
https://issues.apache.org/jira/browse/ZOOKEEPER-1777
is about some ZK servers not seeing some of the existing ephemeral znodes.
I have the opposite problem: some ZK servers are seeing ephemeral znodes
that do not exist.

2015-01-14 12:39 GMT+01:00 Kuba Lekstan :

> German, today it happened on our secondary cluster, which consists of 3
> nodes: the leader didn't see the znode but the two followers did.
>
> Flavio, I browsed the logs but was unable to find anything interesting;
> only setData operations were issued.
>
> The problematic znode was last modified on 13 Jan 2015 at 17:xx; we
> noticed the issue on 14 Jan 2015 at 11:xx.
>
> 2015-01-14 10:52 GMT+01:00 Flavio Junqueira :
>
>> Hi there,
>> I suggest a couple of things here:
>> - Use LogFormatter to look into the transaction logs and check the
>> operations that are actually coming across.
>> - It would be nice to be able to reproduce it outside your app, ideally
>> as a JUnit test, so that we can start working on it.
>> I vaguely remember coming across such a problem, but I'll need to dig
>> into it. Does anyone on this list recall a similar problem?
>> -Flavio
>>
>> On Wednesday, January 14, 2015 9:14 AM, Kuba Lekstan <kuebzky@gmail.com> wrote:
>>
>> German, do you have any idea what might be causing this? The same issue
>> happened again today.
>>
>> 2014-11-21 5:42 GMT+01:00 Yogesh Patil :
>>
>> > Hi Zookeepers,
>> > I have also been experiencing a similar problem since yesterday. I have
>> > a pretty similar setup, with ephemeral znodes in place for a keep-alive
>> > kind of function. I too see that even after the ZK session goes down,
>> > the ephemeral znodes still LIVE.
>> >
>> > I am using ZK 3.5.0.
>> >
>> > Any solution/fix for this type of issue??
>> >
>> >
>> > --
>> > Sincerely,
>> >
>> > *Yogesh Patil*
>> >
>> >
>> >
>> > On Thu, Nov 13, 2014 at 2:10 PM, Kuba Lekstan wrote:
>> >
>> > > Sorry, forgot to mention. Version: 3.4.6.
>> > >
>> > > Thanks.
>> > >
>> > > 2014-11-13 18:11 GMT+01:00 German Blanco <german.blanco.blanco@gmail.com>:
>> > >
>> > > > Hello,
>> > > >
>> > > > which version of ZooKeeper are you using?
>> > > >
>> > > > On Thu, Nov 13, 2014 at 5:25 PM, Kuba Lekstan wrote:
>> > > >
>> > > > > Hello,
>> > > > >
>> > > > > A few details:
>> > > > > We have a 5-node cluster, which we use for configuration
>> > > > > distribution and for monitoring active instances of our
>> > > > > applications. Each application creates its own ephemeral node, so
>> > > > > we know which apps are alive, how many of them there are and what
>> > > > > they are doing.
>> > > > >
>> > > > > The problem happened on 4th November: the first time around 4AM,
>> > > > > the second time around 12PM.
>> > > > > The first time it was the middle of the night when I got woken up;
>> > > > > the support guys told me that something was wrong with config
>> > > > > distribution.
>> > > > >
>> > > > > First I checked the apps for errors but didn't find anything
>> > > > > interesting, then I looked at what was in ZooKeeper (using
>> > > > > node-zk-browser).
>> > > > > I noticed that there were 3 ephemeral nodes which had been created
>> > > > > on 1st Nov (while the oldest application was started on 3rd Nov).
>> > > > > I could read their data but was not able to delete them: I was
>> > > > > getting a NONODE exception.
>> > > > >
>> > > > > I thought: wtf, why can't I delete these nodes? Something very bad
>> > > > > must have happened to ZK.
>> > > > >
>> > > > > So I sshed to the leader and tried to read these nodes using the
>> > > > > CLI, but I was not able to: the leader told me that such nodes
>> > > > > didn't exist.
>> > > > > After this I started to ssh to the rest of the servers in the
>> > > > > cluster and try to read these nodes. Finally I found the server
>> > > > > which did let me read the data of these nodes.
>> > > > > Because of the inconsistency I decided to restart it. The restart
>> > > > > helped: everything went back to normal and the ephemeral nodes
>> > > > > disappeared.
>> > > > >
>> > > > > A similar situation happened at 12PM, but this time I had a lot
>> > > > > more time to look at what was wrong. The second time the problem
>> > > > > was about 3 ephemeral nodes which had been created on 1st Nov
>> > > > > (again?). This time I dug a bit deeper and looked into the logs and
>> > > > > the 4-letter commands, but could not find anything interesting
>> > > > > except that all 3 of these nodes had been created under different
>> > > > > session IDs, yet ZK had no hosts connected under these session IDs.
>> > > > > The solution was similar to the one from 4AM, but this time I
>> > > > > deleted all files in the ZK data directory.
>> > > > >
>> > > > > Oddly enough, the problem happened twice on the same ZK server; the
>> > > > > final solution was to clear the ZK data directory. After clearing
>> > > > > the directory the problem didn't happen again.
>> > > > >
>> > > > > I tried to look for solutions/similar problems. I found posts where
>> > > > > people were complaining about ephemeral nodes not being removed
>> > > > > after the client session gets closed, but I was not able to find
>> > > > > posts about ZK not being consistent.
>> > > > >
>> > > > > What do you think about this? Can we do something to fix this?
>> > > > >
>> > > > > Sorry for my English, I was doing my best. :)
>> > > > >
>> > > > > Thanks, Kuba.
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

--001a113cd5747e1e20050c9b4362--