zookeeper-user mailing list archives

From Flavio Junqueira <fpjunque...@yahoo.com.INVALID>
Subject Re: cluster/ephemeral nodes inconsistency
Date Wed, 14 Jan 2015 09:52:00 GMT
Hi there,
I suggest a couple of things here:
- Use LogFormatter to look into the transaction logs to check the operations that are actually
coming across.
- It would be nice to be able to reproduce it outside your app, ideally as a JUnit
test, so that we can start working on it.
I vaguely remember coming across such a problem, but I'll need to dig into it. Does anyone
on this list recall a similar problem?
-Flavio  
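
While a reproduction is being worked out, the manual check described in the quoted report (reading the same znodes on every server and comparing session IDs) could be scripted roughly as follows. This is only a sketch under stated assumptions: the function name `find_suspect_ephemerals` and the input shapes are hypothetical, and it does not talk to ZooKeeper at all. It assumes you have already collected, per server, a mapping of ephemeral znode path to its `ephemeralOwner` session ID (for instance by running `stat` on the path from the CLI against each server), plus the set of session IDs that currently have a client connected.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input shape: znode path -> ephemeralOwner session id.
ServerView = Dict[str, int]


def find_suspect_ephemerals(
    views: Dict[str, ServerView], live_sessions: Set[int]
) -> List[Tuple[str, str]]:
    """Return (path, reason) pairs for ephemeral znodes that look inconsistent.

    views maps a server name to the ephemeral znodes that server reports.
    live_sessions holds the session ids that currently have a client attached.
    """
    findings: List[Tuple[str, str]] = []
    all_paths = sorted({path for view in views.values() for path in view})
    for path in all_paths:
        # Servers that report the znode versus servers that do not.
        holders = sorted(name for name, view in views.items() if path in view)
        if len(holders) < len(views):
            findings.append((path, "only visible on: " + ", ".join(holders)))
        # An ephemeral znode whose owner session is gone should not exist.
        owners = {views[name][path] for name in holders}
        if owners.isdisjoint(live_sessions):
            findings.append((path, "owner session not connected"))
    return findings


if __name__ == "__main__":
    # Made-up data for illustration only.
    views = {
        "zk1": {"/apps/a": 0x14A},
        "zk2": {"/apps/a": 0x14A, "/apps/stale": 0x99},
        "zk3": {"/apps/a": 0x14A},
    }
    for path, reason in find_suspect_ephemerals(views, {0x14A}):
        print(path, "->", reason)
```

A znode flagged for both reasons (seen on only one server, and owned by a session nobody is connected under) matches the symptoms in the report below. The output of such a check, together with the LogFormatter dump from the affected server, would be useful to attach to a bug report.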

On Wednesday, January 14, 2015 9:14 AM, Kuba Lekstan <kuebzky@gmail.com> wrote:

German, do you have any idea what might be causing this? The same issue
happened again today.

2014-11-21 5:42 GMT+01:00 Yogesh Patil <patyogesh@gmail.com>:

> Hi Zookeepers,
> I have also been experiencing a similar problem since yesterday. I have a
> very similar setup, with ephemeral znodes in place for a keep-alive kind of
> function. I too see that, even though the ZK session has gone down, the
> ephemeral znodes still live.
>
> I am using ZK 3.5.0.
>
> Is there any solution/fix for this type of issue?
>
>
> --
> Sincerely,
>
> *Yogesh Patil*
>
>
>
> On Thu, Nov 13, 2014 at 2:10 PM, Kuba Lekstan <kuebzky@gmail.com> wrote:
>
> > Sorry, forgot to mention. Version: 3.4.6.
> >
> > Thanks.
> >
> > 2014-11-13 18:11 GMT+01:00 German Blanco <german.blanco.blanco@gmail.com>:
> >
> > > Hello,
> > >
> > > which version of Zookeeper are you using?
> > >
> > > On Thu, Nov 13, 2014 at 5:25 PM, Kuba Lekstan <kuebzky@gmail.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > Some details:
> > > > We have a 5-node cluster, which we use for configuration distribution
> > > > and for monitoring active instances of our applications. Each
> > > > application creates its own ephemeral node, so we know which apps are
> > > > alive, how many of them there are, and what they are doing.
> > > >
> > > > The problem happened on 4 November: the first time around 4 AM, the
> > > > second time around 12 PM.
> > > > The first time it was the middle of the night when I got woken up; the
> > > > support guys told me that something was wrong with config distribution.
> > > >
> > > > First I checked the apps for errors but didn't find anything
> > > > interesting, then I looked at what was in ZooKeeper (using
> > > > node-zk-browser).
> > > > I noticed that there were 3 ephemeral nodes which had been created on
> > > > 1 Nov (while the oldest application was started on 3 Nov). I could read
> > > > their data but was not able to delete them: I was getting a NONODE
> > > > exception.
> > > >
> > > > I thought, wtf, why can't I delete these nodes? Something very bad
> > > > must have happened to ZK.
> > > >
> > > > So I sshed into the leader and tried to read these nodes using the CLI,
> > > > but I was not able to: the leader was telling me that such nodes don't
> > > > exist.
> > > > After this I started to ssh into the rest of the nodes in the cluster
> > > > and tried to read these nodes. Finally I found the server which did let
> > > > me read the data of these nodes.
> > > > Because of the inconsistency I decided to restart it. The restart
> > > > helped and everything went back to a normal state. The ephemeral nodes
> > > > disappeared.
> > > >
> > > > A similar situation happened at 12 PM, but this time I had a lot more
> > > > time to look at what was wrong. The second time the problem was about
> > > > 3 ephemeral nodes which were created on 1 Nov (again?). This time I dug
> > > > a bit deeper and looked into the logs and the 4-letter commands, but
> > > > could not find anything interesting except that all 3 of these nodes
> > > > were created under different session IDs, and ZK had no hosts connected
> > > > under these session IDs.
> > > > The solution was similar to the one from 4 AM, but this time I deleted
> > > > all files in the ZK data directory.
> > > >
> > > > Oddly enough, the problem happened twice on the same ZK node; the
> > > > final solution was to clear the ZK data directory. After clearing the
> > > > directory the problem didn't happen again.
> > > >
> > > > I tried to look for a solution or similar problems. I found posts where
> > > > people were complaining about ephemeral nodes not being removed after
> > > > the client session gets closed, but I was not able to find posts about
> > > > ZK not being consistent.
> > > >
> > > > What do you think about this? Can we do something to fix this?
> > > >
> > > > Sorry for my English, I was doing my best. :)
> > > >
> > > > Thanks, Kuba.
> > > >
> > >
> >
>


 
   