zookeeper-user mailing list archives

From Kuba Lekstan <kueb...@gmail.com>
Subject Re: cluster/ephemeral nodes inconsistency
Date Wed, 14 Jan 2015 11:44:35 GMT
As far as I understand, this issue:
https://issues.apache.org/jira/browse/ZOOKEEPER-1777 is about some ZK nodes
not seeing some of the existing ephemeral znodes. I have the opposite problem:
some ZK nodes are seeing ephemeral znodes that no longer exist.
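
A minimal sketch of how the per-server comparison could be automated, assuming the list of znode paths has already been collected from each server separately (for example by connecting a client to one server at a time). The function name, server names and paths below are hypothetical and only illustrate the idea.

```python
from typing import Dict, Set


def find_inconsistent_znodes(views: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return znodes that are visible on some servers but not on all.

    `views` maps a server name to the set of znode paths that server reports.
    The result maps each disputed path to the servers that DO report it.
    """
    if not views:
        return {}
    # Every path reported by at least one server.
    all_paths = set().union(*views.values())
    disputed = {}
    for path in sorted(all_paths):
        seen_on = {server for server, paths in views.items() if path in paths}
        # A consistent path is reported by every server in the ensemble.
        if len(seen_on) != len(views):
            disputed[path] = seen_on
    return disputed


# Hypothetical snapshot resembling the 3-node case: the leader does not see
# the znode, but both followers do.
views = {
    "zk1-leader": {"/apps/a"},
    "zk2-follower": {"/apps/a", "/apps/stale"},
    "zk3-follower": {"/apps/a", "/apps/stale"},
}
print(find_inconsistent_znodes(views))
```

Clients connected to different servers can legitimately lag behind each other for a short time, so a path is only suspicious if it stays disputed across repeated checks.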

2015-01-14 12:39 GMT+01:00 Kuba Lekstan <kuebzky@gmail.com>:

> German, today it happened on our secondary cluster, which consists of 3
> nodes: the leader didn't see the node, but the two followers did.
>
> Flavio, I browsed the logs but was unable to find anything interesting;
> only setData operations were issued.
>
> The problematic znode was last modified on 13 Jan 2015 at 17:xx; we noticed
> the issue on 14 Jan 2015 at 11:xx.
>
> 2015-01-14 10:52 GMT+01:00 Flavio Junqueira <fpjunqueira@yahoo.com.invalid>:
>
>> Hi there,
>> I suggest a couple of things here:
>> - Use LogFormatter to look into the transaction logs to check the
>>   operations that are actually coming across.
>> - It would be nice to be able to reproduce it outside your app, ideally as
>>   a junit test, so that we can start working on it.
>> I vaguely remember coming across such a problem, but I'll need to dig
>> into it. Does anyone on this list recall a similar problem?
>> -Flavio
>>
>> On Wednesday, January 14, 2015 9:14 AM, Kuba Lekstan <kuebzky@gmail.com> wrote:
>>
>>
>>
>> German, do you have any idea what might be causing this? The same issue
>> happened again today.
>>
>> 2014-11-21 5:42 GMT+01:00 Yogesh Patil <patyogesh@gmail.com>:
>>
>> > Hi Zookeepers,
>> > I have also been experiencing a similar problem since yesterday. I have
>> > pretty much the same setup, with ephemeral znodes in place for a
>> > keep-alive kind of function. I too see that, in spite of the ZK session
>> > going down, the ephemeral znodes still live.
>> >
>> > I am using ZK 3.5.0.
>> >
>> > Is there any solution or fix for this type of issue?
>> >
>> >
>> > --
>> > Sincerely,
>> >
>> > *Yogesh Patil*
>> >
>> >
>> >
>> > On Thu, Nov 13, 2014 at 2:10 PM, Kuba Lekstan <kuebzky@gmail.com> wrote:
>> >
>> > > Sorry, forgot to mention. Version: 3.4.6.
>> > >
>> > > Thanks.
>> > >
>> > > 2014-11-13 18:11 GMT+01:00 German Blanco <german.blanco.blanco@gmail.com>:
>> > >
>> > > > Hello,
>> > > >
>> > > > which version of Zookeeper are you using?
>> > > >
>> > > > On Thu, Nov 13, 2014 at 5:25 PM, Kuba Lekstan <kuebzky@gmail.com> wrote:
>> > > >
>> > > > > Hello,
>> > > > >
>> > > > > Some details:
>> > > > > We have a 5-node cluster, which we use for configuration
>> > > > > distribution and for monitoring active instances of our
>> > > > > applications. Each application creates its own ephemeral node, so we
>> > > > > know which apps are alive, how many of them there are, and what they
>> > > > > are doing.
>> > > > >
>> > > > > The problem happened on 4th November: the first time around 4AM, the
>> > > > > second time around 12PM.
>> > > > > The first time, it was the middle of the night when I got woken up;
>> > > > > the support guys told me that something was wrong with config
>> > > > > distribution.
>> > > > >
>> > > > > First I checked the apps for errors but didn't find anything
>> > > > > interesting, then I looked at what was in ZooKeeper (using
>> > > > > node-zk-browser).
>> > > > > I noticed that there were 3 ephemeral nodes which had been created on
>> > > > > 1st Nov (while the oldest application had been started on 3rd Nov).
>> > > > > I could read their data but was not able to delete them: I was
>> > > > > getting a NONODE exception.
>> > > > >
>> > > > > I thought wtf, why can't I delete these nodes? Something very bad
>> > > > > must have happened to ZK.
>> > > > >
>> > > > > So I sshed to the leader and tried to read these nodes using the CLI,
>> > > > > but I was not able to: the leader was telling me that such nodes
>> > > > > don't exist. After this I started to ssh to the rest of the nodes in
>> > > > > the cluster and tried to read these nodes there. Finally I found the
>> > > > > server which did let me read the data of these nodes.
>> > > > > Because of the inconsistency I decided to restart it. The restart
>> > > > > helped, and everything went back to a normal state. The ephemeral
>> > > > > nodes disappeared.
>> > > > >
>> > > > > A similar situation happened at 12PM, but this time I had a lot more
>> > > > > time to look at what was wrong. The second time the problem was about
>> > > > > 3 ephemeral nodes which had been created on 1st Nov (again?). This
>> > > > > time I dug a bit deeper and looked into the logs and the 4-letter
>> > > > > commands, but could not find anything interesting, except that all 3
>> > > > > nodes had been created under different session ids and ZK had no
>> > > > > hosts connected under those session ids.
>> > > > > The solution was similar to the one from 4AM, but this time I deleted
>> > > > > all files in the ZK data directory.
>> > > > >
>> > > > > Oddly enough, the problem happened twice on the same ZK node; the
>> > > > > final solution was to clear the ZK data directory. After clearing the
>> > > > > directory the problem didn't happen again.
>> > > > >
>> > > > > I tried to look for a solution or similar problems. I found posts
>> > > > > where people were complaining about ephemeral nodes not being removed
>> > > > > after the client session gets closed, but I was not able to find
>> > > > > posts about ZK not being consistent.
>> > > > >
>> > > > > What do you think about this? Can we do something to fix this?
>> > > > >
>> > > > > Sorry for my english, I was doing my best. :)
>> > > > >
>> > > > > Thanks, Kuba.
>> > > > >
>> > > >
>> > >
>> >
>>
>>
>>
>>
>>
>
>
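
The original report notes that the stale znodes belonged to session ids with no connected hosts. A rough sketch of that check, assuming the owner session of each ephemeral znode (for example the `ephemeralOwner` field of its stat) and the set of currently live session ids have already been gathered by other means. The function name, paths and session ids are hypothetical.

```python
from typing import Dict, Set


def find_orphaned_ephemerals(owners: Dict[str, int],
                             live_sessions: Set[int]) -> Dict[str, int]:
    """Return ephemeral znodes whose owning session is not among the live ones.

    `owners` maps a znode path to the session id that created it.
    `live_sessions` holds the session ids the ensemble still considers active.
    """
    return {path: sid for path, sid in owners.items()
            if sid not in live_sessions}


# Hypothetical data: two znodes still have a live owner, one does not.
owners = {
    "/apps/a": 0x14A0000000001,
    "/apps/b": 0x14A0000000002,
    "/apps/stale": 0x1490000000007,
}
live_sessions = {0x14A0000000001, 0x14A0000000002}
print(find_orphaned_ephemerals(owners, live_sessions))
```

An ephemeral znode reported here should normally have been removed when its session closed or expired, so any result is worth checking on each server individually.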
