zookeeper-user mailing list archives

From "snair123 ." <nair...@outlook.com>
Subject RE: Tracking down possible network partition
Date Fri, 10 Jul 2015 18:21:20 GMT
> 2.) It appears that the leader closes connections to the affected followers
> after a “transaction timeout” occurs. Where would I find out what this
> timeout is ? Is this the same thing as a session timeout (e.g. The default
> of 20 * tickTime) ?

 
 https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java#L496

a. So the Leader closes connections to Followers and Observers after syncLimit * tickTime
milliseconds?
b. So what purpose does syncLimit serve in Followers and in Observers?
c. If I needed the Observer to stay connected to the ZK ensemble for a longer time, in case of
network partitions, do I increase syncLimit at the Leader or at the Observer?
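
For reference, the arithmetic behind question (a) can be sketched as below. This is only an illustration, assuming the timeout in question really is syncLimit * tickTime as the linked LearnerHandler line suggests. The config values and the helper names (parse_cfg, limit_ms) are hypothetical and not part of ZooKeeper.

```python
# Hypothetical zoo.cfg contents; the values are illustrative only.
ZOO_CFG = """\
tickTime=2000
initLimit=10
syncLimit=5
"""


def parse_cfg(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip()
    return cfg


def limit_ms(cfg, limit_name):
    """Return <limit_name> * tickTime, in milliseconds."""
    return int(cfg[limit_name]) * int(cfg["tickTime"])


cfg = parse_cfg(ZOO_CFG)
# Assumption: this is the window after which the connection is dropped.
sync_timeout_ms = limit_ms(cfg, "syncLimit")
# initLimit uses the same tick-based arithmetic.
init_timeout_ms = limit_ms(cfg, "initLimit")
print(sync_timeout_ms, init_timeout_ms)
```

With these example values a learner would have a 10 second window, so raising either syncLimit or tickTime lengthens it. Which server's setting applies is still the open question in (c).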



> Date: Fri, 26 Jun 2015 18:10:45 -0700
> Subject: Re: Tracking down possible network partition
> From: rgs@itevenworks.net
> To: user@zookeeper.apache.org
> 
> On 25 June 2015 at 07:28, Round, Mark <Mark.Round@sky.uk> wrote:
> 
> > I have a 5-node Zookeeper 3.4.6 cluster across 3 data centres (2
> > zookeepers in each “main” DC, and a 5th in a 3rd DC for quorum). I see that
> > the two nodes in one DC have regular “issues” where they get kicked out of
> > the cluster and the ZooKeeperServer process stops for a few minutes until
> > the node rejoins. I’d like to know a couple of things, if someone could
> > please point me in the direction of the relevant docs I’d greatly
> > appreciate it.
> >
> > 1.) Is it expected behaviour that when a node is kicked from the cluster,
> > it will not be allowed to re-join for a period ? From the logs below I can
> > see that re-establishing a valid cluster took around 15 minutes.
> >
> 
> I don't think so.
> 
> > 2.) It appears that the leader closes connections to the affected followers
> > after a “transaction timeout” occurs. Where would I find out what this
> > timeout is ? Is this the same thing as a session timeout (e.g. The default
> > of 20 * tickTime) ?
> >
> 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java#L496
> 
> 
> > 3.) Where can I find the definition of the different fields in the
> > election log messages (I.e. What are “n.round”, “n.zxid”, “n.state” and so
> > on) ?
> 
> 
> Not sure if there's a better source than the source:
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L687
> 
> 
> 
> -rgs
 		 	   		  