zookeeper-user mailing list archives

From Alexander Shraer <shra...@gmail.com>
Subject Re: Leader election problems
Date Fri, 26 Jun 2015 02:23:37 GMT
A group communication / virtual synchrony system may be better suited here.
In Paxos-based systems you need a majority of the servers to remain connected
to make progress: a minority being offline is tolerated, but the system stops
working once you have more failures / disconnects. You could reconfigure to
remove a server from the cluster when it is suspected of having failed
(otherwise you risk being left without a quorum).
With group communication / virtual synchrony, reconfiguration happens
automatically when a server leaves, and it gives you primitives to communicate
with the members of the current group, such as atomic broadcast (equivalent
to consensus). You can achieve the same with ZK, but it may be more work (I'm
not sure).
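
To give an idea of what that could look like on plain ZK: a rough sketch of
group membership built on ephemeral znodes (the /vehicles path, the 10-second
session timeout and the class name are made up for illustration, and the
parent /vehicles znode is assumed to already exist as a persistent node).
Note also that the 3.5 alphas add a "reconfig" command for changing the
ensemble itself; on 3.4 that still requires a rolling restart.

import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class VehicleGroup implements Watcher {
    private final ZooKeeper zk;

    public VehicleGroup(String connectString, String vehicleId) throws Exception {
        zk = new ZooKeeper(connectString, 10000, this);
        // Ephemeral znode: it disappears automatically when this vehicle's
        // session expires, so nobody has to clean up membership by hand.
        zk.create("/vehicles/" + vehicleId, new byte[0],
                  Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        listMembers();
    }

    private void listMembers() throws Exception {
        // 'true' re-arms the watch so we hear about the next change too.
        List<String> members = zk.getChildren("/vehicles", true);
        System.out.println("current group: " + members);
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeChildrenChanged) {
            try { listMembers(); } catch (Exception e) { /* retry or log */ }
        }
    }
}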

Check out Isis2 (Ken Birman's project), the Ensemble group communication
system, JGroups, and others.
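
For example, with JGroups membership changes arrive as view callbacks,
roughly like this (the cluster name "vehicles" and the default protocol
stack are just placeholders):

import org.jgroups.JChannel;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

public class VehicleChannel {
    public static void main(String[] args) throws Exception {
        JChannel channel = new JChannel();   // default UDP-based stack
        channel.setReceiver(new ReceiverAdapter() {
            @Override
            public void viewAccepted(View view) {
                // Called automatically whenever a member joins, leaves or is
                // suspected -- this is the "automatic reconfiguration" part.
                System.out.println("new view: " + view.getMembers());
            }
        });
        channel.connect("vehicles");         // join (or create) the group
    }
}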




On Wed, Jun 24, 2015 at 11:39 PM, Filip Deleersnijder <filip@motum.be>
wrote:

> Hi,
>
> Thanks for your response.
>
> Our application consists of 8 automatic vehicles in a warehouse setting.
> Those vehicles need some consensus decisions, and that is what we use
> Zookeeper for.
> Because vehicles can come and go at random, we installed a ZK participant
> on every vehicle. The ZK client is some other piece of software that is
> also running on the vehicles.
>
> Therefore:
>         - We cannot choose the number of ZK participants; it just depends
> on the number of vehicles.
>         - The participants communicate over Wi-Fi.
>         - The client is running on the same machine, so it communicates
> over the local network.
>
> We are running Zookeeper version 3.4.6
>
> Our zoo.cfg can be found below this e-mail.
>
> Thanks in advance !
>
> Filip
>
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial
> # synchronization phase can take
> initLimit=10
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=5
> # the directory where the snapshot is stored.
> # do not use /tmp for storage, /tmp here is just
> # example sakes.
> dataDir=c:/motum/config/MASS/ZK
> # the port at which the clients will connect
> clientPort=2181
>
> server.1=172.17.35.11:2888:3888
> server.2=172.17.35.12:2888:3888
> server.3=172.17.35.13:2888:3888
> server.4=172.17.35.14:2888:3888
> server.5=172.17.35.15:2888:3888
> server.6=172.17.35.16:2888:3888
> server.7=172.17.35.17:2888:3888
> server.8=172.17.35.18:2888:3888
>
> # The number of snapshots to retain in dataDir
> # Purge task interval in hours
> # Set to "0" to disable auto purge feature
> autopurge.snapRetainCount=3
> autopurge.purgeInterval=1
>
>
>
> > On 24 Jun 2015, at 18:54, Raúl Gutiérrez Segalés <rgs@itevenworks.net>
> > wrote:
> >
> > Hi,
> >
> > On 24 June 2015 at 06:05, Filip Deleersnijder <filip@motum.be> wrote:
> >
> >> Hi,
> >>
> >> Let’s start with some description of our system:
> >>
> >> - We are using a ZooKeeper cluster with 8 participants for an
> >> application with mobile nodes (connected over Wi-Fi).
> >>
> >
> > You mean the participants talk over wifi or the clients?
> >
> >
> >> (IPs of the different nodes follow the structure: node X has IP
> >> 172.17.35.1X.)
> >>
> >
> > Why 8 and not an odd number of machines (i.e.:
> > http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkMulitServerSetup
> > )?
> >
> >> - It is not that unusual to have a node being shut down or restarted.
> >> - We haven’t benchmarked the number of write operations yet, but I would
> >> estimate that it would be less than 10 writes / second
> >>
> >
> > What version of ZK are you using?
> >
> >
> >>
> >> The problem we are having, however, is that sometimes(*) some instances
> >> seem to be having problems with leader election.
> >> Under the header “Attachment 1” below, you can find the leader election
> >> times that were needed over 24h (from 1 node). On average it took more
> >> than 1 minute!
> >> I assume that this is not normal behaviour? (If somebody could confirm
> >> that in an 8-node cluster these are not normal leader election times,
> >> that would be nice.)
> >>
> >> In attachment 2 I included an extract from the logging during a leader
> >> election that took 101874ms for 1 node (server 2).
> >>
> >> Any help is greatly appreciated.
> >> If further or more specific logging is required, please ask !
> >>
> >>
> > Do you mind sharing a copy of your config file (zoo.cfg)? Thanks!
> >
> >
> > -rgs
>
>
