zookeeper-user mailing list archives

From Filip Deleersnijder <fi...@motum.be>
Subject Re: Leader election problems
Date Thu, 25 Jun 2015 14:34:27 GMT
Hi,

I can see that all of our logs contain the following log statements fairly often.

2015-06-22 12:02:00,752 [myid:2] - DEBUG [main:DataTree@949][] - Ignoring processTxn failure
hdr: -1 : error: -2
2015-06-22 12:02:00,753 [myid:2] - DEBUG [main:DataTree@949][] - Ignoring processTxn failure
hdr: 14 : error: -101

2015-06-25 14:02:39,505 [myid:3] - DEBUG [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FileTxnLog$FileTxnIterator@636] - EOF excepton java.io.EOFException: Failed to read c:\motum\config\MASS\ZK\version-2\log.1aa00000001

Since we don’t properly shut the ZK process down ( we just shut down Windows ), this probably
causes corruption of files.
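
For what it’s worth, a rough sketch of a check for damaged transaction logs (assumptions: plain Python outside ZooKeeper; as far as I can tell the txn-log header starts with the four-byte magic "ZKLG", and log files are named log.&lt;hex&gt;). Note that an EOFException at the *tail* of a log is often just a partially written last record after a hard power-off, which ZooKeeper normally skips on recovery; a wrong header magic is the more worrying case:

```python
import os

TXNLOG_MAGIC = b"ZKLG"  # assumed first four bytes of a ZooKeeper txn log

def find_bad_magic(log_dir):
    """Return txn log files whose first four bytes are not the expected magic."""
    bad = []
    for name in sorted(os.listdir(log_dir)):
        if not name.startswith("log."):
            continue  # skip snapshots and anything else in the directory
        path = os.path.join(log_dir, name)
        with open(path, "rb") as f:
            if f.read(4) != TXNLOG_MAGIC:
                bad.append(name)
    return bad
```

Pointing this at c:/motum/config/MASS/ZK/version-2 would list any log files whose header doesn’t even start correctly.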

Does anybody have a clear idea about whether the “EOF” or the “Ignoring processTxn”
problems could cause frequent and long-lasting leader elections ?
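
On the ensemble-size point raised below: a quorum is a strict majority of the voters, so an even-sized ensemble tolerates no more failures than the next smaller odd one. A quick sketch of the plain majority arithmetic (not ZooKeeper code, just the math):

```python
def quorum(n):
    """Smallest strict majority of n voting servers."""
    return n // 2 + 1

def tolerated_failures(n):
    """How many servers can fail while a quorum can still form."""
    return n - quorum(n)

for n in (7, 8):
    print(n, quorum(n), tolerated_failures(n))
# 7 servers need a quorum of 4 and tolerate 3 failures;
# 8 servers need a quorum of 5 and still tolerate only 3 failures.
```

So our 8th participant raises the quorum requirement without buying any extra fault tolerance.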

Any help is greatly appreciated,

Filip
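
P.S. For reference, multiplying out the zoo.cfg values quoted below (tickTime=2000, initLimit=10, syncLimit=5) gives the effective windows the ensemble works with; just arithmetic on the quoted config, sketched here:

```python
tick_ms = 2000          # tickTime from the zoo.cfg below
init_limit_ticks = 10   # initLimit: ticks allowed for the initial sync phase
sync_limit_ticks = 5    # syncLimit: ticks between a request and its ack

init_window_ms = init_limit_ticks * tick_ms   # 20000 ms
sync_window_ms = sync_limit_ticks * tick_ms   # 10000 ms
```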



> On 25 Jun 2015, at 11:51, Guy Moshkowich <guy.moshkowich@gmail.com> wrote:
> 
> Are you using a ZK client on your vehicles, or ZK servers?
> You mentioned 8 vehicles below, and I see 8 servers defined in the config.
> I would expect 8 clients (running on your vehicles) communicating
> with 1 or 3 ZK servers, as that would be more than enough for 8 clients.
> Guy
> 
> On Thursday, June 25, 2015, Filip Deleersnijder <filip@motum.be> wrote:
> 
>> Hi,
>> 
>> Thanks for your response.
>> 
>> Our application consists of 8 automatic vehicles in a warehouse setting.
>> Those vehicles need some consensus decisions, and that is what we use
>> Zookeeper for.
>> Because vehicles can come and go at random, we installed a ZK participant
>> on every vehicle. The ZK client is some other piece of software that is
>> also running on the vehicles.
>> 
>> Therefore:
>>        - We cannot choose the number of ZK participants; it simply
>> depends on the number of vehicles.
>>        - The participants communicate over Wi-Fi.
>>        - The client runs on the same machine, so it communicates
>> over the local network.
>> 
>> We are running Zookeeper version 3.4.6
>> 
>> Our zoo.cfg can be found below this e-mail.
>> 
>> Thanks in advance !
>> 
>> Filip
>> 
>> # The number of milliseconds of each tick
>> tickTime=2000
>> # The number of ticks that the initial
>> # synchronization phase can take
>> initLimit=10
>> # The number of ticks that can pass between
>> # sending a request and getting an acknowledgement
>> syncLimit=5
>> # the directory where the snapshot is stored.
>> # do not use /tmp for storage, /tmp here is just
>> # example sakes.
>> dataDir=c:/motum/config/MASS/ZK
>> # the port at which the clients will connect
>> clientPort=2181
>> 
>> server.1=172.17.35.11:2888:3888
>> server.2=172.17.35.12:2888:3888
>> server.3=172.17.35.13:2888:3888
>> server.4=172.17.35.14:2888:3888
>> server.5=172.17.35.15:2888:3888
>> server.6=172.17.35.16:2888:3888
>> server.7=172.17.35.17:2888:3888
>> server.8=172.17.35.18:2888:3888
>> 
>> # The number of snapshots to retain in dataDir
>> # Purge task interval in hours
>> # Set to "0" to disable auto purge feature
>> autopurge.snapRetainCount=3
>> autopurge.purgeInterval=1
>> 
>> 
>> 
>>> On 24 Jun 2015, at 18:54, Raúl Gutiérrez Segalés <rgs@itevenworks.net> wrote:
>>> 
>>> Hi,
>>> 
>>> On 24 June 2015 at 06:05, Filip Deleersnijder <filip@motum.be> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> Let’s start with some description of our system :
>>>> 
>>>> - We are using a ZooKeeper cluster with 8 participants for an
>>>> application with mobile nodes (connected over Wi-Fi).
>>>> 
>>> 
>>> You mean the participants talk over Wi-Fi, or the clients?
>>> 
>>> 
>>>> (The IPs of the different nodes follow this pattern:
>>>> node X has IP 172.17.35.1X.)
>>>> 
>>> 
>>> Why 8 and not an odd number of machines (see
>>> http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkMulitServerSetup
>>> )?
>>> 
>>>> - It is not that unusual to have a node be shut down or restarted
>>>> - We haven’t benchmarked the number of write operations yet, but I would
>>>> estimate it is less than 10 writes/second
>>>> 
>>> 
>>> What version of ZK are you using?
>>> 
>>> 
>>>> 
>>>> The problem we are having however is that sometimes(*), some instances
>>>> seem to be having problems with leader election.
>>>> Under the header “Attachment 1” below, you can find the leader election
>>>> times that were needed over 24h (from 1 node). On average it took more
>>>> than 1 minute!
>>>> I assume that this is not normal behaviour? (If somebody could confirm
>>>> that in an 8-node cluster these are not normal leader election times,
>>>> that would be nice.)
>>>> 
>>>> In attachment 2, I included an extract from the logging during a leader
>>>> election that took 101874 ms for 1 node (server 2).
>>>> 
>>>> Any help is greatly appreciated.
>>>> If further or more specific logging is required, please ask !
>>>> 
>>>> 
>>> Do you mind sharing a copy of your config file (zoo.cfg)? Thanks!
>>> 
>>> 
>>> -rgs

