Hi,
I can see that all of our logs contain the following log statements pretty often:
2015-06-22 12:02:00,752 [myid:2] - DEBUG [main:DataTree@949][] - Ignoring processTxn failure
hdr: -1 : error: -2
2015-06-22 12:02:00,753 [myid:2] - DEBUG [main:DataTree@949][] - Ignoring processTxn failure
hdr: 14 : error: -101
2015-06-25 14:02:39,505 [myid:3] - DEBUG [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FileTxnLog$FileTxnIterator@636]
- EOF excepton java.io.EOFException: Failed to read c:\motum\config\MASS\ZK\version-2\log.1aa00000001
Since we don’t properly shut the ZK process down (we just shut down Windows), this probably
can cause corruption of files.
Does anybody have a clear idea about whether the “EOF” or the “Ignoring processTxn”
problems could cause frequent and long-running Leader Elections?
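(To check whether those files are in fact truncated, I suppose one could run
the suspect transaction log through ZooKeeper’s LogFormatter tool; the class
ships with 3.4.6, but the exact jar and classpath layout below are an
assumption on my part:

    java -cp zookeeper-3.4.6.jar;lib\* org.apache.zookeeper.server.LogFormatter c:\motum\config\MASS\ZK\version-2\log.1aa00000001

A truncated file should print the transactions up to the point of corruption
and then fail with a similar EOFException.)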
Any help is greatly appreciated,
Filip
> On 25 Jun 2015, at 11:51, Guy Moshkowich <guy.moshkowich@gmail.com> wrote:
>
> Are you using ZK clients on your vehicles, or ZK servers?
> You mentioned below 8 vehicles, and I see 8 servers defined in the config.
> I would expect you to have 8 clients (running on your vehicles) communicating
> with 1 or 3 ZK servers, as that would be more than enough for 8 clients.
> Guy
>
> On Thursday, June 25, 2015, Filip Deleersnijder <filip@motum.be> wrote:
>
>> Hi,
>>
>> Thanks for your response.
>>
>> Our application consists of 8 automatic vehicles in a warehouse setting.
>> Those vehicles need to make some consensus decisions, and that is what we
>> use ZooKeeper for.
>> Because vehicles can come and go at random, we installed a ZK participant
>> on every vehicle. The ZK client is a separate piece of software that is
>> also running on the vehicles.
>>
>> Therefore:
>> - We cannot choose the number of ZK participants, because it just
>> depends on the number of vehicles.
>> - The participants communicate over WiFi.
>> - The client is running on the same machine, so it communicates
>> over the local network (see the minimal sketch below).
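>>
>> For reference, a minimal sketch of what the client side looks like; the
>> class name, timeout value and connection handling here are illustrative
>> assumptions, not our actual code:
>>
>> import java.util.concurrent.CountDownLatch;
>> import org.apache.zookeeper.WatchedEvent;
>> import org.apache.zookeeper.Watcher;
>> import org.apache.zookeeper.ZooKeeper;
>>
>> public class LocalZkClient {
>>     public static void main(String[] args) throws Exception {
>>         final CountDownLatch connected = new CountDownLatch(1);
>>         // Connect to the participant on this vehicle; the 30s session
>>         // timeout to ride out short WiFi dropouts is a guess.
>>         ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, new Watcher() {
>>             public void process(WatchedEvent event) {
>>                 if (event.getState() == Event.KeeperState.SyncConnected) {
>>                     connected.countDown();
>>                 }
>>             }
>>         });
>>         connected.await();
>>         System.out.println("session: 0x" + Long.toHexString(zk.getSessionId()));
>>         zk.close();
>>     }
>> }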
>>
>> We are running ZooKeeper version 3.4.6.
>>
>> Our zoo.cfg can be found below this e-mail.
>>
>> Thanks in advance !
>>
>> Filip
>>
>> # The number of milliseconds of each tick
>> tickTime=2000
>> # The number of ticks that the initial
>> # synchronization phase can take
>> initLimit=10
>> # The number of ticks that can pass between
>> # sending a request and getting an acknowledgement
>> syncLimit=5
>> # the directory where the snapshot is stored.
>> # do not use /tmp for storage, /tmp here is just
>> # example sakes.
>> dataDir=c:/motum/config/MASS/ZK
>> # the port at which the clients will connect
>> clientPort=2181
>>
>> server.1=172.17.35.11:2888:3888
>> server.2=172.17.35.12:2888:3888
>> server.3=172.17.35.13:2888:3888
>> server.4=172.17.35.14:2888:3888
>> server.5=172.17.35.15:2888:3888
>> server.6=172.17.35.16:2888:3888
>> server.7=172.17.35.17:2888:3888
>> server.8=172.17.35.18:2888:3888
>>
>> # The number of snapshots to retain in dataDir
>> # Purge task interval in hours
>> # Set to "0" to disable auto purge feature
>> autopurge.snapRetainCount=3
>> autopurge.purgeInterval=1
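>>
>> As an aside: if the number of voters turns out to matter, ZooKeeper 3.4
>> also supports non-voting observers, so every vehicle could keep a local
>> peer while only a fixed subset votes. A hypothetical variant of the list
>> above (which peers vote is an arbitrary choice here):
>>
>> server.1=172.17.35.11:2888:3888
>> server.2=172.17.35.12:2888:3888
>> server.3=172.17.35.13:2888:3888
>> server.4=172.17.35.14:2888:3888:observer
>> server.5=172.17.35.15:2888:3888:observer
>> server.6=172.17.35.16:2888:3888:observer
>> server.7=172.17.35.17:2888:3888:observer
>> server.8=172.17.35.18:2888:3888:observer
>>
>> # plus, in each observer's own zoo.cfg:
>> peerType=observer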
>>
>>
>>
>>> On 24 Jun 2015, at 18:54, Raúl Gutiérrez Segalés <rgs@itevenworks.net> wrote:
>>>
>>> Hi,
>>>
>>> On 24 June 2015 at 06:05, Filip Deleersnijder <filip@motum.be> wrote:
>>>
>>>> Hi,
>>>>
>>>> Let’s start with some description of our system :
>>>>
>>>> - We are using a ZooKeeper cluster with 8 participants for an application
>>>> with mobile nodes (connected over WiFi).
>>>>
>>>
>>> You mean the participants talk over WiFi, or the clients?
>>>
>>>
>>>> (IPs of the nodes follow this structure: node X has IP 172.17.35.1X)
>>>>
>>>
>>> Why 8 and not an odd number of machines (i.e.:
>>> http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkMulitServerSetup
>>> )?
>>>
>>>> - It is not that unusual for a node to be shut down or restarted.
>>>> - We haven’t benchmarked the number of write operations yet, but I would
>>>> estimate that it is less than 10 writes / second.
>>>>
>>>
>>> What version of ZK are you using?
>>>
>>>
>>>>
>>>> The problem we are having, however, is that sometimes(*) some instances
>>>> seem to have trouble with leader election.
>>>> Under the header “Attachment 1” below, you can find the leader election
>>>> times that were needed over 24h (from 1 node). On average it took more
>>>> than 1 minute!
>>>> I assume that this is not normal behaviour? (If somebody could confirm
>>>> that in an 8-node cluster these are not normal leader election times,
>>>> that would be nice.)
>>>>
>>>> In attachment 2 I included an extract from the logging during a leader
>>>> election that took 101874ms for 1 node (server 2).
>>>>
>>>> Any help is greatly appreciated.
>>>> If further or more specific logging is required, please ask !
>>>>
>>>>
>>> Do you mind sharing a copy of your config file (zoo.cfg)? Thanks!
>>>
>>>
>>> -rgs