Mailing-List: contact user-help@zookeeper.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@zookeeper.apache.org
From: Filip Deleersnijder <filip@motum.be>
Content-Type: multipart/alternative;
 boundary="Apple-Mail=_6FC4ECB5-D2BC-4342-A89B-AD5941C32271"
Message-Id: <26F7CD6B-9381-4DD7-B612-6E506736A044@motum.be>
Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2070.6\))
Subject: Re: Leader election problems
Date: Thu, 25 Jun 2015 16:34:27 +0200
References: <0ABDD99C-B3C1-4278-B6E8-6A997658B988@motum.be>
 <CAMUR=WjxUaXbMD8tz07On_tGOdzYyK+0D4Dj93bd6+rf66wvEA@mail.gmail.com>
 <24927B98-DABB-4D6D-8BCB-9F8D60896EAC@motum.be>
 <CAMxCP3hQTymrqisR+5XCAAWxFdK+FghMBv9zFxt6RdHJRRZaXg@mail.gmail.com>
To: user@zookeeper.apache.org
In-Reply-To: 
 <CAMxCP3hQTymrqisR+5XCAAWxFdK+FghMBv9zFxt6RdHJRRZaXg@mail.gmail.com>

--Apple-Mail=_6FC4ECB5-D2BC-4342-A89B-AD5941C32271
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=utf-8

Hi,

I can see that all of our logs contain the following log statements quite often:

2015-06-22 12:02:00,752 [myid:2] - DEBUG [main:DataTree@949][] - Ignoring processTxn failure hdr: -1 : error: -2
2015-06-22 12:02:00,753 [myid:2] - DEBUG [main:DataTree@949][] - Ignoring processTxn failure hdr: 14 : error: -101

2015-06-25 14:02:39,505 [myid:3] - DEBUG [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FileTxnLog$FileTxnIterator@636] - EOF excepton java.io.EOFException: Failed to read c:\motum\config\MASS\ZK\version-2\log.1aa00000001

Since we don’t shut the ZK process down properly (we just shut down Windows), this can probably cause file corruption.

Does anybody have a clear idea whether the “EOF” or the “Ignoring processTxn” problems could cause frequent and long-lasting leader elections?

Any help is greatly appreciated,

Filip
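
For reference, the quorum and timeout arithmetic behind the zoo.cfg quoted below can be sketched as follows. This is only a minimal illustration, assuming ZooKeeper's default majority quorum (no weights or groups configured); the function and constant names are made up for this example and are not part of ZooKeeper.

```python
# Sketch only: majority-quorum and timeout arithmetic for the quoted zoo.cfg.
# Assumption: default majority quorum, no weighted or hierarchical quorums.

def majority(ensemble_size: int) -> int:
    """Smallest number of participants that forms a majority quorum."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Participants that may be unreachable while a quorum can still form."""
    return ensemble_size - majority(ensemble_size)


# Values copied from the quoted zoo.cfg.
TICK_TIME_MS = 2000
INIT_LIMIT = 10
SYNC_LIMIT = 5

# Both limits are expressed in ticks, so multiply by tickTime to get ms.
init_timeout_ms = INIT_LIMIT * TICK_TIME_MS
sync_timeout_ms = SYNC_LIMIT * TICK_TIME_MS

for n in (7, 8, 9):
    print(f"{n} participants: quorum={majority(n)}, "
          f"tolerated failures={tolerated_failures(n)}")

print(f"initLimit window: {init_timeout_ms} ms")
print(f"syncLimit window: {sync_timeout_ms} ms")
```

Under these assumptions, 8 participants need 5 reachable nodes to elect a leader and tolerate 3 absent vehicles, which is the same tolerance as 7 participants.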


> On 25 Jun 2015, at 11:51, Guy Moshkowich <guy.moshkowich@gmail.com> wrote:
>
> Are you using ZK client on your vehicles or ZK servers?
> You mentioned below 8 vehicles and I see 8 servers defined in the config.
> I would expect you have 8 clients (running on your vehicles) communicating
> against 1 or 3 ZK servers as this will be more than enough for 8 clients.
> Guy
>
> On Thursday, June 25, 2015, Filip Deleersnijder <filip@motum.be> wrote:
>
>> Hi,
>>
>> Thanks for your response.
>>
>> Our application consists of 8 automatic vehicles in a warehouse setting.
>> Those vehicles need some consensus decisions, and that is what we use
>> Zookeeper for.
>> Because vehicles can come and go at random, we installed a ZK participant
>> on every vehicle. The ZK client is some other piece of software that is
>> also running on the vehicles.
>>
>> Therefore:
>>        - We cannot choose the number of ZK participants because it just
>> depends on the number of vehicles.
>>        - The participants communicate over Wifi
>>        - The client is running on the same machine, so it communicates
>> over the local network
>>
>> We are running Zookeeper version 3.4.6
>>
>> Our zoo.cfg can be found below this e-mail.
>>
>> Thanks in advance !
>>
>> Filip
>>
>> # The number of milliseconds of each tick
>> tickTime=2000
>> # The number of ticks that the initial
>> # synchronization phase can take
>> initLimit=10
>> # The number of ticks that can pass between
>> # sending a request and getting an acknowledgement
>> syncLimit=5
>> # the directory where the snapshot is stored.
>> # do not use /tmp for storage, /tmp here is just
>> # example sakes.
>> dataDir=c:/motum/config/MASS/ZK
>> # the port at which the clients will connect
>> clientPort=2181
>>
>> server.1=172.17.35.11:2888:3888
>> server.2=172.17.35.12:2888:3888
>> server.3=172.17.35.13:2888:3888
>> server.4=172.17.35.14:2888:3888
>> server.5=172.17.35.15:2888:3888
>> server.6=172.17.35.16:2888:3888
>> server.7=172.17.35.17:2888:3888
>> server.8=172.17.35.18:2888:3888
>>
>> # The number of snapshots to retain in dataDir
>> # Purge task interval in hours
>> # Set to "0" to disable auto purge feature
>> autopurge.snapRetainCount=3
>> autopurge.purgeInterval=1
>>
>>
>>
>>> On 24 Jun 2015, at 18:54, Raúl Gutiérrez Segalés <rgs@itevenworks.net> wrote:
>>>
>>> Hi,
>>>
>>> On 24 June 2015 at 06:05, Filip Deleersnijder <filip@motum.be> wrote:
>>>
>>>> Hi,
>>>>
>>>> Let’s start with some description of our system :
>>>>
>>>> - We are using a Zookeeper cluster with 8 participants for an application
>>>> with mobile nodes ( connected over Wifi ).
>>>>
>>>
>>> You mean the participants talk over wifi or the clients?
>>>
>>>
>>>> ( IPs of the different nodes follow this structure :
>>>> Node X has IP : 172.17.35.1X )
>>>>
>>>
>>> Why 8 and not an odd number of machines (i.e.:
>>> http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkMulitServerSetup
>>> )?
>>>
>>>> - It is not that unusual to have a node being shut down or restarted
>>>> - We haven’t benchmarked the number of write operations yet, but I would
>>>> estimate that it would be less than 10 writes / second
>>>>
>>>
>>> What version of ZK are you using?
>>>
>>>
>>>>
>>>> The problem we are having however is that sometimes(*), some instances
>>>> seem to be having problems with leader election.
>>>> Under the header “Attachment 1” below, you can find the leader election
>>>> times that were needed over 24h ( from 1 node ). On average it took more
>>>> than 1 minute !
>>>> I assume that this is not normal behaviour ? ( If somebody could confirm
>>>> that in an 8-node cluster, these are not normal leader election times, that
>>>> would be nice )
>>>>
>>>> In attachment 2 : I included an extract from the logging during a leader
>>>> election that took 101874ms for 1 node ( server 2 ).
>>>>
>>>> Any help is greatly appreciated.
>>>> If further or more specific logging is required, please ask !
>>>>
>>>>
>>> Do you mind sharing a copy of your config file (zoo.cfg)? Thanks!
>>>
>>>
>>> -rgs


--Apple-Mail=_6FC4ECB5-D2BC-4342-A89B-AD5941C32271--