Subject: Re: entire cluster dies with EOFException
From: Aaron Zimmerman
To: user@zookeeper.apache.org
Date: Sun, 6 Jul 2014 18:39:25 -0500

I should also mention that these servers only have 1G of RAM. ZooKeeper
usually takes around 300m.

On Sun, Jul 6, 2014 at 6:32 PM, Aaron Zimmerman wrote:

> This happened again this afternoon. I captured the log file as well as
> the preceding snapshot, but I can't upload it to jira obviously, as it's
> 10G or so. Is there an easy way to step through it and see what the
> last, presumably corrupted line is? I'll try to get ZooKeeper running in
> debug mode or something.
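>
> (One idea I plan to try if nothing better turns up; it's completely
> untested, and the install path and jar layout below are guesses for our
> setup. ZooKeeper ships an org.apache.zookeeper.server.LogFormatter class
> whose main() prints each transaction in a transaction log, so tailing
> its output should show the last record it can read before the corrupt
> one, e.g. against the log file from the earlier crash:
>
>     cd /usr/share/zookeeper
>     java -cp "./*:lib/*" org.apache.zookeeper.server.LogFormatter \
>         /var/lib/zookeeper/version-2/log.300000021 | tail -n 20
> )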
>
> I think reducing the snap count seems like a good idea, having just read
> about this setting. What has me most worried is that this issue somehow
> propagates through the entire cluster. Is there some way to gauge these
> concerns? Is there a rule of thumb like "log files should be less than
> 5G", or something else we can watch for, to head off this problem before
> it becomes a production issue?
>
> But snap count seems like a memory defense; would that cause EOF
> exceptions when reading a DataInputStream?
>
> This is the config file. It looks a bit odd since it's auto-generated by
> Exhibitor, but here it is verbatim. I was just reading about having
> separate devices for dataLogDir and dataDir. That is definitely
> something I plan to put in place.
>
> server.3=zookeeper02-n3\:2888\:3888
> server.2=zookeeper02-n2\:2888\:3888
> server.1=zookeeper02-n1\:2888\:3888
> initLimit=10
> syncLimit=5
> clientPort=2181
> tickTime=2000
> dataDir=/var/lib/zookeeper
> dataLogDir=/var/lib/zookeeper
> server.5=zookeeper02-n5\:2888\:3888
> server.4=zookeeper02-n4\:2888\:3888
>
> On Sun, Jul 6, 2014 at 5:30 PM, Flavio Junqueira
> <fpjunqueira@yahoo.com.invalid> wrote:
>
>> That's a good point; some znodes in the snapshot Aaron shared have over
>> 400k bytes, and there are some znodes with tens of kbytes too.
>>
>> Given that we are giving hints about the config parameters, it'd be
>> nice if we could see the config file.
>>
>> -Flavio
>>
>> On 06 Jul 2014, at 22:34, Benjamin Reed wrote:
>>
>>> The log files seem very large to me. Is the snap count really large,
>>> or do you have a lot of data in each znode? You might try reducing the
>>> snap count to make the log files smaller.
>>>
>>> On Jul 6, 2014 2:27 PM, "Flavio Junqueira" wrote:
>>>
>>>> But what is it that was causing problems in your scenario, Raúl? Is
>>>> it reading the log? In any case, it sounds like initLimit is the
>>>> parameter you want to change, no?
>>>>
>>>> -Flavio
>>>>
>>>> On 06 Jul 2014, at 19:09, Raúl Gutiérrez Segalés wrote:
>>>>
>>>>> Oh, Storm, right. Yeah, I've seen this. The transaction rate is so
>>>>> huge that the initial sync fails... perhaps you could try bigger
>>>>> tickTime, initLimit and syncLimit params.
>>>>>
>>>>> -rgs
>>>>>
>>>>> On 6 July 2014 04:48, Aaron Zimmerman wrote:
>>>>>
>>>>>> Raúl,
>>>>>>
>>>>>> zk_approximate_data_size 4899392
>>>>>>
>>>>>> That is about the size of the snapshots also.
>>>>>>
>>>>>> Benjamin,
>>>>>>
>>>>>> We are not running out of disk space, but the log.XXXX files are
>>>>>> quite large; is this normal? In less than 3 hours, the log file
>>>>>> since the last snapshot is 8.2G, and the older log files are as
>>>>>> large as 12G.
>>>>>>
>>>>>> We are using Storm Trident, which uses ZooKeeper pretty heavily for
>>>>>> tracking transactional state, but I'm not sure if that could
>>>>>> account for this much storage. Is there an easy way to track which
>>>>>> znodes are being updated most frequently?
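>>>>>>
>>>>>> (If there's no simpler way, I might dump a transaction log with the
>>>>>> org.apache.zookeeper.server.LogFormatter class and count how often
>>>>>> each path-like token appears. This is untested, and the classpath
>>>>>> and file name are guesses for our install:
>>>>>>
>>>>>>     java -cp "/usr/share/zookeeper/*:/usr/share/zookeeper/lib/*" \
>>>>>>         org.apache.zookeeper.server.LogFormatter \
>>>>>>         /var/lib/zookeeper/version-2/log.300000021 \
>>>>>>       | grep -o "/[A-Za-z0-9_./-]*" | sort | uniq -c | sort -rn | head
>>>>>>
>>>>>> The grep is only a rough path matcher and would need adjusting to
>>>>>> the formatter's actual output.)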
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Aaron
>>>>>>
>>>>>> On Sun, Jul 6, 2014 at 1:55 AM, Raúl Gutiérrez Segalés
>>>>>> <rgs@itevenworks.net> wrote:
>>>>>>
>>>>>>> What's the total size of the data in your ZK cluster? i.e.:
>>>>>>>
>>>>>>> $ echo mntr | nc localhost 2181 | grep zk_approximate_data_size
>>>>>>>
>>>>>>> And/or the size of the snapshot?
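>>>>>>>
>>>>>>> (Something like the following should show the snapshot sizes,
>>>>>>> assuming the stock dataDir location:
>>>>>>>
>>>>>>> $ ls -lh /var/lib/zookeeper/version-2/snapshot.*
>>>>>>> )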
>>>>>>>
>>>>>>> -rgs
>>>>>>>
>>>>>>> On 4 July 2014 06:29, Aaron Zimmerman wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> We have a 5 node zookeeper cluster that has been operating
>>>>>>>> normally for several months. Starting a few days ago, the entire
>>>>>>>> cluster crashes a few times per day, all nodes at the exact same
>>>>>>>> time. We can't track down the exact issue, but deleting the
>>>>>>>> snapshots and logs and restarting resolves it.
>>>>>>>>
>>>>>>>> We are running Exhibitor to monitor the cluster.
>>>>>>>>
>>>>>>>> It appears that something bad gets into the logs, causing an
>>>>>>>> EOFException, and this cascades through the entire cluster:
>>>>>>>>
>>>>>>>> 2014-07-04 12:55:26,328 [myid:1] - WARN
>>>>>>>> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception
>>>>>>>> when following the leader
>>>>>>>> java.io.EOFException
>>>>>>>>     at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>>>>>>>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>>>>>>>>     at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>>>>>>>>     at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
>>>>>>>>     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
>>>>>>>> 2014-07-04 12:55:26,328 [myid:1] - INFO
>>>>>>>> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown
>>>>>>>> called
>>>>>>>> java.lang.Exception: shutdown Follower
>>>>>>>>     at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>>>>>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
>>>>>>>>
>>>>>>>> Then the server dies, Exhibitor tries to restart each node, and
>>>>>>>> they all get stuck trying to replay the bad transaction, logging
>>>>>>>> things like:
>>>>>>>>
>>>>>>>> 2014-07-04 12:58:52,734 [myid:1] - INFO [main:FileSnap@83] -
>>>>>>>> Reading snapshot /var/lib/zookeeper/version-2/snapshot.300011fc0
>>>>>>>> 2014-07-04 12:58:52,896 [myid:1] - DEBUG
>>>>>>>> [main:FileTxnLog$FileTxnIterator@575] - Created new input stream
>>>>>>>> /var/lib/zookeeper/version-2/log.300000021
>>>>>>>> 2014-07-04 12:58:52,915 [myid:1] - DEBUG
>>>>>>>> [main:FileTxnLog$FileTxnIterator@578] - Created new input archive
>>>>>>>> /var/lib/zookeeper/version-2/log.300000021
>>>>>>>> 2014-07-04 12:59:25,870 [myid:1] - DEBUG
>>>>>>>> [main:FileTxnLog$FileTxnIterator@618] - EOF excepton
>>>>>>>> java.io.EOFException: Failed to read /var/lib/zookeeper/version-2/log.300000021
>>>>>>>> 2014-07-04 12:59:25,871 [myid:1] - DEBUG
>>>>>>>> [main:FileTxnLog$FileTxnIterator@575] - Created new input stream
>>>>>>>> /var/lib/zookeeper/version-2/log.300011fc2
>>>>>>>> 2014-07-04 12:59:25,872 [myid:1] - DEBUG
>>>>>>>> [main:FileTxnLog$FileTxnIterator@578] - Created new input archive
>>>>>>>> /var/lib/zookeeper/version-2/log.300011fc2
>>>>>>>> 2014-07-04 12:59:48,722 [myid:1] - DEBUG
>>>>>>>> [main:FileTxnLog$FileTxnIterator@618] - EOF excepton
>>>>>>>> java.io.EOFException: Failed to read /var/lib/zookeeper/version-2/log.300011fc2
>>>>>>>>
>>>>>>>> And the cluster is dead. The only way we have found to recover is
>>>>>>>> to delete all of the data and restart.
>>>>>>>>
>>>>>>>> Anyone seen this before? Any ideas how I can track down what is
>>>>>>>> causing the EOFException, or insulate ZooKeeper from completely
>>>>>>>> crashing?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Aaron Zimmerman