zookeeper-user mailing list archives

From Aaron Zimmerman <azimmer...@sproutsocial.com>
Subject Re: entire cluster dies with EOFException
Date Mon, 07 Jul 2014 16:39:11 GMT
What I don't understand is how the entire cluster could die in such a
situation.  I was able to load zookeeper locally using the snapshot and 10G
log file without apparent issue.  I can see how large amounts of data could
cause latency issues in syncing, causing a single worker to die, but how
would that explain the node's inability to restart?  When the server
replays the log file, does it have to sync the transactions to other nodes
while it does so?

I can alter the settings as has been discussed, but I worry that I'm just
delaying the same thing from happening again, if I deploy another storm
topology or something.  How can I get the cluster in a state where I can be
confident that it won't crash in a similar way as load increases, or at
least set up some kind of monitoring that will let me know something is
unhealthy?
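
For example, would something along these lines be a reasonable check to run
from cron, so we get warned before the logs grow to this size again? The
paths, threshold, and alert address below are just placeholders I made up:

DATADIR=/var/lib/zookeeper/version-2        # our dataDir/dataLogDir
LIMIT=$((5 * 1024 * 1024 * 1024))           # 5G, an arbitrary threshold
NEWEST_LOG=$(ls -t "$DATADIR"/log.* | head -1)
SIZE=$(stat -c %s "$NEWEST_LOG")            # GNU stat, i.e. Linux
if [ "$SIZE" -gt "$LIMIT" ]; then
  echo "txn log $NEWEST_LOG is $SIZE bytes" \
    | mail -s "zk txn log size warning" ops@example.com
fi
echo ruok | nc localhost 2181               # basic liveness
echo mntr | nc localhost 2181 | egrep 'zk_approximate_data_size|zk_outstanding_requests'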



On Mon, Jul 7, 2014 at 3:57 AM, Flavio Junqueira <
fpjunqueira@yahoo.com.invalid> wrote:

> In general, you want to minimize the amount of data you need to process
> during recovery (change of leader). When I've seen problems like yours
> happening, it was due to a large snapshot. In your case the snapshot size
> isn't very large, but the txn log files are. Given that the snapshot is of
> reasonable size but the log files are large, I understand that your app is
> creating and deleting a lot of data over time. I'm a bit worried that
> dropping the snapCount might cause your servers to start generating a lot
> of snapshots during short time periods.
>
> If you can find a value of snapCount that leads to reasonably sized logs
> and snapshots without overwhelming servers with the number of snapshots
> generated, then you're good. If you can't, then I suppose you'll need to
> leave snapCount where it is (or make it a bit smaller) and adjust
> initLimit, which controls the amount of time a follower has to sync with
> the leader when it connects.
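>
> For example, something along these lines (the numbers are purely
> illustrative, you'd have to tune them against your own write rate):
>
> # illustrative values only - the default snapCount is 100000 txns
> snapCount=50000
> # 30 ticks x tickTime=2000ms would give a follower 60s to sync with the leader
> initLimit=30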
>
> This discussion actually makes me think that a version of snapCount that
> counts bytes instead of txns might work better in your case, but this isn't
> what we have currently.
>
> -Flavio
>
> On 07 Jul 2014, at 00:32, Aaron Zimmerman <azimmerman@sproutsocial.com>
> wrote:
>
> > This happened again this afternoon.  I captured the log file as well as
> > the preceding snapshot, but I can't upload it to jira obviously, as it's
> > 10G or so.  Is there an easy way to step through it and see what the
> > last, presumably corrupted line is?  I'll try to get zookeeper running in
> > debug mode or something.
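> >
> > (I'm assuming something like the LogFormatter class that ships with
> > zookeeper can dump the txn log so I can at least see where it stops --
> > roughly:
> >
> >   java -cp "zookeeper-3.4.6.jar:lib/*" \
> >       org.apache.zookeeper.server.LogFormatter \
> >       /var/lib/zookeeper/version-2/log.300000021
> >
> > run from the install dir, with the jar name matching whatever version
> > we're actually on -- if that's the right incantation.)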
> >
> > I think reducing the snap count seems like a good idea, having just read
> > about this setting.  What has me most worried is that this issue somehow
> > propagates through the entire cluster.  Is there some way to gauge these
> > concerns?  Is there a rule of thumb like "log files should be less than
> > 5G" or something that we can watch for to head off this problem before it
> > becomes a production issue?
> >
> > But snapCount seems like a memory defense; would that cause EOF
> > exceptions when reading a DataInputStream?
> >
> > This is the config file; it looks a bit odd since it's auto-generated by
> > exhibitor, but here it is verbatim.  I was just reading about having
> > separate devices for dataLogDir and dataDir.  That is definitely something
> > I plan to put in place (a rough sketch of what I mean follows the config).
> >
> > server.3=zookeeper02-n3\:2888\:3888
> > server.2=zookeeper02-n2\:2888\:3888
> > server.1=zookeeper02-n1\:2888\:3888
> > initLimit=10
> > syncLimit=5
> > clientPort=2181
> > tickTime=2000
> > dataDir=/var/lib/zookeeper
> > dataLogDir=/var/lib/zookeeper
> > server.5=zookeeper02-n5\:2888\:3888
> > server.4=zookeeper02-n4\:2888\:3888
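> >
> > (What I have in mind for the separate devices is something like the
> > following, with the txn log on its own disk -- the mount point below is
> > just a placeholder:
> >
> > dataDir=/var/lib/zookeeper
> > dataLogDir=/zk-txnlog/zookeeper
> >
> > where /zk-txnlog sits on a dedicated device.)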
> >
> >
> > On Sun, Jul 6, 2014 at 5:30 PM, Flavio Junqueira <
> > fpjunqueira@yahoo.com.invalid> wrote:
> >
> >> That's a good point, some znodes in the snapshot Aaron shared have over
> >> 400k bytes. There are some znodes with tens of kbytes too.
> >>
> >> Given that we are giving hints about the config parameters, it'd be nice
> >> if we could see the config file.
> >>
> >> -Flavio
> >>
> >> On 06 Jul 2014, at 22:34, Benjamin Reed <breed@apache.org> wrote:
> >>
> >>> The log files seem very large to me. Is the snap count really large? Or
> >>> do you have a lot of data in each znode? You might try reducing the
> >>> snap count to make the log files smaller.
> >>> On Jul 6, 2014 2:27 PM, "Flavio Junqueira" <fpjunqueira@yahoo.com.invalid>
> >>> wrote:
> >>>
> >>>> But what is it that was causing problems in your scenario, Raúl? Is it
> >>>> reading the log? In any case, it sounds like initLimit is the parameter
> >>>> you want to change, no?
> >>>>
> >>>> -Flavio
> >>>>
> >>>> On 06 Jul 2014, at 19:09, Raúl Gutiérrez Segalés <rgs@itevenworks.net>
> >>>> wrote:
> >>>>
> >>>>> Oh, Storm, right. Yeah, I've seen this. The transaction rate is so
> >>>>> huge that the initial sync fails... perhaps you could try bigger
> >>>>> tickTime, initLimit and syncLimit params...
> >>>>>
> >>>>>
> >>>>> -rgs
> >>>>>
> >>>>>
> >>>>> On 6 July 2014 04:48, Aaron Zimmerman <azimmerman@sproutsocial.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Raúl,
> >>>>>>
> >>>>>> zk_approximate_data_size 4899392
> >>>>>>
> >>>>>> That is about the size of the snapshots also.
> >>>>>>
> >>>>>> Benjamin,
> >>>>>>
> >>>>>> We are not running out of disk space.
> >>>>>> But the log.XXXX files are quite large, is this normal?  In less
> >>>>>> than 3 hours, the log file since the last snapshot is 8.2G, and the
> >>>>>> older log files are as large as 12G.
> >>>>>>
> >>>>>> We are using Storm Trident, which uses zookeeper pretty heavily for
> >>>>>> tracking transactional state, but I'm not sure if that could account
> >>>>>> for this much storage.  Is there an easy way to track which znodes
> >>>>>> are being updated most frequently?
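> >>>>>>
> >>>>>> (The best idea I've come up with so far is to dump a txn log with
> >>>>>> LogFormatter and count paths, something like:
> >>>>>>
> >>>>>>   java -cp "zookeeper-3.4.6.jar:lib/*" \
> >>>>>>       org.apache.zookeeper.server.LogFormatter \
> >>>>>>       /var/lib/zookeeper/version-2/log.300000021 \
> >>>>>>     | grep -o "'/[^,]*" | sort | uniq -c | sort -rn | head
> >>>>>>
> >>>>>> which assumes the formatter prints the znode path for each write
> >>>>>> txn -- I haven't verified the output format, so treat that grep as
> >>>>>> a sketch.)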
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Aaron
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Sun, Jul 6, 2014 at 1:55 AM, Raúl Gutiérrez Segalés <
> >>>>>> rgs@itevenworks.net>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> What's the total size of the data in your ZK cluster? i.e.:
> >>>>>>>
> >>>>>>> $ echo mntr | nc localhost 2181 | grep zk_approximate_data_size
> >>>>>>>
> >>>>>>> And/or the size of the snapshot?
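> >>>>>>>
> >>>>>>> Something along these lines should show it (substitute your own
> >>>>>>> dataDir):
> >>>>>>>
> >>>>>>> $ ls -lh <dataDir>/version-2/snapshot.*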
> >>>>>>>
> >>>>>>>
> >>>>>>> -rgs
> >>>>>>>
> >>>>>>>
> >>>>>>> On 4 July 2014 06:29, Aaron Zimmerman <azimmerman@sproutsocial.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> We have a 5-node zookeeper cluster that has been operating normally
> >>>>>>>> for several months.  Starting a few days ago, the entire cluster
> >>>>>>>> crashes a few times per day, all nodes at the exact same time.  We
> >>>>>>>> can't track down the exact issue, but deleting the snapshots and
> >>>>>>>> logs and restarting resolves it.
> >>>>>>>>
> >>>>>>>> We are running exhibitor to monitor the cluster.
> >>>>>>>>
> >>>>>>>> It appears that something bad gets into the logs, causing an
> >>>>>>>> EOFException, and this cascades through the entire cluster:
> >>>>>>>>
> >>>>>>>> 2014-07-04 12:55:26,328 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
> >>>>>>>> java.io.EOFException
> >>>>>>>>       at java.io.DataInputStream.readInt(DataInputStream.java:375)
> >>>>>>>>       at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> >>>>>>>>       at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> >>>>>>>>       at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> >>>>>>>>       at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
> >>>>>>>>       at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> >>>>>>>>       at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
> >>>>>>>> 2014-07-04 12:55:26,328 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
> >>>>>>>> java.lang.Exception: shutdown Follower
> >>>>>>>>       at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
> >>>>>>>>       at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Then the server dies, exhibitor tries to restart each node, and
> >>>>>>>> they all get stuck trying to replay the bad transaction, logging
> >>>>>>>> things like:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> 2014-07-04 12:58:52,734 [myid:1] - INFO  [main:FileSnap@83] - Reading snapshot /var/lib/zookeeper/version-2/snapshot.300011fc0
> >>>>>>>> 2014-07-04 12:58:52,896 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@575] - Created new input stream /var/lib/zookeeper/version-2/log.300000021
> >>>>>>>> 2014-07-04 12:58:52,915 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@578] - Created new input archive /var/lib/zookeeper/version-2/log.300000021
> >>>>>>>> 2014-07-04 12:59:25,870 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@618] - EOF excepton java.io.EOFException: Failed to read /var/lib/zookeeper/version-2/log.300000021
> >>>>>>>> 2014-07-04 12:59:25,871 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@575] - Created new input stream /var/lib/zookeeper/version-2/log.300011fc2
> >>>>>>>> 2014-07-04 12:59:25,872 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@578] - Created new input archive /var/lib/zookeeper/version-2/log.300011fc2
> >>>>>>>> 2014-07-04 12:59:48,722 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@618] - EOF excepton java.io.EOFException: Failed to read /var/lib/zookeeper/version-2/log.300011fc2
> >>>>>>>>
> >>>>>>>> And the cluster is dead.  The only way we have found to recover is
> >>>>>>>> to delete all of the data and restart.
> >>>>>>>>
> >>>>>>>> Anyone seen this before?  Any ideas how I can track down what is
> >>>>>>>> causing the EOFException, or insulate zookeeper from completely
> >>>>>>>> crashing?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>> Aaron Zimmerman
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>
>
