Reply-To: user@zookeeper.apache.org
From: "FPJ" <fpjunqueira@yahoo.com.INVALID>
To: <user@zookeeper.apache.org>
Subject: RE: entire cluster dies with EOFException
Date: Mon, 14 Jul 2014 17:18:28 +0100
Message-ID: <020701cf9f7f$3f48eac0$bddac040$@yahoo.com>

Thanks for reporting back, Aaron. Shall we close the JIRA you created?

-Flavio

> -----Original Message-----
> From: Aaron Zimmerman [mailto:azimmerman@sproutsocial.com]
> Sent: 14 July 2014 16:21
> To: user@zookeeper.apache.org
> Subject: Re: entire cluster dies with EOFException
>
> Closing the loop on this: it appears that upping the initLimit did
> resolve the issue. Thanks, all, for the help.
>
> Thanks,
>
> Aaron Zimmerman
>
>
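The fix reported above was raising initLimit. initLimit (like syncLimit) is counted in ticks, so the time a follower is given to connect and sync with the leader is initLimit × tickTime milliseconds. A minimal sketch of that arithmetic follows. TickWindows and its methods are hypothetical names, the 2000 ms / 10-tick values are those of ZooKeeper's sample zoo.cfg, and the 45-second sync time is an invented example to be replaced by a duration actually measured on your cluster.

```java
import java.time.Duration;

// Hypothetical helper, not part of ZooKeeper: converts tick-based limits
// (initLimit, syncLimit) into wall-clock durations.
final class TickWindows {
    private TickWindows() {}

    // Window a follower gets to connect and sync: initLimit ticks of tickTime ms each.
    static Duration initWindow(long tickTimeMillis, int initLimit) {
        if (tickTimeMillis <= 0 || initLimit <= 0) {
            throw new IllegalArgumentException("tickTime and initLimit must be positive");
        }
        return Duration.ofMillis(Math.multiplyExact(tickTimeMillis, (long) initLimit));
    }

    // Smallest initLimit (in ticks) whose window covers an observed sync duration.
    static int minInitLimitFor(Duration observedSync, long tickTimeMillis) {
        if (tickTimeMillis <= 0) {
            throw new IllegalArgumentException("tickTime must be positive");
        }
        long millis = observedSync.toMillis();
        long ticks = (millis + tickTimeMillis - 1) / tickTimeMillis; // ceiling division
        return Math.toIntExact(Math.max(1L, ticks));
    }

    public static void main(String[] args) {
        // tickTime=2000, initLimit=10, as in the sample zoo.cfg.
        System.out.println(initWindow(2000, 10));                          // PT20S
        // Hypothetical measured sync time of 45 s.
        System.out.println(minInitLimitFor(Duration.ofSeconds(45), 2000)); // 23
    }
}
```

Working in ticks rather than seconds mirrors how the two settings are expressed in zoo.cfg, so the result can be pasted back as an initLimit value.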
> On Tue, Jul 8, 2014 at 4:40 PM, Flavio Junqueira <
> fpjunqueira@yahoo.com.invalid> wrote:
>
> > Agreed, but we need that check because we expect bytes for the
> > checksum computation right underneath. The bit that's odd is that we
> > make the same check again below:
> >
> >         try {
> >                 long crcValue = ia.readLong("crcvalue");
> >                 byte[] bytes = Util.readTxnBytes(ia);
> >                 // Since we preallocate, we define EOF to be an
> >                 if (bytes == null || bytes.length==0) {
> >                     throw new EOFException("Failed to read " + logFile);
> >                 }
> >                 // EOF or corrupted record
> >                 // validate CRC
> >                 Checksum crc = makeChecksumAlgorithm();
> >                 crc.update(bytes, 0, bytes.length);
> >                 if (crcValue != crc.getValue())
> >                     throw new IOException(CRC_ERROR);
> >                 if (bytes == null || bytes.length == 0)
> >                     return false;
> >                 hdr = new TxnHeader();
> >                 record = SerializeUtils.deserializeTxn(bytes, hdr);
> >             } catch (EOFException e) {
> >
> > I'm moving this discussion to the JIRA, btw.
> >
> > -Flavio
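As Flavio notes, the second "bytes == null || bytes.length == 0" test in the excerpt above can never be true: the first test already throws EOFException in exactly that case. A minimal sketch of the same step with the check done once follows. It assumes java.util.zip.Adler32 as a stand-in for makeChecksumAlgorithm(); TxnRecordCheck, validate and checksumOf are hypothetical names, not ZooKeeper's API.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.zip.Adler32;
import java.util.zip.Checksum;

// Hypothetical sketch of the read-and-verify step, emptiness checked once.
// Assumption: Adler32 stands in for makeChecksumAlgorithm().
final class TxnRecordCheck {
    static final String CRC_ERROR = "CRC check failed";

    private TxnRecordCheck() {}

    // A null or empty buffer marks the end of the (preallocated) log and is
    // reported as EOFException before any checksum work; a checksum mismatch
    // is a plain IOException, i.e. a genuinely corrupted record.
    static byte[] validate(long crcValue, byte[] bytes, String logName) throws IOException {
        if (bytes == null || bytes.length == 0) {
            throw new EOFException("Failed to read " + logName);
        }
        if (crcValue != checksumOf(bytes)) {
            throw new IOException(CRC_ERROR);
        }
        // No second null/empty check: it could never be true at this point.
        return bytes;
    }

    static long checksumOf(byte[] bytes) {
        Checksum crc = new Adler32();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }
}
```

Keeping end-of-log (EOFException) and corruption (IOException) as distinct exception types lets a caller move on to the next file in the first case while still failing loudly in the second.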
> >
> > On 07 Jul 2014, at 22:03, Aaron Zimmerman
> > <azimmerman@sproutsocial.com>
> > wrote:
> >
> > > Flavio,
> > >
> > > Yes, that is the initial error. The nodes in the cluster are then
> > > restarted, but they fail to come back up, logging:
> > >
> > > 2014-07-04 12:58:52,734 [myid:1] - INFO  [main:FileSnap@83] - Reading snapshot /var/lib/zookeeper/version-2/snapshot.300011fc0
> > > 2014-07-04 12:58:52,896 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@575] - Created new input stream /var/lib/zookeeper/version-2/log.300000021
> > > 2014-07-04 12:58:52,915 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@578] - Created new input archive /var/lib/zookeeper/version-2/log.300000021
> > > 2014-07-04 12:59:25,870 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@618] - EOF excepton java.io.EOFException: Failed to read /var/lib/zookeeper/version-2/log.300000021
> > > 2014-07-04 12:59:25,871 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@575] - Created new input stream /var/lib/zookeeper/version-2/log.300011fc2
> > > 2014-07-04 12:59:25,872 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@578] - Created new input archive /var/lib/zookeeper/version-2/log.300011fc2
> > > 2014-07-04 12:59:48,722 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@618] - EOF excepton java.io.EOFException: Failed to read /var/lib/zookeeper/version-2/log.300011fc2
> > >
> > > Thanks,
> > >
> > > AZ
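In the restart log above, the server reads snapshot.300011fc0 and then replays log.300000021 and log.300011fc2; each DEBUG "EOF excepton" line appears just before the next file is opened or replay finishes. A minimal sketch of how such a replay set can be chosen follows. It is based on my understanding of FileTxnLog's file selection, not its actual code, and ReplayPlan and its methods are hypothetical. Log and snapshot names end in a hexadecimal zxid, whose high 32 bits are the epoch and low 32 bits a per-epoch counter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not ZooKeeper's code: pick the txn logs to replay on top
// of a snapshot. File names end in the hex zxid they start at (log.<zxid>,
// snapshot.<zxid>); a zxid's high 32 bits are the epoch, the low 32 a counter.
final class ReplayPlan {
    private ReplayPlan() {}

    static long zxidFromName(String name, String prefix) {
        String expected = prefix + ".";
        if (!name.startsWith(expected)) {
            throw new IllegalArgumentException(name + " does not start with " + expected);
        }
        return Long.parseUnsignedLong(name.substring(expected.length()), 16);
    }

    static long epoch(long zxid) { return zxid >>> 32; }

    static long counter(long zxid) { return zxid & 0xffffffffL; }

    // Keep the newest log starting at or before the snapshot (it can hold txns
    // newer than the snapshot) plus every log that starts after the snapshot.
    static List<String> logsToReplay(List<String> logNames, long snapshotZxid) {
        List<String> sorted = new ArrayList<>(logNames);
        sorted.sort(Comparator.comparingLong((String n) -> zxidFromName(n, "log")));
        long start = 0;
        for (String n : sorted) {
            long z = zxidFromName(n, "log");
            if (z <= snapshotZxid && z > start) {
                start = z;
            }
        }
        List<String> replay = new ArrayList<>();
        for (String n : sorted) {
            if (zxidFromName(n, "log") >= start) {
                replay.add(n);
            }
        }
        return replay;
    }

    public static void main(String[] args) {
        long snap = zxidFromName("snapshot.300011fc0", "snapshot");
        System.out.println(epoch(snap) + " " + Long.toHexString(counter(snap))); // 3 11fc0
        // log.200000001 is a made-up older file, added to show it is skipped.
        List<String> logs = Arrays.asList("log.300011fc2", "log.200000001", "log.300000021");
        System.out.println(logsToReplay(logs, snap)); // [log.300000021, log.300011fc2]
    }
}
```

Because a log is named after its first zxid, the log that started before the snapshot still has to be read; replay then skips the transactions the snapshot already contains.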
> > >
> > >
> > > On Mon, Jul 7, 2014 at 3:33 PM, Flavio Junqueira <
> > > fpjunqueira@yahoo.com.invalid> wrote:
> > >
> > >> I'm a bit confused; the stack trace you reported was this one:
> > >>
> > >> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
> > >> java.io.EOFException
> > >>       at java.io.DataInputStream.readInt(DataInputStream.java:375)
> > >>       at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> > >>       at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> > >>       at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> > >>       at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
> > >>       at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> > >>       at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
> > >>
> > >>
> > >> That's in a different part of the code (the follower reading a packet
> > >> from the leader), not the transaction log replay.
> > >>
> > >> -Flavio
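In this trace the EOFException comes from DataInputStream.readInt while the follower is reading a packet from the leader. On a socket stream that usually means the connection ended before a complete packet arrived, which is unrelated to the end-of-log EOF used while replaying transaction logs. The sketch below is hypothetical: FrameReader is not ZooKeeper code, and a 4-byte length plus payload is a simplification, not the jute wire format. It only shows where such an EOFException originates.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical length-prefixed frame reader (not the jute format). It shows
// that a stream ending early surfaces as EOFException from readInt/readFully.
final class FrameReader {
    private FrameReader() {}

    // One frame: a 4-byte big-endian length followed by that many bytes.
    static byte[] readFrame(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int len = data.readInt();      // EOFException if fewer than 4 bytes remain
        if (len < 0) {
            throw new IOException("negative frame length " + len);
        }
        byte[] payload = new byte[len];
        data.readFully(payload);       // EOFException if the stream ends mid-payload
        return payload;
    }

    // Classifies a read of the given bytes as a full frame or an early end.
    static String describe(byte[] wire) {
        try {
            return "frame of " + readFrame(new ByteArrayInputStream(wire)).length + " bytes";
        } catch (EOFException e) {
            return "peer closed";
        } catch (IOException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new byte[] {0, 0, 0, 2, 7, 9})); // frame of 2 bytes
        System.out.println(describe(new byte[] {0, 0}));             // peer closed
    }
}
```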
> > >>
> > >> On 07 Jul 2014, at 18:50, Aaron Zimmerman
> > >> <azimmerman@sproutsocial.com>
> > >> wrote:
> > >>
> > >>> Util.readTxnBytes reads from the buffer and, if the length is 0, returns
> > >>> a zero-length array, seemingly indicating the end of the file.
> > >>>
> > >>> Then this is detected in FileTxnLog.java:671:
> > >>>
> > >>>               byte[] bytes = Util.readTxnBytes(ia);
> > >>>               // Since we preallocate, we define EOF to be an
> > >>>               if (bytes == null || bytes.length==0) {
> > >>>                   throw new EOFException("Failed to read " + logFile);
> > >>>               }
> > >>>
> > >>>
> > >>> This exception is caught a few lines later, and the streams are closed, etc.
> > >>>
> > >>> So this seems to be not an error condition at all, but a signal that
> > >>> the entire file has been read? Is this exception a red herring?
> > >>>
> > >>>
> > >>>
> > >>>
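Aaron's reading matches the comment in the quoted code: because log files are preallocated, an empty transaction marks the end of the data actually written. A minimal sketch of that convention follows. It assumes a record layout of a 4-byte length, the payload, and a one-byte 'B' end-of-record marker, which is my understanding of how ZooKeeper frames txn log entries and should be treated as an assumption; PreallocatedLog and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the "empty record = end of log" convention.
// Assumed record layout: 4-byte length, payload, one-byte 'B' end-of-record
// marker. The file is preallocated with zeros, so after the last real record
// the next length read is 0.
final class PreallocatedLog {
    static final byte EOR = 'B';

    private PreallocatedLog() {}

    // Payload; an empty array at the zero-filled tail; null if the record is partial.
    static byte[] readTxnBytes(DataInputStream in) throws IOException {
        try {
            int len = in.readInt();
            if (len == 0) {
                return new byte[0];      // preallocated zeros: no more records
            }
            if (len < 0) {
                return null;             // nonsense length: treat as unreadable
            }
            byte[] bytes = new byte[len];
            in.readFully(bytes);
            if (in.readByte() != EOR) {
                return null;             // marker missing: record was cut short
            }
            return bytes;
        } catch (EOFException e) {
            return null;                 // file ended in the middle of a record
        }
    }

    // Builds a log image: each payload as length + bytes + 'B', then zero padding.
    static byte[] writeLog(int padding, String... payloads) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (String p : payloads) {
            byte[] b = p.getBytes(StandardCharsets.UTF_8);
            out.writeInt(b.length);
            out.write(b);
            out.writeByte(EOR);
        }
        out.write(new byte[padding]);    // stands in for preallocation
        out.flush();
        return buf.toByteArray();
    }

    // Counts records until the first null or empty result; the quoted FileTxnLog
    // code treats both the same way (EOFException, then on to the next file).
    static int countRecords(byte[] image) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(image));
        int n = 0;
        for (byte[] b = readTxnBytes(in); b != null && b.length > 0; b = readTxnBytes(in)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countRecords(writeLog(64, "create /a", "setData /a"))); // 2
    }
}
```

Under these assumptions the EOFException at the end of each log is the normal way replay stops, which is consistent with it being logged only at DEBUG level.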
> > >>> On Mon, Jul 7, 2014 at 11:50 AM, Raúl Gutiérrez Segalés <rgs@itevenworks.net> wrote:
> > >>>
> > >>>> On 7 July 2014 09:39, Aaron Zimmerman <azimmerman@sproutsocial.com> wrote:
> > >>>>
> > >>>>> What I don't understand is how the entire cluster could die in
> > >>>>> such a situation. I was able to load ZooKeeper locally using the
> > >>>>> snapshot and the 10 GB log file without any apparent issue.
> > >>>>
> > >>>>
> > >>>> Sure, but it's syncing up with the other learners that becomes
> > >>>> challenging when there are either big snapshots or too many txn
> > >>>> logs, right?
> > >>>>
> > >>>>
> > >>>>> I can see how large amounts of data could cause latency issues
> > >>>>> in syncing, causing a single worker to die, but how would that
> > >>>>> explain the node's inability to restart? When the server replays
> > >>>>> the log file, does it have to sync the transactions to the other
> > >>>>> nodes while it does so?
> > >>>>>
> > >>>>
> > >>>> Given that your txn churn is so big, by the time it finishes
> > >>>> reading from disk it'll need to catch up with the quorum. How many
> > >>>> txns have happened by that point? By the way, we use this patch:
> > >>>>
> > >>>> https://issues.apache.org/jira/browse/ZOOKEEPER-1804
> > >>>>
> > >>>> to measure the transaction rate. Do you have any approximation of
> > >>>> what your transaction rate might be?
> > >>>>
> > >>>>
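Raúl's question can be approximated from zxids alone: within one epoch the low 32 bits of a zxid count transactions, so two zxid samples taken a known time apart give an average rate, and rate × replay time gives a rough idea of how far behind a restarting server is when it finishes reading from disk. The sketch below is hypothetical (TxnRate is not the ZOOKEEPER-1804 patch, and how you sample zxids is left open). The 73,633 figure is just the counter difference between the first zxids of the two log files in the restart log quoted earlier, and 56 s is roughly the span of that log from reading the snapshot to the last EOF.

```java
// Hypothetical sketch (not the ZOOKEEPER-1804 patch): estimate the txn rate
// from two zxid samples. Within one epoch the low 32 bits of a zxid count
// transactions, so their difference is the number of txns in between.
final class TxnRate {
    private TxnRate() {}

    static long epoch(long zxid) { return zxid >>> 32; }

    static long counter(long zxid) { return zxid & 0xffffffffL; }

    static long txnsBetween(long earlierZxid, long laterZxid) {
        if (epoch(earlierZxid) != epoch(laterZxid)) {
            // A new leader starts a new epoch and the counter starts over.
            throw new IllegalArgumentException("epoch changed between samples");
        }
        long delta = counter(laterZxid) - counter(earlierZxid);
        if (delta < 0) {
            throw new IllegalArgumentException("samples are out of order");
        }
        return delta;
    }

    static double txnsPerSecond(long earlierZxid, long laterZxid, double elapsedSeconds) {
        if (elapsedSeconds <= 0) {
            throw new IllegalArgumentException("elapsed time must be positive");
        }
        return txnsBetween(earlierZxid, laterZxid) / elapsedSeconds;
    }

    // Rough number of txns a server falls behind while it spends replaySeconds
    // reading its snapshot and logs, assuming the rate stays constant.
    static long backlogAfter(double txnsPerSecond, double replaySeconds) {
        return Math.round(txnsPerSecond * replaySeconds);
    }

    public static void main(String[] args) {
        // First zxids of the two log files in the restart log.
        System.out.println(txnsBetween(0x300000021L, 0x300011fc2L)); // 73633
        // Hypothetical samples: 100 txns in 10 s, then a 56 s replay.
        double rate = txnsPerSecond(0x300000000L, 0x300000064L, 10.0);
        System.out.println(rate + " " + backlogAfter(rate, 56.0)); // 10.0 560
    }
}
```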
> > >>>>>
> > >>>>> I can alter the settings as has been discussed, but I worry that
> > >>>>> I'm just postponing the same failure, which could recur if I deploy
> > >>>>> another Storm topology or something. How can I get the cluster into
> > >>>>> a state where I can be confident that it won't crash in a similar
> > >>>>> way as load increases, or at least set up some kind of monitoring
> > >>>>> that will let me know when something is unhealthy?
> > >>>>>
> > >>>>
> > >>>> I think it depends on what your txn rate is; let's measure that
> > >>>> first, I guess.
> > >>>>
> > >>>>
> > >>>> -rgs
> > >>>>
> > >>
> > >>
> >
> >