zookeeper-user mailing list archives

From Andor Molnar <an...@cloudera.com.INVALID>
Subject Re: Observer went down with Read timed out exception
Date Tue, 03 Jul 2018 13:29:32 GMT
Hi Rammohan,

Would you please elaborate on the details of your cluster setup?
Which ZooKeeper version do you use?
Do you use authentication / encryption?
Would you please attach config files and log files of other nodes like
leader and followers?

How did you make sure that there was no network problem at the time the
issue happened?
Would you please attach graphs / diagrams on the network traffic including
latency and bandwidth usage between the affected data centers?

Regards,
Andor
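A note on the configs requested above: the "Read timed out" in the quoted stack trace is raised when the observer receives nothing from the leader within its socket read timeout. As we understand the ZooKeeper 3.4.x learner code, that timeout is set to tickTime × syncLimit once the observer has synced with the leader, and the leader pings its learners roughly every half tick. Any leader-side or link-side stall longer than tickTime × syncLimit would therefore produce exactly this exception. Please verify the formula against the ZooKeeper version actually in use. The sketch below computes that bound from zoo.cfg-style text. parse_zoo_cfg and learner_read_timeout_ms are hypothetical helper names, not ZooKeeper APIs, and the config values are examples only.

```python
from typing import Dict


def parse_zoo_cfg(text: str) -> Dict[str, str]:
    """Parse zoo.cfg-style "key=value" lines, skipping blanks and # comments."""
    cfg: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values such as
        # "host:2888:3888:observer" are kept intact.
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without "="
            cfg[key.strip()] = value.strip()
    return cfg


def learner_read_timeout_ms(cfg: Dict[str, str]) -> int:
    """ASSUMED learner read timeout once synced: tickTime (ms) * syncLimit (ticks)."""
    return int(cfg["tickTime"]) * int(cfg["syncLimit"])


# Example values only; substitute the real zoo.cfg contents.
example = """
tickTime=2000
initLimit=10
syncLimit=5
"""
cfg = parse_zoo_cfg(example)
print(learner_read_timeout_ms(cfg))  # 10000
```

With these example values the bound is 10 seconds. A leader pause, or a stall on the inter-DC link, lasting longer than that would be enough to trip it.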




On Tue, Jul 3, 2018 at 2:56 PM, rammohan ganapavarapu <
rammohanganap@gmail.com> wrote:

> Yes, I am sure there are no network issues. If the leader were busy in GC,
> the followers in the same DC would have been shut down as well, right? But
> that wasn't the case.
>
> On Tue, Jul 3, 2018, 1:56 AM Norbert Kalmar <nkalmar@cloudera.com.invalid>
> wrote:
>
> > Hi Ram,
> >
> > Are you sure there were no network errors? To me, this looks like it could
> > be due to failed heartbeats (as shutdown was called after the timeout).
> >
> > It is also possible that the leader was busy (maybe a garbage collection
> > pause?), especially if you store big(ish) chunks of data in ZooKeeper.
> > (There is actually a plan to integrate JVMPauseMonitor into ZooKeeper for
> > this reason.)
> >
> > Regards,
> > Norbert
> >
> > On Mon, Jul 2, 2018 at 9:13 PM rammohan ganapavarapu <
> > rammohanganap@gmail.com> wrote:
> >
> > > All,
> > >
> > > I have a multi-data-center LDAP cluster setup in which the other data
> > > center has only observers. All of a sudden, all the observer threads went
> > > down with the following message. Any idea why they went down? We don't
> > > see any network-related issues between the data centers.
> > >
> > >
> > > 2018-06-29 05:32:59,036 [myid:222] - WARN [QuorumPeer[myid=222]/0:0:0:0:0:0:0:0:2181:Observer@79] - Exception when observing the leader
> > > java.net.SocketTimeoutException: Read timed out
> > > at java.net.SocketInputStream.socketRead0(Native Method)
> > > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> > > at java.net.SocketInputStream.read(SocketInputStream.java:170)
> > > at java.net.SocketInputStream.read(SocketInputStream.java:141)
> > > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> > > at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
> > > at java.io.DataInputStream.readInt(DataInputStream.java:387)
> > > at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> > > at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> > > at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> > > at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
> > > at org.apache.zookeeper.server.quorum.Observer.observeLeader(Observer.java:75)
> > > at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:727)
> > > 2018-06-29 05:32:59,244 [myid:222] - INFO [QuorumPeer[myid=222]/0:0:0:0:0:0:0:0:2181:Observer@137] - shutdown called
> > > java.lang.Exception: shutdown Observer
> > > at org.apache.zookeeper.server.quorum.Observer.shutdown(Observer.java:137)
> > > at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:731)
> > >
> > >
> > > Thanks,
> > > Ram
> > >
> >
>
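Norbert's GC hypothesis can be tested directly with a pause monitor, which is the idea behind the JVMPauseMonitor he mentions. A background thread repeatedly sleeps for a fixed interval and measures how late it wakes up. If the whole process was frozen (for example by a stop-the-world GC), the overshoot approximates the pause length. The following is a minimal Python sketch of that sleep-and-measure technique, not ZooKeeper or Hadoop code. detect_pauses, SimulatedClock, the 500 ms interval and the 1000 ms threshold are hypothetical choices. The clock and sleep functions are injectable so the demo can simulate a 12-second freeze without actually waiting.

```python
import time
from typing import Callable, List


def detect_pauses(
    iterations: int,
    sleep_ms: float = 500.0,       # hypothetical check interval
    threshold_ms: float = 1000.0,  # hypothetical reporting threshold
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[float]:
    """Repeatedly sleep sleep_ms and return every overshoot above threshold_ms.

    If the whole process was frozen while sleeping, the sleep returns late
    and the overshoot ("extra" time) approximates the pause length.
    """
    pauses: List[float] = []
    for _ in range(iterations):
        start = clock()
        sleep(sleep_ms / 1000.0)
        extra_ms = (clock() - start) * 1000.0 - sleep_ms
        if extra_ms > threshold_ms:
            pauses.append(extra_ms)
    return pauses


class SimulatedClock:
    """Fake clock/sleep pair that adds a scripted stall (ms) to each sleep call."""

    def __init__(self, stalls_ms: List[float]) -> None:
        self.now = 0.0
        self.stalls_ms = list(stalls_ms)

    def clock(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds + self.stalls_ms.pop(0) / 1000.0


# Demo: the second iteration simulates a 12-second freeze and is reported;
# the 50 ms of jitter in the third iteration stays below the threshold.
sim = SimulatedClock([0.0, 12000.0, 50.0])
demo_pauses = detect_pauses(3, sleep=sim.sleep, clock=sim.clock)
print([round(p) for p in demo_pauses])  # [12000]
```

For a JVM such as ZooKeeper's, the same loop would have to run as a thread inside that JVM (written in Java); the Python version only illustrates the measurement. Comparing any pauses seen on the leader with tickTime × syncLimit, and checking what the same-DC followers logged at the same moment, would help settle the point Ram raises about the followers staying up.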
