zookeeper-user mailing list archives

From rammohan ganapavarapu <rammohanga...@gmail.com>
Subject Re: Observer went down with Read timed out exception
Date Tue, 03 Jul 2018 18:13:08 GMT
Andor,

The ZooKeeper version I use is 3.4.5-1392090, built on 09/30/2012 17:52 GMT.
No auth or encryption is configured.
None of my network graphs show any dip or unusual pattern, which is why I
think there may not be any network issue. I have those nodes in the cloud, so
I am checking with the cloud provider to see if there was any network issue
between regions.

Thanks,
Ram


On Tue, Jul 3, 2018 at 6:29 AM Andor Molnar <andor@cloudera.com.invalid>
wrote:

> Hi Rammohan,
>
> Would you please elaborate on the details of your cluster setup?
> Which ZooKeeper version do you use?
> Do you use authentication / encryption?
> Would you please attach config files and log files of other nodes like
> leader and followers?
>
> How did you make sure that there was no network problem at the time the
> issue happened?
> Would you please attach graphs / diagrams on the network traffic including
> latency and bandwidth usage between the affected data centers?
>
> Regards,
> Andor
>
>
>
>
> On Tue, Jul 3, 2018 at 2:56 PM, rammohan ganapavarapu <
> rammohanganap@gmail.com> wrote:
>
> > Yes, I am sure there are no network issues. If the leader were busy in GC,
> > followers in the same DC would have been shut down as well, right? But that
> > wasn't the case.
> >
> > On Tue, Jul 3, 2018, 1:56 AM Norbert Kalmar <nkalmar@cloudera.com.invalid>
> > wrote:
> >
> > > Hi Ram,
> > >
> > > Are you sure there was no network error? To me, this looks like it could
> > > be due to failed heartbeats (as shutdown was called after the timeout).
> > >
> > > It is also possible the leader was busy (maybe a garbage collection
> > > pause?), especially if you store big(ish) chunks of data in ZooKeeper.
> > > (There is actually a plan to integrate JVMPauseMonitor into ZooKeeper for
> > > this reason.)
> > >
> > > Regards,
> > > Norbert
> > >
> > > On Mon, Jul 2, 2018 at 9:13 PM rammohan ganapavarapu <
> > > rammohanganap@gmail.com> wrote:
> > >
> > > > All,
> > > >
> > > > I have a multi-data-center ldap cluster setup in which the other data
> > > > center runs only observers. All of a sudden, all the observer threads
> > > > went down with the following message. Any idea why they went down? We
> > > > don't see any network-related issues between the data centers.
> > > >
> > > >
> > > > 2018-06-29 05:32:59,036 [myid:222] - WARN [QuorumPeer[myid=222]/0:0:0:0:0:0:0:0:2181:Observer@79] - Exception when observing the leader
> > > > java.net.SocketTimeoutException: Read timed out
> > > > at java.net.SocketInputStream.socketRead0(Native Method)
> > > > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> > > > at java.net.SocketInputStream.read(SocketInputStream.java:170)
> > > > at java.net.SocketInputStream.read(SocketInputStream.java:141)
> > > > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> > > > at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
> > > > at java.io.DataInputStream.readInt(DataInputStream.java:387)
> > > > at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> > > > at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> > > > at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> > > > at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
> > > > at org.apache.zookeeper.server.quorum.Observer.observeLeader(Observer.java:75)
> > > > at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:727)
> > > > 2018-06-29 05:32:59,244 [myid:222] - INFO [QuorumPeer[myid=222]/0:0:0:0:0:0:0:0:2181:Observer@137] - shutdown called
> > > > java.lang.Exception: shutdown Observer
> > > > at org.apache.zookeeper.server.quorum.Observer.shutdown(Observer.java:137)
> > > > at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:731)
> > > >
> > > >
> > > > Thanks,
> > > > Ram
> > > >
> > >
> >
>
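
For anyone reading this thread later: the "Read timed out" in the trace is thrown by the observer's socket read in Learner.readPacket. To my understanding of the 3.4.x code, a learner sets its socket read timeout to tickTime * initLimit while connecting to the leader and to tickTime * syncLimit once it has synced, so a leader pause or a cross-region stall longer than that window would look exactly like this. The sketch below only does that arithmetic from zoo.cfg-style text; the function names and the sample values are hypothetical, not taken from the cluster discussed here.

```python
def parse_zoo_cfg(text):
    """Parse key=value lines from zoo.cfg-style text, skipping comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def learner_read_timeouts_ms(settings):
    """Compute the learner read timeouts, assuming the 3.4.x behaviour described above.

    Raises KeyError if tickTime, initLimit or syncLimit is missing, rather than
    guessing a default.
    """
    tick = int(settings["tickTime"])
    return {
        "initial_sync_ms": tick * int(settings["initLimit"]),
        "steady_state_ms": tick * int(settings["syncLimit"]),
    }


# Hypothetical sample values, NOT the configuration of the cluster in this thread.
SAMPLE_CFG = """
# illustrative only
tickTime=2000
initLimit=10
syncLimit=5
"""

timeouts = learner_read_timeouts_ms(parse_zoo_cfg(SAMPLE_CFG))
print(timeouts)
```

With values like these, an observer that hears nothing from the leader for about 10 seconds after syncing would time out its read and shut down, which is worth comparing against GC pause lengths on the leader and latency between the regions.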
