zookeeper-user mailing list archives

From Michael Han <h...@cloudera.com>
Subject Re: shutdown Observer
Date Fri, 10 Mar 2017 05:30:42 GMT
It helps. The extreme case is a network partition, where packet loss is
100%. ZK relies on TCP for communication between quorum peers, so lost
packets are retransmitted by TCP. Unless your network is partitioned
forever, the system will move forward once the partition heals. There is
no need to worry about a packet being lost forever, because of the TCP
guarantee. In this case the timeout can be set to infinite (pass 0 to
setSoTimeout), so socket IO will block indefinitely until the partition
heals.

The socket timeout is really just there to give the ZK server an
opportunity to take action when we think we should bail out because of a
bad network condition rather than block indefinitely, as ZK needs to
satisfy some basic liveness guarantees.
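
To make the timeout behaviour concrete, here is a minimal sketch. It is
not ZooKeeper's code: the class SoTimeoutSketch and its two methods are
made up for illustration, and the tickTime and syncLimit values are
example numbers only. It computes the timeout as tickTime * syncLimit, as
described in this thread, and shows that a blocking read on a socket with
a non-zero SO_TIMEOUT gives up with SocketTimeoutException when the peer
stays silent. Passing 0 to setSoTimeout instead would make the same read
block until data arrives.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not part of ZooKeeper.
class SoTimeoutSketch {

    // Timeout as described in this thread: tickTime (ms) * syncLimit (ticks).
    static int readTimeoutMillis(int tickTimeMs, int syncLimit) {
        return tickTimeMs * syncLimit;
    }

    // Connects to a local listener that never writes anything, then tries
    // to read an int. Returns true if the read ended with
    // SocketTimeoutException, false if data unexpectedly arrived.
    static boolean readTimesOut(int soTimeoutMs) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket silentPeer = new ServerSocket(0, 1, loopback);
             Socket sock = new Socket(loopback, silentPeer.getLocalPort())) {
            // A value of 0 here would mean "no timeout": block indefinitely.
            sock.setSoTimeout(soTimeoutMs);
            DataInputStream in = new DataInputStream(sock.getInputStream());
            try {
                in.readInt();
                return false;
            } catch (SocketTimeoutException e) {
                // This is the point where a server can decide to bail out.
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Example values only: tickTime=2000 ms, syncLimit=5 ticks.
        System.out.println(readTimeoutMillis(2000, 5)); // prints 10000
        // Use a short timeout so the demo finishes quickly.
        System.out.println(readTimesOut(readTimeoutMillis(50, 4)));
    }
}
```

With a high-latency link between data centers, raising either factor
lengthens the window before the read gives up.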

On Thu, Mar 9, 2017 at 3:12 PM, Jai Bheemsen Rao Dhanwada <
jaibheemsen@gmail.com> wrote:

> If there is packet loss, does increasing the initLimit value help?
>
> ref: http://efod.se/blog/archive/2013/02/09/zookeeper-initlimit
>
> Any thoughts?
>
> On Thu, Mar 9, 2017 at 10:12 AM, Dan Benediktson <
> dbenediktson@twitter.com.invalid> wrote:
>
> > It's also likely you have a fair bit of packet loss between your
> > datacenters, unless you know you have a solid network between them.
> > If your observers are falling offline "randomly", packet loss is a
> > pretty likely culprit.
> >
> > On Thu, Mar 9, 2017 at 9:54 AM, Michael Han <hanm@cloudera.com> wrote:
> >
> > > The log indicates that your server socket on the observer timed out
> > > after syncing with the leader. It could simply be that the latency
> > > between your DCs exceeds the socket timeout configuration ZK uses.
> > > The timeout is calculated as tickTime * syncLimit, so you might want
> > > to tweak these values to fit the latency between your DCs.
> > >
> > > On Thu, Mar 9, 2017 at 9:00 AM, rammohan ganapavarapu <
> > > rammohanganap@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > We have a multi data-center zk cluster with all the followers in
> > > > one data-center and observers in the other data-centers. For some
> > > > reason the observers are going down with the following exception,
> > > > and I am not sure what the reason could be or how to avoid this
> > > > issue. Any thoughts?
> > > >
> > > > Ram
> > > >
> > > >
> > > >
> > > > 2017-03-09 09:00:18,305 - WARN [QuorumPeer[myid=41]/0:0:0:0:0:0:0:0:2181:Observer@79] - Exception when observing the leader
> > > > java.net.SocketTimeoutException: Read timed out
> > > >         at java.net.SocketInputStream.socketRead0(Native Method)
> > > >         at java.net.SocketInputStream.read(SocketInputStream.java:152)
> > > >         at java.net.SocketInputStream.read(SocketInputStream.java:122)
> > > >         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> > > >         at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> > > >         at java.io.DataInputStream.readInt(DataInputStream.java:387)
> > > >         at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> > > >         at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> > > >         at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> > > >         at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
> > > >         at org.apache.zookeeper.server.quorum.Observer.observeLeader(Observer.java:75)
> > > >         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:727)
> > > > 2017-03-09 09:00:18,306 - INFO [QuorumPeer[myid=41]/0:0:0:0:0:0:0:0:2181:Observer@137] - shutdown called
> > > > java.lang.Exception: shutdown Observer
> > > >         at org.apache.zookeeper.server.quorum.Observer.shutdown(Observer.java:137)
> > > >
> > >
> > >
> > >
> > > --
> > > Cheers
> > > Michael.
> > >
> >
>



-- 
Cheers
Michael.
