zookeeper-user mailing list archives

From upendar devu <devulapal...@gmail.com>
Subject Re: Zookeeper -java.net.SocketException: Socket closed
Date Thu, 25 Jan 2018 15:19:01 GMT
Thanks for sharing the analysis. The instances are running on EC2, and we
have Kafka, ZK, Storm and ES instances as well, but we have not seen such
errors in those components. If there were network latency, there should be
socket errors in the other components too, as data is being processed every
second.

Let's hear from the ZooKeeper dev team; I hope they will respond.

On Thu, Jan 25, 2018 at 6:39 AM, Andor Molnar <andor@cloudera.com> wrote:

> No, this is not the bug I was thinking of.
>
> Looks like your network connection is poor between the leader and the
> follower whose logs were attached. Do you have any other network
> monitoring tools in place, or do you see any network-related error messages
> in your kernel logs?
> Follower lost the connection to the leader:
> 2018-01-23 07:40:21,709 [myid:3] - WARN
> [SyncThread:3:SendAckRequestProcessor@64] - Closing connection to leader,
> exception during packet send
>
> ...and took ages to recover: 944 secs!!
> 2018-01-23 07:56:05,742 [myid:3] - INFO
> [QuorumPeer[myid=3]/XX.XX.XX:2181:Follower@63] - FOLLOWING - LEADER
> ELECTION TOOK - 944020
>
> Additionally, a disk write has taken too long as well:
> 2018-01-23 07:40:21,706 [myid:3] - WARN  [SyncThread:3:FileTxnLog@334] -
> fsync-ing the write ahead log in SyncThread:3 took 13638ms which will
> adversely effect operation latency. See the ZooKeeper troubleshooting guide
>
> I believe this is worth a closer look. I'm not a ZooKeeper expert, though,
> so maybe somebody else can give you more insight.
>
> Regards,
> Andor
>
>
> On Wed, Jan 24, 2018 at 7:47 PM, upendar devu <devulapalli8@gmail.com>
> wrote:
>
> > Thanks Andor for the reply.
> >
> > We are using ZooKeeper version 3.4.6 with 3 instances. Please see the
> > configuration below; I believe we are using the default configuration.
> > The zk log is attached. The issue occurred at:
> > First Occurrence: 01/23/2018 07:42:22   Last Occurrence: 01/23/2018 07:43:22
> >
> >
> > The issue occurs 3 to 4 times a month and resolves itself within a few
> > minutes, but it is really annoying our operations team. Please let me
> > know if you need any additional details.
> >
> >
> >
> > # The number of milliseconds of each tick
> > tickTime=2000
> >
> > # The number of ticks that the initial synchronization phase can take
> > initLimit=10
> >
> > # The number of ticks that can pass between sending a request and getting
> > # an acknowledgement
> > syncLimit=5
> >
> > # The directory where the snapshot is stored.
> > dataDir=/opt/zookeeper/current/data
> >
> > # The port at which the clients will connect
> > clientPort=2181
> >
> > # This is the list of Zookeeper peers:
> > server.1=zookeeper1:2888:3888
> > server.2=zookeeper2:2888:3888
> > server.3=zookeeper3:2888:3888
> >
> > # The interface IP address(es) on which ZooKeeper will listen
> > clientPortAddress=<IP of zk>
> >
> > # The number of snapshots to retain in dataDir
> > autopurge.snapRetainCount=3
> >
> > # Purge task interval in hours
> > # Set to "0" to disable auto purge feature
> > autopurge.purgeInterval=1
> >
> >
> > On Wed, Jan 24, 2018 at 4:51 AM, Andor Molnar <andor@cloudera.com>
> > wrote:
> >
> >> Hi Upendar,
> >>
> >> Thanks for reporting the issue.
> >> I have a gut feeling about which existing bug you've run into, but
> >> would you please share some more detail (version of ZK, log context,
> >> config files, etc.) so we can be confident?
> >>
> >> Thanks,
> >> Andor
> >>
> >>
> >> On Wed, Jan 17, 2018 at 4:36 PM, upendar devu <devulapalli8@gmail.com>
> >> wrote:
> >>
> >> > We are getting the error below twice a month. Although it resolves
> >> > itself, can anyone explain why this error occurs and what needs to be
> >> > done to prevent it? Is this a common error that can be ignored?
> >> >
> >> > Please suggest.
> >> >
> >> >
> >> > 2018-01-16 20:36:17,378 [myid:2] - WARN
> >> > [RecvWorker:3:QuorumCnxManager$RecvWorker@780] - Connection broken for
> >> > id 3, my id = 2, error = java.net.SocketException: Socket closed
> >> >   at java.net.SocketInputStream.socketRead0(Native Method)
> >> >   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> >> >   at java.net.SocketInputStream.read(SocketInputStream.java:171)
> >> >   at java.net.SocketInputStream.read(SocketInputStream.java:141)
> >> >   at java.net.SocketInputStream.read(SocketInputStream.java:224)
> >> >   at java.io.DataInputStream.readInt(DataInputStream.java:387)
> >> >   at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
> >> >
> >>
> >
> >
>
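
The numbers quoted in this thread can be lined up against each other. The following is only a rough sketch, not part of the original messages: it multiplies the tickTime, initLimit and syncLimit values from the posted configuration into millisecond windows, then pulls the fsync and election durations out of the two log lines Andor quoted. The helper names (window_ms, extract_ms) and the regular expressions are hypothetical, written for this illustration, and the comparison alone does not prove what caused the disconnect.

```python
import re

# Values copied from the configuration quoted in this thread.
TICK_TIME_MS = 2000
INIT_LIMIT_TICKS = 10
SYNC_LIMIT_TICKS = 5

# Log lines quoted in this thread, with the email line wraps joined.
LOG_LINES = [
    "2018-01-23 07:40:21,706 [myid:3] - WARN  [SyncThread:3:FileTxnLog@334] - "
    "fsync-ing the write ahead log in SyncThread:3 took 13638ms which will "
    "adversely effect operation latency. See the ZooKeeper troubleshooting guide",
    "2018-01-23 07:56:05,742 [myid:3] - INFO "
    "[QuorumPeer[myid=3]/XX.XX.XX:2181:Follower@63] - FOLLOWING - LEADER "
    "ELECTION TOOK - 944020",
]

# Hypothetical patterns matching only the two message shapes shown above.
FSYNC_RE = re.compile(r"fsync-ing the write ahead log in \S+ took (\d+)ms")
ELECTION_RE = re.compile(r"LEADER ELECTION TOOK - (\d+)")


def window_ms(ticks, tick_time_ms=TICK_TIME_MS):
    """Convert a limit expressed in ticks into milliseconds."""
    return ticks * tick_time_ms


def extract_ms(pattern, lines):
    """Return every duration (in ms) that the pattern captures in the lines."""
    durations = []
    for line in lines:
        match = pattern.search(line)
        if match:
            durations.append(int(match.group(1)))
    return durations


if __name__ == "__main__":
    sync_window = window_ms(SYNC_LIMIT_TICKS)
    init_window = window_ms(INIT_LIMIT_TICKS)
    print(f"syncLimit window: {sync_window} ms")
    print(f"initLimit window: {init_window} ms")

    for fsync_ms in extract_ms(FSYNC_RE, LOG_LINES):
        verdict = "exceeds" if fsync_ms > sync_window else "is within"
        print(f"fsync of {fsync_ms} ms {verdict} the syncLimit window")

    for election_ms in extract_ms(ELECTION_RE, LOG_LINES):
        print(f"leader election took {election_ms / 1000:.0f} s")
```

With the posted settings, the syncLimit window is 5 ticks of 2000 ms each, so the logged 13638 ms fsync is longer than that window. That is consistent with Andor's suggestion to look at disk latency as well as the network, but it is an observation about the numbers, not a confirmed root cause.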
