zookeeper-user mailing list archives

From Andor Molnar <an...@cloudera.com>
Subject Re: Zookeeper -java.net.SocketException: Socket closed
Date Thu, 25 Jan 2018 16:10:19 GMT
Use EBS drives and make sure you allocate enough IOPS for the load.
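
It also helps to keep the transaction log on its own disk so snapshots and
other I/O don't compete with fsyncs. A minimal sketch, assuming a dedicated
provisioned-IOPS (io1) EBS volume mounted at a path of your choosing (the
mount point below is hypothetical):

dataDir=/opt/zookeeper/current/data
dataLogDir=/zookeeper/txnlog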

Andor


On Thu, Jan 25, 2018 at 4:21 PM, upendar devu <devulapalli8@gmail.com>
wrote:

> "a disk write has taken too long as well": I will check on this, thanks
> for finding it. The ZK logs are a bit difficult for me to understand.
>
> On Thu, Jan 25, 2018 at 10:19 AM, upendar devu <devulapalli8@gmail.com>
> wrote:
>
> > Thanks for sharing the analysis. These are running on EC2, and we have
> > Kafka, ZK, Storm and ES instances as well, but we haven't seen such
> > errors in those components. If there were network latency, the other
> > components should be showing socket errors too, since data is being
> > processed every second.
> >
> > Let's hear from the ZooKeeper dev team; I hope they will respond.
> >
> > On Thu, Jan 25, 2018 at 6:39 AM, Andor Molnar <andor@cloudera.com> wrote:
> >
> >> No, this is not the bug I was thinking of.
> >>
> >> Looks like the network connection is poor between the leader and the
> >> follower whose logs were attached. Do you have any other network
> >> monitoring tools in place, or do you see any network-related error
> >> messages in your kernel logs?
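> >> If you don't have anything handy, a quick check is to poll the
> >> four-letter-word stats on each server and watch for mode flips and
> >> latency spikes (the hostname here is just the one from your config):
> >>
> >> echo stat | nc zookeeper3 2181
> >> echo mntr | nc zookeeper3 2181
> >>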
> >> Follower lost the connection to the leader:
> >> 2018-01-23 07:40:21,709 [myid:3] - WARN
> >> [SyncThread:3:SendAckRequestProcessor@64] - Closing connection to
> >> leader, exception during packet send
> >>
> >> ...and took ages to recover: 944 secs (the log value is in
> >> milliseconds), almost 16 minutes!
> >> 2018-01-23 07:56:05,742 [myid:3] - INFO
> >> [QuorumPeer[myid=3]/XX.XX.XX:2181:Follower@63] - FOLLOWING - LEADER
> >> ELECTION TOOK - 944020
> >>
> >> Additionally, a disk write has taken too long as well:
> >> 2018-01-23 07:40:21,706 [myid:3] - WARN  [SyncThread:3:FileTxnLog@334] -
> >> fsync-ing the write ahead log in SyncThread:3 took 13638ms which will
> >> adversely effect operation latency. See the ZooKeeper troubleshooting
> >> guide
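> >>
> >> (If I remember right, that warning fires when an fsync exceeds the
> >> zookeeper.fsync.warningthresholdms system property, 1000 ms by default.
> >> You could raise it, e.g. -Dzookeeper.fsync.warningthresholdms=5000, but
> >> that only silences the warning; the 13-second stall itself is the
> >> problem.)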
> >>
> >> I believe this is worth a closer look, but I'm not a ZooKeeper expert;
> >> maybe somebody else can give you more insight.
> >>
> >> Regards,
> >> Andor
> >>
> >>
> >> On Wed, Jan 24, 2018 at 7:47 PM, upendar devu <devulapalli8@gmail.com>
> >> wrote:
> >>
> >> > Thanks Andor for the reply.
> >> >
> >> > We are using ZooKeeper version 3.4.6 with 3 instances; please see the
> >> > configuration below. I believe we are using the default configuration.
> >> > I've attached the ZK log; the issue occurred at First Occurrence:
> >> > 01/23/2018 07:42:22, Last Occurrence: 01/23/2018 07:43:22.
> >> >
> >> >
> >> > The issue occurs 3 to 4 times a month and resolves itself within a few
> >> > minutes, but it is really annoying our operations team. Please let me
> >> > know if you need any additional details.
> >> >
> >> >
> >> >
> >> > # The number of milliseconds of each tick
> >> > tickTime=2000
> >> >
> >> > # The number of ticks that the initial synchronization phase can take
> >> > initLimit=10
> >> >
> >> > # The number of ticks that can pass between sending a request and
> >> > # getting an acknowledgement
> >> > syncLimit=5
> >> >
> >> > # The directory where the snapshot is stored.
> >> > dataDir=/opt/zookeeper/current/data
> >> >
> >> > # The port at which the clients will connect
> >> > clientPort=2181
> >> >
> >> > # This is the list of Zookeeper peers:
> >> > server.1=zookeeper1:2888:3888
> >> > server.2=zookeeper2:2888:3888
> >> > server.3=zookeeper3:2888:3888
> >> >
> >> > # The interface IP address(es) on which ZooKeeper will listen
> >> > clientPortAddress=<IP of zk>
> >> >
> >> > # The number of snapshots to retain in dataDir
> >> > autopurge.snapRetainCount=3
> >> >
> >> > # Purge task interval in hours
> >> > # Set to "0" to disable auto purge feature
> >> > autopurge.purgeInterval=1
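> >> >
> >> > (For reference, the tick-based limits above translate to wall-clock
> >> > time as limit * tickTime: initLimit allows 10 * 2000 ms = 20 s for the
> >> > initial sync, and syncLimit allows 5 * 2000 ms = 10 s for a follower
> >> > to stay in sync with the leader.)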
> >> >
> >> >
> >> > On Wed, Jan 24, 2018 at 4:51 AM, Andor Molnar <andor@cloudera.com> wrote:
> >> >
> >> >> Hi Upendar,
> >> >>
> >> >> Thanks for reporting the issue.
> >> >> I have a gut feeling about which existing bug you've run into, but
> >> >> would you please share some more detail (version of ZK, log context,
> >> >> config files, etc.) so I can be confident?
> >> >>
> >> >> Thanks,
> >> >> Andor
> >> >>
> >> >>
> >> >> On Wed, Jan 17, 2018 at 4:36 PM, upendar devu <devulapalli8@gmail.com> wrote:
> >> >>
> >> >> > We are getting the below error twice a month. Though it resolves
> >> >> > itself, can anyone explain why this error occurs and what needs to
> >> >> > be done to prevent it? Is this a common error that can be ignored?
> >> >> >
> >> >> > Please suggest.
> >> >> >
> >> >> >
> >> >> > 2018-01-16 20:36:17,378 [myid:2] - WARN
> >> >> > [RecvWorker:3:QuorumCnxManager$RecvWorker@780] - Connection broken
> >> >> > for id 3, my id = 2, error = java.net.SocketException: Socket closed
> >> >> >     at java.net.SocketInputStream.socketRead0(Native Method)
> >> >> >     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> >> >> >     at java.net.SocketInputStream.read(SocketInputStream.java:171)
> >> >> >     at java.net.SocketInputStream.read(SocketInputStream.java:141)
> >> >> >     at java.net.SocketInputStream.read(SocketInputStream.java:224)
> >> >> >     at java.io.DataInputStream.readInt(DataInputStream.java:387)
> >> >> >     at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
> >> >> >
> >> >>
> >> >
> >> >
> >>
> >
> >
>
