zookeeper-user mailing list archives

From upendar devu <devulapal...@gmail.com>
Subject Re: Zookeeper -java.net.SocketException: Socket closed
Date Thu, 25 Jan 2018 15:21:10 GMT
Regarding "a disk write has taken too long as well": I will check on this,
thanks for finding it. The ZK logs are really a bit difficult for me to
understand.
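
One way to check this is to compare the reported fsync time with the timeouts implied by the zoo.cfg quoted further down in this thread. The sketch below is only an illustration: it assumes that syncLimit x tickTime is the window a follower has to stay in sync with the leader, and the names (`fsync_durations_ms`, `sync_budget_ms`, `sample_log`) are made up for the example. The sample line is the warning quoted in this thread, joined back into one line.

```python
import re

# Values copied from the zoo.cfg quoted in this thread.
TICK_TIME_MS = 2000
INIT_LIMIT_TICKS = 10
SYNC_LIMIT_TICKS = 5

# Matches the FileTxnLog warning text as it appears in the quoted log.
FSYNC_RE = re.compile(r"fsync-ing the write ahead log in \S+ took (\d+)ms")


def fsync_durations_ms(lines):
    """Return the fsync duration (ms) from every matching warning line."""
    return [int(m.group(1)) for m in map(FSYNC_RE.search, lines) if m]


# Assumption: these products are the effective time budgets in milliseconds.
sync_budget_ms = SYNC_LIMIT_TICKS * TICK_TIME_MS
init_budget_ms = INIT_LIMIT_TICKS * TICK_TIME_MS

sample_log = [
    "2018-01-23 07:40:21,706 [myid:3] - WARN  [SyncThread:3:FileTxnLog@334] - "
    "fsync-ing the write ahead log in SyncThread:3 took 13638ms which will "
    "adversely effect operation latency. See the ZooKeeper troubleshooting guide",
]

over_budget = [d for d in fsync_durations_ms(sample_log) if d > sync_budget_ms]
print("sync budget (ms):", sync_budget_ms)
print("fsyncs longer than the sync budget (ms):", over_budget)
```

With the values from this thread, the single fsync (13638 ms) is longer than syncLimit x tickTime (10000 ms), so the slow disk write may be enough on its own to explain the dropped connection. To scan a real log, pass an open file object to `fsync_durations_ms` instead of `sample_log`.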

On Thu, Jan 25, 2018 at 10:19 AM, upendar devu <devulapalli8@gmail.com>
wrote:

> Thanks for sharing the analysis. The instances are running on EC2, and we
> have Kafka, ZK, Storm and ES instances as well, but we have not seen such
> errors in those components. If there were network latency, there should be
> socket errors in the other components too, as data is being processed every
> second.
>
> Let's hear from the ZooKeeper dev team; I hope they will respond.
>
> On Thu, Jan 25, 2018 at 6:39 AM, Andor Molnar <andor@cloudera.com> wrote:
>
>> No, this is not the bug I was thinking of.
>>
>> Looks like your network connection is poor between the leader and the
>> follower whose logs were attached. Do you have any other network
>> monitoring tools in place, or do you see any network-related error messages
>> in your kernel logs?
>> The follower lost the connection to the leader:
>> 2018-01-23 07:40:21,709 [myid:3] - WARN
>> [SyncThread:3:SendAckRequestProcessor@64] - Closing connection to leader,
>> exception during packet send
>>
>> ...and took ages to recover: 944 secs!!
>> 2018-01-23 07:56:05,742 [myid:3] - INFO
>> [QuorumPeer[myid=3]/XX.XX.XX:2181:Follower@63] - FOLLOWING - LEADER
>> ELECTION TOOK - 944020
>>
>> Additionally, a disk write has taken too long as well:
>> 2018-01-23 07:40:21,706 [myid:3] - WARN  [SyncThread:3:FileTxnLog@334] -
>> fsync-ing the write ahead log in SyncThread:3 took 13638ms which will
>> adversely effect operation latency. See the ZooKeeper troubleshooting
>> guide
>>
>> I believe this is worth a closer look. I'm not a ZooKeeper expert,
>> though, so maybe somebody else can give you more insight.
>>
>> Regards,
>> Andor
>>
>>
>> On Wed, Jan 24, 2018 at 7:47 PM, upendar devu <devulapalli8@gmail.com>
>> wrote:
>>
>> > Thanks Andor for the reply.
>> >
>> > We are using ZooKeeper version 3.4.6 with 3 instances. Please see the
>> > configuration below; I believe we are using the default configuration.
>> > I have attached the ZK log. The issue occurred at First Occurrence:
>> > 01/23/2018 07:42:22   Last Occurrence: 01/23/2018 07:43:22
>> >
>> >
>> > The issue occurs 3 to 4 times a month and resolves itself within a few
>> > minutes, but it is really annoying our operations team. Please let me
>> > know if you need any additional details.
>> >
>> >
>> >
>> > # The number of milliseconds of each tick
>> > tickTime=2000
>> >
>> > # The number of ticks that the initial synchronization phase can take
>> > initLimit=10
>> >
>> > # The number of ticks that can pass between sending a request and
>> > # getting an acknowledgement
>> > syncLimit=5
>> >
>> > # The directory where the snapshot is stored.
>> > dataDir=/opt/zookeeper/current/data
>> >
>> > # The port at which the clients will connect
>> > clientPort=2181
>> >
>> > # This is the list of Zookeeper peers:
>> > server.1=zookeeper1:2888:3888
>> > server.2=zookeeper2:2888:3888
>> > server.3=zookeeper3:2888:3888
>> >
>> > # The interface IP address(es) from which zookeeper will listen from
>> > clientPortAddress=<IP of zk>
>> >
>> > # The number of snapshots to retain in dataDir
>> > autopurge.snapRetainCount=3
>> >
>> > # Purge task interval in hours
>> > # Set to "0" to disable auto purge feature
>> > autopurge.purgeInterval=1
>> >
>> >
>> > On Wed, Jan 24, 2018 at 4:51 AM, Andor Molnar <andor@cloudera.com>
>> > wrote:
>> >
>> >> Hi Upendar,
>> >>
>> >> Thanks for reporting the issue.
>> >> I have a gut feeling about which existing bug you've run into, but
>> >> would you please share some more detail (version of ZK, log context,
>> >> config files, etc.) so we can be confident?
>> >>
>> >> Thanks,
>> >> Andor
>> >>
>> >>
>> >> On Wed, Jan 17, 2018 at 4:36 PM, upendar devu <devulapalli8@gmail.com>
>> >> wrote:
>> >>
>> >> > We are getting the error below twice a month. Although it resolves
>> >> > itself, can anyone explain why this error occurs and what needs to be
>> >> > done to prevent it? Is this a common error that can be ignored?
>> >> >
>> >> > Please suggest.
>> >> >
>> >> >
>> >> > 2018-01-16 20:36:17,378 [myid:2] - WARN
>> >> > [RecvWorker:3:QuorumCnxManager$RecvWorker@780] - Connection broken
>> >> > for id 3, my id = 2, error =
>> >> > java.net.SocketException: Socket closed
>> >> >   at java.net.SocketInputStream.socketRead0(Native Method)
>> >> >   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>> >> >   at java.net.SocketInputStream.read(SocketInputStream.java:171)
>> >> >   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>> >> >   at java.net.SocketInputStream.read(SocketInputStream.java:224)
>> >> >   at java.io.DataInputStream.readInt(DataInputStream.java:387)
>> >> >   at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
>> >> >
>> >>
>> >
>> >
>>
>
>
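
The two follower log lines quoted above (connection to the leader closed at 07:40:21,709, FOLLOWING logged at 07:56:05,742) can be cross-checked against the reported "LEADER ELECTION TOOK - 944020". The sketch below is only a sanity check on those numbers: it assumes the election figure is in milliseconds, and the helper and variable names are invented for the example.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the quoted ZooKeeper log lines.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(text):
    """Parse a timestamp such as '2018-01-23 07:40:21,709'."""
    return datetime.strptime(text, LOG_TS_FORMAT)


# Timestamps copied from the two follower log lines in this thread.
connection_closed = parse_ts("2018-01-23 07:40:21,709")
following_again = parse_ts("2018-01-23 07:56:05,742")

# Wall-clock gap between the two log lines, in whole milliseconds.
gap_ms = (following_again - connection_closed) // timedelta(milliseconds=1)

# Figure from "LEADER ELECTION TOOK - 944020" (assumed to be milliseconds).
reported_election_ms = 944020

print("gap between log lines (ms):", gap_ms)
print("reported election time (ms):", reported_election_ms)
print("difference (ms):", gap_ms - reported_election_ms)
```

The gap between the two log lines and the reported election time agree to within a few milliseconds, which suggests the follower spent essentially the whole outage (about 15 minutes 44 seconds) in leader election.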
