zookeeper-user mailing list archives

From Andor Molnar <an...@cloudera.com>
Subject Re: Zookeeper -java.net.SocketException: Socket closed
Date Thu, 25 Jan 2018 11:39:08 GMT
No, this is not the bug I was thinking of.

It looks like the network connection is poor between the leader and the
follower whose log was attached. Do you have any other network monitoring
tools in place, or do you see any network-related error messages in your
kernel logs?
The follower lost its connection to the leader:
2018-01-23 07:40:21,709 [myid:3] - WARN
[SyncThread:3:SendAckRequestProcessor@64] - Closing connection to leader,
exception during packet send

...and it took ages to recover: about 944 seconds (the 944020 in the log
line below is in milliseconds):
2018-01-23 07:56:05,742 [myid:3] - INFO
[QuorumPeer[myid=3]/XX.XX.XX:2181:Follower@63] - FOLLOWING - LEADER
ELECTION TOOK - 944020
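
As a rough cross-check of that figure, the gap between the two log
timestamps above can be computed directly. This is only a sketch: the helper
name elapsed_ms is made up for illustration, and it assumes both lines were
stamped by the same host clock.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the log lines above; %f accepts the 3-digit millis.
FMT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_ms(start: str, end: str) -> int:
    """Whole milliseconds between two log timestamps (hypothetical helper)."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta // timedelta(milliseconds=1)


lost_leader = "2018-01-23 07:40:21,709"  # "Closing connection to leader"
following = "2018-01-23 07:56:05,742"    # "FOLLOWING - LEADER ELECTION TOOK"
reported_ms = 944020                     # value printed by the follower

gap_ms = elapsed_ms(lost_leader, following)
print(gap_ms, gap_ms - reported_ms)
```

The computed gap is 944033 ms, within 13 ms of the reported 944020, which
suggests the follower spent practically the whole outage in leader election.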

Additionally, a disk write took far too long:
2018-01-23 07:40:21,706 [myid:3] - WARN  [SyncThread:3:FileTxnLog@334] -
fsync-ing the write ahead log in SyncThread:3 took 13638ms which will
adversely effect operation latency. See the ZooKeeper troubleshooting guide
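
To put the 13638 ms in context, it can be compared with the time budgets
implied by the configuration quoted further down (tickTime=2000, syncLimit=5,
initLimit=10). This is a sketch under the assumption that the follower's sync
window is tickTime * syncLimit, as the config comments describe; the variable
names are illustrative only.

```python
# Values copied from the zoo.cfg quoted later in this thread.
tick_time_ms = 2000
sync_limit = 5
init_limit = 10

# Value copied from the FileTxnLog warning above.
fsync_ms = 13638

# Assumed time budgets: number of ticks multiplied by the tick length.
sync_budget_ms = tick_time_ms * sync_limit  # 10000 ms
init_budget_ms = tick_time_ms * init_limit  # 20000 ms

fsync_exceeds_sync = fsync_ms > sync_budget_ms
print(sync_budget_ms, init_budget_ms, fsync_exceeds_sync)
```

Under that assumption, the single fsync stalled for longer than the 10000 ms
sync window, so the slow disk may have contributed to the follower dropping
out, independently of any network problem. That is a hypothesis to check, not
a diagnosis.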

I believe this is worth a closer look. I'm not a ZooKeeper expert, though,
so maybe somebody else can give you more insight.

Regards,
Andor


On Wed, Jan 24, 2018 at 7:47 PM, upendar devu <devulapalli8@gmail.com>
wrote:

> Thanks, Andor, for the reply.
>
> We are using ZooKeeper version 3.4.6 with 3 instances. Please see the
> configuration below; I believe we are using the default configuration. I
> have attached the zk log. The issue occurred at: First Occurrence:
> 01/23/2018 07:42:22, Last Occurrence: 01/23/2018 07:43:22
>
>
> The issue occurs 3 to 4 times a month and resolves itself within a few
> minutes, but it is really annoying our operations team. Please let me know
> if you need any additional details.
>
>
>
> # The number of milliseconds of each tick
> tickTime=2000
>
> # The number of ticks that the initial synchronization phase can take
> initLimit=10
>
> # The number of ticks that can pass between sending a request and getting
> # an acknowledgement
> syncLimit=5
>
> # The directory where the snapshot is stored.
> dataDir=/opt/zookeeper/current/data
>
> # The port at which the clients will connect
> clientPort=2181
>
> # This is the list of Zookeeper peers:
> server.1=zookeeper1:2888:3888
> server.2=zookeeper2:2888:3888
> server.3=zookeeper3:2888:3888
>
> # The interface IP address(es) on which ZooKeeper will listen
> clientPortAddress=<IP of zk>
>
> # The number of snapshots to retain in dataDir
> autopurge.snapRetainCount=3
>
> # Purge task interval in hours
> # Set to "0" to disable auto purge feature
> autopurge.purgeInterval=1
>
>
> On Wed, Jan 24, 2018 at 4:51 AM, Andor Molnar <andor@cloudera.com> wrote:
>
>> Hi Upendar,
>>
>> Thanks for reporting the issue.
>> I have a gut feeling about which existing bug you've run into, but would
>> you please share some more detail (ZK version, log context, config files,
>> etc.) so I can be confident?
>>
>> Thanks,
>> Andor
>>
>>
>> On Wed, Jan 17, 2018 at 4:36 PM, upendar devu <devulapalli8@gmail.com>
>> wrote:
>>
>> > We are getting the error below twice a month. Although it resolves
>> > itself, can anyone explain why this error occurs and what needs to be
>> > done to prevent it? Is this a common error that can be ignored?
>> >
>> > Please suggest.
>> >
>> >
>> > 2018-01-16 20:36:17,378 [myid:2] - WARN
>> > [RecvWorker:3:QuorumCnxManager$RecvWorker@780] - Connection broken for
>> > id 3, my id = 2, error = java.net.SocketException: Socket closed
>> >   at java.net.SocketInputStream.socketRead0(Native Method)
>> >   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>> >   at java.net.SocketInputStream.read(SocketInputStream.java:171)
>> >   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>> >   at java.net.SocketInputStream.read(SocketInputStream.java:224)
>> >   at java.io.DataInputStream.readInt(DataInputStream.java:387)
>> >   at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
>> >
>>
>
>
