zookeeper-dev mailing list archives

From "Brian Nixon (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ZOOKEEPER-3240) Close socket on Learner shutdown to avoid dangling socket
Date Thu, 10 Jan 2019 20:19:00 GMT
Brian Nixon created ZOOKEEPER-3240:

             Summary: Close socket on Learner shutdown to avoid dangling socket
                 Key: ZOOKEEPER-3240
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3240
             Project: ZooKeeper
          Issue Type: Improvement
          Components: server
    Affects Versions: 3.6.0
            Reporter: Brian Nixon

A Learner ended up with two connections to the Leader after it hit an unexpected exception
while flushing transactions to disk. That exception shuts down the previous follower instance
and starts a new one.
{quote}2018-10-26 02:31:35,568 ERROR [SyncThread:3:ZooKeeperCriticalThread@48] - Severe unrecoverable error, from thread : SyncThread:3
java.io.IOException: Input/output error
        at java.base/sun.nio.ch.FileDispatcherImpl.force0(Native Method)
        at java.base/sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:72)
        at java.base/sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:395)
        at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:457)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:548)
        at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:769)
        at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:246)
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:172)
2018-10-26 02:31:35,568 INFO  [SyncThread:3:ZooKeeperServerListenerImpl@42] - Thread SyncThread:3 exits, error code 1
2018-10-26 02:31:35,568 INFO [SyncThread:3:SyncRequestProcessor@234] - SyncRequestProcessor
{quote}
The shutdown is supposed to close the previous socket, but this does not appear to be done
anywhere in the code. The old socket is therefore left open with no one reading from it, which
caused the queue to fill up and the sender to block.
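The Leader can only notice that a connection is gone once the Learner's end is actually closed. The standalone sketch below illustrates this with plain `java.net` sockets on loopback; it is not ZooKeeper code, and the class and method names are hypothetical. Once one end calls `close()`, a blocking `read()` on the other end returns -1 (end of stream), which a Leader-side reader loop could use to tear down its handler. If the socket is instead left open, that read simply keeps waiting.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo (not ZooKeeper code): what the peer of a socket
// observes once the other side calls close().
class DanglingSocketDemo {

    /**
     * Opens a loopback connection, closes the "learner" end, and returns
     * what a blocking read on the "leader" end reports (-1 means EOF).
     */
    static int readAfterPeerClose() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket leaderListener = new ServerSocket(0, 1, loopback);
             Socket learnerSide = new Socket(loopback, leaderListener.getLocalPort());
             Socket leaderSide = leaderListener.accept()) {
            // Guard against hanging forever if EOF were never delivered.
            leaderSide.setSoTimeout(5_000);
            // The learner shuts down *and* closes its socket.
            learnerSide.close();
            InputStream in = leaderSide.getInputStream();
            // read() returns -1 once the peer's close is seen, so a reader
            // loop on the leader side can detect the disconnect and clean up.
            return in.read();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("leader read() after learner close: " + readAfterPeerClose());
    }
}
```

Without the `learnerSide.close()` call, the same `read()` would block until the timeout fires, which mirrors the dangling-socket situation described above.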
Because the LearnerHandler did not shut down gracefully, the learner queue size kept growing.
The JVM heap on the Leader grew with it, adding pressure on the GC and causing long GC times
and high latency in the quorum.
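Why the queue grows without bound can be seen with a deliberately simplified, single-threaded model. The sketch below is hypothetical and not ZooKeeper code: it assumes an unbounded in-memory send queue on the Leader side, a fixed-capacity socket send buffer, and a peer that drains some number of packets per tick (zero for a dangling socket that nobody reads).

```java
// Hypothetical single-threaded model (not ZooKeeper code) of a leader-side
// handler that queues packets for one learner connection.
class LearnerQueueModel {
    private long queuedPackets = 0;         // unbounded in-memory send queue
    private final long socketBufferCapacity; // packets the send buffer can absorb
    private long inSocketBuffer = 0;

    LearnerQueueModel(long socketBufferCapacity) {
        this.socketBufferCapacity = socketBufferCapacity;
    }

    /** One tick: enqueue new packets, let the peer drain, then flush what fits. */
    void tick(long newPackets, long peerReads) {
        queuedPackets += newPackets;
        // The peer reads from the socket (0 for a dangling socket).
        inSocketBuffer = Math.max(0, inSocketBuffer - peerReads);
        // The sender can only move packets into the socket while there is room.
        long moved = Math.min(queuedPackets, socketBufferCapacity - inSocketBuffer);
        queuedPackets -= moved;
        inSocketBuffer += moved;
    }

    long backlog() {
        return queuedPackets;
    }
}
```

For example, with a buffer capacity of 64 and 10 new packets per tick, a peer that never reads leaves a backlog of 1000 - 64 = 936 packets after 100 ticks, and the backlog keeps growing linearly from there. A peer that reads 10 packets per tick keeps the backlog at 0.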
The simple fix is to close the socket gracefully when the Learner shuts down.
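A minimal sketch of that fix follows, assuming a learner object that holds its Leader connection in a single socket field. `LearnerSketch`, `closeSocket()`, and `isConnected()` are hypothetical names for illustration, not the actual ZooKeeper `Learner` API; the point is only that `shutdown()` releases the socket instead of leaving it open.

```java
import java.io.IOException;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not the actual ZooKeeper Learner class.
class LearnerSketch {
    // Connection to the Leader (assumed to be a single socket).
    private final AtomicReference<Socket> sock;

    LearnerSketch(Socket sock) {
        this.sock = new AtomicReference<>(sock);
    }

    /** Stops this learner instance and closes its connection to the Leader. */
    void shutdown() {
        // ... stopping request processors and other cleanup elided ...
        closeSocket();
    }

    /** Idempotent close: only the first caller closes; failures are logged, not rethrown. */
    void closeSocket() {
        Socket s = sock.getAndSet(null);
        if (s == null) {
            return; // already closed by an earlier shutdown
        }
        try {
            s.close();
        } catch (IOException e) {
            System.err.println("Ignoring exception while closing learner socket: " + e);
        }
    }

    boolean isConnected() {
        return sock.get() != null;
    }
}
```

Making the close idempotent with `getAndSet(null)` is a defensive choice so that a second `shutdown()` call is harmless. Once the socket is closed, the Leader side sees end-of-stream and can shut its LearnerHandler down instead of queueing packets indefinitely.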

This message was sent by Atlassian JIRA
