zookeeper-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-3240) Close socket on Learner shutdown to avoid dangling socket
Date Wed, 30 Jan 2019 15:45:00 GMT

[ https://issues.apache.org/jira/browse/ZOOKEEPER-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756244#comment-16756244 ]

Hudson commented on ZOOKEEPER-3240:
-----------------------------------

FAILURE: Integrated in Jenkins build ZooKeeper-trunk #374 (See [https://builds.apache.org/job/ZooKeeper-trunk/374/])
Revert "ZOOKEEPER-3240: Close socket on Learner shutdown to avoid (andor: rev bcbf64884f2ee3e8a150b0b3c20a8fa03a05162e)
* (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Learner.java
* (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Observer.java
* (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Follower.java


> Close socket on Learner shutdown to avoid dangling socket
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-3240
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3240
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.6.0
>            Reporter: Brian Nixon
>            Assignee: Brian Nixon
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.6.0, 3.5.5
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> There was a Learner that had two connections to the Leader after that Learner hit an unexpected exception while flushing a txn to disk, which shut down the previous follower instance and restarted a new one.
>  
> {quote}2018-10-26 02:31:35,568 ERROR [SyncThread:3:ZooKeeperCriticalThread@48] - Severe unrecoverable error, from thread : SyncThread:3
> java.io.IOException: Input/output error
>         at java.base/sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>         at java.base/sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:72)
>         at java.base/sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:395)
>         at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:457)
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:548)
>         at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:769)
>         at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:246)
>         at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:172)
> 2018-10-26 02:31:35,568 INFO  [SyncThread:3:ZooKeeperServerListenerImpl@42] - Thread SyncThread:3 exits, error code 1
> 2018-10-26 02:31:35,568 INFO [SyncThread:3:SyncRequestProcessor@234] - SyncRequestProcessor exited!{quote}
>  
> The previous socket is supposed to be closed, but this does not appear to be done anywhere in the code. This leaves the socket open with no one reading from it, which caused the queue to fill up and the sender to block.
>  
> Since the LearnerHandler did not shut down gracefully, the learner queue size kept growing and so did the JVM heap on the leader, adding pressure to the GC and causing high GC time and latency in the quorum.
>  
> The simple fix is to close the socket gracefully when the Learner shuts down.
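
The failure mode described in the issue (a socket left open with no one reading from it, so the sender eventually blocks) can be reproduced in isolation. The following is a minimal, hypothetical Java sketch, not ZooKeeper code; the names `DanglingPeerDemo` and `fillUntilStalled` are illustrative only. It uses a non-blocking channel so the demo terminates: the write returns 0 once the kernel buffers are full, which is the point where a sender thread using blocking I/O would instead park, assumed here to be the analogue of the blocked sender on the leader.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical demo, not ZooKeeper code: a connected peer that never reads.
class DanglingPeerDemo {

    // Writes into a loopback connection whose accepting side never reads and
    // returns how many bytes the OS buffered before the write stalled.
    static long fillUntilStalled() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel sender = SocketChannel.open(server.getLocalAddress());
                 SocketChannel idlePeer = server.accept()) { // accepted, but never read from
                sender.configureBlocking(false);
                ByteBuffer chunk = ByteBuffer.allocate(64 * 1024);
                long total = 0;
                while (true) {
                    chunk.clear();
                    int written = sender.write(chunk);
                    if (written == 0) {
                        // Send and receive buffers are full. A blocking writer
                        // would now be stuck here until the peer reads or closes.
                        return total;
                    }
                    total += written;
                }
            }
        }
    }
}
```

The number returned by `fillUntilStalled()` depends on the operating system's socket buffer sizes; the point is only that it is finite, after which nothing more can be sent until the idle peer reads or the connection is closed.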

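A minimal sketch of the proposed fix, assuming a learner-like class that owns its `java.net.Socket` to the leader. `LearnerSketch`, its fields, and its methods are hypothetical and are not the actual change to `Learner`, `Follower`, or `Observer`; the sketch shows only that shutdown closes the socket, so the leader side should read end-of-stream and can presumably tear down its handler instead of queueing packets for a peer that will never read them.

```java
import java.io.IOException;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not the actual ZooKeeper Learner: a learner-like class
// that owns its connection to the leader and closes it on shutdown.
class LearnerSketch {
    private final Socket sock; // assumed: the learner's connection to the leader
    private final AtomicBoolean shutdownCalled = new AtomicBoolean(false);

    LearnerSketch(Socket sock) {
        this.sock = sock;
    }

    // Idempotent: only the first caller performs the shutdown work.
    void shutdown() {
        if (!shutdownCalled.compareAndSet(false, true)) {
            return;
        }
        // Other shutdown work is omitted in this sketch.
        closeSocket();
    }

    private void closeSocket() {
        try {
            sock.close(); // the leader side sees end-of-stream instead of a silent peer
        } catch (IOException e) {
            // Best effort: report and keep shutting down.
            System.err.println("Ignoring error while closing learner socket: " + e);
        }
    }

    boolean isSocketClosed() {
        return sock.isClosed();
    }
}
```

Making `shutdown()` idempotent with `AtomicBoolean.compareAndSet` is a design choice of this sketch: shutdown may be reached from more than one path (for example, a critical error in a sync thread and a normal exit), and only the first call should close the socket.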


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
