hbase-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination
Date Thu, 22 Dec 2016 00:16:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768562#comment-15768562 ]

Hudson commented on HBASE-17341:
--------------------------------

SUCCESS: Integrated in Jenkins build HBase-1.1-JDK7 #1829 (See [https://builds.apache.org/job/HBase-1.1-JDK7/1829/])
HBASE-17341 Add a timeout during replication endpoint termination (apurtell: rev 1999c15a9adf774c39478d181accd6a15bdf29ff)
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java


> Add a timeout during replication endpoint termination
> -----------------------------------------------------
>
>                 Key: HBASE-17341
>                 URL: https://issues.apache.org/jira/browse/HBASE-17341
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4
>            Reporter: Vincent Poon
>            Assignee: Vincent Poon
>            Priority: Critical
>             Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8
>
>         Attachments: HBASE-17341.branch-1.1.v1.patch, HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, HBASE-17341.master.v2.patch
>
>
> In ReplicationSource#terminate(), a Future is obtained from ReplicationEndpoint#stop(). Future.get() is then called on it, but that call can hang indefinitely if something goes wrong in the endpoint's stop().
> Hanging there has serious implications, because the blocked thread could be the ZK event thread (e.g. a watcher calls ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> blocked). In that case no other events in the ZK event queue get processed, so for HBase other ZK watches such as replication watch notifications, snapshot watch notifications, and even RegionServer shutdown are all blocked.
> The short-term fix addressed here is simply to add a timeout to Future.get(). However, the severity of the consequences suggests that a broader refactoring of ZKWatcher usage in HBase may be in order, to protect against situations like this.
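Why one stuck handler blocks everything else can be illustrated with a single-threaded executor standing in for the ZK event thread. This is a minimal, self-contained sketch, not HBase or ZooKeeper code: the executor, the event labels ("removePeer", "snapshotWatch", "regionServerShutdown") and the never-completing future are all hypothetical stand-ins. The first event waits on a Future with an unbounded get(), so the events queued behind it never run.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class EventThreadBlockingDemo {

    // Submits three "events" to one thread and returns the labels of the
    // events that ran within windowMillis. The labels are illustrative only.
    static List<String> runEvents(long windowMillis) {
        // Hypothetical stand-in for the single ZooKeeper event thread.
        ExecutorService eventThread = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "zk-event-thread-sim");
            t.setDaemon(true); // do not keep the JVM alive if the thread stays stuck
            return t;
        });
        List<String> processed = Collections.synchronizedList(new ArrayList<>());
        CompletableFuture<Void> stopFuture = new CompletableFuture<>(); // never completed

        // Event 1: mimics removePeer() -> terminate() -> unbounded Future.get().
        eventThread.submit(() -> {
            processed.add("removePeer-started");
            stopFuture.get(); // waits forever because nothing completes stopFuture
            processed.add("removePeer-finished");
            return null;
        });
        // Events 2 and 3 are queued behind event 1 on the same thread.
        eventThread.submit(() -> { processed.add("snapshotWatch"); });
        eventThread.submit(() -> { processed.add("regionServerShutdown"); });

        try {
            Thread.sleep(windowMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        List<String> snapshot = new ArrayList<>(processed);
        eventThread.shutdownNow(); // interrupts the blocked get() and discards queued events
        return snapshot;
    }

    public static void main(String[] args) {
        // Expected to print [removePeer-started]: the later events never ran.
        System.out.println(runEvents(200));
    }
}
```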

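The short-term fix, bounding the wait with Future.get(long, TimeUnit), can be sketched as below. This is an illustration under stated assumptions, not the code from the HBASE-17341 patch: the Endpoint interface and the terminate(...) method are hypothetical stand-ins for ReplicationEndpoint#stop() and ReplicationSource#terminate(), and the timeout value is passed in by the caller rather than read from any real HBase configuration key.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedTerminateSketch {

    /** Hypothetical stand-in for a replication endpoint whose stop() may never finish. */
    interface Endpoint {
        Future<?> stop();
    }

    /**
     * Waits at most timeoutMillis for the endpoint to stop.
     * Returns true if it stopped in time, false if the wait timed out or failed.
     */
    static boolean terminate(Endpoint endpoint, long timeoutMillis) {
        Future<?> stopFuture = endpoint.stop();
        try {
            stopFuture.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Stop waiting so the calling thread (possibly the ZK event thread) is freed.
            System.err.println("Endpoint did not stop within " + timeoutMillis + " ms; continuing");
            return false;
        } catch (ExecutionException e) {
            System.err.println("Endpoint stop failed: " + e.getCause());
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        Endpoint hung = () -> new CompletableFuture<Void>(); // never completes
        Endpoint healthy = () -> CompletableFuture.completedFuture(null);
        System.out.println(terminate(hung, 100));    // false, after roughly 100 ms
        System.out.println(terminate(healthy, 100)); // true
    }
}
```

Note that the timeout only frees the waiting thread; whatever work the endpoint's stop() was doing may still be running in the background, which is why the description treats this as a short-term fix.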


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
