hbase-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18058) Zookeeper retry sleep time should have an upper limit
Date Fri, 19 May 2017 20:29:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017976#comment-16017976 ]

Hudson commented on HBASE-18058:

FAILURE: Integrated in Jenkins build HBase-1.4 #739 (See [https://builds.apache.org/job/HBase-1.4/739/])
HBASE-18058 Zookeeper retry sleep time should have an upper limit (Allan (tedyu: rev 300c5388f2358418faff53558967e00e616c8e1a)
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
* (edit) hbase-common/src/main/resources/hbase-default.xml

> Zookeeper retry sleep time should have an upper limit
> -----------------------------------------------------
>                 Key: HBASE-18058
>                 URL: https://issues.apache.org/jira/browse/HBASE-18058
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 1.4.0
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>             Fix For: 2.0.0, 1.4.0
>         Attachments: HBASE-18058-branch-1.patch, HBASE-18058-branch-1.v2.patch, HBASE-18058-branch-1.v3.patch,
>                      HBASE-18058.patch, HBASE-18058.v2.patch
>
> Currently, in {{RecoverableZooKeeper}}, the retry backoff sleep time grows exponentially, but
> it has no upper limit. This directly leads to a very long recovery time after Zookeeper has
> been down for a while and then comes back.
> An example of the damage done by a high sleep time:
> If the disk of the server hosting Zookeeper is full, the Zookeeper quorum does not actually
> go down, but it rejects all write requests. So on the HBase side, new ZK write requests fail
> with exceptions and are retried. But the connection remains open, so the session does not time
> out. Once the disk-full situation has been resolved, the Zookeeper quorum works normally again.
> But because of the very high sleep time, some modules of the RegionServer/HMaster (for example,
> the balancer) will still sleep for a long time before resuming work.
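
To illustrate the problem described above, here is a minimal sketch of exponential backoff with an upper limit. It is not the code from the HBASE-18058 patch: the class name CappedBackoff, the method sleepTimeMs, and the values 1000 ms (base) and 60000 ms (cap) are assumptions chosen for illustration only. With these assumed values, an uncapped doubling backoff would reach about 12 days by the 20th retry, while the capped version never sleeps longer than the limit.

```java
/**
 * Hypothetical helper illustrating capped exponential backoff.
 * Not the actual RecoverableZooKeeper code; names and values are assumptions.
 */
final class CappedBackoff {
    private final long baseSleepMs; // sleep before the first retry
    private final long maxSleepMs;  // upper limit for any single sleep

    CappedBackoff(long baseSleepMs, long maxSleepMs) {
        if (baseSleepMs <= 0 || maxSleepMs < baseSleepMs) {
            throw new IllegalArgumentException("need 0 < baseSleepMs <= maxSleepMs");
        }
        this.baseSleepMs = baseSleepMs;
        this.maxSleepMs = maxSleepMs;
    }

    /** Sleep time for the given 0-based retry count: base * 2^retries, capped. */
    long sleepTimeMs(int retries) {
        if (retries < 0) {
            throw new IllegalArgumentException("retries must be >= 0");
        }
        if (retries >= 63) {
            return maxSleepMs; // 1L << 63 would overflow; far beyond any sane cap
        }
        long factor = 1L << retries;
        // Compare by division so base * factor is only computed when it fits under the cap.
        if (baseSleepMs > maxSleepMs / factor) {
            return maxSleepMs;
        }
        return baseSleepMs * factor;
    }

    public static void main(String[] args) {
        CappedBackoff backoff = new CappedBackoff(1000L, 60000L); // assumed values
        for (int retries = 0; retries <= 8; retries++) {
            System.out.println("retry " + retries + ": sleep " + backoff.sleepTimeMs(retries) + " ms");
        }
    }
}
```

With the cap in place, a caller that has been retrying during a long outage waits at most the configured limit before its next attempt once the quorum accepts writes again.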

This message was sent by Atlassian JIRA
