hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-18058) Zookeeper retry sleep time should have an upper limit
Date Thu, 18 May 2017 22:51:04 GMT

     [ https://issues.apache.org/jira/browse/HBASE-18058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Ted Yu updated HBASE-18058:
    Summary: Zookeeper retry sleep time should have an upper limit  (was: Zookeeper retry sleep time should have a up limit)

> Zookeeper retry sleep time should have an upper limit
> -----------------------------------------------------
>                 Key: HBASE-18058
>                 URL: https://issues.apache.org/jira/browse/HBASE-18058
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 1.4.0
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>         Attachments: HBASE-18058-branch-1.patch, HBASE-18058-branch-1.v2.patch, HBASE-18058.patch,
> Currently, in {{RecoverableZooKeeper}}, the retry backoff sleep time grows exponentially, but
> it has no upper limit. This directly leads to a very long recovery time when ZooKeeper
> goes down for a while and then comes back.
> An example of the damage caused by an overly long sleep time:
> If the disk on the server hosting ZooKeeper fills up, the ZooKeeper quorum does not actually go
> down, but it rejects all write requests. On the HBase side, new ZooKeeper write requests fail with
> an exception and are retried. Because the connection stays open, the session does not time out.
> Once the disk-full condition is resolved, the quorum works normally again, but because the
> sleep time has grown so large, some RegionServer/HMaster modules (for example, the balancer)
> keep sleeping for a long time before they resume work.
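To illustrate why the missing upper limit matters in the disk-full scenario, here is a minimal sketch in Java that compares an unbounded exponential backoff with one capped at a maximum sleep, and measures how long after ZooKeeper becomes writable again the next retry actually runs. The class and method names, the 1 s base sleep, the 60 s cap, and the 10-minute outage are all hypothetical assumptions chosen for illustration; this is not the code in {{RecoverableZooKeeper}} or in the attached patches.

```java
import java.util.function.IntToLongFunction;

/**
 * Hypothetical sketch, not HBase code: compares an unbounded exponential
 * retry backoff with one capped at an upper limit, and measures how long
 * after ZooKeeper becomes writable again the next retry actually runs.
 */
public final class BackoffRecoverySketch {

    /** base * 2^attempt, saturating at Long.MAX_VALUE instead of overflowing. */
    public static long exponential(long baseMillis, int attempt) {
        if (baseMillis <= 0 || attempt < 0) {
            throw new IllegalArgumentException("need baseMillis > 0 and attempt >= 0");
        }
        // base << attempt must fit in a long; otherwise saturate.
        if (attempt >= 63 || baseMillis > (Long.MAX_VALUE >> attempt)) {
            return Long.MAX_VALUE;
        }
        return baseMillis << attempt;
    }

    /** The same schedule, but no single sleep may exceed maxMillis. */
    public static long capped(long baseMillis, long maxMillis, int attempt) {
        return Math.min(exponential(baseMillis, attempt), maxMillis);
    }

    /**
     * Writes are rejected during [0, outageMillis). The first attempt runs at t = 0,
     * and each failed attempt sleeps sleepForAttempt(attempt) before retrying.
     * Returns the gap between recovery (t = outageMillis) and the first retry
     * that runs at or after it.
     */
    public static long delayAfterRecovery(long outageMillis, IntToLongFunction sleepForAttempt) {
        long now = 0;
        int attempt = 0;
        while (now < outageMillis) {
            long sleep = sleepForAttempt.applyAsLong(attempt++);
            // Saturating add so a huge uncapped sleep cannot overflow the clock.
            now = (sleep > Long.MAX_VALUE - now) ? Long.MAX_VALUE : now + sleep;
        }
        return now - outageMillis;
    }

    public static void main(String[] args) {
        long base = 1_000;     // hypothetical 1 s initial sleep
        long max = 60_000;     // hypothetical 60 s upper limit
        long outage = 600_000; // e.g. a 10-minute disk-full episode

        long uncappedDelay = delayAfterRecovery(outage, a -> exponential(base, a));
        long cappedDelay = delayAfterRecovery(outage, a -> capped(base, max, a));
        System.out.println("uncapped: first retry " + uncappedDelay + " ms after recovery");
        System.out.println("capped:   first retry " + cappedDelay + " ms after recovery");
    }
}
```

With these assumed values, the uncapped schedule's first retry after recovery lands 423 s (about 7 minutes) after the quorum is writable again, because the module is in the middle of a 512 s sleep. With the 60 s cap it lands 3 s after recovery. More generally, with a cap the post-recovery gap can never exceed the cap, since the retry that crosses the recovery point was scheduled by a single sleep no longer than the maximum.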

This message was sent by Atlassian JIRA
