hbase-issues mailing list archives

From "Allan Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18058) Zookeeper retry sleep time should have a up limit
Date Wed, 17 May 2017 04:03:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013479#comment-16013479

Allan Yang commented on HBASE-18058:

Normally in this case the RegionServer will crash due to ZooKeeper session timeout, similar to
when the RS does a full GC, right? Mind sharing the case in your scenario? How do you keep the
RS alive while ZooKeeper is down for a while? Thanks, Allan Yang.
Yes, it is a very interesting case and it really happened. If the server hosting ZooKeeper
fills its disk, the ZooKeeper quorum does not actually go down, but it rejects all connections
and requests. So on the HBase side, it suffers connection loss and retries. When the disk-full
situation is resolved, the ZooKeeper quorum works normally again and no sessions time out. So
the HBase server won't crash due to session timeout, but the very high sleep time causes some
modules of the RegionServer (in our case, the balancer) to keep sleeping for a long time before

> Zookeeper retry sleep time should have a up limit
> -------------------------------------------------
>                 Key: HBASE-18058
>                 URL: https://issues.apache.org/jira/browse/HBASE-18058
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 1.4.0
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>         Attachments: HBASE-18058-branch-1.patch, HBASE-18058-branch-1.v2.patch, HBASE-18058.patch
> Now, in {{RecoverableZooKeeper}}, the retry backoff sleep time grows exponentially, but
> it doesn't have any upper limit. This directly leads to a very long recovery time after
> ZooKeeper goes down for a while and comes back.
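The fix being discussed amounts to capping exponential backoff. A minimal sketch of the idea is below; the class and method names are illustrative, not HBase's actual {{RecoverableZooKeeper}} API, and the base/max values are hypothetical configuration defaults:

```java
// Hypothetical sketch of a capped exponential retry backoff, in the
// spirit of this issue. Names and defaults are illustrative only.
public class CappedBackoff {
    private final long baseSleepMs;
    private final long maxSleepMs; // the upper limit this issue asks for

    public CappedBackoff(long baseSleepMs, long maxSleepMs) {
        this.baseSleepMs = baseSleepMs;
        this.maxSleepMs = maxSleepMs;
    }

    /** Sleep time for the given retry count: base * 2^retries, capped at maxSleepMs. */
    public long sleepMsForRetry(int retries) {
        // Clamp the shift so the multiplication cannot overflow a long.
        long uncapped = baseSleepMs * (1L << Math.min(retries, 30));
        return Math.min(uncapped, maxSleepMs);
    }

    public static void main(String[] args) {
        CappedBackoff backoff = new CappedBackoff(1000L, 60000L);
        for (int i = 0; i < 10; i++) {
            System.out.println("retry " + i + " -> " + backoff.sleepMsForRetry(i) + " ms");
        }
    }
}
```

With a 1 s base and a 60 s cap, the sleep doubles up to retry 5 (32 s) and then stays pinned at 60 s, so a long ZooKeeper outage no longer inflates the post-recovery sleep unboundedly.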

This message was sent by Atlassian JIRA
