hbase-dev mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-21744) timeout for server list refresh calls
Date Sat, 19 Jan 2019 00:26:00 GMT
Sergey Shelukhin created HBASE-21744:

             Summary: timeout for server list refresh calls 
                 Key: HBASE-21744
                 URL: https://issues.apache.org/jira/browse/HBASE-21744
             Project: HBase
          Issue Type: Bug
            Reporter: Sergey Shelukhin

Not sure why yet, but we are seeing cases where, when the cluster is in an overall bad state,
a region server (RS) dies and its znode is deleted, yet the notification appears to be lost, so
the master doesn't detect the failure. ZK itself appears to be healthy and doesn't report anything
unusual. Only after some other change is made to the server list does the master rescan the list
and pick up the stale change. It might make sense to add a config that would trigger a refresh if
one hasn't happened for a while (e.g. 1 minute).
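A minimal sketch of the proposed safeguard follows, assuming the master can expose a hook that rescans the RS znodes. `ServerListRefreshWatchdog`, `recordRefresh`, `maybeRefresh` and the rescan `Runnable` are hypothetical names, not existing HBase APIs, and the threshold stands in for the proposed config, which does not exist yet. The clock is injected so the staleness check can be exercised without real time passing.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical watchdog: forces a server-list rescan if none happened within maxIntervalMillis. */
final class ServerListRefreshWatchdog {
    private final long maxIntervalMillis;   // would come from the proposed (hypothetical) config
    private final Runnable rescan;          // hypothetical hook that re-reads the RS znodes
    private final LongSupplier clock;       // injected so tests can control time
    private final AtomicLong lastRefreshMillis;

    ServerListRefreshWatchdog(long maxIntervalMillis, Runnable rescan, LongSupplier clock) {
        this.maxIntervalMillis = maxIntervalMillis;
        this.rescan = rescan;
        this.clock = clock;
        this.lastRefreshMillis = new AtomicLong(clock.getAsLong());
    }

    /** Call whenever a normal, notification-driven refresh of the server list completes. */
    void recordRefresh() {
        lastRefreshMillis.set(clock.getAsLong());
    }

    /** Forces a rescan if the last refresh is at least maxIntervalMillis old; returns true if it did. */
    boolean maybeRefresh() {
        long now = clock.getAsLong();
        long last = lastRefreshMillis.get();
        if (now - last < maxIntervalMillis) {
            return false;  // a refresh happened recently enough; nothing to do
        }
        // compareAndSet ensures only one of several racing checks triggers the rescan.
        if (!lastRefreshMillis.compareAndSet(last, now)) {
            return false;
        }
        rescan.run();
        return true;
    }

    /** Runs maybeRefresh periodically; the caller owns (and must shut down) the returned executor. */
    ScheduledExecutorService start(long checkPeriodMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(() -> {
            try {
                maybeRefresh();
            } catch (RuntimeException e) {
                // An uncaught exception would silently cancel all future runs of this task.
                System.err.println("forced server list refresh failed: " + e);
            }
        }, checkPeriodMillis, checkPeriodMillis, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

With the 1-minute example above, the master would construct the watchdog with `60_000L` and `System::currentTimeMillis`, call `recordRefresh()` from its existing notification path, and `start(...)` a check every few seconds. A missed notification would then be noticed within roughly a minute instead of waiting for an unrelated server-list change.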

This message was sent by Atlassian JIRA
