hbase-dev mailing list archives

From "Amitanand Aiyer (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-8216) Be able to differentiate Power failures from Rack switch reboot
Date Thu, 28 Mar 2013 18:55:16 GMT
Amitanand Aiyer created HBASE-8216:

             Summary: Be able to differentiate Power failures from Rack switch reboot
                 Key: HBASE-8216
                 URL: https://issues.apache.org/jira/browse/HBASE-8216
             Project: HBase
          Issue Type: Bug
            Reporter: Amitanand Aiyer
            Assignee: Amitanand Aiyer
            Priority: Minor
             Fix For: 0.89-fb

When the master in 0.89-fb sees a correlated failure, such as a rack switch reboot, it waits
5-6 minutes to check whether the region servers (RS) become accessible again.

The rationale is that assigning and reassigning regions causes churn that is not worthwhile
when a rack switch reboot is expected to heal itself in 5-6 minutes.
In earlier deployments, where this feature was not present, we used to find ourselves in a
bad situation for 30 minutes to 1 hour.

However, correlated failures also happen when there is a power failure for the rack. These
cases take much longer to heal, so waiting 5-6 minutes is wasted effort.

The master should be able to differentiate the two scenarios by checking whether *any* of the RS
in the rack is able to communicate. Unless all the servers in the rack are inaccessible, we
should proceed with reassigning the regions.
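
The rule in the last sentence could be sketched roughly as below. This is only an
illustration, not code from the 0.89-fb master: the class RackFailureClassifier, the
Action enum, and the reachability map are hypothetical names, and how the master
actually probes each RS is assumed to happen elsewhere.

```java
import java.util.Map;

/** Hypothetical sketch of the proposed check; none of these names exist in HBase. */
public final class RackFailureClassifier {

    /** What the master does with the regions hosted on the failed servers. */
    public enum Action { WAIT_FOR_RACK_TO_HEAL, REASSIGN_NOW }

    private RackFailureClassifier() {}

    /**
     * Applies the rule as stated in the issue: keep waiting only when every
     * server in the rack is inaccessible; otherwise reassign right away.
     *
     * @param rackReachability one entry per region server in the rack (assumed
     *        input); the value is true if the master can currently reach it
     */
    public static Action decide(Map<String, Boolean> rackReachability) {
        if (rackReachability.isEmpty()) {
            // No servers known for this rack, so there is nothing to wait for.
            return Action.REASSIGN_NOW;
        }
        // Boolean.TRUE::equals treats a missing (null) probe result as unreachable.
        boolean anyReachable =
                rackReachability.values().stream().anyMatch(Boolean.TRUE::equals);
        return anyReachable ? Action.REASSIGN_NOW : Action.WAIT_FOR_RACK_TO_HEAL;
    }
}
```

For example, a map in which every RS is marked unreachable yields WAIT_FOR_RACK_TO_HEAL,
while a single reachable RS in the same rack yields REASSIGN_NOW.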

