hbase-dev mailing list archives

From "Justin Lynn (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1675) HBase cluster crash condition during rebalance
Date Tue, 21 Jul 2009 20:20:14 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733808#action_12733808
] 

Justin Lynn commented on HBASE-1675:
------------------------------------

This appears to be caused by the Region Server and Master getting different data from
our clustered DNS services, because someone forgot to increment the serial number of the
reverse zone when adding a node. It means the master hits DNS each time a region server
heartbeat comes in, so we are vulnerable to DNS problems throughout the cluster life cycle.
Explicitly setting the hostname and IP of each node in each node's hosts file eliminates
this failure.
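
A forward/reverse DNS mismatch like the one described above can be caught before a node joins the cluster. The following is only a rough sketch and not part of HBase: the function name `check_host` and the injectable `forward`/`reverse` parameters are assumptions made for illustration, and the check only confirms that the name returned by the reverse lookup matches the name that was looked up.

```python
import socket
import sys


def check_host(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return a list of problems for hostname; an empty list means the
    forward and reverse lookups agree.

    forward and reverse default to the system resolver and can be
    replaced (hypothetical hook, used here to test without a network).
    """
    try:
        ip = forward(hostname)
    except OSError as exc:  # socket.gaierror and socket.herror subclass OSError
        return [f"forward lookup of {hostname} failed: {exc}"]

    try:
        reverse_name = reverse(ip)[0]  # (hostname, aliases, addresses)
    except OSError as exc:
        return [f"reverse lookup of {ip} failed: {exc}"]

    # Compare case-insensitively and ignore a trailing root dot.
    if reverse_name.rstrip(".").lower() != hostname.rstrip(".").lower():
        return [f"{hostname} -> {ip} -> {reverse_name} (reverse zone disagrees)"]
    return []


if __name__ == "__main__":
    # Usage: python check_dns.py host1 host2 ...
    for host in sys.argv[1:]:
        for problem in check_host(host):
            print(problem)
```

Running this from both the master and each region server, against the same list of node names, would show whether they see different DNS data. It does not replace the hosts file workaround.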

> HBase cluster crash condition during rebalance
> ----------------------------------------------
>
>                 Key: HBASE-1675
>                 URL: https://issues.apache.org/jira/browse/HBASE-1675
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>         Environment: Ubuntu Linux x86/64, Java JVM 1.6.0_06-b02, Hadoop 0.20.0, Hbase trunk revision 795201
>            Reporter: Justin Lynn
>            Priority: Blocker
>         Attachments: hbase-hadoop-master-ha02.socialmedia.com.crash.log, hbase-hadoop-regionserver-ha12.socialmedia.com.log
>
>
> During cluster idle and rebalance both META and ROOT tables become unavailable leading to cascading NotServingRegion exceptions and cluster unavailability.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

