hbase-dev mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HBASE-3442) Master failing when node disconnects or dies
Date Sat, 19 Jul 2014 01:05:39 GMT

     [ https://issues.apache.org/jira/browse/HBASE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HBASE-3442.

    Resolution: Invalid

Issue wasn't actionable

> Master failing when node disconnects or dies
> --------------------------------------------
>                 Key: HBASE-3442
>                 URL: https://issues.apache.org/jira/browse/HBASE-3442
>             Project: HBase
>          Issue Type: Bug
>          Components: master, regionserver
>    Affects Versions: 0.90.0
>         Environment: CentOS 5, HBase 0.90 RC3, Amazon EC2
>            Reporter: Justin
>            Priority: Minor
> We've got our servers running on Amazon EC2, and nodes go through some shutdown scripts
> if/when we want to take them out of the mix. We ended up shutting down one of the nodes, in
> this case Node98, which caused the immediate crash of the master server. Upon restarting, the
> master would attempt to contact the missing node and then stop its startup process.
> I believe the node removed itself from the DNS server first, then ran a stop on the datanode
> and regionserver. The missing node was also removed from any slave/regionserver list on the
> master server. I finally put in a bogus entry in the /etc/hosts file for the missing node,
> pointing it back to, and the master server finally marked it as a dead node, ignored
> it, and finished the startup process.
> Going to try to replicate it again and save some more logs. The following log is the
> only thing I saved from the first occurrence. It's the master failing to start up while checking
> for the missing node: http://pastebin.com/ZyQMQm91

This message was sent by Atlassian JIRA
