hadoop-hdfs-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3224) Bug in check for DN re-registration with different storage ID
Date Fri, 05 Oct 2012 21:28:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470677#comment-13470677 ]

Jason Lowe commented on HDFS-3224:

This bug seems benign but is causing issues with ops monitoring scripts because it allows
a node to be reported as simultaneously live and dead by the NN web UI and JMX.  Here's one
scenario:

* Node is registered and appears as a live node
* Node fails badly, starts showing up as a dead node
* Node is re-imaged by ops as a fresh node
* Node rejoins the cluster, and now the same host is reported as both live and dead

Since re-imaging the node causes it to get a new storage ID, the failure to recognize it
by name means the NN thinks it's a totally different node and therefore the node is placed
in the datanode map twice for the two storage IDs.
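
To make the sequence above concrete, here is a minimal sketch of how a lookup key that
does not match the map's key lets the same host land in the datanode map twice. It is a
simplified model, not the actual DatanodeManager code: the class name, the two maps, the
IP, the port and the storage IDs are all hypothetical, and the "matching lookup" branch
only illustrates a consistent key, not the real fix.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of NN registration bookkeeping.
// None of these names come from the Hadoop source.
class ReRegistrationSketch {
    // storage ID -> "ip:port" of the node that registered with it
    final Map<String, String> datanodeMap = new HashMap<>();
    // host map keyed on the bare IP -> storage ID last registered from that IP
    final Map<String, String> hostToStorageId = new HashMap<>();

    /** Registers a node; lookupWithPort selects the (mismatched) "ip:port" lookup key. */
    void register(String ip, int port, String storageId, boolean lookupWithPort) {
        String key = lookupWithPort ? ip + ":" + port : ip;
        String oldStorageId = hostToStorageId.get(key);
        if (oldStorageId != null && !oldStorageId.equals(storageId)) {
            // Same host, different storage ID: drop the stale entry.
            datanodeMap.remove(oldStorageId);
        }
        datanodeMap.put(storageId, ip + ":" + port);
        // The host map itself is always keyed on the bare IP.
        hostToStorageId.put(ip, storageId);
    }

    /** Registers a host, then re-registers it with a new storage ID; returns the entry count. */
    static int entriesAfterReimage(boolean lookupWithPort) {
        ReRegistrationSketch nn = new ReRegistrationSketch();
        nn.register("10.0.0.5", 50010, "DS-old", lookupWithPort);
        nn.register("10.0.0.5", 50010, "DS-new", lookupWithPort);
        return nn.datanodeMap.size();
    }

    public static void main(String[] args) {
        System.out.println("ip:port lookup key: " + entriesAfterReimage(true) + " entries");
        System.out.println("ip lookup key:      " + entriesAfterReimage(false) + " entries");
    }
}
```

With the mismatched key the stale entry is never found, so the host ends up with two
entries (one per storage ID), which is the state the web UI and JMX then report as both
live and dead.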

In this case I think we should be calling getDatanodeByName (i.e., where we include the port).
This would help us properly distinguish datanodes that are using ephemeral ports (e.g., miniclusters).

> Bug in check for DN re-registration with different storage ID
> -------------------------------------------------------------
>                 Key: HDFS-3224
>                 URL: https://issues.apache.org/jira/browse/HDFS-3224
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Eli Collins
>            Priority: Minor
> DatanodeManager#registerDatanode checks the host to node map using an IP:port key, however
> the map is keyed on IP, so this check will always fail. It's performing the check to determine
> if a DN with the same IP and storage ID has already registered, and if so to remove this DN
> from the map and indicate that, e.g., it's no longer hosting these blocks. This bug has been here

