hadoop-hdfs-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-577) Name node doesn't always properly recognize health of data node
Date Mon, 31 Aug 2009 18:15:32 GMT

    [ https://issues.apache.org/jira/browse/HDFS-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749562#action_12749562 ]

Allen Wittenauer commented on HDFS-577:
---------------------------------------

Just to put what I suspect is the basic concern to rest :), I don't want a double check on
every block report/heartbeat. But I think it might be useful if the name node periodically attempted
to connect to the data node itself, on a long interval [probably another configurable :( ].
 
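The reverse check suggested above could look roughly like the following Python sketch. It is not HDFS code: `ReverseProbeTracker`, `heartbeat_timeout`, `probe_window` and `on_probe_result` are hypothetical names, and `probe_window` stands in for the "another configurable" mentioned above. Heartbeats are still handled one way as usual; a data node only counts as healthy if its heartbeats are current and the name node has itself reached it at least once within the (long) probe window.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class NodeState:
    last_heartbeat: float                 # last data node -> name node heartbeat
    first_seen: float                     # registration time, used as the probe grace start
    last_probe_ok: Optional[float] = None  # last successful name node -> data node connect


class ReverseProbeTracker:
    """Hypothetical name-node-side health tracker (not an HDFS class).

    One-way heartbeats alone are not trusted: the name node must also have
    connected to the data node itself within `probe_window` seconds.
    """

    def __init__(self, heartbeat_timeout: float, probe_window: float,
                 clock: Callable[[], float] = time.monotonic):
        self.heartbeat_timeout = heartbeat_timeout
        self.probe_window = probe_window  # long compared to the heartbeat interval
        self.clock = clock
        self.nodes: Dict[str, NodeState] = {}

    def on_heartbeat(self, node: str) -> None:
        now = self.clock()
        state = self.nodes.get(node)
        if state is None:
            self.nodes[node] = NodeState(last_heartbeat=now, first_seen=now)
        else:
            state.last_heartbeat = now

    def on_probe_result(self, node: str, reachable: bool) -> None:
        # Called whenever the name node's own (hypothetical) connection attempt finishes.
        if reachable and node in self.nodes:
            self.nodes[node].last_probe_ok = self.clock()

    def is_healthy(self, node: str) -> bool:
        state = self.nodes.get(node)
        if state is None:
            return False
        now = self.clock()
        # Ordinary one-way check: heartbeats must be recent.
        if now - state.last_heartbeat > self.heartbeat_timeout:
            return False
        # Two-way check: a new node gets one full window of grace before its first probe.
        last_ok = state.last_probe_ok if state.last_probe_ok is not None else state.first_seen
        return now - last_ok <= self.probe_window
```

With `heartbeat_timeout=30` and `probe_window=600`, a node that keeps heartbeating but has never been reached by the name node is flagged unhealthy once 600 seconds pass, which is exactly the case the one-way protocol cannot detect.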

I'm trying to think of a use case where it would be beneficial for the data node and name node
to have only one-way communication, and I keep coming up empty.

As to the network partitioning problem (where data nodes lose connectivity to each other,
but the name node still has connectivity to them), it may be worthwhile to have an algorithm
such that if x% of the data nodes cannot communicate, we enter safe mode. From a practical
perspective, chances are good the job tracker is going to go down in flames in those sorts of
situations anyway, since the tasktrackers should end up on the dead pile. Even in a pure HDFS
setup, at some point the replication list is going to get very large if we start declaring
nodes dead based upon that percentage... so we're probably better off just putting ourselves
into safe mode and alerting the admin that the network is horked.
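
The x% rule can be sketched the same way. Again this is a hypothetical policy function, not an HDFS API: `partition_decision`, `Action` and `threshold_pct` are illustrative names, and how a node gets flagged as unable to communicate is left to the caller. Below the threshold, affected nodes are handled individually (declared dead, their blocks re-replicated); at or above it, the name node enters safe mode and alerts the admin instead of queueing a flood of re-replication work.

```python
from enum import Enum
from typing import Mapping


class Action(Enum):
    NORMAL = "normal"        # nothing unreachable
    MARK_DEAD = "mark_dead"  # a few nodes affected: treat them individually
    SAFE_MODE = "safe_mode"  # many nodes affected: likely a partition, stop and alert


def partition_decision(unreachable: Mapping[str, bool], threshold_pct: float) -> Action:
    """Hypothetical name-node policy.

    `unreachable` maps each data node id to whether it is currently considered
    unable to communicate; `threshold_pct` is the "x%" from the discussion.
    """
    total = len(unreachable)
    if total == 0:
        return Action.NORMAL
    bad = sum(1 for is_bad in unreachable.values() if is_bad)
    if bad == 0:
        return Action.NORMAL
    if 100.0 * bad / total >= threshold_pct:
        return Action.SAFE_MODE
    return Action.MARK_DEAD
```

For example, with 10 data nodes and `threshold_pct=30`, one unreachable node yields `MARK_DEAD`, while three or more yield `SAFE_MODE`.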

> Name node doesn't always properly recognize health of data node
> ---------------------------------------------------------------
>
>                 Key: HDFS-577
>                 URL: https://issues.apache.org/jira/browse/HDFS-577
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Allen Wittenauer
>
> The one-way communication (data node -> name node) for node health does not guarantee
> that the data node is actually healthy.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

