hadoop-hdfs-dev mailing list archives

From "Thanh Do (JIRA)" <j...@apache.org>
Subject [jira] Created: (HDFS-1235) Namenode returning the same Datanode to client, due to infrequent heartbeat
Date Thu, 17 Jun 2010 05:39:23 GMT
Namenode returning the same Datanode to client, due to infrequent heartbeat

                 Key: HDFS-1235
                 URL: https://issues.apache.org/jira/browse/HDFS-1235
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: name-node
            Reporter: Thanh Do

This bug has been reported.
Basically, since a datanode's heartbeat messages are infrequent (~ every 10 minutes),
the NameNode always gives the client the same datanode even if that datanode is dead.
We want to point out that the client waits 6 seconds before retrying,
which amounts to long and useless retries in this scenario,
because within 6 seconds the namenode has not yet declared the datanode dead.

Overall this happens when a datanode is dead during the first phase of the pipeline (file
If a datanode dies during the second phase (byte transfer), the DFSClient can still
proceed with the other surviving datanodes (which is consistent with what
Hadoop books always say -- the write should proceed if we have at least one good
datanode).  But unfortunately this guarantee does not hold during the first phase of the
pipeline.  Overall we suggest that the namenode take into consideration the client's
view of unreachable datanodes.  That is, if a client says that it cannot reach DN-X,
then the namenode might give the client a node other than X (but the namenode
does not have to declare X dead).
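
To make the suggestion concrete, here is a minimal sketch in Java. It is not the
actual NameNode block-placement code: the class ExcludeAwareChooser, the method
chooseTarget, and the node names are all hypothetical, chosen only for illustration.
It assumes the client sends the set of datanodes it could not reach along with its
retry, and the namenode skips those nodes without changing its own liveness view.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only; names and structure are assumptions, not Hadoop APIs.
class ExcludeAwareChooser {
    // Datanodes the namenode still considers alive (no heartbeat timeout yet).
    private final List<String> liveNodes;

    ExcludeAwareChooser(List<String> liveNodes) {
        this.liveNodes = new ArrayList<>(liveNodes);
    }

    // Returns the first live node the client has not reported as unreachable,
    // or null if every live node is excluded. The namenode's own liveNodes
    // list is left untouched: an excluded node is not declared dead.
    String chooseTarget(Set<String> excludedByClient) {
        for (String node : liveNodes) {
            if (!excludedByClient.contains(node)) {
                return node;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ExcludeAwareChooser namenode =
                new ExcludeAwareChooser(Arrays.asList("DN-X", "DN-Y", "DN-Z"));
        Set<String> excluded = new HashSet<>();

        // Without any client feedback, every retry gets the same node.
        System.out.println(namenode.chooseTarget(excluded)); // DN-X
        System.out.println(namenode.chooseTarget(excluded)); // DN-X again

        // The client reports DN-X as unreachable; the next answer differs.
        excluded.add("DN-X");
        System.out.println(namenode.chooseTarget(excluded)); // DN-Y
    }
}
```

With an empty exclusion set the sketch reproduces the reported behavior: the same
node comes back on every retry. Once the client reports DN-X, the retry gets a
different node immediately, instead of waiting for the namenode to time DN-X out.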

This bug was found by our Failure Testing Service framework:
For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and
Haryadi Gunawi (haryadi@eecs.berkeley.edu)

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
