hadoop-hdfs-dev mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HDFS-1235) Namenode returning the same Datanode to client, due to infrequent heartbeat
Date Thu, 17 Jun 2010 17:40:26 GMT

     [ https://issues.apache.org/jira/browse/HDFS-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved HDFS-1235.

    Resolution: Duplicate

Fixed by HDFS-630

> Namenode returning the same Datanode to client, due to infrequent heartbeat
> ---------------------------------------------------------------------------
>                 Key: HDFS-1235
>                 URL: https://issues.apache.org/jira/browse/HDFS-1235
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>            Reporter: Thanh Do
> This bug has been reported.
> Basically, since a datanode's heartbeat messages are infrequent (~ every 10 minutes),
> the NameNode always gives the client the same datanode even if the datanode is dead.
> We want to point out that the client waits 6 seconds before retrying,
> which makes the retries both long and useless in this scenario,
> because after 6 seconds the namenode still hasn't declared the datanode dead.
> Overall this happens when a datanode is dead during the first phase of the pipeline (file
> If a datanode is dead during the second phase (byte transfer), the DFSClient can still
> proceed with the other surviving datanodes (which is consistent with what
> Hadoop books always say -- the write should proceed if we have at least one good
> datanode).  Unfortunately, this guarantee does not hold during the first phase of the
> pipeline.  Overall, we suggest that the namenode take into consideration the client's
> view of unreachable datanodes.  That is, if a client says that it cannot reach DN-X,
> then the namenode might give the client a node other than X (but the namenode
> does not have to declare X dead).
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and
> Haryadi Gunawi (haryadi@eecs.berkeley.edu)
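
The suggestion in the report (let the namenode honor the client's view of
unreachable datanodes) can be illustrated with a minimal sketch. This is not
the actual NameNode or DFSClient code: the class ExcludeAwareChooser, its
methods, and the string node names are all hypothetical, and the reachability
check is a stand-in for a real connection attempt. It only shows the idea that
a client-supplied exclude list changes the answer without the namenode
declaring anything dead.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch of exclude-aware target selection; not Hadoop code. */
final class ExcludeAwareChooser {

    /**
     * "Namenode" side: return the first node still believed live that the
     * client has not reported as unreachable. The believedLive list is left
     * untouched, so no node is declared dead here.
     */
    static Optional<String> chooseTarget(List<String> believedLive, Set<String> excluded) {
        for (String dn : believedLive) {
            if (!excluded.contains(dn)) {
                return Optional.of(dn);
            }
        }
        return Optional.empty();
    }

    /**
     * "Client" side: ask for a target, and on a failed connection add that
     * node to the exclude set before asking again.
     */
    static Optional<String> openPipeline(List<String> believedLive,
                                         Predicate<String> reachable,
                                         int maxAttempts) {
        Set<String> excluded = new HashSet<>();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Optional<String> candidate = chooseTarget(believedLive, excluded);
            if (!candidate.isPresent()) {
                return Optional.empty();      // nothing left to try
            }
            if (reachable.test(candidate.get())) {
                return candidate;             // pipeline can be set up
            }
            excluded.add(candidate.get());    // report this node as unreachable
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // DN-X has crashed, but the namenode still believes it is live.
        List<String> believedLive = Arrays.asList("DN-X", "DN-Y");
        Predicate<String> reachable = dn -> !dn.equals("DN-X");
        System.out.println(openPipeline(believedLive, reachable, 3));
    }
}
```

Without the exclude set, every retry in this sketch would receive DN-X again,
which is the behavior the report describes.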

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
