hadoop-hdfs-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HDFS-1384) NameNode should give client the first node in the pipeline from different rack other than that of excludedNodes list in the same rack.
Date Fri, 10 Sep 2010 09:22:33 GMT

     [ https://issues.apache.org/jira/browse/HDFS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur resolved HDFS-1384.
------------------------------------

    Resolution: Duplicate

This bug has been fixed in trunk because the client sends the excluded list to the namenode
with the addBlock RPC. The NN ensures that it does not return a datanode from the excluded
list.

This bug is still present in the 0.20-append branch.
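
The exclusion step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual NameNode code: the class ExcludeAwareChooser, the method chooseTargets, and the use of plain strings for datanodes are all hypothetical, and real block placement also weighs rack topology and node load.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: NOT the real NameNode placement code.
class ExcludeAwareChooser {
    /**
     * Returns up to `wanted` nodes from `candidates`, skipping every node
     * that the client reported in `excluded`. Candidate order is preserved.
     */
    static List<String> chooseTargets(List<String> candidates,
                                      Set<String> excluded,
                                      int wanted) {
        List<String> chosen = new ArrayList<>();
        for (String node : candidates) {
            if (chosen.size() == wanted) {
                break;                      // enough targets for the pipeline
            }
            if (!excluded.contains(node)) {
                chosen.add(node);           // never hand back an excluded node
            }
        }
        return chosen;
    }
}
```

With candidates dn1, dn2, dn8, dn9 and dn1 in the excluded set, a request for 3 targets skips dn1 and returns the next three nodes.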

> NameNode should give client the first node in the pipeline from different rack other than that of excludedNodes list in the same rack.
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1384
>                 URL: https://issues.apache.org/jira/browse/HDFS-1384
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20-append, 0.20.1
>            Reporter: Thanh Do
>
> We saw a case where the NN keeps giving the client nodes from the same rack, which causes an exception
> on the client when it tries to set up the pipeline. The client retries 5 times and then fails.
>  
> Here are more details. Suppose we have 2 racks:
> - Rack 0: dn1 to dn7
> - Rack 1: dn8 to dn14
> The client asks for 3 datanodes and the NN replies with dn1, dn8 and dn9, for example.
> Because of a network partition, the client cannot reach any node in Rack 0.
> Hence, the client adds dn1 to its excludedNodes list and asks the NN again.
> Interestingly, the NN picks another node in Rack 0 (one not yet in excludedNodes)
> and gives it back to the client, and so on. The client keeps retrying, and after 5 retries
> the write fails.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and 
> Haryadi Gunawi (haryadi@eecs.berkeley.edu)
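
The behaviour requested in the issue title (prefer a first node from a rack other than the rack of the excluded nodes) could look something like the sketch below. It is a guess at one possible approach, not code from any Hadoop branch: the class RackAwareFirstNode, the method chooseFirstNode, and the node-to-rack map are hypothetical names introduced for illustration.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the behaviour requested in the issue title.
class RackAwareFirstNode {
    /**
     * Picks the first pipeline node, preferring a rack that holds no excluded node.
     * `rackOf` maps node name -> rack name; its iteration order is the candidate order.
     * Falls back to any non-excluded node if every rack contains an excluded node.
     */
    static Optional<String> chooseFirstNode(Map<String, String> rackOf,
                                            Set<String> excluded) {
        // Racks that contain at least one node the client could not reach.
        Set<String> suspectRacks = new HashSet<>();
        for (String node : excluded) {
            String rack = rackOf.get(node);
            if (rack != null) {
                suspectRacks.add(rack);
            }
        }
        String fallback = null;
        for (Map.Entry<String, String> entry : rackOf.entrySet()) {
            if (excluded.contains(entry.getKey())) {
                continue;                                // never return an excluded node
            }
            if (!suspectRacks.contains(entry.getValue())) {
                return Optional.of(entry.getKey());      // node on an unaffected rack
            }
            if (fallback == null) {
                fallback = entry.getKey();               // remember first usable node
            }
        }
        return Optional.ofNullable(fallback);
    }
}
```

For the two-rack scenario in the report, passing a LinkedHashMap that lists dn1 to dn14 in order and an excluded set containing only dn1 makes Rack 0 suspect, so the sketch returns a Rack 1 node on the first retry instead of walking through dn2 to dn7.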

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

