hadoop-hdfs-issues mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss
Date Mon, 12 Apr 2010 07:21:43 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855872#action_12855872 ]

dhruba borthakur commented on HDFS-1094:
----------------------------------------

One simple solution comes to mind. Let's arrange all the racks in a cluster and number
them in a logical order: 0, 1, 2, ..., n.

1. The first replica is on the local node on rack r. The other two replicas are then placed
on randomly selected nodes on either rack r-1 or rack r+1. In this approach, three datanodes
in *two* consecutive racks have to fail simultaneously for a block loss to occur. This is
better than the current implementation, where any three datanode failures in the entire
cluster can cause a block to be lost.

2. The first replica is on the local node on rack r. Let's say that the local node is the
6th node in the local rack. Then the other two replicas of this block will also reside
on the 6th node of randomly selected remote racks. In this approach, three datanodes
at the same position p in their racks have to fail for a block to be lost.
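
The two approaches above can be sketched in a few lines of Python. This is only an
illustration, not HDFS code: the function names, the (rack, position) tuples, and the
uniform rack size are all hypothetical. It also makes three assumptions the comment does
not state: rack numbering wraps around at the ends, approach 1 puts both remote replicas
on the same neighbouring rack, and approach 2 uses two distinct remote racks. The last
function counts how many distinct three-node sets could end up holding all replicas of
some block under each policy, using the current policy as described in the issue (local
node plus two nodes on one random remote rack).

```python
import math
import random


def place_adjacent(num_racks, nodes_per_rack, r, p, rng=random):
    """Approach 1: local replica at (r, p), two more on one neighbouring rack."""
    if num_racks < 2 or nodes_per_rack < 2:
        raise ValueError("need at least 2 racks and 2 nodes per rack")
    # Assumption: numbering wraps, so rack 0 neighbours rack num_racks - 1.
    neighbour = (r + rng.choice((-1, 1))) % num_racks
    # Assumption: both remote replicas share the neighbour rack, on distinct nodes.
    positions = rng.sample(range(nodes_per_rack), 2)
    return [(r, p)] + [(neighbour, q) for q in positions]


def place_same_position(num_racks, nodes_per_rack, r, p, rng=random):
    """Approach 2: local replica at (r, p), two more at position p on remote racks."""
    if num_racks < 3 or not 0 <= p < nodes_per_rack:
        raise ValueError("need at least 3 racks and a valid position")
    remote = [x for x in range(num_racks) if x != r]
    # Assumption: the two remote racks are distinct.
    return [(r, p)] + [(x, p) for x in rng.sample(remote, 2)]


def candidate_sets(num_racks, nodes_per_rack):
    """Count the 3-node sets that can hold every replica of some block.

    Valid for num_racks >= 3 under the assumptions listed above.
    """
    n, k = num_racks, nodes_per_rack
    pair = math.comb(k, 2)  # ways to choose two nodes on one rack
    return {
        "current": n * (n - 1) * k * pair,     # any local rack, any remote rack
        "adjacent": 2 * n * k * pair,          # remote rack is r-1 or r+1 only
        "same_position": k * math.comb(n, 3),  # one position, any three racks
    }
```

For example, place_adjacent(10, 20, 0, 5) returns the local node (0, 5) plus two nodes
on rack 9 or rack 1. A smaller count from candidate_sets means fewer three-node failure
combinations can lose a block. It says nothing about how many blocks are lost when such
a combination does fail.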


> Intelligent block placement policy to decrease probability of block loss
> ------------------------------------------------------------------------
>
>                 Key: HDFS-1094
>                 URL: https://issues.apache.org/jira/browse/HDFS-1094
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> The current HDFS implementation specifies that the first replica is local and the other
> two replicas are on any two random nodes on a random remote rack. This means that if any three
> datanodes die together, then there is a non-trivial probability of losing at least one block
> in the cluster. This JIRA is to discuss whether there is a better algorithm that can lower the
> probability of losing a block.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
