hadoop-hdfs-issues mailing list archives

From "Aravind Menon (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss
Date Fri, 14 May 2010 23:39:49 GMT

     [ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aravind Menon updated HDFS-1094:

    Attachment: prob.pdf


We analyzed the probability of data loss in HDFS under different block placement schemes
and replication factors. We consider two simple placement schemes: (1) random placement,
where the replicas of a block can be placed at random on any machines, and (2) in-rack
placement, where all replicas of a block are placed in the same rack. The detailed analysis
of these scenarios is in the attached PDF (prob.pdf).

The main observations from the analysis are:

1. The probability of data loss increases with cluster size.
2. The probability of data loss is lower with in-rack placement than with random placement.
3. The probability of data loss is lower with a higher replication factor.
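
The three observations above can be reproduced with a small back-of-the-envelope model.
The sketch below is not the model from prob.pdf; it rests on assumptions made up for
illustration: every node fails independently with probability q, the cluster holds so
many blocks that every permitted replica set is in use, and racks all have the same size.
Under those assumptions, random placement loses data as soon as any r nodes fail, while
in-rack placement loses data only when r nodes fail within a single rack. All function
and parameter names are hypothetical.

```python
from math import comb


def prob_fewer_than(n, r, q):
    """P(fewer than r of n nodes fail), each failing independently with probability q."""
    return sum(comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(r))


def loss_random(num_nodes, replicas, q):
    """Random placement, assuming every r-subset of nodes holds some block:
    data is lost as soon as at least `replicas` nodes fail anywhere."""
    return 1.0 - prob_fewer_than(num_nodes, replicas, q)


def loss_in_rack(num_nodes, rack_size, replicas, q):
    """In-rack placement, assuming every r-subset within a rack holds some block:
    data is lost only if some single rack has at least `replicas` failures."""
    num_racks = num_nodes // rack_size
    return 1.0 - prob_fewer_than(rack_size, replicas, q) ** num_racks


if __name__ == "__main__":
    q = 0.001        # assumed per-node failure probability (illustrative only)
    rack_size = 20   # assumed nodes per rack (illustrative only)
    for num_nodes in (200, 2000):
        for replicas in (2, 3):
            print(
                f"nodes={num_nodes} replicas={replicas} "
                f"random={loss_random(num_nodes, replicas, q):.6f} "
                f"in-rack={loss_in_rack(num_nodes, rack_size, replicas, q):.6f}"
            )
```

In this simplified model the ordering in observation 2 holds by construction, because
r failures inside one rack always imply r failures in the cluster. The absolute numbers
depend entirely on the assumed q and rack size and should not be read as measurements.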


> Intelligent block placement policy to decrease probability of block loss
> ------------------------------------------------------------------------
>                 Key: HDFS-1094
>                 URL: https://issues.apache.org/jira/browse/HDFS-1094
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: prob.pdf
> The current HDFS implementation specifies that the first replica is local and the other
> two replicas are on any two random nodes on a random remote rack. This means that if any three
> datanodes die together, then there is a non-trivial probability of losing at least one block
> in the cluster. This JIRA is to discuss whether there is a better algorithm that can lower the
> probability of losing a block.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
