hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss
Date Wed, 07 Jul 2010 06:45:57 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885852#action_12885852
] 

Joydeep Sen Sarma commented on HDFS-1094:
-----------------------------------------

but map-reduce programs rarely read from just one file. as long as the total input data spans
most racks in a cluster - that should be good enough.

data population in most clusters (like in our case) is itself via parallel processes (whether
it's crawlers or distcp like parallel copiers). these in turn are consumed by other map-reduce
jobs and so on. based on this - i would imagine that most input data sets have a number of
files populated via different nodes/writers (and hence likely to span, in totality, a large
number of racks in the cluster).

> Intelligent block placement policy to decrease probability of block loss
> ------------------------------------------------------------------------
>
>                 Key: HDFS-1094
>                 URL: https://issues.apache.org/jira/browse/HDFS-1094
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Rodrigo Schmidt
>         Attachments: prob.pdf, prob.pdf
>
>
> The current HDFS implementation specifies that the first replica is local and the other
two replicas are on any two random nodes on a random remote rack. This means that if any three
datanodes die together, then there is a non-trivial probability of losing at least one block
in the cluster. This JIRA is to discuss if there is a better algorithm that can lower probability
of losing a block.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message