hadoop-hdfs-issues mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss
Date Sat, 10 Jul 2010 01:41:56 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886976#action_12886976 ]

Joydeep Sen Sarma commented on HDFS-1094:

>  - Pick random rack r2 that is within R racks from r
>  - Pick random machine m2 in r2 that is within window [i, (i+M-1)%racksize]

a few points:

- it is dangerous to choose physically contiguous racks for node groups (because of correlated
failures in consecutive racks). this may make things a lot worse.
- if rack numbering is based on some arithmetic (so that logically contiguous is not physically
contiguous), then one has to reason about what happens when a new rack is added (i think it's
ok to leave existing replicated data as is - but this case is worth discussing. what
would the rebalancer do in this case?)
- it is easy to reduce the overlap between node-groups (and thereby decrease loss probability):
  - instead of the sliding window [i, (i+M-1)%racksize], choose [ (i/M)*M, (i/M)*M + M-1 ]  //
integer division, giving fixed-offset, non-overlapping groups of M nodes each in a rack
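the overlap reduction can be sketched by counting how many distinct M-node groups each
scheme produces in one rack (a sketch, not HDFS code; rack size and M here are arbitrary
example values, and the fixed-offset form assumes M divides the rack size evenly):

```python
def sliding_groups(racksize, M):
    """Sliding windows [i, (i+M-1) % racksize], one per start index i."""
    return {tuple(sorted((i + k) % racksize for k in range(M)))
            for i in range(racksize)}

def fixed_groups(racksize, M):
    """Fixed-offset, non-overlapping groups of M nodes (assumes M | racksize)."""
    return {tuple(range(g * M, g * M + M))
            for g in range(racksize // M)}

racksize, M = 20, 5
print(len(sliding_groups(racksize, M)))  # -> 20 overlapping groups
print(len(fixed_groups(racksize, M)))    # -> 4 disjoint groups
```

fewer distinct groups means fewer distinct replica sets in the cluster, so a random set of
simultaneous node failures is less likely to cover some group entirely.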

i glanced through the Ceph algorithm (http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf) - it
doesn't try to do what's described here. the number of placement groups is not controlled
(and that is not an objective of the algorithm).

a close analog of this problem is seen in RAID arrays. when choosing parity + mirroring
to tolerate multiple disk failures, one can either mirror RAID-parity groups or apply
parity over RAID-mirrored groups (1+5 vs. 5+1). it turns out 5+1 is a lot better from a data
loss probability perspective, and the reasoning and math are similar: both are susceptible to
data loss on 4-disk failures, but in 5+1 the 4 failures have to be contained within
2 entire 2-disk mirrored pairs. this is combinatorially much harder than the 1+5 case, where
the 4 disk failures only have to cause 2 failures in each of the two N/2-disk parity groups.
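the combinatorial gap can be checked by brute force (a sketch with an assumed disk count,
using the same loss criteria as above: 5+1 loses data when 2 whole mirrored pairs fail;
1+5 loses data when each of the two parity groups sees at least 2 failures):

```python
from itertools import combinations

N = 12  # assumed total disk count for illustration

# "5+1": parity across N/2 mirrored pairs -> data lost only when
# at least 2 whole pairs are dead.
pairs = [(i, i + 1) for i in range(0, N, 2)]

def lost_parity_over_mirrors(failed):
    dead_pairs = sum(1 for a, b in pairs if a in failed and b in failed)
    return dead_pairs >= 2

# "1+5": two mirrored N/2-disk RAID-5 groups -> counted as lost when
# both groups degrade past their single-failure tolerance.
def lost_mirror_of_parity(failed):
    left = sum(1 for d in failed if d < N // 2)
    return left >= 2 and (len(failed) - left) >= 2

all4 = list(combinations(range(N), 4))
a = sum(lost_parity_over_mirrors(set(f)) for f in all4)
b = sum(lost_mirror_of_parity(set(f)) for f in all4)
print(a, b)  # -> 15 225
```

with 12 disks, only C(6,2) = 15 of the 495 possible 4-disk failures are fatal for
parity-over-mirrors, versus C(6,2)^2 = 225 for mirror-of-parity.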

> Intelligent block placement policy to decrease probability of block loss
> ------------------------------------------------------------------------
>                 Key: HDFS-1094
>                 URL: https://issues.apache.org/jira/browse/HDFS-1094
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Rodrigo Schmidt
>         Attachments: prob.pdf, prob.pdf
> The current HDFS implementation specifies that the first replica is local and the other
two replicas are on any two random nodes on a random remote rack. This means that if any three
datanodes die together, then there is a non-trivial probability of losing at least one block
in the cluster. This JIRA is to discuss whether there is a better algorithm that can lower the
probability of losing a block.
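as a rough illustration of the description above, a Monte Carlo sketch (all parameters are
assumed, and rack topology is ignored: each block simply gets 3 distinct random nodes) of how
often a random 3-node failure destroys at least one block:

```python
import random

random.seed(1)
NODES, BLOCKS, TRIALS = 100, 50_000, 2_000

# Random replica placement: each block's 3 replicas on 3 distinct random nodes.
placements = [frozenset(random.sample(range(NODES), 3)) for _ in range(BLOCKS)]
replica_sets = set(placements)  # distinct 3-node sets actually holding a block

# A random 3-node failure loses a block iff it equals some block's replica set.
lost = sum(frozenset(random.sample(range(NODES), 3)) in replica_sets
           for _ in range(TRIALS))
print(lost / TRIALS)  # a substantial fraction of 3-node failures lose a block
```

with many blocks relative to the C(NODES, 3) possible replica sets, a large fraction of
3-node failure combinations coincides with some block's replica set, which is the
"non-trivial probability" the issue describes.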

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
