[ https://issues.apache.org/jira/browse/HDFS1094?page=com.atlassian.jira.plugin.system.issuetabpanels:commenttabpanel&focusedCommentId=12886823#action_12886823
]
Rodrigo Schmidt commented on HDFS1094:

@Joydeep: Using a term to describe the groups is a great idea. Just bear in mind that we are
not dividing the cluster into exclusive groups. Each block has a limited number of blocks
in which it can be replicated, but separate blocks might have different sets of nodes associated
with, and the intersection between these sets might not be empty.
Here is a sketch of the algorithm to make things a little more clear. This is just a simplification
of the algorithm, and I am not describing several corner cases and reconfiguration scenarios.
{code}
Configuration parameters:
 R: rack window, distance from initial rack we are allowed to place replicas
 M: machine window, size of the machine window within a rack
Whenever network topology changes:
 Sort racks into a logical ring, based on rack name
 Sort nodes within each rack into logical rings, based on node names
For the first replica:
 Write to the local machine, if possible, or pick up a random one
For the second replica:
 Let r be the rack in which the first replica was placed
 Let i be the index of the machine in r that keeps the first replica
 Pick random rack r2 that is within R racks from r
 Pick random machine m2 in r2 that is within window [i, (i+M1)%racksize]
 Place replica in m2
For the third replica:
 Given steps above, pick another random machine m3 in r2 that is within the same window
used for m2
 Make sure m2 != m3
 Place replica in m3
{code}
I hope this explanation helps solve the confusion.
> Intelligent block placement policy to decrease probability of block loss
> 
>
> Key: HDFS1094
> URL: https://issues.apache.org/jira/browse/HDFS1094
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: dhruba borthakur
> Assignee: Rodrigo Schmidt
> Attachments: prob.pdf, prob.pdf
>
>
> The current HDFS implementation specifies that the first replica is local and the other
two replicas are on any two random nodes on a random remote rack. This means that if any three
datanodes die together, then there is a nontrivial probability of losing at least one block
in the cluster. This JIRA is to discuss if there is a better algorithm that can lower probability
of losing a block.

This message is automatically generated by JIRA.

You can reply to this email to add a comment to the issue online.
