[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856139#action_12856139
]
Brian Bockelman commented on HDFS-1094:

Hey Karthik,
Let me play dumb (it might not be playing after all) and try to work out the math a bit.
First, let's assume that on any given day, a node has a 1/1000 chance of failing.
CURRENT SCHEME: A block is on 3 random nodes. The block is lost only if nodes X, Y, and Z
fail simultaneously. Let's assume these failures are independent. P(X and Y and Z) = P(X) P(Y) P(Z)
= 1 in a billion.
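To make that arithmetic concrete, here is a minimal Python sketch. The names P_NODE_FAIL and block_loss_probability are illustrative only, and node failures are assumed independent, as stated above.

```python
from fractions import Fraction

# Assumed daily failure probability of a single node (1/1000, from the text).
P_NODE_FAIL = Fraction(1, 1000)


def block_loss_probability(p_fail: Fraction, replicas: int = 3) -> Fraction:
    """Probability that every replica of one block fails on the same day,
    assuming the node failures are independent."""
    return p_fail ** replicas


print(block_loss_probability(P_NODE_FAIL))  # 1/1000000000
```

Exact fractions are used so the "1 in a billion" figure comes out without floating-point rounding.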
PROPOSED SCHEME: Well, the probability is the same.
So, given a specific block, we don't change the probability it is lost.
What you seem to be calculating is the probability that three nodes go down out of N nodes:
P(nodes X, Y, and Z fail for any three distinct X, Y, Z) = 1 - P(N-3 nodes stay up) = 1 -
[999/1000]^[N-3]
Sure enough, if you use a small subset (N=40 maybe), then the probability of 3 nodes failing
within that subset is smaller than it is for the whole cluster.
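As a rough check of the claim that this quantity grows with N, the sketch below evaluates the expression 1 - (999/1000)^(N-3) as written in this comment, next to a standard binomial tail for "3 or more of N nodes fail". The function names are made up for illustration, and independent failures with p = 1/1000 are assumed.

```python
from math import comb


def expr_from_comment(n: int, p: float = 1 / 1000) -> float:
    """1 - (1 - p)^(n - 3), the expression as written in the comment."""
    return 1 - (1 - p) ** (n - 3)


def at_least_three_fail(n: int, p: float = 1 / 1000) -> float:
    """Binomial tail: probability that 3 or more of n independent nodes fail."""
    return 1 - sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(3))


# Compare a small subset (N=40) with a larger, hypothetical cluster (N=1000).
for n in (40, 1000):
    print(n, expr_from_comment(n), at_least_three_fail(n))
```

Both quantities are smaller for N=40 than for N=1000, which is the effect described above. Neither says anything yet about whether the failed nodes actually share a block.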
However, that's not the number you want! You want the probability that *any* block is lost
when three nodes go down. That is, P(nodes X, Y, and Z fail for any three distinct X, Y,
Z and X, Y, Z share at least one common block) (call this P_1). Assuming that overlapping
blocks, node death, and subset of nodes are all independent, you get:
P_1 = P(three nodes having at least one common block) * P(3-node death) * (# of distinct 3-node
subsets)
The first number is decreasing with N, the second is constant with N, the third is increasing
with N. The third is a well-known formula, while I don't have a good formula for the first
value. Unless you can calculate or estimate the first, I don't think you can really say anything
about decreasing the value of P_1.
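For anyone who wants to play with the three factors, here is a hedged Python sketch. The third factor is the binomial coefficient C(N, 3). The first factor is NOT derived in this comment; the sketch fills it in with a simplifying assumption of my own, namely that each block picks its 3 nodes uniformly at random from all 3-node subsets (real HDFS placement is rack-aware, so this is only a toy model). The names p_common_block and p1_estimate and the block count of one million are hypothetical.

```python
from math import comb, expm1, log1p


def p_common_block(n: int, blocks: int) -> float:
    """Chance that one given 3-node subset holds all replicas of at least one
    block. ASSUMPTION: each block picks its 3 nodes uniformly at random.
    Requires n > 3. Equals 1 - (1 - 1/C(n,3))^blocks, computed stably."""
    return -expm1(blocks * log1p(-1 / comb(n, 3)))


def p1_estimate(n: int, blocks: int, p_fail: float = 1 / 1000) -> float:
    """P_1 as the product of the three factors named in the comment."""
    return p_common_block(n, blocks) * p_fail**3 * comb(n, 3)


# Hypothetical cluster sizes and block count, for illustration only.
for n in (40, 400, 4000):
    print(n, comb(n, 3), p_common_block(n, 1_000_000), p1_estimate(n, 1_000_000))
```

Under this toy model the first factor falls and the third rises as N grows, as described above, so the direction of P_1 depends on how many blocks there are relative to C(N, 3). With a single block the product collapses to p^3, which matches the per-block result.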
I *think* we are incorrectly assuming that the probability of data loss is proportional
to the probability of 3 machines in a subset being lost, without taking into account the probability
of common blocks. The probabilities get tricky, hence my asking for someone to sketch it
out mathematically...
> Intelligent block placement policy to decrease probability of block loss
> 
>
> Key: HDFS-1094
> URL: https://issues.apache.org/jira/browse/HDFS-1094
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
>
> The current HDFS implementation specifies that the first replica is local and the other
> two replicas are on any two random nodes on a random remote rack. This means that if any three
> datanodes die together, then there is a non-trivial probability of losing at least one block
> in the cluster. This JIRA is to discuss whether there is a better algorithm that can lower the
> probability of losing a block.

