[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889732#action_12889732 ]
Rodrigo Schmidt commented on HDFS-1094:

I got permission from Alex Smith to post here his script that calculates the probabilities. I'll
do that in a minute, together with a wrapper I wrote that, given some cluster configurations,
displays the data loss probabilities of the following approaches:
- DEFAULT: The default block placement policy used by HDFS
- RING GROUPS: The algorithm I described
- DISJOINT GROUPS: Separating the cluster into disjoint node groups, as Joy proposed

Both RING and DISJOINT depend on windows of choice (node groups) that I define in two dimensions:
racks and machines_within_rack. Thus, the total size of the window is racks * machines_within_rack
machines.
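
To make the window arithmetic concrete, here is a minimal sketch in Python. It is not Alex's script or my wrapper; the function names are made up for illustration, and the group count assumes the disjoint groups tile the cluster evenly, which the configurations below happen to satisfy.

```python
def window_size(racks: int, machines_within_rack: int) -> int:
    """Total number of machines in one window (node group)."""
    return racks * machines_within_rack


def disjoint_group_count(cluster_racks: int, machines_per_rack: int,
                         racks: int, machines_within_rack: int) -> int:
    """Number of disjoint groups, assuming the windows tile the cluster exactly."""
    if cluster_racks % racks or machines_per_rack % machines_within_rack:
        raise ValueError("window does not divide the cluster evenly")
    return (cluster_racks // racks) * (machines_per_rack // machines_within_rack)


# 6 racks of 20 machines, window = 2 racks, 5 machines
print(window_size(2, 5))                 # 10
print(disjoint_group_count(6, 20, 2, 5)) # 12
```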
Below you will find some numbers. You might notice that the difference between RING and DISJOINT
is not as big as one could possibly think. Assuming the scripts are right, here is a quick
explanation.
RING doesn't have as much freedom in selecting triples inside the window as DISJOINT has.
In RING, the first replica is fixed and defines the group, so the only choice left is where
to put the other two replicas, which must be in the same rack.
DISJOINT allows the primary replica to be in any rack in the window, which allows more triples.
This was part of the confusion I was having understanding the 'node groups' definition and
math, since it was not taking this factor into consideration.
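
The difference in freedom can be illustrated by brute-force enumeration of the placements available inside a single window. This is only a sketch under assumptions that are not spelled out in this comment: the window is modelled as a grid of (rack, machine) slots, the first replica's rack is taken to be part of the window, and the other two replicas must share one rack different from the first replica's. The counts are placement choices per window, not the data loss probabilities listed below.

```python
from itertools import combinations


def remote_rack_pairs(first_rack, racks, machines_within_rack):
    """Pairs of nodes that share one rack of the window other than first_rack."""
    pairs = []
    for rack in range(racks):
        if rack == first_rack:
            continue
        nodes = [(rack, m) for m in range(machines_within_rack)]
        pairs.extend(combinations(nodes, 2))
    return pairs


def ring_choices(racks, machines_within_rack):
    """RING: the first replica is fixed (here at slot (0, 0)); only the pair varies."""
    first = (0, 0)
    return [(first, a, b)
            for a, b in remote_rack_pairs(0, racks, machines_within_rack)]


def disjoint_choices(racks, machines_within_rack):
    """DISJOINT: the primary replica may sit on any node of the window."""
    triples = []
    for rack in range(racks):
        for m in range(machines_within_rack):
            for a, b in remote_rack_pairs(rack, racks, machines_within_rack):
                triples.append(((rack, m), a, b))
    return triples


# window = 2 racks, 5 machines
print(len(ring_choices(2, 5)))      # 10
print(len(disjoint_choices(2, 5)))  # 100
```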
Here are the numbers:
===== 6 racks of 20 machines = 120 machines =====
DEFAULT => 0.00010667
RING GROUPS (window = 2 racks, 5 machines) => 2.37756e-06
DISJOINT GROUPS (window = 2 racks, 5 machines) => 1.19392e-06
RING GROUPS (window = 2 racks, 10 machines) => 1.04558e-05
DISJOINT GROUPS (window = 2 racks, 10 machines) => 5.3346e-06
RING GROUPS (window = 2 racks, 20 machines) => 3.97347e-05
DISJOINT GROUPS (window = 2 racks, 20 machines) => 2.22069e-05
RING GROUPS (window = 3 racks, 5 machines) => 4.77748e-06
DISJOINT GROUPS (window = 3 racks, 5 machines) => 2.38034e-06
RING GROUPS (window = 3 racks, 10 machines) => 2.01693e-05
DISJOINT GROUPS (window = 3 racks, 10 machines) => 1.06015e-05
RING GROUPS (window = 3 racks, 20 machines) => 6.95331e-05
DISJOINT GROUPS (window = 3 racks, 20 machines) => 4.38487e-05
RING GROUPS (window = 6 racks, 5 machines) => 1.15237e-05
DISJOINT GROUPS (window = 6 racks, 5 machines) => 5.91359e-06
RING GROUPS (window = 6 racks, 10 machines) => 4.57052e-05
DISJOINT GROUPS (window = 6 racks, 10 machines) => 2.61534e-05
RING GROUPS (window = 6 racks, 20 machines) => 0.00010667
DISJOINT GROUPS (window = 6 racks, 20 machines) => 0.00010667
===== 50 racks of 20 machines = 1000 machines =====
DEFAULT => 0.00584495
RING GROUPS (window = 2 racks, 5 machines) => 1.99588e-05
DISJOINT GROUPS (window = 2 racks, 5 machines) => 9.94932e-06
RING GROUPS (window = 2 racks, 10 machines) => 8.92791e-05
DISJOINT GROUPS (window = 2 racks, 10 machines) => 4.44541e-05
RING GROUPS (window = 2 racks, 20 machines) => 0.000369203
DISJOINT GROUPS (window = 2 racks, 20 machines) => 0.000185042
RING GROUPS (window = 5 racks, 5 machines) => 7.92698e-05
DISJOINT GROUPS (window = 5 racks, 5 machines) => 3.94854e-05
RING GROUPS (window = 5 racks, 10 machines) => 0.000348759
DISJOINT GROUPS (window = 5 racks, 10 machines) => 0.000174956
RING GROUPS (window = 5 racks, 20 machines) => 0.00133532
DISJOINT GROUPS (window = 5 racks, 20 machines) => 0.000716171
RING GROUPS (window = 10 racks, 5 machines) => 0.000174504
DISJOINT GROUPS (window = 10 racks, 5 machines) => 8.85916e-05
RING GROUPS (window = 10 racks, 10 machines) => 0.000742117
DISJOINT GROUPS (window = 10 racks, 10 machines) => 0.00038451
RING GROUPS (window = 10 racks, 20 machines) => 0.00262672
DISJOINT GROUPS (window = 10 racks, 20 machines) => 0.00153868
===== 100 racks of 20 machines = 2000 machines =====
DEFAULT => 0.0160829
RING GROUPS (window = 2 racks, 5 machines) => 4.04848e-05
DISJOINT GROUPS (window = 2 racks, 5 machines) => 1.98985e-05
RING GROUPS (window = 2 racks, 10 machines) => 0.000181715
DISJOINT GROUPS (window = 2 racks, 10 machines) => 8.89063e-05
RING GROUPS (window = 2 racks, 20 machines) => 0.000737402
DISJOINT GROUPS (window = 2 racks, 20 machines) => 0.000370051
RING GROUPS (window = 10 racks, 5 machines) => 0.000352741
DISJOINT GROUPS (window = 10 racks, 5 machines) => 0.000177175
RING GROUPS (window = 10 racks, 10 machines) => 0.00151145
DISJOINT GROUPS (window = 10 racks, 10 machines) => 0.000768873
RING GROUPS (window = 10 racks, 20 machines) => 0.00550483
DISJOINT GROUPS (window = 10 racks, 20 machines) => 0.00307498
RING GROUPS (window = 25 racks, 5 machines) => 0.000906714
DISJOINT GROUPS (window = 25 racks, 5 machines) => 0.000452064
> Intelligent block placement policy to decrease probability of block loss
> 
>
> Key: HDFS-1094
> URL: https://issues.apache.org/jira/browse/HDFS-1094
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: dhruba borthakur
> Assignee: Rodrigo Schmidt
> Attachments: prob.pdf, prob.pdf
>
>
> The current HDFS implementation specifies that the first replica is local and the other
> two replicas are on any two random nodes on a random remote rack. This means that if any three
> datanodes die together, then there is a nontrivial probability of losing at least one block
> in the cluster. This JIRA is to discuss if there is a better algorithm that can lower probability
> of losing a block.
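
As a rough illustration of why the default policy exposes so many node triples, the following Python sketch counts the triples the policy can produce and estimates the chance that three simultaneous failures wipe out some block. It is a simplified model, not the script discussed in this thread, and it is not expected to reproduce the posted numbers. The assumptions are mine: writers are spread uniformly over the nodes (so every valid triple is equally likely for each block), blocks are placed independently, and exactly three nodes fail, chosen uniformly at random. The function names and the block count used below are hypothetical.

```python
from math import comb


def default_triples(racks: int, machines_per_rack: int) -> int:
    """Triples with one node on one rack and two nodes on a single other rack."""
    return racks * (racks - 1) * machines_per_rack * comb(machines_per_rack, 2)


def loss_probability(racks: int, machines_per_rack: int, blocks: int) -> float:
    """P(three uniformly chosen failed nodes hold all replicas of some block)."""
    nodes = racks * machines_per_rack
    usable = default_triples(racks, machines_per_rack)
    # The failed triple must have the 1 + 2 rack shape the policy produces ...
    p_shape = usable / comb(nodes, 3)
    # ... and at least one of the blocks must have landed exactly on it.
    p_hit = 1 - (1 - 1 / usable) ** blocks
    return p_shape * p_hit


print(default_triples(6, 20))            # 114000
print(loss_probability(6, 20, 100_000))  # hypothetical block count
```

Under this model the probability grows with the number of blocks and is capped by the fraction of all triples that the policy can use, which is why shrinking the set of usable triples (as RING and DISJOINT do) lowers it.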

