hadoop-hdfs-issues mailing list archives

From "Rodrigo Schmidt (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss
Date Mon, 19 Jul 2010 01:51:56 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889732#action_12889732
] 

Rodrigo Schmidt commented on HDFS-1094:
---------------------------------------

I got permission from Alex Smith to post his script that calculates probabilities here. I'll
do that in a minute, together with a wrapper I wrote that, given some cluster configurations,
displays the data loss probabilities of the following approaches:

- DEFAULT: The default block placement policy used by HDFS
- RING GROUPS: The algorithm I described
- DISJOINT GROUPS: Separating the cluster into disjoint node groups, as Joy proposed.

Both RING and DISJOINT depend on windows of choice (node groups), which I define in two dimensions:
racks and machines_within_rack. Thus, the total size of a window is racks * machines_within_rack
machines.
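
As a rough illustration of the window dimensions, here is a minimal Python sketch. The function
names are hypothetical and not taken from Alex's script, and the group count assumes that
DISJOINT tiles the cluster evenly with non-overlapping windows.

```python
def window_size(racks: int, machines_within_rack: int) -> int:
    """Total number of machines in one window of choice."""
    return racks * machines_within_rack


def disjoint_group_count(cluster_racks: int, machines_per_rack: int,
                         racks: int, machines_within_rack: int) -> int:
    """Number of non-overlapping windows (assumption: they tile the cluster evenly)."""
    if cluster_racks % racks or machines_per_rack % machines_within_rack:
        raise ValueError("window does not tile the cluster evenly")
    return (cluster_racks // racks) * (machines_per_rack // machines_within_rack)


# 6 racks of 20 machines, window = 2 racks, 5 machines
print(window_size(2, 5))                  # 10
print(disjoint_group_count(6, 20, 2, 5))  # 12
```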

Below you will find some numbers. You might notice that the difference between RING and DISJOINT
is not as big as one might expect. Assuming the scripts are right, here is a quick
explanation.

RING doesn't have as much freedom in selecting triples inside the window as DISJOINT has.
In RING, the first replica is fixed and defines the group. So, the space of choice is limited
to only two replicas that must be in the same rack.

DISJOINT allows the primary replica to be in any rack in the window, allowing more triples.

This was part of my confusion in understanding the 'node groups' definition and
math, which did not take this factor into consideration.
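
The difference in freedom can be sketched by counting replica triples inside a single window.
This is only an illustration under stated assumptions, not the logic of the actual scripts: it
assumes one replica sits in one rack and the other two share a different rack of the same
window, and both function names are hypothetical.

```python
from itertools import combinations
from math import comb


def triples_with_fixed_primary(racks: int, machines: int) -> int:
    """Choices left once the primary replica is fixed.

    Assumption: the other two replicas share one of the remaining
    (racks - 1) racks of the window.
    """
    return (racks - 1) * comb(machines, 2)


def triples_with_free_primary(racks: int, machines: int) -> int:
    """Distinct triples when the primary may sit in any rack of the window.

    Brute force over all node triples, keeping those that span exactly
    two racks (one replica in one rack, two in the other).
    """
    nodes = [(r, m) for r in range(racks) for m in range(machines)]
    count = 0
    for triple in combinations(nodes, 3):
        racks_used = {r for r, _ in triple}
        if len(racks_used) == 2:
            count += 1
    return count


# window = 2 racks, 5 machines
print(triples_with_fixed_primary(2, 5))  # 10
print(triples_with_free_primary(2, 5))   # 100
```

The counts only show how much larger the space of triples becomes when the primary is free to
move. They say nothing by themselves about the loss probabilities listed below.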

Here are the numbers:

=====  6 racks of 20 machines = 120 machines =====

DEFAULT => 0.00010667

RING GROUPS (window = 2 racks, 5 machines) => 2.37756e-06
DISJOINT GROUPS (window = 2 racks, 5 machines) => 1.19392e-06

RING GROUPS (window = 2 racks, 10 machines) => 1.04558e-05
DISJOINT GROUPS (window = 2 racks, 10 machines) => 5.3346e-06

RING GROUPS (window = 2 racks, 20 machines) => 3.97347e-05
DISJOINT GROUPS (window = 2 racks, 20 machines) => 2.22069e-05

RING GROUPS (window = 3 racks, 5 machines) => 4.77748e-06
DISJOINT GROUPS (window = 3 racks, 5 machines) => 2.38034e-06

RING GROUPS (window = 3 racks, 10 machines) => 2.01693e-05
DISJOINT GROUPS (window = 3 racks, 10 machines) => 1.06015e-05

RING GROUPS (window = 3 racks, 20 machines) => 6.95331e-05
DISJOINT GROUPS (window = 3 racks, 20 machines) => 4.38487e-05

RING GROUPS (window = 6 racks, 5 machines) => 1.15237e-05
DISJOINT GROUPS (window = 6 racks, 5 machines) => 5.91359e-06

RING GROUPS (window = 6 racks, 10 machines) => 4.57052e-05
DISJOINT GROUPS (window = 6 racks, 10 machines) => 2.61534e-05

RING GROUPS (window = 6 racks, 20 machines) => 0.00010667
DISJOINT GROUPS (window = 6 racks, 20 machines) => 0.00010667

=====  50 racks of 20 machines = 1000 machines =====

DEFAULT => 0.00584495

RING GROUPS (window = 2 racks, 5 machines) => 1.99588e-05
DISJOINT GROUPS (window = 2 racks, 5 machines) => 9.94932e-06

RING GROUPS (window = 2 racks, 10 machines) => 8.92791e-05
DISJOINT GROUPS (window = 2 racks, 10 machines) => 4.44541e-05

RING GROUPS (window = 2 racks, 20 machines) => 0.000369203
DISJOINT GROUPS (window = 2 racks, 20 machines) => 0.000185042

RING GROUPS (window = 5 racks, 5 machines) => 7.92698e-05
DISJOINT GROUPS (window = 5 racks, 5 machines) => 3.94854e-05

RING GROUPS (window = 5 racks, 10 machines) => 0.000348759
DISJOINT GROUPS (window = 5 racks, 10 machines) => 0.000174956

RING GROUPS (window = 5 racks, 20 machines) => 0.00133532
DISJOINT GROUPS (window = 5 racks, 20 machines) => 0.000716171

RING GROUPS (window = 10 racks, 5 machines) => 0.000174504
DISJOINT GROUPS (window = 10 racks, 5 machines) => 8.85916e-05

RING GROUPS (window = 10 racks, 10 machines) => 0.000742117
DISJOINT GROUPS (window = 10 racks, 10 machines) => 0.00038451

RING GROUPS (window = 10 racks, 20 machines) => 0.00262672
DISJOINT GROUPS (window = 10 racks, 20 machines) => 0.00153868

=====  100 racks of 20 machines = 2000 machines =====

DEFAULT => 0.0160829

RING GROUPS (window = 2 racks, 5 machines) => 4.04848e-05
DISJOINT GROUPS (window = 2 racks, 5 machines) => 1.98985e-05

RING GROUPS (window = 2 racks, 10 machines) => 0.000181715
DISJOINT GROUPS (window = 2 racks, 10 machines) => 8.89063e-05

RING GROUPS (window = 2 racks, 20 machines) => 0.000737402
DISJOINT GROUPS (window = 2 racks, 20 machines) => 0.000370051

RING GROUPS (window = 10 racks, 5 machines) => 0.000352741
DISJOINT GROUPS (window = 10 racks, 5 machines) => 0.000177175

RING GROUPS (window = 10 racks, 10 machines) => 0.00151145
DISJOINT GROUPS (window = 10 racks, 10 machines) => 0.000768873

RING GROUPS (window = 10 racks, 20 machines) => 0.00550483
DISJOINT GROUPS (window = 10 racks, 20 machines) => 0.00307498

RING GROUPS (window = 25 racks, 5 machines) => 0.000906714
DISJOINT GROUPS (window = 25 racks, 5 machines) => 0.000452064
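
To make the gap between RING and DISJOINT concrete, the ratio of the two probabilities can be
computed for a few of the rows above. This is a small sketch that only re-uses numbers already
listed here. The row selection is arbitrary.

```python
# (cluster machines, window racks, window machines, RING, DISJOINT)
rows = [
    (120, 2, 5, 2.37756e-06, 1.19392e-06),
    (1000, 2, 5, 1.99588e-05, 9.94932e-06),
    (2000, 2, 5, 4.04848e-05, 1.98985e-05),
    (2000, 10, 20, 0.00550483, 0.00307498),
]

# How many times more likely data loss is under RING than under DISJOINT
ratios = [ring / disjoint for *_, ring, disjoint in rows]

for (n, wr, wm, _ring, _disjoint), ratio in zip(rows, ratios):
    print(f"{n} machines, window {wr} racks x {wm} machines: RING/DISJOINT = {ratio:.2f}")
```

For these rows the ratio stays at roughly a factor of two.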




> Intelligent block placement policy to decrease probability of block loss
> ------------------------------------------------------------------------
>
>                 Key: HDFS-1094
>                 URL: https://issues.apache.org/jira/browse/HDFS-1094
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Rodrigo Schmidt
>         Attachments: prob.pdf, prob.pdf
>
>
> The current HDFS implementation specifies that the first replica is local and the other
> two replicas are on any two random nodes on a random remote rack. This means that if any three
> datanodes die together, then there is a non-trivial probability of losing at least one block
> in the cluster. This JIRA is to discuss whether there is a better algorithm that can lower the
> probability of losing a block.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

