hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7891) A block placement policy with best fault tolerance
Date Thu, 19 Mar 2015 21:56:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370205#comment-14370205
] 

Zhe Zhang commented on HDFS-7891:
---------------------------------

Great analysis [~walter.k.su]! 

It seems the 003 patch has removed {{rack2hosts}}. So the {{sortedRack}} results were obtained
with the 002 patch, right?

Conceptually, I think the {{sortedRack}} method makes sense for EC. It essentially introduces
2 levels of random selection: first choosing a rack and then choosing a node in the selected
rack. This is much more efficient than selecting a random node from the entire cluster with
the rack constraint. In particular, in a typical setup, the number of racks in the cluster
should be close to the EC width. So the choice of racks should be easy (e.g., choosing 14
from 15). Its performance benefit should be even larger if you have more nodes per rack, like
20. 
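
To make the two-level idea concrete, here is a minimal standalone sketch. It is not the code from either patch and does not use the real HDFS placement classes; the class name {{TwoLevelPlacementSketch}}, the method {{chooseTargets}}, and the plain rack-to-nodes map are all hypothetical, and it only covers the case where the EC width is no larger than the number of non-empty racks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical illustration only; not taken from HDFS-7891.002/003.
public class TwoLevelPlacementSketch {

    /**
     * Picks exactly {@code width} nodes, each from a different rack.
     * Assumes width <= number of non-empty racks (e.g. 14 out of 15).
     */
    static List<String> chooseTargets(Map<String, List<String>> rackToNodes,
                                      int width, Random rng) {
        // Collect the racks that can actually host a block.
        List<String> racks = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : rackToNodes.entrySet()) {
            if (!e.getValue().isEmpty()) {
                racks.add(e.getKey());
            }
        }
        if (width < 0 || width > racks.size()) {
            throw new IllegalArgumentException(
                "sketch only covers 0 <= width <= number of non-empty racks");
        }

        // Level 1: choose the racks. Shuffling and taking a prefix gives
        // `width` distinct racks without any retry loop.
        Collections.shuffle(racks, rng);

        // Level 2: choose one random node inside each selected rack.
        List<String> targets = new ArrayList<>(width);
        for (String rack : racks.subList(0, width)) {
            List<String> nodes = rackToNodes.get(rack);
            targets.add(nodes.get(rng.nextInt(nodes.size())));
        }
        return targets;
    }

    public static void main(String[] args) {
        // 15 racks with 20 nodes each, matching the numbers in the comment.
        Map<String, List<String>> cluster = new TreeMap<>();
        for (int r = 0; r < 15; r++) {
            List<String> nodes = new ArrayList<>();
            for (int n = 0; n < 20; n++) {
                nodes.add("rack" + r + "-node" + n);
            }
            cluster.put("rack" + r, nodes);
        }
        System.out.println(chooseTargets(cluster, 14, new Random()));
    }
}
```

The point of the sketch is that the rack constraint is satisfied by construction: no candidate node is ever drawn and then rejected because its rack is already used, which is where picking random nodes from the whole cluster would tend to waste work when the width is close to the rack count.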

So I think the question is whether we can have a simpler implementation of the {{sortedRack}}
method, without duplicating a lot of code.

> A block placement policy with best fault tolerance
> --------------------------------------------------
>
>                 Key: HDFS-7891
>                 URL: https://issues.apache.org/jira/browse/HDFS-7891
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Walter Su
>            Assignee: Walter Su
>         Attachments: HDFS-7891.002.patch, HDFS-7891.003.patch, HDFS-7891.patch, PlacementPolicyBenchmark.txt,
testresult.txt
>
>
> a block placement policy tries its best to place replicas to most racks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
