hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7891) A block placement policy with best fault tolerance
Date Thu, 19 Mar 2015 21:56:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370205#comment-14370205 ]

Zhe Zhang commented on HDFS-7891:

Great analysis [~walter.k.su]! 

It seems the 003 patch has removed {{rack2hosts}}, so the {{sortedRack}} results were obtained
with the 002 patch, right?

Conceptually, I think the {{sortedRack}} method makes sense for EC. It essentially introduces
2 levels of random selection: first choosing a rack and then choosing a node in the selected
rack. This is much more efficient than selecting a random node from the entire cluster with
the rack constraint. In particular, in a typical setup, the number of racks in the cluster
should be close to the EC width. So the choice of racks should be easy (e.g., choosing 14
from 15). Its performance benefit should be even larger if you have more nodes per rack, like

So I think the question is whether we can have a simpler implementation of the {{sortedRack}}
method, without duplicating a lot of code.
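
To make the 2-level selection above concrete, here is a minimal sketch in Python. It is not the code from any of the attached patches: the function name {{choose_targets_two_level}}, the {{rack_to_nodes}} dictionary and the node naming are all hypothetical, and real HDFS concerns (node load, storage types, excluded nodes) are ignored. It only illustrates picking racks first and then a node inside each picked rack.

```python
import random
from typing import Dict, List, Optional


def choose_targets_two_level(
    rack_to_nodes: Dict[str, List[str]],
    num_targets: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick up to num_targets nodes, spreading them over as many racks as possible.

    Hypothetical helper for illustration only; not the HDFS-7891 implementation.
    """
    rng = rng or random.Random()
    # Shuffled private copy of every non-empty rack, so pop() yields a random node.
    remaining = {
        rack: rng.sample(nodes, len(nodes))
        for rack, nodes in rack_to_nodes.items()
        if nodes
    }
    chosen: List[str] = []
    while len(chosen) < num_targets and remaining:
        # Level 1: visit the racks that still have unused nodes in random order.
        racks = list(remaining)
        rng.shuffle(racks)
        for rack in racks[: num_targets - len(chosen)]:
            # Level 2: take one random node from the selected rack.
            chosen.append(remaining[rack].pop())
            if not remaining[rack]:
                del remaining[rack]
    return chosen


if __name__ == "__main__":
    # Assumed cluster shape: 15 racks with 4 nodes each, EC width 14.
    cluster = {f"rack{r}": [f"r{r}-n{n}" for n in range(4)] for r in range(15)}
    print(choose_targets_two_level(cluster, 14))
```

In this sketch every target in the first pass lands on a different rack without any retries, which is the property the rack-level choice is meant to give. A second pass over the racks only happens when more targets are requested than there are racks.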

> A block placement policy with best fault tolerance
> --------------------------------------------------
>                 Key: HDFS-7891
>                 URL: https://issues.apache.org/jira/browse/HDFS-7891
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Walter Su
>            Assignee: Walter Su
>         Attachments: HDFS-7891.002.patch, HDFS-7891.003.patch, HDFS-7891.patch, PlacementPolicyBenchmark.txt,
> A block placement policy that tries its best to place replicas on as many racks as possible.

This message was sent by Atlassian JIRA
