hadoop-hdfs-issues mailing list archives

From "Tao Jie (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal
Date Wed, 04 Apr 2018 08:00:00 GMT

     [ https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Jie updated HDFS-13279:
---------------------------
    Attachment: HDFS-13279.005.patch

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> ----------------------------------------------------------------------
>
>                 Key: HDFS-13279
>                 URL: https://issues.apache.org/jira/browse/HDFS-13279
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.8.3, 3.0.0
>            Reporter: Tao Jie
>            Assignee: Tao Jie
>            Priority: Major
>         Attachments: HDFS-13279.001.patch, HDFS-13279.002.patch, HDFS-13279.003.patch,
HDFS-13279.004.patch, HDFS-13279.005.patch
>
>
> In a Hadoop cluster, the number of nodes per rack can differ. For example, with 50
datanodes in all and 15 datanodes per rack, only 5 nodes remain on the last rack. In
this situation, we find that storage usage on the last 5 nodes is much higher than on the
other nodes.
>  With the default block placement policy, the first replica of each block has the same
probability of being written to each datanode, but the 2nd/3rd replicas are much more
likely to be written to the last 5 nodes than to the other nodes.
>  Consider writing 50 blocks to these 50 datanodes. The first replicas of the 50 blocks
would be distributed equally across the 50 nodes. The 2nd replicas of the blocks whose 1st
replica is on rack1 (15 replicas) would be spread equally over the other 35 nodes, so each
of those nodes receives 15/35 = 0.43 replicas. The same holds for blocks on rack2 and rack3.
As a result, each node on rack4 (5 nodes) would receive 1.29 second replicas in all, while
every other node would receive 0.97.
> ||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
> |From rack1|-|15/35=0.43|0.43|0.43|
> |From rack2|0.43|-|0.43|0.43|
> |From rack3|0.43|0.43|-|0.43|
> |From rack4|5/45=0.11|0.11|0.11|-|
> |Total|0.97|0.97|0.97|1.29|
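
The per-node totals in the table can be checked with a short script. This is only a sketch of the simplified model in the description (one first replica per node, second replica sent to a node chosen uniformly from all nodes outside the source rack); it is not the HDFS placement code, and the function name `expected_second_replicas` is made up for illustration.

```python
from fractions import Fraction


def expected_second_replicas(rack_sizes):
    """Expected number of second replicas received per node, for each rack.

    Assumed model (from the issue description, not from HDFS source):
    every node holds the first replica of exactly one block, and the
    second replica goes to a node picked uniformly at random from all
    nodes that are not on the source rack.
    """
    total = sum(rack_sizes)
    per_node = []
    for target in range(len(rack_sizes)):
        load = Fraction(0)
        for source, size in enumerate(rack_sizes):
            if source != target:
                # `size` first replicas sit on the source rack; their second
                # replicas are spread over the `total - size` remote nodes.
                load += Fraction(size, total - size)
        per_node.append(load)
    return per_node


loads = expected_second_replicas([15, 15, 15, 5])
for rack, load in enumerate(loads, start=1):
    print(f"rack{rack}: {float(load):.2f} second replicas per node")
# rack1: 0.97 second replicas per node
# rack2: 0.97 second replicas per node
# rack3: 0.97 second replicas per node
# rack4: 1.29 second replicas per node
```

With equal rack sizes, for example `[15, 15, 15, 15]`, the same function returns exactly 1 for every rack, which is why the imbalance only shows up when one rack is smaller.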



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

