hadoop-hdfs-issues mailing list archives

From "Ruslan Dautkhanov (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-13992) cross-cluster rack awareness for distcp
Date Mon, 15 Oct 2018 00:29:00 GMT
Ruslan Dautkhanov created HDFS-13992:

             Summary: cross-cluster rack awareness for distcp
                 Key: HDFS-13992
                 URL: https://issues.apache.org/jira/browse/HDFS-13992
             Project: Hadoop HDFS
          Issue Type: New Feature
    Affects Versions: 2.7.7, 3.0.3, 3.1.1, 2.8.4
            Reporter: Ruslan Dautkhanov

It would be great if distcp supported cross-cluster rack awareness.

For example, we have HDFS cluster1 and HDFS cluster2.
Both clusters span three switches, and both have rack awareness enabled.
Both clusters also name the same switches the same way.
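
To illustrate what naming the same switches the same way could look like: Hadoop rack awareness is commonly driven by a topology script (set through `net.topology.script.file.name`) that receives host addresses as arguments and prints one rack path per address. Below is a minimal sketch of such a script that both clusters could share. The subnets, rack names, and the `resolve_rack` helper are made up for illustration and are not taken from the reporter's setup.

```python
#!/usr/bin/env python3
"""Hypothetical topology script shared by cluster1 and cluster2."""
import sys

# Assumed mapping of subnet prefix -> rack (switch) name.
# These values are illustrative only, not from the original report.
SUBNET_TO_RACK = {
    "10.0.1.": "/switch-a",
    "10.0.2.": "/switch-b",
    "10.0.3.": "/switch-c",
}
DEFAULT_RACK = "/default-rack"


def resolve_rack(address):
    """Return the rack path for one host address, or the default rack."""
    for prefix, rack in SUBNET_TO_RACK.items():
        if address.startswith(prefix):
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # Print one rack path per address passed on the command line.
    for address in sys.argv[1:]:
        print(resolve_rack(address))
```

If both clusters resolve racks through the same mapping, a datanode in cluster1 and a datanode in cluster2 that hang off the same switch report the same rack path, which is the precondition for the matching described in this request.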

So when distcp runs a data replication job, it could replicate HDFS blocks
only to counterpart datanodes on the destination cluster that are behind the same physical network
switch, minimizing latency and maximizing bandwidth.
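
The matching idea can be sketched as follows. This is not how distcp behaves today (that is the point of this request); it is only a standalone illustration of the rack-matching logic. The function names `group_by_rack` and `pick_destination`, the node names, and the fallback behavior are all assumptions.

```python
from collections import defaultdict


def group_by_rack(node_to_rack):
    """Invert {datanode: rack} into {rack: [datanodes]} (nodes sorted by name)."""
    racks = defaultdict(list)
    for node, rack in sorted(node_to_rack.items()):
        racks[rack].append(node)
    return racks


def pick_destination(source_node, source_racks, dest_nodes_by_rack):
    """Prefer a destination datanode whose rack name matches the source's rack.

    Falls back to any destination node if no counterpart rack exists.
    Returns a (node, same_rack) tuple.
    """
    rack = source_racks[source_node]
    candidates = dest_nodes_by_rack.get(rack)
    if candidates:
        return candidates[0], True
    all_nodes = sorted(n for nodes in dest_nodes_by_rack.values() for n in nodes)
    if not all_nodes:
        raise ValueError("destination cluster has no datanodes")
    return all_nodes[0], False


# Hypothetical clusters that use the same rack names for the same switches.
cluster1 = {"c1-dn1": "/switch-a", "c1-dn2": "/switch-b", "c1-dn3": "/switch-c"}
cluster2 = {"c2-dn1": "/switch-a", "c2-dn2": "/switch-b", "c2-dn3": "/switch-c"}

dest_by_rack = group_by_rack(cluster2)
choice = pick_destination("c1-dn2", cluster1, dest_by_rack)
print(choice)
```

In this sketch a block held by `c1-dn2` is paired with `c2-dn2` because both sit under `/switch-b`, so the copy never has to cross between switches.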

It could be an option, activated through a `distcp` command-line switch.
We have multiple clusters with a default replication factor of 3, and all those clusters live in the same
three "racks" / "top-of-rack switches".

This could drastically reduce inter-switch network traffic during huge distcp jobs.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
