hadoop-hdfs-dev mailing list archives

From "Linxiao Jin (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-8878) An HDFS built-in DistCp
Date Fri, 07 Aug 2015 18:45:45 GMT
Linxiao Jin created HDFS-8878:

             Summary: An HDFS built-in DistCp 
                 Key: HDFS-8878
                 URL: https://issues.apache.org/jira/browse/HDFS-8878
             Project: Hadoop HDFS
          Issue Type: New Feature
            Reporter: Linxiao Jin
            Assignee: Linxiao Jin

For now, we use DistCp to copy directories, which works quite well. However, it would be
better to have an efficient directory copy tool built into HDFS. It could be faster by
cutting out the redundant communication between HDFS, YARN and MapReduce. It would also free
the resources that DistCp consumes in the job tracker and YARN, and it would be easier to debug.

We need more discussion on a new protocol between the NN and the DNs of different clusters to achieve
HDFS-level command sending and data transfer. One available hacky solution could be: the srcNN
gets the block distribution of the target file, then asks each datanode to start a DFSClient and
copy its local block, read via short-circuit, as a file in the dst cluster. After all the block files in
the dst cluster are complete, a DFSClient concatenates them to form the destination
file. A more optimized solution might implement a newly designed protocol for communicating
across clusters, rather than going through DFSClient, and use methods from lower layers.
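
The split, per-block copy and concat steps described above can be illustrated with a
minimal local simulation. This is only a sketch under stated assumptions: it runs on ordinary
local files and uses no HDFS API, a thread pool stands in for the datanodes, and the names
block_ranges, copy_block, concat_parts, block_level_copy and BLOCK_SIZE are hypothetical
helpers invented for illustration. A real HDFS concat has additional preconditions that are
not modeled here.

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4  # hypothetical tiny block size, for illustration only


def block_ranges(length, block_size):
    """Return (offset, size) pairs covering a file of the given length."""
    return [(off, min(block_size, length - off))
            for off in range(0, length, block_size)]


def copy_block(src_path, dst_dir, index, offset, size):
    """Stand-in for one datanode copying its local block as a separate file."""
    part_path = os.path.join(dst_dir, f"part-{index:05d}")
    with open(src_path, "rb") as src, open(part_path, "wb") as part:
        src.seek(offset)
        part.write(src.read(size))
    return part_path


def concat_parts(part_paths, target_path):
    """Stand-in for the final concat: join the block files in block order."""
    with open(target_path, "wb") as target:
        for part_path in part_paths:
            with open(part_path, "rb") as part:
                target.write(part.read())
            os.remove(part_path)  # the parts are consumed by the concat


def block_level_copy(src_path, dst_dir, target_name, block_size=BLOCK_SIZE):
    """Copy src_path into dst_dir block by block, then concat the parts."""
    ranges = block_ranges(os.path.getsize(src_path), block_size)
    with ThreadPoolExecutor() as pool:  # each worker plays one datanode
        futures = [pool.submit(copy_block, src_path, dst_dir, i, off, size)
                   for i, (off, size) in enumerate(ranges)]
        part_paths = [f.result() for f in futures]  # keeps block order
    target_path = os.path.join(dst_dir, target_name)
    concat_parts(part_paths, target_path)
    return target_path
```

The block copies are independent of each other, so they can run in parallel; only the final
concat needs all of them to have finished and needs to know the block order.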

This message was sent by Atlassian JIRA
