hadoop-hdfs-issues mailing list archives

From "Pritam Damania (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-2139) Fast copy for HDFS.
Date Mon, 11 Jul 2011 19:32:00 GMT
Fast copy for HDFS.

                 Key: HDFS-2139
                 URL: https://issues.apache.org/jira/browse/HDFS-2139
             Project: Hadoop HDFS
          Issue Type: New Feature
            Reporter: Pritam Damania

There is a need to perform fast file copy on HDFS. The fast copy mechanism for a file works as
follows:

1) Query metadata for all blocks of the source file.

2) For each block 'b' of the file, find out its datanode locations.

3) For each block of the file, add an empty block to the namesystem for
the destination file.

4) For each location of the block, instruct the datanode to make a local
copy of that block.

5) Once each datanode has copied its respective blocks, it
reports the new copies to the namenode.

6) Wait for all blocks to be copied and exit.

This would speed up the copying process considerably by eliminating data
transfers across top-of-rack switches, since each block is copied locally
on the datanode that already holds it.
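
The six steps above can be sketched as a small in-memory simulation. This is
only an illustration of the control flow, not the HDFS implementation: the
class FastCopySketch, its maps, and the methods addBlock, storeReplica and
fastCopy are all hypothetical names, and the "datanodes" are plain Java maps
rather than real servers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of the proposal; none of these names are HDFS APIs.
public class FastCopySketch {
    // Toy namenode state: file -> ordered block ids, block id -> datanodes with a replica.
    static final Map<String, List<Long>> fileBlocks = new HashMap<>();
    static final Map<Long, Set<String>> blockLocations = new HashMap<>();
    // Toy datanode state: datanode name -> (block id -> block contents).
    static final Map<String, Map<Long, byte[]>> datanodeStorage = new HashMap<>();
    static long nextBlockId = 1;

    // Adds an empty block (no replicas yet) to the end of a file.
    static long addBlock(String file) {
        long id = nextBlockId++;
        fileBlocks.computeIfAbsent(file, f -> new ArrayList<>()).add(id);
        blockLocations.put(id, new HashSet<>());
        return id;
    }

    // Stores a replica on a datanode and records ("reports") its location.
    static void storeReplica(String datanode, long blockId, byte[] data) {
        datanodeStorage.computeIfAbsent(datanode, d -> new HashMap<>()).put(blockId, data);
        blockLocations.get(blockId).add(datanode);
    }

    static void fastCopy(String src, String dst) {
        // Step 1: query metadata for all blocks of the source file.
        List<Long> srcBlocks =
            new ArrayList<>(fileBlocks.getOrDefault(src, Collections.emptyList()));
        fileBlocks.computeIfAbsent(dst, f -> new ArrayList<>());
        for (long srcBlock : srcBlocks) {
            // Step 2: find the datanode locations of this block.
            Set<String> locations = new HashSet<>(blockLocations.get(srcBlock));
            // Step 3: add an empty block to the destination file.
            long dstBlock = addBlock(dst);
            for (String datanode : locations) {
                // Step 4: the datanode makes a local copy of the block.
                byte[] data = datanodeStorage.get(datanode).get(srcBlock);
                // Step 5: the new replica is reported to the namenode.
                storeReplica(datanode, dstBlock, data.clone());
            }
        }
        // Step 6: in this sketch every copy is synchronous, so there is nothing to wait for.
    }

    public static void main(String[] args) {
        long block = addBlock("/src");
        storeReplica("dn1", block, new byte[] {1, 2, 3});
        storeReplica("dn2", block, new byte[] {1, 2, 3});
        fastCopy("/src", "/dst");
        long copied = fileBlocks.get("/dst").get(0);
        System.out.println("Replica locations of copy: " + blockLocations.get(copied));
    }
}
```

Note that no block data moves between datanodes in this model: each copy is
created on the node that already holds the source replica, which is the point
of the proposal.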

Note: A further improvement would be to instruct the datanode to create a
hard link to the block file when the block is being copied on the same datanode.
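
As a rough illustration of the hard link idea, the sketch below uses the
standard java.nio.file API to link one file to another and falls back to a
full copy if linking fails. The class BlockLinkSketch and the method
linkOrCopy are hypothetical names, not part of HDFS, and the sketch assumes
the block file is never modified in place afterwards, since a hard link
shares the underlying data.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not an HDFS datanode API.
public class BlockLinkSketch {
    // Returns true if a hard link was created, false if the file was copied instead.
    static boolean linkOrCopy(Path srcBlockFile, Path dstBlockFile) throws IOException {
        try {
            // Files.createLink(link, existing) creates a new directory entry for existing.
            Files.createLink(dstBlockFile, srcBlockFile);
            return true;
        } catch (UnsupportedOperationException | IOException e) {
            // Hard links unsupported (or linking failed): fall back to a byte copy.
            Files.copy(srcBlockFile, dstBlockFile);
            return false;
        }
    }
}
```

A hard link requires both paths to be on the same file system, so a datanode
with several storage volumes would presumably need to place the new block on
the same volume as the source block for this to apply.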

