hadoop-hdfs-issues mailing list archives

From "zhaoyunjiong (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-2139) Fast copy for HDFS.
Date Fri, 13 Jun 2014 05:55:03 GMT

     [ https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

zhaoyunjiong updated HDFS-2139:

    Attachment: HDFS-2139.patch

Thanks Guo Ruijing & Daryn Sharp for your time.
Updated the patch according to the comments:
1. Added clone to DistributedFileSystem.
2. Added block token checks.
3. Support cloning part of a file: the last block is still hardlinked, and truncateBlock is then
used to adjust the block size and meta file.
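The partial-clone rule in item 3 can be sketched as a planning step, assuming the clone covers the first clone_len bytes of the source: blocks lying entirely inside that length are hardlinked unchanged, and the block containing the cut point is hardlinked and then truncated, together with its checksum entries in the meta file. This is an illustration only, not code from the patch; BlockPlan, plan_partial_clone and the bytes_per_checksum parameter are hypothetical names, and the 512-byte default merely mirrors the usual dfs.bytes-per-checksum setting.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockPlan:
    src_index: int        # index of the source block being cloned
    length: int           # bytes of that block kept in the clone
    truncate: bool        # True if the hardlinked replica must be cut short
    checksum_chunks: int  # checksum entries to keep in the meta file


def plan_partial_clone(block_lengths: List[int], clone_len: int,
                       bytes_per_checksum: int = 512) -> List[BlockPlan]:
    """Plan a clone of the first clone_len bytes of a file (hypothetical helper)."""
    if not 0 <= clone_len <= sum(block_lengths):
        raise ValueError("clone length out of range")
    plan: List[BlockPlan] = []
    remaining = clone_len
    for i, blen in enumerate(block_lengths):
        if remaining == 0:
            break  # blocks past the cut point are not cloned at all
        keep = min(blen, remaining)
        chunks = -(-keep // bytes_per_checksum)  # ceiling division
        plan.append(BlockPlan(i, keep, keep < blen, chunks))
        remaining -= keep
    return plan


if __name__ == "__main__":
    # Three blocks of 128, 128 and 64 bytes; clone the first 200 bytes.
    for step in plan_partial_clone([128, 128, 64], 200, bytes_per_checksum=64):
        print(step)
```

In this example only the second block is marked for truncation (to 72 bytes); the first is linked as-is and the third is skipped.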

Yes, the DN enforces that UC (under-construction) blocks are never linked.

> Fast copy for HDFS.
> -------------------
>                 Key: HDFS-2139
>                 URL: https://issues.apache.org/jira/browse/HDFS-2139
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Pritam Damania
>         Attachments: HDFS-2139.patch, HDFS-2139.patch
>   Original Estimate: 168h
>  Remaining Estimate: 168h
> There is a need to perform fast file copy on HDFS. The fast copy mechanism for a file works
> as follows:
> 1) Query metadata for all blocks of the source file.
> 2) For each block 'b' of the file, find out its datanode locations.
> 3) For each block of the file, add an empty block to the namesystem for
> the destination file.
> 4) For each location of the block, instruct the datanode to make a local
> copy of that block.
> 5) Once each datanode has copied over its respective blocks, it
> reports them to the namenode.
> 6) Wait for all blocks to be copied and exit.
> This would speed up the copying process considerably by removing top-of-rack
> data transfers.
> Note: An extra improvement would be to instruct the datanode to create a
> hardlink of the block file if we are copying a block on the same datanode.
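
The six quoted steps can be modelled with a small in-memory sketch, assuming a single synchronous caller and no failures. NameNode, DataNode, add_empty_block, block_received and fast_copy are hypothetical names chosen for illustration, not HDFS or patch APIs; real datanodes would perform the copies asynchronously and report back to the namenode over RPC.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterator, List

# Hypothetical in-memory model of the fast copy steps; these are NOT HDFS classes.


@dataclass
class DataNode:
    name: str
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def copy_local(self, src_id: int, dst_id: int) -> None:
        # Step 4: node-local copy, so no block data crosses the network.
        # bytes are immutable, so sharing the object also models a hardlink.
        self.blocks[dst_id] = self.blocks[src_id]


@dataclass
class NameNode:
    files: Dict[str, List[int]] = field(default_factory=dict)
    locations: Dict[int, List[DataNode]] = field(default_factory=dict)
    next_id: Iterator[int] = field(default_factory=lambda: count(1000))

    def add_empty_block(self, path: str) -> int:
        # Step 3: add an empty block to the destination file.
        block_id = next(self.next_id)
        self.files[path].append(block_id)
        self.locations[block_id] = []
        return block_id

    def block_received(self, block_id: int, dn: DataNode) -> None:
        # Step 5: a datanode reports that it now holds the block.
        self.locations[block_id].append(dn)


def fast_copy(nn: NameNode, src: str, dst: str) -> None:
    src_blocks = list(nn.files[src])                # step 1: source metadata
    if dst in nn.files:
        raise FileExistsError(dst)
    nn.files[dst] = []
    for src_id in src_blocks:
        dst_id = nn.add_empty_block(dst)            # step 3
        for dn in nn.locations[src_id]:             # step 2: replica locations
            dn.copy_local(src_id, dst_id)           # step 4
            nn.block_received(dst_id, dn)           # step 5
    # Step 6: this model is synchronous, so "waiting" reduces to checking
    # that every new block has as many replicas as its source block.
    for src_id, dst_id in zip(src_blocks, nn.files[dst]):
        if len(nn.locations[dst_id]) != len(nn.locations[src_id]):
            raise RuntimeError(f"block {dst_id} is missing replicas")


if __name__ == "__main__":
    dn1 = DataNode("dn1", {1: b"hello", 2: b"world"})
    dn2 = DataNode("dn2", {1: b"hello"})
    nn = NameNode(files={"/a": [1, 2]}, locations={1: [dn1, dn2], 2: [dn1]})
    fast_copy(nn, "/a", "/b")
    print(nn.files["/b"], sorted(dn1.blocks), sorted(dn2.blocks))
```

Because each replica is copied on the node that already holds it, the only traffic in this model is metadata between the caller and the namenode, which is the saving the proposal aims for.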

This message was sent by Atlassian JIRA
