Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Tue, 2 Feb 2016 03:26:40 +0000 (UTC)
From: "Liu Junhong (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12513711.1310412667000.272537.1454383600028@Atlassian.JIRA>
In-Reply-To: <JIRA.12513711.1310412667000@Atlassian.JIRA>
References: <JIRA.12513711.1310412667000@Atlassian.JIRA>
 <JIRA.12513711.1310412667787@arcas>
Subject: [jira] [Commented] (HDFS-2139) Fast copy for HDFS.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127592#comment-15127592 ] 

Liu Junhong commented on HDFS-2139:
-----------------------------------

I think this is useful when copying files between two namespaces (backed by the same datanodes).
I need a reviewer, [~dhruba].

> Fast copy for HDFS.
> -------------------
>
>                 Key: HDFS-2139
>                 URL: https://issues.apache.org/jira/browse/HDFS-2139
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Pritam Damania
>            Assignee: Rituraj
>         Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, HDFS-2139.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> There is a need to perform fast file copies on HDFS. The fast copy mechanism for a file works as
> follows:
> 1) Query metadata for all blocks of the source file.
> 2) For each block 'b' of the file, find out its datanode locations.
> 3) For each block of the file, add an empty block to the namesystem for
> the destination file.
> 4) For each location of the block, instruct the datanode to make a local
> copy of that block.
> 5) Once a datanode has copied its respective blocks, it reports
> the new copies to the namenode.
> 6) Wait for all blocks to be copied and exit.
> This would speed up the copying process considerably by avoiding
> top-of-rack data transfers.
> Note: A further improvement would be to instruct the datanode to create a
> hard link to the block file when a block is copied within the same datanode.
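
The six steps in the description above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not the attached patch and not the HDFS API: the names Namenode, Datanode, fast_copy and demo are hypothetical, block files are ordinary files in a temporary directory, and every call is synchronous, so the "wait" in step 6 reduces to a final check. The local copy uses a hard link, as the note suggests, and falls back to a byte copy where the filesystem does not support links.

```python
import os
import shutil
import tempfile


class Namenode:
    """Hypothetical stand-in for the namenode: metadata only, no block data."""

    def __init__(self):
        self.files = {}      # file path -> ordered list of block ids
        self.locations = {}  # block id -> set of datanode names holding it
        self._next_id = 0

    def get_blocks(self, path):
        return list(self.files[path])

    def get_locations(self, block_id):
        return sorted(self.locations[block_id])

    def add_empty_block(self, path):
        block_id = f"blk_{self._next_id}"
        self._next_id += 1
        self.files.setdefault(path, []).append(block_id)
        self.locations[block_id] = set()
        return block_id

    def block_received(self, block_id, datanode_name):
        self.locations[block_id].add(datanode_name)


class Datanode:
    """Hypothetical stand-in for a datanode: one directory of block files."""

    def __init__(self, name, root, namenode):
        self.name = name
        self.namenode = namenode
        self.dir = os.path.join(root, name)
        os.makedirs(self.dir)

    def block_path(self, block_id):
        return os.path.join(self.dir, block_id)

    def write_block(self, block_id, data):
        with open(self.block_path(block_id), "wb") as f:
            f.write(data)
        self.namenode.block_received(block_id, self.name)

    def copy_block_locally(self, src_id, dst_id):
        src, dst = self.block_path(src_id), self.block_path(dst_id)
        try:
            os.link(src, dst)          # hard link: no data is moved
        except OSError:
            shutil.copyfile(src, dst)  # fallback: plain local copy
        # Step 5: report the new replica to the namenode.
        self.namenode.block_received(dst_id, self.name)


def fast_copy(namenode, datanodes, src, dst):
    expected = {}
    for src_block in namenode.get_blocks(src):          # step 1
        targets = namenode.get_locations(src_block)     # step 2
        dst_block = namenode.add_empty_block(dst)       # step 3
        for name in targets:                            # step 4
            datanodes[name].copy_block_locally(src_block, dst_block)
        expected[dst_block] = targets
    # Step 6: everything is synchronous here, so just verify the reports.
    for dst_block, targets in expected.items():
        if namenode.get_locations(dst_block) != targets:
            raise RuntimeError(f"block {dst_block} is missing replicas")


def demo():
    """Write a two-block file on two datanodes, fast-copy it, read it back."""
    with tempfile.TemporaryDirectory() as root:
        namenode = Namenode()
        datanodes = {n: Datanode(n, root, namenode) for n in ("dn1", "dn2")}
        for chunk in (b"hello ", b"world"):
            block_id = namenode.add_empty_block("/src")
            for dn in datanodes.values():
                dn.write_block(block_id, chunk)
        fast_copy(namenode, datanodes, "/src", "/dst")
        parts = []
        for block_id in namenode.get_blocks("/dst"):
            with open(datanodes["dn1"].block_path(block_id), "rb") as f:
                parts.append(f.read())
        return b"".join(parts)
```

Calling demo() returns the destination file's contents as read from one datanode. No block data crosses between datanodes in this sketch; each datanode only links or copies within its own directory, which is the property the description relies on for the speedup.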


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)