hadoop-mapreduce-issues mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2117) Superfast Distcp when copying data within the same hdfs cluster
Date Fri, 08 Oct 2010 17:44:35 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919331#action_12919331 ]

Doug Cutting commented on MAPREDUCE-2117:
-----------------------------------------

HDFS support for something like hard links would make this even faster, no?  One could hard-link
to blocks in a tree to checkpoint it.  Hard links would be a bigger, deeper change to HDFS,
requiring the maintenance of link counts per block, but might provide a better long-term solution
for such checkpoints.
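
To make the link-count idea concrete, here is a minimal sketch in Python. It is a toy in-memory model, not HDFS code: the class ToyNamespace and its methods create, hard_link_tree, and delete are hypothetical names invented for illustration. It only shows why each block needs a count of the files that reference it, so that a block is freed when its last link goes away.

```python
from collections import Counter


class ToyNamespace:
    """Hypothetical in-memory model; not the HDFS NameNode API."""

    def __init__(self):
        self.files = {}               # path -> list of block ids
        self.link_counts = Counter()  # block id -> number of file references

    def create(self, path, block_ids):
        """Register a file and count one link for each block it references."""
        self.files[path] = list(block_ids)
        for block in block_ids:
            self.link_counts[block] += 1

    def hard_link_tree(self, src_root, dst_root):
        """Checkpoint a tree by linking to its blocks; no block data is copied."""
        prefix = src_root.rstrip("/") + "/"
        # Snapshot the matching paths first, since create() mutates self.files.
        for path in [p for p in self.files if p.startswith(prefix)]:
            linked = dst_root.rstrip("/") + "/" + path[len(prefix):]
            self.create(linked, self.files[path])

    def delete(self, path):
        """Remove a file; return the blocks whose link count dropped to zero."""
        freed = []
        for block in self.files.pop(path):
            self.link_counts[block] -= 1
            if self.link_counts[block] == 0:
                del self.link_counts[block]
                freed.append(block)
        return freed
```

In this model, linking /hbase/region1 to /backup/region1 touches only metadata, and deleting the original file frees nothing while the checkpoint still holds links to the same blocks.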

> Superfast Distcp when copying data within the same hdfs cluster
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-2117
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2117
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distcp
>            Reporter: dhruba borthakur
>
> There are use cases when distcp is used to copy a bunch of files/directories from one
> part of the HDFS namespace to another part within the same HDFS cluster. It is superfast if
> we can instruct relevant datanodes to make local replicas of relevant blocks and limit network
> usage to a minimum. It is especially useful to make HBase take a backup of a region with minimum
> downtime.
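
The planning step implied by the issue description can be sketched as follows. This is an illustration only: the function plan_local_copies and the shape of its input are assumptions, not part of distcp or of any HDFS API. It assumes every datanode that already holds a replica of a block duplicates that block on its own disk, so the copy keeps the same replication factor and only small instructions cross the network.

```python
def plan_local_copies(block_locations):
    """Group blocks by the datanodes that already store a replica.

    block_locations: dict mapping block id -> list of datanode names
    (a hypothetical input shape chosen for this sketch).
    Returns a dict mapping datanode name -> list of block ids that the
    datanode should duplicate locally, with no block data sent over
    the network.
    """
    plan = {}
    for block, datanodes in block_locations.items():
        for node in datanodes:
            plan.setdefault(node, []).append(block)
    return plan
```

For example, a block stored on dn1 and dn2 yields one local-copy instruction for each of those two nodes.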

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.