hadoop-mapreduce-issues mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2117) Superfast Distcp when copying data within the same hdfs cluster
Date Fri, 08 Oct 2010 20:14:35 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919371#action_12919371 ]

dhruba borthakur commented on MAPREDUCE-2117:
---------------------------------------------

Doug, I agree. This is closer to a fully materialized snapshot than to a true copy-on-write
snapshot. If the data in each region is small and scattered across a relatively large set of
machines, the fully materialized approach works fine; otherwise, the more performant
copy-on-write snapshot would be needed.
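
To make the distinction concrete, here is a minimal toy sketch in Python. It is not HDFS or
HBase code; the block dictionaries and the function names (materialized_snapshot,
cow_snapshot, cow_write) are hypothetical and exist only for illustration. A materialized
snapshot pays for every byte at snapshot time, while a copy-on-write snapshot shares blocks
and pays only when a block is later overwritten.

```python
def materialized_snapshot(live):
    """Eagerly duplicate every block. Returns (snapshot, bytes_copied)."""
    snapshot = {block_id: bytearray(data) for block_id, data in live.items()}
    bytes_copied = sum(len(data) for data in snapshot.values())
    return snapshot, bytes_copied


def cow_snapshot(live):
    """Share the existing block buffers. Returns (snapshot, bytes_copied)."""
    # Only the block map is duplicated; no block contents are copied.
    return dict(live), 0


def cow_write(live, block_id, new_data):
    """Overwrite a block without disturbing snapshots that share it."""
    # A fresh buffer replaces the live entry, so any snapshot still
    # references the old buffer. Returns the number of bytes written.
    live[block_id] = bytearray(new_data)
    return len(new_data)


if __name__ == "__main__":
    live = {"blk_1": bytearray(b"aaaa"), "blk_2": bytearray(b"bbbb")}

    full, cost_full = materialized_snapshot(live)
    shared, cost_cow = cow_snapshot(live)
    print("materialized snapshot copied", cost_full, "bytes")
    print("copy-on-write snapshot copied", cost_cow, "bytes")

    cow_write(live, "blk_2", b"cccc")
    print("snapshot still sees", bytes(shared["blk_2"]))
```

In this toy model the materialized cost grows with the total data size, which is why it is
acceptable only when each region is small and the copying is spread over many machines.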

> Superfast Distcp when copying data within the same hdfs cluster
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-2117
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2117
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distcp
>            Reporter: dhruba borthakur
>
> There are use cases where distcp is used to copy a set of files/directories from one
> part of the HDFS namespace to another part within the same HDFS cluster. The copy would be
> superfast if we could instruct the relevant datanodes to make local replicas of the relevant
> blocks and keep network usage to a minimum. This would be especially useful for letting HBase
> take a backup of a region with minimal downtime.
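
The idea in the description can be sketched as a small planning step. This is a toy model
under stated assumptions, not the proposed implementation and not an HDFS API: the
block_locations mapping and the plan_local_copies function are hypothetical. Each datanode
that already holds a replica of a block is asked to duplicate that replica on its own disk,
so no block data needs to cross the network.

```python
from collections import defaultdict


def plan_local_copies(block_locations):
    """Group blocks by the datanodes that already store a replica.

    block_locations maps a block id to the list of datanodes holding it.
    The result maps each datanode to the blocks it can copy locally.
    """
    plan = defaultdict(list)
    # Sorting keeps the per-datanode block lists in a stable order.
    for block_id, datanodes in sorted(block_locations.items()):
        for node in datanodes:
            plan[node].append(block_id)
    return dict(plan)


if __name__ == "__main__":
    locations = {
        "blk_1": ["dn1", "dn2"],
        "blk_2": ["dn2", "dn3"],
    }
    for node, blocks in sorted(plan_local_copies(locations).items()):
        print(node, "copies locally:", blocks)
```

Because every replica of the new file is produced on a node that already has the source
bytes, the copy preserves the original replica placement while moving no block data
between machines in this model.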

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

