hadoop-hdfs-issues mailing list archives

From "Tsz Wo Nicholas Sze (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7535) Utilize Snapshot diff report for distcp
Date Thu, 26 Feb 2015 16:01:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338607#comment-14338607 ]

Tsz Wo Nicholas Sze commented on HDFS-7535:
-------------------------------------------

> We verify these assumptions before the sync and we fallback to the default distcp behavior ...

Would it be better to throw an exception instead, since the user may not want to fall back?

> Utilize Snapshot diff report for distcp
> ---------------------------------------
>
>                 Key: HDFS-7535
>                 URL: https://issues.apache.org/jira/browse/HDFS-7535
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: distcp, snapshots
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-7535.000.patch, HDFS-7535.001.patch, HDFS-7535.002.patch
>
>
> Currently, the HDFS snapshot diff report can identify file/directory creation, deletion, rename,
> and modification under a snapshottable directory. We can use the diff report for distcp between
> the primary cluster and a backup cluster to avoid unnecessary data copying. This is especially
> useful when a big directory is renamed in the primary cluster: the current distcp cannot detect
> the rename operation, so the rename usually leads to a large amount of real data being copied.
> More details of the approach will come in the first comment.
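
To illustrate the idea in the description above, here is a minimal sketch of how a snapshot diff could be split into cheap metadata operations and real data copies. It is not the approach from the attached patches, whose details are not given in this message. The `DiffEntry` type and the `plan_sync` function are hypothetical names introduced only for illustration. The one-character kinds (`+`, `-`, `M`, `R`) mirror the symbols printed in HDFS snapshot diff reports. As a simplification, the sketch assumes every `M` entry is a path that must be re-copied.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DiffEntry:
    """One line of a snapshot diff (hypothetical type for this sketch)."""
    kind: str                     # "+" created, "-" deleted, "M" modified, "R" renamed
    path: str                     # path in the older snapshot (or the new path for "+")
    target: Optional[str] = None  # new path, used only for renames


def plan_sync(diff: List[DiffEntry]) -> Tuple[List[tuple], List[str]]:
    """Split a diff into metadata operations and paths that need a data copy.

    Renames and deletions can be replayed on the backup cluster without
    moving any file data; only created or modified paths are copied.
    """
    metadata_ops: List[tuple] = []
    to_copy: List[str] = []
    for entry in diff:
        if entry.kind == "R":
            if entry.target is None:
                raise ValueError(f"rename without a target: {entry.path}")
            metadata_ops.append(("rename", entry.path, entry.target))
        elif entry.kind == "-":
            metadata_ops.append(("delete", entry.path))
        elif entry.kind in ("+", "M"):
            # Assumption: treat every created or modified path as needing a copy.
            to_copy.append(entry.path)
        else:
            raise ValueError(f"unknown diff kind: {entry.kind!r}")
    return metadata_ops, to_copy


if __name__ == "__main__":
    diff = [
        DiffEntry("R", "/data/2014", "/archive/2014"),
        DiffEntry("+", "/data/new.log"),
        DiffEntry("-", "/tmp/old"),
    ]
    ops, copies = plan_sync(diff)
    print(ops)
    print(copies)
```

In this sketch the large directory rename becomes a single rename operation on the backup side, and only `/data/new.log` is copied. Unknown entries raise an error rather than being skipped silently, in the spirit of the comment above about failing instead of falling back.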



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
