hadoop-hdfs-issues mailing list archives

From "Benjamin Huo (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-7535) Utilize Snapshot diff report for distcp
Date Tue, 25 Apr 2017 09:01:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982590#comment-15982590 ]

Benjamin Huo edited comment on HDFS-7535 at 4/25/17 9:01 AM:
-------------------------------------------------------------

I have one question about the following comment:
"This snapshot diff report represents the delta that should be applied to the backup cluster.
For changes like deletion and rename we can directly apply the same operations (following
some specific order based on their dependency) in the backup cluster. For changes like creation,
append, and other metadata modification we keep using the functionality of the current distcp."

I'm not very clear about what "we keep using the functionality of the current distcp" means.

After the HDFS-7535 fix, is the list of file changes for creation and modification generated
from snapshots s1 and s2 on the source cluster, or is it generated from the file differences
between the source cluster and the destination cluster (with the extra cost of transferring
the file list between the source and target clusters)?

Thanks
Ben




was (Author: benjaminh):
I've one question regarding the following comments:
"This snapshot diff report represents the delta that should be applied to the backup cluster.
For changes like deletion and rename we can directly apply the same operations (following
some specific order based on their dependency) in the backup cluster. For changes like creation,
append, and other metadata modification we keep using the functionality of the current distcp."

I'm not very clear about what "we keep using the functionality of the current distcp" means.

After fix HDFS-7535, the file changes list for creation and modification are generated based
on snapshots s1 and s2 on the source cluster, or it's generated based on the file changes
between source cluster and destination cluster?

Thanks
Ben



> Utilize Snapshot diff report for distcp
> ---------------------------------------
>
>                 Key: HDFS-7535
>                 URL: https://issues.apache.org/jira/browse/HDFS-7535
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: distcp, snapshots
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>             Fix For: 2.7.0
>
>         Attachments: HDFS-7535.000.patch, HDFS-7535.001.patch, HDFS-7535.002.patch, HDFS-7535.003.patch, HDFS-7535.004.patch
>
>
> Currently HDFS snapshot diff report can identify file/directory creation, deletion, rename
> and modification under a snapshottable directory. We can use the diff report for distcp between
> the primary cluster and a backup cluster to avoid unnecessary data copy. This is especially
> useful when there is a big directory rename happening in the primary cluster: the current
> distcp cannot detect the rename op thus this rename usually leads to large amounts of real
> data copy.
> More details of the approach will come in the first comment.
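
For illustration only, the split described in the quoted comment (apply renames and deletions
directly on the backup cluster, leave creations and modifications to the regular copy path)
might be sketched as below. This is a toy model in Python, not the Java code from any
HDFS-7535 patch. The names DiffEntry and plan_sync are hypothetical, the entry kinds reuse the
"+", "-", "M" and "R" markers of the snapshot diff report, and the dependency ordering of
renames and deletions that the comment mentions is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DiffEntry:
    """One line of a snapshot diff report (hypothetical in-memory form)."""
    kind: str                      # "+" created, "-" deleted, "M" modified, "R" renamed
    path: str                      # path in the older snapshot (or the new path for "+")
    target: Optional[str] = None   # new path, only set for renames


def plan_sync(diff: List[DiffEntry]) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Split a diff into operations replayed on the backup and paths left to the copy step.

    Assumption: ordering between renames and deletions is ignored here.
    """
    renames = [(e.path, e.target) for e in diff if e.kind == "R"]
    deletes = [e.path for e in diff if e.kind == "-"]
    # Created and modified paths are not replayed; they are handed to the normal copy.
    copies = [e.path for e in diff if e.kind in ("+", "M")]
    return renames, deletes, copies


if __name__ == "__main__":
    report = [
        DiffEntry("R", "/data/old", "/data/new"),
        DiffEntry("-", "/data/tmp"),
        DiffEntry("+", "/data/new/part-0003"),
        DiffEntry("M", "/data/logs/app.log"),
    ]
    renames, deletes, copies = plan_sync(report)
    print("rename on backup:", renames)
    print("delete on backup:", deletes)
    print("copy from source:", copies)
```

In this model the large directory rename costs one metadata operation on the backup, and only
the two created or modified paths are left for the data copy.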



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


