hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-10314) A new tool to sync current HDFS view to specified snapshot
Date Wed, 21 Sep 2016 16:11:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510391#comment-15510391 ]

Yongjun Zhang edited comment on HDFS-10314 at 9/21/16 4:11 PM:
---------------------------------------------------------------

Hi [~jingzhao],

For clarity, and as a recap, here is a comparison table between -diff and the proposed -rdiff,
which shows their symmetry:

||Comparison||-diff s1 s2 <src> <tgt>||-rdiff s2 s1 <src> <tgt>||
|Current feature state|Existing in distcp|Proposed addition|
|Functionality|Given <tgt>'s current state is s1, make <tgt>'s current state the same as newer snapshot s2|Given <tgt>'s current state is s2, make <tgt>'s current state the same as older snapshot s1|
|Requirements|# <src> and <tgt> need to be different paths
# both <src> and <tgt> have snapshot s1 with exact same content
# <src> has snapshot s2
# s2 is newer than s1
# <tgt>'s current state is the same as s1
# <tgt> doesn't have snapshot s2|# <src> and <tgt> can be the same or different paths
# both <src> and <tgt> have snapshot s1 with exact same content
# <tgt> has snapshot s2
# s2 is newer than s1
# <tgt>'s current state is the same as s2
# <src> may or may not have snapshot s2|
|Steps|# calculate snapshotDiff<s1,s2> at <src>
# apply rename/delete part of snapshotDiff on <tgt>
# copy modified part of snapshotDiff from s2 of <src> to <tgt>|# calculate snapshotDiff<s2,s1> at <tgt>
# apply rename/delete part of snapshotDiff on <tgt>
# copy modified part of snapshotDiff from s1 of <src> to <tgt>|
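
To illustrate the Steps row above, here is a minimal toy sketch in Python. It is not the HDFS or distcp API: a directory is modeled as a plain dict from path to content, the names snapshot_diff and sync are hypothetical, and a rename is simplified to a delete plus a create. It only shows that the two columns run the same three steps with the snapshot roles swapped.

```python
from typing import Dict, Set, Tuple

# Toy stand-in for a directory tree: path -> file content (assumption, not HDFS).
Tree = Dict[str, str]


def snapshot_diff(frm: Tree, to: Tree) -> Tuple[Set[str], Set[str]]:
    """Step 1: return (deleted, modified) paths needed to turn `frm` into `to`.

    Renames are modeled as delete + create in this simplified version.
    """
    deleted = set(frm) - set(to)
    modified = {p for p in to if frm.get(p) != to[p]}
    return deleted, modified


def sync(tgt: Tree, diff_from: Tree, diff_to: Tree, copy_from: Tree) -> Tree:
    """Apply the diff <diff_from, diff_to> to `tgt`, copying data from `copy_from`."""
    deleted, modified = snapshot_diff(diff_from, diff_to)
    # Step 2: apply the delete part of the diff on the target.
    result = {p: c for p, c in tgt.items() if p not in deleted}
    # Step 3: copy the created/modified part from the chosen source snapshot.
    for p in modified:
        result[p] = copy_from[p]
    return result


s1 = {"/a": "1", "/b": "2"}   # older snapshot
s2 = {"/a": "1x", "/c": "3"}  # newer snapshot

# -diff s1 s2: target currently equals s1; diff is taken at <src>, data copied from s2.
forward = sync(dict(s1), diff_from=s1, diff_to=s2, copy_from=s2)

# -rdiff s2 s1: target currently equals s2; diff is taken at <tgt>, data copied from s1.
reverse = sync(dict(s2), diff_from=s2, diff_to=s1, copy_from=s1)

print(forward == s2, reverse == s1)
```

In this model the forward call leaves the target identical to s2 and the reverse call restores it to s1, which is the symmetry the table describes.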

The original thinking was to add -rdiff to distcp (solution A), but because of the concern
about confusing semantics, it was suggested to introduce a new command here (solution B).

Thanks.



was (Author: yzhangal):
Hi [~jingzhao],

For clarity, and as a recap, here is a comparison table between -diff and the proposed -rdiff,
which shows their symmetry:

||Comparison||-diff s1 s2 <src> <tgt>||-rdiff s2 s1 <src> <tgt>||
|Current feature state|Existing in distcp|Proposed addition|
|Functionality|Given <tgt>'s current state is s1, make <tgt>'s current state the same as newer snapshot s2|Given <tgt>'s current state is s2, make <tgt>'s current state the same as older snapshot s1|
|Requirements|# <src> and <tgt> need to be different paths
# both <src> and <tgt> have snapshot s1 with exact same content
# <src> has snapshot s2
# s2 is newer than s1
# <tgt>'s current state is the same as s1
# <tgt> doesn't have snapshot s2|# <src> and <tgt> can be the same or different paths
# both <src> and <tgt> have snapshot s1 with exact same content
# <tgt> has snapshot s2
# s2 is newer than s1
# <tgt>'s current state is the same as s2
# <src> may or may not have snapshot s2|
|Steps|# calculate snapshotDiff<s1,s2> at <src>
# apply rename/delete part of snapshotDiff on <tgt>
# copy modified part of snapshotDiff from s1 of <src> to <tgt>|# calculate snapshotDiff<s2,s1> at <tgt>
# apply rename/delete part of snapshotDiff on <tgt>
# copy modified part of snapshotDiff from s1 of <src> to <tgt>|

The original thinking was to add -rdiff to distcp (solution A), but because of the concern
about confusing semantics, it was suggested to introduce a new command here (solution B).

Thanks.


> A new tool to sync current HDFS view to specified snapshot
> ----------------------------------------------------------
>
>                 Key: HDFS-10314
>                 URL: https://issues.apache.org/jira/browse/HDFS-10314
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-10314.001.patch
>
>
> HDFS-9820 proposed adding an -rdiff switch to distcp, as the reverse operation of the -diff switch.
> Upon discussion with [~jingzhao], we will introduce a new tool that wraps around distcp to achieve the same purpose.
> I'm thinking about calling the new tool "rsync", similar to the Unix/Linux command "rsync". The "r" here means remote.
> The syntax that simulates the -rdiff behavior proposed in HDFS-9820 is
> {code}
> rsync <fromSnapshotName>  <toSnapshotName>  <source> <target>
> {code}
> This command ensures that <fromSnapshotName> is newer than <toSnapshotName>.
> I think that in the future we can add another command that provides the functionality of the -diff switch of distcp:
> {code}
> sync <fromSnapshotName>  <toSnapshotName>  <source> <target>
> {code}
> that ensures <fromSnapshotName>  is older than <toSnapshotName>.
> Thanks [~jingzhao].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

