hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10314) A new tool to sync current HDFS view to specified snapshot
Date Wed, 28 Sep 2016 19:20:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15530595#comment-15530595 ]

Yongjun Zhang commented on HDFS-10314:

Had a discussion with [~jingzhao], and we had the following agreement:

1. For now, he will be fine with option 2 stated in


as long as we document it well, even though it's not his favorite. In that case, we can continue
to work on HDFS-9820. 

2. When creating a new tool in the future (HDFS-10314), we need to do the following:
* Refactor the DistCp code to separate the snapshot sync part (which handles rename/delete per snapshot diff) and the copyList calculation part into their own class, e.g., DistCpPrepare.
* Let both DistCp and DistSync call DistCpPrepare for the functionality they need.
* Modify DistCp to take an optional new argument, copyListing.
* Let DistSync call DistCpPrepare to do the snapshot sync part and the copyListing creation part, and then pass the copyListing to DistCp.
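
The division of responsibilities in item 2 could look roughly like the sketch below. This is only an illustration under stated assumptions, not the actual DistCp code: the class bodies, method names (syncSnapshotDiff, buildCopyListing, run), and the string-based stand-ins for rename/delete operations and paths are all hypothetical, and no real Hadoop API is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch only: none of these types mirror the real Hadoop classes.
public class DistSyncSketch {

    /** Stand-in for the list of paths a copy job should transfer. */
    static final class CopyListing {
        final List<String> paths;
        CopyListing(List<String> paths) { this.paths = List.copyOf(paths); }
    }

    /** Shared helper holding the snapshot sync part and the copy-list calculation. */
    static final class DistCpPrepare {
        // Records the rename/delete operations instead of touching a file system.
        final List<String> appliedOps = new ArrayList<>();

        void syncSnapshotDiff(List<String> renameDeleteOps) {
            appliedOps.addAll(renameDeleteOps);
        }

        CopyListing buildCopyListing(List<String> changedPaths) {
            return new CopyListing(changedPaths);
        }
    }

    /** DistCp with an optional, pre-computed copyListing argument. */
    static final class DistCp {
        private final DistCpPrepare prepare = new DistCpPrepare();

        List<String> run(List<String> changedPaths, Optional<CopyListing> copyListing) {
            // Use the caller's listing if given; otherwise compute one via DistCpPrepare.
            CopyListing listing =
                copyListing.orElseGet(() -> prepare.buildCopyListing(changedPaths));
            List<String> copied = new ArrayList<>();
            for (String path : listing.paths) {
                copied.add("copied " + path);
            }
            return copied;
        }
    }

    /** New tool: does the sync and listing itself, then hands the listing to DistCp. */
    static final class DistSync {
        List<String> run(List<String> renameDeleteOps, List<String> changedPaths) {
            DistCpPrepare prepare = new DistCpPrepare();
            prepare.syncSnapshotDiff(renameDeleteOps);
            CopyListing listing = prepare.buildCopyListing(changedPaths);
            return new DistCp().run(List.of(), Optional.of(listing));
        }
    }

    public static void main(String[] args) {
        List<String> result = new DistSync().run(
            List.of("RENAME /a -> /b", "DELETE /c"),
            List.of("/b/file1", "/d/file2"));
        result.forEach(System.out::println);
    }
}
```

The point of the sketch is that DistCp never needs to know whether the copyListing came from its own call to DistCpPrepare or from DistSync, which is what keeps the new tool a thin wrapper.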

Please feel free to correct/add if I'm inaccurate or missed anything.

Thanks much Jing.

> A new tool to sync current HDFS view to specified snapshot
> ----------------------------------------------------------
>                 Key: HDFS-10314
>                 URL: https://issues.apache.org/jira/browse/HDFS-10314
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-10314.001.patch
> HDFS-9820 proposed adding a -rdiff switch to distcp, as the reverse operation of the -diff switch.
> Upon discussion with [~jingzhao], we will introduce a new tool that wraps around distcp to achieve the same purpose.
> I'm thinking about calling the new tool "rsync", similar to the unix/linux command "rsync". The "r" here means remote.
> The syntax that simulates the -rdiff behavior proposed in HDFS-9820 is
> {code}
> rsync <fromSnapshotName>  <toSnapshotName>  <source> <target>
> {code}
> This command ensures <fromSnapshotName>  is newer than <toSnapshotName>.
> I think that, in the future, we can add another command to provide the functionality of the -diff switch of distcp.
> {code}
> sync <fromSnapshotName>  <toSnapshotName>  <source> <target>
> {code}
> that ensures <fromSnapshotName>  is older than <toSnapshotName>.
> Thanks [~jingzhao].

This message was sent by Atlassian JIRA

