hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9820) Improve distcp to support efficient restore to an earlier snapshot
Date Tue, 18 Oct 2016 07:38:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15584754#comment-15584754
] 

Hadoop QA commented on HDFS-9820:
---------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 28s{color} | {color:blue}
Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} |
{color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color}
| {color:green} The patch appears to include 5 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 53s{color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 20s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 18s{color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 22s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 15s{color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 33s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 16s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 18s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 14s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 14s{color} | {color:green}
the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 12s{color}
| {color:orange} hadoop-tools/hadoop-distcp: The patch generated 24 new + 174 unchanged -
12 fixed = 198 total (was 186) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 18s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 11s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color}
| {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 32s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 13s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 41s{color} | {color:green}
hadoop-distcp in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 15s{color}
| {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m  7s{color} | {color:black}
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Issue | HDFS-9820 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12833897/HDFS-9820.008.patch
|
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs
 checkstyle  |
| uname | Linux e05f337300d9 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c023c74 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/17199/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt
|
|  Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/17199/testReport/ |
| modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/17199/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Improve distcp to support efficient restore to an earlier snapshot
> ------------------------------------------------------------------
>
>                 Key: HDFS-9820
>                 URL: https://issues.apache.org/jira/browse/HDFS-9820
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: distcp
>    Affects Versions: 2.6.4
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-9820.001.patch, HDFS-9820.002.patch, HDFS-9820.003.patch, HDFS-9820.004.patch,
HDFS-9820.005.patch, HDFS-9820.006.patch, HDFS-9820.007.patch, HDFS-9820.008.patch, HDFS-9820.branch-2.patch
>
>
> A common use scenario (scenaio 1): 
> # create snapshot sx in clusterX, 
> # do some experiemnts in clusterX, which creates some files. 
> # throw away the files changed and go back to sx.
> Another scenario (scenario 2) is, there is a production cluster and a backup cluster,
we periodically sync up the data from production cluster to the backup cluster with distcp.

> The cluster in scenario 1 could be the backup cluster in scenario 2.
> For scenario 1:
> HDFS-4167 intends to restore HDFS to the most recent snapshot, and there are some complexity
and challenges.  Before that jira is implemented, we count on distcp to copy from snapshot
to the current state. However, the performance of this operation could be very bad because
we have to go through all files even if we only changed a few files.
> For scenario 2:
> HDFS-7535 improved distcp performance by avoiding copying files that changed name since
last backup.
> On top of HDFS-7535, HDFS-8828 improved distcp performance when copying data from source
to target cluster, by only copying changed files since last backup. The way it works is use
snapshot diff to find out all files changed, and copy the changed files only.
> See https://blog.cloudera.com/blog/2015/12/distcp-performance-improvements-in-apache-hadoop/
> This jira is to propose a variation of HDFS-8828, to find out the files changed in target
cluster since last snapshot sx, and copy these from snapshot sx of either the source or the
target cluster, to restore target cluster's current state to sx. 
> Specifically,
> If a file/dir is
> - renamed, rename it back
> - created in target cluster, delete it
> - modified, put it to the copy list
> - run distcp with the copy list, copy from the source cluster's corresponding snapshot
> This could be a new command line switch -rdiff in distcp.
> As a native restore feature, HDFS-4167 would still be ideal to have. However,  HDFS-9820
would hopefully be easier to implement, before HDFS-4167 is in place.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message