hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5685) DistCp will fail to copy with -delete switch
Date Fri, 03 Jan 2014 02:30:50 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861119#comment-13861119 ]

Yongjun Zhang commented on HDFS-5685:

Hi Aaron,

Thanks a lot for reviewing the change! Some thoughts below, numbered to match your comments:

1. I introduced doneFilteringJobDir and doneFilteringJobDirDstLsr for performance: once the jobDir
has been examined, there is no need to check it again for the remaining paths.

2. I named the variable "cmp_job_dir" because there is a similar existing variable, "dst_cmp_lsr",
in the nearby code. I will rename the one I introduced.

3. The comment "lsrpath does not exist" meant "an lsr path that does not exist in the source"; for
this bug fix I added "delete only if it's not jobDir or jobDir's ancestor". I will correct the comment.

4. I had an earlier, refactored version, but it contained quite a few if-else branches, so there is
a trade-off. I will make another attempt and see whether it reads better.
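
Items 1 and 3 describe two parts of the fix: never delete the jobDir or any of its ancestors, and stop checking once the jobDir has been handled. Below is a minimal Java sketch of that logic, not the patch itself. It assumes plain string paths and a destination listing sorted lexicographically, so every ancestor of the jobDir (a string prefix) is seen before the jobDir. The class and method names are hypothetical; only the flag name doneFilteringJobDir comes from the comment above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JobDirDeleteFilter {

    /** True if candidate equals jobDir or is one of its ancestor directories. */
    static boolean isJobDirOrAncestor(String candidate, String jobDir) {
        if (candidate.equals(jobDir)) {
            return true;
        }
        // Compare whole path components: "/user/x" is not an ancestor of "/user/xyz".
        String prefix = candidate.endsWith("/") ? candidate : candidate + "/";
        return jobDir.startsWith(prefix);
    }

    /**
     * Hypothetical helper: returns destination paths missing from the source
     * that would be deleted, sparing jobDir and its ancestors.
     * Assumes dstPaths is sorted lexicographically.
     */
    static List<String> pathsToDelete(List<String> dstPaths, Set<String> srcPaths, String jobDir) {
        List<String> toDelete = new ArrayList<>();
        boolean doneFilteringJobDir = false; // sketch of the flag discussed in item 1
        for (String path : dstPaths) {
            if (srcPaths.contains(path)) {
                continue; // exists in the source, so it is never deleted
            }
            if (!doneFilteringJobDir && isJobDirOrAncestor(path, jobDir)) {
                if (path.equals(jobDir)) {
                    // In sorted order all ancestors came earlier, so stop checking.
                    doneFilteringJobDir = true;
                }
                continue; // jobDir or an ancestor: keep it
            }
            toDelete.add(path);
        }
        return toDelete;
    }

    public static void main(String[] args) {
        String jobDir = "/user/xyz/.stagingdistcp_urjb0g";
        List<String> dst = Arrays.asList(
                "/user/abc",
                "/user/xyz",
                "/user/xyz/.stagingdistcp_urjb0g",
                "/user/xyz/old.txt");
        Set<String> src = new HashSet<>(Arrays.asList("/user/abc"));
        System.out.println(pathsToDelete(dst, src, jobDir)); // prints [/user/xyz/old.txt]
    }
}
```

The flag is only a shortcut: dropping it gives the same result, just with an ancestor check on every remaining path.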

> DistCp will fail to copy with -delete switch
> --------------------------------------------
>                 Key: HDFS-5685
>                 URL: https://issues.apache.org/jira/browse/HDFS-5685
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 1.2.1
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>             Fix For: 1.3.0
>         Attachments: HDFS-5685.001.patch, HDFS-5685.002.patch
> When using the distcp command with the -delete switch to copy files, running as user <xyz>:
> hadoop distcp -p -i -update -delete hdfs://srchost:<port>/user hdfs://dsthost:<port>/user
> It fails with the following exception:
> Copy failed: java.io.FileNotFoundException: File does not exist: hdfs://dsthost:<port>/user/xyz/.stagingdistcp_urjb0g/_distcp_src_files
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:557)
>         at org.apache.hadoop.tools.DistCp$CopyInputFormat.getSplits(DistCp.java:266)
>         at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
>         at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
>         at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
>         at org.apache.hadoop.tools.DistCp.copy(DistCp.java:667)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

This message was sent by Atlassian JIRA
