hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5685) DistCp will fail to copy with -delete switch
Date Wed, 25 Dec 2013 06:19:53 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856529#comment-13856529 ]

Yongjun Zhang commented on HDFS-5685:

Analysis of the cause of the problem:

When we run distcp from one cluster to another, the staging area for distcp (specified by
the config property mapreduce.jobtracker.staging.root.dir, and called jobDir in the code) may
happen to be inside the target directory. If that staging directory does not exist in the
distcp source, deleteNonexisting removes it before the copy is done. The solution is to make
distcp aware of the jobDir it created and filter it out in the deleteNonexisting function.
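
To illustrate the filtering idea, here is a minimal sketch. It is not the attached patch:
the class and method names (DeleteFilterSketch, selectForDeletion, isUnderJobDir) are
hypothetical, and plain strings stand in for Hadoop's Path and FileSystem types so that the
example runs on its own. It assumes source and target paths have already been made
comparable (relative to their respective roots).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DeleteFilterSketch {

    /** True if path is the job directory itself or lies beneath it. */
    static boolean isUnderJobDir(String path, String jobDir) {
        String dir = jobDir.endsWith("/") ? jobDir : jobDir + "/";
        // Appending "/" avoids matching sibling names that merely share a prefix.
        return (path + "/").startsWith(dir);
    }

    /**
     * Returns the target paths that are missing from the source and are
     * therefore candidates for deletion, skipping anything in the staging
     * directory (jobDir) that the copy job itself created.
     */
    static List<String> selectForDeletion(List<String> dstPaths,
                                          Set<String> srcPaths,
                                          String jobDir) {
        List<String> toDelete = new ArrayList<>();
        for (String p : dstPaths) {
            if (srcPaths.contains(p)) {
                continue; // still present in the source, keep it
            }
            if (isUnderJobDir(p, jobDir)) {
                continue; // staging area of the running job, must survive
            }
            toDelete.add(p);
        }
        return toDelete;
    }

    public static void main(String[] args) {
        String jobDir = "/user/xyz/.stagingdistcp_urjb0g";
        List<String> dst = Arrays.asList(
                "/user/xyz/data",
                "/user/xyz/old",
                jobDir,
                jobDir + "/_distcp_src_files");
        Set<String> src = new HashSet<>(Arrays.asList("/user/xyz/data"));

        // Only the stale entry is selected; the staging directory is kept.
        System.out.println(selectForDeletion(dst, src, jobDir)); // prints [/user/xyz/old]
    }
}
```

Without the isUnderJobDir check, the staging directory and its _distcp_src_files entry would
also be selected for deletion, which corresponds to the FileNotFoundException in the report.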

I attempted to create a unit test case for this bug, but found that it's not practical with
the current unit test framework. What we need is a real Hadoop deployment that involves a
standalone jobtracker and tasktracker, with the staging area set to DFS. With the current unit
test framework, the staging area can only be on the local file system.

Thanks for reviewing the fix,

> DistCp will fail to copy with -delete switch
> --------------------------------------------
>                 Key: HDFS-5685
>                 URL: https://issues.apache.org/jira/browse/HDFS-5685
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 1.2.1
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>             Fix For: 1.3.0
>         Attachments: HDFS-5685.001.patch
> When using the distcp command to copy files with the -delete switch, running as user <xyz>:
> hadoop distcp -p -i -update  -delete hdfs://srchost:<port>/user hdfs://dsthost:<port>/user
> It fails with the following exception:
> Copy failed: java.io.FileNotFoundException: File does not exist: hdfs://dsthost:<port>/user/xyz/.stagingdistcp_urjb0g/_distcp_src_files
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:557)
>         at org.apache.hadoop.tools.DistCp$CopyInputFormat.getSplits(DistCp.java:266)
>         at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
>         at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
>         at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
>         at org.apache.hadoop.tools.DistCp.copy(DistCp.java:667)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

This message was sent by Atlassian JIRA
