hadoop-mapreduce-issues mailing list archives

From "Tsz Wo (Nicholas), SZE (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-648) distcp -update launches job when there is at least one dir in source paths to be copied, even though there is nothing to copy
Date Mon, 14 Sep 2009 20:50:59 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755189#action_12755189 ]

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-648:
--------------------------------------------------

Patch looks mostly good.  Some comments below:

- Why throw an IOException in the following?
{code}
+    if (destFileSys.exists(dst)) {
+      if (!destFileSys.getFileStatus(dst).isDir()) {
+        throw new IOException("Failed to mkdirs: " + dst+" is a file.");
+      }
+      return true;
+    }
{code}
Also, the destFileSys.exists(dst) call can be omitted to save an RPC.  Instead, we could wrap
destFileSys.getFileStatus(dst) in a try-catch.  If dst does not exist, a FileNotFoundException
(FNFE) will be caught.
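The try-catch suggestion could look roughly like the sketch below.  It is only an illustration, not the patch itself: MiniFs is a hypothetical in-memory stand-in for the destination file system, isDir(path) stands in for destFileSys.getFileStatus(dst).isDir(), and rpcCount is an invented counter that makes the saved round trip visible.  The IOException for the "is a file" case is kept as in the patch, pending the question above.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the destination file system. */
class MiniFs {
  // path -> true for a directory, false for a regular file
  private final Map<String, Boolean> entries = new HashMap<>();
  int rpcCount = 0; // counts simulated remote calls (assumption for illustration)

  void addFile(String path) { entries.put(path, false); }

  /** Stand-in for getFileStatus(path).isDir(); throws FNFE if path is absent. */
  boolean isDir(String path) throws FileNotFoundException {
    rpcCount++;
    Boolean dir = entries.get(path);
    if (dir == null) {
      throw new FileNotFoundException(path);
    }
    return dir;
  }

  boolean mkdirs(String path) {
    rpcCount++;
    entries.put(path, true);
    return true;
  }
}

class MkdirsSketch {
  /** Ensures dst is a directory without a separate exists() call. */
  static boolean ensureDir(MiniFs fs, String dst) throws IOException {
    try {
      if (!fs.isDir(dst)) {
        // Not a FileNotFoundException, so it propagates past the catch below.
        throw new IOException("Failed to mkdirs: " + dst + " is a file.");
      }
      return true; // dst already exists as a directory: one call in total
    } catch (FileNotFoundException e) {
      return fs.mkdirs(dst); // dst does not exist yet, so create it
    }
  }
}
```

With this shape, an existing directory costs one status call instead of an exists() call followed by a status call.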

- Could you rename skipfile to skippath, since it also covers directories after the patch?

> distcp -update launches job when there is at least one dir in source paths to be copied, even though there is nothing to copy
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-648
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-648
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: distcp
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>            Priority: Minor
>         Attachments: d_dirCount648.patch, d_dirCount648.v1.patch, d_dirCount_648.patch
>
>
> distcp -update launches job when there is at least one dir in source paths to be copied,
> even though there is nothing to copy.
> HADOOP-5675 added fileCount > 0 to be checked to decide whether to launch job. And
> HADOOP-5762 changed this to fileCount + dirCount > 0 to solve the issue of empty directories
> not getting copied to destination. With -update, dirCount is incremented without checking
> if that dir already exists at the destination. So distcp job is launched because of
> dirCount > 0 even though there is nothing to copy. Incrementing dirCount can be skipped if
> that dir already exists at the destination in case of -update.
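
The counting rule in the description above can be sketched as follows.  This is a simplified illustration rather than the distcp code: LaunchDecisionSketch, countDirsToCreate and shouldLaunchJob are hypothetical names, and the destination is modelled as a plain set of directory paths that already exist.

```java
import java.util.List;
import java.util.Set;

class LaunchDecisionSketch {
  /**
   * Counts source directories that still need to be created at the
   * destination.  With update set, directories that already exist at the
   * destination are skipped, as the description proposes.
   */
  static int countDirsToCreate(List<String> srcDirs,
                               Set<String> existingDstDirs,
                               boolean update) {
    int dirCount = 0;
    for (String dir : srcDirs) {
      if (update && existingDstDirs.contains(dir)) {
        continue; // nothing to do for this directory
      }
      dirCount++;
    }
    return dirCount;
  }

  /** Launch check as described for HADOOP-5762: fileCount + dirCount > 0. */
  static boolean shouldLaunchJob(int fileCount, int dirCount) {
    return fileCount + dirCount > 0;
  }
}
```

When every source directory already exists at the destination and no files need copying, dirCount stays at zero under -update and no job is launched.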

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

