hadoop-hdfs-issues mailing list archives

From "Xiaoyu Yao (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-10234) DistCp log output should contain copied and deleted files and directories
Date Thu, 24 Aug 2017 00:04:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139334#comment-16139334
] 

Xiaoyu Yao edited comment on HDFS-10234 at 8/24/17 12:03 AM:
-------------------------------------------------------------

[~linyiqun], thanks for the patch. I took a look at the v2 patch; here are my comments:

1. We need to replace the COPY count update in createTargetDirsWithRetry with a DIR_COPY count
update. The change to update the DIR_COPY counter in map() can be removed after that. Without
that, [~k.shaposhnikov@gmail.com]'s earlier comment is not fully addressed.

{code}
@@ -260,7 +268,7 @@ private void createTargetDirsWithRetry(String description,
     } catch (Exception e) {
       throw new IOException("mkdir failed for " + target, e);
     }
-    incrementCounter(context, Counter.COPY, 1);
+    incrementCounter(context, Counter.DIR_COPY, 1);
   }
{code}

2. Can we include both the source (path, size) and destination (path, size) in the SKIP/COPY
log inside map()? The information is available from sourceCurrStatus and targetStatus there.
This way, applications can parse the distcp log offline to get this information without
adding extra load on the namenode.

3. We will need a switch (e.g., -v) to enable this additional log output, for backward compatibility.
By default, the log should contain only the information it contains today.
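
To make points 2 and 3 concrete, here is a minimal sketch of how a log entry could be formatted with and without the proposed switch. Everything in it is hypothetical and not taken from the patch: the class {{DistCpLogSketch}}, the {{formatEntry}} method, the {{Action}} enum, and the key=value layout are illustration only, and the short non-verbose "SKIP: path" form is an assumption about the current format. In CopyMapper the paths and lengths would presumably come from sourceCurrStatus and targetStatus rather than being passed in as plain values.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper, not part of DistCp: formats one entry of the -log output. */
final class DistCpLogSketch {

    /** The two actions discussed in point 2; the enum itself is an assumption. */
    enum Action { COPY, SKIP }

    private DistCpLogSketch() {}

    /**
     * Builds one log line, or returns empty when nothing should be logged.
     *
     * @param verbose stands in for the proposed -v switch
     */
    static Optional<String> formatEntry(boolean verbose, Action action,
                                        String sourcePath, long sourceLen,
                                        String targetPath, long targetLen) {
        if (!verbose) {
            // Without the switch, keep today's behavior: copied files are not
            // logged, and skipped files get a short entry (assumed format).
            return action == Action.SKIP
                    ? Optional.of("SKIP: " + sourcePath)
                    : Optional.empty();
        }
        // With the switch, log both sides so the file can be parsed offline.
        return Optional.of(String.format(Locale.ROOT,
                "%s: srcPath=%s srcLen=%d dstPath=%s dstLen=%d",
                action, sourcePath, sourceLen, targetPath, targetLen));
    }
}
```

With the switch on, a call such as {{formatEntry(true, Action.COPY, "/src/a", 10, "/dst/a", 10)}} yields an entry carrying both paths and both sizes; with it off, the same call yields nothing, so existing consumers of the log see no change.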




> DistCp log output should contain copied and deleted files and directories
> -------------------------------------------------------------------------
>
>                 Key: HDFS-10234
>                 URL: https://issues.apache.org/jira/browse/HDFS-10234
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: distcp
>    Affects Versions: 2.7.1
>            Reporter: Konstantin Shaposhnikov
>            Assignee: Yiqun Lin
>         Attachments: HDFS-10234.001.patch, HDFS-10234.002.patch
>
>
> DistCp log output (specified via {{-log}} command line option) currently contains only
> skipped and failed (when failures are ignored via {{-i}}) files.
> It will be more useful if it also contains copied and deleted files and created directories.
> This should be fixed in https://github.com/apache/hadoop/blob/branch-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

