hadoop-hdfs-dev mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-6152) distcp V2 doesn't preserve root dir's attributes when -p is specified
Date Mon, 24 Mar 2014 23:44:42 GMT
Yongjun Zhang created HDFS-6152:

             Summary: distcp V2 doesn't preserve root dir's attributes when -p is specified
                 Key: HDFS-6152
                 URL: https://issues.apache.org/jira/browse/HDFS-6152
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 2.3.0
            Reporter: Yongjun Zhang

Two issues were observed with distcp V2:

ISSUE 1. When copying a source dir to a target dir with the "-pu" option, using the command

  "distcp -pu source-dir target-dir"
the source dir's owner is not preserved at the target dir. Similarly, other attributes of the source
dir are not preserved. They are expected to be preserved when neither -update nor -overwrite is specified.

There are two scenarios with the above command:

a. When target-dir already exists, the above command results in target-dir/source-dir
at the target file system (source-dir here refers to the last component of the source-dir path
in the command line), with all contents of source-dir copied under target-dir/source-dir.
The issue in this case is that the attributes of source-dir are not preserved.

b. When target-dir doesn't exist, the command results in target-dir, with all contents of source-dir
copied under target-dir. The issue in this case is that the attributes of source-dir are not
carried over to target-dir.

For the multiple-source case, e.g., the command

  "distcp -pu source-dir1 source-dir2 target-dir"

No matter whether target-dir exists or not, the multiple sources are copied under
target-dir (target-dir is created if it doesn't exist), and their attributes are preserved.
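To make the path rules above concrete, here is a minimal Python model of where each source root ends up, assuming a distcp run without -update or -overwrite. The function name resolve_root_target and its parameters are hypothetical, introduced only for illustration; this is not DistCp code. In every case, the path returned for a source root is where that root's attributes should land when -p is given. The report says this works for multiple sources but fails for a single source in both scenarios a and b.

```python
import posixpath


def resolve_root_target(sources, target, target_exists):
    """Map each source root to the destination path it should occupy.

    Hypothetical model of the behaviour described in HDFS-6152
    (no -update, no -overwrite); not part of the DistCp code base.
    """
    if len(sources) == 1 and not target_exists:
        # Scenario b: the single source root becomes target-dir itself.
        return {sources[0]: target}
    # Scenario a and the multiple-source case: each source root is placed
    # under target-dir, keeping the last component of its path.
    return {
        s: posixpath.join(target, posixpath.basename(s.rstrip("/")))
        for s in sources
    }
```

For example, resolve_root_target(["/src/source-dir"], "/dst/target-dir", True) maps the source to "/dst/target-dir/source-dir", while the same call with target_exists=False maps it to "/dst/target-dir".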

ISSUE 2. With the following command:

  "distcp source-dir target-dir"

when source-dir is an empty directory and target-dir doesn't exist, source-dir is not
copied; the command effectively behaves as a no-op.

However, when source-dir is not empty, it is copied, resulting in target-dir at
the target file system containing a copy of source-dir's children.

To be consistent, an empty source dir should be copied too. That is, the above distcp command
should cause target-dir to be created at the target file system, and source-dir's attributes
should be preserved at target-dir when -p is passed.
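The inconsistency in ISSUE 2 can be sketched with two small Python functions, assuming target-dir does not exist beforehand. Both function names are hypothetical and model only the reported and the proposed behaviour; neither is taken from DistCp. The observed model lists only source-dir's children, so an empty source yields no work. The proposed model always includes target-dir itself, standing in for source-dir's root, so that root's attributes have somewhere to go when -p is passed.

```python
import posixpath


def observed_copy_paths(source_children, target):
    """Paths explicitly copied as reported in ISSUE 2: only the children
    of source-dir, so an empty source-dir produces nothing."""
    return [posixpath.join(target, child) for child in source_children]


def expected_copy_paths(source_children, target):
    """Paths the report argues should be created: target-dir itself
    (carrying source-dir's attributes under -p) plus every child,
    even when source_children is empty."""
    return [target] + observed_copy_paths(source_children, target)


print(observed_copy_paths([], "/dst/target-dir"))  # []
print(expected_copy_paths([], "/dst/target-dir"))  # ['/dst/target-dir']
```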

This message was sent by Atlassian JIRA
