hadoop-hdfs-issues mailing list archives

From "J.Andreina (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8673) HDFS reports file already exists if there is a file/dir name end with ._COPYING_
Date Thu, 24 Sep 2015 12:48:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906296#comment-14906296
] 

J.Andreina commented on HDFS-8673:
----------------------------------

[~airbots], thanks for the patch.

The patch looks good to me, but I feel it would break the current behavior.

*Before patch:* FileAlreadyExistsException is thrown only if a directory named
"File1._COPYING_" exists, not if a file with that name exists.
*After patch:* The exception is thrown for both a file and a directory.

*This is a behavior change for end users.*

*For example:*
Say a user is writing a 10GB file ("File1"), the write operation is interrupted,
and the "File1._COPYING_" file is retained in the filesystem.
The user might then retry writing the same "File1".
The write will succeed, because we overwrite "File1._COPYING_".

*But after patch:*
The user's retry of writing "File1" will fail with an exception saying that "File1._COPYING_"
already exists.
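
To make the difference concrete, here is a minimal sketch in Python that models the two behaviors on a local filesystem. It is not the Hadoop code or the patch itself: the names `put` and `retry_after_interrupted_write` and the `fail_on_existing_temp_file` flag are hypothetical, and the only things taken from the issue are the "._COPYING_" suffix and the write-to-a-temporary-name-then-rename sequence that the description implies.

```python
import os
import tempfile
from pathlib import Path

SUFFIX = "._COPYING_"  # temporary-name suffix mentioned in the issue description


def put(data: bytes, target: Path, fail_on_existing_temp_file: bool) -> None:
    """Write to '<target>._COPYING_' and then rename it to '<target>'.

    fail_on_existing_temp_file=False models the "before patch" behavior,
    True models the "after patch" behavior. Local-filesystem model only.
    """
    temp = target.with_name(target.name + SUFFIX)
    if temp.is_dir():
        # In both cases a directory with the temporary name is an error.
        raise FileExistsError(f"{temp} already exists as a directory")
    if temp.exists() and fail_on_existing_temp_file:
        # After the patch a leftover temporary *file* is an error as well.
        raise FileExistsError(f"{temp} already exists")
    temp.write_bytes(data)    # before the patch this overwrites a leftover file
    os.replace(temp, target)  # final rename to the real target name


def retry_after_interrupted_write(fail_on_existing_temp_file: bool) -> str:
    """Simulate a retry after an interrupted write left 'File1._COPYING_' behind."""
    with tempfile.TemporaryDirectory() as d:
        target = Path(d) / "File1"
        leftover = Path(d) / ("File1" + SUFFIX)
        leftover.write_bytes(b"partial data")  # remains of the interrupted write
        try:
            put(b"full data", target, fail_on_existing_temp_file)
        except FileExistsError:
            return "failed"
        return "succeeded"


if __name__ == "__main__":
    print("before patch:", retry_after_interrupted_write(False))
    print("after patch: ", retry_after_interrupted_write(True))
```

In this model the retry succeeds with the "before patch" setting and fails with the "after patch" setting, which is the end-user-visible change I am concerned about. The sketch does not model any cleanup of the temporary path after a failure.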

[~stevel@apache.org], can you provide your feedback on this, or correct me if I am wrong?

> HDFS reports file already exists if there is a file/dir name end with ._COPYING_
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-8673
>                 URL: https://issues.apache.org/jira/browse/HDFS-8673
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.7.0
>            Reporter: Chen He
>            Assignee: Chen He
>         Attachments: HDFS-8673.000-WIP.patch, HDFS-8673.000.patch, HDFS-8673.001.patch, HDFS-8673.002.patch, HDFS-8673.003.patch, HDFS-8673.003.patch
>
>
> Because the CLI uses CommandWithDestination.java, which adds "._COPYING_" to the tail of
> the file name when it does the copy, it causes a problem if there is already a file/dir
> called *._COPYING_ on HDFS.
> For file:
> -bash-4.1$ hadoop fs -put 5M /user/occ/
> -bash-4.1$ hadoop fs -mv /user/occ/5M /user/occ/5M._COPYING_
> -bash-4.1$ hadoop fs -ls /user/occ/
> Found 1 items
> -rw-r--r--   1 occ supergroup    5242880 2015-06-26 05:16 /user/occ/5M._COPYING_
> -bash-4.1$ hadoop fs -put 128K /user/occ/5M
> -bash-4.1$ hadoop fs -ls /user/occ/
> Found 1 items
> -rw-r--r--   1 occ supergroup     131072 2015-06-26 05:19 /user/occ/5M
> For dir:
> -bash-4.1$ hadoop fs -mkdir /user/occ/5M._COPYING_
> -bash-4.1$ hadoop fs -ls /user/occ/
> Found 1 items
> drwxr-xr-x   - occ supergroup          0 2015-06-26 05:24 /user/occ/5M._COPYING_
> -bash-4.1$ hadoop fs -put 128K /user/occ/5M
> put: /user/occ/5M._COPYING_ already exists as a directory
> -bash-4.1$ hadoop fs -ls /user/occ/
> (/user/occ/5M._COPYING_ is gone)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
