hadoop-common-issues mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6240) Rename operation is not consistent between different implementations of FileSystem
Date Tue, 15 Sep 2009 17:21:57 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755594#action_12755594 ]

Doug Cutting commented on HADOOP-6240:
--------------------------------------

Which of the proposed behaviors differ from the current behavior?  My goal is to determine
which of them mandate an HDFS-specific implementation for 0.21 (before we've added
AbstractFileSystem) because they cannot be correctly implemented in FileContext.

My guesses are:
 # rename will return void and throw IOException to indicate failures.  This is new, but would
be easy to fix generically in FileContext, right?
 # rename will fail when renaming from file to directory or directory to file.  Do we permit
this currently?  If so, then, in FileContext, we could first stat the files and throw an exception.
 That would potentially be incorrect if another process removed and/or replaced the files
between the stat and the rename, since an illegal rename might then succeed.  Is that sort
of atomicity critical?
 # rename from /a/b to /c/d will fail if {c} does not exist.  I assume this currently succeeds.
 A generic implementation in FileContext would stat the parent directory, and, if it does
not exist, throw an exception.  That would potentially be incorrect if another process created
the parent directory between the stat and the rename, since the rename would then have succeeded.
Is that sort of atomicity critical?
 # rename will behave differently when the dst directory exists: it will no longer move src
into dst, as it does today.  Again, this could be handled by first stat'ing the file, and
again, it has the same atomicity concerns.
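
To make the stat-then-rename idea above concrete, here is a rough sketch. It is not the FileContext API: java.nio.file stands in for a FileSystem, the class name GenericRenameSketch is made up, and it assumes an existing destination is simply rejected, which the list above does not spell out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch only: java.nio.file is a stand-in for a FileSystem,
// and none of these names come from Hadoop.
final class GenericRenameSketch {
    private GenericRenameSketch() {}

    /** Returns void and reports every failure as an IOException (guess 1). */
    static void rename(Path src, Path dst) throws IOException {
        // "stat" phase: check the preconditions before touching anything.
        if (!Files.exists(src)) {
            throw new IOException("rename source does not exist: " + src);
        }
        Path parent = dst.toAbsolutePath().getParent();
        if (parent == null || !Files.isDirectory(parent)) {
            // Guess 3: the parent of dst must already exist.
            throw new IOException("parent of rename destination does not exist: " + dst);
        }
        if (Files.exists(dst)) {
            if (Files.isDirectory(src) != Files.isDirectory(dst)) {
                // Guess 2: no file-to-directory or directory-to-file renames.
                throw new IOException("cannot rename between a file and a directory: "
                        + src + " -> " + dst);
            }
            // Guess 4 (assumed here): an existing dst is rejected, not moved into.
            throw new IOException("rename destination already exists: " + dst);
        }
        // Race window: another process may create or delete src, dst, or the
        // parent between the checks above and the move below. This gap is the
        // atomicity concern raised in the list.
        Files.move(src, dst);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rename-sketch");
        Path a = Files.createFile(dir.resolve("a"));
        rename(a, dir.resolve("b"));
        System.out.println("renamed to " + dir.resolve("b"));
    }
}
```

The checks and the move are separate calls, so the sketch is only as atomic as the underlying move; a file system that enforces these rules itself would not have the window.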

So the question is, would applications that depend on atomic rename have trouble with a generic
implementation of these?  Mostly what folks depend on atomic rename for is to know that something
has indeed completed.  So rename-by-copy is especially dangerous here, since a file or directory's
existence can give the appearance of completion when the copy did not in fact complete.  But
I don't see that peril in the above cases.

What are the actual risks of a generic implementation of the above?  I don't see any to the
common use case of promoting result files, but perhaps I'm missing something or there are
other important use cases.
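
For reference, the "promoting result files" use case looks roughly like the sketch below. Again this uses java.nio.file rather than any Hadoop API, and the class name, method names, and the "_tmp." prefix are invented for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch only: names and the temp-file convention are made up.
final class PromoteResultSketch {
    private PromoteResultSketch() {}

    /** Writes the result under a temporary name, then promotes it by rename. */
    static void writeThenPromote(Path finalPath, String contents) throws IOException {
        Path tmp = finalPath.resolveSibling("_tmp." + finalPath.getFileName());
        // All the slow work happens under the temporary name.
        Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8));
        // The rename is the commit point. ATOMIC_MOVE asks for an atomic
        // rename and throws if the file system cannot provide one, rather
        // than silently falling back to copy-and-delete.
        Files.move(tmp, finalPath, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Readers treat the existence of the final name as "the writer finished". */
    static boolean isComplete(Path finalPath) {
        return Files.exists(finalPath);
    }
}
```

A rename-by-copy would break isComplete, since the final name could appear before all the bytes do. The precondition checks in the guesses above do not have that effect: they can only turn a rename into a failure, or let through one that should have failed.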


> Rename operation is not consistent between different implementations of FileSystem
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-6240
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6240
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>             Fix For: 0.21.0
>
>
> The rename operation has many scenarios that are not consistently implemented across file systems.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

