hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11452) Revisit FileSystem.rename(path, path, options)
Date Sat, 07 Jan 2017 17:35:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15807864#comment-15807864 ]

Steve Loughran commented on HADOOP-11452:
-----------------------------------------

We've kind of gone round in circles on the "what features" probe, because it's so fluid; HADOOP-9565
has discussed this. I think it's time to look at the method again, with a list of well-known
strings to look for. Blobstores can add their own, e.g. "atomic-put-on-close".
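
To make the probe idea concrete, here is a minimal sketch of what a string-based feature query could look like. Everything in it is an assumption for illustration: {{FsFeatures}}, {{FeatureProbe}}, {{hasFeature()}} and {{ProbeableStore}} are hypothetical names, not existing Hadoop API, and only "atomic-put-on-close" comes from the discussion above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical well-known feature names; not part of the Hadoop API. */
final class FsFeatures {
    static final String ATOMIC_RENAME = "atomic-rename";             // assumed name
    static final String ATOMIC_PUT_ON_CLOSE = "atomic-put-on-close"; // from the comment above
    private FsFeatures() {}
}

/** Hypothetical probe: callers ask by string, so stores can add their own names. */
interface FeatureProbe {
    /** Returns true only if the store explicitly declares the feature. */
    boolean hasFeature(String feature);
}

/** Toy store that declares a fixed set of features at construction time. */
final class ProbeableStore implements FeatureProbe {
    private final Set<String> features;

    ProbeableStore(String... declared) {
        this.features = new HashSet<>(Arrays.asList(declared));
    }

    @Override
    public boolean hasFeature(String feature) {
        // Unknown strings are answered with false, never with an exception,
        // so older stores stay safe when new feature names appear.
        return features.contains(feature);
    }
}
```

Answering "false" for any unrecognised string is the design choice that keeps the list open-ended: a caller can probe for a feature that a given store has never heard of and still get a safe answer.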

Now regarding a patch to say "I must have atomic", well, yes, if you declare you want it,
why not have the thing fail-fast? As it is, right now you get non-atomic renames *and don't
even know*.

W.r.t. S3A, we are going to do things which rely on PUT being atomic; see HADOOP-13786 for
the full algorithm. All I was proposing was a way for people to say "This really, really must
be atomic", so that people's code which contains fundamental requirements of rename semantics
isn't going to get deep into trouble on S3 or Swift (but not Azure). What gets into trouble?
MRv1 and MRv2 committers, for example.
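
A rough sketch of the fail-fast behaviour being proposed, under loud assumptions: {{ToyRenameOption.ATOMIC_REQUIRED}}, {{ToyRenamingStore}} and {{CopyThenDeleteStore}} are invented for this example and do not exist in Hadoop. The point is only that the check happens before any data is touched.

```java
import java.io.IOException;

/** Hypothetical options; ATOMIC_REQUIRED is an assumption, not a Hadoop rename option. */
enum ToyRenameOption { OVERWRITE, ATOMIC_REQUIRED }

/** Toy store that validates the caller's requirements before doing any work. */
abstract class ToyRenamingStore {
    /** True only if this store can rename in a single atomic step. */
    abstract boolean renameIsAtomic();

    /** Store-specific rename; on a blobstore this may be copy followed by delete. */
    abstract void doRename(String src, String dst) throws IOException;

    final void rename(String src, String dst, ToyRenameOption... options) throws IOException {
        for (ToyRenameOption option : options) {
            if (option == ToyRenameOption.ATOMIC_REQUIRED && !renameIsAtomic()) {
                // Fail before touching any data, so the caller finds out immediately.
                throw new IOException(
                    "Atomic rename required but not supported: " + src + " -> " + dst);
            }
        }
        doRename(src, dst);
    }
}

/** Stand-in for an object store where rename is copy-then-delete. */
final class CopyThenDeleteStore extends ToyRenamingStore {
    int renames = 0; // counts how many renames were actually attempted

    @Override
    boolean renameIsAtomic() { return false; }

    @Override
    void doRename(String src, String dst) { renames++; }
}
```

A committer that depends on atomic rename would pass the (hypothetical) flag and get an exception on such a store, instead of silently getting a non-atomic rename; callers that do not care simply omit the flag and behave as before.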

Making things public? Well, FileStatus is ubiquitous; too late to remove. And, because it
lets the underlying implementation do what it wants, it is great to work with from blobstore
code, as we can do lots to minimise overhead. For example, {{FileContext.listFiles()}} implements
its own recursive treewalk, which would seemingly make HADOOP-13208 impossible to support. I know
FC is cleaner, but for playing blobstore games, the simpler FS API is easier to improve, despite
its lack of consistency across impls.

So instead we have classic {{boolean rename(src, dest)}} where nobody really knows what to
do when, say, the source doesn't exist, dest is "/", etc, etc. And we have a rename(src, dest,
options), where the base implementation, the protected one in {{FileSystem}}, is in fact broken
as in "will delete your data" broken. I consider that important to fix, even if it currently
only bites anyone using FileContext.rename(src, src, overwrite).
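
To illustrate the kind of failure meant here, below is a toy in-memory model. It is not the Hadoop code: {{ToyFs}} and both method names are hypothetical, and the ordering shown is only one way a "delete destination, then rename" sequence can destroy the source when source and destination are the same path.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory file system: path -> contents. Hypothetical, for illustration only. */
final class ToyFs {
    private final Map<String, String> files = new HashMap<>();

    void put(String path, String data) { files.put(path, data); }

    boolean exists(String path) { return files.containsKey(path); }

    /** Unsafe overwrite: removes the destination before noticing src equals dst. */
    void renameOverwriteUnsafe(String src, String dst) throws IOException {
        if (!files.containsKey(src)) {
            throw new FileNotFoundException(src);
        }
        files.remove(dst);               // when src equals dst, this removes the source
        String data = files.remove(src); // nothing left to move
        if (data == null) {
            throw new IOException("rename failed: " + src + " -> " + dst);
        }
        files.put(dst, data);
    }

    /** Safer overwrite: rejects src == dst before deleting anything. */
    void renameOverwriteSafe(String src, String dst) throws IOException {
        if (!files.containsKey(src)) {
            throw new FileNotFoundException(src);
        }
        if (src.equals(dst)) {
            throw new IOException("source and destination are the same: " + src);
        }
        files.put(dst, files.remove(src)); // replaces any existing destination
    }
}
```

Both variants end in an exception for {{rename("/data", "/data")}}, but after the unsafe one the file is gone, while after the safe one it is still there. That difference is the whole argument for validating arguments before the first destructive step.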

Now, the current patch *doesn't* do anything w.r.t renames; it opens up the method, fixes
its base rename call to not delete the source, tries to specify what actually goes on in HDFS,
and pulls the error strings out of DFS & makes them shared constants, so that the other implementations
can raise exceptions with identical messages.

Do you want to review it? I know it's not complete, it doesn't have the tests for the corner
cases I've managed to identify, but at least have a look at the FS spec document and show
me where I've misunderstood things.

> Revisit FileSystem.rename(path, path, options)
> ----------------------------------------------
>
>                 Key: HADOOP-11452
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11452
>             Project: Hadoop Common
>          Issue Type: Task
>          Components: fs
>    Affects Versions: 2.7.3
>            Reporter: Yi Liu
>            Assignee: Steve Loughran
>         Attachments: HADOOP-11452-001.patch, HADOOP-11452-002.patch
>
>
> Currently in {{FileSystem}}, {{rename}} with _Rename options_ is protected and carries a
> _deprecated_ annotation, and the default implementation is not atomic.
> So this method cannot be used from outside. On the other hand, HDFS has a good and
> atomic implementation. (Also, an interesting thing in {{DFSClient}}: the _deprecated_ annotations
> for these two methods are the opposite way round.)
> It makes sense to make {{rename}} with _Rename options_ public, since it is atomic
> for rename+overwrite; it also saves RPC calls if the user wants rename+overwrite.




