hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-11452) Revisit FileSystem.rename(path, path, options)
Date Wed, 04 Jan 2017 11:40:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264495#comment-14264495 ]

Steve Loughran edited comment on HADOOP-11452 at 1/4/17 11:40 AM:
------------------------------------------------------------------

# We can't remove {{rename()}}. People would be surprised and upset. Therefore "constrain"
is not something you can now mandate. Sorry.
# We can declare that it SHOULD be atomic —which is precisely what we do in [the FS specs|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/introduction.html]
# Some extensions to object stores (e.g. Netflix S3mper) do retrofit atomicity to rename operations,
so they can be used as the destination of speculative operations.
# HADOOP-9565 proposes moving filesystems that are really object stores under a {{BlobStore}}
subclass of {{FileSystem}} and offering a way to get a bitmask of consistency and atomicity
features. The design is intended to allow subclasses (e.g. S3mper) to override semantics, and
alternate S3 and Swift service providers to offer stricter semantics.
# Code that wants to explicitly check for the required semantics could then look for this
interface and, if present, get the semantics. Actually, if we really care, we may want to
push it to the base FS class, as a barrier for apps rather than as a way for them to work around
things.
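
To make the last two points concrete, here is a rough sketch of what "look for this interface and, if present, get the semantics" could look like. Nothing here is a real Hadoop API: {{Store}} stands in for {{FileSystem}}, and {{BlobStore}}, {{StoreFeature}}, {{features()}} and the two store classes are hypothetical names. The bitmask is modelled as an {{EnumSet}} purely for readability; HADOOP-9565 may well end up with a different shape.

```java
import java.util.EnumSet;
import java.util.Optional;
import java.util.Set;

class StoreSemanticsSketch {

    /** Hypothetical feature flags; stands in for the proposed bitmask. */
    enum StoreFeature { ATOMIC_RENAME, ATOMIC_DELETE, CONSISTENT_LISTING }

    /** Stand-in for FileSystem; not the real Hadoop class. */
    interface Store { }

    /** Hypothetical marker for stores that are really object stores. */
    interface BlobStore extends Store {
        Set<StoreFeature> features();
    }

    /** A plain object store that declares no guarantees at all. */
    static class PlainBlobStore implements BlobStore {
        @Override
        public Set<StoreFeature> features() {
            return EnumSet.noneOf(StoreFeature.class);
        }
    }

    /** A subclass that overrides the semantics, e.g. a layer that retrofits atomic rename. */
    static class HardenedBlobStore extends PlainBlobStore {
        @Override
        public Set<StoreFeature> features() {
            return EnumSet.of(StoreFeature.ATOMIC_RENAME);
        }
    }

    /** A store that does not implement the hypothetical interface. */
    static class OtherStore implements Store { }

    /** Returns the declared features if the interface is present, otherwise empty. */
    static Optional<Set<StoreFeature>> declaredFeatures(Store store) {
        if (store instanceof BlobStore) {
            return Optional.of(((BlobStore) store).features());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Store[] stores = { new PlainBlobStore(), new HardenedBlobStore(), new OtherStore() };
        for (Store store : stores) {
            System.out.println(store.getClass().getSimpleName() + " -> " + declaredFeatures(store));
        }
    }
}
```

An application that needs atomic rename would call {{declaredFeatures()}} and refuse to run when the result is present but lacks {{ATOMIC_RENAME}}. That is the "barrier" use; what to do when the interface is absent is left to the caller.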

If you look at the protected method {{FileSystem#rename(final Path src, final Path dst, final Rename... options)}}
in detail, you can see that, apart from HDFS/webhdfs, we aren't implementing rename()
atomically. Specifically:
# we check whether the file exists
# we raise an error if the condition (source is dir, dest is file) && overwrite==false holds
# then we rename

There's a race condition between the stat and rename. Maybe now that we support Java7+ only
we can think about using native IO operations which offer better atomicity.
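
For illustration, this is roughly what the difference looks like with the Java 7 {{java.nio.file}} API on a local filesystem. It is only a sketch: the class and method names are made up, it is not the Hadoop {{FileSystem}} code, and it says nothing about object stores. Note also that, per the {{Files.move}} javadoc, what {{ATOMIC_MOVE}} does when the target already exists is implementation specific, and the call throws {{AtomicMoveNotSupportedException}} where the platform cannot do it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class AtomicRenameSketch {

    /**
     * Check-then-act, as in the steps listed above: another process can
     * create dst between the existence check and the move.
     */
    static void racyRename(Path src, Path dst, boolean overwrite) throws IOException {
        if (!overwrite && Files.exists(dst)) {            // the "stat"
            throw new FileAlreadyExistsException(dst.toString());
        }
        // Race window: dst may appear here and would then be replaced.
        Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
    }

    /**
     * One call that asks the platform for an atomic move. Throws
     * AtomicMoveNotSupportedException if that cannot be done.
     */
    static void atomicRename(Path src, Path dst) throws IOException {
        Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rename-sketch");
        Path src = dir.resolve("src.txt");
        Path dst = dir.resolve("dst.txt");
        Files.write(src, "data".getBytes(StandardCharsets.UTF_8));
        atomicRename(src, dst);
        System.out.println("moved: " + Files.exists(dst));
    }
}
```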

We can talk about how to expose this stuff, which is something you should raise on the HDFS list.
Hadoop-common may be where the APIs live, but it's hadoop-dev that owns the semantics.

Finally, regarding {{FileSystemRMStateStore}}: if that were moved to {{FileContext}}, it would get
the public APIs. YARN already depends on implementations of {{AbstractFileSystem}}.


> Revisit FileSystem.rename(path, path, options)
> ----------------------------------------------
>
>                 Key: HADOOP-11452
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11452
>             Project: Hadoop Common
>          Issue Type: Task
>          Components: fs
>    Affects Versions: 2.7.3
>            Reporter: Yi Liu
>            Assignee: Steve Loughran
>
> Currently in {{FileSystem}}, {{rename}} with _Rename options_ is protected and carries the _deprecated_
> annotation, and the default implementation is not atomic.
> So this method cannot be used from outside. On the other hand, HDFS has a good,
> atomic implementation. (Interestingly, in {{DFSClient}} the _deprecated_ annotations
> for these two methods are the opposite way round.)
> It makes sense to make {{rename}} with _Rename options_ public, since it is atomic
> for rename+overwrite and also saves RPC calls when the user wants rename+overwrite.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

