hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic
Date Mon, 04 Dec 2017 13:49:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276796#comment-16276796 ]

Steve Loughran commented on HADOOP-15086:

I don't disagree with you about the existence of the problem; I just don't think it's easily
fixed. Essentially: blobstores tend not to have a rename() (or indeed create(overwrite=false)
or delete(directory)), and the things we do to mimic these in our connectors aren't atomic.

1. We cover this in [Object Stores|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/introduction.html#Object_Stores_vs._Filesystems]
2. This is also common to: S3x, Swift, OSS, ADL, ...
3. By inference, the Hadoop FileOutputCommitter protocol is not atomic on object stores either.
4. Compare with the requirements of rename() as covered in [rename()|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_renamePath_src_Path_d]
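
To make the failure concrete, here is a minimal sketch of a rename emulated as "check, then copy, then delete" against an in-memory map standing in for a blob store. Everything in it (the EmulatedRename class, its method names, the blob paths) is hypothetical and made up for illustration; it is not the NativeAzureFileSystem code. The interleaving is replayed by hand rather than with real threads so the result is the same on every run.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a blob store; not the real connector. */
final class EmulatedRename {
    // Blob name -> contents. Each single get/put/remove is atomic on its own.
    static final Map<String, String> blobs = new ConcurrentHashMap<>();

    /** Step 1 of the emulated rename: the destination must not exist yet. */
    static boolean destinationIsFree(String dst) {
        return !blobs.containsKey(dst);
    }

    /** Steps 2 and 3: copy the source blob to the destination, then delete the source. */
    static void copyThenDelete(String src, String dst) {
        blobs.put(dst, blobs.get(src));
        blobs.remove(src);
    }

    /** Emulated rename: separate store calls with nothing tying them together. */
    static boolean rename(String src, String dst) {
        if (!destinationIsFree(dst)) {
            return false;
        }
        copyThenDelete(src, dst);
        return true;
    }

    /** Replays one bad interleaving of two clients; returns how many "succeeded". */
    static int replayRace() {
        blobs.clear();
        blobs.put("tmp/a", "from client A");
        blobs.put("tmp/b", "from client B");
        boolean aMayProceed = destinationIsFree("out/part-0"); // A checks
        boolean bMayProceed = destinationIsFree("out/part-0"); // B checks before A copies
        int winners = 0;
        if (aMayProceed) { copyThenDelete("tmp/a", "out/part-0"); winners++; }
        if (bMayProceed) { copyThenDelete("tmp/b", "out/part-0"); winners++; }
        return winners;
    }

    public static void main(String[] args) {
        System.out.println("winners: " + replayRace());
        System.out.println("final contents: " + blobs.get("out/part-0"));
    }
}
```

Both clients pass the check, both report success, and the second copy silently overwrites the first. Every individual store call is atomic; the sequence is not.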

There is actually special support in Azure for atomic rename of HBase directories; this is
done with leasing, recovery and stuff. It manages exclusivity, but it is still not an O(1) operation.
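
As a rough illustration of what leasing buys, and what it doesn't, here is a sketch in which a directory rename first takes a lease and then moves the blobs one at a time. The LeasedDirRename class, its lease table and the paths are all assumptions made up for this example; the real connector's lease handling and its recovery of half-finished renames are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical lease-guarded directory rename over an in-memory blob map. */
final class LeasedDirRename {
    static final Map<String, String> blobs = new ConcurrentSkipListMap<>();
    static final Map<String, String> leases = new ConcurrentHashMap<>();

    /** putIfAbsent is one atomic step, so only one owner can hold a lease. */
    static boolean acquireLease(String dir, String owner) {
        return leases.putIfAbsent(dir, owner) == null;
    }

    /** Releases the lease only if this owner still holds it. */
    static void releaseLease(String dir, String owner) {
        leases.remove(dir, owner);
    }

    /** Returns the number of blobs moved, or -1 if someone else holds the lease. */
    static int renameDir(String srcDir, String dstDir, String owner) {
        if (!acquireLease(srcDir, owner)) {
            return -1;
        }
        try {
            // Collect the children first, then move them one by one: O(number of blobs).
            List<String> names = new ArrayList<>();
            for (String name : blobs.keySet()) {
                if (name.startsWith(srcDir + "/")) {
                    names.add(name);
                }
            }
            for (String name : names) {
                blobs.put(dstDir + name.substring(srcDir.length()), blobs.remove(name));
            }
            return names.size();
        } finally {
            releaseLease(srcDir, owner);
        }
    }

    public static void main(String[] args) {
        blobs.put("hbase/t1/a", "1");
        blobs.put("hbase/t1/b", "2");
        acquireLease("hbase/t1", "other-client");
        System.out.println(renameDir("hbase/t1", "hbase/t2", "me")); // lease is taken
        releaseLease("hbase/t1", "other-client");
        System.out.println(renameDir("hbase/t1", "hbase/t2", "me")); // moves both blobs
    }
}
```

The lease gives exclusivity, so two renamers can't both win. The work inside the lease is still a loop over every blob, and a crash mid-loop leaves a half-moved directory, which is why recovery is needed on top.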

If you look at where we are going with this, the work is in moving to object-store specific
committers which provide the commit semantics without relying on renames. HADOOP-13786 is
the initial implementation of this for S3A, but the hooks put into FileOutputFormat are designed
to support filesystem-specific committers for any store which implements one. 
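
For a feel of what "commit semantics without relying on renames" can mean, here is a deliberately simplified sketch: tasks record their output as pending uploads addressed to the final path, and nothing becomes visible until the job commits. The PendingCommitSketch class and everything inside it are hypothetical names invented for this example, not the HADOOP-13786 code, and the sketch is single-threaded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical rename-free commit: output stays invisible until job commit. */
final class PendingCommitSketch {
    // What readers of the store can see: final path -> contents.
    static final Map<String, String> visible = new TreeMap<>();

    /** Data already uploaded for a final path, but not yet made visible. */
    static final class PendingUpload {
        final String finalPath;
        final String data;

        PendingUpload(String finalPath, String data) {
            this.finalPath = finalPath;
            this.data = data;
        }
    }

    static final class Job {
        private final List<PendingUpload> pending = new ArrayList<>();

        /** Task commit: remember the upload; no temporary directory, no rename. */
        void taskCommit(String finalPath, String data) {
            pending.add(new PendingUpload(finalPath, data));
        }

        /** Job commit: complete each pending upload in place; returns how many. */
        int commit() {
            for (PendingUpload upload : pending) {
                visible.put(upload.finalPath, upload.data);
            }
            int completed = pending.size();
            pending.clear();
            return completed;
        }

        /** Job abort: drop the pending uploads; nothing was ever visible. */
        void abort() {
            pending.clear();
        }
    }

    public static void main(String[] args) {
        Job job = new Job();
        job.taskCommit("out/part-0", "rows 0-99");
        job.taskCommit("out/part-1", "rows 100-199");
        System.out.println("visible before commit: " + visible.keySet());
        job.commit();
        System.out.println("visible after commit: " + visible.keySet());
    }
}
```

Job commit here is still one store call per file, so the sketch makes no claim of atomicity across files. What it avoids is copying data and depending on a check-then-rename at commit time.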

I'm closing as a WONTFIX. Sorry. It's not that we don't want to; it's just that directory operations
are where the metaphor "object stores are like filesystems" fails if you look closely enough.

(On a brighter note: wasb is consistent for both metadata and data)

> NativeAzureFileSystem.rename is not atomic
> ------------------------------------------
>                 Key: HADOOP-15086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15086
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 2.7.3
>            Reporter: Shixiong Zhu
>         Attachments: RenameReproducer.java
> When multiple threads rename files to the same target path, more than one thread can succeed.
> It's because the check and the file copy in `rename` are not atomic.
> I would expect it to be atomic, just like HDFS.

